Over the past week I had an issue where one of my Cisco ASA 5545s with a Sourcefire module went down and I couldn’t get it restarted. When I looked at the console for the SFR module I saw disk errors, so I opened a ticket with Cisco to have them look at it. One thing I found appalling was how much the quality of Cisco TAC engineers has dropped. I spent more time on the phone showing these guys commands I had just googled and telling them what needed to be done than they spent helping me. If these guys are supposed to be the experts in the device and the technology, I am not impressed, especially since Cisco keeps raising my rates while the quality gets lower rather than better.
Back to the issue:
The Cisco ASA 5545 Sourcefire unit has two SSDs in a RAID 1 configuration, so you would think that if one failed the other would take over. At least that is what I thought. It turns out that both of the SSDs had failed, and there was no notification at all on the unit itself or in the logs that even one of the drives was bad, let alone both of them. The only place I found it was by running the “sh raid” command on the terminal. After seeing this unit fail, I went through the rest of my 5545s with Sourcefire modules and found two others with a failed drive, and again there was no warning and no error lights on the drive or the firewall itself. I had to run the command to find the issue.
Here is what a healthy raid set looks like:
/dev/md0:
Version : 1.2
Creation Time : Fri Feb 19 18:27:16 2021
Raid Level : raid1
Array Size : 124969216 (119.18 GiB 127.97 GB)
Used Dev Size : 124969216 (119.18 GiB 127.97 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Jun 2 20:05:01 2021
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : ciscoasa:0 (local to host ciscoasa)
UUID : 244baa9a:b6e40506:f7384510:fcb42706
Events : 12123
   Number   Major   Minor   RaidDevice   State
      0       8        0        0        active sync   /dev/sda
      2       8       16        1        active sync   /dev/sdb
Here’s what an unhealthy raid set looks like:
/dev/md0:
Version : 1.2
Creation Time : Mon May 25 12:42:13 2020
Raid Level : raid1
Array Size : 124969216 (119.18 GiB 127.97 GB)
Used Dev Size : 124969216 (119.18 GiB 127.97 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Jun 2 20:01:32 2021
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : ciscoasa:0 (local to host ciscoasa)
UUID : 0ed2ca7c:260897dd:f183f4bf:c0f15bfb
Events : 12258234
   Number   Major   Minor   RaidDevice   State
      0       8        0        0        active sync   /dev/sda
      2       0        0        2        removed
      2       8       16        -        faulty        /dev/sdb
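The lines that give it away are State, Working Devices, and Failed Devices. If you end up scraping this output yourself, here is a rough sketch (nothing Cisco provides, just my own take) of flagging a bad array. It only parses the key/value lines shown above and assumes a two-disk mirror like these units have:

def raid_is_healthy(show_raid_output: str) -> bool:
    """True when the mirror reports a clean state and no failed members."""
    fields = {}
    for line in show_raid_output.splitlines():
        # Only the "Key : value" lines matter; the device table has no colon.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    state = fields.get("State", "")
    failed = int(fields.get("Failed Devices", "-1"))
    working = int(fields.get("Working Devices", "0"))
    # Healthy unit above: "clean", 2 working, 0 failed.
    # Bad unit above: "clean, degraded", 1 working, 1 failed.
    return state == "clean" and failed == 0 and working == 2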
I can’t believe there are no logs or notifications; even Solarwinds didn’t pick up the hardware issue. You would think some sort of notification would be sent out, or that the HD light on the firewall would turn orange. What a novel concept, notifying people of failed hardware before it causes major problems.
So if you run any of the modules in your ASA firewalls, make sure to check the RAID status and confirm the drives are healthy; if not, get a ticket open with TAC, where they can give you such brilliant ideas as moving the faulty drive to another ASA, or swapping the drives (which causes the firewall to crash, so don’t do it).
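If you have more than a couple of these firewalls, it is worth scripting the check instead of logging into each one. Here is a minimal sketch that assumes the netmiko Python library and SSH access to the ASAs; the hostnames and credentials are placeholders, and you could point the failure branch at email or your monitoring system instead of print:

from netmiko import ConnectHandler

# Placeholder hostnames -- replace with your own firewalls.
ASA_HOSTS = ["asa-dc1.example.com", "asa-dc2.example.com"]


def check_asa_raid(host: str, username: str, password: str, secret: str) -> None:
    """SSH to one ASA, run 'show raid', and complain if the mirror is not clean."""
    conn = ConnectHandler(
        device_type="cisco_asa",
        host=host,
        username=username,
        password=password,
        secret=secret,
    )
    try:
        conn.enable()
        output = conn.send_command("show raid")
    finally:
        conn.disconnect()

    # Same red flags as the degraded output above.
    if "degraded" in output or "faulty" in output or "removed" in output:
        print(f"{host}: RAID PROBLEM - open a TAC case and schedule a drive swap")
        print(output)
    else:
        print(f"{host}: raid looks clean")


if __name__ == "__main__":
    for h in ASA_HOSTS:
        check_asa_raid(h, username="admin", password="CHANGEME", secret="CHANGEME")

Run it from cron once a day and you at least get the warning the firewall itself never bothers to give you.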