Check your RAID: Cisco ASA and Sourcefire

Over the past week I had an issue where one of my Cisco ASA 5545s with a Sourcefire module went down and I couldn't get it restarted. When I looked at the console for the SFR module I saw disk errors, so I opened a ticket with Cisco to have them look at it. One thing I found appalling is how much the quality of Cisco TAC engineers has dropped. I spent more time on the phone with these guys while they didn't know what to do, showing them commands I had just googled and what needed to be done. If these guys are supposed to be the experts in the device and the technology, I am not impressed, especially since Cisco keeps raising my rates while the quality seems to get lower rather than better.

Back to the issue:

The Cisco 5545 Sourcefire unit has two SSDs in a RAID 1 configuration, so you would think that if one failed the other would take over. At least that is what I thought; it turns out both SSDs had failed, and there was no notification at all on the unit itself or in the logs that even one of the drives was bad, let alone both of them. The only place the failure showed up was in the output of the "sh raid" command on the terminal. After seeing this unit fail, I went through the rest of my 5545s with Sourcefire modules and found two others with a failed drive. Again there was no warning and no error lights on the drive or the firewall itself; I had to run the command to find the issue.
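If you want to spot-check a unit yourself, the command runs straight from the ASA CLI ("sh raid" is just the abbreviated form of "show raid"; the hostname in the prompt below is a placeholder):

ciscoasa# show raid

The output is essentially the mdadm detail for the module's /dev/md0 array, as shown in the examples that follow.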

Here is what a healthy RAID set looks like:

/dev/md0:
        Version : 1.2
  Creation Time : Fri Feb 19 18:27:16 2021
     Raid Level : raid1
     Array Size : 124969216 (119.18 GiB 127.97 GB)
  Used Dev Size : 124969216 (119.18 GiB 127.97 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Jun  2 20:05:01 2021
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : ciscoasa:0  (local to host ciscoasa)
           UUID : 244baa9a:b6e40506:f7384510:fcb42706
         Events : 12123

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       2       8       16        1      active sync   /dev/sdb

Here's what an unhealthy RAID set looks like:

/dev/md0:
        Version : 1.2
  Creation Time : Mon May 25 12:42:13 2020
     Raid Level : raid1
     Array Size : 124969216 (119.18 GiB 127.97 GB)
  Used Dev Size : 124969216 (119.18 GiB 127.97 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Jun  2 20:01:32 2021
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           Name : ciscoasa:0  (local to host ciscoasa)
           UUID : 0ed2ca7c:260897dd:f183f4bf:c0f15bfb
         Events : 12258234

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       2       0        0        2      removed

       2       8       16        -      faulty   /dev/sdb

I can't believe there are no logs or notifications; even Solarwinds didn't pick up the hardware issue. You would think some sort of notification would be sent out, or that the HD light on the firewall would turn orange. What a novel concept, notifying people of failed hardware before it causes major problems.

So if you run any of the modules in your ASA firewalls, make sure to check the RAID status and confirm the drives are healthy; if not, get a ticket open with TAC, where they can give you such brilliant ideas as moving the faulty drive to another ASA, or swapping the drives (which causes the firewall to crash, so don't do it).
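Since nothing alerted me to these failures, the stopgap I'd suggest is polling the command yourself until proper monitoring exists. Here is a minimal sketch of that idea, assuming Python with the netmiko library and SSH access from a monitoring box; the hostnames and credentials below are placeholders, not anything from my environment:

# Rough sketch: poll "show raid" on each ASA and flag degraded arrays.
# Assumes the netmiko library (pip install netmiko) and an account that can run show commands.
from netmiko import ConnectHandler

ASAS = ["asa-site1.example.com", "asa-site2.example.com"]  # placeholder hostnames

def check_raid(host):
    conn = ConnectHandler(
        device_type="cisco_asa",
        host=host,
        username="monitor",      # placeholder credentials
        password="CHANGEME",
        secret="CHANGEME",
    )
    conn.enable()
    output = conn.send_command("show raid")
    conn.disconnect()

    # A healthy set shows "State : clean" and "Failed Devices : 0";
    # a bad one shows "degraded", "faulty", or "removed" like the output above.
    if any(word in output for word in ("degraded", "faulty", "removed")):
        print(f"ALERT {host}: RAID is degraded or has a faulty drive")
    else:
        print(f"OK {host}: RAID is healthy")

for asa in ASAS:
    check_raid(asa)

Run it from cron or whatever scheduler you already have, and at least you'll hear about the next dead SSD before both of them go.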
