Since I ran into this issue and wasn’t able to find anyone posting about it, I thought I should put something together for anyone else who runs into it. I had a stack of 3750X switches that was flooding unicast traffic to all of the ports in the same VLAN. While doing research I came across three suggested causes: asymmetric L2 routing, timeout values for the ARP tables, and TCAM table overruns. My issue turned out to be none of these. The ARP timeout values were all increased and that didn’t solve the problem. My network is fairly simple with a collapsed core, so asymmetric L2 routing wasn’t the issue. The TCAM tables weren’t being overrun on these switches either, as they can handle 8K ARP entries and I am nowhere near that.
So what did that leave me with? An issue where the MAC address tables of the stack members were not getting updated in a timely manner, so the members disagreed about how many addresses they had learned. The per-member counts from the following command show it:
remote command all sh mac add count | i Total
Switch : 3 : (Master)
———————
Total Mac Addresses : 152
Total Mac Addresses : 585
Total Mac Addresses : 39
Total Mac Addresses : 381
Total Mac Addresses : 384
Total Mac Addresses : 22
Total Mac Addresses : 28
Total Mac Addresses : 178
Total Mac Addresses : 0
Total Mac Address Space Available: 6402
Switch : 1 :
————
Total Mac Addresses : 152
Total Mac Addresses : 585
Total Mac Addresses : 162
Total Mac Addresses : 22
Total Mac Addresses : 39
Total Mac Addresses : 381
Total Mac Addresses : 384
Total Mac Addresses : 28
Total Mac Addresses : 0
Total Mac Address Space Available: 6418
Switch : 2 :
————
Total Mac Addresses : 152
Total Mac Addresses : 585
Total Mac Addresses : 165
Total Mac Addresses : 22
Total Mac Addresses : 39
Total Mac Addresses : 381
Total Mac Addresses : 384
Total Mac Addresses : 28
Total Mac Addresses : 0
Total Mac Address Space Available: 6415
Switch : 4 :
————
Total Mac Addresses : 152
Total Mac Addresses : 585
Total Mac Addresses : 39
Total Mac Addresses : 381
Total Mac Addresses : 384
Total Mac Addresses : 22
Total Mac Addresses : 28
Total Mac Addresses : 140
Total Mac Addresses : 0
Total Mac Address Space Available: 6440
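If you would rather not eyeball the numbers, here is a minimal Python sketch that parses output in the format shown above and totals the learned MAC addresses per stack member. The function names (parse_mac_counts, learned_totals) are my own for illustration, and the sample only contains the first two members from the output above. Because the "| i Total" filter strips the VLAN labels, the sketch compares per-member totals rather than trying to match individual VLAN lines.

```python
import re

# Patterns for the three line types that survive the "| i Total" filter,
# plus the per-member header line.
SWITCH_RE = re.compile(r"^Switch\s*:\s*(\d+)\s*:")
COUNT_RE = re.compile(r"^Total Mac Addresses\s*:\s*(\d+)")
AVAIL_RE = re.compile(r"^Total Mac Address Space Available:\s*(\d+)")


def parse_mac_counts(text):
    """Return {switch_number: {"counts": [...], "available": int or None}}."""
    members = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SWITCH_RE.match(line)
        if match:
            current = int(match.group(1))
            members[current] = {"counts": [], "available": None}
            continue
        if current is None:
            continue  # ignore anything before the first "Switch :" header
        match = AVAIL_RE.match(line)
        if match:
            members[current]["available"] = int(match.group(1))
            continue
        match = COUNT_RE.match(line)
        if match:
            members[current]["counts"].append(int(match.group(1)))
    return members


def learned_totals(members):
    """Sum the per-VLAN counts for each stack member."""
    return {switch: sum(data["counts"]) for switch, data in members.items()}


# First two members copied from the output above.
SAMPLE = """\
Switch : 3 : (Master)
Total Mac Addresses : 152
Total Mac Addresses : 585
Total Mac Addresses : 39
Total Mac Addresses : 381
Total Mac Addresses : 384
Total Mac Addresses : 22
Total Mac Addresses : 28
Total Mac Addresses : 178
Total Mac Addresses : 0
Total Mac Address Space Available: 6402
Switch : 1 :
Total Mac Addresses : 152
Total Mac Addresses : 585
Total Mac Addresses : 162
Total Mac Addresses : 22
Total Mac Addresses : 39
Total Mac Addresses : 381
Total Mac Addresses : 384
Total Mac Addresses : 28
Total Mac Addresses : 0
Total Mac Address Space Available: 6418
"""

totals = learned_totals(parse_mac_counts(SAMPLE))
print(totals)  # {3: 1769, 1: 1753}
```

On this sample, member 3 has learned 1769 addresses and member 1 has learned 1753. That gap of 16 matches the difference between their space-available lines (6418 vs 6402), which is a quick sanity check that the parsing is right.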
After many hours of troubleshooting, TAC finally came to the conclusion that we were hitting this bug:
CSCut64281 Ports on Member of the stack takes long time to learn/age MAC addr
This was only evident in the 15.1x code train. The issue didn’t exist in 15.0, which is why my older switches weren’t seeing it, only the brand new shiny ones I had installed last year. The fix finally became available in the last couple of months in 15.2(2)E3. I finished testing the release on some slightly non-prod switches and then rolled it out to my campus. Now I am seeing the following on the upgraded switches:
Switch : 3 : (Master)
———————
Total Mac Addresses : 142
Total Mac Addresses : 536
Total Mac Addresses : 44
Total Mac Addresses : 70
Total Mac Addresses : 30
Total Mac Addresses : 21
Total Mac Addresses : 27
Total Mac Addresses : 150
Total Mac Addresses : 0
Total Mac Address Space Available: 7151
Switch : 1 :
————
Total Mac Addresses : 142
Total Mac Addresses : 535
Total Mac Addresses : 44
Total Mac Addresses : 70
Total Mac Addresses : 30
Total Mac Addresses : 21
Total Mac Addresses : 27
Total Mac Addresses : 151
Total Mac Addresses : 0
Total Mac Address Space Available: 7151
Switch : 2 :
————
Total Mac Addresses : 142
Total Mac Addresses : 534
Total Mac Addresses : 44
Total Mac Addresses : 70
Total Mac Addresses : 30
Total Mac Addresses : 20
Total Mac Addresses : 27
Total Mac Addresses : 150
Total Mac Addresses : 0
Total Mac Address Space Available: 7154
Switch : 4 :
————
Total Mac Addresses : 142
Total Mac Addresses : 535
Total Mac Addresses : 44
Total Mac Addresses : 70
Total Mac Addresses : 30
Total Mac Addresses : 21
Total Mac Addresses : 27
Total Mac Addresses : 146
Total Mac Addresses : 0
Total Mac Address Space Available: 7156
While not perfect, it definitely looks a lot better than the previous output: the per-member counts now agree to within a handful of entries. I keep looking for the bug to be posted on Cisco’s site, but it is still private at this point.
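To put a number on "a lot better", here is a small Python sketch that compares how far apart the stack members are before and after the upgrade. It uses the "Total Mac Address Space Available" values copied from the two outputs above, keyed by switch number. The spread function is just my own helper for illustration, and the gap between the largest and smallest value is only a rough measure of how out of sync the members are.

```python
# "Total Mac Address Space Available" per stack member, from the outputs above.
before_upgrade = {3: 6402, 1: 6418, 2: 6415, 4: 6440}
after_upgrade = {3: 7151, 1: 7151, 2: 7154, 4: 7156}


def spread(available_by_member):
    """Gap between the most and least free MAC table space across members."""
    values = list(available_by_member.values())
    return max(values) - min(values)


print("before:", spread(before_upgrade))  # before: 38
print("after:", spread(after_upgrade))    # after: 5
```

So the members went from disagreeing by 38 entries to disagreeing by 5. Some difference is expected since the counts are a snapshot taken while addresses are still being learned and aged out.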