Understanding Network Loops
A network loop occurs when more than one path exists between a source and a destination. Consider the figure below, in which two switches are connected to each other over multiple paths. Imagine that the Layer 2 loop prevention mechanism is not enabled on the switches, or is broken somehow. Any broadcast or multicast frame created by an end point in the network will be received by a switch and flooded out of every port except the one it was received on, so the frame keeps circulating between the two switches in a Layer 2 loop. Even if you then disconnect every end point from the network, the broadcast and multicast storm between the two switches continues indefinitely.
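To make the flooding behaviour concrete, here is a minimal simulation sketch. The topology (two switches named SW1 and SW2, joined by two parallel links, with port 0 facing a host) is my own assumption for illustration, and the model ignores timing, MAC learning and everything else a real switch does.

```python
# Hypothetical topology: two switches joined by two parallel links, no STP.
# Each entry maps (switch, port) to the (switch, port) at the other end.
LINKS = {
    ("SW1", 1): ("SW2", 1),
    ("SW2", 1): ("SW1", 1),
    ("SW1", 2): ("SW2", 2),
    ("SW2", 2): ("SW1", 2),
}
PORTS = {"SW1": [0, 1, 2], "SW2": [0, 1, 2]}  # port 0 is a host-facing port


def flood(switch, ingress_port):
    """Return the (switch, port) pairs where copies of the frame arrive next."""
    arrivals = []
    for port in PORTS[switch]:
        if port == ingress_port:
            continue  # never send a frame back out of the port it came in on
        peer = LINKS.get((switch, port))
        if peer is not None:  # host ports have no switch on the other end
            arrivals.append(peer)
    return arrivals


def simulate(rounds):
    """Inject one broadcast on SW1 port 0 and count copies in flight per round."""
    in_flight = [("SW1", 0)]
    history = []
    for _ in range(rounds):
        next_round = []
        for switch, port in in_flight:
            next_round.extend(flood(switch, port))
        in_flight = next_round
        history.append(len(in_flight))
    return history


print(simulate(6))
```

In this toy model a single broadcast is injected once, yet two copies keep travelling between the switches in every round (the list printed is all 2s), which is why unplugging the end points does not stop the storm.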
What are broadcast, multicast and unicast, and how do they work?
In a packet-switched network, there are three types of frames: broadcast, multicast and unicast.
1. Broadcast Frame
Broadcast is the term used to describe communication where a frame is sent from one end point to all other end points at the same time. In this type of communication, there is only one sender that sends data frames to all connected receivers.
2. Multicast Frame
Multicast is the term used to describe communication where a frame is sent from one end point to a group of one or more end points at the same time. In this type of communication, there is only one sender that sends data frames to one or multiple connected receivers.
3. Unicast Frame
Unicast is the term used to describe communication where a frame is sent from one end point to exactly one other end point. In this type of communication, there is only one sender that sends data frames to one connected receiver.
How does a switch know if a frame is broadcast, multicast or unicast?
It is all about how the MAC address format is designed. A MAC address is a 48-bit (6-byte) unique identifier. Each byte is called an “octet”. See more details below.
The first 24 bits (3 bytes) of a MAC address are called the Organizationally Unique Identifier (OUI), which identifies a vendor, manufacturer, or other organization. Some example OUIs are listed below.
- 00:12:1e: represents Juniper Networks.
- 00:19:06: represents Cisco Systems, Inc.
- 00:1d:60: represents ASUSTek COMPUTER INC.
- 52:54:00: represents Realtek.
- 08:00:27: represents PCS Systemtechnik GmbH.
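As a small illustration, the OUI can be pulled out of a MAC address by taking its first three octets. The sketch below uses only the prefixes from the list above, so the table is nowhere near a complete vendor database, and the function names are my own.

```python
# OUI prefixes taken from the list above; not a complete vendor database.
KNOWN_OUIS = {
    "00:12:1e": "Juniper Networks",
    "00:19:06": "Cisco Systems, Inc.",
    "00:1d:60": "ASUSTek COMPUTER INC.",
    "52:54:00": "Realtek",
    "08:00:27": "PCS Systemtechnik GmbH",
}


def oui_of(mac):
    """Return the first three octets of a MAC address as 'xx:xx:xx'."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    return ":".join(octets[:3])


def vendor_of(mac):
    """Look up the vendor for a MAC address, or 'unknown' if not in the table."""
    return KNOWN_OUIS.get(oui_of(mac), "unknown")


print(vendor_of("08:00:27:AB:CD:EF"))
```

The helper accepts both colon-separated and dash-separated notation; other notations, such as the dotted format used in Cisco logs, would need extra handling.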
The Individual/Group (IG) bit is used to differentiate unicast frames from multicast frames. When the bit is set to zero (0), the frame is a unicast frame. When it is set to one (1), the frame is a multicast frame. The bit is the least significant bit of the first (most significant) octet of the MAC address. In the figure above, “b0” is the IG bit. For broadcast frames, a special MAC address (FF-FF-FF-FF-FF-FF) is used to distinguish them from other traffic; since every bit of that address is one, its IG bit is set as well.
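The check described above can be sketched in a few lines. This is only an illustration of the rule (the I/G bit is the least significant bit of the first octet, and all-ones means broadcast), not how a switch ASIC actually implements it; the function name is hypothetical.

```python
def classify_mac(mac):
    """Classify a destination MAC as 'broadcast', 'multicast' or 'unicast'."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    if all(octet == 0xFF for octet in octets):
        return "broadcast"
    # The I/G bit is the least significant bit of the first octet.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


for address in ("FF-FF-FF-FF-FF-FF", "01:00:5e:00:00:fb", "00:19:06:11:22:33"):
    print(address, classify_mac(address))
```

The broadcast test comes first because the broadcast address also has the I/G bit set and would otherwise be reported as multicast.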
What are the reasons behind a network loop?
Network loops occur for many reasons. The most common causes are listed below.
- Human error in cabling.
- Unidirectional link failure in a fiber cable.
- Buggy spanning tree.
- Buggy network devices (IP Phones).
As a network administrator, I mostly encounter the first and the last.
For learning and testing purposes, the easiest way to create a loop is to disable spanning tree on the switch and plug a network cable from one port into another. That kind of loop is rare in a real network, though. It typically happens only with hubs and unmanaged (“dumb”) switches that do not run spanning tree. A second easy way to create a loop is to use buggy IP phones. Here is a list from Cisco: https://quickview.cloudapps.cisco.com/quickview/bug/CSCvd03371
If you have one of the phones in the list, you can create a loop. (I am sure some other models have the bug but are not listed there.)
When you accidentally plug both the PC Port cable and the Network Port cable of the phone into the network switch, it causes a network loop that brings down the network. The reason is that the STP packet (BPDU) the switch sends into the phone's Network Port is filtered by the phone. Since the switch never receives that STP packet back on the port connected to the PC Port, it does not block that port, leaving the network prone to a broadcast or multicast storm. In a short time, a loop occurs and the network goes down.
How to protect your network from loops?
It depends on what kind of equipment you have and which loop prevention mechanisms it supports. Commonly used protection methods are listed below.
- Enabling Storm control
- Enabling Spanning Tree Protocol (STP, RSTP, MSTP, etc)
- Enabling other proprietary loop prevention mechanisms
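To give a feel for what storm control does, here is a toy model of the idea: count flooded frames on a port per time window and drop anything above a limit. Real switches do this in hardware with vendor-specific thresholds and actions, so the class name, the fixed-window approach and the numbers below are all my own assumptions.

```python
class StormControl:
    """Toy per-port limiter: allow at most `limit` flooded frames per interval."""

    def __init__(self, limit, interval=1.0):
        self.limit = limit
        self.interval = interval
        self.window_start = 0.0
        self.count = 0

    def allow(self, now):
        """Return True if a broadcast/multicast frame seen at time `now` may pass."""
        if now - self.window_start >= self.interval:
            self.window_start = now  # start a new measurement window
            self.count = 0
        self.count += 1
        return self.count <= self.limit


port = StormControl(limit=3)
decisions = [port.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 1.2)]
print(decisions)
```

With a limit of three frames per second, the fourth and fifth frames in the first window are dropped and forwarding resumes in the next window. Note that this only caps the damage of a storm; it does not remove the loop itself, which is the job of spanning tree.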
Find a loop with Wireshark
A useful indicator is the ratio “unicast / (broadcast + multicast)”. Let’s test it on the packets I captured during the loop. We will create a filter (eth.dst.ig == 0) that shows the packets whose destination address has an IG bit of zero (0), that is, the unicast packets. See the details below.
The number of unicast packets is 510. The total number of packets is 1870829.
Broadcast + multicast = total packets - unicast = 1870829 - 510 = 1870319
As you can see both in the screenshot and in the calculation, the unicast ratio is very low, which indicates that a loop has occurred. The number of broadcast and multicast packets can also be found directly with this filter: (eth.dst.ig == 1). The broadcast address ff:ff:ff:ff:ff:ff has the IG bit set, so broadcast packets are included in that count.
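For completeness, here is the same arithmetic as a small sketch, using the two counts from my capture. The function name is hypothetical, and I am deliberately not hard-coding a threshold for what counts as "low", since that depends on the network.

```python
def unicast_ratio(unicast, total):
    """Return unicast / (broadcast + multicast) for a capture."""
    flooded = total - unicast  # broadcast + multicast frames
    if flooded == 0:
        raise ValueError("capture contains no broadcast or multicast frames")
    return unicast / flooded


ratio = unicast_ratio(unicast=510, total=1870829)
print(f"{ratio:.6f}")
```

In this capture there is far less than one unicast packet for every thousand broadcast or multicast packets, which is far from what a healthy network normally looks like.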
The Header Checksum and Identification fields in the IP header can also be used to check whether a loop has happened. Both fields normally differ from one packet to the next, so when you see the same Header Checksum and Identification values in many packets, it is very likely that there has been a loop. Remember that Header Checksum and Identification are 2-byte fields. Even if the probability is low, there is always a chance of a collision. A collision simply means that the same identifier or checksum value ends up on different packets.
1) Select a broadcast or multicast packet and go to the IP header section.
2) Right-click on “Header Checksum”; a context menu appears.
3) Click on “Apply as Column”.
4) Repeat the same steps for the “Identification” field.
5) At this point you should have columns like the ones below.
In the figure above, we can see that within a short time frame a packet with the same Identification and Header Checksum is seen multiple times. This is strong evidence that there has been a loop.
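The same check can be scripted instead of eyeballing the columns. The sketch below assumes the two fields were exported as tab-separated text, for example with something like `tshark -r capture.pcapng -T fields -e ip.id -e ip.checksum` (the file name is a placeholder). The sample values and the `min_count` threshold are made up for illustration.

```python
from collections import Counter


def find_repeats(lines, min_count=3):
    """Count (Identification, Header Checksum) pairs and return the repeated ones."""
    pairs = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not all(fields):
            continue  # skip non-IP frames, which have empty columns
        pairs[(fields[0], fields[1])] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


# Made-up sample rows: identification, then header checksum.
sample = [
    "0x1a2b\t0x9f10",
    "0x1a2b\t0x9f10",
    "0x1a2b\t0x9f10",
    "0x0001\t0xbeef",
    "\t",
]
print(find_repeats(sample))
```

Requiring both fields to match, and requiring several repeats, keeps the occasional 2-byte collision from being reported as a loop.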
Alternative ways to find network loops
The simplest way is to use “Task Manager” on any Windows operating system. Right-click on the taskbar, select “Task Manager” from the context menu and navigate to the “Performance” tab. You will see CPU, memory and network utilization. See my CPU and network utilization during the loop below.
When the loop occurred, my network utilization increased instantly, causing my computer to freeze. When I removed the loop, the traffic returned to its normal level. Besides excessive network utilization, high CPU usage was observed.
Checking the logs on the switches may give useful clues. The logs below were produced on the switch during the loop.
*Mar 9 13:01:43.740: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 is flapping between port Gi1/0/4 and port Gi1/0/3
*Mar 9 13:01:45.569: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 is flapping between port Gi1/0/4 and port Gi1/0/3
*Mar 9 13:02:00.744: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 is flapping between port Gi1/0/4 and port Gi1/0/3
*Mar 9 13:02:16.036: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 is flapping between port Gi1/0/4 and port Gi1/0/3
*Mar 9 13:02:31.144: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 is flapping between port Gi1/0/4 and port Gi1/0/3
As you can see, the same MAC address is flapping between port Gi1/0/4 and port Gi1/0/3, which is a strong sign of a loop.
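If you collect switch logs centrally, counting these messages is easy to automate. The sketch below matches only the message format shown in the logs above; other platforms and software versions may word the message differently, and the function name is my own.

```python
import re
from collections import Counter

# Matches the MACFLAP_NOTIF message format shown in the logs above.
MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)


def count_flaps(log_text):
    """Count flap notifications per (mac, vlan, port pair)."""
    flaps = Counter()
    for match in MACFLAP.finditer(log_text):
        ports = tuple(sorted((match["a"], match["b"])))
        flaps[(match["mac"], int(match["vlan"]), ports)] += 1
    return flaps


# Two of the log entries from above.
sample = (
    "*Mar 9 13:01:43.740: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 "
    "is flapping between port Gi1/0/4 and port Gi1/0/3\n"
    "*Mar 9 13:01:45.569: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295a.2291 in vlan 1 "
    "is flapping between port Gi1/0/4 and port Gi1/0/3\n"
)
for key, count in count_flaps(sample).items():
    print(key, count)
```

Sorting the two port names means a flap reported as "Gi1/0/4 and Gi1/0/3" and one reported the other way round are counted together, so the port pair that forms the loop stands out quickly.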
Another way is to use network monitoring tools such as SolarWinds Network Performance Monitor, Nagios Core, Cacti or Observium. The crucial point is to look for the ports whose inbound traffic has recently become excessive. Such a port is probably the source of the loop.
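The "look at inbound traffic" idea boils down to comparing two readings of each port's inbound byte counter. Here is a rough sketch, assuming you already have the counters as dictionaries (for instance from SNMP polling). The counter values are made up, the port names are borrowed from the log above, and counter wrap-around is ignored.

```python
def busiest_inbound(before, after, seconds):
    """Return (port, bits_per_second) for the port with the highest inbound rate."""
    rates = {
        port: (after[port] - before[port]) * 8 / seconds
        for port in before
        if port in after
    }
    return max(rates.items(), key=lambda item: item[1])


# Made-up inbound byte counters, read 10 seconds apart.
before = {"Gi1/0/3": 1000, "Gi1/0/4": 2000, "Gi1/0/5": 500}
after = {"Gi1/0/3": 126000, "Gi1/0/4": 2500, "Gi1/0/5": 900}
print(busiest_inbound(before, after, seconds=10))
```

A monitoring tool does the same thing continuously and draws a graph, but the principle is identical: the port with the sudden inbound spike is where to start looking.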
Final Thoughts
A network loop can severely impact network performance. Loop prevention mechanisms should always be enabled. When a loop does occur, Wireshark and network monitoring tools come in handy.