Recently, I was sent a network trace file to analyze. The common complaints were about poor (slow) TCP performance. After examining the trace file in detail, I found that the culprit was TCP Zero Window packets.
What is TCP Zero Window?
In a previous article (https://www.golinuxcloud.com/tcp-receive-window/), I explained what a TCP receive window is. TCP zero window is a flow-control mechanism. When a receiver is overwhelmed by a sender, it can reduce its receive window. If its buffer fills, the receiver reduces the receive window to zero, which simply means “I am full, please do not send any more data”. The sender will not send any more data (apart from occasional small probe segments) until the receiver's application reads (frees) data from the receive window (TCP buffer). Once buffer space is free again, the receiver increases (updates) its receive window and informs the sender that it has room to store more data.
Assume we have a sender and a receiver exchanging data as in the figures below. I will walk through TCP zero window in a few steps.
Step-1: Zero window only occurs when TCP is used as the transport protocol; there is no such mechanism in UDP. During the TCP 3-way handshake, both the sender and the receiver advertise their receive windows. As seen below, the receiver sets its receive window to 4 bytes, and so does the sender. For the sake of simplicity, each cell represents one byte. At the start, both buffers are completely free.
Step-2: The application on the sender delivers 4 bytes to the sender’s buffer, and the sender sends one of them to the receiver as shown below. The receiver stores the data in its buffer and acknowledges it, advertising a window size of 3.
Step-3: The sender puts another byte on the wire, freeing one more byte of its buffer. As soon as the receiver gets the data, it places it in one of the free cells and lets the sender know that only two cells are left free by setting its receive window to 2.
Step-4: The same pattern repeats here: the sender sends one more byte, and the receiver advertises a receive window of 1.
Step-5: In this step, the sender sends the last byte in its buffer to the receiver, and the receiver stores it in its buffer. As seen below, the receiver’s application has not emptied the buffer, so there is no room for more data. The receiver therefore tells the sender to stop sending by setting its receive window to zero. Once the sender sees the zero window, it stops sending data until the receiver increases its receive window.
Step-6: The application on the sender delivers more data to TCP, and the data is stored in the sender's buffer. Since the sender has not received a window update from the receiver, it has to wait without sending any data. This stalls the transfer, and as a result the clients experience a slow network.
Step-7: The receiver's application frees 3 cells (bytes), and the receiver lets the sender know with a window update. After this step, the sender resumes sending data.
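The seven steps above can be replayed with a small toy model. This is only a sketch under simplifying assumptions: one byte per segment, no real sockets, and an application that wants to send six bytes in total (a number I picked for illustration). The class and function names (`Receiver`, `run_walkthrough`) are hypothetical and are not part of any library.

```python
from collections import deque


class Receiver:
    """Toy receiver with a fixed-size buffer; one cell holds one byte."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def window(self):
        # The advertised receive window is the free space left in the buffer.
        return self.capacity - len(self.buffer)

    def receive(self, byte):
        if self.window() == 0:
            raise RuntimeError("sender ignored a zero window")
        self.buffer.append(byte)
        return self.window()  # window value carried back in the ACK

    def app_read(self, count):
        # The application drains up to `count` bytes, which reopens the window.
        for _ in range(min(count, len(self.buffer))):
            self.buffer.popleft()
        return self.window()


def run_walkthrough():
    receiver = Receiver(capacity=4)
    pending = deque(b"ABCDEF")        # assumed: the application wants to send 6 bytes
    advertised = receiver.window()    # Step-1: window learned during the handshake
    log = [advertised]

    # Steps 2-5: send one byte at a time while the advertised window is open.
    while pending and advertised > 0:
        advertised = receiver.receive(pending.popleft())
        log.append(advertised)

    # Step-6: the window is zero, so the remaining bytes wait in the sender's buffer.
    stalled_bytes = len(pending)

    # Step-7: the receiving application frees 3 cells and the window reopens.
    advertised = receiver.app_read(3)
    log.append(advertised)
    return log, stalled_bytes


if __name__ == "__main__":
    windows, stalled = run_walkthrough()
    print(windows)  # [4, 3, 2, 1, 0, 3]
    print(stalled)  # 2
```

The list of advertised windows mirrors the figures: it counts down from 4 to 0, stays there while two bytes wait on the sender, and jumps to 3 once the application reads.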
Analyzing TCP Zero Window in Wireshark
Zero window generally happens when there is heavy data exchange between the two endpoints and the receiver lacks the resources to consume the data. It is not easy to reproduce a TCP zero window event in a lab, so I will use the wget tool to cause one. Its --limit-rate option caps the download transfer rate.
Step-1: For demonstration, I will use wget to download a web page with the download rate limited to 1K. After starting Wireshark, run the command below to apply the rate limit. The tool will constrain the TCP receive window accordingly.
┌──(kali㉿kali)-[~]
└─$ wget http://info.cern.ch/hypertext/WWW/TheProject.html --limit-rate=1K
--2022-06-03 15:00:05-- http://info.cern.ch/hypertext/WWW/TheProject.html
Resolving info.cern.ch (info.cern.ch)... 188.184.21.108, 2001:1458:d00:34::100:125
Connecting to info.cern.ch (info.cern.ch)|188.184.21.108|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2217 (2.2K) [text/html]
Saving to: ‘TheProject.html.7’

TheProject.html.7 100%[=================================================>] 2.17K 1023 B/s in 2.2s

2022-06-03 15:00:07 (1023 B/s) - ‘TheProject.html.7’ saved [2217/2217]
The following screenshot shows the packets I captured. Notice the helpful expert info entries “[TCP ZeroWindow]” and “[TCP Window Full]”.
Step-2: In packet number 1, the receiver advertises a receive window of 1152 bytes to the sender.
Step-3: In packet number 2, the sender advertises a receive window of 28960 bytes to the receiver.
Step-4: After the TCP 3-way handshake completes, the receiver sends an HTTP GET request in packet number 4.
Step-5: After receiving the request, the sender (the server) responds with a 576-byte packet in packet number 6.
Step-6: The sender sends another 576-byte packet in packet number 8. The total number of bytes sent to the receiver is now 576+576=1152. Since this equals the advertised receive window, Wireshark infers that the receiver's receive window has been filled.
Step-7: The receiver ACKs the sender in packet number 9, which means the receiver has freed its receive window.
Step-8: The sender sends a 1152-byte packet, which fills the receiver's receive window. Wireshark shows that in packet number 10.
Step-9: Since its window is full, the receiver tells the server to stop sending data by setting its window size to zero in packet number 11.
Step-10: With packet number 12, the sender ACKs the receiver and keeps the connection alive.
Step-11: The receiver keeps its window at zero in packet number 13.
Step-12: The receiver and the sender repeat the same pattern for the next two packets.
Step-13: After freeing its buffer, the receiver updates its window to 1152 and the data transfer resumes. Looking at the time column, we can see that the zero window caused a delay of almost 2 seconds.
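The two observations from this trace, the window filling up and the stall that follows, can be expressed as a short sketch. The first function is a simplified approximation of the "window full" reasoning (unacknowledged bytes reaching the advertised window); it is not Wireshark's actual implementation. The second measures how long the advertised window stayed at zero. The segment sizes (576, 1152) come from the capture above, while the timestamps in the usage section are made up for illustration and are not read from the trace. Both function names are hypothetical.

```python
def window_full_flags(segment_sizes, advertised_window):
    """Return one flag per segment: True once unacknowledged bytes reach the window.

    Assumes no ACK arrives between the listed segments.
    """
    in_flight = 0
    flags = []
    for size in segment_sizes:
        in_flight += size
        flags.append(in_flight >= advertised_window)
    return flags


def zero_window_stalls(samples):
    """Return (start, end) pairs during which the advertised window was zero.

    `samples` is a time-ordered list of (timestamp_seconds, advertised_window)
    taken from the receiver's packets.
    """
    stalls = []
    stall_start = None
    for timestamp, window in samples:
        if window == 0 and stall_start is None:
            stall_start = timestamp          # first zero-window advertisement
        elif window > 0 and stall_start is not None:
            stalls.append((stall_start, timestamp))  # window update ends the stall
            stall_start = None
    return stalls


if __name__ == "__main__":
    # Packets 6 and 8: two 576-byte segments against a 1152-byte window.
    print(window_full_flags([576, 576], 1152))  # [False, True]
    # Packet 10: a single 1152-byte segment fills the window at once.
    print(window_full_flags([1152], 1152))      # [True]

    # Illustrative timestamps only; they are not taken from the capture.
    illustrative = [(0.0, 1152), (0.5, 0), (1.0, 0), (1.5, 0), (2.4, 1152)]
    for start, end in zero_window_stalls(illustrative):
        print(f"window was zero for {end - start:.1f} s")
```

In practice, the (timestamp, window) pairs could be exported from the capture, for example from the time and window size columns of the receiver's packets.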
Final thoughts
TCP zero window plays a major role in a slow network experience. There can be a variety of reasons behind it. To address some common ones:
Check whether both sides, the sender and the receiver, advertise window scaling, which allows much larger receive windows. Larger buffers mean the receiver is less easily overwhelmed. Window scaling only takes effect if both sides include it as an option during the initial 3-way handshake; otherwise it is not used.
To find the culprit behind a TCP zero window, look at the receiver's resources to find out why the receive buffer is not being emptied in time. Observe the counters for logical disk, physical disk, RAM, page file, all the TCP counters, processes, and CPU.
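To make the window scaling point concrete, here is a minimal sketch of how the effective receive window follows from the 16-bit window field and the negotiated shift count. The function names are hypothetical. The limit of 14 on the shift count comes from the TCP window scale option definition (RFC 7323); everything else is illustrative.

```python
MAX_SHIFT = 14  # RFC 7323 caps the window scale shift count at 14


def scaling_in_use(syn_has_wscale, synack_has_wscale):
    # Scaling applies only if BOTH handshake segments carried the option.
    return syn_has_wscale and synack_has_wscale


def effective_window(window_field, shift, scaling_negotiated):
    """Return the receive window in bytes for a non-SYN segment."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the TCP window field is 16 bits wide")
    if not scaling_negotiated:
        return window_field  # without scaling the window tops out at 65535 bytes
    return window_field << min(shift, MAX_SHIFT)


if __name__ == "__main__":
    both = scaling_in_use(True, True)
    one_side_only = scaling_in_use(True, False)
    print(effective_window(65535, 7, both))           # 8388480
    print(effective_window(65535, 7, one_side_only))  # 65535
```

With a shift of 7 the same window field describes a buffer 128 times larger, which is why a missing option on either side of the handshake can leave the receiver with a much smaller window than it could otherwise offer.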
References
https://accedian.com/blog/tcp-receive-window-everything-need-know/
https://www.linkedin.com/pulse/tcp-retransmits-window-size-0-problem-maybe-larry-brasher-brasher
https://packetpioneer.com/wireshark-graphing-tcp-zero-windows/
https://support.f5.com/csp/article/K35612380