Troubleshooting With TTL
In general, Time to Live (TTL) refers either to the lifetime of a network packet or to the expiration time of cached data on a computer. Since that is a broad definition, I will break it into three categories to explain it better.
TTL in Internet Protocol (IP)
In IP, TTL is the number of hops a packet can travel before it is discarded by a layer 3 device. Its purpose in the IP header is to prevent a packet from circulating indefinitely. TTL is a 1-byte field, so its maximum value is 255. The value is decreased by 1 every time the packet passes through a layer 3 device. When it reaches 0 (zero), the packet is dropped and an ICMP Time Exceeded message is sent back to the sender.
The following figure shows a DNS response from 8.8.8.8 with a TTL of 55 in the IP header. The TTL was most likely set to 64 when the DNS server sent the response. The default TTL value varies between operating systems, so it can even be used to fingerprint a device.
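The decrement-and-drop rule can be sketched in a few lines of Python. This is only a toy simulation: the function name `forward` and the idea of counting routers on a path are my own illustration, not something taken from a capture.

```python
def forward(ttl: int, routers_on_path: int) -> tuple[str, int]:
    """Simulate a packet crossing `routers_on_path` layer 3 devices.

    Returns ("delivered", remaining_ttl) if the packet survives, or
    ("dropped", hop_number) naming the router that discarded it.
    """
    if not 0 <= ttl <= 255:
        raise ValueError("TTL is a 1-byte field, so it must be 0..255")
    for hop in range(1, routers_on_path + 1):
        ttl -= 1  # every router decreases the TTL by 1
        if ttl <= 0:
            # The router discards the packet here and would send an
            # ICMP Time Exceeded message back to the sender.
            return ("dropped", hop)
    return ("delivered", ttl)


print(forward(64, 9))  # survives 9 routers
print(forward(3, 5))   # runs out of TTL before the end of the path
```

A packet that starts with a TTL of 64 and crosses 9 routers arrives with 55, which matches the kind of value you typically see in a capture.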
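To make the reasoning above concrete, here is a rough sketch that guesses the initial TTL and the hop count from an observed TTL. It assumes the sender used one of the common defaults 64, 128 or 255; that list and the function name `estimate_origin` are my assumptions, and real devices can be configured with other values.

```python
# Assumed common initial TTL values; real systems may be configured differently.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_origin(observed_ttl: int) -> tuple[int, int]:
    """Guess (initial_ttl, hops_travelled) for a packet seen with `observed_ttl`.

    The guess picks the smallest common default that is not below the
    observed value, so it is a heuristic and not a proof.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("an observed TTL must be between 1 and 255")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise AssertionError("unreachable: 255 covers every valid TTL")


print(estimate_origin(55))  # (64, 9)
```

For the DNS response with a TTL of 55, the sketch guesses an initial value of 64 and 9 hops.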
TTL in Domain Name System (DNS)
In DNS, the TTL is set by the DNS server and tells the client (resolver) how long to keep a DNS record in its cache before requesting a new one. In the figure below, the client queries the DNS server for www.google.com. The DNS server (8.8.8.8) sends a DNS response to the client (192.168.1.52) with multiple “A” records inside the packet. Each record includes a TTL of 4, which means the client should cache the record for 4 seconds.
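The caching behaviour on the client side can be illustrated with a tiny cache that forgets a record once its TTL has passed. This is a simplified sketch, not how any particular resolver is implemented; the class name `DnsCache`, the injectable clock and the address 192.0.2.1 (a documentation address) are all made up for the example.

```python
import time
from typing import Callable, Optional


class DnsCache:
    """Minimal TTL-based cache for DNS answers (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the example can control time
        self._records: dict[str, tuple[list[str], float]] = {}

    def put(self, name: str, addresses: list[str], ttl_seconds: int) -> None:
        # Store the answer together with the moment it expires.
        self._records[name] = (list(addresses), self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[list[str]]:
        entry = self._records.get(name)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            # TTL elapsed: drop the record so the caller queries the server again.
            del self._records[name]
            return None
        return addresses


# Usage with a fake clock so the result does not depend on real time.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.google.com", ["192.0.2.1"], ttl_seconds=4)
now[0] = 3.0
print(cache.get("www.google.com"))  # still cached
now[0] = 4.0
print(cache.get("www.google.com"))  # expired, so None
```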
TTL in Hyper Text Transfer Protocol (HTTP)
TTL in HTTP lets the client know how long a resource (an image, an HTML page, a media file, etc.) should be stored locally before the client (browser) requests a new copy of it. HTTP uses the “Expires” and “Cache-Control: max-age” header fields to control caching. I have an HTTP request and response from one of our previous articles. Let’s take a look at the packets. The following figure summarizes the flow.
I used “curl” as the client to send a GET request for www.example.com. The server responded with “Expires” and “Cache-Control: max-age” fields, telling me how long I may keep the resource in the cache. The “Expires” field gives a date and time after which the response is considered stale. “Cache-Control: max-age” is measured in seconds. When both fields are set in a response, “Expires” is ignored. The actual packets are below.
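The precedence between the two fields can be sketched as follows. This is a simplified illustration of the rule described above, not a full HTTP cache: the function name `freshness_lifetime` is hypothetical, the header values are invented, and directives such as “no-store” or “s-maxage” are deliberately left out.

```python
import re
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict[str, str]) -> Optional[float]:
    """Return how many seconds a response may be cached, or None if unknown.

    "Cache-Control: max-age" wins when present; otherwise the lifetime is
    the difference between "Expires" and "Date".
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    match = re.search(r"max-age=(\d+)", lowered.get("cache-control", ""), re.IGNORECASE)
    if match:
        return float(match.group(1))  # "Expires" is ignored in this case

    expires, date = lowered.get("expires"), lowered.get("date")
    if expires and date:
        try:
            delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
        except (TypeError, ValueError):
            return 0.0  # an unparsable date is treated as already stale
        return max(delta.total_seconds(), 0.0)
    return None


# Invented header values, only for illustration.
response_headers = {
    "Date": "Thu, 01 Jan 2026 00:00:00 GMT",
    "Expires": "Thu, 01 Jan 2026 00:10:00 GMT",
    "Cache-Control": "max-age=604800",
}
print(freshness_lifetime(response_headers))  # max-age takes precedence
```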
Troubleshooting scenario
Before we dive into packets and fields with Wireshark, take a look at the topology below.
In the topology above, there are some network devices between the two sides, and the client can reach the web server without any problem. Take note of the TTL values in the incoming and outgoing packets.
- Gateway: It serves as a DHCP server and a gateway with simple routing functionality.
- Firewall: It has multiple modules such as Firewall, IPS, Proxy, etc. It simply does packet inspection.
- NAT Router: As the name states, it does network address translation.
Step 1
Imagine that one of your clients (kali) runs into a failure while trying to reach the web server and contacts you to solve the problem.
Step 2
You ask the client to reproduce the failure while you capture the packets between the two sides. For the sake of learning, I assume you already captured packets before the failure, because we will compare the TTL values before and after. I captured some, and they look fine, as shown below.
Step 3
We will gather some information from the packets of the working connection. Select a packet and expand its IP header. Right-click the “Time to Live” field and choose “Apply as Column”. TTL now appears as a column, as shown below.
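If you want to see where that field lives in the raw bytes, the sketch below unpacks the fixed 20-byte IPv4 header and reads the TTL, which sits at byte offset 8. The sample header is built by hand with made-up values (203.0.113.10 is a documentation address, and I assume 192.168.2.3 is the client), and the checksum is left at zero because nothing here verifies it.

```python
import socket
import struct

# Layout of the fixed IPv4 header: version/IHL, TOS, total length,
# identification, flags/fragment offset, TTL, protocol, checksum, src, dst.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(raw: bytes) -> dict:
    """Extract TTL, protocol and addresses from the first 20 header bytes."""
    if len(raw) < IPV4_HEADER.size:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (version_ihl, _tos, _length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(raw[:IPV4_HEADER.size])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return {
        "ttl": ttl,
        "protocol": protocol,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-built sample header: TCP (protocol 6) packet with TTL 40.
sample = IPV4_HEADER.pack(
    0x45, 0, 40, 0, 0, 40, 6, 0,
    socket.inet_aton("203.0.113.10"),  # hypothetical server address
    socket.inet_aton("192.168.2.3"),   # assumed client address
)
print(parse_ipv4_header(sample))
```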
Step 4
Packets from the web server arrive at the client (kali) with a TTL of 40, while the client sends its own packets with a TTL of 64. This is a great clue for troubleshooting.
Step 5
In step 2, you reproduced the issue and captured the packets below.
When we analyze the packets, the reset packet highlighted in red draws our attention. It appears that we got a reset from the server. The first 3 packets show that the TCP 3-way handshake completed normally. In packet number 4, the client sends an HTTP GET request and waits for a response. Instead of an HTTP response, the connection gets reset.
Step 6
Examine the TTL in the reset packet carefully. It is 254, while we expect about 40. Assuming the sender started at 255, the packet crossed only one hop on its way to the client. It cannot be the server that sent the packet, because the server is much farther away. When we look at the topology, the firewall is one hop away. It is clear that the firewall did not like the response packet and reset the connection. Security devices often reset connections for security reasons. The topology below summarizes the case.
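The comparison done by eye above can also be automated. The following sketch learns the usual TTL per source address from a working capture and flags packets in the failing capture that deviate from it. Everything in it is illustrative: the `Packet` tuple, the function name `ttl_outliers`, the tolerance of 5 and the sample packets (with the hypothetical server address 203.0.113.10) are my own assumptions, not exported capture data.

```python
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    number: int
    src: str
    ttl: int
    info: str


def ttl_outliers(baseline: list[Packet], capture: list[Packet],
                 tolerance: int = 5) -> list[Packet]:
    """Return packets whose TTL is far from the usual TTL of the same source.

    The usual TTL is the most frequent value seen for that source in the
    baseline capture; `tolerance` allows for small routing changes.
    """
    seen: dict[str, Counter] = {}
    for packet in baseline:
        seen.setdefault(packet.src, Counter())[packet.ttl] += 1
    expected = {src: counts.most_common(1)[0][0] for src, counts in seen.items()}
    return [
        packet for packet in capture
        if packet.src in expected and abs(packet.ttl - expected[packet.src]) > tolerance
    ]


# Made-up packets that mirror the scenario: the server normally shows TTL 40.
working = [
    Packet(2, "203.0.113.10", 40, "SYN, ACK"),
    Packet(5, "203.0.113.10", 40, "HTTP/1.1 200 OK"),
]
failing = [
    Packet(2, "203.0.113.10", 40, "SYN, ACK"),
    Packet(5, "203.0.113.10", 254, "RST"),
]
for suspect in ttl_outliers(working, failing):
    print(f"packet {suspect.number} ({suspect.info}) has unexpected TTL {suspect.ttl}")
```

Only the reset is flagged, because it claims to come from the server but carries a TTL that the server's packets never had in the working capture.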
Emulate the Scenario in GNS3
I emulated the scenario in GNS3 with Cisco routers. Older Cisco routers have a feature called TCP intercept, which analyzes the connection and, when it decides that the traffic poses a threat, drops the packet and resets the connection. The configurations for each device are below.
Configuration for Gateway
!
hostname GATEWAY
!
ip dhcp pool golinuxcloud_clients
 network 192.168.2.0 255.255.255.0
 dns-server 8.8.8.8
 default-router 192.168.2.1
!
username celal privilege 15 password 0 dogan
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 100 area 0
!
interface FastEthernet0/0
 ip address 192.168.23.3 255.255.255.0
 ip ospf 100 area 0
 speed auto
 duplex auto
!
interface FastEthernet0/1
 ip address 192.168.2.1 255.255.255.0
 speed auto
 duplex auto
!
router ospf 100
 redistribute connected
!
Configuration for Firewall
!
hostname Firewall
!
ip tcp intercept list intercept
ip tcp intercept connection-timeout 1
!
interface FastEthernet0/0
 ip address 192.168.12.2 255.255.255.0
 ip ospf 100 area 0
 speed auto
 duplex auto
!
interface FastEthernet0/1
 ip address 192.168.23.2 255.255.255.0
 ip ospf 100 area 0
 speed auto
 duplex auto
router ospf 100
!
ip access-list extended intercept
 permit ip host 192.168.2.3 any
!
Configuration for NAT Router
!
hostname NAT_Router
!
username celal secret 5 $1$WNlY$CTfihi7W3ZR6.Q6co0r4m/
!
!
class-map match-all http
 match protocol http
!
policy-map DROP_HTTP
 class http
  police 8000 conform-action drop exceed-action drop violate-action drop
!
!
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
 ip ospf 100 area 0
!
interface FastEthernet0/0
 ip address 192.168.12.1 255.255.255.0
 ip nat inside
 ip ospf 100 area 0
 speed auto
 duplex auto
 service-policy output DROP_HTTP
!
interface FastEthernet0/1
 ip address dhcp
 ip nat outside
 speed auto
 duplex auto
!
router ospf 100
 default-information originate always
!
ip nat inside source list NAT_LIST interface FastEthernet0/1 overload
!
ip access-list extended NAT_LIST
 permit ip any any
!
Since I did not capture these packets in a real-life production environment, I had to emulate the scenario by dropping packets with QoS rules, which triggered the firewall to drop the connection after one second. When the firewall intercepted the connection, the TTL it set in the reset packet did not match what you would see in real life. Thus, I had to modify the TTL field and some packets with Wireshark to make the capture look like it came from production.
Final thoughts
For security reasons, some devices in your network can interrupt or reset TCP connections. Being able to examine the TTL helps you troubleshoot cases like that.