After reading through the beginning of Andrew Blum’s, Tubes last night, I decided to spend some time today playing with
traceroute. I was initially exposed to the tool via Pingdom’s root cause analysis feature, which captures a
traceroute when it detects a downtime incident for diagnostic purposes. In the past, I’ve always been a bit confused about how to actually interpret the
traceroute results. But, as of today, I feel pretty good about my understanding. In this post, I’ll share what I’ve learned for anyone else struggling to make sense of
The goal of
traceroute is to…
If you have a website that’s down or slow,
traceroute is a useful tool for diagnosing if the issue is network related.
traceroute takes advantage of the TTL field found in IP packets. If you’re like me, you think of “TTL” as an indicator for cache expiry such as with DNS. However, with IP packets, TTL refers to the maximum number of “hops” a packet can take on it’s way to the requested host.
Each gateway on the network that processes the packet reduces the TTL value by 1. Once the TTL has been decremented to 0, the link that terminates the connection (usually) sends an ICMP TIME_EXCEEDED response. Traceroute sends out a sequence of packets (typically UDP) increasing the TTL each time and measuring the response time until it reaches the intended host.
One thing that always confused me in the past looking at
traceroute results was seeing the time go down between hops.
The key to understanding why this happens is that, as stated above,
traceroute sends a separate request for each hop on the way to the destination. In other words, in the screenshot above, the request that took 114.163 ms to reach the 3rd hop has nothing to do with the request that took 18.110ms to reach the 4th hop. Due to constantly varying network conditions it is entirely possible for time to go down from earlier to later hops. This is definitely a source of confusion as evidenced by this Stack Overflow thread.
The asterisks mean ICMP TIME_EXCEEDED response was not received within the allowed wait time. Wait time for
traceroute defaults to 5 seconds, but can be modified by supplying the
-w flag when running the command.
It is not uncommon to see hops with asterisks in the middle of a
traceroute as there are many gateways that simply don’t return the ICMP TIME_EXCEEDED response.
tracerouteNever Reach The Host?
traceroute just ends with asterisks over and over. You’ll see something like this…
In this case, a firewall is likely blocking the UDP packets
traceroute is sending.
traceroute offers the
-P flag for using the
ICMP protocol and the
-p flag for specifying the port. In some cases, I’ve also observed that
tcptraceroute is more successful at getting through firewalls.
You may have noticed the
-q flag in the screenshot above. By setting
-q 1 I am telling
traceroute to only trace the route to the host once. By default it will do it 3 times. The idea behind defaulting to 3 is that it can be averaged to filter out any network anomalies.
Again, each line and each query is a unique request and there is no guarantee that the request to the host will take the exact same path to through the network. If you see multiple lines for a single hop it means there was some variance in the path taken to reach the host.
I hope that this article was helpful to for understanding
traceroute results. If you have any comments, feel free to drop a note comments below. Of course, as always, you can reach me on Twitter as well.
Hi, I'm Max!
I'm a software developer who mainly works in PHP, but also dabbles in Ruby and Go. Technical topics that interest me are monitoring, security and performance.
During the day I solve challenging technical problems at Something Digital where I mainly work with the Magento platform. I also blog about tech, work on open source and hunt for bugs.
If you'd like to get in touch with me the best way is on Twitter.