You wake up, make coffee, sit down at the computer and start reading your favorite web sites. You fire up your favorite browser, type www.site.com in the address bar, hit enter and continue sipping that coffee. You wait for the page to load, sipping some more coffee, and a few seconds later you get the Google search results for “www.site.com”. You scratch your head, sip some more coffee, and start wondering if you made a typo, but no, it is correct: Google is not correcting your spelling. Obviously you are online, since you got the Google search page. By now the coffee is gone, you are frustrated, and you wonder what in the world just happened to your favorite site.
We have all had this experience, or have dealt with parents and family who had it and struggled to understand what happened to their favorite site!
As most of you know, computers do not understand “www.site.com”; they rely on DNS to resolve the name into an IP address that the computer can connect to. DNS is like a giant phone book that maps memorable names to IP addresses that are hard for humans to remember.
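
As a rough illustration of that phone book lookup, here is a minimal Python sketch that asks the operating system's resolver for the addresses behind a name. The function name `resolve` and its `lookup` parameter are assumptions made for this example; the parameter only exists so the resolver can be swapped out.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for a hostname.

    `lookup` defaults to the operating system resolver; it is a
    parameter (an assumption of this sketch) so it can be replaced.
    """
    # getaddrinfo returns (family, type, proto, canonname, sockaddr)
    # tuples; sockaddr[0] is the address string for IPv4 and IPv6.
    results = lookup(hostname, None)
    return sorted({info[4][0] for info in results})
```

Calling `resolve("www.site.com")` would return whatever addresses the local resolver hands back for the placeholder name used above, and it raises `socket.gaierror` when the name cannot be resolved, which is the failure behind the story at the top of this post.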
This blog has plenty of articles on Web Performance and Web Performance Optimization, but one item that we should have covered a long time ago is the impact of DNS on Web Performance. DNS is the Achilles’ heel of the web: it is often forgotten, and its impact on site performance is ignored until it breaks down.
We will not cover how DNS works; Wikipedia has a very good overview, and there is plenty of content covering every side of DNS.
Let’s look at how DNS is impacting web performance.
To illustrate our point, let’s look at Techcrunch.com. Its homepage has 323+ objects and relies on 72+ hostnames, so a browser has to look up 72 hostnames, most of which do not belong to TechCrunch. Here are two charts displaying each of the domains on the page, how many times each was called and how long it took to resolve (measurements taken from our Chicago Global Crossing node).
As you can clearly see, it takes time to resolve every domain. Ideally the resolution time would be very small, as the domain information would be in the DNS cache of the computer, or at least in that of the ISP. But more often than not the records are not in cache because they have a very short TTL. When they are not in the cache, the ISP’s DNS servers have to resolve them by querying the DNS hierarchy. In some cases the queries will be blazing fast; in other cases they will be very slow, or even fail, because the name servers are not distributed or not properly set up, and this will impact performance.
A web page is an ecosystem where multiple objects contribute to the page content, and as a result to its slowness or even unavailability. Some of the slowness can be due to poor server performance, but at times it is due to DNS-related issues.
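
To get a feel for this on your own pages, here is a minimal Python sketch that times the lookup of each hostname a page depends on. It is not how the charts above were produced; the names `time_lookups` and `slowest_first`, and the `lookup` and `clock` parameters, are assumptions made for this example. It goes through the operating system resolver, so a name that is already cached locally will look fast.

```python
import socket
import time


def time_lookups(hostnames, lookup=socket.getaddrinfo, clock=time.perf_counter):
    """Return {hostname: milliseconds}, with None for a failed lookup."""
    timings = {}
    for host in hostnames:
        start = clock()
        try:
            lookup(host, None)
        except OSError:
            # socket.gaierror (name could not be resolved) is an OSError.
            timings[host] = None
            continue
        timings[host] = (clock() - start) * 1000.0
    return timings


def slowest_first(timings):
    """List resolved hostnames slowest first, followed by failed lookups."""
    resolved = [h for h in timings if timings[h] is not None]
    failed = [h for h in timings if timings[h] is None]
    resolved.sort(key=lambda h: timings[h], reverse=True)
    return resolved + failed
```

Feeding it the list of hostnames found on a page and printing `slowest_first(...)` shows which third-party domains are worth a closer look.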
So here are some tips to ensure that DNS is not a bottleneck in your website performance:
- Do not use unnecessarily low TTLs (time to live). The shorter the TTL, the sooner the user’s machine will have to query for the domain again. With high TTLs you ensure that the information is cached locally on the user’s machine and also at the ISP’s DNS, which benefits multiple users. A short TTL also has an impact on your own DNS servers: a shorter TTL translates into more queries to your servers. So if a domain does not need a short TTL, do not set it to 5 minutes; think about specifying 60 minutes or even one day.
- Capacity: If you get a lot of requests (you are very popular), either make sure you have plenty of capacity on your servers or let someone else manage DNS for you. Here is an example of a DNS server that is having issues keeping up with demand during critical business hours.
- Too many CNAMEs (Canonical Name records) or aliases. Example: Expedia.com. Here is the output from Catchpoint DNS Experience Monitor; it took querying 7 servers sequentially to get the IP address of www.expedia.com. Here is the same example using the popular dig DNS tool.
- Exotic Domains: be careful with exotic domain names such as .ly and .tv. These domains have authoritative servers that are often far away from your end users’ ISPs. The records will almost always have a 2-day TTL; however, you never know when someone will be impacted because the query has to go to the authoritative servers and they fail. For example, “.ly” has 2 authoritative servers in Libya, 2 in the US, and 1 in the Netherlands.
- Geographical Distribution: If you are a popular site, make sure your DNS system is geographically distributed, or again let someone else manage DNS for you.
- Unreliable Registrars: If you are planning to build the next Facebook and will get 100K+ hits a second, make sure your registrar’s DNS servers can handle the load, or once again delegate your NS records to someone that has the capacity and reliability.
- Backup Plans: Make sure you have a backup plan. Even major DNS management companies have failures, so have a disaster plan ready to kick in.
- Do not mix private and public IP addresses; separate your corporate domains from your external domains. Recently a client of ours was having DNS lookup failures; upon investigation we realized that 25% of the queries were directed to the internal NS server (192.168.1.X).
- Disable Recursion on your Public DNS infrastructure if you do not need it.
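
Two of the tips above are easy to sanity check with a few lines of Python. The sketch below is only an illustration: the function names are made up for this example, and the sample addresses in the note after the code are hypothetical. The first helper puts a number on the TTL tip by estimating the most times a single busy caching resolver would have to come back to your servers in a day; the second flags addresses from private ranges that should not appear in public DNS answers.

```python
import ipaddress
import math

SECONDS_PER_DAY = 86400


def max_lookups_per_day(ttl_seconds):
    """Upper bound on daily re-queries from one caching resolver.

    Assumes the resolver is asked constantly, so it refreshes the
    record each time the TTL expires.
    """
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return math.ceil(SECONDS_PER_DAY / ttl_seconds)


def private_addresses(addresses):
    """Return the addresses that Python's ipaddress module flags as private.

    This covers the RFC 1918 ranges such as 192.168.0.0/16, plus other
    special-purpose ranges the module treats as private.
    """
    return [a for a in addresses if ipaddress.ip_address(a).is_private]
```

With a 5 minute TTL, `max_lookups_per_day(300)` gives 288 per resolver, against 24 for 60 minutes and 1 for a full day. Running `private_addresses` over the addresses your public name servers hand out, for example a hypothetical list like `["192.168.1.10", "8.8.8.8"]`, should come back empty; anything it returns is an internal address leaking into external DNS.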