For our frequent readers, it will come as no surprise that we firmly believe DNS has a huge impact on performance. One of the biggest challenges with DNS is understanding the impact that various DNS issues have on end users. DNS resolution is complex, and we lack insight into how end users’ DNS resolvers are configured, which makes that impact hard to calculate precisely.
It is easy to understand the impact of a hard DNS failure: when all authoritative servers for a domain are down, the domain is unreachable for every user. But what happens in the more common case where one particular authoritative server is slow or unreachable?
Here is an example of such a case that a client experienced recently. They monitored their website with a browser-based agent (IE 8) and monitored their Managed DNS provider with our DNS monitor. On December 27th, 2011, for about one hour, two of their four authoritative servers were unreachable from the East Coast.
In this case there was a 50% chance that a resolver would pick one of the failing servers first and wait for the query to time out. The resolver would then retry the query on another authoritative server, which hopefully responded correctly. In the worst case, the resolver would need 3 requests (two sequential timeouts) to get a proper answer, which happens with a 16.67% chance (2/4 × 1/3 = 1/6, assuming the resolver picks servers at random and does not retry a server it has already tried).
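The probabilities above can be checked by brute force. The sketch below enumerates every order in which a resolver could try the four servers and counts how many queries it needs before reaching a reachable one. It assumes, as the arithmetic above does, that servers are tried in a uniformly random order with no repeats; many real resolvers instead favor servers by measured response time, so treat this as a simplified model. The function name queries_needed_distribution is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def queries_needed_distribution(healthy: int, dead: int) -> dict[int, Fraction]:
    """Return {k: P(k queries are needed to reach a reachable server)}.

    Assumption (simplified model): the resolver tries servers in a uniformly
    random order and never re-queries a server it has already tried.
    """
    servers = [True] * healthy + [False] * dead  # True = reachable
    orders = list(permutations(range(len(servers))))
    counts: dict[int, int] = {}
    for order in orders:
        # Count attempts up to and including the first reachable server.
        for attempt, idx in enumerate(order, start=1):
            if servers[idx]:
                counts[attempt] = counts.get(attempt, 0) + 1
                break
    return {k: Fraction(v, len(orders)) for k, v in sorted(counts.items())}


if __name__ == "__main__":
    # The December 27th scenario: 2 of 4 authoritative servers unreachable.
    dist = queries_needed_distribution(healthy=2, dead=2)
    for k, p in dist.items():
        print(f"{k} quer{'y' if k == 1 else 'ies'}: {p} ({float(p):.2%})")
    # Every query beyond the first means one timeout was paid.
    expected_timeouts = sum((k - 1) * p for k, p in dist.items())
    print(f"expected timeouts per uncached lookup: {expected_timeouts}")
```

Under this model the distribution is 1/2, 1/3 and 1/6 for one, two and three queries, so an uncached lookup pays on average two thirds of a timeout while the two servers are down.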
Certain companies and individuals suggest that this timeout/retry scenario comes at no cost to the end-user experience: ZERO milliseconds of impact. We have an ongoing debate with a major CDN provider on this issue; you know, one of those “No! The earth is not flat!” arguments.
The answer is not simple. For some end users there is probably no impact, because the DNS record was already cached or because the DNS resolver has built-in logic to avoid servers that have recently timed out. However, not every domain is popular enough to stay in a resolver’s cache, and not every resolver can always avoid unreachable servers.
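To illustrate the kind of "smart handling" mentioned above, here is a minimal sketch of a resolver-side hold-down list: once a server times out, it is pushed to the back of the line for a fixed period. This is a hypothetical policy written for illustration (HoldDownSelector, lookup, and the 60-second default are all assumptions), not the algorithm of BIND, Microsoft DNS, or any other specific resolver.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class HoldDownSelector:
    """Hypothetical name-server selection with a hold-down list.

    A server that times out is moved to the back of the line for
    `hold_down` seconds. Illustrative only; not any real resolver's algorithm.
    """

    def __init__(self, servers: List[str], hold_down: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        self.servers = list(servers)
        self.hold_down = hold_down
        self.clock = clock
        self.rng = rng if rng is not None else random.Random()
        self.bad_until: Dict[str, float] = {}

    def order(self) -> List[str]:
        """Servers to try: healthy ones shuffled first, held-down ones last."""
        now = self.clock()
        healthy = [s for s in self.servers if self.bad_until.get(s, 0.0) <= now]
        held = [s for s in self.servers if s not in healthy]
        self.rng.shuffle(healthy)
        # Held-down servers remain as a last resort if everything else fails.
        return healthy + held

    def report_timeout(self, server: str) -> None:
        """Remember that `server` timed out, until the hold-down expires."""
        self.bad_until[server] = self.clock() + self.hold_down


def lookup(selector: HoldDownSelector, is_reachable: Callable[[str], bool]) -> int:
    """Try servers in order; return how many attempts it took to get an answer."""
    for attempt, server in enumerate(selector.order(), start=1):
        if is_reachable(server):
            return attempt
        selector.report_timeout(server)  # a real resolver would have waited here
    raise RuntimeError("no authoritative server answered")
```

Once both dead servers have been reported, lookups within the hold-down window succeed on the first attempt. A resolver without such a list, or a lookup made after the window expires, pays the timeouts again, which is exactly the case where end users feel the outage.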
So did this DNS failure, caused by dead name servers, have an impact on web performance? You bet it did!
Here are the response time and DNS lookup time as captured by the Internet Explorer 8 agent, which relies on a commercial DNS resolver in the same location as the agent. During the DNS outage, the response time (the time to load the base page URL) spiked because of longer DNS resolution times.
When your DNS servers or your CDN’s DNS servers experience problems, there will be a CLEAR impact on at least some of your users, if not all of them. A DNS timeout/retry carries a cost that varies with the features and configuration of the DNS resolver (BIND, Microsoft, etc.). In a world where everyone is trying to be as fast as possible and a competitor is one click away, no one should take such a gamble and hope nobody notices! Your users will notice, and they will be frustrated.
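To see why the cost depends on resolver configuration, the sketch below turns the outage scenario (2 of 4 servers dead) into extra seconds of latency for a given per-attempt timeout schedule. It uses the same simplified random-order model as the 16.67% figure, in closed form. The two schedules are made-up values chosen only to show the spread; they are not the defaults of any particular resolver, and p_timeouts and added_delay are hypothetical names.

```python
from fractions import Fraction
from typing import List, Tuple


def p_timeouts(healthy: int, dead: int, k: int) -> Fraction:
    """P(exactly k unreachable servers are tried before a reachable one).

    Assumes a uniformly random order with no repeated servers.
    """
    n = healthy + dead
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(dead - i, n - i)  # the (i+1)-th pick is another dead server
    return p * Fraction(healthy, n - k)  # then a healthy one answers


def added_delay(timeouts: List[float], healthy: int, dead: int) -> Tuple[float, float]:
    """Expected and worst-case extra seconds spent waiting on timeouts.

    timeouts[i] is a hypothetical wait before giving up on the (i+1)-th server tried.
    """
    if healthy < 1 or len(timeouts) < dead:
        raise ValueError("need a healthy server and one timeout per dead server")
    expected = 0.0
    worst = 0.0
    for k in range(dead + 1):
        cost = sum(timeouts[:k])  # total wait after k consecutive timeouts
        expected += float(p_timeouts(healthy, dead, k)) * cost
        worst = max(worst, cost)
    return expected, worst


if __name__ == "__main__":
    # Two made-up timeout schedules, for 2 dead servers out of 4.
    for name, schedule in [("short", [1.0, 1.0]), ("long", [2.0, 4.0])]:
        exp, worst = added_delay(schedule, healthy=2, dead=2)
        print(f"{name}: expected +{exp:.2f}s, worst case +{worst:.2f}s")
```

With these illustrative numbers, the short schedule adds about 0.67 s on average and 2 s in the worst case, while the long one adds about 1.67 s on average and 6 s in the worst case: the same outage, very different user impact depending on the resolver.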
How can you mitigate such DNS problems:
Mehdi – Catchpoint