Waiting for a webpage to load can be painful, but not knowing how to proceed when the page is “stuck” is a nightmare. User satisfaction depends on website performance and ease of use: a slow-loading website hurts the end-user experience and directly affects conversion rates. To ensure high availability and performance, companies often turn to CDNs to speed up last-mile delivery by reducing latency; however, CDNs rely on very complex infrastructure and are therefore prone to failure as well.
Performance monitoring has expanded well beyond webpage performance. To understand the big picture, monitoring must track every component that drives a user’s digital experience, including the Content Delivery Network (CDN). The common belief is that a CDN serves content faster by choosing a server based on the visitor’s geographic location, which reduces latency and improves the performance of the website. However, any issue with a CDN (DNS, outage, DDoS) can introduce several points of failure in the content delivery process.
Our support team recently handled an issue affecting one of the top travel aggregator websites, and the root cause analysis illustrates exactly why you MUST have a CDN monitoring strategy in place.
The travel website uses a major CDN for acceleration; our tests started reporting failures on 02/07/2017 00:30 PT. When users navigated to the site and searched for available flights, the page was stuck in a continuous loop.
End users located in Portland and Seattle were affected whenever the content was served from one particular edge server. During this time frame, availability of the search page from Portland and Seattle was 11.11% and 89.9%, respectively. Users in Portland could not search for flights on this major travel website.
During the same timeframe, cities like Dallas, Atlanta, and New York experienced availability of 99.12%, 99.47%, and 98.19%, respectively.
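A regional split like this is easy to surface automatically by comparing each city's availability against a threshold. The sketch below uses the figures reported above; the 95% threshold and the function name `degraded_cities` are assumptions chosen for illustration, not part of any monitoring product.

```python
# Availability (percent) of the search page during the incident, by city,
# as reported in this post.
AVAILABILITY = {
    "Portland": 11.11,
    "Seattle": 89.9,
    "Dallas": 99.12,
    "Atlanta": 99.47,
    "New York": 98.19,
}


def degraded_cities(availability, threshold=95.0):
    """Return the cities whose availability is below `threshold`, worst first.

    The default threshold is an illustrative assumption, not a recommendation.
    """
    below = [city for city, pct in availability.items() if pct < threshold]
    return sorted(below, key=lambda city: availability[city])


if __name__ == "__main__":
    print(degraded_cities(AVAILABILITY))  # prints ['Portland', 'Seattle']
```

When only a subset of cities falls below the threshold while the rest stay healthy, a location-specific component such as an edge server is a more likely suspect than the origin.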
The graph below shows the drastic dip in the number of script files and the corresponding script bytes, which was clearly causing the issue on the website.
Example of a response header captured:
This was a classic example of why monitoring critical user journeys matters: it shows how CDN performance affects the functionality of your web assets and, once again, highlights the importance of monitoring all components.
It is said, “every failure is simply the opportunity to begin again, this time more intelligently.”
Here are some key CDN monitoring strategy tips:
- Always monitor the origin and the CDN to quickly identify the issue (is it the CDN or is it the origin?)
- Monitor common JS, CSS, and image resources (pick the top 3 of each: top 3 CSS files, top 3 JS files…), and monitor both the origin and CDN versions.
- On the CDN side, append a random number to the URL to keep the CDN honest (example: http://www.company.com/logo.jpg?$randomnumber)
- Capture response headers and pass the appropriate debug headers for faster troubleshooting
- Hit the robots.txt on both origin and CDN (example: http://origin-abc.com/robots.txt and http://www.abc.com/robots.txt); see http://blog.catchpoint.com/2012/01/19/best_friend_robots/
- Ensure that your origin is properly configured to talk to the Edge (is chunked-encoding enabled? Bad idea!)
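Several of the tips above can be combined into one small probe: request the same resource from the origin and the CDN, add a random query string on the CDN side, capture the response headers, and compare the two results. The following is a minimal sketch using only the Python standard library. The function names, the `debug_headers` parameter, and the diagnosis labels are hypothetical; the host names are the example ones from the list, and the actual debug headers depend on your CDN vendor.

```python
import random
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def cache_busted(url, rng=random):
    """Append a random number to the query string, as in logo.jpg?$randomnumber."""
    parts = urlsplit(url)
    token = str(rng.randint(0, 10**9))
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def probe_pair(path, origin_host, cdn_host):
    """Build the origin URL and the cache-busted CDN URL for the same resource."""
    return {
        "origin": f"http://{origin_host}{path}",
        "cdn": cache_busted(f"http://{cdn_host}{path}"),
    }


def fetch(url, debug_headers=None, timeout=10):
    """Return (status, headers) for a GET request; status is None if unreachable.

    `debug_headers` is a placeholder for whatever request headers your CDN
    documents for troubleshooting; none are assumed here.
    """
    request = urllib.request.Request(url, headers=debug_headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry headers
        return err.code, dict(err.headers.items())
    except urllib.error.URLError:  # DNS failure, refused connection, timeout
        return None, {}


def diagnose(origin_ok, cdn_ok):
    """Answer "is it the CDN or is it the origin?" from two probe results."""
    if origin_ok and cdn_ok:
        return "healthy"
    if origin_ok:
        return "cdn"
    if cdn_ok:
        return "origin"
    return "origin_or_both"


if __name__ == "__main__":
    urls = probe_pair("/robots.txt", "origin-abc.com", "www.abc.com")
    results = {name: fetch(url) for name, url in urls.items()}
    ok = {name: status == 200 for name, (status, _) in results.items()}
    print(diagnose(ok["origin"], ok["cdn"]))
    print(results["cdn"][1])  # captured CDN response headers
```

Running the same pair of probes from several locations matters here: in the incident above, only requests routed through one edge server failed, so a single vantage point could easily have missed it.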