Why Web Performance Matters – Comprehensive CDN Monitoring

Waiting for a webpage to load can be painful, but not knowing how to proceed when the page is “stuck” is a nightmare. User satisfaction depends on website performance and ease of use: a slow-loading website hurts the end-user experience and directly affects conversion rates. To ensure high availability and performance, companies often turn to CDNs, which reduce latency and speed up last-mile delivery. However, CDNs rely on very complex infrastructure and are prone to failure as well.

Performance monitoring has expanded well beyond webpage performance. To understand the big picture, monitoring must track every component that drives a user’s digital experience, including the Content Delivery Network (CDN). The common expectation is that a CDN improves website performance by serving content from a location close to the visitor, which reduces latency. But any issue with a CDN (DNS, outage, DDoS) can cause failures at several points in the content delivery process.

Our support team recently handled an issue affecting one of the top travel aggregator websites, and the root cause analysis illustrates exactly why you MUST have a CDN monitoring strategy in place.

The travel website uses a major CDN for acceleration. Our tests started reporting failures on 02/07/2017 at 00:30 PT: when users navigated to the site and searched for available flights, the page was stuck in a continuous loop.

End users located in Portland and Seattle were affected whenever the content was served from one particular edge server. During this time frame, availability of the search page from Portland and Seattle was 11.11% and 89.9%, respectively. Most users in Portland could not search for flights on this major travel website.

During the same timeframe, cities like Dallas, Atlanta, and New York experienced availability of 99.12%, 99.47%, and 98.19%, respectively.
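
For readers who want to reproduce this kind of per-city comparison, here is a minimal sketch of the arithmetic, assuming availability is simply successful test runs divided by total test runs for each location. The function name and the run counts are hypothetical. The post does not say how many tests ran, and 1 success out of 9 is used only because it happens to round to 11.11%.

```python
from collections import defaultdict

def availability_by_location(runs):
    """Return availability (%) per location.

    `runs` is an iterable of (location, succeeded) pairs, one per test run.
    """
    totals = defaultdict(lambda: [0, 0])  # location -> [successes, total]
    for location, succeeded in runs:
        totals[location][1] += 1
        if succeeded:
            totals[location][0] += 1
    return {
        loc: round(100.0 * ok / total, 2)
        for loc, (ok, total) in totals.items()
    }

# Hypothetical data: 9 runs from one city, only the first one succeeds.
runs = [("Portland", i == 0) for i in range(9)]
print(availability_by_location(runs))  # {'Portland': 11.11}
```

Breaking the number down by location is what makes a regional CDN problem visible. A single global average would have hidden the Portland failures behind the healthy cities.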

Using our capability to override DNS for specific hosts, we confirmed that the issue was specific to a single CDN server. Further investigation narrowed it down to the CDN’s West Coast edge server. The server was returning zero-length JavaScript files, which left the search page blank and unable to pull or display results.
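
The idea behind a DNS override can be sketched in a few lines: connect to a specific server IP directly, but still present the site's hostname in the Host header, so that server answers as if DNS had pointed you to it. This is only an illustration using Python's standard library over plain HTTP, not the tooling we used. The function name `fetch_from_ip` and any IPs or hostnames you pass in are assumptions. For HTTPS you would also need to send the hostname via SNI, which this sketch does not cover.

```python
import http.client

def fetch_from_ip(ip, host, path="/", port=80, timeout=10.0):
    """Fetch `path` from the server at `ip`, presenting `host` as the site name.

    Returns (status, headers, body) so the caller can inspect all three.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Supplying our own Host header replaces the default one,
        # which would otherwise carry the IP address.
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, dict(resp.getheaders()), body
    finally:
        conn.close()
```

Calling this once per candidate edge IP with the same host and path, then comparing status codes and body lengths, shows whether one server behaves differently from the rest.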

The graph below shows the drastic dip in the number of script files and the corresponding script bytes, which was clearly causing the issue on the website.

[Graph: number of script files and script bytes over time]

The dip in script bytes was observed from one specific IP address only, which helped isolate the issue to a single CDN node. The issue was resolved and the website returned to normal once the CDN purged all the JavaScript files.
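
A check for this failure mode could look roughly like the sketch below. It assumes you already collect, for each script request, the URL, the IP that served it, the status code, and the body size. The `ScriptResponse` type, the function name, and the sample URLs and IPs (taken from documentation address ranges) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ScriptResponse:
    url: str
    server_ip: str
    status: int
    body_bytes: int

def empty_scripts_by_ip(responses):
    """Group script responses that returned 200 OK with an empty body by serving IP."""
    suspects = defaultdict(list)
    for r in responses:
        # A "successful" response with no bytes is the silent failure to catch.
        if r.status == 200 and r.body_bytes == 0:
            suspects[r.server_ip].append(r.url)
    return dict(suspects)

# Hypothetical measurements: one node returns empty files, another is healthy.
samples = [
    ScriptResponse("http://www.example.com/js/search.js", "203.0.113.10", 200, 0),
    ScriptResponse("http://www.example.com/js/app.js", "203.0.113.10", 200, 0),
    ScriptResponse("http://www.example.com/js/search.js", "198.51.100.7", 200, 48213),
]
print(empty_scripts_by_ip(samples))
```

A status-code check alone would not catch this, because the responses look successful. Grouping by IP is what points to a single node.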

Example of a response header captured:

[Screenshot: captured response header]

This is a classic example of why it is important to monitor critical user journeys: it shows how CDN performance affects the functionality of your web assets and, once again, highlights the importance of monitoring all components.

It is said, “every failure is simply the opportunity to begin again, this time more intelligently.”

Here are some key CDN monitoring strategy tips:

  • Always monitor both the origin and the CDN so you can quickly identify where the issue lies (is it the CDN or is it the origin?)
  • Monitor common JS, CSS, and image resources (pick the top 3 of each: top 3 CSS files, top 3 JS files, and so on), and monitor both the origin and CDN versions.
  • On the CDN side, append a random number to the URL to keep the CDN honest (example: http://www.company.com/logo.jpg?$randomnumber)
  • Capture response headers and pass the appropriate debug headers for faster troubleshooting
  • Hit robots.txt on both the origin and the CDN (example: http://origin-abc.com/robots.txt and http://www.abc.com/robots.txt); see http://blog.catchpoint.com/2012/01/19/best_friend_robots/
  • Ensure that your origin is properly configured to talk to the edge (is chunked encoding enabled? Bad idea!)
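
Several of these tips can be combined into one small probe. The sketch below is a rough illustration using Python's standard library, under stated assumptions: the function names, the `cb` query parameter, and the verdict labels are made up for this example, and the debug headers are left as a parameter because they differ from one CDN vendor to another.

```python
import secrets
import urllib.error
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def cache_busted(url, param="cb"):
    """Append a random query parameter so the request is less likely to hit the cache."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))
    return urlunsplit(parts._replace(query=urlencode(query)))

def probe(url, extra_headers=None, timeout=10.0):
    """Fetch `url`; return (ok, response_headers).

    `ok` means 200 with a non-empty body. `extra_headers` is where
    vendor-specific debug headers would go.
    """
    req = urllib.request.Request(url, headers=extra_headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
            return resp.status == 200 and len(body) > 0, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx still carry headers that are useful for troubleshooting.
        return False, dict(err.headers.items())
    except OSError:
        # DNS failures, refused connections, timeouts, and similar errors.
        return False, {}

def classify(origin_ok, cdn_ok):
    """Turn the two probe results into a first guess at where the problem is."""
    if origin_ok and cdn_ok:
        return "healthy"
    if origin_ok:
        return "cdn"      # origin serves the file, the CDN copy is broken
    if cdn_ok:
        return "origin"   # the CDN may be answering from cache while the origin is down
    return "origin-or-both"
```

A monitor would call `probe` on the origin URL and on `cache_busted(cdn_url)` for each resource (robots.txt plus the top JS, CSS, and image files), store the captured headers, and alert when `classify` returns anything other than "healthy".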