
Monitoring 101: Peeling The Onion.

I am often asked by customers and prospects: “What should we be monitoring?” This is a billion-dollar question, and it seems like everyone has their own answer. I have seen different approaches, some better than others.

In this blog post I want to share what I think is a “good” methodology for monitoring. To illustrate it, I will use a simple webpage, but you can easily apply the same approach to web-based applications.

Before we talk about what to monitor, let’s quickly cover why you should monitor in the first place:

  • Availability – Is your site or application up and running?
  • Speed – Is the site or application operating at the desired speed? (This is not about optimization; it is about whether it runs as fast as it is supposed to.)
  • Reliability or Integrity – Great, we accessed it and it was fast. Now, is it giving me what is intended, and working as it is supposed to?
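To make these three questions concrete, here is a minimal Python sketch of a single HTTP check. It assumes a simple rule set: a 2xx/3xx status counts as available, a response under a chosen threshold counts as fast enough, and the presence of an expected piece of text in the body counts as intact. The function names, the 2000 ms default threshold, and the "expected text" approach are illustrative assumptions, not a description of any particular monitoring product.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    available: bool    # did we get a usable response at all?
    fast_enough: bool  # was it within the response-time threshold?
    intact: bool       # did it contain what we expected?


def evaluate(status: int, elapsed_ms: float, body: str,
             expected_text: str, max_ms: float) -> CheckResult:
    """Apply the (assumed) availability / speed / integrity rules to one response."""
    available = 200 <= status < 400
    return CheckResult(
        available=available,
        fast_enough=available and elapsed_ms <= max_ms,
        intact=available and expected_text in body,
    )


def check_url(url: str, expected_text: str,
              max_ms: float = 2000.0, timeout: float = 10.0) -> CheckResult:
    """Fetch `url` once and evaluate it; network failures count as unavailable."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx: the server answered, with an error
        status, body = err.code, ""
    except OSError:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return CheckResult(False, False, False)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return evaluate(status, elapsed_ms, body, expected_text, max_ms)
```

For example, `check_url("https://www.xyz.com/", expected_text="Stock prices")` would flag a page that loads quickly but no longer shows the content it is supposed to. Note that `urlopen` follows redirects, so the status seen is that of the final response.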

A webpage is made up of multiple components that determine its availability, speed, and integrity. If we were to dissect a webpage, we would see the following:

  • Primary URL of the page – the URL the user has to type/click to access the webpage.
  • HTML response from the server – this is essentially what the browser will render.
  • External JavaScript and CSS files – these build the display and the functionality of the page.
  • External Objects – images, ads, beacons, and widgets. These are all different web technologies, but each could impact your webpage.

All of the above rely on HTTP requests to one or more hosts, and on the browser processing the HTTP responses properly. If we were to analyze the loading of the webpage, we would see all these requests being issued, answered, and executed; some of them will have a major impact, while others will have very little. So how do we go about monitoring this rich ecosystem?

It is obviously important to monitor your webpage from an actual browser to get a clear picture of its availability, speed, and integrity. This will help you answer questions like:

  • How long did it take to download the page?
  • How long did it take for the webpage to start rendering? This metric is important because it affects the end user’s perception of how fast your page loads.
  • How long did it take for the document to complete? This is another very important metric, as it maps to what most users think of as the page finishing loading; most importantly, page interactivity could depend on this event firing.

Through such monitoring you can also understand JavaScript blocking, the impact of third-party ads (with their never-ending redirects), who is setting cookies, and which hosts are being requested. Getting the raw data of web pages will allow you to apply the Web Performance Optimization tips that have been described over and over by various organizations and groups, including projects like Google PageSpeed, and by web performance evangelists like Steve Souders.
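One common way to get at this raw data is to export a page load as a HAR (HTTP Archive) file, which browser developer tools and many testing tools can produce. The Python sketch below assumes the HAR 1.2 layout (`log.entries[].request.url`, `response.status`, `response.headers`, `time`, and `log.pages[].pageTimings.onLoad`) and summarizes which hosts were requested, how many requests each served, which ones failed, and who set cookies. Treating `onLoad` as "document complete" and counting status 0 as a failure are our assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit


def summarize_har(har: dict) -> dict:
    """Summarize a HAR 1.2 capture per host: request count, summed time, failures, cookies."""
    hosts: dict[str, dict] = defaultdict(
        lambda: {"requests": 0, "total_ms": 0.0, "errors": 0}
    )
    cookie_setters: set[str] = set()

    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or "(unknown)"
        stats = hosts[host]
        stats["requests"] += 1
        # `time` is the entry's total elapsed time in ms. Summing it per host
        # double-counts parallel requests, so treat it as a weight, not wall time.
        stats["total_ms"] += entry.get("time", 0.0)
        status = entry["response"]["status"]
        if status == 0 or status >= 400:  # 0 usually means blocked/aborted (assumption)
            stats["errors"] += 1
        headers = entry["response"].get("headers", [])
        if any(h["name"].lower() == "set-cookie" for h in headers):
            cookie_setters.add(host)

    pages = har["log"].get("pages", [])
    # onLoad: ms from page start (-1 if unknown); used here to approximate "document complete".
    on_load = pages[0]["pageTimings"].get("onLoad") if pages else None

    return {
        "hosts": dict(hosts),
        "cookie_setters": sorted(cookie_setters),
        "on_load_ms": on_load,
    }
```

Loading a capture is a matter of `summarize_har(json.load(f))` on an open `.har` file; sorting the `hosts` entries by `total_ms` or `errors` gives a first list of candidates for the host-level monitoring discussed next.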

However, while monitoring the webpage in a browser is important, it is not sufficient. The main reason is that the complexity of the webpage, with its many hosts and requests, makes troubleshooting harder. If you rely on third-party vendors and partners, you will want to monitor each of them independently to avoid finger-pointing between them.

Therefore we recommend that you monitor not only the webpage itself in a browser, but also the individual requests and hosts that have an impact on your webpage. We suggest the following three-step process to identify what to monitor:

  • Identify which hosts and requests have an impact on availability, speed, and integrity. Obviously the primary URL has the biggest impact on your webpage: if it is slow by one second, your webpage will be slow by one second; if it returns a 500 error, the webpage is unavailable, and so on. Other hosts or requests that have an impact are:
    • CDN
    • Any JavaScript, whether inline or loaded through external requests. These can have the biggest impact on your user experience.
    • Any requests that deliver key content or functionality to the webpage. For example, if your webpage delivers stock market prices via Ajax and JSON calls, any failure of those requests would result in the data not being displayed.
  • Identify any sources of impact on these requests. Sometimes these additional hosts or requests rely on sources that are not visible to end users or to the browser. For example, if you rely on a CDN, you might have configured it to pull from an Origin Server: a server you own and maintain that is the original source of the content the CDN caches and serves from the edge. A misconfiguration of the Origin Server can have a major impact on CDN performance! Likewise, any slowness or availability issue with the Origin Server could cause performance and availability issues for the CDN requests.
  • Monitor each identified host and request, and most importantly the webpage itself. The key here is to avoid duplicate monitoring: if asset1.site.com and asset2.site.com both point to the same CDN and their DNS is configured on the same DNS provider, you only need to monitor one of them. On the other hand, if you have multiple datacenters all serving www.xyz.com, make sure you monitor each one of them too.
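The three steps can be sketched in Python as a small planning function, assuming you keep an inventory that records each host's CDN, DNS provider, and hidden dependencies such as origin servers. The `Host` record, the inventory, and the grouping key (same CDN plus same DNS provider) are hypothetical; they simply encode the deduplication rule described in the third step. Hosts without a CDN, such as the primary URL or an origin server, are always kept; to cover multiple datacenters serving www.xyz.com, you would list each datacenter as its own inventory entry.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    name: str
    cdn: Optional[str]                # CDN serving this host, or None if served directly
    dns_provider: str                 # provider hosting this host's DNS
    depends_on: tuple[str, ...] = ()  # hidden dependencies, e.g. a CDN origin server


def monitoring_targets(impacting: list[str], inventory: dict[str, Host]) -> list[str]:
    # Step 1 is the `impacting` list itself: hosts that affect availability,
    # speed, or integrity of the page.
    # Step 2: follow hidden dependencies (e.g. CDN -> origin server), breadth-first.
    queue = deque(impacting)
    seen: set[str] = set()
    ordered: list[str] = []
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        ordered.append(name)
        host = inventory.get(name)
        if host is not None:
            queue.extend(host.depends_on)

    # Step 3: avoid duplicate monitoring. Hosts behind the same CDN and the same
    # DNS provider share one monitor; hosts without a CDN are always kept.
    targets: list[str] = []
    covered: set[tuple] = set()
    for name in ordered:
        host = inventory.get(name)
        if host is not None and host.cdn is not None:
            key = ("cdn", host.cdn, host.dns_provider)
        else:
            key = ("host", name)
        if key not in covered:
            covered.add(key)
            targets.append(name)
    return targets
```

With asset1.site.com and asset2.site.com both on the same CDN and DNS provider and pulling from one origin server, the result keeps the primary URL, one of the two asset hosts, and the origin server.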

So far we have described why and where to monitor; next we describe what to monitor for each of these requests.

Obviously you have to monitor the HTTP requests, as well as the performance of the webpage or webpage-like content (widgets or ads) in the browser.

Additionally, we recommend that you monitor your DNS servers or DNS providers. DNS is often overlooked, but it can make a huge difference to users in different geographic locations, and to the availability of the requests for your webpage. It is best to monitor the DNS servers directly (this is why we have a separate DNS monitoring solution), rather than relying on browser/HTTP monitors. The main reason is that a domain can be resolved by any one of multiple servers. If you have two DNS servers resolving your domain, one responding in 100ms and the other in 500ms, then, if queries are spread evenly between them, there is a 1 in 2 chance that you see DNS at 500ms. Because of DNS TTLs and caching, you might have an even harder time discovering DNS performance problems.
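Monitoring each DNS server directly means sending the query to that specific server yourself instead of going through a caching resolver. The minimal Python sketch below builds a standard DNS query for an A record by hand (RFC 1035 wire format), sends it over UDP to one server at a time, and times the round trip, so a 100ms server and a 500ms server show up as two separate numbers rather than a blended figure. It only checks the response ID and RCODE and does not parse the answer section; the function names are our own, and the server addresses in the usage note are placeholders from the 192.0.2.0/24 documentation range.

```python
from __future__ import annotations

import random
import socket
import struct
import time


def build_query(hostname: str, query_id: int) -> bytes:
    """Build a DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (0 = standard query, recursion not requested),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack(">HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack(">HH", 1, 1)


def time_dns_server(server_ip: str, hostname: str, timeout: float = 2.0) -> tuple[float, int]:
    """Query one DNS server directly; return (response time in ms, RCODE)."""
    query_id = random.randint(0, 0xFFFF)
    packet = build_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server_ip, 53))
        response, _ = sock.recvfrom(512)
        elapsed_ms = (time.perf_counter() - start) * 1000
    if len(response) < 4:
        raise ValueError("truncated DNS response")
    response_id, flags = struct.unpack(">HH", response[:4])
    if response_id != query_id:
        raise ValueError("DNS response ID does not match the query")
    return elapsed_ms, flags & 0x000F  # RCODE: 0 = NOERROR, 3 = NXDOMAIN, 5 = REFUSED


def report(servers: list[str], hostname: str) -> None:
    """Print one line per server so a slow server cannot hide behind a fast one."""
    for server in servers:
        try:
            ms, rcode = time_dns_server(server, hostname)
            print(f"{server}: {ms:.1f} ms, rcode={rcode}")
        except (OSError, ValueError) as err:  # OSError covers timeouts
            print(f"{server}: FAILED ({err})")
```

Calling `report(["192.0.2.1", "192.0.2.2"], "www.xyz.com")` against your own authoritative servers, from several locations, would show each server's response time on its own line.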

Summary:

Monitor not only the webpage but also the key hosts and requests that impact its performance. Don’t limit yourself to HTTP monitoring; expand to DNS monitoring for your key domains to ensure speed and availability. And don’t just monitor to keep baselines: look for places where you can act and save some milliseconds or bytes. Reducing delay by 100 milliseconds every release can be very empowering. You are making end users happy and making your company money.

I apologize for the length of the post, but the topic deserves no less…

Mehdi – Catchpoint