Relying on Web Performance Monitoring to Discover Release Problems

In the 1990s, websites were quite simple: a single server talking to a single database. JavaScript and Flash had just been introduced, AJAX was still being developed, and HTTP 1.0 was prevalent across the World Wide Web. Years later, that same webpage has turned into a complicated “web” of services, servers, and applications, all working together to serve content to the end user.

Most websites rely on two or more servers and services just to deliver the content of the base URL. Once the base URL is loaded, its HTML calls even more internal and third-party services, such as ad servers, CDNs, content personalization, page optimization, tracking pixels, and widgets. The smallest mistake in any of these services, internal or external, and the end user pays the price in a bad experience and frustration. The bad news for the company is that, unlike in the 90s, when a user might not have had anywhere else to get the content, today that same user can go to one of hundreds of competitors in the blink of an eye.

Optimizing webpages and services for faster performance and better fallback in case of failure has therefore become very important. However, it is not enough. Continuous performance monitoring of all the services involved in delivering your website has become a must for all companies. Any unexpected performance degradation needs to be analyzed carefully, and action taken before there is any impact on the business.

Case Study: New Website Release Impacts Web Performance for IE Users

We recently observed a major performance degradation on a very popular US website that we were monitoring. The website performed a release on the night of March 22nd, during which it was down for about 2 hours. The day after the release, the webpage response time had increased by 100%, going from 4.5 seconds to 9 seconds.

[Figure: Website Performance Monitoring Data. Response for the Base URL and the Webpage (Hourly Average)]

Not only did the response time for the entire webpage double, but the base URL response also slowed down by 80%. Looking at the requests and connections the webpage made, there was a jump in the number of connections but no increase in the number of items loaded on the page.

[Figure: Website Performance, Number of HTTP Connections. HTTP Connections and Hosts (Hourly Average)]

[Figure: Website Performance, Number of Items Requested. Number of Items Requested (Hourly Average)]

This was a clear sign that the hosts on the webpage were closing the connection after every request. We confirmed the cause by looking at the waterfall charts, which showed that 11 requests (including the base URL) used HTTP 1.0 and resulted in 11 different connections.

[Figure: Domain Statistics in Waterfall Chart. Number of Requests and Connections by Host]
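
The requests-versus-connections comparison described above can be sketched in a few lines of Python. This is only an illustration: the input format (one `(host, connection_id)` pair per request), the function name, and the sample data are assumptions, not an export format of any monitoring product.

```python
from collections import defaultdict


def connection_reuse_by_host(requests):
    """Summarize requests and distinct connections per host.

    `requests` is an iterable of (host, connection_id) pairs, one per
    HTTP request. This input shape is a hypothetical stand-in for
    whatever a waterfall export provides.
    """
    request_counts = defaultdict(int)
    connection_ids = defaultdict(set)
    for host, conn_id in requests:
        request_counts[host] += 1
        connection_ids[host].add(conn_id)

    report = {}
    for host, n_requests in request_counts.items():
        n_connections = len(connection_ids[host])
        report[host] = {
            "requests": n_requests,
            "connections": n_connections,
            # One connection per request means nothing was reused.
            "no_reuse": n_requests > 1 and n_connections == n_requests,
        }
    return report


# Hypothetical sample: 11 requests to one host, each on its own
# connection, plus a second host that reuses a connection.
sample = [("www.SomeSite.com", i) for i in range(11)]
sample += [("cdn.example.net", 100), ("cdn.example.net", 100),
           ("cdn.example.net", 101)]

for host, stats in connection_reuse_by_host(sample).items():
    print(host, stats)
```

A host flagged with `no_reuse` is a candidate for the kind of header inspection shown next.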

The issue is also clear from the HTTP headers of the request and the response: the browser sends an HTTP 1.1 request with “Connection: Keep-Alive”, while the server answers with HTTP 1.0 and closes the connection with the “Connection: close” header:

GET /layout/css/style-8194-1234.css?v=1234 HTTP/1.1
Accept: */*
Referer: https://www.SomeSite.com/
Accept-Language: en-us
User-Agent: Mozilla/5.0 (Windows; MSIE 9.0; Windows NT 6.1; Trident/5.0; BOIE9;ENUS)
UA-CPU: x86
Accept-Encoding: gzip, deflate
Host: www.SomeSite.com
Connection: Keep-Alive
Cookie: VisitorId=002.......

HTTP/1.0 200 OK
Date: Fri, 25 Mar 2011 14:26:18 GMT
Server: Apache-Coyote/1.1
Last-Modified: Fri, 25 Mar 2011 08:13:57 GMT
Content-Type: text/css
Vary: Accept-Encoding
Content-Encoding: gzip
Expires: Thu, 15 Apr 2020 20:00:00 GMT
Cache-Control: private
Connection: close
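
A check like the one we did by eye on these headers could be automated along the following lines. This is a minimal sketch with a hypothetical function name; it applies the standard persistence defaults (HTTP 1.1 connections stay open unless “close” is sent, HTTP 1.0 connections close unless “keep-alive” is sent) to the status line and Connection header of a captured response.

```python
def connection_is_reusable(response_head):
    """Return (http_version, reusable) for a raw response header block."""
    lines = response_head.strip().splitlines()
    version = lines[0].split()[0]  # e.g. "HTTP/1.0"

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    tokens = {t.strip().lower()
              for t in headers.get("connection", "").split(",")
              if t.strip()}

    if "close" in tokens:
        return version, False
    if version == "HTTP/1.1":
        return version, True          # persistent by default
    return version, "keep-alive" in tokens  # HTTP/1.0 must opt in


# Shortened copy of the response captured above.
captured = """HTTP/1.0 200 OK
Date: Fri, 25 Mar 2011 14:26:18 GMT
Server: Apache-Coyote/1.1
Content-Type: text/css
Connection: close
"""

print(connection_is_reusable(captured))  # ('HTTP/1.0', False)
```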

The use of “Connection: close” had an even bigger impact on the website’s performance because the site was using HTTPS. As a result, for every HTTP request the browser not only had to open a TCP connection but also had to perform an SSL handshake.
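
To get a rough feel for the extra cost, the connection setup can be expressed in network round trips. The sketch below rests on simplifying assumptions: one round trip for the TCP handshake, two more for a full SSL handshake (no session resumption), and a 50 ms round-trip time that is purely illustrative and was not measured on the site in question.

```python
def setup_round_trips(new_connection, https, ssl_round_trips=2):
    """Round trips spent before the HTTP request itself can be sent.

    Assumes 1 round trip for TCP and `ssl_round_trips` for a full
    SSL handshake; a reused connection needs neither.
    """
    if not new_connection:
        return 0
    return 1 + (ssl_round_trips if https else 0)


def setup_delay_ms(rtt_ms, new_connection, https):
    """Approximate setup delay in milliseconds for one request."""
    return rtt_ms * setup_round_trips(new_connection, https)


RTT_MS = 50  # illustrative assumption, not a measured value

print(setup_delay_ms(RTT_MS, new_connection=False, https=True))   # 0
print(setup_delay_ms(RTT_MS, new_connection=True, https=False))   # 50
print(setup_delay_ms(RTT_MS, new_connection=True, https=True))    # 150
```

Because browsers open several connections in parallel, these per-request delays do not simply add up to the page-level slowdown, but they show why losing keep-alive hurts an HTTPS site more than a plain HTTP one.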

The other interesting fact we noticed was that the problem occurred only on Catchpoint’s Internet Explorer agent, not on the other agents we were testing from. All agents made the same requests, but the site responded with HTTP 1.0 to IE and with HTTP 1.1 to the other browsers.

We repeated the test on the IE agent with the user agent modified to exclude the “MSIE” string, and voilà, the server went back to using HTTP 1.1.

GET /layout/css/style-8194-1234.css?v=1234 HTTP/1.1
Accept: */*
Referer: www.SomeSite.com
Accept-Language: en-us
User-Agent: Mozilla/5.0 (Windows; Windows NT 6.1; Trident/5.0; BOIE9;ENUS)
UA-CPU: x86
Accept-Encoding: gzip, deflate
Host: www.SomeSite.com
Connection: Keep-Alive
Cookie: VisitorId=002.......

HTTP/1.1 200 OK
Date: Fri, 25 Mar 2011 14:35:10 GMT
Server: Apache-Coyote/1.1
Last-Modified: Fri, 25 Mar 2011 05:32:33 GMT
Content-Type: text/css
Vary: Accept-Encoding
Content-Encoding: gzip
Expires: Thu, 15 Apr 2020 20:00:00 GMT
Cache-Control: private
Keep-Alive: timeout=15, max=94
Connection: Keep-Alive
Transfer-Encoding: chunked
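
The same A/B comparison can be reproduced outside a monitoring agent with Python’s standard `http.client` module. The sketch below is an assumption about how one might script it, not the tooling we used: `probe`, `strip_msie`, and `compare` are hypothetical helper names, and any host passed in is up to the reader.

```python
import http.client
import re

# User agent from the captured IE request above.
IE_UA = ("Mozilla/5.0 (Windows; MSIE 9.0; Windows NT 6.1; "
         "Trident/5.0; BOIE9;ENUS)")


def strip_msie(user_agent):
    """Remove the 'MSIE x.y;' token from a user agent string."""
    return re.sub(r"\s*MSIE [\d.]+;", "", user_agent)


def probe(host, path, user_agent, use_tls=True, port=None, timeout=10):
    """Fetch `path` once and report the response's HTTP version
    and Connection header."""
    conn_cls = (http.client.HTTPSConnection if use_tls
                else http.client.HTTPConnection)
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={
            "User-Agent": user_agent,
            "Connection": "Keep-Alive",
        })
        resp = conn.getresponse()
        resp.read()  # drain the body
        versions = {10: "HTTP/1.0", 11: "HTTP/1.1"}
        return {
            "version": versions.get(resp.version, str(resp.version)),
            "connection": resp.getheader("Connection"),
        }
    finally:
        conn.close()


def compare(host, path):
    """Probe with and without the MSIE token; return both results."""
    return {
        "with_msie": probe(host, path, IE_UA),
        "without_msie": probe(host, path, strip_msie(IE_UA)),
    }
```

Calling `compare` with a host and path of your own returns both results side by side; a difference in `version` or `connection` between the two entries points to user-agent-dependent server behavior.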

It seems that one of the applications rolled out in the latest release contained a bug, a misconfiguration, or an unintended feature that disabled HTTP 1.1 for user agents containing the “MSIE” string. The issue was due to an old Apache configuration that forced HTTP 1.0 and no keep-alive for browsers with “MSIE” in the user agent string.
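
We do not know the exact configuration line the site used. Older Apache SSL sample configurations, however, shipped a BrowserMatch rule with the pattern ".*MSIE.*" that sets nokeepalive, downgrade-1.0 and force-response-1.0, and later samples narrowed the pattern to "MSIE [2-5]". The Python sketch below only models the effect of such a rule on the captured user agent; the function name is hypothetical, and assuming that the site had this specific rule is a guess.

```python
import re

# Patterns as found in Apache's sample SSL configurations
# (assumed here for illustration; the site's real rule is unknown).
LEGACY_RULE = re.compile(r".*MSIE.*")    # matches every IE version
NARROW_RULE = re.compile(r"MSIE [2-5]")  # matches only IE 2 to 5

IE9_UA = ("Mozilla/5.0 (Windows; MSIE 9.0; Windows NT 6.1; "
          "Trident/5.0; BOIE9;ENUS)")


def response_protocol(user_agent, rule):
    """Model how a server applying `rule` would answer an HTTP/1.1
    request: (protocol, keep_alive)."""
    if rule.search(user_agent):
        # Models nokeepalive + downgrade-1.0 + force-response-1.0.
        return "HTTP/1.0", False
    return "HTTP/1.1", True


print(response_protocol(IE9_UA, LEGACY_RULE))  # ('HTTP/1.0', False)
print(response_protocol(IE9_UA, NARROW_RULE))  # ('HTTP/1.1', True)
```

Under these assumptions, the broad pattern downgrades an IE9 agent exactly as observed, while the narrower pattern leaves it on HTTP 1.1 with keep-alive.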

Summary

Websites have become more and more complicated, relying on multiple services, servers, and applications, some managed by the website owner and others outsourced to third parties. These internal and external dependencies have a direct, though varying, impact on the web performance of the pages. Monitoring the web performance of the website continuously is key to ensuring its reliability.

Update 1: The website detailed in this blog post was using HTTPS (secure connections). Eric Law from Microsoft wrote an article a day after this post detailing the impact of “Connection: close” on the performance of HTTPS websites. The impact is higher on HTTPS websites than on HTTP websites because the browser must perform the SSL handshake on every request, adding extra time compared with plain HTTP requests.

Posted on: March 25th, 2011

Category: Outage