On one of our monthly check-in calls, the Director of Infrastructure and Operations at one of our largest customers was telling me that the one big problem he was still trying to solve was optimizing their OPEX, or at least making it more predictable. As an example, he cited a popular Content Delivery Network (CDN) they spend millions with: after the company launched a new service, their CDN cost had quadrupled. Having worked at a CDN company for 4 years, I am always enthusiastic about CDN-related questions, and this time I was curious to dig deeper into the cost aspect.

Before I get into the problem and solution, let’s look at what a Content Delivery Network does and the high-level costs associated with using one. Say you are in India and trying to open a cute cat GIF your friend sent, and that GIF is stored on and served from a server in the United States. The laws of physics (the speed of light) determine how soon you get to see the cute cat. CDNs help solve this problem with two concepts:

1. Bring content close to you: You could drive to Costco every time you need groceries, but it is more time-efficient to buy in bulk and keep them in your refrigerator. The refrigerator is equivalent to what we call a CDN edge.
2. Help fetch the content faster: When you do go to Costco, you can take the normal lane or the express lane. The express lane costs you $$.

CDNs make a huge share of their profits by billing you when you take the express lane. They charge you for the edge (in our Costco example, the refrigerator) as part of the platform fee, but the variable, and hence unpredictable, cost is based on the number of times you take the express lane.

There are nuances, of course. Milk goes bad sooner than, say, rice, so you find yourself taking the express lane often to get milk. That’s definitely wise. BUT, for content that doesn’t change often, the BEST thing to do is store it in the refrigerator (CDN edge) and optimize your costs.
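To make the refrigerator versus express-lane trade-off concrete, here is a toy Python model of an edge cache with a time-to-live (TTL). It is a minimal sketch and does not describe how any particular CDN is implemented. `EdgeCache`, the fake clock and the 7-day TTL (the value this customer later chose) are illustrative assumptions. Every cache miss counts as one trip down the express lane to the origin.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy CDN edge (hypothetical): serves a stored copy until its TTL expires, then refetches."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can fast-forward time
        self._store: Dict[str, Tuple[bytes, float]] = {}  # key -> (body, expires_at)
        self.hits = 0    # served from the "refrigerator"
        self.misses = 0  # trips down the "express lane" to the origin

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        # Missing or expired: fetch from origin and keep it for ttl_seconds.
        self.misses += 1
        body = self._fetch(key)
        self._store[key] = (body, now + self._ttl)
        return body


def origin(key: str) -> bytes:
    # Stand-in for the real origin server.
    return b"<cat gif bytes>"


WEEK = 7 * 24 * 3600
now = [0.0]  # fake clock value, mutable so we can move time forward

cached = EdgeCache(origin, ttl_seconds=WEEK, clock=lambda: now[0])
uncached = EdgeCache(origin, ttl_seconds=0, clock=lambda: now[0])  # TTL 0: nothing is kept

for _ in range(100):
    cached.get("/cat.gif")
    uncached.get("/cat.gif")

print(cached.hits, cached.misses)      # 99 1
print(uncached.hits, uncached.misses)  # 0 100
```

With a TTL of zero, every request goes to the origin. Frequently changing content (the milk) would get a short TTL, and stable content (the rice) a long one.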

Now that I have given you a bird’s-eye view of what a CDN does and what you mostly pay for, back to my curiosity. I asked him which service they had onboarded, did a quick check using Catchpoint, and immediately found that they weren’t making use of the CDN cache (refrigerator) at all. Whether I checked from the United States, Europe, Asia or Australia, nowhere were they storing content close to the user.

They were taking the express lane for ALL content from everywhere in the world.
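In this story, Catchpoint performed the check. As a hypothetical illustration of the kind of signal such a check can look at, the sketch below inspects standard HTTP caching headers from RFC 9111. `Cache-Control` tells a shared cache such as a CDN edge whether it may store a response and for how long. The presence of an `Age` header implies the response came from a cache rather than being generated or validated by the origin for this request. The function names are illustrative. Vendor-specific headers such as `X-Cache` vary by CDN and are deliberately not assumed here.

```python
from typing import Dict, Mapping, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def edge_may_cache(headers: Mapping[str, str]) -> bool:
    """True if the response explicitly lets a shared cache store it with a non-zero lifetime."""
    lowered = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(lowered.get("cache-control", ""))
    # Conservative sketch: treat no-store and private as "edge must not cache".
    if "no-store" in cc or "private" in cc:
        return False
    # s-maxage takes precedence over max-age for shared caches such as CDN edges.
    for key in ("s-maxage", "max-age"):
        if key in cc:
            arg = cc[key]
            return arg is not None and arg.isdigit() and int(arg) > 0
    return False  # no explicit lifetime; the CDN may still apply its own defaults


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """An Age header implies the response was served from a cache (RFC 9111)."""
    return any(k.lower() == "age" for k in headers)


print(edge_may_cache({"Cache-Control": "public, max-age=604800"}))  # True
print(edge_may_cache({"Cache-Control": "no-cache, no-store"}))      # False
```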

Now, for a service receiving roughly 10 million requests a day, this means:

1. A HUGE CDN bill.
2. Slower load times for users, and therefore a bad customer experience (grabbing something from the fridge is faster than taking the express lane to the store).

What surprised me most was the Director’s reaction: “WHAT! I didn’t know that at all.”

With advances in technology, a CDN can be enabled with the click of a button, but optimizing it for cost and performance takes effort and, more importantly, visibility into what’s going on. You don’t know what you don’t know: the “unknown unknowns.”

He immediately called his operations and CDN teams and asked them to start caching the content. It was honestly a VERY easy change: ticking a checkbox and entering how long the content should be cached.
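The customer made this change in their CDN’s configuration. A common alternative is to have the origin send an HTTP `Cache-Control` header carrying the same TTL. Whether a given CDN honors origin headers depends on its configuration, so treat this as an assumption. The hypothetical helper below builds that header value. It uses `s-maxage` when the lifetime should apply only to shared caches such as CDN edges, and `max-age` otherwise. The 7-day TTL matches the value this customer chose.

```python
from datetime import timedelta


def cache_control_for(ttl: timedelta, shared_only: bool = False) -> str:
    """Build a Cache-Control value that lets caches keep a response for `ttl`."""
    seconds = int(ttl.total_seconds())
    # s-maxage applies only to shared caches (e.g. CDN edges); max-age applies to all caches.
    directive = "s-maxage" if shared_only else "max-age"
    return f"public, {directive}={seconds}"


print(cache_control_for(timedelta(days=7)))  # public, max-age=604800
```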

Almost immediately, we saw an improvement in response times, from 450 ms to 100 ms: roughly a 78% reduction in the time taken (Fig. 1).
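The improvement percentage is the share of the original response time that was eliminated. Here is a quick check in Python (the function name is illustrative):

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Share of the original response time that was eliminated, in percent."""
    return (before_ms - after_ms) / before_ms * 100


print(f"{percent_reduction(450, 100):.1f}%")  # 77.8%
```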

That was the immediate result, but what excited me more was the improvement in that month’s CDN bill.

They get an average of 10 million requests every day, and each request is 8 MB. Their CDN contract charges $0.075 per GB. Because nothing was being cached and every request took the express lane, they were spending:

10,000,000 requests/day × 0.008 GB × $0.075/GB × 30 days = $180,000 every month, for just ONE request.
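The monthly figure can be reproduced in a few lines of Python. This sketch mirrors the article’s arithmetic. It assumes 8 MB = 0.008 GB (decimal units), a 30-day month, and that every request is billed at the contract rate. `Decimal` keeps the money math exact, and the function name is illustrative.

```python
from decimal import Decimal


def monthly_uncached_cost(requests_per_day: int, object_size_gb: Decimal,
                          price_per_gb: Decimal, days: int = 30) -> Decimal:
    """Monthly bill when every request takes the express lane (nothing served from cache)."""
    gb_per_day = requests_per_day * object_size_gb
    return gb_per_day * price_per_gb * days


baseline = monthly_uncached_cost(10_000_000, Decimal("0.008"), Decimal("0.075"))
print(f"${baseline:,.2f}")  # $180,000.00
```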

After they started caching content for 7 days, the cache hit ratio rose to 98%. Only the remaining 2% of requests took the express lane, so going forward they would be spending JUST $3,600 a month (2% of $180,000).

A 98% COST SAVING!
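Adding the cache hit ratio to the same arithmetic gives the bill after caching. This sketch follows the article’s simplifying assumption that only cache misses (here, 2% of requests) incur the per-GB charge. Actual CDN contracts may structure billing differently. The function name is illustrative.

```python
from decimal import Decimal


def monthly_cost_with_cache(requests_per_day: int, object_size_gb: Decimal,
                            price_per_gb: Decimal, hit_ratio: Decimal,
                            days: int = 30) -> Decimal:
    """Monthly bill when only cache misses take the express lane (article's assumption)."""
    miss_ratio = 1 - hit_ratio
    return requests_per_day * miss_ratio * object_size_gb * price_per_gb * days


size, price = Decimal("0.008"), Decimal("0.075")
before = monthly_cost_with_cache(10_000_000, size, price, hit_ratio=Decimal("0"))
after = monthly_cost_with_cache(10_000_000, size, price, hit_ratio=Decimal("0.98"))
saving = 1 - after / before

print(f"before ${before:,.2f}, after ${after:,.2f}, saving {saving * 100:.0f}%")
# before $180,000.00, after $3,600.00, saving 98%
```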

My biggest takeaway from that day: there are too many unknowns in your technology stack. Unless you start probing every component and looking under the hood, you are missing opportunities to optimize your user experience and, quite often, paying HUGE hidden costs!

Now, you might think this is a corner case that rarely happens.

My response: have you monitored your service with Catchpoint yet?