Consider this scenario: your internal application and system health dashboards are green, but your end users are reporting accessibility and availability issues. Does such a monitoring strategy serve any real purpose?

A well-built APM and infrastructure monitoring solution like Dynatrace is great for internal system health. But if you monitor end-user experience exclusively from the cloud and focus on web browser simulation using real browsers, you are implementing a monitoring strategy that lacks end-to-end visibility. There is only about a 50-50 chance that such a solution will help you detect, diagnose, and resolve real issues. In this blog, we focus on the importance of comprehensively monitoring from the end user’s perspective and on how Catchpoint compares to monitoring tools such as Dynatrace.

In 2020, the monitoring needs of an enterprise are much more advanced and complicated than they were 10 years ago. The technology landscape has changed over the decade. New technologies, practices, and methodologies were introduced, bringing significant changes to how software is built and delivered. Let’s look at a few key milestones:

DevOps:  The first conference for DevOps was held in 2009 and the first-ever “State of DevOps” report was launched in 2012. The adoption of DevOps has taken off since then. 

Microservices: After DevOps was introduced and started gaining momentum, microservices gained popularity as a service-oriented architecture style that supports continuous deployment.

Single Page Applications: The adoption of Single Page Applications picked up with the arrival of the Angular and React frameworks sometime between 2013 and 2015. The goal was to offer faster transitions when users interacted with the application, ensuring a great end-user experience.

Serverless Computing: Cloud Computing took off with Amazon announcing EC2 in 2006, followed by Google’s App Engine (2008), Microsoft Azure (2010), and IBM SmartCloud (2011). However, the adoption of Cloud Computing really accelerated when Amazon launched AWS Lambda in 2014. As usual, the rest followed suit and further accelerated the cloud migration journey. 

Multi and Hybrid Cloud: Early 2015 saw enterprises around the world adopt a multi-cloud or hybrid cloud model as part of their push to make the best of available technologies. 

Content Delivery Networks: At the start of this decade, CDNs were mostly used for delivering content with high availability and performance. However, the role of CDNs has changed drastically along with content, technology, and methodology trends. They are now responsible for your:

  • DNS management including DNSSEC.
  • Load balancing. 
  • Image optimization and management. 
  • Video optimization and management.
  • Serverless computing – developers can now deploy directly to the edge.
  • Web operation – Redirects, Visitor Prioritization, Input validation to avoid abuse.
  • API gateways. 
  • IoT updates. 
  • WAF management. 
  • DDoS defense.

With all these changes, one thing has increased drastically: the proportion of Unknown Unknowns.

Monitoring the Knowns and the Unknowns

The Rumsfeld concept is often applied to the monitoring industry and practice. Monitoring is a combination of Knowns and Unknowns. 

The first step in defining a good monitoring strategy for any organization is to group the following four areas of the technology landscape into the four combinations of Knowns and Unknowns (Known Knowns, Known Unknowns, Unknown Knowns, and Unknown Unknowns):

  • Application 
  • Technology 
  • Tools 
  • Vendors 

This helps an organization understand how much visibility and control it has over each of these, which in turn helps it plan, prepare, and respond if and when things go wrong.

These four concepts are grouped into two types of monitoring: 

  1. Whitebox Monitoring
  2. Blackbox Monitoring  

Whitebox monitoring is where you have control of the systems and the applications: you have visibility, and you know what to expect. With Blackbox monitoring, on the other hand, you don’t have control over the systems, you lack visibility, and you don’t know what to expect.
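To make the distinction concrete, here is a minimal Python sketch that sorts an inventory into the two monitoring types. The Component fields, the rule that a component counts as Whitebox only when it is both controlled and visible, and the sample inventory are all assumptions made for illustration; a real classification would be more nuanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One element of the delivery chain; the fields are illustrative assumptions."""
    name: str
    controlled: bool  # can the organization change or restart it?
    visible: bool     # can the organization instrument it from the inside?


def monitoring_type(component: Component) -> str:
    """Return "whitebox" only when the component is both controlled and visible."""
    return "whitebox" if component.controlled and component.visible else "blackbox"


# Hypothetical inventory, not taken from any real deployment.
inventory = [
    Component("application server", controlled=True, visible=True),
    Component("managed DNS provider", controlled=False, visible=False),
    Component("CDN edge", controlled=False, visible=False),
    Component("last-mile ISP", controlled=False, visible=False),
]

for item in inventory:
    print(f"{item.name}: {monitoring_type(item)}")
```

In this made-up inventory, only the application server qualifies for Whitebox monitoring; everything else between it and the end user has to be observed from the outside.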

Now, when we recall all the changes that have occurred over the last decade, one thing becomes more and more evident: the number of Unknown Knowns and Unknown Unknowns has increased significantly, creating a lot of blind spots for the organization. This has made Blackbox monitoring more important than ever.

The Dynatrace Platform

Dynatrace was probably one of the first monitoring vendors to consolidate both these approaches: 

  • Whitebox monitoring – Application & Infrastructure
  • Blackbox monitoring – Synthetics & Real User Monitoring (RUM)

Dynatrace is probably one of the best Whitebox monitoring solution providers available in the market today. They do a pretty good job with Application Performance and Infrastructure Monitoring. After all, Compuware acquired them for these capabilities.

However, their Digital Experience offering (Blackbox), which includes Synthetics and Real User Monitoring (the diluted consolidation of Gomez and Keynote), does not really qualify as a true End User or Blackbox Monitoring offering, especially considering the changes and challenges we discussed earlier in this article.

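I say this with complete responsibility, and I will elaborate further below. But before that, let’s understand the two goals of monitoring:

  1. Detecting issues quickly so that services can be restored.
  2. Gradually improving or optimizing services.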

With these two goals in mind, let’s take a look at a typical architecture of an organization and the common failures that can impact an end user. The image below illustrates some high-level issues that can impact the end user. 

All these problems that impact an end user can be grouped under the four pillars of monitoring: Reachability, Availability, Performance, and Reliability.

A good End User or Blackbox monitoring solution is one that can cater to the two goals of monitoring and all four pillars of end-user experience monitoring.

A good Blackbox monitoring solution should have: 

1. Monitoring nodes that help you monitor from where the end users are, that is:

  • Last Mile Providers 
  • Backbone of the internet
  • Actual wireless nodes (3G, 4G)

2. Test types that simulate all applications and services from an end user’s perspective:

  • Web
  • Transactions 
  • SSL 
  • DNS 
  • API with API transaction monitoring capabilities 
  • BGP
  • Traceroute 
  • TCP 
  • SMTP 
  • WebSockets
  • All other critical protocols

3. Capture standard and custom metrics

  • Metrics critical for monitoring CDNs, cloud providers, and proxies
  • Metrics critical for different types of application frameworks 

4. Comprehensive alerting capabilities

  • Ability to alert on all standard as well as Custom Metrics, crucial for quickly detecting and restoring services. 

5. Data science and analytics capabilities, which play a crucial role in continuously optimizing and maintaining good performance

  • Access to raw data that can be sliced and diced 
  • Ability to look at historical data by every metric and dimension 

In addition to these basic capabilities, there are other bells and whistles such as Dashboards, API, etc. that are a critical part of every monitoring solution. However, the five capabilities we discussed above are must-haves.
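To give a feel for what a single synthetic test does under the hood, here is a minimal Python sketch of a probe that times name resolution and the TCP connect separately, using only the standard library. The function name probe_tcp and the result field names are assumptions made for illustration, and the sketch covers just two of the test types listed above. It also measures from whichever machine runs it, so it says nothing about what users on a last-mile or wireless network would see.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> dict:
    """Time name resolution and TCP connect for one target, as seen from this machine."""
    result = {"host": host, "port": port, "ok": False,
              "dns_ms": None, "connect_ms": None, "error": None}
    try:
        # Phase 1: resolve the host name to one or more socket addresses.
        start = time.perf_counter()
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["dns_ms"] = (time.perf_counter() - start) * 1000.0

        # Phase 2: open a TCP connection to the first resolved address.
        family, socktype, proto, _, sockaddr = infos[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.connect(sockaddr)
            result["connect_ms"] = (time.perf_counter() - start) * 1000.0
        result["ok"] = True
    except OSError as exc:  # covers resolution failures, refusals, and timeouts
        result["error"] = str(exc)
    return result
```

Calling probe_tcp with a host name and port 443 returns a dictionary whose dns_ms and connect_ms values can be stored as separate metrics. Keeping the phases apart is what lets you tell a slow DNS provider from an unreachable server.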

Dynatrace offers only: 

  1. Cloud nodes:  
    • End Users do not originate from the cloud. 
    • Reachability issues cannot be detected. 
    • Most DNS, Load Balancer, CDN, or Cloud issues will not be detected.
  2. Test Types: 
    • Web and transactions only. 
  3. Standard Metrics only. 
  4. No custom metrics for CDN monitoring.
  5. Alerting is limited to specific metrics, which does not help with DNS, CDN, or SSL use cases.
  6. Lacks advanced slicing and dicing capabilities. Not ideal for the second goal of monitoring – gradual improvement or optimization of services. 

The limitations in these six critical areas make it almost impossible to detect some of the modern challenges that can impact the end-user experience.  

If the goal is to detect an application problem alone, then the need for end-user/Blackbox monitoring does not arise. Catchpoint works with a number of large enterprises that were previously relying on Dynatrace for end-to-end monitoring. Because these enterprises were monitoring only to detect issues, it was not long before an Unknown Unknown created a major performance issue that went undetected until it was reported by the end users and ended up impacting revenue and customer satisfaction.

There were also instances when RUM alerted customers to an issue that was impacting the end users, but it was not possible to diagnose or troubleshoot it effectively because RUM alone could not provide insights into the performance of the network, ISP, DNS, CDN, cloud, third parties, and so on.
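Diagnosis of this kind usually comes down to slicing raw measurements by one dimension at a time until one slice stands out. The Python sketch below shows the idea on a handful of made-up records; the field names (isp, cdn, response_ms), the values, and the median_by helper are all hypothetical and do not reflect any vendor’s data model.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw measurements; field names and values are made up for illustration.
measurements = [
    {"isp": "ISP-A", "cdn": "CDN-1", "response_ms": 420},
    {"isp": "ISP-A", "cdn": "CDN-2", "response_ms": 450},
    {"isp": "ISP-B", "cdn": "CDN-1", "response_ms": 1900},
    {"isp": "ISP-B", "cdn": "CDN-2", "response_ms": 2100},
    {"isp": "ISP-C", "cdn": "CDN-1", "response_ms": 430},
    {"isp": "ISP-C", "cdn": "CDN-2", "response_ms": 410},
]


def median_by(rows, dimension, metric="response_ms"):
    """Group rows by one dimension and return the median of the metric per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[metric])
    return {key: median(values) for key, values in groups.items()}


by_isp = median_by(measurements, "isp")
by_cdn = median_by(measurements, "cdn")
print("median by ISP:", by_isp)
print("median by CDN:", by_cdn)
```

With this sample data, the CDN breakdown looks nearly flat while the ISP breakdown singles out ISP-B, which points the investigation at the network path and away from the CDN or the application.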

End-user/Blackbox monitoring is used to detect anything that could impact an end user, including all the Unknown Unknowns: last-mile ISPs, the internet backbone, CDN edge servers, CDN configuration, DNS, load balancers, hybrid cloud, third parties, or the application itself.

Catchpoint offers: 

  • 900+ nodes spread across the world on last mile, backbone, wireless, etc.
  • 20+ synthetic monitoring or test types. 
  • 50+ standard metrics.
  • Any number of custom metrics.
  • Customized CDN monitoring for every single CDN used by an enterprise. 
  • Ability to store granular data for 3+ years.
  • Ability to alert on every single metric. 
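Alerting on any metric, standard or custom, comes down to evaluating simple threshold rules against each new sample. The Python sketch below illustrates that idea in generic terms. It is not Catchpoint’s API or configuration format; the AlertRule class, the evaluate function, the metric names, and the thresholds are all assumptions made for this example.

```python
import operator
from dataclasses import dataclass

COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class AlertRule:
    metric: str       # any metric name, standard or custom
    comparator: str   # one of the keys in COMPARATORS
    threshold: float


def evaluate(rules, sample):
    """Return the rules that the sample violates; metrics missing from the sample are skipped."""
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and COMPARATORS[rule.comparator](value, rule.threshold):
            fired.append(rule)
    return fired


# Hypothetical rules and sample values.
rules = [
    AlertRule("dns_ms", ">", 200),
    AlertRule("ssl_days_to_expiry", "<", 14),
    AlertRule("cdn_cache_hit_ratio", "<", 0.80),  # custom metric
]
sample = {"dns_ms": 35, "ssl_days_to_expiry": 9, "cdn_cache_hit_ratio": 0.62}

for rule in evaluate(rules, sample):
    print(f"ALERT: {rule.metric} {rule.comparator} {rule.threshold} (got {sample[rule.metric]})")
```

Because the rules refer to metrics by name only, a custom CDN metric is handled exactly like a built-in one. That is the property that matters for the DNS, CDN, and SSL use cases discussed above.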

Delivering a great end-user experience starts with proactively detecting any problem before it impacts the end user. So it is imperative to invest in a true end-user monitoring solution. 

The Role of Monitoring Mindsets

When we analyzed every customer that switched from Dynatrace (APM + Synthetics) to Catchpoint Synthetics + Dynatrace APM, one thing stood out: these were mostly organizations transitioning from an application/developer-first mindset to an end-user/customer-first mindset.

This transition typically happens when stakeholders push organizations to move from a “fixed” to a “growth” mindset. I have discussed the different monitoring mindsets in detail in this article.

It is important to ensure your application and the systems involved are working as expected for your developers, operations team, and network engineers within your organization. 

But it is even more important to ensure that your application and the systems involved are Reachable, Available, Performant, and Reliable for your customers, the actual users of the application.