When we started Catchpoint ten years ago, the cloud was just emerging. Google and Amazon were using their own cloud infrastructure. 2008 was all about virtualization, the first step towards abstracting the physical layer. Yet there wasn’t much discussion of how all of this change was impacting the user experience or monitoring. People didn’t care. Systems were just available, and there was plenty to go around. We started Catchpoint because we believed that IT monitoring was going to focus more and more on the end user experience.
Our goals and mission were clear: measure and monitor availability and performance to:
• Ensure the best quality of experience
• Reduce troubleshooting time and costs
• Protect revenue and brands
• Analyze bottom-line impact
We’ve seen a lot of infrastructure changes in the past ten years. Cloud is now mainstream. Newer technologies are emerging, including containers and serverless applications. Tech leaders are trying to make it simpler to deliver applications and software to end users.
Even with all of this infrastructure evolution, one thing has remained the same: the importance of understanding how infrastructures, networks, and internet-related subsystems impact the end user experience.
Today, the customer experience very often defines the product. It’s at the forefront of both the customer’s and the business provider’s minds. Experience includes not only the hardware and the software but also the services that support a product or solution. And “end user” can mean many things, with experiences being delivered to many devices, from a computer or a tablet to a thermostat.
Bad experiences are now amplified via social media, which introduces new challenges for companies whenever users are let down.
In the last ten years, we’ve learned you need to make sure that user experience is at the center of everything you do.
Synthetic Monitoring is Not Dead Yet
We’ve learned that RUM and other monitoring technologies can’t replace synthetic monitoring’s robust ability to predict outages before they impact real users. Synthetic monitoring allows companies to proactively combat outages and service interruptions before their 10,000 customers call to say they can’t bank online or complete a business transaction.
Many companies, including Catchpoint, continue to innovate, enriching synthetic’s capabilities. Companies can now monitor user experiences around the globe, with synthetic’s reach ever expanding. Plus, we’re expanding the breadth and scope of apps and components a company can monitor, as not all user experiences are with a web page.
Besides the rich data synthetic monitoring provides, products like Catchpoint offer advanced data analysis, alerting capabilities, and integrations with other solutions within the IT ecosystem. Between strong data and robust toolsets, synthetic isn’t going anywhere.
Real User Monitoring is in the Early Stages
However, we do need to address some challenges.
RUM’s biggest challenge is with noise: how do we improve RUM’s signal-to-noise ratio? Leaders in IT telemetry will tell you that a signal is actionable, but noise is not. Unfortunately, RUM collects data on everything, which creates a lot of noise.
Our promise to our customers is that we are going to keep innovating to reduce noise and make RUM more actionable.
SaaS is Eating the World
SaaS adoption is penetrating the enterprise; we’re living more and more in a software world. Previously, a business’s biggest digital experience concern was with its website. Now, digital experience concerns include internal customers, from every department. For example, ten years ago, Microsoft Exchange or a CRM was hosted on premises. Today, there’s Office 365 and Salesforce.com. The latest Microsoft earnings call claimed 60 million organizations are using Office 365, which is 1.2 billion people. Are those 1.2 billion people having a good experience?
Organizations want to monitor the user experience whether the user is internal or external, and whether the systems involved are internal or external, company owned or SaaS. The look inward at how employees are experiencing SaaS applications is going to fuel monitoring’s next generation of innovation.
SRE and DevOps Practices are Driving Automation
Site reliability engineers (SREs) and DevOps engineers have been busy automating tasks. For example, tools like OpsGenie and PagerDuty offer streamlined alerting for lower MTTR. But there’s still room for further automation. Progress in monitoring has meant an increase in telemetry data, and that growth fuels the need to automate more.
Some of the leading technology companies, like Google, Facebook, and Apple, have automated everything, as they own everything in their ecosystems. But not all companies can build their own hardware, software, data center management, and IT monitoring and management systems. With the shortage of human resources and the emergence of AI, we are going to see a lot of organizations push for automation.
AI Ops is a growing field. Companies are aggregating multiple signals from many sensors and monitoring tools, creating huge headaches around not only the amount of data but also how to correlate the various signals. IT teams still need to look for that needle, only now in more haystacks, and denser ones. AI Ops looks at how to deduplicate the data in order to make the signals stronger, ushering in an era where event correlation is going to have a renaissance moment thanks to more powerful systems and machine learning. We cannot have event correlations based on rule books from the late 1990s and early 2000s.
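To make the deduplication idea concrete, here is a minimal sketch of collapsing repeated alerts from several tools into one stronger signal. It is deliberately a plain time-window rule, not machine learning and not how any particular AI Ops product works; the `Alert` fields, the tool names, and the five-minute window are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # hypothetical tool name, e.g. "synthetic" or "rum"
    service: str      # what the alert is about
    symptom: str      # e.g. "timeout"
    timestamp: float  # seconds since some reference point


def deduplicate(alerts, window_seconds=300.0):
    """Group alerts that share (service, symptom) and arrive within
    window_seconds of the previous alert in the same group."""
    groups = []
    latest = {}  # (service, symptom) -> index of the most recent group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.symptom)
        idx = latest.get(key)
        if idx is not None and alert.timestamp - groups[idx]["last_seen"] <= window_seconds:
            # Same issue seen again: strengthen the existing signal.
            group = groups[idx]
            group["count"] += 1
            group["last_seen"] = alert.timestamp
            group["sources"].add(alert.source)
        else:
            # First sighting, or the previous group has gone quiet.
            latest[key] = len(groups)
            groups.append({
                "service": alert.service,
                "symptom": alert.symptom,
                "first_seen": alert.timestamp,
                "last_seen": alert.timestamp,
                "count": 1,
                "sources": {alert.source},
            })
    return groups
```

A group reported by several independent sources is a stronger candidate for a real incident than a single alert from one tool. The correlation the paragraph above calls for would go well beyond this kind of fixed rule.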
What to Do with All the Signals?
The next step in monitoring is contextualizing. Companies are finding that you need to do more than gather data, cross your fingers, and hope that someone catches something important.
Companies are starting to look at telemetry through the lens of business KPIs and metrics. There is nothing more powerful than being able to understand the correlation between the technology and business telemetry. To achieve this, businesses must start by monitoring things in ways that answer business-driven questions.
I recently saw this strategy in action. A customer built a dashboard to answer the question, “Can people buy products online?” The infrastructure team was addressing business questions. The team tied business metrics, synthetic, RUM, and system data into a single dashboard to give them a picture of the customer’s experience.
The future is not a technology dashboard or a business dashboard; it’s a single view that can tell multiple stakeholders (business, tech, SRE…) that there is a problem, that real users are impacted, and what the dollar amount is.
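As a rough illustration of what such a single view might compute, the sketch below folds synthetic, RUM, and business numbers into one answer. Every field name, the 50% failure threshold, and the "lost orders times average order value" estimate are assumptions invented for this example, not a description of any customer’s dashboard or of a Catchpoint feature.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Synthetic monitoring (hypothetical fields)
    synthetic_checks_failed: int
    synthetic_checks_total: int
    # Real user monitoring
    rum_sessions: int
    rum_sessions_with_errors: int
    # Business telemetry
    baseline_orders_per_minute: float
    current_orders_per_minute: float
    average_order_value: float


def summarize(snapshot, failure_threshold=0.5):
    """Answer three questions: is there a problem, are real users
    impacted, and roughly what is it costing per minute?"""
    total = snapshot.synthetic_checks_total
    failure_rate = snapshot.synthetic_checks_failed / total if total else 0.0
    # Assumed estimate: orders missing versus baseline, priced at the average order.
    lost_orders = max(0.0, snapshot.baseline_orders_per_minute
                      - snapshot.current_orders_per_minute)
    return {
        "problem": failure_rate >= failure_threshold,
        "synthetic_failure_rate": failure_rate,
        "impacted_users": snapshot.rum_sessions_with_errors,
        "revenue_at_risk_per_minute": lost_orders * snapshot.average_order_value,
    }
```

The point of the sketch is the shape of the output: one record that a business owner, an engineer, and an SRE can all read, rather than three separate screens.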
Going back 20 years to 1998, I walked into our NOC at DoubleClick. Our monitoring tool was throwing an error. When presented with the data, I was asked: “So what? How many customers are impacted?” We were unable to tell who was impacted, whether it was just the monitoring system throwing the error, or whether there was a real business impact.
IT is here to support the business. The telemetry we collect needs to support the business and answer business questions.