To ensure applications meet customer expectations, organizations deploy numerous solutions to monitor, alert, analyze, and share information. Finding the best solution for the job often results in different teams using different tools, causing confusion over what tools are best or where to look first when trying to resolve an incident. For this reason, we’re introducing this blog series to discuss the related solutions in the monitoring landscape. Each blog will cover a description of the category, vendors, common features of the tools, and relevant Catchpoint integrations. Categories to be covered include:
- Application Performance Monitoring
- Alerting & Notification
- IT Operations & Analytics
- Configuration & Release
To kick things off, we’ll start with alerting and notification solutions. The ultimate goal of the monitoring industry is to help ensure the quality of service of systems and applications. A vital part of achieving that goal is a robust alerting system that is capable of informing you of the widest possible variety of conditions including:
- Availability of service and content
- Delivery speed and user experience
- Page size and number of requests
- Content validation
- Thresholds applied to custom data sources
In many instances, an incident doesn’t trigger an alert from a single tool; multiple solutions generate alerts about the same incident. A recent DEJ study found that 41% of organizations use 10 or more monitoring tools. When several monitoring tools all generate alerts, the additional noise can slow down resolution time.
Each tool has its own notification methods and preferences. Maintaining notification settings across multiple tools creates administrative overhead and introduces unnecessary failure points. What happens when an email address isn’t updated in one of the many monitoring tools?
The tools also operate in different silos across the organization. If three individuals on three different teams each receive an alert from their tool of choice, none of them sees the complete picture, and each may be missing context that could help resolve the incident faster. Conversely, if one individual receives alerts from more than 10 systems when an incident occurs, alert fatigue can quickly set in.
Being able to collect and correlate alerts in a single place, route them to the correct teams, and escalate when necessary can help identify and resolve problems faster. This is where alerting and notification solutions come in.
Alerting and notification solutions automate large portions of the incident resolution process to help organizations resolve incidents faster and reduce administrative overhead. These systems consume alerts from a wide variety of monitoring tools for consolidation, correlation, and enrichment.
Alerts from disparate systems are gathered, and related alerts are correlated. Events can be further enriched by adding insights and context. All of this information is then routed to key stakeholders based on their defined communication preferences and the on-call schedule. Instead of having to sift through multiple alerts, the on-call team receives a single alert with all the relevant information needed to quickly identify and resolve the issue.
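To make the gather, correlate, and route flow concrete, here is a minimal Python sketch. It is an illustration only, not how any particular vendor implements correlation: the `Alert` and `Incident` types, the five-minute window, the rule of grouping by service name, and the `ON_CALL` table are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str        # monitoring tool that raised the alert (hypothetical names)
    service: str       # affected service, used here as the correlation key
    message: str
    timestamp: datetime


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self) -> datetime:
        return max(a.timestamp for a in self.alerts)


def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts for the same service that arrive within `window` of each other."""
    incidents = []
    open_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = open_by_service.get(alert.service)
        # Start a new incident if none is open or the last alert is too old.
        if incident is None or alert.timestamp - incident.last_seen > window:
            incident = Incident(service=alert.service)
            incidents.append(incident)
            open_by_service[alert.service] = incident
        incident.alerts.append(alert)
    return incidents


# Hypothetical on-call table: service name -> team to notify.
ON_CALL = {"checkout": "payments-oncall", "search": "search-oncall"}


def route(incident, on_call=ON_CALL, default="ops-oncall"):
    """Pick the team to notify, falling back to a default rotation."""
    return on_call.get(incident.service, default)


def summarize(incident):
    """Build the single consolidated notification for an incident."""
    sources = sorted({a.source for a in incident.alerts})
    return (
        f"[{incident.service}] {len(incident.alerts)} related alerts "
        f"from {', '.join(sources)} -> notify {route(incident)}"
    )
```

With this sketch, three alerts about the same service raised by three different tools within a few minutes collapse into one incident and one notification. Real products use far richer correlation signals than a shared service name and a time window.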
Alerts generated by Catchpoint are one piece of the puzzle when an incident occurs. Correlating and enriching these events with alerts from other monitoring tools can help cut through the noise. Matching a series of alerts against known patterns and kicking off the appropriate workflows and notifications can help pre-empt problems before customers are affected.
The Alert Webhook from Catchpoint can push data to an alerting and notification system when a test triggers an alert, enabling customers to take advantage of that system’s correlation, notification management, on-call scheduling, and escalation functionality. Any tool that supports webhooks or provides a URL to POST data to can be used. Alert Webhook templates can be customized with Macros to fit a tool’s format and content type. Catchpoint customers can get more details on configuring alert webhooks from the Alert Webhook Guide.
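As a rough illustration of the receiving side, the sketch below shows a small Python endpoint that accepts a POSTed JSON alert and normalizes it. The field names (`test_name`, `severity`, `message`) are hypothetical; the actual payload is whatever your Alert Webhook template defines, so treat this as a sketch under those assumptions and not as the Catchpoint payload format.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical fields; the real ones depend on the webhook template you configure.
REQUIRED_FIELDS = ("test_name", "severity", "message")


def parse_alert(body: bytes) -> dict:
    """Decode a JSON webhook body and map it to a normalized alert dict."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return {
        "source": "catchpoint",
        "title": payload["test_name"],
        "severity": str(payload["severity"]).lower(),
        "details": payload["message"],
    }


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alerts; replies 204 on success and 400 on a bad payload."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError as exc:  # also covers malformed JSON and bad encoding
            self.send_error(400, str(exc))
            return
        # A real integration would hand the alert to correlation and routing here.
        print("received alert:", alert)
        self.send_response(204)
        self.end_headers()
```

To try it locally, the handler can be served with the standard library, for example `HTTPServer(("", 8080), AlertWebhookHandler).serve_forever()` from `http.server`, and the webhook URL pointed at that address. The port is an arbitrary choice for the example.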
A sampling of alerting and notification vendors that Catchpoint integrates with:
BigPanda aggregates and correlates IT alerts into high-level IT incidents, using clustering algorithms to normalize data across multiple sources and eliminate noise.
OpsGenie offers an Incident Response Orchestration Platform that enables organizations to consolidate IT alerts, send notifications via multiple channels depending on the characteristics of the alert, and define on-call schedules and escalation procedures.
PagerDuty provides advanced analytics and visibility via its Digital Operations Management Platform, which automatically triggers workflows with dynamic routing, response automation, and on-call management.
ServiceNow is a cloud-based platform that contextualizes workflows using a single data model. The ServiceNow platform uses machine learning and automated actions to route alerts to the appropriate teams with associated risks.
VictorOps focuses on incident management software built for DevOps. By streamlining the incident management process and aligning teams to collaborate, it helps incidents get resolved rapidly.
Integration guides for the alert and notification vendors listed above can be found here: