My head is still spinning with all of the great content, amazing speakers, new tools and technologies that were covered at Monitorama 2016 and Velocity Santa Clara; however, there are still many challenges to overcome.
To get the most out of any experience, sometimes you have to take a step back, get back down to earth, wash away all the spin and hype, and condense the information into something that is actionable and applicable to your business.
Since 1997—when I was given the responsibility of performance monitoring at DoubleClick—I have been chasing the dream of finding the single tool, the single pane, the single screen that will show me in a very simple way my entire infrastructure (which at the time consisted of 17 datacenters, 5000 servers, 2000 network devices, databases, storage arrays, ETL, 90 different applications in 7 different languages, 68+ internet uplinks, etc.); unfortunately, that dream never materialized during my tenure.
After realizing that such a tool did not exist at the time, we wound up using a bunch of tools to try to solve the same problem, which was fine.
And all of this fed into SMARTS, a really cool event correlation solution.
In my 10 years in that role, I spent so much money on software licenses and FTE costs for monitoring that it’s not even funny.
Perhaps one day we will get there. However, what we do is complex, what we monitor is complex, what we want to know is not simple, and what we ask of monitoring tools is not easy. The complexity of all of these things is increasing every year, not decreasing.
What monitoring tool vendors and projects (both commercial and open source) have to stop doing is selling a pipe dream. That Swiss army knife, the one tool that can do everything and do it best, does not exist in the monitoring world. IT professionals must stop chasing that pipe dream as well. I almost lost my job one year after spending $3 million on one of those tools. That figure did not include implementation, which at the time cost $3 in consultants for every $1 spent on software, and a year later I was left with nothing to show for it besides a bunch of consultants taking up space.
One of the best themes from this year’s Monitorama was the “human” factor: placing a premium on the people who consume the monitoring data and implement the monitoring tools. We need to do a better job in certain areas.
These key takeaways include:
One interesting trend in IT monitoring is the emergence of a “size contest.” People are so proud to be collecting millions of metrics per second and monitoring databases that require petabytes.
When I built our own agentless APM solution at DoubleClick in 2000, we started collecting 500,000 metrics per hour, and I remember going back to the entire team and asking them to find a way to cut it down. I thought it was insane: how useful could that much data be, and for how long should it be kept? Everyone wanted to store the data for three to five years, leaving me to justify a 1 PB storage system. The point here is that ROI matters! It’s not about having the biggest monitoring system, but the most efficient and cost-effective one. You cannot have a monitoring system so large and complex that it requires its own monitoring system.
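To make the retention question concrete, here is a rough back-of-the-envelope sketch in Python. It only multiplies out the collection rate quoted above over a three- or five-year window. The function names and the `bytes_per_sample` parameter are assumptions made for illustration, not details of the DoubleClick system; real records carrying timestamps, tags, and indexes may be far larger than any single per-sample figure suggests.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days; fine for a rough estimate


def retained_samples(samples_per_hour: int, years: int) -> int:
    """Number of samples kept if nothing is downsampled or expired."""
    return samples_per_hour * HOURS_PER_YEAR * years


def retained_bytes(samples_per_hour: int, years: int, bytes_per_sample: int) -> int:
    """Raw storage needed, given an assumed (hypothetical) size per stored sample."""
    return retained_samples(samples_per_hour, years) * bytes_per_sample


if __name__ == "__main__":
    rate = 500_000  # metrics per hour, the figure quoted in the text
    for years in (3, 5):
        print(f"{years} years: {retained_samples(rate, years):,} samples")
```

Plugging in your own per-sample size, replication factor, and storage price turns this into a quick sanity check before agreeing to a multi-year retention policy.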
Monitoring tools must be fast, reliable, easy to deploy, and easy to repair. They also need to be simple and inexpensive to run; the cost of buying or building tends to be at the forefront of our minds, but there is also a cost of running. One company mentioned that their AWS storage cost for just the monitoring data was seven figures a year!
During one of the sessions at Monitorama, Pinterest described the evolution of their monitoring system, and it was incredible. The system became just as complex to run and monitor as the actual applications being monitored. When a monitoring system requires front-end and back-end load balancers, it’s time to stop and ask yourself if it’s worth it.
I really enjoyed Monitorama 2016; it was my first time there, but certainly not my last. I strongly encourage everyone to attend next year.