Microsoft Office 365 has the upper hand over other tools. The Office 365 ecosystem enables users to collaborate easily while working remotely. Users have a range of solutions to choose from, and Office 365 offers an additional advantage: easy integration between these tools. Microsoft Office 365 has over 200 million monthly active users, and the COVID-19 pandemic has only increased the number of users relying on it. Microsoft Teams alone now handles 115 million daily active users, a huge increase from 75 million in April this year.

MS Teams helps individuals collaborate over chat, media, video conference, screen sharing, file sharing, and built-in integrations to drive productivity. Certain features, such as virtual meetings, are critical for businesses and are used heavily by employees on a daily basis.

The pandemic has made MS Teams indispensable for hosting business meetings, customer trainings, conferences, and more. It has become important to ensure a good employee experience for the remote workforce. To help businesses monitor the performance of business-critical tools such as MS Teams, we have built a new custom monitor. The custom monitor tracks the health of MS Teams VoIP calls and sends a notification when performance degradation is detected.

Vital VoIP Performance Metrics

Before we dig deeper into the setup and inner workings of this custom monitor, we need a clear understanding of the core performance metrics required to gain visibility into VoIP call performance. These metrics should allow us to evaluate the quality of a VoIP call and should answer questions such as:

  • Are we able to join the meeting/call?
  • Are both parties able to communicate over the call?
  • How much disturbance/noise is experienced over the call?

With these questions in mind, we have selected three vital metrics for measuring VoIP call quality – Round Trip Time, Jitter, and Packet Loss.

Round Trip Time (RTT)

The time taken for a packet to travel from the source to its destination and back is called the RTT, and it is reported in milliseconds. We need to ensure that the RTT is low and stable throughout the audio session.

Fig 1. Round Trip Time (RTT) calculation
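
As a rough illustration of the calculation, the sketch below derives RTT from the time a packet was sent and the time its reply came back, then summarizes a series of samples so that "low and stable" can be checked. The function names and sample values are hypothetical and are not taken from the monitor itself.

```javascript
// Hypothetical helpers; names and sample values are illustrative only.

// RTT for one probe: time the reply arrived minus time the packet was sent (ms).
function roundTripTimeMs(sentAtMs, replyReceivedAtMs) {
  if (replyReceivedAtMs < sentAtMs) {
    throw new RangeError("reply cannot arrive before the packet was sent");
  }
  return replyReceivedAtMs - sentAtMs;
}

// Summarize the RTT samples of one audio session. A low average together
// with a small spread (max - min) suggests a low and stable RTT.
function summarizeRtt(samplesMs) {
  if (samplesMs.length === 0) {
    throw new RangeError("at least one RTT sample is required");
  }
  const min = Math.min(...samplesMs);
  const max = Math.max(...samplesMs);
  const average = samplesMs.reduce((sum, v) => sum + v, 0) / samplesMs.length;
  return { min, max, average, spread: max - min };
}
```

For example, `summarizeRtt([40, 50, 60])` yields an average of 50 ms with a spread of 20 ms.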


Jitter

Jitter is the variation in delay between packets that were expected to arrive at regular intervals, and it is reported in milliseconds. In an audio session jitter is a vital metric: even though the packets are delivered, the end-user experience is impacted. When jitter is high and packets arrive out of order, the receiver may not be able to understand the message.

Fig 2. Jitter due to network congestion
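
The article does not say how the jitter value is computed internally. One common definition is the interarrival jitter estimator from RFC 3550 (the RTP specification), sketched below under the assumption that send and receive timestamps are available in milliseconds. The function name and packet shape are hypothetical.

```javascript
// Interarrival jitter as defined in RFC 3550, section 6.4.1:
//   D = (R_j - R_i) - (S_j - S_i)     difference in transit time
//   J = J + (|D| - J) / 16            smoothed running estimate
// Assumption: each packet is { sentMs, receivedMs } (hypothetical shape).
function interarrivalJitterMs(packets) {
  let jitter = 0;
  for (let i = 1; i < packets.length; i++) {
    const prev = packets[i - 1];
    const curr = packets[i];
    const d =
      (curr.receivedMs - prev.receivedMs) - (curr.sentMs - prev.sentMs);
    jitter += (Math.abs(d) - jitter) / 16;
  }
  return jitter;
}
```

Packets that arrive with exactly the spacing they were sent with produce a jitter of 0. Any change in spacing moves the estimate by one sixteenth of the difference, which smooths out isolated spikes.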

Packet Loss

Packet loss occurs when one or more data packets traveling across a network fail to reach their destination. Packet loss is caused either by errors in data transmission, typically across wireless networks, or by network congestion. Packet loss is measured as the number of packets lost.

Fig 3. Packet loss
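
A raw count is easier to interpret next to the number of packets sent. The minimal sketch below reports both the count and a loss percentage. The function name is hypothetical, and the percentage is an addition for illustration; the monitor itself reports the number of packets lost.

```javascript
// Hypothetical helper: derive the lost-packet count and a loss percentage
// from the number of packets sent and the number that arrived.
function packetLoss(packetsSent, packetsReceived) {
  if (packetsSent < 0 || packetsReceived < 0 || packetsReceived > packetsSent) {
    throw new RangeError("expected 0 <= packetsReceived <= packetsSent");
  }
  const lost = packetsSent - packetsReceived;
  // Multiply before dividing to keep round results exact where possible.
  const lossPercent = packetsSent === 0 ? 0 : (lost * 100) / packetsSent;
  return { lost, lossPercent };
}
```

For example, `packetLoss(1000, 990)` reports 10 packets lost, which is 1% of the packets sent.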

Understand the Custom Monitor Implementation

The MS Teams VoIP custom monitor relies on three main components: initiating an audio session, accepting and participating in it, and measuring the quality of the session. Let’s look at each of these components and understand the technical side.

Initiate an audio session

We rely on MS Teams’ calling bot to participate in an audio session. The bot is triggered using the Microsoft Teams web version in a Chrome browser. We use Google Puppeteer to simulate the whole user journey, from logging into the web version of Microsoft Teams to initiating an audio session. This enables us to replicate an end user's actions and initiate an audio session over WebRTC. WebRTC is an open-source project for real-time audio, video, and data communications in web and native apps. The Microsoft Teams web client and native app rely on WebRTC for audio and video communications.
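
The following is a minimal sketch of what such a Puppeteer journey could look like, not the monitor's actual script. The function names, the credentials object, and every selector are hypothetical placeholders; the real Teams page structure is not described here and will differ. The two Chrome flags are real switches that auto-accept the microphone prompt and supply a synthetic audio device, which an unattended browser typically needs.

```javascript
// Launch options for an unattended Chrome that can join a call.
function buildLaunchOptions(headless = false) {
  return {
    headless,
    args: [
      "--use-fake-ui-for-media-stream", // skip the microphone permission prompt
      "--use-fake-device-for-media-stream", // feed a synthetic audio device
    ],
  };
}

// `puppeteer` is passed in, e.g. require("puppeteer"), so the sketch makes no
// assumption about how it is installed on the node.
// ASSUMPTION: `selectors` holds hypothetical CSS selectors for the login form
// and the audio-call button; they must be replaced with real ones.
async function startTeamsCall(puppeteer, { email, password, selectors }) {
  const browser = await puppeteer.launch(buildLaunchOptions());
  const page = await browser.newPage();
  await page.goto("https://teams.microsoft.com", { waitUntil: "networkidle2" });

  // Log in as the test user.
  await page.waitForSelector(selectors.emailInput);
  await page.type(selectors.emailInput, email);
  await page.click(selectors.submitButton);
  await page.waitForSelector(selectors.passwordInput);
  await page.type(selectors.passwordInput, password);
  await page.click(selectors.submitButton);

  // Start the audio call to the calling bot.
  await page.waitForSelector(selectors.audioCallButton);
  await page.click(selectors.audioCallButton);
  return { browser, page };
}
```

Passing the Puppeteer module in as an argument also makes it possible to exercise the flow against a stub before pointing it at a real tenant.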

Answer and participate in the audio session

To auto-accept calls from the Catchpoint script, we use a bot hosted on Azure in a Linux environment running Node 12 LTS. To handle all incoming calls, the app uses the Microsoft Graph Communications API. This bot ensures that the calls initiated from the Catchpoint script are answered, which enables us to capture audio session metrics and analyze the call quality.
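
To give a feel for the answering step, the sketch below builds the Microsoft Graph request that answers an incoming call. It is an illustration under assumptions, not the bot's actual code: the function name is hypothetical, the notification is assumed to be one entry of the `value` array that Graph posts to the bot's calling webhook, and authentication and the HTTP call itself are left out.

```javascript
// Build (but do not send) the Graph "answer" request for an incoming call.
// ASSUMPTION: `notification` looks like
//   { resource: "/communications/calls/<id>", resourceData: { state: "incoming" } }
function buildAnswerRequest(notification, callbackUri) {
  const data = notification.resourceData || {};
  if (data.state !== "incoming") {
    return null; // only calls that are still ringing can be answered
  }
  // The call ID is taken as the last path segment of the resource.
  const callId = notification.resource.split("/").filter(Boolean).pop();
  return {
    method: "POST",
    url: `https://graph.microsoft.com/v1.0/communications/calls/${callId}/answer`,
    body: {
      callbackUri, // where Graph sends later call-state notifications
      acceptedModalities: ["audio"],
      // Let the service host the media instead of the bot process.
      mediaConfig: { "@odata.type": "#microsoft.graph.serviceHostedMediaConfig" },
    },
  };
}
```

The returned object can be sent with any HTTP client once a bearer token for the bot's app registration has been added to the request headers.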

Measure the quality of an audio session

Once the audio session is established, we rely on chrome://webrtc-internals to collect audio session performance information. This allows us to collect metrics about ongoing WebRTC sessions, such as round trip time, packet loss, and jitter.
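
The values shown in chrome://webrtc-internals follow the standard WebRTC statistics model, so the extraction step can be sketched as below. This assumes the statistics have already been obtained as an array of plain objects; how the monitor reads them from the page is not covered here, and the function name is hypothetical. The field names (`roundTripTime`, `jitter`, `packetsLost`) come from the WebRTC statistics specification, which reports times in seconds.

```javascript
// Pull the three vital audio metrics out of a list of WebRTC stats objects.
// ASSUMPTION: `stats` is an array of entries shaped like RTCStatsReport values.
function extractAudioMetrics(stats) {
  const metrics = { rttMs: null, jitterMs: null, packetsLost: null };
  for (const s of stats) {
    if (s.kind !== "audio") continue;
    if (s.type === "remote-inbound-rtp" && typeof s.roundTripTime === "number") {
      // Reported by the remote side about the audio we send; seconds -> ms.
      metrics.rttMs = s.roundTripTime * 1000;
    }
    if (s.type === "inbound-rtp") {
      // Measured locally on the audio we receive.
      if (typeof s.jitter === "number") metrics.jitterMs = s.jitter * 1000;
      if (typeof s.packetsLost === "number") metrics.packetsLost = s.packetsLost;
    }
  }
  return metrics;
}
```

Metrics that are missing from the input stay `null`, which makes it easy to tell "not reported yet" apart from a genuine zero.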

Execution Flow for Custom Monitor

Fig 4. Custom Monitor Architecture
  1. The Catchpoint Portal initiates the custom script on an enterprise node. It passes script-related details to the node for execution, for example, the script file name.
  2. The Catchpoint Linux node is where the custom script is placed. All related dependencies are installed in advance.
  3. A Node.js script is invoked that uses Google Puppeteer to initiate a Microsoft Teams call. This call is auto-accepted by a bot hosted on Azure.
  4. The script also opens chrome://webrtc-internals in a new Chrome tab, which holds the metrics of the ongoing WebRTC sessions. These metrics are reported back to Catchpoint, where they are captured with the Insight feature and plotted in various charts.
  5. The Azure Node.js app uses Microsoft’s Bot Builder framework for constructing bots. To auto-accept calls, the app uses Microsoft Graph APIs.

Once the setup is complete, a comprehensive dashboard can help visualize the data collected by the custom monitor. In the screenshot below, the top-right panel displays the three vital metrics, showing the average value for each. The other panels plot each metric over time so we can quickly see any spikes or dips. In addition to the three metrics discussed above, we also collect the total bytes sent and received and the total test time.

Fig 5. Catchpoint Dashboard for Microsoft Teams Monitoring


There have been amazing advances in VoIP that have improved audio quality over the internet. Even with all these advances, packet loss remains a major concern for performance. High packet loss decreases the quality of the audio session and interferes with communication.

Jitter is another culprit: the packets are delivered, but not at the expected time or in the expected order. This can be very annoying to end users, as it distorts the audio and makes it difficult to understand.

Round trip time (RTT) is important when measuring real-time communications. For high audio quality, the entire audio session needs to have a low and stable RTT.

The custom monitor for Microsoft Teams is a great way to evaluate end-user experience and measure the quality of audio sessions. It helps to understand and analyze performance degradation and trigger alerts when there is a change in end-user experience.