Monitoring of product quality

Monitoring is very important, especially in DevOps. We monitor not only the system with all its characteristics, but also team behavior, to support improvements.

Areas of monitoring are:

  • Team telemetry
    Understand the maturity of the team and how the team can improve. Many different aspects (KPIs) can be measured and reported per team.
  • Live site telemetry
    Measure how the system runs and how the platform behaves, and manage the logs. The different platforms offer capabilities to capture these metrics. Some platforms also offer the needed reporting capabilities and notification mechanisms. When using a multi-cloud or hybrid cloud implementation, it is recommended to assess whether the tools in use allow the data to be aggregated and reported as a single view.
  • Cognitive monitoring
    Automate and improve IT operations by applying machine learning to the log data. A next step is optimizing the operation of systems and their ability to scale. Adopting artificial intelligence for IT operations can cover two aspects. The first is analyzing the telemetry data to understand the normal behavior of the system and to be notified of anomalies. The second is the proactive interaction between the AI model and the system, based on predictions of its behavior.
  • Security monitoring
    Security monitoring involves collecting and analyzing information to detect suspicious behavior or unauthorized system changes on your network, defining which types of behavior should trigger alerts, and acting on alerts as needed. In DevOps, commercial state-of-the-art tools are often used to measure, report and notify.
  • User telemetry
    User sentiment and behavior are the most informative indicators of success in DevOps. Measuring user interaction is very important, yet it is often forgotten.
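
The first aspect of cognitive monitoring listed above, learning the normal behavior of a system from telemetry and being notified of anomalies, can be illustrated with a minimal sketch. It is not machine learning: it uses a plain statistical baseline (mean and standard deviation) as a stand-in, which a real implementation would likely replace with a trained model. The function name `find_anomalies`, the threshold of 3 standard deviations and the sample response times are assumptions made for this illustration only.

```python
from statistics import mean, stdev


def find_anomalies(baseline, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that deviate from the
    baseline mean by more than `threshold` standard deviations.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as an anomaly.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(recent)
        if abs(v - mu) / sigma > threshold
    ]


# Invented response times in milliseconds, used only as sample data.
baseline_ms = [120, 118, 125, 122, 119, 121, 123, 120]
recent_ms = [121, 124, 310, 119]

for index, value in find_anomalies(baseline_ms, recent_ms):
    # In practice this would feed a notification mechanism, not print.
    print(f"Anomaly at sample {index}: {value} ms")
```

With the sample data, only the 310 ms measurement is reported, because it lies far outside the spread of the baseline. The same idea applies to any numeric metric a platform captures, such as error rates or queue lengths.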

A recently introduced term related to monitoring is observability [Bangser 2019]. Observability is all about asking new questions, while monitoring is about cementing existing understanding.

Observability is, in essence, testability for post-release testing: it gives us the tools to be more creative and productive in our exploration. And why should we care about post-release testing? First of all, we release to many environments, so it can mean better support in development and staging environments. But, of course, it also strengthens our ability to understand production. It has been said that we all test in production, but only some of us listen to the outputs. This really brings home that real users under real load always uncover interesting new behaviors, and that we all should be listening for them.