IT systems are indispensable to society. If you are not delivering your services digitally, you are missing out. Banks, government agencies, and educational institutions are perfect examples: their business processes are increasingly digitized.
Traditionally, the supporting IT organization consisted of silos, each with its own responsibility: a database team, a web team, and a network team. Each team is responsible for one part of the IT infrastructure on which a business process is built. This creates a problem of shared responsibility: each team focuses on its own piece of the infrastructure, while the responsibility for the business process lies with the whole. This way of organizing IT is still common, especially in larger organizations.
Avoid silo monitoring
So, what if there is a problem? Take, for example, a drop in the responsiveness of a web application used to deliver digital services to customers. Performance degradation of digital services can cost you money. Customers may drop out and go to another provider that offers the same services, or staff may complain about a bad experience with internal IT systems and you lose productivity. And the longer it takes to find the cause of the problem, the more it impacts the company itself.
While investigating a bad experience with an application, the different teams may not be able to find the cause, because each team looks only at its own piece of the IT infrastructure. Monitoring tools often look at one specific part of the overall IT system, for example the server on which the application is installed. They check whether the server is still up and whether the applications are still running. But a business process does not consist of the server alone; it depends on a complex set of IT components. Monitoring a single component in isolation, such as a server with an application installed on it, is called silo monitoring. Chances are that there are five or so teams, each with its own piece of responsibility in the complex chain of IT components. Viewed independently, no silo may show any sign of a problem, because the problem lies in the connectivity between them. As a result, issues are often noticed too late and corrective action falls short.
An example from my own experience…
…I once worked at an organization where performance issues were reported with a Business Intelligence (BI) tool. The complaint was that reports took so long to run that the tool was unworkable. Two IT teams worked on the problem: one responsible for the databases (the DBA team) and one responsible for the servers hosting the BI tool (the Systems team).
They worked on it for months, with a lot of back-and-forth discussion. DBA and Systems both gave the green light. In the end, it turned out that a database query was not written correctly, so retrieving specific information from the databases took a long time. Because neither team looked at the whole picture, and both relied only on metrics showing that their own IT components functioned within the margins, the problem persisted for months. As already mentioned, this way of looking only at your own piece of responsibility is called silo monitoring.
Observability: tracking user behavior
Imagine leaving an external customer with poor digital performance for months. Nowadays there is so much choice in online services that if your service underperforms, you lose customers and revenue. Yet there is a way to get an early signal that the quality of your digital service is declining or at risk of being disrupted.
The solution is to use observability tooling. What is observability, you say? In essence, it uses all available data to spot performance degradation quickly and effectively.
Standard IT component monitoring often uses metrics to measure whether a system works. However, as we can see in the picture above, various components sit between the user and the app. A server’s CPU utilization, for example, covers only one part of the full stack of components between the user and the app. If you only monitor components separately, you have no insight into a customer’s actual experience.
To get the complete picture, we need to consolidate all component monitoring and deploy an observability tool that all teams can use. Every team then works from a single pane of glass with the same information at its disposal. Such a tool also fits much better with DevOps adoption: a DevOps team wants to deploy new code continuously, so it is essential to detect abnormal behavior as soon as possible.
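To make the gap between component monitoring and user experience concrete, here is a minimal Python sketch. All names, values, and thresholds (`ComponentCheck`, the 80% CPU limit, the 3-second target) are hypothetical and chosen only for illustration; they do not come from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class ComponentCheck:
    name: str
    value: float   # measured value, e.g. CPU utilization in percent
    limit: float   # hypothetical alert threshold for this component

    def healthy(self) -> bool:
        return self.value <= self.limit


def silo_view(checks):
    """Each team's view: is my own component within its margins?"""
    return {check.name: check.healthy() for check in checks}


def user_view(step_seconds, target_seconds):
    """End-to-end view: does the full request meet the user-facing target?"""
    total = sum(step_seconds.values())
    return total, total <= target_seconds


# Hypothetical measurements: every component looks fine on its own.
checks = [
    ComponentCheck("web server CPU %", 35.0, 80.0),
    ComponentCheck("database CPU %", 55.0, 80.0),
    ComponentCheck("network packet loss %", 0.1, 1.0),
]
# Hypothetical time spent per step of one user request, in seconds.
steps = {"network": 0.2, "web server": 0.3, "database query": 9.5}

component_status = silo_view(checks)
total_seconds, meets_target = user_view(steps, target_seconds=3.0)

print(component_status)
print(f"end-to-end: {total_seconds:.1f}s, within target: {meets_target}")
```

In this made-up scenario every component check passes, yet the request as the user experiences it misses its target by a wide margin.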
But what do we measure with an observability tool? This comes down to three pillars: metrics, traces, and logs.
Combining all this information makes it possible to get an insight into the actual user experience and, if necessary, take action. Think of a combination of the traditional metrics from IT component monitoring complemented with app stack traces, logs and continuous measurement from the end-user perspective to get a complete overview of the actual user experience.
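As a rough illustration of what combining the three pillars can look like, the Python sketch below groups metrics, trace spans, and log lines by a shared trace ID and then picks out the slowest component of one request. The record layout, the `trace_id` field, and the sample data are assumptions made for this example; real observability tools have their own data models.

```python
from collections import defaultdict

# Hypothetical telemetry for one slow request, all tagged with the same trace ID.
metrics = [{"trace_id": "t1", "name": "page_load_seconds", "value": 10.0}]
spans = [
    {"trace_id": "t1", "component": "web server", "seconds": 0.3},
    {"trace_id": "t1", "component": "database query", "seconds": 9.5},
    {"trace_id": "t1", "component": "network", "seconds": 0.2},
]
logs = [
    {"trace_id": "t1", "component": "database query",
     "message": "full table scan on report table"},
    {"trace_id": "t1", "component": "web server",
     "message": "request completed"},
]


def correlate(metrics, spans, logs):
    """Group all three kinds of telemetry by the request they belong to."""
    by_trace = defaultdict(lambda: {"metrics": [], "spans": [], "logs": []})
    for kind, records in (("metrics", metrics), ("spans", spans), ("logs", logs)):
        for record in records:
            by_trace[record["trace_id"]][kind].append(record)
    return dict(by_trace)


def slowest_component(request):
    """Return the slowest span's component and the log messages it produced."""
    slowest = max(request["spans"], key=lambda span: span["seconds"])
    related = [
        entry["message"]
        for entry in request["logs"]
        if entry["component"] == slowest["component"]
    ]
    return slowest["component"], related


requests = correlate(metrics, spans, logs)
component, messages = slowest_component(requests["t1"])
print(component, messages)
```

The metric says the page is slow, the trace says where the time went, and the log hints at why. None of the three answers the question alone.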
Data overload, break the silos
We collect a lot of data! And every year we generate even more data. How do we keep an overview in the haystack of data? The answer lies in an observability tool that analyzes all available data, creates a baseline and notifies you of relevant issues or deviations from the baseline. Most tools do this through machine learning algorithms.
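The baseline idea can be sketched in a few lines of Python. Note that this is a deliberately simple statistical stand-in (a rolling mean and standard deviation), not the machine learning that commercial tools use; the function name `find_deviations`, the window size, the threshold, and the sample response times are all assumptions for illustration.

```python
from statistics import mean, stdev


def find_deviations(samples, window=5, threshold=3.0):
    """Flag samples that lie more than `threshold` standard deviations
    away from the baseline formed by the previous `window` samples."""
    deviations = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines (sigma == 0) to avoid flagging on no variance.
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            deviations.append((i, samples[i]))
    return deviations


# Hypothetical response times in seconds; one request is clearly abnormal.
response_times = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 4.8, 1.1, 1.0]
alerts = find_deviations(response_times)
print(alerts)
```

One limitation of this naive version: once an outlier enters the window, it inflates the baseline for the next few samples. A production tool would typically handle that more robustly, for instance by taking seasonality and earlier anomalies into account.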
Once you have your baseline set, you can start breaking internal silos:
- Shift left to right: As a DevOps team, you want to deploy new code faster and more often. An observability tool allows you to detect issues quickly and resolve them quickly, thus allowing you to make mistakes with confidence.
- Do more with fewer people. In these tight labour markets, it is hard to find personnel, and there is really no time for extensive research into every disruption, especially in the era of microservices and ever more complex IT landscapes.
- Resolve before your users complain… By using observability tooling, you will better understand the state of your digital services, and you will be able to take corrective action before a problem becomes noticeable to a customer.
Using observability will help you bring order to chaos and focus on what matters: running your IT operation to deliver the best user experience possible.
Get Fast, Stay Fast!
Implementing such tooling and interpreting its output can be complex. The mission and vision of MeasureWorks is: Get Fast, Stay Fast. So if your organization faces a challenge with the continuity or performance of its applications, do not hesitate to contact us!