A sudden performance crisis just before the holidays: a nightmare scenario for any digital platform. The week before Christmas 2024, MeasureWorks received an urgent call from a client. A recent deployment had caused severe delays in their API, resulting in about 160,000 interactions per day getting stuck. Users had to wait up to two minutes for a page that would normally load immediately, or received no response at all. The customer had been searching for a solution for weeks, without success. It was time for MeasureWorks to step in.
Delay without insight
This platform runs on a complex architecture in which APIs retrieve data and display it on external websites through an iFrame framework. When this process slows down, it affects not only the client's own site but also dozens of other platforms.
Because the client was not using an observability platform, nobody had been able to perform a thorough analysis. The challenge? The deployment was causing unpredictable delays with no apparent cause. This meant we had to dive deep into the infrastructure to find the problem.
Quick detection with Kubernetes Monitoring
Fortunately, the application was running within a Kubernetes cluster, a flexible, scalable environment for containerized applications. Using Dynatrace’s latest Kubernetes monitoring technology, we fully mapped the cluster in just a few clicks.
This gave us:
✔ Automatic monitoring of the Kubernetes control plane, deployments and pods.
✔ Full Stack Observability at the infrastructure and application level.
✔ Distributed Tracing to track the exact transaction flow.
Within hours, we had a detailed picture of the problem area.
From problem to solution in 48 hours
The client submitted their request on Tuesday evening. On Wednesday at 8:30 a.m., the kick-off took place with their administrators and engineers. The application landscape consisted of a Kubernetes cluster with front-end and back-end workloads and various technologies and databases, including:
- Cloudflare
- HAProxy
- Cassandra databases
- MySQL databases
Our objective at MeasureWorks was to align the business stakeholders and engineers and fully onboard Dynatrace on the same day. Thanks to the new Cloud Native Full Stack injection deployment strategy, this succeeded without a hitch. Under MeasureWorks’ guidance, the engineers met the requirements we had set.
Monitoring of the Kubernetes cluster was fully up and running after three hours, and all technologies and databases were easily integrated via extensions, so we finished setup by the end of the day.
On Thursday, we began troubleshooting. Thanks to Dynatrace’s Distributed Tracing technology and the expertise of the MeasureWorks staff, the cause was found within hours: an over-engineered MySQL database architecture! Once MeasureWorks had analyzed the distributed traces, it became clear that some SQL commits were taking an extremely long time. As a solution, a new MySQL database architecture was rolled out on Thursday night.
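To illustrate the kind of reasoning behind that finding, here is a minimal sketch of ranking operations by "self time" in trace data. It is not Dynatrace's API or the analysis MeasureWorks actually ran: the span records, service names, and durations below are invented for illustration, and the function name `self_time_by_operation` is hypothetical. It also assumes that child spans run one after another, without overlapping.

```python
from collections import defaultdict

# Hypothetical span records, invented for this example:
# (span_id, parent_id, service, operation, duration_ms)
SPANS = [
    ("a1", None, "haproxy", "GET /widget", 95_400.0),
    ("a2", "a1", "api", "load_widget", 95_200.0),
    ("a3", "a2", "cassandra", "SELECT", 8.0),
    ("a4", "a2", "mysql", "COMMIT", 94_800.0),
    ("b1", None, "haproxy", "GET /widget", 120.0),
    ("b2", "b1", "api", "load_widget", 90.0),
    ("b3", "b2", "mysql", "COMMIT", 12.0),
]


def self_time_by_operation(spans):
    """Rank (service, operation) pairs by total self time, highest first.

    Self time is a span's duration minus the durations of its direct
    children, so time spent waiting on a slow downstream call is
    attributed to that call rather than to every caller above it.
    Assumes children of a span do not overlap in time.
    """
    # Sum the durations of each span's direct children.
    child_total = defaultdict(float)
    for _span_id, parent_id, _service, _operation, duration_ms in spans:
        if parent_id is not None:
            child_total[parent_id] += duration_ms

    # Accumulate self time per (service, operation).
    totals = defaultdict(float)
    for span_id, _parent_id, service, operation, duration_ms in spans:
        self_ms = max(duration_ms - child_total[span_id], 0.0)
        totals[(service, operation)] += self_ms

    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (service, operation), self_ms in self_time_by_operation(SPANS):
        print(f"{service:10s} {operation:15s} {self_ms:10.1f} ms")
```

With this made-up data, the MySQL commit ranks first by a wide margin, while the proxy and API layers above it contribute little time of their own. That is the pattern that points to the database rather than the application code.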
On Friday morning, MeasureWorks performed a comprehensive analysis and concluded that the problem was completely resolved.
The Power of Observability
While the client’s team had spent two weeks searching for a solution without success, MeasureWorks was able to identify and solve the problem in two days. Thanks to Dynatrace’s observability and our expertise, the client’s team was able to go into the holidays without worry.
At MeasureWorks, we believe that monitoring goes beyond reactive resolution: prevention is better than cure. By deploying observability strategically, we help companies make their digital environment more robust and predictable. Want more control over your application performance? Get in touch with us!