Observability: from a rock to a hard place: observing the micro universe

More and more companies are migrating their applications to the cloud, either to a private cloud or to a public cloud provider. Customer-facing services that used to be built as monolithic applications are being redesigned, or are already built and deployed, using a microservices architecture.

Monolith Rock: one big stone, which can be used as a reference point.

Microservices Rock – a lot of small stones, used as a reference point.

With the rise of microservices, new features can be deployed faster. As the adoption of microservices scaled, however, new complexity was introduced, and it became more difficult to track all the dependencies between the microservices themselves. The most significant impact is on the deployment of new features into customer-facing services. With microservices, bi-weekly or even daily releases are becoming the new standard, leaving little or no time for rigid testing procedures.

With so little time for testing, an error is easily made, and by now we all know that our customers no longer accept performance degradation when new code is deployed.

So, with the rise of microservices and the increased release velocity, understanding the dependencies between running services, and knowing what the impact is on user experience, is becoming crucial. How do we fix it?

The microservices “chaos”

A plant used to produce hydrogen is very complex, with many pipes and valves and so on. We can compare this to IT services utilizing a microservices architecture.

For example, this architecture has many redundant components so that it can adapt to failures, just as a hydrogen plant has many redundancies and safeguards built in. If you have ever seen the design of such a plant, you know it is very complex. A hydrogen plant, however, does not release new features bi-weekly, as is done with IT architectures that use microservices. This means that with microservices we have a similarly complex system of services, but one that changes fast and on a regular basis.

How do we control the quality of our services whilst deploying bi-weekly or faster?

Well, there are several options. The first would be to simply deploy new code and hope for the best. Most likely your users will start complaining; I have never seen new code deployed without any effect on the quality of the service. There are still companies that work like this, but it ultimately costs more time because the code has to be fixed and re-deployed. A lot of people will be annoyed, including the developers who need to fix an issue that was easily preventable before deployment. This option is neither fast nor consistent in quality.

The second would be to deploy the code in a Quality Assurance environment, ask a test team to perform the tests required to validate the quality of the service, and wait until the team is finished. Most likely this will take some time, and the release of new features is delayed until the test team is done, especially when the tests are not automated. This option improves quality, but lacks speed.

The preferred option would be to include observability in the pipeline when new code is deployed. At companies that have adopted a CI/CD approach to software development, most pipelines already include tests such as unit tests or integration tests. These kinds of tests are necessary, but they don't test the complete end-to-end experience for a customer. Adding observability provides both speed and quality.

To implement observability, we need to add tests that observe the performance of a (web) application, for example by adding performance tests and load or stress tests to the pipeline. If we measure performance, we know what the expected behavior of our application is, and if we add load and stress tests, we know what to expect when our application is under heavy load.

How can we use load testing to observe performance in our CI/CD pipeline?

Well, we would need some tools for this, such as:

GitLab CI/CD.
JMeter.
Observability tooling.

We will use GitLab CI/CD to run tests whenever a new deployment is done. The actual tests are executed by JMeter, and an observability tool holds our baselines and historic performance data.

In GitLab we need to define the test we would like to run, for example as below:

test:
  stage: test
  script:
    - echo 'Start JMeter Test'
    # -n: run without the GUI, -t: test plan to run, -l: file to write results to
    - /path/to/your/jmeter/bin/jmeter -n -t observability.jmx -l output.jtl

In JMeter, we define our load and the URLs we will test. A test can be defined to mimic our average users or, if we want to know how our application behaves under heavy load, as a stress test.
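
The results file that JMeter writes through the -l option can be turned into a few headline numbers inside the pipeline. The sketch below is one possible way to do that, not part of the setup described here. It assumes the results are written as CSV with a header row that contains the `elapsed` (milliseconds) and `success` columns; the helper names and the sample rows are made up for illustration.

```python
import csv
import io
import math


def percentile(values, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_jtl(csv_file):
    """Summarize a CSV results file that has 'elapsed' and 'success' columns."""
    elapsed = []
    failures = 0
    for row in csv.DictReader(csv_file):
        elapsed.append(int(row["elapsed"]))  # response time in milliseconds
        if row["success"].strip().lower() != "true":
            failures += 1
    if not elapsed:
        raise ValueError("results file contains no samples")
    return {
        "samples": len(elapsed),
        "error_rate": failures / len(elapsed),
        "p95_ms": percentile(elapsed, 95),
    }


# Hypothetical sample rows standing in for the contents of output.jtl.
SAMPLE = """timeStamp,elapsed,label,responseCode,success
1700000000000,120,order,200,true
1700000001000,180,order,200,true
1700000002000,950,order,500,false
1700000003000,140,order,200,true
"""

summary = summarize_jtl(io.StringIO(SAMPLE))
print(summary)
```

In a real job, the sample string would be replaced by opening output.jtl, for example with `open("output.jtl", newline="")`.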

With observability, we can measure the performance of the application and the underlying infrastructure continuously and in real time. We can distinguish the JMeter test traffic from normal user traffic based on an attribute that we define up front. There are several ways to define such an attribute. For example, we could use the attributes below:

Request data such as the HTTP header or a Cookie and so on.
Metadata that is added to the new deployment.
Payload that is present in the request.
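
As an illustration of the first option, the sketch below marks test requests with an HTTP header and shows how a receiving side could recognize them. The header name `X-Synthetic-Test`, its value, and the URL are hypothetical; in JMeter itself such a header would typically be configured with an HTTP Header Manager rather than in Python.

```python
import urllib.request

# Hypothetical marker; choose a header that your observability tool can filter on.
TEST_HEADER = "X-Synthetic-Test"
TEST_VALUE = "jmeter-pipeline"


def build_test_request(url):
    """Create a request object that carries the marker header (nothing is sent)."""
    return urllib.request.Request(url, headers={TEST_HEADER: TEST_VALUE})


def is_test_traffic(headers):
    """Return True when the given request headers carry the pipeline marker."""
    # Header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(TEST_HEADER.lower()) == TEST_VALUE


request = build_test_request("https://shop.example.com/orders")
print(is_test_traffic(dict(request.header_items())))
print(is_test_traffic({"User-Agent": "Mozilla/5.0"}))
```

A header keeps the marker out of the request body, so the application logic under test does not need to change.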

By using such an attribute, we can measure the impact that new code has on the application. For example, we could measure the performance impact. Or, if we executed a stress or load test, we could measure the impact on the components that the application uses. Below is an example of a performance improvement in an application component whose function is to process orders made by customers.

On the left, we see that the performance of ordering was often slow. On the right – which is our new deployment – we see that we have had a significant improvement in the response time.

From a rock to a safe place: Observability for microservices

If we combine CI/CD, testing, and observability tooling in our pipeline, we can automatically see the impact when new code is deployed. With historic performance data, we can compare the new release with the currently running code within hours rather than weeks and validate whether we see the improvements we expect. And if we notice significant performance degradation, we can roll back our deployment. If we need to roll back, we can use our observability tooling to quickly pinpoint which part of the newly deployed code is responsible for the degradation, and fix it.
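
The comparison against a baseline can be reduced to a simple gate. The sketch below is a minimal illustration under stated assumptions: the 10% tolerance, the function name, and the two response-time values are hypothetical, and in practice the baseline would come from the historic data in the observability tool.

```python
def check_regression(baseline_p95_ms, candidate_p95_ms, tolerance=0.10):
    """Return (ok, change), where change is the relative difference in p95."""
    if baseline_p95_ms <= 0:
        raise ValueError("baseline must be positive")
    change = (candidate_p95_ms - baseline_p95_ms) / baseline_p95_ms
    # The release passes when it is at most `tolerance` slower than the baseline.
    return change <= tolerance, change


# Hypothetical numbers: the running release versus the new deployment.
ok, change = check_regression(baseline_p95_ms=400, candidate_p95_ms=520)
print(f"p95 changed by {change:+.0%}; {'keep' if ok else 'roll back'}")
```

If a script like this ends with a non-zero exit status when the check fails, the GitLab job fails, and that failure can serve as the signal to roll back.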

Of course, this approach is not limited to performance. We can also use it to validate that the functional logic of our application still works, and so on. By using observability tooling and automated testing in our development process, we create a direct feedback loop for developers. When new code is deployed, the result and its quality are directly visible. Time that another team would otherwise spend on troubleshooting or testing code is saved, and can be used to work on new improvements and more features.

So, are you using observability in your pipeline?
