Metrics are a great way to find useful information about our application and infrastructure. Before we start using metrics and preparing reports for the business, we need to integrate our app with some tooling. That’s why today our focus is on how to measure, which means digging into how our application code works in the first place. We’re going to take a look at the Prometheus monitoring system and what Prometheus monitoring is all about.
In the previous article, we described how to uncover various problems with an app by finding bottlenecks in the project, why metrics are a great way to do this, and why we chose Prometheus server monitoring to accomplish our objectives. Today, we’re going to actually start measuring the app.
Try using Prometheus metrics yourself!
Most of us prefer to check things for ourselves rather than implement them without knowing whether the whole thing works as expected. Luckily, there is a live demo that contains the default Prometheus dashboard and is integrated with Grafana.
What is Prometheus? All you need to know about integration with Prometheus
One of the best things about the Prometheus monitoring tool is its support for many languages. In most cases, we don’t need to worry about how to write the integration or handle all the best practices ourselves. Here you can find a list of client libraries for working with the Prometheus software.
Currently, there are around 20 libraries for different technologies! If your technology is not on the list, you can follow a special guide that explains how to write your own integration.
The application we needed to optimize and instrument with metrics was written in PHP/Symfony, so we’re going to present examples using these technologies. Still, it would be easy enough to transfer them to something else.
Let’s get back to the application.
We wanted to save time and deliver a solution as soon as possible, so we used an existing bundle instead of writing a new one. Our choice was the TweedeGolf bundle. It’s a nice, small library which has everything that we needed.
Define the Prometheus data collection method
How does Prometheus work? Let’s take a closer look at Prometheus monitoring and the Prometheus architecture.
Prometheus offers two different ways to collect data. Make sure you choose the one most suitable for your needs. More on that below:
- Push gateway – the application pushes data to a special gateway (collector). It’s useful when we want to capture information from short-lived jobs.
- Scraping – a Prometheus instance asks the application for the data it has collected.
In our case, we needed to focus on the general flow and behavior of the application as a whole, so the second method was the best fit for us. We’re now going to talk about it in detail.
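To illustrate the scrape-based approach, here is a minimal sketch of a prometheus.yml fragment that tells a Prometheus instance to periodically call our application’s metrics endpoint. The job name, target and path below are placeholder assumptions, not values from our project; the full server configuration is the topic of the next part of the series.

```yaml
# prometheus.yml – minimal sketch, values are placeholders
scrape_configs:
  - job_name: 'example_app'
    metrics_path: /metrics            # the endpoint our application will expose
    static_configs:
      - targets: ['example-app.internal:80']
```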
Prometheus implementation
To achieve our main goal – measuring how our code works – we need to start the measurement just before the controller is called and stop it just after its code has been executed.
Luckily, the Symfony framework is built around events, so we’re able to create a subscriber that handles two of them.
Here is the basic concept for our code:
- validate data that we get from request/event (e.g. sometimes Symfony doesn’t return the correct action),
- check if we want to collect data for the specific endpoint (we don’t want to measure some actions because it doesn’t add any value),
- start collecting data,
- do the controller action,
- prepare the collected data,
- save the collected data to the storage.
Let’s take a look at a code example that shows how to start measuring the execution time for endpoints:
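The original code listing did not survive into this version of the article, so here is a minimal sketch of what such a subscriber could look like. The `$histogram` collaborator and its `observe()` method are assumptions standing in for whatever collector object your Prometheus client bundle provides; the event wiring itself uses standard Symfony kernel events.

```php
<?php
// A minimal sketch of the measuring subscriber. The $histogram collaborator
// and its observe() method are assumptions – use whatever collector your
// Prometheus client bundle exposes.

namespace App\EventSubscriber;

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\ControllerEvent;
use Symfony\Component\HttpKernel\Event\TerminateEvent;
use Symfony\Component\HttpKernel\KernelEvents;

class RequestMetricsSubscriber implements EventSubscriberInterface
{
    /** Endpoints we don't want to measure (the metrics endpoint itself, health checks, ...). */
    private const IGNORED_ROUTES = ['app_metrics', 'app_healthcheck'];

    private $histogram;
    private $startTime;
    private $route;

    public function __construct($histogram)
    {
        // Assumed to be a histogram collector injected from the service container.
        $this->histogram = $histogram;
    }

    public static function getSubscribedEvents(): array
    {
        return [
            KernelEvents::CONTROLLER => 'onController', // fired just before the controller
            KernelEvents::TERMINATE  => 'onTerminate',  // fired after the response was sent
        ];
    }

    public function onController(ControllerEvent $event): void
    {
        $route = $event->getRequest()->attributes->get('_route');

        // Validate the data from the request/event and skip ignored endpoints.
        if (!\is_string($route) || \in_array($route, self::IGNORED_ROUTES, true)) {
            return;
        }

        // Start collecting data.
        $this->route = $route;
        $this->startTime = microtime(true);
    }

    public function onTerminate(TerminateEvent $event): void
    {
        if (null === $this->startTime || null === $this->route) {
            return;
        }

        // Prepare the collected data and save it to the storage,
        // labelled with the route that was called.
        $this->histogram->observe(microtime(true) - $this->startTime, ['router' => $this->route]);

        $this->startTime = null;
        $this->route = null;
    }
}
```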
Now we need to add the configuration to know how we want to save everything:
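The original configuration listing is also missing here, so below is only a rough, assumed sketch of what a bundle configuration choosing Redis as the storage backend tends to look like. Every key name is an assumption – check the README of the bundle you actually use.

```yaml
# config/packages/prometheus.yaml – illustrative only, key names are assumptions
prometheus_client:
    namespace: example_app            # value used for the "application" label
    storage:
        type: redis                   # keep collected samples in Redis between requests
        redis:
            host: '%env(REDIS_HOST)%'
            port: 6379
```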
And some additional config for subscriber:
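Likewise, here is an assumed sketch of how the subscriber could be wired up in config/services.yaml. The service ids are placeholders; with Symfony autoconfiguration the event_subscriber tag is added for you automatically.

```yaml
# config/services.yaml – sketch, service ids are placeholders
services:
    App\EventSubscriber\RequestMetricsSubscriber:
        arguments:
            $histogram: '@app.metrics.request_execution_time'   # histogram collector service
        tags:
            - { name: kernel.event_subscriber }
```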
Share the collected data with the Prometheus instance
The storage on the application side is only a temporary buffer. It’s not meant to hold metrics data long-term, because we want to be able to query the data many times – and that is what the Prometheus instance is for.
Storage such as Redis may evict old data, which will cause problems – incorrect results after some time. If your storage doesn’t remove old data, on the other hand, metrics will eventually kill the application. We had this problem: it took only a month for our Redis storage to fill up, because the configuration didn’t allow overwriting data. Also, too much data brings its own performance problems.
To fix all these problems, we have to remove the metrics from the storage after every call to the metrics controller!
The Prometheus instance will ask our application for the data collected in the storage. To make that possible, we need to add an endpoint. Remember to configure your server so that only the Prometheus instance is allowed to ask for data – otherwise, our metrics will not be reliable.
Basic concept (a sketch follows the list):
- get data from storage,
- remove collected metrics from storage,
- return collected data.
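Here is a minimal sketch of such an endpoint. The `$registry` collaborator and its `renderTextFormat()`/`wipeStorage()` methods are assumptions standing in for whatever your client library exposes; the content type, however, is the standard Prometheus text exposition format.

```php
<?php
// A minimal sketch of the metrics endpoint. The $registry service and its
// methods are assumptions – use whatever your Prometheus client bundle exposes.

namespace App\Controller;

use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Annotation\Route;

class MetricsController
{
    private $registry;

    public function __construct($registry)
    {
        // Assumed to be the collector registry / storage wrapper from the bundle.
        $this->registry = $registry;
    }

    /**
     * @Route("/metrics", name="app_metrics", methods={"GET"})
     */
    public function metrics(): Response
    {
        // 1. Get the collected samples from storage, rendered in the
        //    Prometheus text exposition format.
        $body = $this->registry->renderTextFormat();

        // 2. Remove the collected metrics from storage so the next scrape
        //    starts from a clean state.
        $this->registry->wipeStorage();

        // 3. Return the collected data with the content type Prometheus expects.
        return new Response($body, 200, ['Content-Type' => 'text/plain; version=0.0.4']);
    }
}
```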
Time to explain the data returned by our controller. The first time that we saw it, we were a little confused about its meaning. But as strange as it might look, it’s actually very easy to read. We think of it as a small curiosity. In the normal flow, we won’t have to worry about it. Grafana will read this data and prepare graphs.
Currently, our controller returns data as follows:
# HELP app_request_execution_time_seconds Request duration
# TYPE app_request_execution_time_seconds histogram
app_request_execution_time_seconds_count{router="api_get_second_example",env="PROMETHEUS_METRICS",application="example_app"} 1
app_request_execution_time_seconds_sum{router="api_get_second_example",env="PROMETHEUS_METRICS",application="example_app"} 0.07
app_request_execution_time_seconds_bucket{le="+Inf",router="api_get_second_example",env="PROMETHEUS_METRICS",application="example_app"} 1
Here is the legend:
# HELP _comment_ – our description of what the specific metric counts
# TYPE _name_ _type_ – information about the type of the specific collector
_name_{_label-name_=_label-value_} _value_ – information about the collected data
We can read the result like this: we got a histogram that collects the request duration. The histogram is named app_request_execution_time_seconds. It has additional labels to filter the data, such as the routing called api_get_second_example. The environment is called PROMETHEUS_METRICS and the application’s name is example_app. Currently, there is only 1 request logged and it took 0.07 seconds.
OK, our application collects data and we’re able to share it with the Prometheus instance. Time to tackle the last challenge – we don’t have infinite disk space for metrics. To solve this problem, we need to decide how precise we want the data to be.
We had many arguments about it before we found the golden mean. We can’t collect data too often, because it will be hard to read. We can’t collect it too rarely, because we won’t see the current values.
We wanted to show the metrics using Grafana. Grafana has an interesting feature called step time. It defines how often the collected data is sampled for presentation in a graph. For a one-day range it takes data every minute, for two days every 2 minutes, for 30 days once per day. That’s good, because it prevents a situation where we grab so much data that the CPU/RAM can’t handle it. For us, it’s problematic because the data is not summarized – it’s just a single sample from the specified time.
To better understand this, consider these examples:
Example 1.
When we display data from the last 30 days and the application collects data every 15 seconds, the graph will show 30 results (not 4 * 60 * 24 * 30 = 172 800 results), each taken at a specific hour; e.g.
| Date | Quantity |
| --- | --- |
| 21.02.2020 12:12:12 | 23 |
| 22.02.2020 12:12:12 | 25 |
| 23.02.2020 12:12:12 | 28 |
| … | … |
Example 2.
We want to see the last 5 minutes with a step time of one minute, while the application collects data every 15 seconds.
Collected data:
| Date | Quantity |
| --- | --- |
| 24.02.2020 12:12:00 | 12 |
| 24.02.2020 12:12:15 | 1 |
| 24.02.2020 12:12:30 | 4 |
| 24.02.2020 12:12:45 | 16 |
| 24.02.2020 12:13:00 | 8 |
| 24.02.2020 12:13:15 | 1 |
| 24.02.2020 12:13:30 | 3 |
| 24.02.2020 12:13:45 | 11 |
| 24.02.2020 12:14:00 | 5 |
| … | … |
We will see 5 results, each showing the value from a single sampled point:
| Date | Quantity |
| --- | --- |
| 24.02.2020 12:12:00 | 12 |
| 24.02.2020 12:13:00 | 8 |
| 24.02.2020 12:14:00 | … |
| 24.02.2020 12:15:00 | … |
| 24.02.2020 12:16:00 | … |
and not a summary like this:
| Date | Quantity | Explanation |
| --- | --- | --- |
| 12:12:00 | 33 | because in 60 seconds we collected 12 + 1 + 4 + 16 |
| 12:13:00 | 23 | because in 60 seconds we collected 8 + 1 + 3 + 11 |
| … | … | |
Any data collected between these sampled points will not show up in the graph.
Don’t use these metrics to check whether a specific endpoint or place in the code is used at all.
For that, add a normal log entry (and check after at least a month whether it was logged) and ask the business whether the functionality is still supported. Also remember that some jobs may be executed only once per year, etc.
Usually, you need to check data for the last 7 days. We decided to collect data every 2 minutes. To clarify how it worked:
…
request 17:11:11 counter 1
request 17:11:12 counter 2
request 17:11:12 counter 3
…
request 17:12:13 counter 58
…
If we ask the controller for data at that point, we will get a counter value of 58. So we have only one data row which represents 58 requests over two minutes. That makes it much easier to prepare a report for the business on how their application works.
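On the Prometheus side, this collection frequency is controlled by the scrape interval (the actual server configuration is covered in the next part of the series). A minimal, assumed fragment for a 2-minute interval could look like this:

```yaml
# prometheus.yml – sketch of the global scrape interval
global:
  scrape_interval: 2m   # Prometheus asks our /metrics endpoint every 2 minutes
```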
That’s it! We have everything that is needed in the application’s code to get data using Prometheus.
Summary & Example Prometheus monitoring integration
As we promised in the first part of the series, we prepared a simple application to show you how it should all be implemented.
It contains all the required validations, endpoint blacklists, more metrics, and the other important pieces needed to set up metrics for a PHP application with the Symfony implementation.
This repository contains two ways to make the setup. The first is based on a Docker configuration file. If you work as a developer, it should be very easy to start and check how to integrate your application. The other is set up with Kubernetes and kind (the Kubernetes–Prometheus combination) – we talk about it in the next part. On the whole, the third installment of the series focuses on the DevOps side of Prometheus. We will explain how to configure the Prometheus server and the Prometheus database which will collect all our data in one place. You can also go right to the fourth part, which is about creating custom dashboards in Grafana for both devs and business folks.
So, what do you think about monitoring with the Prometheus tool and this kind of monitoring solution in general?