11 February 2021
Metrics in optimization process: Grafana custom dashboard tutorial (4/4)

You may already know how important it is to measure the performance of your app, or even how to integrate it with tools such as Prometheus. But if you can’t visualize your data in an organized, easy-to-read manner, you won’t get much out of it. In the last part of the Metrics in optimization process series, we’re taking you through a Grafana custom dashboard crash course so that you can produce the kind of dashboards even the Business department will understand.
In the previous three articles, we talked about why you should measure your app, how you can integrate your app with Prometheus and how to actually go about collecting data.
Since we now have a lot of data to show, we should think about the presentation layer. Its true purpose is to present data in a way that gives us what we need, rather than as a disorganized pile. A good presentation is not just prettier, but also far more useful. Another important thing is that the data we collected with Prometheus doesn’t come in an easy-to-read format. Today it’s time to show ugly metrics data in a beautiful way.
As you know from previous articles, we collect data from Kubernetes and our application. We are going to show an example of how to configure a dashboard, what is important, and why we did it this way. We are not going to show and describe all options in Grafana, just the ones that we needed to configure our charts (hey, it’s a crash course, after all!).
Kubernetes dashboards
Since we have already covered a lot in previous parts, it’s a good idea to sum up what we have now. With that, we can create our own custom Grafana dashboards.
Kubernetes has many dashboards of its own. They provide information about resources. With the cluster created and Prometheus integration out of the way, it’s good to check them.

As you can see, it is quite a long list. We don’t want to bore you by describing every dashboard and every metric in detail. Let’s focus on “Kubernetes / Compute Resources / Namespace (Pods)”.
This dashboard displays information about CPU, memory and network usage. This information is crucial for finding out whether an optimization helped or whether the last deployment added some particularly “heavy” scripts.

Grafana custom dashboard
In our application, we should have many custom dashboards to accommodate each type of data that we need. A good practice is to not include too much data in a single dashboard – it will be hard to read and understand. What’s more, the loading time for that page may increase dramatically.
First, we need to choose the “+” option on the upper side of the left sidebar. Next, let’s go for the “Dashboard” option from the drop-down menu.

Grafana gives us two starting points – chart type or data. Usually, we want to use default charts, but it’s good to try all the different charts. In our case, we are going to create a default chart. In order to do that, we need to pick the “add query” option.

Here comes the trickiest part – we need to define WHAT metrics we want to show and HOW we want to show them.

First, we need to discuss specific options to understand how everything will work:
- Chart – this is where we are going to show our data.
- Query – the data source selector. In our case, Prometheus should be selected by default, so the query box accepts PromQL. If it’s not, please change it to the “Prometheus” option.
- Add query – we can show more than one query in one chart. It’s useful when we need to show new data but want to keep the old one.
- Query inspection – a debugging tool that helps us find problems with a query.
- Query box – a place where we define what we want to see and how.
Metrics – we use them to choose the data that we want to show. In this case, we collected “app_request_execution_time_seconds”, so we can search for this metric. If the application hasn’t collected a particular type of data, it will not be on the list. Remember to call any endpoint in your app first to generate a minimum amount of data. In our case “app_request_execution_time_seconds” is a histogram.
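For a histogram, Prometheus stores three kinds of series:
- Sum – the total of all observed values (here, the total execution time in seconds).
- Count – the number of observations (here, the number of requests).
- Bucket – in some cases we don’t want exact data, just an approximation. If we configured buckets in our app, each bucket counts the observations at or below its upper bound (the “le” label), and we can use them to approximate the distribution of the data.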
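To make the three elements concrete, here is a minimal Python sketch of what a histogram keeps track of. The bucket bounds and the observed durations are made up for illustration; the real buckets depend on how the app was configured, and this is not the Prometheus client implementation.

```python
# Hypothetical bucket upper bounds in seconds (an assumption, not the app's real config).
BUCKET_BOUNDS = [0.1, 0.3, 1.0, float("inf")]


def summarize(durations):
    """Return the sum, count and cumulative buckets for the observed durations."""
    buckets = {bound: 0 for bound in BUCKET_BOUNDS}
    for duration in durations:
        # Buckets are cumulative: an observation is counted in every
        # bucket whose upper bound is greater than or equal to it.
        for bound in BUCKET_BOUNDS:
            if duration <= bound:
                buckets[bound] += 1
    return {"sum": sum(durations), "count": len(durations), "bucket": buckets}


# Four made-up request durations in seconds.
histogram = summarize([0.05, 0.2, 0.25, 0.8])
average = histogram["sum"] / histogram["count"]

print(histogram["bucket"])
print(round(average, 3))
```

Dividing the sum by the count gives the average duration per request, while the buckets only tell us how many requests finished within each bound.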
In our example, we are going to show the average execution time per request. We are going to use “sum” and “count” to prepare the exact data we want to see.
The easiest way to show it is to write:
“app_request_execution_time_seconds_sum/app_request_execution_time_seconds_count”.
Unfortunately, in a real project this is not enough. Usually, we have many pods that collect data. We also want to see and analyze data per route.
First, we need to use the “sum” function which is going to sum up data from different pods:
“sum(app_request_execution_time_seconds_sum)/sum(app_request_execution_time_seconds_count)”

But now we can only see one element instead of all the routes we called! To fix this problem, we need to group our data using “by (label)”:
“sum(app_request_execution_time_seconds_sum) by (router)/sum(app_request_execution_time_seconds_count) by (router)”
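The effect of grouping with “by (router)” can be sketched in plain Python. The pod names, routes and numbers below are hypothetical; the point is only that values from different pods are added up per router before the division.

```python
from collections import defaultdict

# Hypothetical samples: (pod, router, execution time sum in seconds, request count).
samples = [
    ("pod-a", "/users", 1.2, 10),
    ("pod-b", "/users", 0.8, 10),
    ("pod-a", "/orders", 3.0, 5),
    ("pod-b", "/orders", 1.5, 5),
]


def average_by_router(rows):
    """Sum time and count over all pods per router, then divide."""
    total_time = defaultdict(float)
    total_count = defaultdict(int)
    for _pod, router, time_sum, count in rows:
        total_time[router] += time_sum
        total_count[router] += count
    return {router: total_time[router] / total_count[router] for router in total_time}


averages = average_by_router(samples)
for router, seconds in sorted(averages.items()):
    print(router, round(seconds, 3))
```

Without the grouping, all four rows would collapse into a single number, which is why the ungrouped query shows only one element.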

Congratulations! You configured your first chart with metrics!
Still, it doesn’t look pretty. The legend shows “ugly” raw labels instead of clean route names.

To fix this problem, we need to fill in the “legend” field. Just put “{{router}}” there and it’s done.
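The idea behind the legend template is simple placeholder substitution: each name in double curly braces is replaced with the value of the label of the same name. The Python sketch below imitates that idea only; the function name and the handling of missing labels are assumptions made for this example and not Grafana’s actual implementation.

```python
import re


def render_legend(template, labels):
    """Replace each {{name}} placeholder with the matching label value.

    Unknown labels become an empty string in this sketch; the real
    behaviour in Grafana may differ.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: labels.get(match.group(1), ""),
        template,
    )


# Hypothetical label set of a single series.
labels = {"router": "/users", "pod": "pod-a"}

print(render_legend("{{router}}", labels))
print(render_legend("{{router}} on {{pod}}", labels))
```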
As we discussed in the previous article, we collect data for specific periods, so we need to set the correct “min step” (in my case it will be “2 min”).
The final query configuration is:

And that’s all! Now you can see the proper metrics:

In the example application which we created for this article, we have only 8 API endpoints. Everything is easy to read. The problem starts when we have 50 or more endpoints.
Time to improve our visualization. We need to move to the second tab. In this tab, we can change our chart type (in our case the best one will be “graph”).

Draw modes
Most options depend on our visual taste. I like the defaults, so I will stay with them.
There are only two options which improve visibility that I want to mention:
- Null value – we want a continuous line even where no data was recorded, so I will change it to “null as zero”, which draws missing values as zero.
- Mode – this is the hover tooltip mode. By default, when you point at a spot on the chart, you will see the values of all endpoints at that moment. If you only want the value of the series you are pointing at (for example, the highest peak), change it to “single”.
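What “null as zero” does to a series can be shown with a few lines of Python. The series below is invented; None stands for an interval in which no value was recorded.

```python
def null_as_zero(points):
    """Replace missing values (None) with 0.0 and keep real values as they are."""
    return [0.0 if value is None else value for value in points]


# Hypothetical series with two gaps.
series = [0.12, None, 0.30, None, 0.25]
filled = null_as_zero(series)
print(filled)
```

The trade-off is that a gap now looks like a real zero, so this setting suits charts where “no requests” and “zero seconds” can be read the same way.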
Grafana custom dashboard legend
The current legend is not very pretty, so we need to change it. A well-configured legend brings a lot of useful information, and the data it shows is typically used for creating business reports.
We want to see everything as a table and move it to the right, which improves visibility and takes less space. As far as values are concerned, the most useful statistics are the average and the max peak. For some statistics it’s good to limit the number of decimal places (we changed it to 2), because we are not interested in thousandths.
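The two legend statistics are easy to reproduce by hand, which helps when a number in the table looks suspicious. Below is a small Python sketch that computes the average and the max for one series and rounds both to 2 decimals; the function name and the sample values are made up for this example.

```python
def legend_row(name, values, decimals=2):
    """Build one legend table row with the average and the max of a series."""
    present = [value for value in values if value is not None]
    if not present:
        # Nothing was recorded for this series, so there is nothing to summarize.
        return {"series": name, "avg": None, "max": None}
    return {
        "series": name,
        "avg": round(sum(present) / len(present), decimals),
        "max": round(max(present), decimals),
    }


# Hypothetical execution times in seconds for one route.
row = legend_row("/users", [0.1234, 0.4567, 0.2])
print(row)
```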

Grafana custom dashboard thresholds
Thresholds are a very nice feature that helps us find data that is greater or lower than a specified value. With thresholds, it’s much easier to find routes to optimize.
As a first step, we want to know which endpoints are slower than 0.3 seconds. We will see it easily when we configure thresholds with “gt 0.3”. In a real project, we suggest setting it to 1.0 seconds.
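A threshold such as “gt 0.3” is just a comparison against a fixed value. The Python sketch below applies the same check to a set of average execution times to list the routes worth a closer look; the routes and timings are hypothetical.

```python
def slower_than(averages, threshold=0.3):
    """Return the routes whose average execution time is greater than the threshold."""
    return sorted(route for route, seconds in averages.items() if seconds > threshold)


# Hypothetical average execution times in seconds per route.
route_averages = {"/users": 0.1, "/orders": 0.45, "/reports": 1.2}

print(slower_than(route_averages))        # threshold from the example: 0.3 s
print(slower_than(route_averages, 1.0))   # stricter limit suggested for real projects
```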

General options for Grafana custom dashboard
Title – the name of our new chart. It should be short and easy to understand, e.g. when we log the execution time in an application, we can name it “App request execution time”.
Description – a place where we can put more information on what we show and why.

That’s it. Our new chart is configured and we can save it. We will be redirected to the new dashboard with the first chart. In our case, we needed to repeat this process for each piece of useful information.
The full configuration is available in the example app repository. The visualization will look like this:

Grafana custom dashboard – summary
Grafana helps us visualize and understand what happens in an application:
- It provides information such as which actions increased/decreased speed and resource utilization.
- It makes it possible to prepare a business report about progress.
- With alerts, it’s easy to get information that something is wrong.
Life with all this information is so much easier!
The optimization process is a long journey, hard to accomplish without additional help. It’s important to define goals and analyze exactly what should be measured. We need software which will help solve problems. In this case, Prometheus was the best tool for us. It delivered the necessary information and provided integrations with many programming languages and Kubernetes out of the box.
With a deep understanding of how the data collection and presentation layers work, it’s easy to find real, useful information about an application. Even the business side was satisfied when we prepared a presentation on how the application worked before and after optimization (thanks for the charts, Grafana!).
We can’t forget about end-users who stopped complaining about application speed.
Finally, it made us happy as developers, because we were able to find out exactly how our changes affect the application.
And if you are searching for a team that knows how to optimize application performance, saving your precious time and resources in the process, contact The Software House.