This article will teach you how to configure observability for your Spring Boot applications. By observability we mean the combination of metrics, logging, and distributed tracing, which together allow you to monitor the state of your system and detect errors and latency.
There are some significant changes in the approach to observability between Spring Boot 2 and 3. Tracing is no longer part of Spring Cloud through the Spring Cloud Sleuth project; the core of that project has been moved to Micrometer Tracing. You can read more about the reasons and future plans in this post on the Spring blog.
The main goal of this article is to give you a simple recipe for enabling observability for your microservices written in Spring Boot using the new approach. In order to simplify our exercise, we will use a fully managed Grafana instance in their cloud. We will build a very basic architecture with two microservices running locally. Let’s take a moment to discuss that architecture in detail.
Source Code
If you would like to try it by yourself, you can always take a look at my source code. In order to do that, you need to clone my GitHub repository. It contains several tutorials, so you need to go to the inter-communication directory. After that, you should just follow my instructions.
Spring Boot Observability Architecture
There are two applications: inter-callme-service and inter-caller-service. The inter-caller-service app calls the HTTP endpoint exposed by the inter-callme-service app. We run two instances of inter-callme-service and configure static load balancing between them using Spring Cloud LoadBalancer. All the apps expose Prometheus metrics using Spring Boot Actuator and the Micrometer project. For tracing, we are going to use OpenTelemetry with Micrometer Tracing and OpenZipkin. In order to send all the data, including logs, metrics, and traces, from our local Spring Boot instances to the cloud, we will use the Grafana Agent.
On the other hand, there is a stack responsible for collecting and visualizing all the data. As I mentioned before, we will leverage Grafana Cloud for that. It is very convenient, since we don’t have to install and configure the required tools ourselves. First of all, Grafana Cloud offers a managed instance of Prometheus responsible for collecting metrics. We also need a log aggregation tool for storing and querying logs from our apps; Grafana Cloud offers a preconfigured instance of Loki for that. Finally, we get a distributed tracing backend through Grafana Tempo. Here’s a visualization of the whole architecture.
Enable Metrics and Tracing with Micrometer
In order to export metrics in the Prometheus format, we need to include the micrometer-registry-prometheus dependency. For tracing context propagation, we should add the micrometer-tracing-bridge-otel module. We also need to export tracing spans in one of the formats supported by Grafana Tempo: in our case OpenZipkin, through the opentelemetry-exporter-zipkin dependency.
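Here is a sketch of the Maven dependencies this implies (the artifact coordinates are the standard ones; versions are managed by the Spring Boot and Micrometer BOMs):

```xml
<!-- Actuator brings in Micrometer and the /actuator endpoints -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Prometheus metrics registry -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Tracing context propagation via the OpenTelemetry bridge -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<!-- Export spans in the Zipkin format understood by Grafana Tempo -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>
```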
We need to use the latest available version of Spring Boot 3. Currently, it is 3.0.0-RC1. As a release candidate, that version is available in the Spring Milestone repository.
One of the more interesting new features in Spring Boot 3 is support for Prometheus exemplars. Exemplars are references to data outside of the metrics published by an application. They allow linking metrics data to distributed traces. In this case, the published metrics contain a reference to the traceId. In order to enable exemplars for particular metrics, we need to expose percentile histograms. We will do that for the http.server.requests metric (1). We will also send all the traces to Grafana Cloud by setting the sampling probability to 1.0 (2). Finally, just to verify that it works properly, we print the traceId and spanId in the log line (3).
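A minimal application.yml reflecting points (1)-(3) could look like this (standard Spring Boot 3 property names; treat it as a sketch rather than the exact file from the repository):

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health, prometheus          # expose /actuator/prometheus for scraping
  metrics:
    distribution:
      percentiles-histogram:
        "[http.server.requests]": true       # (1) percentile histogram -> buckets + exemplars
  tracing:
    sampling:
      probability: 1.0                       # (2) send all traces to the backend

logging:
  pattern:
    # (3) include traceId and spanId (taken from the MDC) in every log line
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"
```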
The inter-callme-service app exposes a POST endpoint that just returns the message in reversed order. We don’t need to add anything special here, just standard Spring Web annotations.
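A controller along these lines is enough (a sketch; class and path names are assumptions and may differ from the repository):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/callme")
public class CallmeController {

    private static final Logger LOG = LoggerFactory.getLogger(CallmeController.class);

    // Returns the incoming message reversed; metrics and tracing for this endpoint
    // are handled automatically by Actuator and Micrometer.
    @PostMapping("/call/{message}")
    public String call(@PathVariable String message) {
        LOG.info("Received message: {}", message);
        return new StringBuilder(message).reverse().toString();
    }
}
```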
Load Balancing with Spring Cloud
In the endpoint exposed by inter-caller-service, we just call the endpoint from inter-callme-service. We use Spring RestTemplate for that. You could also declare a Spring Cloud OpenFeign client, but it seems it does not currently support Micrometer Tracing out of the box.
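A sketch of the calling side; the logical host name callme-service in the URL is an assumption that has to match the load balancer configuration shown later:

```java
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
@RequestMapping("/caller")
public class CallerController {

    private final RestTemplate restTemplate;

    public CallerController(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Delegates to inter-callme-service; "callme-service" is resolved by
    // Spring Cloud LoadBalancer because the injected RestTemplate is @LoadBalanced.
    @PostMapping("/send/{message}")
    public String send(@PathVariable String message) {
        return restTemplate.postForObject(
                "http://callme-service/callme/call/{message}", null, String.class, message);
    }
}
```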
In this exercise, we will use a static client-side load balancer that distributes traffic across the two instances of inter-callme-service. Normally, you would integrate Spring Cloud LoadBalancer with service discovery based e.g. on Eureka. However, I don’t want to complicate our demo with external components in the architecture. Assuming we are running inter-callme-service on ports 55800 and 55900, here’s the load balancer configuration in the application.yml file:
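The original file is not reproduced here, so the snippet below is only an illustration. It assumes a custom property (static-discovery.instances is a made-up name) that we will bind to a configuration properties bean in the next step:

```yaml
static-discovery:
  instances:                     # hypothetical custom property listing the static URIs
    - http://localhost:55800
    - http://localhost:55900
```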
Since there is no built-in static load balancer implementation, we need to add some code. Firstly, we have to inject the configuration properties into a Spring bean.
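For example, a simple @ConfigurationProperties holder bound to the hypothetical property above (register it e.g. with @EnableConfigurationProperties):

```java
import java.util.List;
import org.springframework.boot.context.properties.ConfigurationProperties;

// Binds the (hypothetical) static-discovery.instances list from application.yml.
@ConfigurationProperties(prefix = "static-discovery")
public class StaticDiscoveryProperties {

    private List<String> instances;

    public List<String> getInstances() {
        return instances;
    }

    public void setInstances(List<String> instances) {
        this.instances = instances;
    }
}
```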
Then we need to create a bean that implements the ServiceInstanceListSupplier interface. It just returns a list of ServiceInstance objects that represent all the defined static addresses.
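A sketch of such a supplier, reusing the StaticDiscoveryProperties class introduced above; the service id callme-service is again an assumption:

```java
import java.net.URI;
import java.util.List;

import org.springframework.cloud.client.DefaultServiceInstance;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;

import reactor.core.publisher.Flux;

// Supplies a fixed list of ServiceInstance objects built from the configured URIs.
public class StaticServiceInstanceListSupplier implements ServiceInstanceListSupplier {

    private final StaticDiscoveryProperties properties;

    public StaticServiceInstanceListSupplier(StaticDiscoveryProperties properties) {
        this.properties = properties;
    }

    @Override
    public String getServiceId() {
        // Logical name used in the load-balanced RestTemplate URL.
        return "callme-service";
    }

    @Override
    public Flux<List<ServiceInstance>> get() {
        List<ServiceInstance> instances = properties.getInstances().stream()
                .map(URI::create)
                .map(uri -> (ServiceInstance) new DefaultServiceInstance(
                        uri.toString(), getServiceId(), uri.getHost(), uri.getPort(), false))
                .toList();
        return Flux.just(instances);
    }
}
```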
Finally, we need to enable Spring Cloud LoadBalancer for the app and annotate the RestTemplate bean with @LoadBalanced.
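A minimal configuration class for that could look as follows (the class name and the properties registration are assumptions):

```java
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
@EnableConfigurationProperties(StaticDiscoveryProperties.class)
public class CallerConfig {

    // @LoadBalanced routes calls to "callme-service" through Spring Cloud LoadBalancer.
    @Bean
    @LoadBalanced
    RestTemplate restTemplate() {
        return new RestTemplate();
    }
}
```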
Here’s the client-side load balancer configuration. We are providing our custom StaticServiceInstanceListSupplier implementation as the default ServiceInstanceListSupplier. We also set RandomLoadBalancer as the default implementation of the load-balancing algorithm.
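The sketch below follows the standard Spring Cloud LoadBalancer customization pattern and is not necessarily the article's exact code. It is meant to be referenced from the main class, e.g. with @LoadBalancerClients(defaultConfiguration = StaticLoadBalancerConfiguration.class):

```java
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.RandomLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ReactorLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.cloud.loadbalancer.support.LoadBalancerClientFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.core.env.Environment;

public class StaticLoadBalancerConfiguration {

    // Use our static supplier as the default ServiceInstanceListSupplier.
    @Bean
    ServiceInstanceListSupplier staticServiceInstanceListSupplier(StaticDiscoveryProperties properties) {
        return new StaticServiceInstanceListSupplier(properties);
    }

    // Use RandomLoadBalancer instead of the default round-robin algorithm.
    @Bean
    ReactorLoadBalancer<ServiceInstance> randomLoadBalancer(Environment environment,
                                                            LoadBalancerClientFactory factory) {
        String name = environment.getProperty(LoadBalancerClientFactory.PROPERTY_NAME);
        return new RandomLoadBalancer(
                factory.getLazyProvider(name, ServiceInstanceListSupplier.class), name);
    }
}
```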
Testing Observability with Spring Boot
Let’s see how it works. In the first step, we are going to run two instances of inter-callme-service. Since we set a static value of the listen port, we need to override the server.port property for each instance. We can do that with the SERVER_PORT environment variable. Go to the inter-communication/inter-callme-service directory and run the following commands:
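Assuming a Maven build (as in most Spring Boot examples), the two instances can be started like this:

```bash
# first instance on port 55800
SERVER_PORT=55800 mvn spring-boot:run

# second instance (in another terminal) on port 55900
SERVER_PORT=55900 mvn spring-boot:run
```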
Then, go to the inter-communication/inter-caller-service directory and run a single instance on the default port 8080:
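Again assuming a Maven build:

```bash
mvn spring-boot:run
```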
Then, let’s call our POST /caller/send/{message} endpoint several times with different parameters, for example:
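For example, with curl (the message values are arbitrary):

```bash
curl -X POST http://localhost:8080/caller/send/hello
curl -X POST http://localhost:8080/caller/send/world
```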
Here are the logs from inter-caller-service with the highlighted value of the traceId parameter:
Let’s take a look at the logs from inter-callme-service. As you can see, the traceId parameter is the same as the traceId for that request on the inter-caller-service side.
Here are the logs from the second instance of inter-callme-service:
You could also try the same exercise with Spring Cloud OpenFeign. It is configured and ready to use. However, for me, it didn’t propagate the traceId parameter properly. Maybe it is an issue with the current non-GA versions of Spring Boot and Spring Cloud.
Let’s verify another feature: Prometheus exemplars. In order to do that, we need to call the /actuator/prometheus endpoint with an Accept header that requests the OpenMetrics format. This is the same header Prometheus uses to scrape the metrics.
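For example (the media type is the standard OpenMetrics content type):

```bash
curl http://localhost:8080/actuator/prometheus \
  -H 'Accept: application/openmetrics-text; version=1.0.0; charset=utf-8'
```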
As you can see, several metrics in the result contain the traceId and spanId parameters. These are our exemplars.
Install and Configure Grafana Agent
Our sample apps are ready. Now, the main goal is to send all the collected observability data to our account on Grafana Cloud. There are various ways of sending metrics, logs, and traces to the Grafana stack. In this article, I will show you how to use the Grafana Agent for that. Firstly, we need to install it. You can find detailed installation instructions for your OS here. Since I’m using macOS, I can do it with Homebrew:
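At the time of writing, the Homebrew formula is called grafana-agent:

```bash
brew install grafana-agent
```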
Before we start the agent, we need to prepare a configuration file. The location of that file also depends on your OS. For me, it is $(brew --prefix)/etc/grafana-agent/config.yml. The configuration YAML manifest describes how we want to collect and send metrics, traces, and logs. Let’s begin with the metrics. Inside the scrape_configs section, we need to set a list of endpoints for scraping (1) and a default path (2). Inside the remote_write section, we have to pass our Grafana Cloud instance auth credentials (3) and URL (4). By default, the Grafana Agent does not send exemplars, so we need to enable them with the send_exemplars property (5).
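Here is a sketch of the metrics part of the agent configuration. The URL and credentials are placeholders you need to copy from your Grafana Cloud account, and the scrape targets assume the three locally running instances:

```yaml
metrics:
  global:
    scrape_interval: 15s
  configs:
    - name: integrations
      scrape_configs:
        - job_name: springboot
          metrics_path: /actuator/prometheus                     # (2) Spring Boot Actuator path
          static_configs:
            - targets: ['localhost:8080', 'localhost:55800', 'localhost:55900']   # (1) scrape targets
      remote_write:
        - url: https://<your-prometheus-instance>/api/prom/push  # (4) Grafana Cloud Prometheus URL
          basic_auth:                                             # (3) instance id + API key
            username: <prometheus-username>
            password: <grafana-cloud-api-key>
          send_exemplars: true                                    # (5) forward exemplars too
```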
You can find all information about your instance of Prometheus in the Grafana Cloud dashboard.
In the next step, we prepare the configuration for collecting and sending logs to Grafana Loki. As before, we need to set auth credentials (1) and the URL (2) of our Loki instance. The most important thing here is to pass the location of the log files (3).
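And the logs section, again with placeholders for the Loki credentials; the job label springboot-json is reused later in the Loki queries, and the file path is a placeholder for wherever your apps write their logs:

```yaml
logs:
  configs:
    - name: integrations
      clients:
        - url: https://<your-loki-instance>/loki/api/v1/push    # (2) Grafana Cloud Loki URL
          basic_auth:                                            # (1) instance id + API key
            username: <loki-username>
            password: <grafana-cloud-api-key>
      positions:
        filename: /tmp/positions.yaml
      scrape_configs:
        - job_name: springboot-json
          static_configs:
            - targets: [localhost]
              labels:
                job: springboot-json
                __path__: /path/to/app/logs/*.log                # (3) location of the JSON log files
```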
By default, Spring Boot logs only to the console and does not write log files. In our case, the Grafana Agent reads log lines from output files. In order to write log files, we need to set the logging.file.name or logging.file.path property. Since there are two instances of inter-callme-service, we need to distinguish their log files somehow. We will use the server.port property for that. The logs inside the files are stored in JSON format.
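One way to achieve that (a sketch; the repository may solve it differently) is to include the port in the file name and rely on a JSON encoder, e.g. logstash-logback-encoder configured in a custom logback-spring.xml, for the JSON layout:

```yaml
logging:
  file:
    # each instance writes its own file, e.g. callme-service-55800.log
    name: logs/${spring.application.name}-${server.port:8080}.log
```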
Finally, we will configure trace collecting. Besides the auth credentials and URL of the Grafana Tempo instance, we need to enable the OpenZipkin receiver (1).
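A corresponding traces section might look like this (a sketch with placeholder credentials; the zipkin receiver listens on its default port 9411):

```yaml
traces:
  configs:
    - name: integrations
      receivers:
        zipkin:
          endpoint: 0.0.0.0:9411                 # (1) accept spans sent by the apps in Zipkin format
      remote_write:
        - endpoint: <your-tempo-instance>:443    # Grafana Cloud Tempo endpoint
          basic_auth:
            username: <tempo-username>
            password: <grafana-cloud-api-key>
```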
Then, we can start the agent with the following command:
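With the Homebrew installation, the agent can be started as a service; it picks up the configuration file from the location mentioned above:

```bash
brew services start grafana-agent
```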
The Grafana Agent contains a Zipkin collector that listens on the default port 9411. There is also an HTTP API exposed by the agent on port 12345 for verifying the agent status.
For example, we can use the Grafana Agent HTTP API to verify how many log files it is monitoring. To do that, just call the GET /agent/api/v1/logs/targets endpoint, as shown below. For me, it is monitoring three files, which is exactly what we wanted to achieve, since there are two running instances of inter-callme-service and a single instance of inter-caller-service.
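The call itself is a plain HTTP GET against the agent’s API port:

```bash
curl http://localhost:12345/agent/api/v1/logs/targets
```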
Visualize Spring Boot Observability with Grafana Stack
One of the most important advantages of Grafana Cloud in our exercise is that all the required components come preconfigured. After you navigate to the Grafana dashboard, you can display the list of available data sources. As you can see, Loki, Prometheus, and Tempo are already configured.
By default, Grafana Cloud enables exemplars in the Prometheus data source. If you are running Grafana by yourself, don’t forget to enable it on your Prometheus data source.
Let’s start with the logs. We will analyze exactly the same logs as in the section “Testing Observability with Spring Boot”. We will get all the logs sent by the Grafana Agent. As you probably remember, we formatted all the logs as JSON. Therefore, we will parse them with the json parser on the server side. Thanks to that, we are able to filter by all the log fields. For example, we can use the traceId label in a filter expression: {job="springboot-json"} | json | traceId = "1bb1d7d78a5ac47e8ebc2da961955f87".
Here’s a full list of logs without any filtering. The highlighted lines contain logs of two analyzed traces.
In the next step, we will configure the Prometheus metrics visualization. Since we enabled percentile histograms for the http.server.requests metric, we have multiple buckets represented by the http_server_requests_seconds_bucket series. Each bucket carries a le label and counts all samples whose values are less than or equal to the numeric value contained in that label. We want to compute the histograms for the 90th and 60th percentiles of requests. Here are our Prometheus queries:
Here’s our histogram. Exemplars are shown as green diamonds. When you hover over a selected exemplar, you will see more details, including, for example, the traceId value.
Finally, the last part of our exercise: we would like to analyze particular traces with Grafana Tempo. The only thing you need to do is choose the grafanacloud-*-traces data source and set the value of the searched traceId. Here’s a sample result.
Final Thoughts
The first GA release of Spring Boot 3 is just around the corner. Probably one of the most important things you will have to handle during the migration from Spring Boot 2 is observability. In this article, you can find a detailed description of the current Spring Boot approach. If you are interested in Spring Boot, it’s also worth reading about best practices for building microservices in this article.
Reference: https://piotrminkowski.com/2022/11/03/spring-boot-3-observability-with-grafana/