Servicegraph Processor
Overview
The service graph processor is a traces processor that builds a map representing the interrelationships between the services in a system. The processor analyses trace data and generates metrics describing the relationships between the services. Data visualization apps (e.g. Grafana) can use these metrics to draw a service graph. Service graphs are useful for a number of use cases:
- Infer the topology of a distributed system. As distributed systems grow, they become more complex. Service graphs can help you understand the structure of the system.
- Provide a high-level overview of the health of your system. Service graphs show error rates, latencies, and other relevant data.
- Provide a historical view of a system’s topology. Distributed systems change very frequently, and service graphs offer a way of seeing how these systems have evolved over time.
How it works
Service graphs work by inspecting traces and looking for spans with parent-child relationships that represent a request. The processor uses the OpenTelemetry semantic conventions to detect several kinds of requests. It currently supports the following:
- A direct request between two services, where the outgoing and the incoming span must have span.kind set to client and server, respectively.
- A request across a messaging system, where the outgoing and the incoming span must have span.kind set to producer and consumer, respectively.
- A database request; in this case the processor looks for spans with span.kind=client that also carry the db.name attribute.
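The pairing rules above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the processor's actual (Go) implementation; the Span class, its fields, and pair_spans are hypothetical. It assumes that a server or consumer span's parent_span_id equals the span_id of the matching client or producer span within the same trace, and it names the database node after the db.name value, which is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    # Hypothetical, simplified span model; not the OpenTelemetry data model.
    span_id: str
    parent_span_id: Optional[str]
    service: str
    kind: str  # "client", "server", "producer", "consumer", "internal"
    attributes: Dict[str, str] = field(default_factory=dict)


def pair_spans(spans: List[Span]) -> List[Tuple[str, str, str]]:
    """Return (client, server, connection_type) edges found in one trace."""
    by_id = {s.span_id: s for s in spans}
    edges = []
    for s in spans:
        # Database request: a client span that also carries db.name.
        if s.kind == "client" and "db.name" in s.attributes:
            edges.append((s.service, s.attributes["db.name"], "database"))
            continue
        parent = by_id.get(s.parent_span_id) if s.parent_span_id else None
        if parent is None:
            continue
        # Direct request: client parent, server child.
        if parent.kind == "client" and s.kind == "server":
            edges.append((parent.service, s.service, "unset"))
        # Messaging request: producer parent, consumer child.
        elif parent.kind == "producer" and s.kind == "consumer":
            edges.append((parent.service, s.service, "messaging_system"))
    return edges


trace = [
    Span("a", None, "frontend", "client"),
    Span("b", "a", "checkout", "server"),
    Span("c", "b", "checkout", "producer"),
    Span("d", "c", "mailer", "consumer"),
    Span("e", "b", "checkout", "client", {"db.name": "orders"}),
]
print(pair_spans(trace))
# [('frontend', 'checkout', 'unset'), ('checkout', 'mailer', 'messaging_system'), ('checkout', 'orders', 'database')]
```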
Metrics
The following metrics are emitted by the processor:
| Metric | Type | Labels | Description |
|---|---|---|---|
| traces_service_graph_request_total | Counter | client, server, connection_type | Total count of requests between two nodes |
| traces_service_graph_request_failed_total | Counter | client, server, connection_type | Total count of failed requests between two nodes |
| traces_service_graph_request_server_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the server |
| traces_service_graph_request_client_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the client |
| traces_service_graph_unpaired_spans_total | Counter | client, server, connection_type | Total count of unpaired spans |
| traces_service_graph_dropped_spans_total | Counter | client, server, connection_type | Total count of dropped spans |
Possible values for the connection_type label: unset, messaging_system, or database.
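To show how the metrics in the table relate to paired edges, here is a minimal Python sketch that aggregates completed requests into counters and histograms keyed by the client, server, and connection_type labels. The input dict shape, the aggregate function, and the BOUNDS bucket boundaries are hypothetical (the processor's default buckets may differ), and the failed flag is assumed to have been computed already, for example from span status.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

# Hypothetical histogram bucket upper bounds, in seconds.
BOUNDS = [0.1, 0.5, 1.0, 5.0]


def _empty_buckets():
    # One slot per bound plus a final +Inf slot.
    return [0] * (len(BOUNDS) + 1)


def aggregate(requests):
    """Aggregate completed requests into per-edge counters and histograms."""
    request_total = Counter()
    failed_total = Counter()
    client_seconds = defaultdict(_empty_buckets)
    server_seconds = defaultdict(_empty_buckets)
    for r in requests:
        labels = (r["client"], r["server"], r["connection_type"])
        request_total[labels] += 1
        if r["failed"]:
            failed_total[labels] += 1
        # bisect_left picks the first bucket whose bound is >= the value,
        # matching Prometheus "le" semantics (counts here are per bucket,
        # not cumulative).
        client_seconds[labels][bisect_left(BOUNDS, r["client_seconds"])] += 1
        server_seconds[labels][bisect_left(BOUNDS, r["server_seconds"])] += 1
    return request_total, failed_total, client_seconds, server_seconds


requests = [
    {"client": "frontend", "server": "checkout", "connection_type": "unset",
     "failed": False, "client_seconds": 0.30, "server_seconds": 0.25},
    {"client": "frontend", "server": "checkout", "connection_type": "unset",
     "failed": True, "client_seconds": 1.20, "server_seconds": 1.10},
]
totals, failed, client_hist, server_hist = aggregate(requests)
edge = ("frontend", "checkout", "unset")
print(totals[edge], failed[edge], client_hist[edge])
# 2 1 [0, 1, 0, 1, 0]
```

Keeping separate client and server histograms lets each side of an edge report its own view of the request duration, which is why the table lists both.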
Additional labels can be included using the dimensions configuration option. Those labels will have a prefix to mark where they originate (client or server span kinds).
The client_ prefix relates to the dimensions coming from spans with SPAN_KIND_CLIENT, and the server_ prefix relates to the
dimensions coming from spans with SPAN_KIND_SERVER.
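The prefixing rule can be sketched as a small Python function. The edge_labels function and its parameters are hypothetical; it only looks up each dimension in one flat attribute dict per side (the processor may consult more sources, such as resource attributes), and it keeps attribute keys verbatim even though an exporter such as Prometheus may rewrite characters like dots in label names.

```python
from typing import Dict, List


def edge_labels(client: str, server: str, connection_type: str,
                client_attrs: Dict[str, str], server_attrs: Dict[str, str],
                dimensions: List[str]) -> Dict[str, str]:
    """Build the label set for one edge, adding prefixed dimensions."""
    labels = {"client": client, "server": server,
              "connection_type": connection_type}
    for dim in dimensions:
        # Values from the SPAN_KIND_CLIENT span get the client_ prefix.
        if dim in client_attrs:
            labels["client_" + dim] = client_attrs[dim]
        # Values from the SPAN_KIND_SERVER span get the server_ prefix.
        if dim in server_attrs:
            labels["server_" + dim] = server_attrs[dim]
    return labels


print(edge_labels("frontend", "checkout", "unset",
                  {"http.method": "GET"},
                  {"http.method": "GET", "k8s.namespace.name": "shop"},
                  ["http.method", "k8s.namespace.name"]))
```

A dimension present on only one side (k8s.namespace.name above) yields only that side's prefixed label.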
Since the service graph processor has to process both sides of an edge,
it needs to process all spans of a trace to function properly.
If spans of a trace are spread out over multiple instances, spans are not paired up reliably.
A possible solution to this problem is to run the load balancing exporter
in a layer in front of the collector instances running this processor, so that all spans of a trace are routed to the same instance.
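The load-balancing layer works because every span of a trace is sent to the same instance. A minimal Python sketch of trace-ID-based routing follows; pick_instance and the backend names are hypothetical. It uses a plain hash modulo the number of backends for simplicity, whereas the load balancing exporter uses a consistent hash ring so that fewer traces move when backends are added or removed.

```python
import hashlib
from typing import List


def pick_instance(trace_id: str, instances: List[str]) -> str:
    """Route a span to a backend based only on its trace ID."""
    # A stable hash (unlike Python's randomized hash()) keeps routing
    # consistent across processes and restarts.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


backends = ["collector-0:4317", "collector-1:4317", "collector-2:4317"]
spans = [("trace-a", "span-1"), ("trace-b", "span-2"), ("trace-a", "span-3")]
routed = {span_id: pick_instance(trace_id, backends)
          for trace_id, span_id in spans}
# Both spans of trace-a land on the same instance.
print(routed["span-1"] == routed["span-3"])  # True
```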
Visualization
Service graph metrics are natively supported by Grafana since v9.0.4. To use them, configure the ‘Service Graphs’ setting of a Tempo data source to link to the Prometheus data source that receives these metrics.
Features and Feature-Gates
See the Collector feature gates for an overview of feature gates in the Collector.
Last generated: 2026-04-14