Servicegraph Connector
contrib, k8s
Maintainers: @mapno, @JaredTan95
Source: opentelemetry-collector-contrib
Overview
The service graphs connector builds a map representing the interrelationships between the services in a system. The connector analyses trace data and generates metrics describing the relationships between services. These metrics can be used by data visualization apps (e.g. Grafana) to draw a service graph.

Service graphs are useful for a number of use cases:

- Infer the topology of a distributed system. As distributed systems grow, they become more complex, and service graphs can help you understand their structure.
- Provide a high-level overview of the health of your system. Service graphs show error rates, latencies, and other relevant data.
- Provide a historical view of a system's topology. Distributed systems change frequently, and service graphs offer a way of seeing how they have evolved over time.
How it works
Service graphs work by inspecting traces and looking for spans with a parent-child relationship that represent a request. The connector uses the OpenTelemetry semantic conventions to detect several kinds of requests. It currently supports the following:

- A direct request between two services, where the outgoing and incoming spans must have span.kind set to client and server respectively.
- A request across a messaging system, where the outgoing and incoming spans must have span.kind set to producer and consumer respectively.
- A database request, where the connector looks for spans containing the attributes span.kind=client as well as db.name.
Metrics
The following metrics are emitted by the connector:

| Metric | Type | Labels | Description |
|---|---|---|---|
| traces_service_graph_request_total | Counter | client, server, connection_type | Total count of requests between two nodes |
| traces_service_graph_request_failed_total | Counter | client, server, connection_type | Total count of failed requests between two nodes |
| traces_service_graph_request_server | Histogram | client, server, connection_type | Number of seconds for a request between two nodes as seen from the server |
| traces_service_graph_request_client | Histogram | client, server, connection_type | Number of seconds for a request between two nodes as seen from the client |
| traces_service_graph_unpaired_spans_total | Counter | client, server, connection_type | Total count of unpaired spans |
| traces_service_graph_dropped_spans_total | Counter | client, server, connection_type | Total count of dropped spans |
The connection_type label can take the values unset, messaging_system, or database.

Additional labels can be included using the dimensions configuration option. Those labels carry a prefix marking where they originate: the client_ prefix marks dimensions coming from spans with SPAN_KIND_CLIENT, and the server_ prefix marks dimensions coming from spans with SPAN_KIND_SERVER.
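As a purely hypothetical illustration (the service names checkout and payments are made up), the emitted series might look like this in Prometheus exposition format:

```
traces_service_graph_request_total{client="checkout", server="payments", connection_type=""} 14
traces_service_graph_request_failed_total{client="checkout", server="payments", connection_type=""} 2
traces_service_graph_request_server_sum{client="checkout", server="payments", connection_type=""} 1.25
```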
Since the service graph connector has to process both sides of an edge, it needs to see all spans of a trace to function properly. If the spans of a trace are spread across multiple collector instances, spans cannot be paired up reliably. A possible solution to this problem is to use the load balancing exporter in a layer in front of the collector instances running this connector.
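As a sketch of that layered setup, a front-layer collector could route traces by trace ID to a static set of backend collectors using the load balancing exporter (the hostnames here are placeholders, not part of this connector's documentation):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loadbalancing:
    # Route all spans of a trace to the same backend instance.
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - collector-1:4317
          - collector-2:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The backend collectors (collector-1, collector-2) would then run the servicegraph connector; because every span of a trace lands on the same backend, pairing works reliably.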
Visualization
Service graph metrics are natively supported by Grafana since v9.0.4. To use them, configure a Tempo data source's 'Service Graphs' section by linking it to the Prometheus backend where the metrics are being sent.

Configuration
The following settings are required:

- latency_histogram_buckets: the list of durations defining the latency histogram buckets. Make sure to use either latency_histogram_buckets or exponential_histogram_max_size, but not both.
  - Default: [2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]
- exponential_histogram_max_size: the maximum number of buckets per positive or negative number range. No default.
- dimensions: the list of dimensions to add together with the default dimensions defined above.
- store: defines the config for the in-memory store used to find requests between services by pairing spans.
  - ttl: the time to live for items in the store.
    - Default: 2s
  - max_items: the maximum number of items to keep in the store.
    - Default: 1000
- cache_loop: the interval at which to clean the cache.
  - Default: 1m
- store_expiration_loop: the interval at which to expire old entries from the store.
  - Default: 2s
- virtual_node_peer_attributes: the list of attributes, ordered by priority, whose presence in a client span will result in the creation of a virtual server node. An empty list disables virtual node creation.
  - Default: [peer.service, db.name, db.system]
- virtual_node_extra_label: adds an extra label virtual_node with an optional value of client or server, indicating which node is the uninstrumented one.
  - Default: false
- metrics_flush_interval: the interval at which metrics are flushed to the exporter.
  - Default: 60s
- metrics_timestamp_offset: the offset to subtract from metric timestamps. If set to a positive duration, metric timestamps are set to (current time - offset), effectively shifting metrics to appear as if they were generated in the past.
  - Default: 0
- database_name_attributes: the list of attribute names used to identify the database name from span attributes. The attributes are tried in order, and the first match is selected.
  - Default: [db.name]
Example configurations
Sample with custom buckets and dimensions
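A sketch of such a configuration; the pipeline layout follows the connector pattern (traces in, metrics out), while the dimension names (dimension-1, dimension-2) and the Prometheus endpoint are illustrative placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  servicegraph:
    # Custom latency histogram buckets instead of the defaults.
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    # Extra span attributes to emit as client_/server_-prefixed labels.
    dimensions:
      - dimension-1
      - dimension-2
    store:
      ttl: 2s
      max_items: 1000

exporters:
  prometheus:
    endpoint: localhost:9090

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics:
      receivers: [servicegraph]
      exporters: [prometheus]
```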
Sample with options for uninstrumented services identification
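A sketch of a connector fragment enabling virtual node creation for uninstrumented peers; the attribute choices here (db.name, rpc.service) are illustrative, not prescribed defaults:

```yaml
connectors:
  servicegraph:
    # When a client span carries one of these attributes but no matching
    # server span arrives, a virtual server node is created from it.
    virtual_node_peer_attributes:
      - db.name
      - rpc.service
    # Adds a virtual_node label (client or server) marking which side
    # of the edge is the uninstrumented one.
    virtual_node_extra_label: true
```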
Last generated: 2026-04-13