Spanmetrics Connector
contrib
Maintainers: @portertech, @Frapschen, @iblancasa
Source: opentelemetry-collector-contrib
Overview
⚠️ Breaking Change Warning: The default unit for duration metrics will change from ms to s to adhere to the OpenTelemetry semantic conventions. A feature gate, connector.spanmetrics.useSecondAsDefaultMetricsUnit, has been added to control the change.
Currently, the feature gate is disabled by default, so the unit will remain ms. After one release cycle, the unit will switch to s and the feature gate will also be enabled by default.
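Aggregates Request, Error and Duration (R.E.D) OpenTelemetry metrics from span data.
Request counts are computed as the number of spans seen per unique set of dimensions, including Errors. Multiple metrics can be aggregated if, for instance, a user wishes to view call counts just on service.name and span.name.
Error counts are computed from the number of spans with an Error Status Code metric dimension.
Duration is computed from the difference between the span start and end times and recorded in the duration histogram for each unique set of dimensions.
Each metric will have at least the following dimensions because they are common across all spans:
- service.name
- span.name
- span.kind
- status.code (or otel.status_code when the spanmetrics.statusCodeConvention.useOtelPrefix feature gate is enabled)
- collector.instance.id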
The collector.instance.id dimension is intended to add a UUID to all metrics, ensuring that the spanmetrics connector
does not violate the Single Writer Principle when spanmetrics is used in a multi-deployment model.
Currently, collector.instance.id must be manually enabled via the feature gate connector.spanmetrics.includeCollectorInstanceID.
For more detail, see Known Limitation: the Single Writer Principle.
Span to Metrics processor to Span to metrics connector
The spanmetrics connector replaces the spanmetrics processor with multiple improvements and breaking changes. The changes bring the spanmetrics connector closer to the OpenTelemetry
specification and make the component agnostic to exporter logic. The spanmetrics processor
essentially mixed OTel and Prometheus conventions by using the OTel data model together with
the Prometheus metric and attribute naming conventions.
The following changes were made in the connector component.
Breaking changes:
- The operation metric attribute was renamed to span.name.
- The latency histogram metric name was changed to duration.
- The _total metric suffix was dropped from generated metric names.
- The Prometheus-specific metrics labels sanitization was dropped.
Improvements:
- Added support for OTel exponential histograms for recording span duration measurements.
- Added support for the milliseconds and seconds histogram units.
- Added support for generating metrics resource scope attributes. The spanmetrics connector generates as many metrics resource scopes as there are span resource scopes, meaning that more metrics are generated now. Previously, spanmetrics generated a single metrics resource scope.
Configurations
If you are not already familiar with connectors, you may find it helpful to first visit the Connectors README.
The following settings can be optionally configured:
- histogram (default: explicit): Use to configure the type of histogram used to record duration measurements calculated from spans. Must be either explicit or exponential.
  - disable (default: false): Disable all histogram metrics.
  - unit (default: ms): The time unit for recording duration measurements calculated from spans. One of either: ms or s.
  - dimensions: additional attributes to add as dimensions to the traces.span.metrics.duration metric, which will be included on top of the common and configured dimensions for span attributes and resource attributes.
  - explicit:
    - buckets: the list of durations defining the duration histogram time buckets. Default buckets: [2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]
  - exponential:
    - max_size (default: 160): the maximum number of buckets per positive or negative number range.
- dimensions: the list of dimensions to add to the traces.span.metrics.calls, traces.span.metrics.duration and traces.span.metrics.events metrics, together with the default dimensions defined above. Each additional dimension is defined with a name which is looked up in the span's collection of attributes or resource attributes (AKA process tags) such as ip, host.name or region. If the named attribute is missing in the span, the optionally provided default is used. If no default is provided, this dimension will be omitted from the metric.
- calls_dimensions: additional attributes to add as dimensions to the traces.span.metrics.calls metric, which will be included on top of the common and configured dimensions for span attributes and resource attributes.
- exclude_dimensions: the list of dimensions to be excluded from the default set of dimensions. Use to exclude unneeded data from metrics.
- dimensions_cache_size: this setting is deprecated, please use aggregation_cardinality_limit instead.
- include_instrumentation_scope: a list of instrumentation scope names to include from the traces.
- resource_metrics_cache_size (default: 1000): the size of the cache holding metrics for a service. This is mostly relevant for cumulative temporality to avoid memory leaks and correct metric timestamp resets.
- aggregation_temporality (default: AGGREGATION_TEMPORALITY_CUMULATIVE): Defines the aggregation temporality of the generated metrics. One of either AGGREGATION_TEMPORALITY_CUMULATIVE or AGGREGATION_TEMPORALITY_DELTA.
- namespace (default: traces.span.metrics): Defines the namespace of the generated metrics. If namespace is provided, generated metric names are prefixed with namespace followed by a dot.
- metrics_flush_interval (default: 60s): Defines the flush interval of the generated metrics.
- metrics_expiration (default: 0): Defines the expiration time as time.Duration, after which, if no new spans are received, metrics will no longer be exported. Setting to 0 means the metrics will never expire (default behavior).
- metric_timestamp_cache_size (default: 1000): Only relevant for delta temporality span metrics. Controls the size of the cache used to keep track of a metric's TimestampUnixNano the last time it was flushed. When a metric is evicted from the cache, its next data point will indicate a "reset" in the series. Downstream components converting from delta to cumulative, like prometheusexporter, may handle these resets by setting cumulative counters back to 0.
- exemplars: Use to configure how to attach exemplars to metrics.
  - enabled (default: false): enabling will add spans as Exemplars to all metrics. Exemplars are only kept for one flush interval.
  - max_per_data_point (default: 5): The maximum number of exemplars to attach to a single metric data point.
- events: Use to configure the events metric.
  - enabled (default: false): enabling will add the events metric.
  - dimensions (mandatory if enabled): the list of the span's event attributes to add as dimensions to the traces.span.metrics.events metric, which will be included on top of the common and configured dimensions for span attributes and resource attributes.
- resource_metrics_key_attributes: Filter the resource attributes used to produce the resource metrics key map hash (the filter is only used to build the hash key; it does not control which attributes are copied to the metrics resource attributes). Use this in case changing resource attributes (e.g. process id) are breaking counter metrics.
- aggregation_cardinality_limit (default: 0): Defines the maximum number of unique combinations of dimensions that will be tracked for metrics aggregation. When the limit is reached, additional unique combinations will be dropped but registered under a new entry with otel.metric.overflow="true". A value of 0 means no limit is applied.
- add_resource_attributes (default: false): Add the resource attributes to the resulting metrics. This option enables the old behavior before the connector.spanmetrics.excludeResourceMetrics feature gate was introduced. When set to true, resource attributes will be included in the metrics even if the feature gate is enabled. See GitHub issue #42103 for more context.
- enable_metrics_sampling_method (default: false): When enabled, adds the sampling.method attribute to metrics with value "extrapolated" (when the span has a valid tracestate sampling threshold) or "counted" (otherwise).
The feature gate connector.spanmetrics.legacyMetricNames (disabled by default) makes the connector use legacy metric names.
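Several of the settings listed in this section can be combined in a single connector block. The fragment below is a minimal sketch rather than a recommended setup: it switches to delta temporality with exponential histograms recorded in seconds, and the flush interval and expiration values are arbitrary illustrative choices.

```yaml
connectors:
  spanmetrics:
    # Explicitly set to the documented default, shown only for illustration.
    namespace: traces.span.metrics
    histogram:
      unit: s
      exponential:
        max_size: 160
    aggregation_temporality: "AGGREGATION_TEMPORALITY_DELTA"
    # Only relevant for delta temporality.
    metric_timestamp_cache_size: 1000
    # Illustrative values, not recommendations.
    metrics_flush_interval: 30s
    metrics_expiration: 5m
    exemplars:
      enabled: true
      max_per_data_point: 5
```

Settings that are left out keep the defaults listed in this section.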
Examples
The following is a simple example usage of the spanmetrics connector.
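A minimal sketch of such a configuration is given here. It assumes an OTLP receiver for incoming traces and the debug exporter for the generated metrics; both are placeholders for whatever components your deployment uses. The dimension names (http.method, http.status_code, exception.type) and the bucket boundaries are illustrative assumptions.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  debug:

connectors:
  spanmetrics:
    histogram:
      explicit:
        # Illustrative buckets; omit to use the default list.
        buckets: [2ms, 10ms, 100ms, 1s, 5s]
    dimensions:
      # Looked up in span or resource attributes; the default is used if missing.
      - name: http.method
        default: GET
      # No default: the dimension is omitted when the attribute is missing.
      - name: http.status_code
    exclude_dimensions: ["status.code"]
    events:
      enabled: true
      dimensions:
        - name: exception.type

service:
  pipelines:
    traces:
      receivers: [otlp]
      # The connector acts as an exporter in the traces pipeline...
      exporters: [spanmetrics]
    metrics:
      # ...and as a receiver in the metrics pipeline.
      receivers: [spanmetrics]
      exporters: [debug]
```

The connector must appear in both pipelines: spans flow into it from the traces pipeline and the aggregated metrics flow out of it into the metrics pipeline.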
For configuration examples on other use cases, please refer to More Examples.
The full list of settings exposed for this connector is documented in spanmetricsconnector/config.go.
Using spanmetrics with Prometheus components
The spanmetrics connector can be used with Prometheus exporter components.
For some exporter functionality, such as generation of the target_info metric, the
resource scope attributes of the incoming spans must contain the service.name and service.instance.id
attributes.
Let’s look at the example of using the spanmetrics connector with the prometheusremotewrite exporter:
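A minimal sketch of this setup follows. It assumes an OTLP receiver and a Prometheus-compatible remote write endpoint at http://localhost:9090/api/v1/write; both are illustrative choices, not requirements of the connector.

```yaml
receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  prometheusremotewrite:
    # Assumed endpoint; replace with your remote write URL.
    endpoint: http://localhost:9090/api/v1/write
    target_info:
      enabled: true

connectors:
  spanmetrics:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheusremotewrite]
```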
In this setup, the spanmetrics connector generates metrics from received spans and exports the
metrics to the Prometheus Remote Write exporter. The target_info metric will be generated for each
resource scope, while OpenTelemetry metric names and attributes will be normalized
to be compliant with Prometheus naming rules. For example, the generated calls OTel sum metric can
result in multiple Prometheus calls_total (counter type) time series and the target_info time series.
For example:
More Examples
For more example configurations covering various other use cases, please visit the testdata directory.
Known Limitation: the Single Writer Principle
Proper configuration of the spanmetrics connector ensures compliance with the Single Writer Principle,
which is a core requirement in the OpenTelemetry metrics data model. Misconfiguration, however,
may allow multiple components to write to the same metric stream, resulting in data inconsistency,
metric conflicts, or the dropping of time series by metric backends.
Why this happens
This issue typically arises when:
- Multiple pipelines use the same instance of the spanmetrics connector
- The connector is instantiated more than once without ensuring the resulting metric streams are distinct
- The resource_metrics_key_attributes field is not configured correctly or includes common/shared attributes across all instances
Recommendations
To reduce the risk of conflicting writes:
- Add resource_metrics_key_attributes to your configuration.
- Manually enable the feature gate connector.spanmetrics.includeCollectorInstanceID to produce uniquely identified metrics.
- For exporters like Prometheus, which rely on the single writer assumption, use a dedicated pipeline with a single spanmetrics connector instance.
About resource_metrics_key_attributes
The resource_metrics_key_attributes setting is used to build the key map that determines how metrics are grouped.
If this field is left empty, the connector will use all available attributes to compute the resource metric hash.
To avoid problems, be cautious when choosing which attributes to include.
Avoid attributes that:
- Change frequently – such as request_id, timestamp, or trace_id. These increase cardinality and create excessive metric streams.
- Are shared across all sources – values like true, default, or team:backend offer no uniqueness and can lead to multiple writers sharing the same stream.
- Are optional or inconsistently applied – if an attribute is only present in some spans, this can fragment metric streams (e.g., one stream with the attribute and one without).
Instead, prefer stable attributes that identify the source of the metrics, such as cluster_id, region, or deployment_environment.
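As a sketch of this guidance, the fragment below restricts the resource metrics key to a few stable attributes. The attribute names are the ones mentioned in this section plus service.name; whether they are actually present on your resources is an assumption you should verify.

```yaml
connectors:
  spanmetrics:
    resource_metrics_key_attributes:
      # Stable identifiers of the metric source (assumed to be set on every resource).
      - service.name
      - cluster_id
      - region
      - deployment_environment
```

Attributes left out of this list, such as a process id that changes on restart, no longer influence how resource metrics are grouped.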
Troubleshooting span metrics high cardinality
High cardinality issues in span metrics commonly manifest in APM dashboards as an excessive number of service operations with non-unique names. Examples include URIs with unique identifiers (e.g., GET /product/1YMWWN1N4O) or HTTP parameters with random values (e.g., GET /?_ga=GA1.2.569539246.1760114706). These patterns render operation lists difficult to interpret and ineffective for monitoring purposes.
This issue stems from violations of OpenTelemetry semantic conventions, which require span names to have low cardinality (e.g. HTTP span name specs).
Beyond degrading APM interfaces with numerous non-meaningful operation names, this problem leads to metric time series explosion, resulting in significant performance degradation and increased costs.
The span metrics connector provides an optional circuit breaker through the aggregation_cardinality_limit attribute (disabled by default) to mitigate cardinality explosion. While this feature addresses performance and cost concerns, it does not resolve the underlying issue of semantically meaningless operation names.
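Enabling the circuit breaker takes a single setting. In this sketch the limit of 1000 is an arbitrary illustrative value, not a recommendation; choose one that fits the capacity of your metrics backend.

```yaml
connectors:
  spanmetrics:
    # 0 (the default) applies no limit. Combinations beyond the limit are
    # registered under an entry with otel.metric.overflow="true".
    aggregation_cardinality_limit: 1000
```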
Fixing high cardinality span name issues
The ideal long-term solution is to modify the OpenTelemetry instrumentation code to comply with semantic conventions, preventing the generation of non-compliant high cardinality span names. However, deploying updated instrumentation libraries can be time-consuming, often requiring an immediate interim solution to restore observability backend functionality.
Addressing high cardinality span names in the ingestion pipeline
An effective short-term solution is to implement a span sanitization layer within the observability ingestion pipeline. This can be achieved by using the OpenTelemetry Collector Transform Processor's set_semconv_span_name() function immediately before the Span Metrics Connector to enforce semantic conventions on span names.
Aggressive span name sanitization may be overly restrictive for instrumentations with incomplete resource attributes. For instance, HTTP service operations may be reduced to generic names like GET and POST when HTTP spans lack the http.route attribute. This information loss can impact the monitoring of critical business operations.
To preserve operation granularity, you can manually set the http.route attribute when detailed operation names are required. The missing http.route value can typically be derived through pattern matching on other span attributes such as http.target or url.full.
Example OpenTelemetry Collector configuration that prevents cardinality explosion
while preserving meaningful operation names on a service webshop/frontend:
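The following is a sketch of such a configuration, not a definitive one. Several details are assumptions: webshop/frontend is read as service.namespace "webshop" and service.name "frontend"; the route template /product/{productId} and its matching pattern are hypothetical; and the semantic conventions version passed to set_semconv_span_name() is illustrative, so check the Transform Processor documentation for the exact signature supported by your Collector version.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  debug:

processors:
  transform/sanitize_span_names:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Hypothetical rule: derive a templated http.route for webshop/frontend
          # when the instrumentation did not set one.
          - 'set(attributes["http.route"], "/product/{productId}") where resource.attributes["service.namespace"] == "webshop" and resource.attributes["service.name"] == "frontend" and attributes["http.route"] == nil and IsMatch(attributes["http.target"], "^/product/[^/?]+")'
          # Enforce semantic conventions on span names (version argument is an assumption).
          - 'set_semconv_span_name("1.37.0")'

connectors:
  spanmetrics:
    # Safety net in case some high cardinality names still get through.
    aggregation_cardinality_limit: 1000

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Sanitize span names immediately before the connector.
      processors: [transform/sanitize_span_names]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [debug]
```

The statement that sets http.route comes first so that the subsequent span name sanitization can use the templated route instead of falling back to a generic name such as GET.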
Addressing high cardinality span names in the instrumentation code
The preferred long-term solution is to ensure span names and attributes comply with OpenTelemetry Semantic Conventions directly in the instrumentation code. Custom web frameworks are a common source of high cardinality span names. While default OpenTelemetry instrumentation (e.g., Java Servlet) may assign generic span names like GET /my-web-fwk/*, your framework has access to more specific routing information. By overwriting span attributes in your framework code, you can create compliant, low-cardinality span names that preserve operational granularity.
Example: Custom Web Framework in Java
Consider a custom web framework that intercepts the generic route /my-web-fwk/* and dispatches requests like /my-web-fwk/product/123456ABCD or /my-web-fwk/user/john.doe.
The default Java Servlet instrumentation produces vague span names (GET /my-web-fwk/*), while directly using request URIs creates high cardinality (GET /my-web-fwk/product/123456ABCD).
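The solution is to override span attributes with templated route patterns like /my-web-fwk/product/{productId} or /my-web-fwk/user/{userId}, so that the resulting span names (for example, GET /my-web-fwk/product/{productId}) stay both specific and low-cardinality.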
Last generated: 2026-04-13