# Loadbalancing Exporter

Distributions: contrib, k8s
Maintainers: @rlankfo
Source: opentelemetry-collector-contrib

## Overview
This is an exporter that will consistently export spans and metrics to the same backend, depending on the `routing_key` configured. Logs are exported based on the `traceID` (if present) or an auto-generated `traceID`; setting the `routing_key` for logs therefore has no effect.

The options for `routing_key` are: `service`, `traceID`, `metric` (metric name), `resource`, `streamID`, and `attributes`.
| routing_key | can be used for |
|---|---|
| service | spans, metrics |
| traceID | spans |
| resource | metrics |
| metric | metrics |
| streamID | metrics |
| attributes | spans, metrics |
If no `routing_key` is configured, the default routing mechanism is `traceID` for traces and `service` for metrics. This means that spans belonging to the same `traceID` (or the same `service.name`, when `service` is used as the `routing_key`) will be sent to the same backend.
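As a sketch of how the routing key is selected, the following fragment follows the property names documented in this section; the endpoint names and the `protocol`/`otlp` nesting are illustrative placeholders:

```yaml
exporters:
  loadbalancing:
    # Route all spans with the same service.name to the same backend.
    # Omit routing_key to get the defaults: traceID for traces, service for metrics.
    routing_key: "service"
    protocol:
      otlp:
        # endpoint is intentionally omitted; the exporter fills it in per backend.
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317
```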
It requires a source of backend information to be provided: static, with a fixed list of backends, or DNS, with a hostname that will resolve to all IP addresses to use (such as a Kubernetes headless service). The DNS resolver will periodically check for updates.
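A minimal sketch of the DNS-backed variant, assuming a Kubernetes headless service whose A records list one IP per backend pod (the hostname is a placeholder):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      # The resolver periodically re-queries this hostname and
      # rebuilds the backend list from the returned IP addresses.
      dns:
        hostname: otelcol-backends.observability.svc.cluster.local
        port: 4317
```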
Note that only the trace ID or service name is used to decide which backend receives the data: the actual backend load isn’t taken into consideration. Even though this load balancer won’t do round-robin balancing of the batches, the load distribution should be very similar across backends, with a standard deviation under 5% at the current configuration.
This load balancer is especially useful for backends configured with tail-based samplers or red-metrics-collectors, which make a decision based on the view of the full trace.
When a list of backends is updated, some of the signals will be rerouted to different backends.
Around R/N of the “routes” will be rerouted differently, where:
- A “route” is either a trace ID or a service name mapped to a certain backend.
- “R” is the total number of routes.
- “N” is the total number of backends.
Consider using this exporter together with the `groupbytrace` processor. This way, traces are dispatched atomically to this exporter, and the same decision about the backend is made for the trace as a whole.
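A sketch of a traces pipeline combining the two components; the `wait_duration` value is an illustrative choice, not a recommendation:

```yaml
processors:
  groupbytrace:
    # Buffer spans until the trace is assumed complete, so the whole
    # trace is handed to the loadbalancing exporter at once.
    wait_duration: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbytrace]
      exporters: [loadbalancing]
```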
This exporter also supports service-name-based routing for traces. If you have two or more collectors that collect traces and then use the `spanmetrics` connector to generate metrics and push them to Prometheus, there is a high chance of label collisions in Prometheus when routing is based on `traceID`, because every collector sees every service+operation label combination. With service-name-based routing, each collector sees only its own set of service names and can push metrics without any label collisions.
## Resilience and scaling considerations
The `loadbalancing` exporter will, irrespective of the chosen resolver (`static`, `dns`, `k8s`), create one OTLP exporter per endpoint. Each level of exporters, the `loadbalancing` exporter itself and all sub-exporters (one per endpoint), has its own queue, timeout, and retry mechanisms. Importantly, the `loadbalancing` exporter will, by default, NOT attempt to re-route data to a healthy endpoint on delivery failure, because its in-memory queue, retry, and timeout settings are disabled by default (more details on queuing, retry and timeout default settings).
- For all types of resolvers (`static`, `dns`, `k8s`): if one of the endpoints is unavailable, the queue, retry, and timeout settings defined for the sub-exporters (under the `otlp` property) apply first. Once redelivery is exhausted at the sub-exporter level, and resiliency options 1 are enabled, telemetry data returns to the `loadbalancing` exporter itself and is redelivered according to the exporter-level queue, retry, and timeout settings.
- When using the `static` resolver, there’s a risk of data loss if one of the defined endpoint targets becomes unavailable and both resiliency options 2 and resiliency options 1 (if enabled) are exhausted. This limitation stems from the nature of the `static` resolver, where endpoints are configured manually and remain fixed. In contrast, resolvers like `dns` and `k8s` automatically update the list of available endpoints, offering greater flexibility and resilience.
- When using `k8s`, `dns`, and likely future resolvers, topology changes are eventually reflected in the `loadbalancing` exporter. The `k8s` resolver updates more quickly than `dns`, but there remains a window of time in which the true topology doesn’t match the view of the `loadbalancing` exporter.
- Resiliency options 1 (`timeout`, `retry_on_failure`, and `sending_queue` settings in the `loadbalancing` section) are useful in highly elastic environments (like Kubernetes), where the list of resolved endpoints changes frequently due to deployments, scale-up, or scale-down events. In case of a permanent change to the list of resolved endpoints, these options provide the ability to re-route data to a new set of healthy backends. Disabled by default.
- Resiliency options 2 (`timeout`, `retry_on_failure`, and `sending_queue` settings in the `otlp` section) are useful for temporary problems with a specific backend, like network flakes. A persistent queue is NOT supported here, as all sub-exporters share the same `sending_queue` configuration, including `storage`. Enabled by default.
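The two resiliency levels described above can be sketched in one configuration; the concrete values are illustrative, and the resolver hostname is a placeholder:

```yaml
exporters:
  loadbalancing:
    # Resiliency options 1: exporter-level settings, disabled by default.
    # Enable these so data can be re-routed to newly resolved backends
    # after sub-exporter redelivery is exhausted.
    timeout: 10s
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      queue_size: 1000
    protocol:
      otlp:
        # Resiliency options 2: per-backend settings, enabled by default.
        # These handle temporary problems with a specific endpoint.
        timeout: 5s
        retry_on_failure:
          enabled: true
    resolver:
      dns:
        hostname: otelcol-backends.observability.svc.cluster.local
```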
Multiple instances of the `loadbalancing` exporter can run in parallel. The consistent hashing mechanism will ensure a deterministic result between instances sharing the same configuration and resolving the exact same list of backend endpoints.
## Configuration
Refer to config.yaml for detailed examples on using the exporter.

- The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set; it will be overridden by this exporter with the backend endpoint.
- The `resolver` accepts a `static` node, a `dns` node, a `k8s` service node, or `aws_cloud_map`. If all four are specified, an `errMultipleResolversProvided` error will be thrown.
- The `hostname` property inside a `dns` node specifies the hostname to query in order to obtain the list of IP addresses.
- The `dns` node also accepts the following optional properties:
  - `hostname`: DNS hostname to resolve.
  - `port`: port to be used for exporting the traces to the IP addresses resolved from `hostname`. If `port` is not specified, the default port 4317 is used.
  - `interval`: resolver interval in Go duration format, e.g. `5s`, `1d`, `30m`. If not specified, `5s` will be used.
  - `timeout`: resolver timeout in Go duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used.
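The `dns` options above can be sketched together; the hostname is a placeholder, and the values shown are simply the documented defaults:

```yaml
resolver:
  dns:
    hostname: otelcol-backends.example.com  # hostname to resolve (placeholder)
    port: 4317                              # default: 4317
    interval: 5s                            # default: 5s
    timeout: 1s                             # default: 1s
```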
- The `k8s` node accepts the following optional properties:
  - `service`: Kubernetes service to resolve, e.g. `lb-svc.lb-ns`. If no namespace is specified, an attempt will be made to infer the namespace for this collector, and if this fails it will fall back to the `default` namespace.
  - `ports`: port to be used for exporting the traces to the addresses resolved from `service`. If `ports` is not specified, the default port 4317 is used. When multiple ports are specified, two backends are added to the load balancer as if they were at different pods.
  - `timeout`: resolver timeout in Go duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1m` will be used.
  - `return_hostnames`: will return hostnames instead of IPs. This is useful in certain situations like using Istio in sidecar mode. To use this feature, the `service` must be a headless `Service`, pointing at a `StatefulSet`, and the `service` must be what is specified under `.spec.serviceName` in the `StatefulSet`.
  - RBAC requirement: the Collector pod must run with a service account that is allowed to `get`, `list`, and `watch` `discovery.k8s.io/v1` `EndpointSlice` objects in the target namespace; otherwise the resolver cache remains empty and the exporter logs `couldn't find the exporter for the endpoint ""`.
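The RBAC requirement above can be satisfied with a Role like the following sketch (the Role name is a placeholder, and a matching RoleBinding to the collector's service account is also needed):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: otelcol-endpointslice-reader  # placeholder name
  namespace: lb-ns                    # namespace of the resolved service
rules:
  # Grant read access to EndpointSlice objects so the k8s resolver
  # can discover the backend endpoints of the target service.
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```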
- The `aws_cloud_map` node accepts the following properties:
  - `namespace`: the Cloud Map namespace where the service is registered, e.g. `cloudmap`. If no `namespace` is specified, the load balancer exporter will fail to start.
  - `service_name`: the name of the service that you specified when you registered the instance, e.g. `otelcollectors`. If no `service_name` is specified, the load balancer exporter will fail to start.
  - `interval`: resolver interval in Go duration format, e.g. `5s`, `1d`, `30m`. If not specified, `30s` will be used.
  - `timeout`: resolver timeout in Go duration format, e.g. `5s`, `1d`, `30m`. If not specified, `5s` will be used.
  - `port`: port to be used for exporting the traces to the addresses resolved from `service`. By default, the port is set in Cloud Map, but it can be overridden with a static value in this config.
  - `health_status`: filter in AWS Cloud Map; you can specify the health status of the instances that you want to discover. The `health_status` filter is optional and allows you to query based on the health status of the instances.
    - Available values are:
      - `HEALTHY`: only return instances that are healthy.
      - `UNHEALTHY`: only return instances that are unhealthy.
      - `ALL`: return all instances, regardless of their health status.
      - `HEALTHY_OR_ELSE_ALL`: return healthy instances, unless none are reporting a healthy state. In that case, return all instances. This is also called failing open.
    - The resolver’s default filter is set to `HEALTHY` when none is explicitly defined.
  - Notes:
    - This resolver currently returns a maximum of 100 hosts.
    - TODO: Feature request 29771 aims to cover pagination for this scenario.
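Putting the `aws_cloud_map` properties above together as a sketch (the namespace and service name echo the examples from this section; the other values are the documented defaults):

```yaml
resolver:
  aws_cloud_map:
    namespace: cloudmap          # Cloud Map namespace (required)
    service_name: otelcollectors # registered service name (required)
    health_status: HEALTHY       # default filter when omitted
    interval: 30s                # default: 30s
    timeout: 5s                  # default: 5s
```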
- The `routing_key` property is used to specify how to route values (spans or metrics) to exporters based on different parameters. This functionality is currently enabled only for `trace` and `metric` pipeline types. It supports one of the following values:
  - `service`: routes values based on their service name. This is useful when using processors like span metrics, so all spans for each service are sent to consistent collector instances for metric collection. Otherwise, metrics for the same services are sent to different collectors, making aggregations inaccurate.
  - `attributes`: routes based on attribute values. This is similar to `service`, but useful for situations in which a single service overwhelms any given instance of the collector and should be split over multiple collectors. For traces, resource / scope / span attributes plus `span.kind` and `span.name` (top-level span fields) are supported. For metrics, resource / scope / datapoint attributes are supported.
  - `traceID`: routes spans based on their `traceID`. Invalid for metrics.
  - `metric`: routes metrics based on their metric name. Invalid for spans.
  - `streamID`: routes metrics based on their datapoint `streamID`. That’s the unique hash of all its attributes, plus the attributes and identifying information of its resource, scope, and metric data.
- The `loadbalancing` exporter supports the standard set of queuing, retry, and timeout settings, but they are disabled by default to maintain compatibility.
- The `routing_attributes` property lists the attributes that should be used when the `routing_key` is `attributes`.
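A sketch of attribute-based routing combining `routing_key` and `routing_attributes`; the attribute names are illustrative choices, not requirements:

```yaml
exporters:
  loadbalancing:
    routing_key: "attributes"
    # Values of these attributes are hashed to pick a backend,
    # splitting a single busy service across multiple collectors.
    routing_attributes:
      - service.name
      - span.kind
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317
```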
> [!IMPORTANT]
> The k8s resolver requires proper permissions. See the full example for more information.
## Metrics
The following metrics are recorded by this exporter:

- `otelcol_loadbalancer_num_resolutions` represents the total number of resolutions performed by the resolver specified in the tag `resolver`, split by their outcome (`success=true|false`). For the static resolver, this should always be `1` with the tag `success=true`.
- `otelcol_loadbalancer_num_backends` informs how many backends are currently in use. It should always match the number of items specified in the configuration file when the `static` resolver is used, and should eventually (within seconds) catch up with DNS changes. Note that DNS caches that might exist between the load balancer and the record authority will influence how long it takes for the load balancer to see the change.
- `otelcol_loadbalancer_num_backend_updates` records how many of the resolutions resulted in a new list of backends. Use this information to understand how frequent your backend updates are and how often the ring is rebalanced. If the DNS hostname always returns the same list of IP addresses but this metric keeps increasing, it might indicate a bug in the load balancer.
- `otelcol_loadbalancer_backend_latency` measures the latency for each backend.
- `otelcol_loadbalancer_backend_outcome` counts the outcomes for each endpoint, `success=true|false`.
### Attributes
| Attribute Name | Description | Type | Values |
|---|---|---|---|
| endpoint | The endpoint of the backend | string | |
| resolver | Resolver used | string | aws, dns, k8s, static |
| success | Whether an outcome was successful | bool | |
## Example Configuration
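A minimal end-to-end sketch using the `static` resolver; the backend hostnames are placeholders, and the `protocol`/`otlp` nesting follows the properties documented above:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  loadbalancing:
    # Spans of the same trace are consistently hashed to the same backend,
    # which is what tail-based samplers downstream need.
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```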
Last generated: 2026-04-13