Cardinalityguardian Processor
Supported Telemetry
Overview
An OpenTelemetry Collector processor that catches metric cardinality explosions before they reach your TSDB. It strips only the exploding label, not the entire data point, so your dashboards keep working while the explosion is neutralized.
What it does
A code change introduces raw exception strings into error.type. Yesterday that label had 5 unique values. Today it has 50,000 and climbing. Your TSDB bill noticed before you did.
This processor sits in your OTel pipeline and detects labels with abnormal growth. It either strips them (enforcement mode) or tags them for routing (tag-only mode). The metric stays intact — only the bad label is removed.
region and status survive. Your latency dashboards keep working. The 50,000 unique exception strings are gone.
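The effect on a single data point can be sketched in Go with a plain map standing in for the Collector's attribute type. This is a minimal illustration, not the processor's code: `Attrs`, `stripLabel`, and `tagOverflow` are hypothetical names, the attribute values are made up, and the overflow marker is stored as a string only because the stand-in map holds strings.

```go
package main

import "fmt"

// Attrs is a simplified stand-in for a data point's attribute map (assumption).
type Attrs map[string]string

// stripLabel returns a copy of attrs without the offending key,
// mirroring what enforcement mode does to the exploding label.
func stripLabel(attrs Attrs, key string) Attrs {
	out := make(Attrs, len(attrs))
	for k, v := range attrs {
		if k != key {
			out[k] = v
		}
	}
	return out
}

// tagOverflow returns a copy of attrs with the overflow marker added,
// mirroring tag-only mode: every original label is left in place.
func tagOverflow(attrs Attrs) Attrs {
	out := make(Attrs, len(attrs)+1)
	for k, v := range attrs {
		out[k] = v
	}
	out["otel.metric.overflow"] = "true"
	return out
}

func main() {
	point := Attrs{
		"region":     "us-east-1",
		"status":     "500",
		"error.type": "timeout after 3021ms talking to shard 17",
	}
	fmt.Println(stripLabel(point, "error.type")) // region and status remain
	fmt.Println(tagOverflow(point))              // all labels plus the marker
}
```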
How it works
Key design decisions:
- Delta-based detection, not absolute thresholds. A label with 50K stable values is fine. A label that grew by 100 in the last epoch is a problem. The processor tracks growth rate using dual-epoch HyperLogLog++ sketches, so legitimate high-cardinality metrics aren’t penalized.
- 256-way sharding. Each shard has its own RWMutex. With 50 concurrent goroutines spread uniformly across 256 shards, average occupancy is ~0.2 per shard (50/256), so contention is near zero. Shard selection is hash & 0xFF, a single bitwise AND.
- HLL++ with ~2KB per tracker. Each sketch can estimate cardinality within that budget whether 100 or 100M unique values have been observed, with a typical error of 1-2%. The axiomhq/hyperloglog library’s InsertHash(uint64) path avoids allocation on the hot path.
- Stale eviction. Trackers that haven’t been seen for two epochs are cleaned up. Memory stays bounded.
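The design decisions above can be sketched together in a few dozen lines of Go. This is an illustration under stated assumptions, not the processor's implementation: exact hash sets stand in for the HLL++ sketches, FNV-1a from the standard library stands in for the real hash, the delta is taken to be "values seen this epoch that were absent from the previous one", and `Guardian`, `Observe`, `Delta`, and `Rotate` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// set is an exact stand-in for an HLL++ sketch (assumption of this sketch).
type set map[uint64]struct{}

// tracker holds two epochs of values for one metric+label pair.
type tracker struct{ prev, cur set }

type shard struct {
	mu       sync.RWMutex
	trackers map[string]*tracker
}

// Guardian (hypothetical name) spreads trackers over 256 shards.
type Guardian struct{ shards [256]shard }

func NewGuardian() *Guardian {
	g := &Guardian{}
	for i := range g.shards {
		g.shards[i].trackers = map[string]*tracker{}
	}
	return g
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// shardFor selects a shard with one bitwise AND on the key hash.
func (g *Guardian) shardFor(key string) *shard { return &g.shards[hash64(key)&0xFF] }

// Observe records one label value in the current epoch.
func (g *Guardian) Observe(metric, label, value string) {
	key := metric + "\x00" + label
	sh := g.shardFor(key)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	t, ok := sh.trackers[key]
	if !ok {
		t = &tracker{prev: set{}, cur: set{}}
		sh.trackers[key] = t
	}
	t.cur[hash64(value)] = struct{}{}
}

// Delta counts values seen this epoch that were absent in the previous one.
func (g *Guardian) Delta(metric, label string) int {
	key := metric + "\x00" + label
	sh := g.shardFor(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	n := 0
	if t := sh.trackers[key]; t != nil {
		for v := range t.cur {
			if _, seen := t.prev[v]; !seen {
				n++
			}
		}
	}
	return n
}

// Rotate ends the epoch and evicts trackers unseen for two epochs in a row.
func (g *Guardian) Rotate() {
	for i := range g.shards {
		sh := &g.shards[i]
		sh.mu.Lock()
		for key, t := range sh.trackers {
			if len(t.prev)+len(t.cur) == 0 { // both epochs empty: stale
				delete(sh.trackers, key)
				continue
			}
			t.prev, t.cur = t.cur, set{}
		}
		sh.mu.Unlock()
	}
}

func main() {
	g := NewGuardian()
	for i := 0; i < 1000; i++ { // first epoch: 1,000 stable values
		g.Observe("http.latency", "user.id", fmt.Sprint(i))
	}
	g.Rotate()
	for i := 0; i < 1100; i++ { // next epoch: same values plus 100 new ones
		g.Observe("http.latency", "user.id", fmt.Sprint(i))
	}
	fmt.Println("delta:", g.Delta("http.latency", "user.id"))
}
```

In this sketch the 1,000 values carried over from the previous epoch contribute nothing to the delta; only the 100 newcomers count. Note that the first epoch of any tracker treats every value as new, so a real deployment would need some warm-up handling that this sketch leaves out.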
Comparison with existing processors
| | Cardinality Guardian | filterprocessor | metricstransformprocessor |
|---|---|---|---|
| Detection | Dynamic (growth rate) | Static allow/deny lists | Static rules |
| Granularity | Per-label | Per-metric (drops entire metric) | Per-metric |
| False positives on stable high-cardinality | No (delta-based) | Yes (if above threshold) | Yes |
| Tag-only mode | Yes | No | No |
| Per-metric overrides | Yes | N/A | N/A |
| Top-N offender reporting | Yes | No | No |
| Memory per tracker | ~2KB (HLL++) | N/A | N/A |
filterprocessor and metricstransformprocessor are configuration-driven: you tell them what to drop. This processor is data-driven: it figures out what to drop based on observed behavior. The use cases are complementary, not competing.
Configuration
The Cardinality Guardian processor detects labels with abnormal growth using HyperLogLog++ sketches. It measures the delta (new unique values per epoch), not absolute cardinality, so stable high-cardinality metrics are never penalized. Three enforcement modes control what happens when a label exceeds the threshold:
Enforcement Modes
Tag Only
Sets otel.metric.overflow: true on data points that exceed the threshold. No data is modified, which makes this the safest mode and the one recommended for initial deployment.
Note: tag_only does not protect your TSDB on its own — high-cardinality labels still reach your backend unchanged. You must pair it with a downstream routing processor to split tagged metrics to cheap storage.
Overflow Attribute
otel.cardinality_overflow and performs inline spatial reaggregation to merge data points that now share this identity, resolving the Single-Writer violation. This mode is aligned with the OTel SDK cardinality overflow convention. Supported and unsupported metric types follow the exact same reaggregation rules and fallbacks as strip_and_reaggregate.
Strip and Reaggregate
The strip_and_reaggregate and overflow_attribute modes both use the spatial reaggregation engine to resolve the Single-Writer violation for supported metric types:
| Metric Type | Reaggregation Support | Merge Semantics |
|---|---|---|
| Delta Sum | ✅ Supported | Values are summed; timestamps span the union |
| Gauge | ✅ Supported | Last-value-wins by timestamp |
| Cumulative Sum | ⚠️ Falls back to tag_only | Requires stateful tracking (not yet supported) |
| Histogram | ⚠️ Falls back to tag_only | Bucket merging requires careful alignment |
| ExponentialHistogram | ⚠️ Falls back to tag_only | Scale alignment not yet implemented |
| Summary | ⚠️ Falls back to tag_only | Quantile merging is mathematically invalid |
Note: For metric types that don’t yet support reaggregation, the processor automatically falls back to tag_only behavior with an otel.metric.overflow tag, ensuring no data corruption.
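The two supported merge rules in the table can be illustrated with a short Go sketch. It assumes a simplified `Point` struct in place of the Collector's data point types, and `identity`, `reaggregate`, `mergeDeltaSum`, and `mergeGauge` are hypothetical helpers written for this example; for gauges, the end timestamp is assumed to be the one that decides which value wins.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Point is a simplified data point (assumption, not a pdata type).
type Point struct {
	Attrs      map[string]string
	Start, End int64 // timestamps, e.g. Unix nanoseconds
	Value      float64
}

// identity builds a canonical key from the attributes that remain
// after the offending label has been stripped.
func identity(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k + "\x00" + attrs[k] + "\x00")
	}
	return b.String()
}

// mergeDeltaSum adds the values and widens the time span to the union.
func mergeDeltaSum(dst *Point, src Point) {
	dst.Value += src.Value
	if src.Start < dst.Start {
		dst.Start = src.Start
	}
	if src.End > dst.End {
		dst.End = src.End
	}
}

// mergeGauge keeps whichever point has the latest end timestamp.
func mergeGauge(dst *Point, src Point) {
	if src.End >= dst.End {
		*dst = src
	}
}

// reaggregate collapses points that share an identity using merge.
func reaggregate(points []Point, merge func(dst *Point, src Point)) []Point {
	index := map[string]int{}
	var out []Point
	for _, p := range points {
		id := identity(p.Attrs)
		if i, ok := index[id]; ok {
			merge(&out[i], p)
			continue
		}
		index[id] = len(out)
		out = append(out, p)
	}
	return out
}

func main() {
	// Two points that collide once the exploding label is gone.
	points := []Point{
		{Attrs: map[string]string{"region": "eu"}, Start: 10, End: 20, Value: 1},
		{Attrs: map[string]string{"region": "eu"}, Start: 5, End: 15, Value: 2},
	}
	fmt.Println(reaggregate(points, mergeDeltaSum))
	fmt.Println(reaggregate(points, mergeGauge))
}
```

The same grouping step serves both metric types; only the merge function differs, which is also why types without a safe merge function have to fall back rather than be collapsed.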
Warnings
Care needs to be taken when modifying data point attributes using this processor:
- Single-Writer Conflict: strip_and_reaggregate and overflow_attribute both intentionally cause attribute identity collisions and rely on inline spatial reaggregation to merge them. Reaggregation is only safe for Delta Sums and Gauges; Cumulative Sums, Histograms, ExponentialHistograms, and Summaries require stateful tracking that this processor doesn’t perform, so they fall back to tag_only regardless of the configured mode. overflow_attribute is not a valid fallback for those types: collapsing multiple cumulative streams (or merging histogram buckets across mis-aligned scales) is what reaggregation has to solve in the first place.
- Hash Collisions: Identity hashing uses xxhash with per-pcommon.ValueType dispatch and a multiplicative key/value mix to remain order-independent. Theoretical hash collisions are possible but vanishingly rare for typical attribute maps.
- Future work: Supporting reaggregation across multiple OTel SDK strategies (metrics SDK aggregation), e.g. Cumulative Sum reaggregation with stateful tracking or Histogram bucket-aligned merging, is out of scope for this PR but tracked as a follow-up.
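One way to get an order-independent identity hash is to mix each key/value pair on its own and then combine the pairs with a commutative operation. The sketch below is an assumption-laden illustration of that idea, not the processor's hashing code: it uses FNV-1a from the standard library instead of xxhash, handles string values only (no per-type dispatch), and the mixing constant and the name `identityHash` are chosen for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// mix is an odd 64-bit constant, so multiplying by it is a bijection
// on uint64 (illustrative choice, not taken from the processor).
const mix uint64 = 0x9E3779B97F4A7C15

// h64 hashes a string with FNV-1a, a stand-in for xxhash here.
func h64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// identityHash mixes each key/value pair asymmetrically, then adds the
// per-pair results. Addition is commutative, so the random iteration
// order of Go maps does not change the outcome.
func identityHash(attrs map[string]string) uint64 {
	var sum uint64
	for k, v := range attrs {
		pair := (h64(k) * mix) ^ h64(v) // key and value play different roles
		sum += pair * mix
	}
	return sum
}

func main() {
	a := map[string]string{"region": "eu", "status": "200"}
	b := map[string]string{"status": "200", "region": "eu"}
	fmt.Println(identityHash(a) == identityHash(b)) // prints true
}
```

Because no sorting or intermediate key slice is needed, this style of hash suits a hot path; the price is the small collision risk noted in the warning above.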
Troubleshooting
The processor emits internal metrics via the OTel SDK to help troubleshoot cardinality explosions:
| Metric | Type | Description |
|---|---|---|
| processor_cardinality_trackers.active | Gauge | Current tracked metric+label pairs across all shards |
| processor_cardinality_labels.stripped | Counter | Attributes stripped or tagged per data point. Use rate() for spike detection. |
| processor_cardinality_top.offenders | Gauge | Top N highest-delta trackers with metric_name and label_key attributes |
| processor_cardinality_trackers.rejected | Counter | Trackers rejected after hitting max_tracker_count |
| processor_cardinality_savings.estimated | Counter | Dollar value of series prevented from reaching your TSDB |
To find the labels driving an explosion, check the processor_cardinality_top.offenders internal metric on your collector’s /metrics endpoint.
Last generated: 2026-06-01