Isolationforest Processor
contrib
Maintainers: @atoulme
Source: opentelemetry-collector-contrib
Supported Telemetry: traces, metrics, logs
Overview
✨ Key Features
| Capability | Description |
|---|---|
| Real-time Isolation Forest | Builds an ensemble of random trees over a sliding window of recent data and assigns a 0–1 anomaly score on ingestion (≈ O(log n) per point). |
| Multi-signal support | Can be inserted into traces, metrics, and logs pipelines; one config powers all three. |
| Per-entity modelling | The features config lets you maintain a separate model per unique combination of resource/attribute keys (e.g. per-pod, per-service). |
| Adaptive Window Sizing | Automatically adjusts window size based on traffic patterns, memory usage, and model stability for optimal performance and resource utilization. |
| Flexible output | • Add an attribute iforest.is_anomaly=true • Emit a gauge metric iforest.anomaly_score • Drop anomalous telemetry entirely. |
| Config-driven | Tune tree count, subsample size, contamination rate, sliding-window length, retraining interval, target metrics, and more, all in collector.yml. |
| Zero external deps | Pure Go implementation; runs wherever the Collector does (edge, gateway, or backend). |
⚙️ How it Works
- Training window: The processor keeps up to window_size of the most recent data points for every feature group.
- Periodic (re-)training: Every training_interval, it draws subsample_size points from that window and grows forest_size random isolation trees.
- Scoring: Each new point is pushed through the forest. A shorter average path length means a higher anomaly score.
- Adaptive sizing: When enabled, window size automatically adjusts based on traffic velocity, memory usage, and model stability.
- Post-processing:
  - If add_anomaly_score: true, a gauge metric iforest.anomaly_score is emitted with identical attributes/timestamp.
  - If the score ≥ anomaly_threshold, the original span/metric/log is flagged with iforest.is_anomaly=true.
  - If drop_anomalous_data: true, flagged items are removed from the batch instead of being forwarded.
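The training and scoring steps can be sketched in Go as follows. This is a minimal, univariate sketch of the classic isolation-forest recipe and not the processor's source. It uses random split values, a per-tree height limit of ceil(log2 ψ), and the score 2^(-E[h(x)]/c(ψ)). It assumes each data point is a single number, such as a span duration. The helper names (train, score, build, pathLength, c) are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// node is one node of a univariate isolation tree. Leaves have nil children.
type node struct {
	split       float64
	left, right *node
	size        int // training points that ended in this leaf
}

// c(n) is the average path length of an unsuccessful BST search over n
// points; it normalises path lengths and adjusts for unresolved leaves.
func c(n int) float64 {
	switch {
	case n <= 1:
		return 0
	case n == 2:
		return 1
	}
	const eulerGamma = 0.5772156649
	fn := float64(n)
	return 2*(math.Log(fn-1)+eulerGamma) - 2*(fn-1)/fn
}

// build grows one tree by choosing random split values until points are
// isolated or the height limit is reached.
func build(xs []float64, depth, limit int, rng *rand.Rand) *node {
	if depth >= limit || len(xs) <= 1 {
		return &node{size: len(xs)}
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		lo, hi = math.Min(lo, x), math.Max(hi, x)
	}
	if lo == hi { // identical values cannot be split further
		return &node{size: len(xs)}
	}
	split := lo + rng.Float64()*(hi-lo)
	var l, r []float64
	for _, x := range xs {
		if x < split {
			l = append(l, x)
		} else {
			r = append(r, x)
		}
	}
	return &node{split: split, left: build(l, depth+1, limit, rng), right: build(r, depth+1, limit, rng)}
}

// pathLength follows x down a tree; a leaf adds c(size) for its unresolved points.
func pathLength(n *node, x float64, depth int) float64 {
	if n.left == nil {
		return float64(depth) + c(n.size)
	}
	if x < n.split {
		return pathLength(n.left, x, depth+1)
	}
	return pathLength(n.right, x, depth+1)
}

type forest struct {
	trees []*node
	psi   int // subsample size actually used per tree
}

// train grows forestSize trees, each from a subsample drawn without replacement.
func train(window []float64, forestSize, subsampleSize int, seed int64) forest {
	rng := rand.New(rand.NewSource(seed))
	psi := subsampleSize
	if psi > len(window) {
		psi = len(window)
	}
	f := forest{psi: psi}
	if psi < 2 {
		return f // not enough data to train yet
	}
	limit := int(math.Ceil(math.Log2(float64(psi))))
	for t := 0; t < forestSize; t++ {
		sample := make([]float64, psi)
		for i, j := range rng.Perm(len(window))[:psi] {
			sample[i] = window[j]
		}
		f.trees = append(f.trees, build(sample, 0, limit, rng))
	}
	return f
}

// score maps the mean path length to (0, 1]; shorter paths give higher scores.
func (f forest) score(x float64) float64 {
	if len(f.trees) == 0 {
		return 0 // untrained model: report "not anomalous"
	}
	var sum float64
	for _, t := range f.trees {
		sum += pathLength(t, x, 0)
	}
	return math.Pow(2, -(sum/float64(len(f.trees)))/c(f.psi))
}

func main() {
	window := make([]float64, 1000)
	for i := range window {
		window[i] = 95 + float64(i%100)/10 // durations clustered in [95, 105)
	}
	f := train(window, 100, 256, 42)
	fmt.Printf("typical: %.3f  outlier: %.3f\n", f.score(100), f.score(10000))
}
```

In this sketch, rand.Perm costs O(window_size) per tree. A real implementation might draw the subsample with a partial Fisher–Yates shuffle or reservoir sampling instead.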
Contamination rate: instead of hard-coding anomaly_threshold, you can supply contamination_rate (the expected fraction of outliers). The processor then auto-derives a dynamic threshold equal to the (1 - contamination_rate) quantile of recent scores.

Performance is linear in forest_size and logarithmic in window_size; a default of 100 trees and a 1k-point window easily sustains 10k–50k points/s on a vCPU.
🔧 Configuration
| Field | Type | Default | Notes |
|---|---|---|---|
| forest_size | int | 100 | Number of trees in the ensemble. Higher → smoother scores, more CPU. |
| subsample_size | int | 256 | Rows sampled to build each tree. Must be ≤ window_size. |
| window_size | int | 1000 | Sliding window of recent data maintained per feature group. |
| contamination_rate | float (0–1) | 0.10 | Fraction of points expected to be outliers; used to auto-tune the threshold. |
| anomaly_threshold | float (0–1) | derived | Manual override: score ≥ this → anomaly. Ignored if contamination_rate is set. |
| training_interval | duration | 5m | Model is retrained no sooner than this interval. |
| features | []string | [] | Resource/attribute keys that define grouping. Blank → single global model. |
| metrics_to_analyze | []string | [] | Only these metric names are scored (metrics pipeline only). Blank → all. |
| add_anomaly_score | bool | false | Emit the iforest.anomaly_score metric. |
| drop_anomalous_data | bool | false | Remove anomalous items from the batch instead of forwarding them. |
| adaptive_window | object | null | Enables adaptive window sizing (see Adaptive Window Configuration below). |
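The Go sketch below makes the defaults and constraints in this table concrete. It defines a struct that mirrors the documented fields and adds a validate method. The type and method names are hypothetical and are not the processor's actual Config type. Treating an AnomalyThreshold of 0 as "derive from contamination_rate" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// AdaptiveWindowConfig mirrors the adaptive_window block (hypothetical type).
type AdaptiveWindowConfig struct {
	Enabled                bool
	MinWindowSize          int
	MaxWindowSize          int
	MemoryLimitMB          int
	AdaptationRate         float64
	VelocityThreshold      float64
	StabilityCheckInterval time.Duration
}

// Config mirrors the documented fields; it is not the processor's real type.
type Config struct {
	ForestSize        int
	SubsampleSize     int
	WindowSize        int
	ContaminationRate float64
	AnomalyThreshold  float64 // 0 = derive from ContaminationRate (assumption)
	TrainingInterval  time.Duration
	Features          []string
	MetricsToAnalyze  []string
	AddAnomalyScore   bool
	DropAnomalousData bool
	AdaptiveWindow    *AdaptiveWindowConfig // nil = adaptive sizing disabled
}

// defaultConfig returns the documented defaults.
func defaultConfig() Config {
	return Config{
		ForestSize:        100,
		SubsampleSize:     256,
		WindowSize:        1000,
		ContaminationRate: 0.10,
		TrainingInterval:  5 * time.Minute,
	}
}

// validate enforces the documented constraints.
func (c Config) validate() error {
	switch {
	case c.ForestSize <= 0:
		return errors.New("forest_size must be > 0")
	case c.SubsampleSize <= 0 || c.SubsampleSize > c.WindowSize:
		return errors.New("subsample_size must be > 0 and <= window_size")
	case c.ContaminationRate < 0 || c.ContaminationRate > 1:
		return errors.New("contamination_rate must be within [0, 1]")
	case c.AnomalyThreshold < 0 || c.AnomalyThreshold > 1:
		return errors.New("anomaly_threshold must be within [0, 1]")
	}
	if aw := c.AdaptiveWindow; aw != nil && aw.Enabled {
		if aw.MinWindowSize <= 0 || aw.MinWindowSize > aw.MaxWindowSize {
			return errors.New("adaptive_window: need 0 < min_window_size <= max_window_size")
		}
		if aw.AdaptationRate < 0 || aw.AdaptationRate > 1 {
			return errors.New("adaptive_window: adaptation_rate must be within [0, 1]")
		}
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	fmt.Println("defaults valid:", cfg.validate() == nil)
	cfg.SubsampleSize = 2000 // larger than window_size (1000)
	fmt.Println("oversized subsample:", cfg.validate())
}
```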
Adaptive Window Configuration
When enabled, the processor automatically adjusts window size based on traffic patterns and resource constraints:
| Field | Type | Default | Notes |
|---|---|---|---|
| enabled | bool | false | Enable adaptive window sizing. |
| min_window_size | int | 1000 | Minimum window size (safety bound). |
| max_window_size | int | 100000 | Maximum window size (memory protection). |
| memory_limit_mb | int | 256 | Shrink window when memory usage exceeds this limit. |
| adaptation_rate | float | 0.1 | Rate of window size changes (0.0–1.0). |
| velocity_threshold | float | 50.0 | Samples/sec threshold for triggering window growth. |
| stability_check_interval | duration | 5m | How often to evaluate model stability for expansion. |
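The adaptation rules are only described qualitatively, so the Go sketch below is one plausible reading and not the processor's algorithm. It shrinks the window by adaptation_rate when memory exceeds memory_limit_mb. It grows the window by adaptation_rate when velocity exceeds velocity_threshold and the model is judged stable. The result is always clamped to [min_window_size, max_window_size]. The multiplicative step and the priority of memory pressure over growth are assumptions. The names adaptiveParams and nextWindowSize are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// adaptiveParams holds the adaptive_window knobs used here (hypothetical type).
type adaptiveParams struct {
	MinWindowSize     int
	MaxWindowSize     int
	MemoryLimitMB     float64
	AdaptationRate    float64 // fraction of the current size changed per step
	VelocityThreshold float64 // samples/sec
}

// nextWindowSize performs one adaptation step under the assumed rules:
// memory pressure shrinks first; high velocity grows only a stable model.
func nextWindowSize(current int, velocity, memoryMB float64, stable bool, p adaptiveParams) int {
	target := float64(current)
	switch {
	case memoryMB > p.MemoryLimitMB:
		target *= 1 - p.AdaptationRate
	case velocity > p.VelocityThreshold && stable:
		target *= 1 + p.AdaptationRate
	}
	n := int(math.Round(target))
	if n < p.MinWindowSize {
		n = p.MinWindowSize
	}
	if n > p.MaxWindowSize {
		n = p.MaxWindowSize
	}
	return n
}

func main() {
	p := adaptiveParams{MinWindowSize: 1000, MaxWindowSize: 100000, MemoryLimitMB: 256, AdaptationRate: 0.1, VelocityThreshold: 50}
	size := 1000
	for step := 0; step < 5; step++ {
		size = nextWindowSize(size, 120, 100, true, p) // busy, stable, low memory
	}
	fmt.Println("after 5 growth steps:", size)
	fmt.Println("under memory pressure:", nextWindowSize(size, 120, 300, true, p))
}
```

The stable flag stands in for whatever check the processor runs every stability_check_interval.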
Sample config.yml
Note: Use routingconnector to segregate the different kinds of spans (db, messaging, etc.) and send them to separate isolationforestprocessor deployments, so that anomaly detection is specific to each category of signal.
What the example does
| Signal | What's scored | Feature grouping | Output | Notes |
|---|---|---|---|---|
| Traces | Span duration (ns) | service.name, k8s.pod.name | iforest.is_anomaly attr + optional drop | Use a span/trace exporter to route anomalies. |
| Metrics | Only system.cpu.utilization, system.memory.utilization | Same | Attribute + score metric | The score appears as iforest.anomaly_score gauge. |
| Logs | Size of the log payload (bytes) by default | Same | Attribute flag | You can expose a numeric log attribute and configure the processor to use that via code changes. |
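Per-entity modelling comes down to two steps: derive a model key from the feature attributes, then keep a separate window per key. The Go sketch below assumes resource and record attributes have already been flattened into one map[string]string. The names groupKey and windows are hypothetical. A production version would more likely use a ring buffer than re-slicing.

```go
package main

import (
	"fmt"
	"strings"
)

// groupKey joins the configured feature keys and their values into a stable
// model key; no features yields "" (one global model).
func groupKey(features []string, attrs map[string]string) string {
	var b strings.Builder
	for i, k := range features {
		if i > 0 {
			b.WriteByte('|')
		}
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(attrs[k]) // a missing key contributes an empty value
	}
	return b.String()
}

// windows keeps a bounded sliding window of values per group key.
type windows struct {
	size int
	data map[string][]float64
}

// add appends v to the group's window and evicts the oldest values beyond size.
func (w *windows) add(key string, v float64) {
	buf := append(w.data[key], v)
	if len(buf) > w.size {
		buf = buf[len(buf)-w.size:]
	}
	w.data[key] = buf
}

func main() {
	features := []string{"service.name", "k8s.pod.name"}
	w := &windows{size: 3, data: map[string][]float64{}}
	spans := []map[string]string{
		{"service.name": "checkout", "k8s.pod.name": "checkout-1"},
		{"service.name": "checkout", "k8s.pod.name": "checkout-2"},
		{"service.name": "checkout", "k8s.pod.name": "checkout-1"},
	}
	for i, attrs := range spans {
		w.add(groupKey(features, attrs), float64(100+i)) // e.g. span durations
	}
	for k, vs := range w.data {
		fmt.Println(k, vs)
	}
}
```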
Best Practices
- Tune forest_size vs. latency: start with 100 trees; raise to 200–300 if scores look noisy.
- Use per-entity models: add features (service, pod, host) to avoid global comparisons across very different series.
- Let contamination drive the threshold: set contamination_rate to the share of traffic you're comfortable labelling as outliers; avoid hand-tuning anomaly_threshold.
- Use adaptive window sizing: enable it for dynamic workloads; the processor will automatically grow windows during high traffic and shrink them under memory pressure.
- Route anomalies: keep drop_anomalous_data=false and add a simple routing processor downstream to ship anomalies to a dedicated exporter or topic.
- Monitor model health: the emitted iforest.anomaly_score metric works well in a Grafana panel; watch its distribution and adjust the window size and contamination rate accordingly.
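The "flag, then route" pattern can be sketched in Go as below. The record type, flag, and route are hypothetical stand-ins and not processor or routing-component code. The flag function mimics drop_anomalous_data=false: it only sets iforest.is_anomaly=true on items whose score is at or above the threshold. The route function partitions the batch the way a downstream routing rule on that attribute might.

```go
package main

import "fmt"

// record stands in for a span, metric point, or log record (hypothetical).
type record struct {
	name  string
	score float64
	attrs map[string]any
}

const anomalyAttr = "iforest.is_anomaly"

// flag marks records at or above the threshold and removes nothing,
// mirroring drop_anomalous_data=false.
func flag(batch []record, threshold float64) {
	for i := range batch {
		if batch[i].score >= threshold {
			if batch[i].attrs == nil {
				batch[i].attrs = map[string]any{}
			}
			batch[i].attrs[anomalyAttr] = true
		}
	}
}

// route splits a flagged batch into normal and anomalous streams.
func route(batch []record) (normal, anomalous []record) {
	for _, r := range batch {
		if v, ok := r.attrs[anomalyAttr].(bool); ok && v {
			anomalous = append(anomalous, r)
		} else {
			normal = append(normal, r)
		}
	}
	return normal, anomalous
}

func main() {
	batch := []record{{name: "GET /cart", score: 0.41}, {name: "GET /pay", score: 0.87}, {name: "GET /home", score: 0.52}}
	flag(batch, 0.8)
	normal, anomalous := route(batch)
	fmt.Println(len(normal), "normal,", len(anomalous), "anomalous:", anomalous[0].name)
}
```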
🏗️ Internals (High-Level)
Training cadence: the forest is regrown from the current window every training_interval.
Scoring cost: O(forest_size × log subsample_size) per item
Note: With adaptive window sizing enabled, current_window_size dynamically adjusts between min_window_size and max_window_size based on traffic patterns and memory constraints, making training costs adaptive to workload conditions.
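As a worked example of the scoring cost, assume each tree uses the standard isolation-forest height limit of ceil(log2(subsample_size)). Whether the processor uses exactly this limit is an assumption. With the defaults, that gives at most 100 × ceil(log2 256) = 100 × 8 = 800 split comparisons per scored item. The helper maxComparisons is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// maxComparisons bounds per-item scoring work as forest_size × height limit,
// assuming a height limit of ceil(log2(subsample_size)) per tree.
func maxComparisons(forestSize, subsampleSize int) int {
	heightLimit := int(math.Ceil(math.Log2(float64(subsampleSize))))
	return forestSize * heightLimit
}

func main() {
	fmt.Println("defaults (100 trees, 256 samples):", maxComparisons(100, 256))
	fmt.Println("300 trees, 256 samples:", maxComparisons(300, 256))
}
```

Tripling forest_size triples the per-item bound, while doubling subsample_size adds only one comparison per tree.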
🤝 Contributing
- Bugs / Questions: please open an issue in the fork first.
- Recently added: Adaptive window sizing for dynamic traffic patterns.
- Planned enhancements:
  - Multivariate scoring (multiple numeric attributes per point).
  - Expose Prometheus counters for training time / CPU cost.
Last generated: 2026-04-13