Isolationforest Processor
contrib
Maintainers: @atoulme
Source: opentelemetry-collector-contrib
Supported Telemetry
Overview
✨ Key Features
| Capability | Description |
|---|---|
| Real-time Isolation Forest | Builds an ensemble of random trees over a sliding window of recent data and assigns a 0–1 anomaly score on ingestion (≈ O(log n) per point). |
| Multi‑signal support | Can be inserted into traces, metrics, and logs pipelines – one config powers all three. |
| Per‑entity modelling | The `features` setting lets you maintain a separate model per unique combination of resource / attribute keys (e.g. per‑pod, per‑service). |
| Adaptive Window Sizing | Automatically adjusts window size based on traffic patterns, memory usage, and model stability to balance detection quality against resource use. |
| Flexible output | Add the attribute `iforest.is_anomaly=true`, emit a gauge metric `iforest.anomaly_score`, and/or drop anomalous telemetry entirely. |
| Config‑driven | Tune tree count, subsample size, contamination rate, sliding‑window length, retraining interval, target metrics, and more – all in collector.yml. |
| Zero external deps | Pure Go implementation; runs wherever the Collector does (edge, gateway, or backend). |
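Per-entity modelling amounts to keeping one model per group key. As a sketch, assuming attributes arrive as a flat `map[string]string` and that a `key=value` string joined with `|` is an acceptable key (both are illustrative choices; `groupKey` is a hypothetical helper, not part of the processor), the key could be derived like this:

```go
package main

import (
	"fmt"
	"strings"
)

// groupKey derives a model key from the configured feature keys (hypothetical
// helper). Keys follow the configured order, so the same attribute values
// always map to the same model; an empty feature list yields "", i.e. one
// global model.
func groupKey(features []string, attrs map[string]string) string {
	parts := make([]string, 0, len(features))
	for _, k := range features {
		parts = append(parts, k+"="+attrs[k]) // missing attributes become empty values
	}
	return strings.Join(parts, "|")
}

func main() {
	models := map[string][]float64{} // group key -> that group's window (illustrative)
	features := []string{"service.name", "k8s.pod.name"}
	attrs := map[string]string{"service.name": "checkout", "k8s.pod.name": "checkout-7d9f"}
	key := groupKey(features, attrs)
	models[key] = append(models[key], 0.125) // e.g. a span duration in seconds
	fmt.Println(key, len(models))
}
```

Because the key includes every configured feature, adding `k8s.pod.name` on top of `service.name` multiplies the number of models, and therefore memory, by the number of pods per service.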
⚙️ How it Works
- Training window – The processor keeps up to `window_size` of the most recent data points for every feature‑group.
- Periodic (re‑)training – Every `training_interval`, it draws `subsample_size` points from that window and grows `forest_size` random isolation trees.
- Scoring – Each new point is pushed through the forest. Shorter average path length ⇒ higher anomaly score.
- Adaptive sizing – When enabled, window size automatically adjusts based on traffic velocity, memory usage, and model stability.
- Post‑processing –
  - If `add_anomaly_score: true`, a gauge metric `iforest.anomaly_score` is emitted with the same attributes and timestamp as the scored item.
  - If the score ≥ `anomaly_threshold`, the original span/metric/log is flagged with `iforest.is_anomaly=true`.
  - If `drop_anomalous_data: true`, flagged items are removed from the batch instead of being forwarded.
Contamination rate – instead of hard‑coding `anomaly_threshold`, you can supply `contamination_rate` (the expected fraction of outliers, e.g. 0.05 for 5 %). The processor then auto‑derives a dynamic threshold equal to the `(1 – contamination_rate)` quantile of recent scores.

Performance is linear in `forest_size` and logarithmic in `window_size`; the defaults of 100 trees and a 1 k‑point window easily sustain 10–50 k points/s on a single vCPU.
🔧 Configuration
| Field | Type | Default | Notes |
|---|---|---|---|
| `forest_size` | int | 100 | Number of trees in the ensemble. Higher → smoother scores, more CPU. |
| `subsample_size` | int | 256 | Points sampled to build each tree. Must be ≤ `window_size`. |
| `window_size` | int | 1000 | Sliding window of recent data maintained per feature‑group. |
| `contamination_rate` | float (0–1) | 0.10 | Fraction of points expected to be outliers; used to auto‑tune the threshold. |
| `anomaly_threshold` | float (0–1) | derived | Manual override: score ≥ this ⇒ anomaly. Ignored if `contamination_rate` is set. |
| `training_interval` | duration | 5m | The model is retrained at most once per interval. |
| `features` | []string | [] | Resource/attribute keys that define grouping. Empty ⇒ single global model. |
| `metrics_to_analyze` | []string | [] | Only these metric names are scored (metrics pipeline only). Empty ⇒ all. |
| `add_anomaly_score` | bool | false | Emit the `iforest.anomaly_score` metric. |
| `drop_anomalous_data` | bool | false | Remove anomalous items from the batch instead of forwarding them. |
| `adaptive_window` | object | null | Enables adaptive window sizing (see Adaptive Window Configuration below). |
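To make the defaults and documented constraints concrete, here is a hypothetical Go struct mirroring the table. The type, field names and `validate` method are illustrative assumptions, not the processor's actual configuration code, and `adaptive_window` is omitted for brevity. It encodes the table's defaults plus the stated constraints: `subsample_size` ≤ `window_size`, and both `contamination_rate` and `anomaly_threshold` within 0–1.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config mirrors the documented keys; the Go type itself is illustrative only.
type Config struct {
	ForestSize        int           // forest_size
	SubsampleSize     int           // subsample_size
	WindowSize        int           // window_size
	ContaminationRate float64       // contamination_rate
	AnomalyThreshold  float64       // anomaly_threshold (0 = unset in this sketch)
	TrainingInterval  time.Duration // training_interval
	Features          []string      // features
	MetricsToAnalyze  []string      // metrics_to_analyze
	AddAnomalyScore   bool          // add_anomaly_score
	DropAnomalousData bool          // drop_anomalous_data
}

// defaultConfig returns the defaults listed in the configuration table.
func defaultConfig() Config {
	return Config{
		ForestSize:        100,
		SubsampleSize:     256,
		WindowSize:        1000,
		ContaminationRate: 0.10,
		TrainingInterval:  5 * time.Minute,
	}
}

// validate checks only the constraints stated in the table.
func (c Config) validate() error {
	if c.SubsampleSize > c.WindowSize {
		return fmt.Errorf("subsample_size (%d) must be <= window_size (%d)", c.SubsampleSize, c.WindowSize)
	}
	if c.ContaminationRate < 0 || c.ContaminationRate > 1 {
		return errors.New("contamination_rate must be within [0, 1]")
	}
	if c.AnomalyThreshold < 0 || c.AnomalyThreshold > 1 {
		return errors.New("anomaly_threshold must be within [0, 1]")
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	fmt.Println("defaults valid:", cfg.validate() == nil)
	cfg.SubsampleSize = 2048 // larger than window_size
	fmt.Println(cfg.validate())
}
```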
🔄 Adaptive Window Configuration
When enabled, the processor automatically adjusts window size based on traffic patterns and resource constraints:

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | false | Enable adaptive window sizing. |
| `min_window_size` | int | 1000 | Minimum window size (safety bound). |
| `max_window_size` | int | 100000 | Maximum window size (memory protection). |
| `memory_limit_mb` | int | 256 | Shrink window when memory usage exceeds this limit. |
| `adaptation_rate` | float | 0.1 | Rate of window size changes (0.0–1.0). |
| `velocity_threshold` | float | 50.0 | Samples/sec threshold for triggering window growth. |
| `stability_check_interval` | duration | 5m | How often to evaluate model stability for expansion. |
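The table lists the knobs but not the exact update rule. One plausible rule, shown purely as an assumption (the `AdaptiveWindow` type, the `nextWindowSize` helper, its inputs, and the multiplicative step are all hypothetical): shrink by `adaptation_rate` when memory exceeds `memory_limit_mb`, grow by `adaptation_rate` when traffic exceeds `velocity_threshold` and the latest stability check passed, then clamp to [`min_window_size`, `max_window_size`].

```go
package main

import "fmt"

// AdaptiveWindow holds the documented adaptive_window settings (illustrative type).
type AdaptiveWindow struct {
	MinWindowSize     int
	MaxWindowSize     int
	MemoryLimitMB     int
	AdaptationRate    float64
	VelocityThreshold float64 // samples per second
}

// nextWindowSize is a hypothetical update rule: memory pressure wins over
// growth, growth requires both high velocity and a stable model, and the
// result is always clamped to the configured bounds.
func (a AdaptiveWindow) nextWindowSize(current int, memoryMB, samplesPerSec float64, stable bool) int {
	next := float64(current)
	switch {
	case memoryMB > float64(a.MemoryLimitMB):
		next *= 1 - a.AdaptationRate
	case samplesPerSec > a.VelocityThreshold && stable:
		next *= 1 + a.AdaptationRate
	}
	n := int(next)
	if n < a.MinWindowSize {
		n = a.MinWindowSize
	}
	if n > a.MaxWindowSize {
		n = a.MaxWindowSize
	}
	return n
}

func main() {
	a := AdaptiveWindow{MinWindowSize: 1000, MaxWindowSize: 100000, MemoryLimitMB: 256, AdaptationRate: 0.1, VelocityThreshold: 50}
	size := 1000
	size = a.nextWindowSize(size, 120, 400, true) // busy, stable, under the memory limit: grow
	fmt.Println(size)
	size = a.nextWindowSize(size, 300, 400, true) // over the memory limit: shrink, then clamp
	fmt.Println(size)
}
```

Checking memory first reflects the "memory protection" role the table gives `max_window_size` and `memory_limit_mb`; a real implementation might weigh the signals differently.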
📄 Sample config.yml
Note: Use `routingconnector` to segregate the different kinds of spans (db, messaging, etc.) and send them to separate `isolationforestprocessor` deployments, so that anomaly detection is specific to each category of signal.
What the example does
| Signal | What’s scored | Feature grouping | Output | Notes |
|---|---|---|---|---|
| Traces | Span duration (ns) | service.name, k8s.pod.name | iforest.is_anomaly attr + optional drop | Use a span/trace exporter to route anomalies. |
| Metrics | Only system.cpu.utilization, system.memory.utilization | Same | Attribute + score metric | The score appears as iforest.anomaly_score gauge. |
| Logs | Size of the log payload (bytes) by default | Same | Attribute flag | You can expose a numeric log attribute and configure the processor to use that via code changes. |
🚀 Best Practices
- Tune `forest_size` vs. latency – start with 100 trees; raise to 200–300 if scores look noisy.
- Use per‑entity models – add `features` (service, pod, host) to avoid global comparisons across very different series.
- Let contamination drive the threshold – set `contamination_rate` to the share of traffic you are comfortable labelling as outliers; avoid hand‑tuning `anomaly_threshold`.
- Use adaptive window sizing – enable it for dynamic workloads; the processor will automatically grow windows during high traffic and shrink them under memory pressure.
- Route anomalies – keep `drop_anomalous_data=false` and add a simple routing processor downstream to ship anomalies to a dedicated exporter or topic.
- Monitor model health – the emitted `iforest.anomaly_score` metric works well in a Grafana panel; watch its distribution and adjust the window size / contamination rate accordingly.
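The routing step recommended above can be pictured as a split on the `iforest.is_anomaly` attribute. In a real deployment this is done with a routing component in the Collector configuration rather than custom code; the Go sketch below only illustrates the decision, and it assumes the attribute is stored as a boolean. The `item` type and `partition` helper are hypothetical.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a span, metric data point, or log record.
type item struct {
	name  string
	attrs map[string]any
}

// partition splits a batch on iforest.is_anomaly (assumed boolean): flagged
// items go to the anomaly destination, the rest to the regular exporter.
func partition(batch []item) (normal, anomalous []item) {
	for _, it := range batch {
		if flagged, ok := it.attrs["iforest.is_anomaly"].(bool); ok && flagged {
			anomalous = append(anomalous, it)
		} else {
			normal = append(normal, it)
		}
	}
	return normal, anomalous
}

func main() {
	batch := []item{
		{name: "GET /cart", attrs: map[string]any{"iforest.is_anomaly": true}},
		{name: "GET /health", attrs: map[string]any{}},
	}
	normal, anomalous := partition(batch)
	fmt.Printf("normal=%d anomalous=%d\n", len(normal), len(anomalous))
}
```

Keeping `drop_anomalous_data=false` is what makes this split possible: the flagged items still reach the router instead of being discarded in the processor.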
🏗️ Internals (High‑Level)
Retraining cadence: at most once per `training_interval`
Scoring cost: O(forest_size × log subsample_size) per item
Note: With adaptive window sizing enabled, current_window_size dynamically adjusts between min_window_size and max_window_size based on traffic patterns and memory constraints, making training costs adaptive to workload conditions.
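As a rough worked example of the scoring-cost line above, assume each tree is height-limited to ⌈log₂ `subsample_size`⌉, as in the original Isolation Forest paper (whether the processor uses the same limit is an assumption here). With the default settings, one item then visits at most this many internal nodes:

$$
\texttt{forest\_size} \times \lceil \log_2 \texttt{subsample\_size} \rceil = 100 \times \lceil \log_2 256 \rceil = 100 \times 8 = 800
$$

Doubling `forest_size` to 200 doubles this bound to 1,600, while raising `subsample_size` to 1,024 adds only two levels per tree (100 × 10 = 1,000).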
🤝 Contributing
- Bugs / Questions – please open an issue in the fork first.
- Recently added: Adaptive window sizing for dynamic traffic patterns.
- Planned enhancements:
  - Multivariate scoring (multiple numeric attributes per point).
  - Expose Prometheus counters for training time / CPU cost.
Configuration
Example Configuration
Last generated: 2026-04-20