
Isolation Forest Processor

Status: Available in contrib · Maintainers: @atoulme · Source: opentelemetry-collector-contrib

Supported Telemetry

Logs, Metrics, Traces

Overview

✨ Key Features

  • Realtime Isolation Forest – Builds an ensemble of random trees over a sliding window of recent data and assigns a 0–1 anomaly score on ingestion (≈ O(log n) per point).
  • Multi-signal support – Can be inserted into traces, metrics, and logs pipelines; one config powers all three.
  • Per-entity modelling – The features config lets you maintain a separate model per unique combination of resource / attribute keys (e.g. per-pod, per-service).
  • Adaptive window sizing – Automatically adjusts window size based on traffic patterns, memory usage, and model stability for optimal performance and resource utilization.
  • Flexible output – Add an attribute iforest.is_anomaly=true, emit a gauge metric iforest.anomaly_score, or drop anomalous telemetry entirely.
  • Config-driven – Tune tree count, subsample size, contamination rate, sliding-window length, retraining interval, target metrics, and more, all in collector.yml.
  • Zero external deps – Pure Go implementation; runs wherever the Collector does (edge, gateway, or backend).

βš™οΈ How it Works

  1. Training window – The processor keeps up to window_size of the most recent data points for every feature-group.
  2. Periodic (re-)training – Every training_interval, it draws subsample_size points from that window and grows forest_size random isolation trees.
  3. Scoring – Each new point is pushed through the forest. Shorter average path length ⇒ higher anomaly score.
  4. Adaptive sizing – When enabled, window size automatically adjusts based on traffic velocity, memory usage, and model stability.
  5. Post-processing –
    • If add_anomaly_score: true, a gauge metric iforest.anomaly_score is emitted with identical attributes/timestamp.
    • If the score ≥ anomaly_threshold, the original span/metric/log is flagged with iforest.is_anomaly=true.
    • If drop_anomalous_data: true, flagged items are removed from the batch instead of being forwarded.
Contamination rate – instead of hard‑coding anomaly_threshold, you can supply contamination_rate (expected % of outliers). The processor then auto‑derives a dynamic threshold equal to the (1 – contamination_rate) quantile of recent scores.
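The quantile-based threshold derivation can be sketched as follows (a simplified nearest-rank version; the processor's internal estimator may interpolate or decay old scores):

```go
package main

import (
	"fmt"
	"sort"
)

// dynamicThreshold returns the (1 - contaminationRate) quantile of the
// recent score window, so roughly contaminationRate of points score above it.
func dynamicThreshold(scores []float64, contaminationRate float64) float64 {
	sorted := append([]float64(nil), scores...) // copy; don't disturb the window
	sort.Float64s(sorted)
	// nearest-rank quantile index
	idx := int(float64(len(sorted)-1) * (1 - contaminationRate))
	return sorted[idx]
}

func main() {
	recent := []float64{0.31, 0.42, 0.38, 0.55, 0.47, 0.92, 0.35, 0.44, 0.40, 0.88}
	// contamination_rate: 0.2 -> the top ~20% of scores land above the cut-off
	fmt.Println(dynamicThreshold(recent, 0.2)) // 0.55; only 0.88 and 0.92 exceed it
}
```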
Performance is linear in forest_size and logarithmic in subsample_size; the defaults of 100 trees and a 1,000-point window easily sustain 10–50 k points/s on a single vCPU.

🔧 Configuration

  • forest_size (int, default 100) – Number of trees in the ensemble. Higher → smoother scores, more CPU.
  • subsample_size (int, default 256) – Rows sampled to build each tree. Must be ≤ window_size.
  • window_size (int, default 1000) – Sliding window of recent data maintained per feature-group.
  • contamination_rate (float 0–1, default 0.10) – Fraction of points expected to be outliers; used to auto-tune the threshold.
  • anomaly_threshold (float 0–1, default derived) – Manual override; score ≥ this ⇒ anomaly. Ignored if contamination_rate is set.
  • training_interval (duration, default 5m) – Model is retrained no sooner than this interval.
  • features ([]string, default []) – Resource/attribute keys that define grouping. Blank ⇒ single global model.
  • metrics_to_analyze ([]string, default []) – Only these metric names are scored (metrics pipeline only). Blank ⇒ all.
  • add_anomaly_score (bool, default false) – Emit the iforest.anomaly_score metric.
  • drop_anomalous_data (bool, default false) – Remove anomalous items from the batch instead of forwarding them.
  • adaptive_window (object, default null) – Enables adaptive window sizing (see the Adaptive Window section below).
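Putting those fields together, a minimal pipeline entry might look like the sketch below (values are the documented defaults except features and add_anomaly_score; note that the fuller sample later on this page uses a somewhat different field set):

```yaml
processors:
  isolationforest:
    forest_size: 100                          # ensemble size
    subsample_size: 256                       # rows per tree
    window_size: 1000                         # sliding window per feature-group
    contamination_rate: 0.10                  # auto-derives the anomaly threshold
    training_interval: 5m
    features: [service.name, k8s.pod.name]    # one model per service/pod pair
    add_anomaly_score: true                   # emit iforest.anomaly_score gauge
```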

🔄 Adaptive Window Configuration

When enabled, the processor automatically adjusts window size based on traffic patterns and resource constraints:
  • enabled (bool, default false) – Enable adaptive window sizing.
  • min_window_size (int, default 1000) – Minimum window size (safety bound).
  • max_window_size (int, default 100000) – Maximum window size (memory protection).
  • memory_limit_mb (int, default 256) – Shrink the window when memory usage exceeds this limit.
  • adaptation_rate (float, default 0.1) – Rate of window-size changes (0.0–1.0).
  • velocity_threshold (float, default 50.0) – Samples/sec threshold for triggering window growth.
  • stability_check_interval (duration, default 5m) – How often to evaluate model stability for expansion.
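The grow/shrink/clamp behaviour these fields control can be sketched roughly as below (an illustrative model only; field names mirror the table, but the processor's actual adaptation logic may differ):

```go
package main

import (
	"fmt"
	"math"
)

// adaptiveWindow mirrors the adaptive_window configuration fields above.
type adaptiveWindow struct {
	Size              int     // current window size
	MinWindowSize     int     // safety bound
	MaxWindowSize     int     // memory protection
	MemoryLimitMB     int     // shrink when exceeded
	AdaptationRate    float64 // step size for each change
	VelocityThreshold float64 // samples/sec that triggers growth
}

// adjust shrinks the window under memory pressure (which takes priority),
// grows it when ingest velocity is high, and always clamps to the bounds.
func (w *adaptiveWindow) adjust(samplesPerSec float64, memoryMB int) {
	switch {
	case memoryMB > w.MemoryLimitMB:
		w.Size = int(math.Round(float64(w.Size) * (1 - w.AdaptationRate)))
	case samplesPerSec > w.VelocityThreshold:
		w.Size = int(math.Round(float64(w.Size) * (1 + w.AdaptationRate)))
	}
	if w.Size < w.MinWindowSize {
		w.Size = w.MinWindowSize
	}
	if w.Size > w.MaxWindowSize {
		w.Size = w.MaxWindowSize
	}
}

func main() {
	w := &adaptiveWindow{Size: 10000, MinWindowSize: 1000, MaxWindowSize: 100000,
		MemoryLimitMB: 256, AdaptationRate: 0.1, VelocityThreshold: 50.0}
	w.adjust(120.0, 100) // high traffic, memory fine -> grow
	fmt.Println(w.Size)  // 11000
	w.adjust(120.0, 300) // over the memory limit -> shrink wins
	fmt.Println(w.Size)  // 9900
}
```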
See the sample below for context.

📄 Sample config.yml

receivers:
  otlp:
    protocols:
      grpc:            # → listen on 0.0.0.0:4317

processors:
  isolationforest:
    # ─── core algorithm parameters ────────────────────────────────
    forest_size:        150          # trees per forest
    subsample_size:     512          # rows per tree
    contamination_rate: 0.05         # 5 % expected outliers
    threshold:          0.0          # 0 ⇒ let contamination_rate drive the cut-off
    mode:               both         # enrich + filter (see docstring)
    training_window:    24h          # window of data kept for training
    update_frequency:   5m           # retrain every 5 minutes
    min_samples:        1000         # wait until this many points seen

    # ─── where to write results on each data point ───────────────
    score_attribute:          anomaly.isolation_score   # float 0–1
    classification_attribute: anomaly.is_anomaly        # bool

    # ─── which numeric features the model should look at ─────────
    features:
      traces:  [duration]           # span duration (µs / ns)
      metrics: [value]              # the sample's numeric value
      logs:    [severity_number]    # log severity enum

    # ─── performance guard-rails (optional) ──────────────────────
    performance:
      max_memory_mb:     512
      batch_size:        1000
      parallel_workers:  4

exporters:
  prometheus:
    endpoint: "0.0.0.0:9464"   # Prom-server will scrape /metrics here
    send_timestamps: true      # (field is valid in the standard exporter)

service:
  pipelines:
    metrics:
      receivers:  [otlp]
      processors: [isolationforest]
      exporters:  [prometheus]

Note: Use the routing connector to segregate the different kinds of spans (db, messaging, etc.) and send them to separate isolationforest processor instances, so that anomaly detection pertains to the respective category of signals.
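A sketch of that setup is shown below. The signal.category attribute is hypothetical (substitute whatever attribute distinguishes your span categories), and the routing connector's matching syntax and supported contexts vary by Collector version; check its README for the exact OTTL forms.

```yaml
connectors:
  routing:
    default_pipelines: [traces/other]
    table:
      - statement: route() where attributes["signal.category"] == "db"
        pipelines: [traces/db]
      - statement: route() where attributes["signal.category"] == "messaging"
        pipelines: [traces/messaging]

service:
  pipelines:
    traces/in:          # receives everything, fans out via the connector
      receivers: [otlp]
      exporters: [routing]
    traces/db:          # dedicated isolationforest instance per category
      receivers: [routing]
      processors: [isolationforest/db]
      exporters: [otlp]
    traces/messaging:
      receivers: [routing]
      processors: [isolationforest/messaging]
      exporters: [otlp]
    traces/other:
      receivers: [routing]
      exporters: [otlp]
```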

What the example does

  • Traces – Scores span duration (ns), grouped by service.name and k8s.pod.name; output is the iforest.is_anomaly attribute plus optional drop. Use a span/trace exporter to route anomalies.
  • Metrics – Scores only system.cpu.utilization and system.memory.utilization, same grouping; output is the attribute plus the score metric, which appears as the iforest.anomaly_score gauge.
  • Logs – Scores the size of the log payload (bytes) by default, same grouping; output is the attribute flag. You can expose a numeric log attribute and configure the processor to use that via code changes.

🚀 Best Practices

  • Tune forest_size vs. latency – start with 100 trees; raise to 200–300 if scores look noisy.
  • Use per-entity models – add features (service, pod, host) to avoid global comparisons across very different series.
  • Let contamination drive the threshold – set contamination_rate to the fraction of traffic you're comfortable labelling as outliers; avoid hand-tuning anomaly_threshold.
  • Use adaptive window sizing – enable it for dynamic workloads; the processor will automatically grow windows during high traffic and shrink them under memory pressure.
  • Route anomalies – keep drop_anomalous_data=false and add a simple routing processor downstream to ship anomalies to a dedicated exporter or topic.
  • Monitor model health – the emitted iforest.anomaly_score metric is ideal for a Grafana panel; watch its distribution and adapt the window / contamination accordingly.

πŸ—οΈ Internals (High‑Level)

               ┌───────────────────────────────────────────────────┐
               │ IsolationForestProcessor (per Collector instance) │
               │ ───────────────────────────────────────────────── │
               │  • Sliding window (per feature-group)             │
               │  • Forest of N trees (per feature-group)          │
Telemetry ───▶ │  • Score calculator & anomaly decision            │ ───▶  Next processor/exporter
               │  • Adaptive window sizing (optional)              │
               └───────────────────────────────────────────────────┘
Training cost: O(current_window_size × forest_size × log subsample_size) every training_interval.
Scoring cost: O(forest_size × log subsample_size) per item.
Note: With adaptive window sizing enabled, current_window_size dynamically adjusts between min_window_size and max_window_size based on traffic patterns and memory constraints, making training costs adaptive to workload conditions.
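Plugging in the defaults (100 trees, 256-row subsamples, a 1,000-point window) gives a feel for these bounds; a back-of-the-envelope sketch (ignoring constant factors, which the big-O notation hides):

```go
package main

import (
	"fmt"
	"math"
)

// scoringCost estimates tree-node visits per scored item:
// each of the forest's trees is traversed to roughly log2(subsample) depth.
func scoringCost(forestSize, subsampleSize float64) float64 {
	return forestSize * math.Log2(subsampleSize)
}

// trainingCost estimates work per retrain over the full window.
func trainingCost(windowSize, forestSize, subsampleSize float64) float64 {
	return windowSize * forestSize * math.Log2(subsampleSize)
}

func main() {
	fmt.Println(scoringCost(100, 256))        // 800 node visits per item
	fmt.Println(trainingCost(1000, 100, 256)) // 800000 per retrain (every training_interval)
}
```

At 800 node visits per point, the quoted 10–50 k points/s on a single vCPU is plausible, and the periodic 8×10^5-unit retrain amortises to noise over a 5-minute interval.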

🤝 Contributing

  • Bugs / Questions – please open an issue in the fork first.
  • Recently added: Adaptive window sizing for dynamic traffic patterns.
  • Planned enhancements
    • Multivariate scoring (multiple numeric attributes per point).
    • Expose Prometheus counters for training time / CPU cost.
PRs welcome – please include unit tests and doc updates.

Configuration

Example Configuration

processors:
  isolationforest:
    forest_size: 50
    mode: "enrich"
    threshold: 0.75
    features:
      traces: ["duration", "error"]

Last generated: 2026-04-13