
Otelarrow Exporter

Status
  • Available in: contrib, k8s
  • Maintainers: @jmacd, @JakeDern
  • Source: opentelemetry-collector-contrib

Supported Telemetry

Logs, Metrics, Traces

Overview

Exports telemetry data over gRPC using either OpenTelemetry Protocol with Apache Arrow or the standard OpenTelemetry Protocol (OTLP).

Getting Started

The OpenTelemetry Protocol with Apache Arrow exporter combines the features and configuration syntax of the core OpenTelemetry Collector OTLP exporter component with additional support for OpenTelemetry Protocol with Apache Arrow, which provides column-oriented data transport using the Apache Arrow data format. This component converts OTLP data into an optimized representation and then sends batches of data, using Apache Arrow to encode the stream. The OpenTelemetry Protocol with Apache Arrow receiver component contains the logic to reverse this process.
Using an OpenTelemetry Protocol with Apache Arrow exporter-receiver pair is recommended when the network is expensive. Typically, expect a 50% reduction in bandwidth compared with sending the same data using standard OTLP/gRPC with Zstd compression, with batch sizes being equal.
Because this component includes all the features and configuration of the core OTLP exporter, upgrading from the core OTLP exporter is as simple as replacing “otlp” with “otelarrow” as the component name in the Collector configuration.
To enable the OpenTelemetry Protocol with Apache Arrow exporter, include it in the list of exporters for a pipeline. The endpoint setting is required. Because TLS is enabled by default, the tls setting (with insecure: true) is required to use insecure transport.
  • endpoint (no default): host:port to which the exporter sends telemetry data, using the gRPC protocol. The valid syntax is described in grpc’s naming.md. If the https scheme is used, client transport security is enabled and overrides the insecure setting.
  • tls: see TLS Configuration Settings for the full set of available options.
Example:
exporters:
  otelarrow/secure:
    endpoint: external-collector:4317
    tls:
      cert_file: file.cert
      key_file: file.key
  otelarrow/insecure:
    endpoint: internal-collector:4317
    tls:
      insecure: true
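The endpoint and tls rules above can be modeled as a small decision function. This is a simplified sketch, not the Collector's gRPC client code: the helper name usesTLS is hypothetical, and it models only the two rules stated here (TLS is used unless tls.insecure is true, and an https scheme always enables it).

```go
package main

import (
	"fmt"
	"strings"
)

// usesTLS is a hypothetical helper that models only the two documented rules:
//  1. an "https://" endpoint scheme always enables client transport security,
//     overriding tls.insecure;
//  2. otherwise TLS is used unless tls.insecure is true.
func usesTLS(endpoint string, insecure bool) bool {
	if strings.HasPrefix(endpoint, "https://") {
		return true
	}
	return !insecure
}

func main() {
	fmt.Println(usesTLS("external-collector:4317", false))        // otelarrow/secure
	fmt.Println(usesTLS("internal-collector:4317", true))         // otelarrow/insecure
	fmt.Println(usesTLS("https://internal-collector:4317", true)) // scheme wins
}
```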
By default, zstd compression is enabled at the gRPC level (see Compression Configuration below). To disable gRPC-level compression, set compression to “none”:
exporters:
  otelarrow:
    compression: none
    endpoint: ...
    tls: ...

Configuration

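Several shared helper configurations are leveraged to provide additional capabilities automatically, including the gRPC client settings (such as compression, balancer_name, and keepalive), the TLS settings, and the exporterhelper queue, retry, and timeout settings (sending_queue, retry_on_failure, and timeout).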

Arrow-specific Configuration

In the arrow configuration block, the following settings enable and disable the use of OpenTelemetry Protocol with Apache Arrow as opposed to standard OTLP.
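  • disabled (default: false): disables the use of Arrow, causing the exporter to use standard OTLP.
  • disable_downgrade (default: false): prevents this exporter from falling back (downgrading) to standard OTLP when the receiver does not support OpenTelemetry Protocol with Apache Arrow.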
The following setting determines how long a stream stays open. Stream lifetime defaults to 30 seconds because the additional compression benefit of longer streams is small beyond that point, and shorter streams make load balancing easier.
  • max_stream_lifetime (default: 30s): duration after which streams are recycled.
The following setting determines the memory and CPU resources that the exporter uses:
  • num_streams (default: max(1, NumCPU()/2)): the number of concurrent Arrow streams
By default, num_streams is half the number of CPUs or 1, whichever is greater. When num_streams is greater than one, a configurable policy determines how load is distributed across streams. The supported policies are leastloaded, which picks the stream with the smallest number of outstanding requests, and leastloadedN for N <= num_streams, which restricts the choice to a random subset of N streams.
  • prioritizer (default: “leastloaded”): policy for distributing load across multiple streams.

Matching Metadata Per Stream

The following configuration values allow for separate streams per unique combination of metadata values:
  • metadata_keys (default: empty): when set, this exporter creates one Arrow exporter instance per distinct combination of values for these keys in the client.Metadata.
  • metadata_cardinality_limit (default: 1000): when metadata_keys is not empty, this setting limits the number of unique combinations of metadata key values that are processed over the lifetime of the exporter.

Network Configuration

This component uses round_robin by default as the gRPC load balancer. This can be modified using the balancer_name setting, for example, to configure the pick_first balancer:
exporters:
  otelarrow:
    balancer_name: pick_first
    endpoint: ...
    tls: ...
When the server or an intermediate proxy enforces a keepalive setting, the Arrow-specific max_stream_lifetime setting is critical to avoiding abrupt termination of Arrow streams, which causes retries of the in-flight requests. The maximum stream lifetime plus the export timeout should not exceed the smallest connection-lifetime limit set by the server’s keepalive parameters or by any intermediate proxy. In the following example, the 9m30s lifetime plus the 30s timeout adds up to 10m:
exporters:
  otelarrow:
    timeout: 30s
    arrow:
      max_stream_lifetime: 9m30s
    endpoint: ...
    tls: ...
With this configuration, the stream terminates cleanly with an OK gRPC status, without causing retries. A compatible otelarrowreceiver keepalive configuration is:
receivers:
  otelarrow:
    protocols:
      grpc:
        keepalive:
          server_parameters:
            max_connection_age: 1m
            max_connection_age_grace: 10m

Exporter metrics

In addition to the standard exporterhelper and obsreport metrics, this component provides network-level measurement instruments which we anticipate will become part of exporterhelper and/or obsreport in the future. At the normal level of metrics detail:
  • otelcol_exporter_sent: uncompressed bytes sent, prior to compression
  • otelcol_exporter_sent_wire: compressed bytes sent, on the wire.
Arrow’s compression performance can be derived by dividing the average otelcol_exporter_sent value by the average otelcol_exporter_sent_wire value. At the detailed metrics level, the stream of data returned to the exporter is also instrumented:
  • otelcol_exporter_recv: uncompressed bytes received, after decompression
  • otelcol_exporter_recv_wire: compressed bytes received, on the wire.
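The compression-ratio derivation above is a single division. The sketch below assumes both averages are taken over the same time window; the function name and the sample values are hypothetical, not measurements.

```go
package main

import (
	"errors"
	"fmt"
)

// compressionRatio divides the average otelcol_exporter_sent value (bytes
// before compression) by the average otelcol_exporter_sent_wire value
// (bytes on the wire). Both averages are assumed to cover the same window.
func compressionRatio(avgSent, avgSentWire float64) (float64, error) {
	if avgSentWire <= 0 {
		return 0, errors.New("no wire bytes recorded")
	}
	return avgSent / avgSentWire, nil
}

func main() {
	// Hypothetical averages for illustration, not measurements.
	ratio, err := compressionRatio(1_200_000, 150_000)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("compression ratio: %.1fx\n", ratio)
}
```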

Compression Configuration

The exporter supports configuring Zstd compression at both the gRPC level and the Arrow level. The exporter metrics described above are correct in either case. The default settings are subject to change as we gain experience. See the Collector compression comparison for general information about the choice of Zstd as the default, as well as other compression configuration options and benchmark information.
For OpenTelemetry Protocol with Apache Arrow streams specifically, the gRPC-level Zstd compression level can be configured. However, there is an important caveat: the gRPC-Go library requires that compressor implementations be registered statically. Compressors named zstdarrow1, zstdarrow2, …, zstdarrow10 are therefore registered, supporting 10 configurable compression levels. Because these registrations are static, only one unique configuration is possible per level. Multiple OpenTelemetry Protocol with Apache Arrow exporters can still use different Zstd configurations by using distinct levels. Under arrow, the zstd sub-configuration has the following fields:
  • level (default: 5): compression level in the range 1-10, which also determines a number of other defaults.
  • window_size_mib (default: 0): size of the Zstd window in MiB; 0 means the size is determined by the level.
  • concurrency (default: 1): controls the background CPU used for compression; 0 lets the zstd library decide.
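The static-registration caveat can be illustrated by mapping a level to its compressor name. This sketch only builds and validates the name; it does not register anything with gRPC, the function name is hypothetical, and treating 0 as "unset, use the default of 5" is an assumption.

```go
package main

import "fmt"

const (
	minZstdLevel     = 1
	maxZstdLevel     = 10
	defaultZstdLevel = 5
)

// compressorName maps an arrow::zstd::level value to the name of the
// statically registered gRPC compressor (zstdarrow1 ... zstdarrow10).
// Treating 0 as "unset, use the default" is an assumption of this sketch.
func compressorName(level int) (string, error) {
	if level == 0 {
		level = defaultZstdLevel
	}
	if level < minZstdLevel || level > maxZstdLevel {
		return "", fmt.Errorf("zstd level %d out of range [%d, %d]", level, minZstdLevel, maxZstdLevel)
	}
	return fmt.Sprintf("zstdarrow%d", level), nil
}

func main() {
	// The levels used by otelarrow/best and otelarrow/fastest.
	for _, level := range []int{10, 1} {
		name, err := compressorName(level)
		fmt.Println(name, err)
	}
}
```

Because the name depends only on the level, two exporters that share a level but differ in window_size_mib would map to the same registered compressor, which is why only one configuration per level is possible.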
The exporter supports configuring compression at the Arrow columnar-protocol level.
  • payload_compression (default “zstd”): compression applied at the Arrow IPC level.
Compression at the Arrow level is enabled by default because it slightly improves compression and helps Arrow payloads stay within gRPC maximum request size limits. Compression settings at the Arrow IPC level cannot be configured further. Returning to gRPC-level compression, the following example configures two exporters with different zstd configurations, which is possible because they use different levels:
exporters:
  otelarrow/best:
    compression: zstd  # describes gRPC-level compression (default "zstd")
    arrow:
      zstd:
        level: 10      # describes gRPC-level compression level (default 5)
  otelarrow/fastest:
    compression: zstd
    arrow:
      zstd:
        level: 1       # 1 is the "fastest" compression level

Batching Configuration

This exporter uses the built-in exporterhelper support for combined queue and batch behavior via the sending_queue settings. Bytes-based batching is supported, but note that the exporterhelper estimates batch sizes using the OTLP representation, without accounting for Arrow compression. In the default configuration, without a persistent storage extension, the exporter uses an in-memory queue and responds to the caller before the export completes. In the sending_queue structure, this component sets the following defaults:
  • block_on_overflow: true
  • wait_for_result: false
  • sizer: items
  • queue_size: 100_000
  • batch::flush_timeout: 1s
  • batch::min_size: 1_000
  • batch::max_size: 1_500
  • num_consumers: 100
These defaults use the in-memory queue, return success to the client once data is accepted, and block when the queue is full. The queue admits up to 100,000 items, which are exported in up to 100 concurrent batches of 1,000 to 1,500 items each. For additional safety in the event of a Collector crash, set wait_for_result: true. The example below demonstrates how to use the in-memory queue for batching with this additional level of safety:
exporters:
  otelarrow:
    # ...
    sending_queue:
      enabled: true

      # Use wait_for_result: true for additional safety; otherwise
      # a collector crash will cause loss of data.
      wait_for_result: true

      # This is the default setting; it ensures the exporter will
      # block the pipeline (subject to deadline) instead of failing
      # fast when overflow data arrives.
      block_on_overflow: true

      # The queue admits up to 1 million items before blocking
      # new requests.
      sizer: items
      queue_size: 1_000_000

      # Use relatively large batches to improve compression.
      batch:
        flush_timeout: 1s
        min_size: 4_000
        max_size: 5_000

      # With max-size batches, we need 200 consumers to keep the
      # OTel-Arrow streams busy. There will be (num_consumers /
      # num_streams) pending requests per stream on average.
      num_consumers: 200

      # Optional persistent storage. If this is set, you
      # can safely use wait_for_result: false above.
      # storage: name_of_extension
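The comments in the sending_queue example rely on two pieces of arithmetic, sketched below. The function names are hypothetical, the in-flight figure is an upper bound that assumes each consumer exports at most one batch at a time, and num_streams = 4 is a made-up value for illustration. With the example's values, 200 consumers times batches of up to 5,000 items gives 1,000,000 items, the same as its queue_size.

```go
package main

import "fmt"

// maxInFlightItems is an upper bound on items being exported at once,
// assuming each consumer exports at most one batch of batch::max_size items.
func maxInFlightItems(numConsumers, batchMaxSize int) int {
	return numConsumers * batchMaxSize
}

// pendingPerStream is the average number of pending requests per Arrow
// stream, num_consumers / num_streams (numStreams must be at least 1).
func pendingPerStream(numConsumers, numStreams int) float64 {
	return float64(numConsumers) / float64(numStreams)
}

func main() {
	// num_consumers and max_size come from the example above;
	// num_streams = 4 is a hypothetical value.
	fmt.Println("max in-flight items:", maxInFlightItems(200, 5_000))
	fmt.Println("pending requests per stream:", pendingPerStream(200, 4))
}
```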

Example Configuration

endpoint: "1.2.3.4:1234"
compression: "none"
tls:
  ca_file: /var/lib/mycert.pem
timeout: 10s
sending_queue:
  enabled: true
  num_consumers: 2
  queue_size: 10
  batch:
    flush_timeout: 200ms
    min_size: 1000
    max_size: 10000
retry_on_failure:
  enabled: true
  initial_interval: 10s
  randomization_factor: 0.7
  multiplier: 1.3
  max_interval: 60s
  max_elapsed_time: 10m
auth:
  authenticator: nop
headers:
  "can you have a . here?": "F0000000-0000-0000-0000-000000000000"
  header1: "234"
  another: "somevalue"
keepalive:
  time: 20s
  timeout: 30s
  permit_without_stream: true
balancer_name: "experimental"
arrow:
  num_streams: 2
  disabled: false
  max_stream_lifetime: 2h
  payload_compression: "zstd"
  prioritizer: leastloaded8

Last generated: 2026-04-13