Genainormalizer Processor

Available in: contrib
Maintainers: @TylerHelmuth, @kylehounslow
Source: opentelemetry-collector-contrib

Supported Telemetry

Overview

The GenAI Normalizer Processor rewrites attributes on spans emitted by non-OTel GenAI instrumentation libraries into the OTel GenAI Semantic Conventions.

Configuration

Built-in sources:

openinference — OpenInference instrumentation
openllmetry — OpenLLMetry (Traceloop) instrumentation

Any other name is a user-defined source: the entry’s mappings and value_mappings drive the normalization.

Top-level fields

Field	Type	Default	Description
`sources`	list of source	required	Ordered list of sources to normalize. At least one must be specified. Each span is processed by every source in the order given.

Source

Each entry in sources accepts the following fields:

Field	Type	Default	Description
`name`	string	required	Source identifier. Built-in names (`openinference`, `openllmetry`) use pre-defined mapping tables. Any other name is a user-defined source. Names must be unique across `sources`.
`remove_originals`	bool	`false`	Delete source attributes after mapping.
`overwrite`	bool	`false`	When `true`, overwrite the target attribute if it already exists. When `false`, skip the mapping.
`mappings`	map[string]string	required for user-defined sources, rejected on built-ins	Source-attribute → target-attribute rename table. See User-defined sources.
`value_mappings`	map[string]map[string]string	user-defined sources only	Per-target value-fold rules, keyed by post-rename target attribute. See User-defined sources.
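The interaction between overwrite, remove_originals, and source ordering can be sketched in Go as follows. This is a minimal illustration, not the processor's code: the Source struct, applySource, and applySources are hypothetical names, span attributes are modeled as a plain map, and keeping the original attribute when a mapping is skipped is an assumption the table does not confirm.

```go
package main

import "fmt"

// Source mirrors the per-source fields from the table above.
// The type and function names are hypothetical.
type Source struct {
	Name            string
	RemoveOriginals bool
	Overwrite       bool
	Mappings        map[string]string // source attribute -> target attribute
}

// applySource walks the span attributes once and renames the mapped ones.
func applySource(attrs map[string]any, src Source) {
	// Snapshot the keys first so the map is not mutated while ranging over it.
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	for _, from := range keys {
		to, mapped := src.Mappings[from]
		if !mapped {
			continue
		}
		if _, exists := attrs[to]; exists && !src.Overwrite {
			// overwrite=false: skip the mapping.
			// Assumption: the original attribute is left in place.
			continue
		}
		attrs[to] = attrs[from]
		if src.RemoveOriginals {
			delete(attrs, from)
		}
	}
}

// applySources runs every source over the attributes in the order given.
func applySources(attrs map[string]any, sources []Source) {
	for _, s := range sources {
		applySource(attrs, s)
	}
}

func main() {
	attrs := map[string]any{
		"llm.model_name":       "gpt-4o",
		"gen_ai.request.model": "already-set",
	}
	applySources(attrs, []Source{{
		Name:     "openinference",
		Mappings: map[string]string{"llm.model_name": "gen_ai.request.model"},
	}})
	// overwrite is false, so the existing target value is kept.
	fmt.Println(attrs["gen_ai.request.model"])
}
```

With overwrite: true and remove_originals: true the same call would replace the target value and delete llm.model_name.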

Scope

Normalization is applied to:

Span attributes

The following are not modified:

Resource attributes
Scope attributes
Span event attributes
Span link attributes

Schema URL

When a mapping fires on a span, the enclosing ScopeSpans.schema_url is set to the OTel semantic-conventions version this processor targets (https://opentelemetry.io/schemas/1.40.0). An existing schema_url is overwritten. ResourceSpans.schema_url is never modified.

Type handling

After renaming, the processor enforces target attribute types against the OTel GenAI semantic conventions, derived from the typed constructor functions in go.opentelemetry.io/otel/semconv. For target keys with a typed primitive constructor in semconv (gen_ai.usage.input_tokens int, gen_ai.request.temperature float64, gen_ai.request.model string, gen_ai.response.finish_reasons []string, etc.), the processor coerces between compatible scalar types and drops the rename when coercion is unsafe:

string -> int: parsed via strconv.ParseInt; non-numeric strings drop.
string -> float64: parsed via strconv.ParseFloat; non-numeric strings drop.
string -> []string: wrapped into a single-element slice.
int / double / bool -> string: converted to canonical string form.
structured source (map / slice) -> primitive target: dropped (would lose information).
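The coercion rules in the list above can be sketched as a single function. This is a hedged illustration only: targetKind and coerce are hypothetical names, the way the processor looks up the target type from semconv is not shown, and base-10 parsing, 64-bit widths, and the strconv formatting used for the canonical string form are assumptions. Conversions the list does not mention are dropped in this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// targetKind is a hypothetical stand-in for the type the target key requires.
type targetKind int

const (
	kindString targetKind = iota
	kindInt
	kindFloat
	kindStringSlice
)

// coerce returns the value to store at the target key.
// ok=false means the rename is dropped.
func coerce(v any, kind targetKind) (out any, ok bool) {
	switch kind {
	case kindInt:
		switch x := v.(type) {
		case int64:
			return x, true
		case string:
			if n, err := strconv.ParseInt(x, 10, 64); err == nil {
				return n, true
			}
		}
	case kindFloat:
		switch x := v.(type) {
		case float64:
			return x, true
		case string:
			if f, err := strconv.ParseFloat(x, 64); err == nil {
				return f, true
			}
		}
	case kindStringSlice:
		switch x := v.(type) {
		case []string:
			return x, true
		case string:
			return []string{x}, true // wrap into a single-element slice
		}
	case kindString:
		switch x := v.(type) {
		case string:
			return x, true
		case int64:
			return strconv.FormatInt(x, 10), true
		case float64:
			return strconv.FormatFloat(x, 'g', -1, 64), true
		case bool:
			return strconv.FormatBool(x), true
		}
	}
	// Non-numeric strings, maps, slices, and other mismatches drop the rename.
	return nil, false
}

func main() {
	for _, v := range []any{"42", "abc", int64(7)} {
		out, ok := coerce(v, kindInt)
		fmt.Printf("%v -> %v (kept=%t)\n", v, out, ok)
	}
}
```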

For target keys defined as any in the spec (gen_ai.input.messages, gen_ai.output.messages, gen_ai.tool.definitions, gen_ai.operation.name enum, etc.), the processor preserves whatever shape the source emitted. Backends that require a uniform type for these targets should pair this processor with the transformprocessor for OTTL-based shape normalization.

Examples

Default configuration:

processors:
  gen_ai_normalizer:
    sources:
      - name: openinference

Delete source attributes after mapping:

processors:
  gen_ai_normalizer:
    sources:
      - name: openinference
        remove_originals: true

Overwrite existing target attributes:

processors:
  gen_ai_normalizer:
    sources:
      - name: openinference
        remove_originals: true
        overwrite: true

Normalize both OpenInference and OpenLLMetry:

processors:
  gen_ai_normalizer:
    sources:
      - name: openinference
        remove_originals: true
      - name: openllmetry
        remove_originals: true

User-defined renames and value foldings (see User-defined sources):

processors:
  gen_ai_normalizer:
    sources:
      - name: my_vendor
        remove_originals: true
        mappings:
          my_vendor.model: gen_ai.request.model
          my_vendor.tokens.in: gen_ai.usage.input_tokens
          my_vendor.op: gen_ai.operation.name
        value_mappings:
          gen_ai.operation.name:
            chat_completion: chat
            tool_invoke: execute_tool

User-defined sources

Any name that is not a built-in (openinference, openllmetry) is a user-defined source. The entry’s mappings and value_mappings drive the normalization. User-defined sources reuse the same remove_originals, overwrite, and type-coercion semantics as the built-in sources.

Field	Type	Description
`mappings`	map[string]string	Required. Source-attribute → target-attribute rename table. Must be non-empty.
`value_mappings`	map[string]map[string]string	Optional. Outer key is the post-rename target attribute name; inner map folds source string values onto preferred target string values. Source-value lookups are exact-match. Non-string sources, missing rules, and unmatched source values pass through verbatim.

Validation rules:

mappings must be non-empty on any user-defined source.
mappings and value_mappings are rejected on built-in sources.
Each value_mappings outer key must appear as a target in mappings (catches unreachable rules at config time).
name must be unique across sources.
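The validation rules above, together with the top-level requirement that at least one source is listed, can be sketched as a config check. SourceConfig, validate, and the error messages are hypothetical; only the rules themselves come from this page.

```go
package main

import (
	"errors"
	"fmt"
)

// SourceConfig is a hypothetical mirror of one entry in sources.
type SourceConfig struct {
	Name          string
	Mappings      map[string]string
	ValueMappings map[string]map[string]string
}

var builtins = map[string]bool{"openinference": true, "openllmetry": true}

// validate applies the rules listed above and returns the first violation.
func validate(sources []SourceConfig) error {
	if len(sources) == 0 {
		return errors.New("at least one source must be specified")
	}
	seen := map[string]bool{}
	for _, s := range sources {
		if seen[s.Name] {
			return fmt.Errorf("duplicate source name %q", s.Name)
		}
		seen[s.Name] = true

		if builtins[s.Name] {
			if len(s.Mappings) > 0 || len(s.ValueMappings) > 0 {
				return fmt.Errorf("source %q: mappings and value_mappings are rejected on built-in sources", s.Name)
			}
			continue
		}
		if len(s.Mappings) == 0 {
			return fmt.Errorf("source %q: mappings must be non-empty", s.Name)
		}
		// Every value_mappings outer key must be reachable as a mapping target.
		targets := map[string]bool{}
		for _, t := range s.Mappings {
			targets[t] = true
		}
		for key := range s.ValueMappings {
			if !targets[key] {
				return fmt.Errorf("source %q: value_mappings key %q is not a target in mappings", s.Name, key)
			}
		}
	}
	return nil
}

func main() {
	bad := []SourceConfig{{
		Name:          "my_vendor",
		Mappings:      map[string]string{"my_vendor.model": "gen_ai.request.model"},
		ValueMappings: map[string]map[string]string{"gen_ai.operation.name": {"chat_completion": "chat"}},
	}}
	fmt.Println(validate(bad))
}
```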

User-defined mappings landing on typed gen_ai.* targets get the same int/float/string/bool/[]string coercion as built-in mappings (see Type handling). User-defined mappings landing on non-gen_ai.* targets pass through verbatim.

Future built-in sources: new built-in source names may be added in future releases, and this is not treated as a breaking change. To avoid collisions, namespace user-defined names with a vendor or company prefix (e.g. custom.anthropic, acme.internal).

Performance

For user-defined sources, per-span cost grows with the number of attributes on the span, not with the size of the mappings table: the processor walks every attribute on every span, and each lookup in mappings is constant time. Real-world spans carry tens to a few hundred attributes and process in microseconds. Spans with thousands of attributes still work, at proportionally higher per-span cost.

See processor_benchmark_test.go for the benchmark suite. Run it with go test -bench=. -benchmem.

Built-in mappings

`openinference`

Attribute renames:

Source attribute	Target attribute
`llm.token_count.prompt`	`gen_ai.usage.input_tokens`
`llm.token_count.completion`	`gen_ai.usage.output_tokens`
`llm.model_name`	`gen_ai.request.model`
`llm.provider`	`gen_ai.provider.name`
`llm.input_messages.N.message.*`	`gen_ai.input.messages` (reconstructed as JSON, see below)
`llm.output_messages.N.message.*`	`gen_ai.output.messages` (reconstructed as JSON, see below)
`embedding.model_name`	`gen_ai.request.model`
`tool.name`	`gen_ai.tool.name`
`tool.description`	`gen_ai.tool.description`
`tool_call.function.arguments`	`gen_ai.tool.call.arguments`
`tool_call.id`	`gen_ai.tool.call.id`
`reranker.model_name`	`gen_ai.request.model`
`agent.name`	`gen_ai.agent.name`
`session.id`	`gen_ai.conversation.id`
`openinference.span.kind`	`gen_ai.operation.name` (with value mapping, see below)

See internal/openinference/mappings.go for the canonical map. Source reference: OpenInference semantic conventions.

Message reconstruction

OpenInference represents messages as flattened indexed span attributes (e.g., llm.input_messages.0.message.role, llm.input_messages.0.message.content). The processor reconstructs these into a single JSON string attribute following the GenAI input messages schema and sets it as gen_ai.input.messages (or gen_ai.output.messages).

Supported OpenInference message fields:

llm.{input,output}_messages.N.message.role
llm.{input,output}_messages.N.message.content
llm.{input,output}_messages.N.message.name — emitted as the name field on the message object when present
llm.{input,output}_messages.N.message.tool_calls.M.tool_call.id
llm.{input,output}_messages.N.message.tool_calls.M.tool_call.function.name
llm.{input,output}_messages.N.message.tool_calls.M.tool_call.function.arguments
llm.{input,output}_messages.N.message.tool_call_id

Not supported: multimodal content arrays (llm.{input,output}_messages.N.message.contents.M.message_content.*). OpenInference’s indexed content array format for images, audio, and other modalities is not reconstructed. Only the flat message.content string field is handled. Multimodal spans pass through with the original flattened attributes intact.
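A reduced sketch of the reconstruction step, covering only the role and content fields, might look like this. It is an illustration under assumptions: reconstruct, message, and part are hypothetical names, attributes are modeled as a string map, and tool calls, name, and role inference are left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

type part struct {
	Type    string `json:"type"`
	Content string `json:"content"`
}

type message struct {
	Role  string `json:"role"`
	Parts []part `json:"parts"`
}

// reconstruct gathers <prefix>N.message.{role,content} attributes into a
// JSON array ordered by the numeric index N.
func reconstruct(attrs map[string]string, prefix string) (string, error) {
	byIndex := map[int]*message{}
	for key, val := range attrs {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		pieces := strings.SplitN(strings.TrimPrefix(key, prefix), ".message.", 2)
		if len(pieces) != 2 {
			continue
		}
		n, err := strconv.Atoi(pieces[0])
		if err != nil {
			continue
		}
		m := byIndex[n]
		if m == nil {
			m = &message{Parts: []part{}}
			byIndex[n] = m
		}
		switch pieces[1] {
		case "role":
			m.Role = val
		case "content":
			m.Parts = append(m.Parts, part{Type: "text", Content: val})
		}
		// Other fields (tool calls, multimodal contents) are ignored here.
	}

	order := make([]int, 0, len(byIndex))
	for n := range byIndex {
		order = append(order, n)
	}
	sort.Ints(order)

	msgs := make([]message, 0, len(order))
	for _, n := range order {
		msgs = append(msgs, *byIndex[n])
	}
	out, err := json.Marshal(msgs)
	return string(out), err
}

func main() {
	attrs := map[string]string{
		"llm.input_messages.1.message.role":    "assistant",
		"llm.input_messages.1.message.content": "Hi",
		"llm.input_messages.0.message.role":    "user",
		"llm.input_messages.0.message.content": "Hello",
	}
	out, err := reconstruct(attrs, "llm.input_messages.")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Grouping by index before sorting means the attribute iteration order, which is random for Go maps, does not affect the message order in the result.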

Role inference

Roles are constrained to the GenAI semconv enum: system, user, assistant, tool. When the source role is absent, empty, or not one of these values, the processor infers it from context. A message carrying tool_call_id is always assigned tool, regardless of its source role.

Condition	Inferred role
`tool_call_id` is present	`tool` (always, regardless of source role)
`tool_calls` are present	`assistant`
Neither present	`user`
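The table can be read as a small decision function. The sketch below assumes that the rows are checked top to bottom and that a valid source role is kept unless tool_call_id is present; inferRole and its boolean parameters are hypothetical names.

```go
package main

import "fmt"

// inferRole returns the role to emit for one message.
// hasToolCallID and hasToolCalls report whether the corresponding
// flattened attributes exist for that message.
func inferRole(sourceRole string, hasToolCallID, hasToolCalls bool) string {
	if hasToolCallID {
		return "tool" // always, regardless of the source role
	}
	switch sourceRole {
	case "system", "user", "assistant", "tool":
		return sourceRole // already a valid enum value
	}
	if hasToolCalls {
		return "assistant"
	}
	return "user"
}

func main() {
	fmt.Println(inferRole("assistant", true, false)) // tool
	fmt.Println(inferRole("", false, true))          // assistant
	fmt.Println(inferRole("human", false, false))    // user
}
```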

GenAI semconv part types not produced

The following part types are defined in the GenAI input messages schema and output messages schema but are not emitted by this processor:

Part type	Applies to	Reason not produced
`blob`	input & output	Multimodal — OpenInference uses the `message.contents.M.message_content.*` indexed array, which is not reconstructed (see multimodal limitation above)
`file`	input & output	Same multimodal limitation
`uri`	input & output	Same multimodal limitation
`reasoning`	input & output	OpenInference carries reasoning inside the `message.contents` indexed array (type `"reasoning"`), not as a top-level message field; blocked by the same multimodal limitation
`server_tool_call` (incl. nested `GenericServerToolCall`)	input & output	OpenInference does not model server-side tool calls (e.g. `code_interpreter`, `web_search`) as flattened span attributes
`server_tool_call_response` (incl. nested `GenericServerToolCallResponse`)	input & output	Same — no OpenInference source attributes exist for server tool responses
`compaction`	input & output	OpenInference does not emit compaction/context-window summary data
`GenericPart`	input & output	Extensibility type — no OpenInference source attributes to map from
`finish_reason` (non-empty)	output only	OpenInference has no per-message finish reason; the field is required by the schema and always emitted as `""`. Use the span-level `gen_ai.response.finish_reasons` instead

Output format

Messages are serialized as a JSON array of objects. Input messages (gen_ai.input.messages) follow the GenAI input messages schema; output messages (gen_ai.output.messages) follow the GenAI output messages schema.

Example gen_ai.input.messages:

[
  {
    "role": "user",
    "parts": [{"type": "text", "content": "Hello"}]
  },
  {
    "role": "tool",
    "parts": [{"type": "tool_call_response", "id": "call_1", "response": "sunny, 22C"}]
  }
]

Example gen_ai.output.messages:

[
  {
    "role": "assistant",
    "parts": [{"type": "tool_call", "id": "call_1", "name": "get_weather", "arguments": {"city": "Berlin"}}],
    "finish_reason": ""
  }
]

The name field (participant name) is included on a message object only when the source carries message.name; otherwise it is omitted. Example:

[
  {
    "role": "assistant",
    "name": "my_agent",
    "parts": [{"type": "text", "content": "Done"}],
    "finish_reason": ""
  }
]

The finish_reason field is always present on output messages (required by the schema) and always set to "" because OpenInference does not carry per-message finish reasons. Use gen_ai.response.finish_reasons (a span-level attribute) for the model’s stop reason.

Messages are ordered by their numeric index N.

The arguments field is parsed as JSON if valid; otherwise it is kept as a raw string.
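The handling of the arguments field can be illustrated with a short Go sketch. parseArguments is a hypothetical helper, and the part is built from a plain map purely for demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseArguments returns the decoded JSON value when raw is valid JSON
// and the unchanged string otherwise.
func parseArguments(raw string) any {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return raw
	}
	return v
}

func main() {
	for _, raw := range []string{`{"city":"Berlin"}`, `city=Berlin`} {
		p := map[string]any{"type": "tool_call", "arguments": parseArguments(raw)}
		out, err := json.Marshal(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

The first input is emitted as a nested JSON object, the second as a JSON string.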

`openllmetry`

Attribute renames:

Source attribute	Target attribute	Notes
`llm.usage.prompt_tokens`	`gen_ai.usage.input_tokens`
`llm.usage.completion_tokens`	`gen_ai.usage.output_tokens`
`llm.request.model`	`gen_ai.request.model`
`llm.response.model`	`gen_ai.response.model`
`llm.request.max_tokens`	`gen_ai.request.max_tokens`
`llm.request.temperature`	`gen_ai.request.temperature`
`llm.request.top_p`	`gen_ai.request.top_p`
`llm.top_k`	`gen_ai.request.top_k`
`llm.frequency_penalty`	`gen_ai.request.frequency_penalty`
`llm.presence_penalty`	`gen_ai.request.presence_penalty`
`llm.chat.stop_sequences`	`gen_ai.request.stop_sequences`
`llm.request.functions`	`gen_ai.tool.definitions`	source-shape preserved (Type handling)
`llm.response.finish_reason`	`gen_ai.response.finish_reasons`	string wrapped into a single-element string[]
`llm.response.stop_reason`	`gen_ai.response.finish_reasons`	string wrapped into a single-element string[]
`llm.request.type`	`gen_ai.operation.name`	with value mapping, see below
`traceloop.span.kind`	`gen_ai.operation.name`	with value mapping, see below
`traceloop.entity.name`	`gen_ai.agent.name`
`traceloop.entity.input`	`gen_ai.input.messages`	source-shape preserved (Type handling)
`traceloop.entity.output`	`gen_ai.output.messages`	source-shape preserved (Type handling)

Coverage: this table covers the most common OpenLLMetry attributes. OpenLLMetry attributes not listed pass through unchanged. Open an issue if a missing attribute is blocking your migration.

OpenLLMetry instrumentation typically emits only one attribute from each collision pair (llm.response.finish_reason xor llm.response.stop_reason; llm.request.type xor traceloop.span.kind). When both attributes in a pair are present on a span, the resolved value at the target key is undefined.

See internal/openllmetry/mappings.go for the canonical map. Source reference: OpenLLMetry semantic conventions.

Value transformations

When a built-in mapping lands on gen_ai.operation.name, the string value is normalized to the OTel GenAI enum. Built-in lookups are case-insensitive; user-defined value_mappings are exact-match.

Source	Source attribute	Source value	Target value
`openinference`	`openinference.span.kind`	`LLM`	`chat`
`openinference`	`openinference.span.kind`	`EMBEDDING`	`embeddings`
`openinference`	`openinference.span.kind`	`CHAIN`	`invoke_agent`
`openinference`	`openinference.span.kind`	`RETRIEVER`	`retrieval`
`openinference`	`openinference.span.kind`	`RERANKER`	`retrieval`
`openinference`	`openinference.span.kind`	`TOOL`	`execute_tool`
`openinference`	`openinference.span.kind`	`AGENT`	`invoke_agent`
`openinference`	`openinference.span.kind`	`PROMPT`	`text_completion`
`openllmetry`	`traceloop.span.kind`	`workflow`	`invoke_workflow`
`openllmetry`	`traceloop.span.kind`	`task`	`invoke_agent`
`openllmetry`	`traceloop.span.kind`	`agent`	`invoke_agent`
`openllmetry`	`traceloop.span.kind`	`tool`	`execute_tool`
`openllmetry`	`llm.request.type`	`completion`	`text_completion`
`openllmetry`	`llm.request.type`	`chat`	`chat`
`openllmetry`	`llm.request.type`	`rerank`	`retrieval`
`openllmetry`	`llm.request.type`	`embedding`	`embeddings`
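The difference between case-insensitive built-in lookups and exact-match user-defined value_mappings can be sketched as follows. The map copies the openinference rows of the table above; foldBuiltin and foldUser are hypothetical names, and passing unmatched built-in values through unchanged is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// openInferenceKinds holds the openinference.span.kind rows from the table,
// keyed in lower case so lookups can ignore case.
var openInferenceKinds = map[string]string{
	"llm":       "chat",
	"embedding": "embeddings",
	"chain":     "invoke_agent",
	"retriever": "retrieval",
	"reranker":  "retrieval",
	"tool":      "execute_tool",
	"agent":     "invoke_agent",
	"prompt":    "text_completion",
}

// foldBuiltin does a case-insensitive lookup.
// Assumption: values without a rule pass through unchanged.
func foldBuiltin(v string) string {
	if out, ok := openInferenceKinds[strings.ToLower(v)]; ok {
		return out
	}
	return v
}

// foldUser does an exact-match lookup against one user-defined
// value_mappings rule; unmatched values pass through verbatim.
func foldUser(rules map[string]string, v string) string {
	if out, ok := rules[v]; ok {
		return out
	}
	return v
}

func main() {
	rules := map[string]string{"chat_completion": "chat"}
	fmt.Println(foldBuiltin("LLM"), foldBuiltin("llm"))
	fmt.Println(foldUser(rules, "chat_completion"), foldUser(rules, "CHAT_COMPLETION"))
}
```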

When a mapped attribute lands on gen_ai.response.finish_reasons with a string source value, the value is wrapped into a single-element string[] to match the OTel GenAI spec type.

Target reference: OTel GenAI operation names.

Relationship to other processors

The schemaprocessor translates between OTel semantic convention versions using schema_url and the OTel schema file format. Source conventions normalized by this processor do not set schema_url and do not publish OTel schema files, so schemaprocessor cannot be used for this translation today.

The transformprocessor can rewrite attributes via OTTL but requires users to author and maintain the full mapping set themselves. This processor ships the mappings built-in. For pure value-mutation without renames, prefer transformprocessor.

Configuration

Example Configuration

# Full config exercising the supported source and its per-source fields.
gen_ai_normalizer:
  sources:
    - name: openinference
      remove_originals: true
      overwrite: false

# Sources omitted entirely: validation error (at least one source required).
gen_ai_normalizer/empty:

# Sources with only openinference defaults.
gen_ai_normalizer/openinference_only:
  sources:
    - name: openinference

# Sources with only openllmetry defaults.
gen_ai_normalizer/openllmetry_only:
  sources:
    - name: openllmetry

# Both sources, applied in listed order.
gen_ai_normalizer/openinference_and_openllmetry:
  sources:
    - name: openinference
      remove_originals: true
    - name: openllmetry
      remove_originals: true

# Explicit empty sources list: validation error.
gen_ai_normalizer/empty_sources:
  sources: []

# Same source listed twice: validation error.
gen_ai_normalizer/duplicate_source:
  sources:
    - name: openinference
    - name: openinference

# Single user-defined source with two mappings.
gen_ai_normalizer/user_defined_only:
  sources:
    - name: my_vendor
      remove_originals: true
      mappings:
        my_vendor.model: gen_ai.request.model
        my_vendor.tokens.in: gen_ai.usage.input_tokens

# Built-in source followed by a user-defined source.
gen_ai_normalizer/user_defined_with_builtin:
  sources:
    - name: openinference
      remove_originals: true
    - name: my_vendor
      remove_originals: true
      mappings:
        my_vendor.model: gen_ai.request.model

# Two distinct user-defined sources in one config; built-in sources still
# reject duplicates.
gen_ai_normalizer/multiple_user_defined:
  sources:
    - name: vendor_a
      remove_originals: true
      mappings:
        vendor_a.model: gen_ai.request.model
    - name: vendor_b
      remove_originals: true
      overwrite: true
      mappings:
        vendor_b.model: gen_ai.request.model

# User-defined source with empty mappings: validation error.
gen_ai_normalizer/user_defined_empty_mappings:
  sources:
    - name: my_vendor

# Built-in source with mappings set: validation error
# (mappings only valid on user-defined sources).
gen_ai_normalizer/openinference_with_mappings:
  sources:
    - name: openinference
      mappings:
        foo: gen_ai.request.model

# User-defined source with a value_mappings key that is not a target
# in mappings: validation error (the rule would never fire).
gen_ai_normalizer/user_defined_unreachable_value_mapping:
  sources:
    - name: my_vendor
      mappings:
        my_vendor.model: gen_ai.request.model
      value_mappings:
        gen_ai.operation.name:
          chat_completion: chat

# User-defined source with mappings + value_mappings folding onto an OTel
# GenAI enum target.
gen_ai_normalizer/user_defined_with_value_mappings:
  sources:
    - name: my_vendor
      remove_originals: true
      mappings:
        my_vendor.op: gen_ai.operation.name
      value_mappings:
        gen_ai.operation.name:
          chat_completion: chat
          tool_invoke: execute_tool

Last generated: 2026-07-06

