
K8sattributes Processor

Status
  • Available in: contrib, k8s
  • Maintainers: @dmitryax, @fatsheep9146, @TylerHelmuth, @ChrsMark, @odubajDT
  • Source: opentelemetry-collector-contrib

Supported Telemetry

Logs Metrics Traces

Overview

The processor automatically discovers k8s resources (pods), extracts metadata from them, and adds the extracted metadata to the relevant spans, metrics and logs as resource attributes. The processor uses the kubernetes API to discover all pods running in a cluster and keeps a record of their IP addresses, pod UIDs and interesting metadata. The rules for associating the data passing through the processor (spans, metrics and logs) with specific Pod metadata are configured via the pod_association key. It holds a list of associations that are evaluated in the specified order until the first one matches.

Configuration

The processor stores the list of running pods and the associated metadata. When it sees a datapoint (log, trace or metric), it tries to associate the datapoint with the pod from which the datapoint originated, so that the relevant pod metadata can be added to the datapoint. By default, it associates the incoming connection IP with the Pod IP. For cases where this approach doesn’t work (sending through a proxy, etc.), a custom association rule can be specified. Each association is specified as a list of sources. The maximum number of sources within an association is 4. A source is a rule that matches metadata from the datapoint to pod metadata; for an association to apply, all of its sources must match. Each source rule is specified as a pair of from (the rule type) and name (the attribute name, if from is set to resource_attribute). The following rule types are available:
  • connection: Takes the IP attribute from connection context (if available). In this case the processor must appear before any batching or tail sampling, which remove this information.
  • resource_attribute: Allows specifying the attribute name to lookup in the list of attributes of the received Resource. Semantic convention should be used for naming.
Example for a pod association configuration:
pod_association:
  # below association takes a look at the datapoint's k8s.pod.ip resource attribute and tries to match it with
  # the pod having the same attribute.
  - sources:
      - from: resource_attribute
        name: k8s.pod.ip
  # below association matches for pair `k8s.pod.name` and `k8s.namespace.name`
  - sources:
      - from: resource_attribute
        name: k8s.pod.name
      - from: resource_attribute
        name: k8s.namespace.name
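Because the connection rule type depends on connection context that batching and tail sampling remove, the processor must be placed before those components in the pipeline. A minimal service-pipeline sketch under that assumption (the otlp receiver/exporter names are placeholders):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # k8s_attributes must run before batch (and any tail sampling),
      # which strip the connection context used by the `connection` source
      processors: [k8s_attributes, batch]
      exporters: [otlp]
```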
If pod association rules are not configured, resources are associated with metadata only by the connection’s IP address. Which metadata to collect is determined by the metadata configuration, which defines the list of resource attributes to be added. Items in the list are named exactly the same as the resource attributes that will be added. The following attributes are added by default:
  • k8s.namespace.name
  • k8s.pod.name
  • k8s.pod.uid
  • k8s.pod.start_time
  • k8s.deployment.name (requires watching Deployment resources unless deployment_name_from_replicaset is enabled)
  • k8s.node.name
These attributes are also available for use within association rules by default. The metadata section can also be extended with additional attributes; any attribute present in the metadata section is then also available for use within association rules. Available attributes are:
  • k8s.namespace.name
  • k8s.pod.name
  • k8s.pod.hostname
  • k8s.pod.ip
  • k8s.pod.start_time
  • k8s.pod.uid
  • k8s.replicaset.uid
  • k8s.replicaset.name
  • k8s.deployment.uid
  • k8s.deployment.name
  • k8s.daemonset.uid
  • k8s.daemonset.name
  • k8s.statefulset.uid
  • k8s.statefulset.name
  • k8s.cronjob.uid
  • k8s.cronjob.name
  • k8s.job.uid
  • k8s.job.name
  • k8s.node.name
  • k8s.cluster.uid
  • service.namespace
  • service.name
  • service.version (cannot be used for source rules in the pod_association when it’s calculated based on the container’s image tag/digest)
  • service.instance.id (cannot be used for source rules in the pod_association)
  • Any tags extracted from the pod labels and annotations, as described in extracting attributes from pod labels and annotations
Not all of these attributes are guaranteed to be added. Only attribute names from metadata should be used for pod_association’s resource_attribute, because empty or non-existent values will be ignored. Additional container-level attributes can be extracted. If a pod contains more than one container, either the container.id or the k8s.container.name attribute must be provided in the incoming resource attributes to correctly associate the matching container with the resource:
  1. If the container.id resource attribute is provided, the following additional attributes will be available:
    • k8s.container.name
    • container.image.name
    • container.image.tag
    • container.image.repo_digests (if k8s CRI populates repository digest field)
    • service.version
    • service.instance.id
  2. If the k8s.container.name resource attribute is provided, the following additional attributes will be available:
    • container.id (if the k8s.container.restart_count resource attribute is not provided, it’s not guaranteed to get the right container ID.)
    • container.image.name
    • container.image.tag
    • container.image.repo_digests (if k8s CRI populates repository digest field)
    • service.version
    • service.instance.id
  3. If the k8s.container.restart_count resource attribute is provided, it can be used to associate with a particular container instance. If it’s not set, the latest container instance will be used:
    • container.id (not added by default, has to be specified in metadata)
Please note, however, that only the container.id attribute can be used for source rules in the pod_association. To use container.id in pod association, at least one container attribute must be included in the metadata extraction configuration (e.g. container.id, container.image.name). Example of extracting container-level attributes:
pod_association:
- sources:
    - from: connection
extract:
  metadata:
  - k8s.pod.name
  - k8s.pod.uid
  - container.image.name
  - container.image.tag
  - k8s.container.name
The previous configuration attaches the attributes listed in the metadata section to all resources received from a matching pod that has the k8s.container.name attribute present. For example, when the following trace
{
  "name": "lets-go",
  "context": {
    "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "0x051581bf3cb55c13"
  },
  "parent_id": null,
  "start_time": "2022-04-29T18:52:58.114201Z",
  "end_time": "2022-04-29T18:52:58.114687Z",
  "attributes": {
    "k8s.container.name": "telemetrygen"
  }
}
is sent to the collector by the following pod,
apiVersion: v1
kind: Pod
metadata:
  annotations:
    workload: deployment
  name: telemetrygen-pod
  namespace: e2ek8senrichment
  uid: 038e2267-b473-489b-b48c-46bafdb852eb
spec:
  containers:
  - command:
    - /telemetrygen
    - traces
    - --otlp-insecure
    - --otlp-endpoint=otelcollector.svc.cluster.local:4317
    - --duration=10s
    - --rate=1
    - --otlp-attributes=k8s.container.name="telemetrygen"
    image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:0.112.0@sha256:b248ef911f93ae27cbbc85056d1ffacc87fd941bbdc2ffd951b6df8df72b8096
    name: telemetrygen
status:
  podIP: 10.244.0.11
the processor associates the received trace with the pod based on the connection IP and adds these attributes to the resulting span:
{
  "name": "lets-go",
  "context": {
    "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "0x051581bf3cb55c13"
  },
  "parent_id": null,
  "start_time": "2022-04-29T18:52:58.114201Z",
  "end_time": "2022-04-29T18:52:58.114687Z",
  "attributes": {
    "k8s.container.name": "telemetrygen",
    "k8s.pod.name": "telemetrygen-pod",
    "k8s.pod.uid": "038e2267-b473-489b-b48c-46bafdb852eb",
    "container.image.name": "telemetrygen",
    "container.image.tag": "0.112.0",
    "container.image.repo_digests": ["ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen@sha256:b248ef911f93ae27cbbc85056d1ffacc87fd941bbdc2ffd951b6df8df72b8096"]
  }
}
By default, the processor reports ready as soon as it starts, even if no metadata has been fetched yet. If data is sent to the processor before the metadata is synced, there will be no metadata to enrich the data with. To wait for the metadata sync before the processor reports ready, set the wait_for_metadata option to true. The processor will then not be ready until the metadata is fully synced, and the start-up of the Collector will block. If the metadata cannot be synced before the timeout is reached, the processor fails to start and returns an error, which causes the collector to exit. The timeout defaults to 10s and can be configured with the wait_for_metadata_timeout option. Example of configuring the processor to wait for metadata to be synced before it is ready:
wait_for_metadata: true
wait_for_metadata_timeout: 10s
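In a full collector configuration, these two options sit at the top level of the processor config. A minimal sketch (the 30s timeout is an arbitrary example value):

```yaml
processors:
  k8s_attributes:
    # Block collector start-up until the pod metadata cache is synced
    wait_for_metadata: true
    # Fail start-up if the sync has not completed in time (default: 10s)
    wait_for_metadata_timeout: 30s
```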

Extracting attributes from pod labels and annotations

The k8sattributesprocessor can also set resource attributes from k8s labels and annotations of pods, namespaces, deployments, statefulsets, daemonsets, jobs and nodes. The config for associating the data passing through the processor (spans, metrics and logs) with specific Pod/Namespace/Deployment/StatefulSet/DaemonSet/Job/Node annotations/labels is set via the “annotations” and “labels” keys. Each key holds a list of annotations/labels that are extracted from pods/namespaces/deployments/statefulsets/daemonsets/jobs/nodes and added to spans, metrics and logs. Each item is specified as a config of tag_name (the name of the attribute added to the telemetry), key (the key used to extract the value) and from (the kubernetes object used to extract the value). The “from” field accepts the values “pod”, “namespace”, “deployment”, “statefulset”, “daemonset”, “job” and “node”, and defaults to “pod” if none is specified. By default, extracting metadata from Deployments, StatefulSets, DaemonSets and Jobs is disabled; enabling it comes with an extra memory consumption cost. A few examples of using this config:
extract:
  annotations:
    - tag_name: a1 # extracts value of annotation from pods with key `annotation-one` and inserts it as a tag with key `a1`
      key: annotation-one
      from: pod
    - tag_name: a2 # extracts value of annotation from namespaces with key `annotation-two` and inserts it as a tag with key `a2`
      key: annotation-two
      from: namespace
    - tag_name: a3 # extracts value of annotation from nodes with key `annotation-three` and inserts it as a tag with key `a3`
      key: annotation-three
      from: node
  labels:
    - tag_name: l1 # extracts value of label from namespaces with key `label1` and inserts it as a tag with key `l1`
      key: label1
      from: namespace
    - tag_name: l2 # extracts value of label from pods with key `label2` and inserts it as a tag with key `l2`
      key: label2
      from: pod
    - tag_name: l3 # extracts value of label from nodes with key `label3` and inserts it as a tag with key `l3`
      key: label3
      from: node
The processor can be configured to set the recommended resource attributes:
  • otel_annotations will translate resource.opentelemetry.io/foo to the foo resource attribute, etc.
  • deployment_name_from_replicaset allows extracting the deployment name from the replicaset name by trimming the pod template hash. This disables watching for Deployment resources, which can be useful in environments with limited RBAC permissions, as the processor will not need get, watch, and list permissions for deployments. It also reduces the processor’s memory consumption. Important: When deployment_name_from_replicaset: true is set, you must still include k8s.deployment.name (or service.name) in the extract.metadata section for the deployment name to be extracted. The processor derives the deployment name from the ReplicaSet’s naming convention without requiring direct access to Deployment resources, but the extraction rules must be enabled. Take the following ownerReference of a pod managed by a deployment for example:
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: opentelemetry-collector-6c45f8d6f6
    uid: ee75293d-14ec-42a0-9548-a768d9e07c48
The extracted deployment name is: opentelemetry-collector.
Please note that if your pods are managed by a replicaset but not by a deployment, k8s.deployment.name will be set incorrectly. For example, if the replicaset is named opentelemetry-collector-6c45f8d6f6, the feature will still set the pod’s deployment name to opentelemetry-collector, because it skips watching for the deployment and cannot tell whether the pod is managed by a deployment or a bare replicaset. Another edge case to be aware of is a long deployment name. Kubernetes may truncate it in the ReplicaSet name to leave enough space for the pod template hash suffix, so that the full name fits within the DNS subdomain limit (253 characters). In such cases, the extracted k8s.deployment.name will be the truncated form, not the original full deployment name.
Example:
  extract:
    otel_annotations: true
    deployment_name_from_replicaset: true
    metadata:
      - service.namespace
      - service.name
      - service.version
      - service.instance.id
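To illustrate otel_annotations, a hypothetical pod annotated as below would, per the resource.opentelemetry.io/ mapping described above, get service.name and service.version resource attributes (pod name, namespace, and values are made up for this sketch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkout-pod          # hypothetical pod
  namespace: shop             # hypothetical namespace
  annotations:
    # With extract.otel_annotations: true, these become the
    # `service.name` and `service.version` resource attributes
    resource.opentelemetry.io/service.name: checkout
    resource.opentelemetry.io/service.version: "1.4.2"
spec:
  containers:
  - name: checkout
    image: example.com/checkout:1.4.2   # placeholder image
```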

Config example

k8s_attributes:
  auth_type: "serviceAccount"
  passthrough: false
  filter:
    # only retrieve pods running on the same node as the collector
    node_from_env_var: KUBE_NODE_NAME
  extract:
    # The attributes provided in 'metadata' will be added to associated resources
    metadata:
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.deployment.name  # Requires watching Deployment resources. To avoid this, use deployment_name_from_replicaset instead.
      - k8s.namespace.name
      - k8s.node.name
      - k8s.pod.start_time
      - service.namespace
      - service.name
      - service.version
      - service.instance.id
    labels:
      # This label extraction rule takes the value 'app.kubernetes.io/component' label and maps it to the 'app.label.component' attribute which will be added to the associated resources
      - tag_name: app.label.component
        key: app.kubernetes.io/component
        from: pod
    otel_annotations: true
  pod_association:
    - sources:
        # This rule associates all resources containing the 'k8s.pod.ip' attribute with the matching pods. If this attribute is not present in the resource, this rule will not be able to find the matching pod.
        - from: resource_attribute
          name: k8s.pod.ip
    - sources:
        # This rule associates all resources containing the 'k8s.pod.uid' attribute with the matching pods. If this attribute is not present in the resource, this rule will not be able to find the matching pod.
        - from: resource_attribute
          name: k8s.pod.uid
    - sources:
        # This rule will use the IP from the incoming connection from which the resource is received, and find the matching pod, based on the 'pod.status.podIP' of the observed pods
        - from: connection

Common Use Cases

Example 1: Basic Agent Deployment (DaemonSet)

Minimal configuration for an agent collecting telemetry from pods on the same node:
processors:
  k8s_attributes:
    # Use downward API to automatically filter by current node
    filter:
      node_from_env_var: KUBE_NODE_NAME
    # Extract common metadata
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
    # Default connection-based pod association
    pod_association:
      - sources:
          - from: connection
Required environment variable in your collector DaemonSet:
env:
  - name: KUBE_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

Example 2: Gateway Deployment with Resource Attribute Association

Gateway configuration that receives telemetry from agents that have already added pod IP:
processors:
  k8s_attributes:
    # Extract comprehensive metadata
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.node.name
      # Extract custom labels
      labels:
        - tag_name: deployment_mode
          key: deployment_mode
          from: pod
        - tag_name: owner_team
          key: team
          from: pod
    # Associate by resource attributes set by agents
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip

Example 3: Production Deployment with Namespace Filtering

Configuration for monitoring a specific namespace with comprehensive metadata:
processors:
  k8s_attributes:
    filter:
      namespace: production
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
        - service.name
        - service.version
      labels:
        - tag_name: team
          key: team
          from: namespace
        - tag_name: environment
          key: environment
          from: pod
      annotations:
        - tag_name: commit_sha
          key: git-commit
          from: pod
      otel_annotations: true

Example 4: Memory-Optimized Configuration

Minimal memory footprint configuration for large clusters:
processors:
  k8s_attributes:
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      # Only extract essential fields
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name  # Required to enable deployment name extraction
        # Note: deployment_name_from_replicaset extracts the name from the ReplicaSet
        # without watching Deployment resources, but k8s.deployment.name must still be listed
      # Use deployment name extraction without watching replicasets
      deployment_name_from_replicaset: true

Example 5: Multi-Container Pod Support

Configuration for extracting container-level metadata:
processors:
  k8s_attributes:
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.container.name
        - container.id
        - container.image.name
        - container.image.tag
    # Associate by container ID for multi-container pods
    pod_association:
      - sources:
          - from: resource_attribute
            name: container.id
      - sources:
          - from: connection

Role-based access control

Cluster-scoped RBAC

If you’d like to set up the k8sattributesprocessor to receive telemetry from across namespaces, it will need get, watch and list permissions on both pods and namespaces resources, for all namespaces and pods included in the configured filters. Additionally, when using k8s.deployment.name (which is enabled by default) or k8s.deployment.uid, the processor also needs get, watch and list permissions for replicasets resources (unless deployment_name_from_replicaset is enabled). When using k8s.node.uid or extracting metadata from node, the processor needs get, watch and list permissions for nodes resources. When using k8s.cronjob.uid, the processor also needs get, watch and list permissions for jobs resources. Here is an example of a ClusterRole giving a ServiceAccount the necessary permissions for all pods, nodes, and namespaces in the cluster (replace <OTEL_COL_NAMESPACE> with the namespace where the collector is deployed):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: collector
  namespace: <OTEL_COL_NAMESPACE>
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces", "nodes"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["apps"]
  resources: ["replicasets", "deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  resources: ["replicasets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
subjects:
- kind: ServiceAccount
  name: collector
  namespace: <OTEL_COL_NAMESPACE>
roleRef:
  kind: ClusterRole
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io

Namespace-scoped RBAC

When running the k8sattributesprocessor to receive telemetry traffic from pods in a specific namespace, you can use a k8s Role and RoleBinding to give the collector access to query pods and replicasets in the namespace. This requires setting the filter::namespace config as shown below.
k8s_attributes:
  filter:
    namespace: <WORKLOAD_NAMESPACE>
With the namespace filter set, the processor will only look up pods and replicasets (if deployment_name_from_replicaset is not enabled) in the selected namespace. Note that with just a role binding, the processor cannot query metadata such as labels and annotations from k8s nodes and namespaces, which are cluster-scoped objects. This also means that the processor cannot set the value of the k8s.cluster.uid attribute if enabled, since that attribute is set to the uid of the kube-system namespace, which is not queryable with namespaced rbac. Please note that when extracting workload-related attributes, these workloads need to be present in the Role with the correct permissions. For example, to extract k8s.deployment.label.* attributes, deployments need to be present in the Role. Example Role and RoleBinding to create in the namespace being watched:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: <OTEL_COL_NAMESPACE>
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: otel-collector
  namespace: <WORKLOAD_NAMESPACE>
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["apps"]
  resources: ["replicasets", "deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: otel-collector
  namespace: <WORKLOAD_NAMESPACE>
subjects:
- kind: ServiceAccount
  name: otel-collector
  namespace: <OTEL_COL_NAMESPACE>
roleRef:
  kind: Role
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io

Deployment scenarios

The processor can be used in collectors deployed either as an agent (Kubernetes DaemonSet) or as a gateway (Kubernetes Deployment).

As an agent

When running as an agent, the processor detects IP addresses of pods sending spans, metrics or logs to the agent and uses this information to extract metadata from pods. When running as an agent, it is important to apply a discovery filter so that the processor only discovers pods on the same host it is running on. Not using such a filter can result in unnecessary resource usage, especially on very large clusters. Once the filter is applied, each processor will only query the k8s API for pods running on its own node. A node filter can be applied by setting the filter.node config option to the name of a k8s node. While this works as expected, it cannot in most cases be used to automatically filter pods by the node the processor is running on, as it is not known beforehand which node a pod will be scheduled on. Luckily, kubernetes has a solution for this called the downward API. To automatically filter pods by the node the processor is running on, complete the following steps:
  1. Use the downward API to inject the node name as an environment variable. Add the following snippet under the pod env section of the OpenTelemetry container.
spec:
  containers:
  - env:
    - name: KUBE_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
This injects a new environment variable into the OpenTelemetry container whose value is the name of the node the pod was scheduled to run on.
  2. Set filter.node_from_env_var to the name of the environment variable holding the node name.
k8s_attributes:
  filter:
    node_from_env_var: KUBE_NODE_NAME # this should be same as the var name used in previous step
This restricts each OpenTelemetry agent to querying only pods running on the same node, dramatically reducing resource requirements for very large clusters.

As a gateway

When running as a gateway, the processor receives telemetry from an agent rather than directly from the pods, so it cannot correctly detect the IP address of the pods generating the telemetry data unless one of the well-known IP attributes is present. To work around this issue, agents deployed with the k8s_attributes processor can be configured to detect the IP addresses and forward them along with the telemetry data resources. The collector can then match this IP address with k8s pods and enrich the records with the metadata. To set this up, complete the following steps:
  1. Set up agents in passthrough mode. Configure the agents’ k8s_attributes processors to run in passthrough mode.

k8s_attributes:
  passthrough: true
This ensures that the agents detect the IP address and add it as an attribute to all telemetry resources. Agents will not make any k8s API calls, discover pods, or extract any metadata.
  2. Configure the collector as usual. No special configuration is needed on the collector. It will automatically detect the IP address of spans, logs and metrics sent by the agents, as well as of telemetry sent directly by other services/pods.
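Putting the two halves together, a sketch of the agent and gateway processor configs; the gateway’s association rule is written out explicitly for clarity, matching on the k8s.pod.ip attribute that passthrough agents attach:

```yaml
# Agent (DaemonSet): only annotate resources with the pod IP, no k8s API calls
processors:
  k8s_attributes:
    passthrough: true
---
# Gateway (Deployment): enrich using the forwarded k8s.pod.ip attribute
processors:
  k8s_attributes:
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
```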

Complete Configuration Options

Below is a comprehensive configuration example with all available options:
k8s_attributes:
  # Authentication type for Kubernetes API access
  # Options: "none", "serviceAccount", "kubeConfig"
  # Default: "serviceAccount"
  auth_type: "serviceAccount"
  
  # Path to kubeconfig file (only used when auth_type is "kubeConfig")
  # Default: ""
  kube_config_path: "~/.kube/config"
  
  # Kubernetes API server context (only used when auth_type is "kubeConfig")
  # Default: ""
  context: ""
  
  # Passthrough mode - only annotates resources with pod IP without extracting metadata
  # Useful for agents that don't need K8s API access
  # Default: false
  passthrough: false
  
  # Wait for metadata to be synced before marking processor as ready
  # When true, collector startup will block until metadata is available
  # Default: false
  wait_for_metadata: false
  
  # Maximum time to wait for metadata sync during startup
  # Only applies when wait_for_metadata is true
  # Default: 10s
  wait_for_metadata_timeout: 10s
  
  # Extract configuration - defines what metadata to extract
  extract:
    # Metadata fields to extract as resource attributes
    # Default: [k8s.namespace.name, k8s.pod.name, k8s.pod.uid, k8s.pod.start_time, k8s.deployment.name, k8s.node.name]
    # Full description of each metadata attribute can be found in the semantic conventions:
    # https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource/k8s
    metadata:
      - k8s.namespace.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.hostname
      - k8s.pod.start_time
      - k8s.pod.ip
      - k8s.deployment.name
      - k8s.deployment.uid
      - k8s.replicaset.name
      - k8s.replicaset.uid
      - k8s.daemonset.name
      - k8s.daemonset.uid
      - k8s.statefulset.name
      - k8s.statefulset.uid
      - k8s.job.name
      - k8s.job.uid
      - k8s.cronjob.name
      - k8s.cronjob.uid
      - k8s.node.name
      - k8s.node.uid
      - k8s.cluster.uid
      - k8s.container.name
      - container.id
      - container.image.name
      - container.image.tag
      - container.image.repo_digests
      # See [the configuration section](#configuration) for more details
      - service.namespace
      - service.name
      - service.version
      - service.instance.id
    
    # Extract pod annotations as resource attributes
    # See [Extracting attributes from pod labels and annotations](#extracting-attributes-from-pod-labels-and-annotations) section for more details
    annotations:
      - tag_name: annotation_value  # Resource attribute name
        key: my-annotation           # Annotation key to extract
        from: pod                     # Source: pod, namespace, deployment, statefulset, daemonset, job, or node
      - tag_name: deployment_annotation
        key: app.version
        from: deployment
      # Extract multiple annotations matching a regex pattern
      - tag_name: $1                 # Use regex capture group
        key_regex: custom\.(.*)      # Extract all annotations matching pattern
        from: pod
      # Extract all annotations (use with caution - may extract many attributes)
      - tag_name: $$1
        key_regex: (.*)
        from: pod
    
    # Extract pod labels as resource attributes
    # See [Extracting attributes from pod labels and annotations](#extracting-attributes-from-pod-labels-and-annotations) section for more details
    labels:
      - tag_name: label_value        # Resource attribute name
        key: my-label                # Label key to extract
        from: pod                     # Source: pod, namespace, deployment, statefulset, daemonset, job, or node
      - tag_name: namespace_label
        key: environment
        from: namespace
      # Extract multiple labels matching a regex pattern
      - tag_name: $1                 # Use regex capture group
        key_regex: app\.(.*)         # Extract all labels matching pattern
        from: pod
      # Extract labels matching a pattern without specifying tag_name
      # Uses default format: k8s.pod.labels.<label_key>
      - key_regex: environment\.(.*)  # Extract labels starting with "environment."
        from: pod
      # Extract all labels (use with caution - may extract many attributes)
      - tag_name: $$1
        key_regex: (.*)
        from: pod
    
    # Extract OpenTelemetry resource attributes from pod annotations
    # See [Configuring recommended resource attributes](#configuring-recommended-resource-attributes) section for more details
    # Annotations with prefix "resource.opentelemetry.io/" become resource attributes
    # Example: "resource.opentelemetry.io/service.version" → "service.version"
    # Default: false
    otel_annotations: true
    
    # Extract deployment name from replicaset name (disables replicaset watching)
    # Reduces memory usage and RBAC requirements
    # See [Configuring recommended resource attributes](#configuring-recommended-resource-attributes) section for more details
    # Default: false
    deployment_name_from_replicaset: false
  
  # Filter configuration - restrict which pods to monitor
  filter:
    # Filter by node name (static)
    node: "node-1"
    
    # Filter by node name from environment variable (dynamic)
    # Use with Kubernetes downward API to automatically filter by current node
    node_from_env_var: "KUBE_NODE_NAME"
    
    # Filter by namespace
    namespace: "my-namespace"
    
    # Filter by field selectors
    fields:
      - key: spec.nodeName
        value: "node-1"
        op: equals              # Options: equals, not-equals
      - key: status.phase
        value: "Running"
        op: equals
    
    # Filter by label selectors
    labels:
      - key: app
        value: "my-app"
        op: equals              # Options: equals, not-equals, exists, does-not-exist
      - key: environment
        value: "production"
        op: equals
      - key: monitoring
        op: exists
  
  # Pod association rules - define how to match telemetry data to pods
  # Rules are evaluated in order; first match wins
  # Note: If a source attribute is found but doesn't match any pod, the association fails
  # and subsequent rules will not be evaluated
  # Maximum 4 sources per rule
  pod_association:
    # Rule 1: Match by pod IP from resource attribute
    - sources:
        - from: resource_attribute
          name: k8s.pod.ip
    
    # Rule 2: Match by pod UID from resource attribute
    - sources:
        - from: resource_attribute
          name: k8s.pod.uid
    
    # Rule 3: Match by pod name AND namespace (both must match)
    - sources:
        - from: resource_attribute
          name: k8s.pod.name
        - from: resource_attribute
          name: k8s.namespace.name
    
    # Rule 4: Match by connection IP (default if no rules specified)
    - sources:
        - from: connection
  
  # Exclude configuration - ignore specific pods
  exclude:
    pods:
      - name: "jaeger-agent"        # Exact pod name to exclude
      - name: "jaeger-collector"

Configuration Options Reference

Top-Level Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| auth_type | string | serviceAccount | Authentication method for the K8s API: none, serviceAccount, or kubeConfig |
| kube_config_path | string | "" | Path to the kubeconfig file (only when auth_type: kubeConfig) |
| context | string | "" | K8s context to use (only when auth_type: kubeConfig) |
| passthrough | bool | false | Only add the pod IP without extracting metadata (no K8s API calls) |
| wait_for_metadata | bool | false | Block collector startup until metadata is synced |
| wait_for_metadata_timeout | duration | 10s | Maximum wait time for metadata sync on startup |
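The passthrough option is typically used in a two-tier deployment: agents tag data with the pod IP only (no K8s API access needed), and a gateway performs the metadata lookup. A minimal sketch, assuming the config key used elsewhere in this document:

```yaml
# Agent tier (e.g. a DaemonSet): attach the pod IP only, no K8s API calls
processors:
  k8s_attributes:
    passthrough: true

---
# Gateway tier (e.g. a Deployment): resolve the pod IP to full metadata
processors:
  k8s_attributes:
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
```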

Extract Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| metadata | []string | See below | List of metadata fields to extract as resource attributes |
| annotations | []FieldExtractConfig | [] | Pod/namespace/node annotations to extract |
| labels | []FieldExtractConfig | [] | Pod/namespace/node labels to extract |
| otel_annotations | bool | false | Extract OpenTelemetry resource attributes from pod annotations with prefix resource.opentelemetry.io/ |
| deployment_name_from_replicaset | bool | false | Extract the deployment name from the replicaset name (disables replicaset watching) |
Default metadata fields:
  • k8s.namespace.name
  • k8s.pod.name
  • k8s.pod.uid
  • k8s.pod.start_time
  • k8s.deployment.name
  • k8s.node.name
Available metadata fields: All fields listed in the “Complete Configuration Options” section above under extract.metadata.

FieldExtractConfig Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| tag_name | string | Auto-generated | Resource attribute name (supports regex backreferences when used with key_regex) |
| key | string | "" | Exact annotation/label key to extract (mutually exclusive with key_regex) |
| key_regex | string | "" | Regex pattern matching annotation/label keys (mutually exclusive with key) |
| from | string | pod | Source to extract from: pod, namespace, deployment, statefulset, daemonset, job, or node |

Filter Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| node | string | "" | Filter pods by a specific node name |
| node_from_env_var | string | "" | Environment variable containing the node name to filter by |
| namespace | string | "" | Filter pods by a specific namespace |
| fields | []FieldFilterConfig | [] | Filter by K8s field selectors |
| labels | []FieldFilterConfig | [] | Filter by K8s label selectors |

FieldFilterConfig Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| key | string | Required | Field or label key |
| value | string | "" | Field or label value |
| op | string | equals | Operation: equals, not-equals (fields); equals, not-equals, exists, does-not-exist (labels) |

PodAssociationConfig Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| sources | []AssociationSource | Required | List of sources to match (maximum 4; all must match) |

AssociationSource Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| from | string | Required | Source type: connection or resource_attribute |
| name | string | Conditional | Resource attribute name (required when from: resource_attribute) |

Exclude Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pods | []ExcludePodConfig | Default excludes | List of pods to exclude from processing |

ExcludePodConfig Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | Required | Pod name pattern (regex) to exclude |
Default excluded pods:
  • jaeger-agent
  • jaeger-collector
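Because name is treated as a regex pattern, a single entry can exclude a whole family of pods. A sketch (the patterns below are illustrative):

```yaml
exclude:
  pods:
    - name: jaeger-.*          # matches jaeger-agent, jaeger-collector, ...
    - name: otel-collector-.*  # avoid enriching the collector's own telemetry
```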

Caveats

There are some edge cases and scenarios in which k8s_attributes will not work properly.

Host networking mode

The processor cannot correctly identify pods running in host network mode, so enriching telemetry generated by such pods is currently unsupported unless the association rule is based on an attribute other than the pod IP.
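For host-network pods, association can still work if it matches on non-IP attributes that the application already sets on its telemetry. A sketch, assuming the application emits k8s.pod.name and k8s.namespace.name as resource attributes:

```yaml
pod_association:
  - sources:
      - from: resource_attribute
        name: k8s.pod.name
      - from: resource_attribute
        name: k8s.namespace.name
```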

As a sidecar

The processor does not support detecting containers from the same pod when running as a sidecar. While this could be implemented, we think it is simpler to use the Kubernetes downward API to inject environment variables into the pods and use their values directly as tags.
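On the pod-spec side, the downward-API approach looks roughly like this (the env var names are illustrative); the injected values can then be attached to telemetry by the application or another processor:

```yaml
# Pod spec fragment: expose pod metadata as environment variables
env:
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
```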

Compatibility

Kubernetes Versions

This processor is tested against the Kubernetes versions specified in the e2e-tests.yml workflow. These tested versions represent the officially supported Kubernetes versions for this component.

Production Deployment Guide

Scaling Considerations

Memory Consumption

The processor maintains an in-memory cache of K8s metadata for all pods it monitors. Memory usage scales with:
  • Number of pods monitored: Each pod’s metadata (labels, annotations, owner references) is cached
  • Metadata fields extracted: More fields = more memory per pod
  • Label/annotation extraction rules: Regex patterns and multiple rules increase overhead
  • Workload metadata: Extracting deployment/statefulset/daemonset/job metadata adds additional caching
Memory estimates:
  • Agent mode (node-filtered): ~50-200 MB for 100 pods per node
  • Gateway mode (cluster-wide): ~500 MB - 2 GB for 1000-10000 pods
  • With workload metadata: Add 20-30% overhead
Optimization strategies:
  1. Use node filtering in agent deployments: filter.node_from_env_var: KUBE_NODE_NAME
  2. Limit metadata extraction: Only extract fields you need
  3. Use deployment_name_from_replicaset: true: Reduces memory by not caching replicaset data
  4. Filter by namespace: Limits scope when monitoring specific applications
  5. Avoid extracting workload metadata unless necessary (deployment, statefulset, etc.)
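The first optimization strategy above pairs the filter with a downward-API env var. A minimal agent-mode sketch (DaemonSet fragment plus collector config, using this document's config key):

```yaml
# DaemonSet container spec: inject the node name via the downward API
env:
  - name: KUBE_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

---
# Collector config: only watch pods scheduled on this node
processors:
  k8s_attributes:
    filter:
      node_from_env_var: KUBE_NODE_NAME
```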

CPU Usage

CPU usage is generally low but increases with:
  • High telemetry throughput: Each data point requires pod lookup and attribute enrichment
  • Frequent pod churn: More K8s API watch events to process
  • Complex association rules: Multiple rules with many sources
Recommended resource limits:
  • Agent mode: 100-500m CPU, 256-512 Mi memory
  • Gateway mode: 500m-2 CPU, 1-4 Gi memory

High Availability

For gateway deployments, run multiple replicas with:
  • Load balancer distributing telemetry traffic
  • Each replica independently queries K8s API and maintains its own cache
  • No shared state between replicas
  • Horizontal scaling based on CPU/memory usage
For production deployments using Helm charts, see the official OpenTelemetry Kube Stack chart and the isolated multicollector deployment example.
# Example: 3 replicas for HA
replicas: 3
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 4Gi

Graceful Shutdown

The processor keeps no persistent state and requires no special shutdown procedures:
  1. Collector receives SIGTERM
  2. Processor stops watching K8s API
  3. In-flight telemetry data is processed
  4. Collector shuts down cleanly
No persistent storage required - all metadata is refreshed from K8s API on startup.

Performance Benchmarks

Based on testing with 1000 pods using the default configuration:
processors:
  k8s_attributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.deployment.name
        - k8s.node.name
| Signal Type | Throughput | Latency | Memory | CPU |
| --- | --- | --- | --- | --- |
| Traces | 50k spans/sec | <1ms added | 800 MB | 400m |
| Metrics | 100k metrics/sec | <0.5ms added | 750 MB | 350m |
| Logs | 75k logs/sec | <0.7ms added | 850 MB | 380m |
| Profiles | 10k profiles/sec | <2ms added | 700 MB | 300m |
Results may vary based on metadata extraction configuration and cluster size.

Timestamp Format

By default, the k8s.pod.start_time attribute is formatted with Time.MarshalText(), which produces an RFC3339-compliant timestamp.

Self-Observability Features

The processor exposes internal telemetry metrics for monitoring its operation. For a complete list of all available metrics, see the Internal Telemetry documentation. Key metrics to monitor:
  • otelcol_otelsvc_k8s_ip_lookup_miss: Number of times pod lookup by IP failed
    • High values indicate association issues
  • otelcol_otelsvc_k8s_pod_added / otelcol_otelsvc_k8s_pod_deleted: Track pod churn rates
    • Monitor for unexpected spikes in pod lifecycle events
  • otelcol_otelsvc_k8s_pod_table_size: Current size of pod metadata cache
    • Use to monitor memory consumption trends
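As an illustration, a Prometheus alerting rule on the lookup-miss counter might look like the following (the rule name and thresholds are arbitrary):

```yaml
groups:
  - name: k8sattributes
    rules:
      - alert: K8sAttributesIPLookupMisses
        expr: rate(otelcol_otelsvc_k8s_ip_lookup_miss[5m]) > 10
        for: 10m
        annotations:
          summary: "Pod association by IP is frequently failing"
```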

Warnings

  • Memory consumption: Since the processor fetches and caches the K8s metadata for the resources of the node it is on, it consumes more memory than other processors. That consumption is compounded if users don’t filter down to only the metadata for the node the processor is running on.

Feature Gates

See documentation.md for the complete list of feature gates supported by this processor. Feature gates can be enabled using the --feature-gates flag:
"--feature-gates=<feature-gate>"

Semantic Conventions Compatibility

The processor is moving towards the latest Semantic Conventions through the following feature gates:
  • processor.k8sattributes.DontEmitV0K8sConventions
  • processor.k8sattributes.EmitV1K8sConventions
This follows the migration pattern described by the respective Collector RFC. The breaking changes between the 2 schemas are the following:
  • container.image.tag -> container.image.tags
  • k8s.pod.labels.<key> -> k8s.pod.label.<key>
  • k8s.pod.annotations.<key> -> k8s.pod.annotation.<key>
  • k8s.node.labels.<key> -> k8s.node.label.<key>
  • k8s.node.annotations.<key> -> k8s.node.annotation.<key>
  • k8s.namespace.labels.<key> -> k8s.namespace.label.<key>
  • k8s.namespace.annotations.<key> -> k8s.namespace.annotation.<key>
All attributes emitted through the processor.k8sattributes.EmitV1K8sConventions feature gate are currently in beta stability and are actively moving towards stable stability.
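The renames above are mechanical: one exact rename plus a plural-to-singular prefix rewrite. A hypothetical Python sketch of the mapping (for illustration only, not the processor's actual implementation):

```python
# v0 -> v1 attribute renames, as listed in the migration table above.
V0_TO_V1 = {
    "container.image.tag": "container.image.tags",
}
PREFIX_RENAMES = [
    ("k8s.pod.labels.", "k8s.pod.label."),
    ("k8s.pod.annotations.", "k8s.pod.annotation."),
    ("k8s.node.labels.", "k8s.node.label."),
    ("k8s.node.annotations.", "k8s.node.annotation."),
    ("k8s.namespace.labels.", "k8s.namespace.label."),
    ("k8s.namespace.annotations.", "k8s.namespace.annotation."),
]

def rename_attribute(key: str) -> str:
    """Map a v0 attribute key to its v1 equivalent; unknown keys pass through."""
    if key in V0_TO_V1:
        return V0_TO_V1[key]
    for old, new in PREFIX_RENAMES:
        if key.startswith(old):
            return new + key[len(old):]
    return key
```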

Available Benchmarks

The component is tested as part of the project's load tests, with the results publicly available at the benchmarks page. There, users can find details such as memory and CPU performance when the component is used in K8s clusters (tests use KWOK) with varying numbers of workloads. Refer to the test definitions for more information about the setup.

Attributes

| Attribute Name | Description | Type | Values |
| --- | --- | --- | --- |
| otelcol.signal | The signal type the telemetry metric is associated with | string | metrics, traces, logs, profiles |
| pod_identifier | The pod identifier value(s) used for the association attempt | string | |
| status | The status of the pod association operation | string | success, error |

Resource Attributes

| Attribute Name | Description | Type | Enabled |
| --- | --- | --- | --- |
| container.id | Container ID. Usually a UUID, as for example used to identify Docker containers. The UUID might be abbreviated. Requires k8s.container.restart_count. | string | false |
| container.image.name | Name of the image the container was built on. Requires container.id or k8s.container.name. | string | false |
| container.image.repo_digests | Repo digests of the container image as provided by the container runtime. | slice | false |
| container.image.tag | Container image tag. Defaults to "latest" if not provided (unless a digest is also in the image path). Requires container.id or k8s.container.name. Deprecated: use container.image.tags instead. | string | false |
| container.image.tags | Container image tags. Requires container.id or k8s.container.name. | slice | false |
| k8s.cluster.uid | Cluster UID, identified via the kube-system namespace. | string | false |
| k8s.container.name | The name of the Container in a Pod template. Requires container.id. | string | false |
| k8s.cronjob.name | The name of the CronJob. | string | false |
| k8s.cronjob.uid | The UID of the CronJob. | string | false |
| k8s.daemonset.name | The name of the DaemonSet. | string | false |
| k8s.daemonset.uid | The UID of the DaemonSet. | string | false |
| k8s.deployment.name | The name of the Deployment. | string | true |
| k8s.deployment.uid | The UID of the Deployment. | string | false |
| k8s.job.name | The name of the Job. | string | false |
| k8s.job.uid | The UID of the Job. | string | false |
| k8s.namespace.name | The name of the namespace that the pod is running in. | string | true |
| k8s.node.name | The name of the Node. | string | true |
| k8s.node.uid | The UID of the Node. | string | false |
| k8s.pod.hostname | The hostname of the Pod. | string | false |
| k8s.pod.ip | The IP address of the Pod. | string | false |
| k8s.pod.name | The name of the Pod. | string | true |
| k8s.pod.start_time | The start time of the Pod. | string | true |
| k8s.pod.uid | The UID of the Pod. | string | true |
| k8s.replicaset.name | The name of the ReplicaSet. | string | false |
| k8s.replicaset.uid | The UID of the ReplicaSet. | string | false |
| k8s.statefulset.name | The name of the StatefulSet. | string | false |
| k8s.statefulset.uid | The UID of the StatefulSet. | string | false |
| service.instance.id | The instance ID of the service. | string | false |
| service.name | The name of the service. | string | false |
| service.namespace | The namespace of the service. | string | false |
| service.version | The version of the service. | string | false |

The Enabled column follows the default metadata fields listed above; all other attributes must be enabled explicitly.

Configuration

Example Configuration

k8s_attributes:
k8s_attributes/2:
  passthrough: false
  auth_type: "kubeConfig"
  extract:
    metadata:
      # the following metadata fields configuration options are deprecated
      # - podName
      # - podUID
      # - deployment
      # - namespace
      # - node
      # - startTime
      # extract the following well-known metadata fields from pods and namespaces
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.ip
      - k8s.deployment.name
      - k8s.namespace.name
      - k8s.node.name
      - k8s.pod.start_time
      - k8s.cluster.uid

    annotations:
      - tag_name: a1 # extracts value of annotation with key `annotation-one` and inserts it as a tag with key `a1`
        key: annotation-one
        from: pod
      - tag_name: a2 # extracts value of annotation with key `annotation-two` and inserts it as a tag with key `a2`
        key: annotation-two
        from: pod
    labels:
      - tag_name: l1 # extracts value of label with key `label1` and inserts it as a tag with key `l1`
        key: label1
        from: pod
      - tag_name: l2 # extracts value of label with key `label2` and inserts it as a tag with key `l2`
        key: label2
        from: pod

  filter:
    namespace: ns2 # only look for pods running in ns2 namespace
    node: ip-111.us-west-2.compute.internal # only look for pods running on this node/host
    node_from_env_var: K8S_NODE # only look for pods running on the node/host specified by the K8S_NODE environment variable
    labels: # only consider pods that match the following labels
      - key: key1 # match pods that have a label `key1=value1`. `op` defaults to "equals" when not specified
        value: value1
      - key: key2 # ignore pods that have a label `key2=value2`.
        value: value2
        op: not-equals
    fields: # works the same way as labels but for Kubernetes field selectors (e.g. spec.nodeName)
      - key: key1
        value: value1
      - key: key2
        value: value2
        op: not-equals

  pod_association:
    - sources:
      - from: resource_attribute
        name: ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: host.name
    - sources:
      - from: connection
        name: ip

  exclude:
    pods:
      - name: jaeger-agent
      - name: jaeger-collector

k8s_attributes/3:
  passthrough: false
  auth_type: "kubeConfig"
  extract:
    annotations:
      - key_regex: opentel.* # extracts Keys & values of annotations matching regex `opentel.*`
        from: pod
    labels:
      - key_regex: opentel.* # extracts Keys & values of labels matching regex `opentel.*`
        from: pod

k8s_attributes/4:
  auth_type: "kubeConfig"
  extract:
    metadata:
      # the following metadata field has been deprecated
      - k8s.cluster.name

k8s_attributes/too_many_sources:
  pod_association:
    - sources:
        - from: resource_attribute
          name: custom-attribute
        - from: resource_attribute
          name: ip
        - from: resource_attribute
          name: k8s.pod.ip
        - from: resource_attribute
          name: host.name
        - from: connection
          name: ip

k8s_attributes/bad_keys_labels:
  extract:
    labels:
      - tag_name: l1
        key: label1
        key_regex: label\d

k8s_attributes/bad_keys_annotations:
  extract:
    annotations:
      - tag_name: a1
        key: annotation1
        key_regex: annotation\d

k8s_attributes/bad_from_labels:
  extract:
    labels:
      - tag_name: a1
        key: label1
        from: unknown

k8s_attributes/bad_from_annotations:
  extract:
    annotations:
      - tag_name: a1
        key: annotation1
        from: unknown

k8s_attributes/bad_keyregex_labels:
  extract:
    labels:
      - tag_name: a1
        from: pod
        key_regex: "["

k8s_attributes/bad_keyregex_annotations:
  extract:
    annotations:
      - tag_name: a1
        from: pod
        key_regex: "["

k8s_attributes/bad_filter_label_op:
  filter:
    labels:
      - key: label
        value: v1
        op: "unknown"

k8s_attributes/deployment_name_from_replicaset:
  extract:
    deployment_name_from_replicaset: true
k8s_attributes/bad_filter_field_op:
  filter:
    fields:
      - key: field
        value: v1
        op: "exists"

k8s_attributes/otel_annotations:
  extract:
    otel_annotations: true

k8s_attributes/wait_for_metadata:
  wait_for_metadata: true
  wait_for_metadata_timeout: 30s

k8s_attributes/passthrough_mode:
  passthrough: true

k8s_attributes/filter_label_exists:
  filter:
    labels:
      - key: app
        op: exists

k8s_attributes/filter_label_does_not_exist:
  filter:
    labels:
      - key: deprecated-label
        op: does-not-exist

k8s_attributes/extract_from_namespace:
  extract:
    labels:
      - tag_name: ns_label
        key: team
        from: namespace
    annotations:
      - tag_name: ns_annotation
        key: owner
        from: namespace

k8s_attributes/extract_from_node:
  extract:
    labels:
      - tag_name: node_label
        key: node-role
        from: node
    metadata:
      - k8s.node.name
      - k8s.node.uid

k8s_attributes/extract_from_deployment:
  extract:
    labels:
      - tag_name: deployment_label
        key: app
        from: deployment

k8s_attributes/extract_from_statefulset:
  extract:
    labels:
      - tag_name: statefulset_label
        key: app
        from: statefulset

k8s_attributes/extract_from_daemonset:
  extract:
    labels:
      - tag_name: daemonset_label
        key: app
        from: daemonset

k8s_attributes/extract_from_job:
  extract:
    labels:
      - tag_name: job_label
        key: app
        from: job

k8s_attributes/all_metadata_fields:
  extract:
    metadata:
      - k8s.namespace.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.hostname
      - k8s.pod.start_time
      - k8s.pod.ip
      - k8s.deployment.name
      - k8s.deployment.uid
      - k8s.replicaset.name
      - k8s.replicaset.uid
      - k8s.daemonset.name
      - k8s.daemonset.uid
      - k8s.statefulset.name
      - k8s.statefulset.uid
      - k8s.job.name
      - k8s.job.uid
      - k8s.cronjob.name
      - k8s.cronjob.uid
      - k8s.node.name
      - k8s.node.uid
      - k8s.container.name
      - container.id
      - container.image.name
      - container.image.tag
      - container.image.repo_digests
      - service.namespace
      - service.name
      - service.version
      - service.instance.id
      - k8s.cluster.uid

k8s_attributes/bad_metadata_field:
  extract:
    metadata:
      - invalid.metadata.field

Last generated: 2026-04-13