Apachespark Receiver

Status

Available in: contrib
Maintainers: @Caleb-Hurshman, @mrsillydog
Source: opentelemetry-collector-contrib

Supported Telemetry

Metrics

Overview

Purpose

The purpose of this component is to monitor Apache Spark clusters and the applications running on them through the collection of performance metrics like memory utilization, CPU utilization, shuffle operations, garbage collection time, I/O operations, and more.

Prerequisites

This receiver supports Apache Spark versions:
  • 3.3.2+

Configuration

These configuration options are for connecting to an Apache Spark application. The following settings are optional:
  • collection_interval (default = 60s): How often this receiver collects metrics. The value must be a string that Go’s time.ParseDuration can parse. Valid time units are ns, us (or µs), ms, s, m, h.
  • initial_delay (default = 1s): How long this receiver waits before starting.
  • endpoint (default = http://localhost:4040): The Apache Spark endpoint to connect to, in the form [http][://]{host}[:{port}].
  • application_names: An array of Spark application names for which metrics should be collected. If no application names are specified, metrics are collected for all Spark applications running on the cluster at the specified endpoint.
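
The application_names behavior described above (an empty list means "collect for every application") can be sketched as a small filter. This is an illustration only, not the receiver's implementation: the function name select_applications and the sample application records are hypothetical, and the "id" and "name" keys are assumed to mirror what a Spark endpoint reports for each running application.

```python
from typing import Iterable


def select_applications(running_apps: Iterable[dict], application_names: list[str]) -> list[dict]:
    """Return the applications whose metrics would be collected.

    Hypothetical helper: an empty application_names list selects everything,
    otherwise only applications whose name is listed are kept.
    """
    if not application_names:
        return list(running_apps)
    wanted = set(application_names)
    return [app for app in running_apps if app["name"] in wanted]


# Made-up sample data standing in for the applications found at the endpoint.
running = [
    {"id": "app-001", "name": "PythonStatusAPIDemo"},
    {"id": "app-002", "name": "PythonLR"},
    {"id": "app-003", "name": "SomeOtherJob"},
]

selected = select_applications(running, ["PythonStatusAPIDemo", "PythonLR"])
everything = select_applications(running, [])
```

With the names from the example configuration, selected holds the first two applications, while an empty list keeps all three.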

Example Configuration

receivers:
  apachespark:
    collection_interval: 60s
    endpoint: http://localhost:4040
    application_names:
    - PythonStatusAPIDemo
    - PythonLR

The full list of settings exposed for this receiver is documented in config.go, with detailed sample configurations in testdata/config.yaml.

Metrics

Details about the metrics produced by this receiver can be found in metadata.yaml.

| Metric Name | Description | Unit | Type | Attributes |
| --- | --- | --- | --- | --- |
| spark.driver.block_manager.disk.usage | Disk space used by the BlockManager. | mb | UpDownCounter | |
| spark.driver.block_manager.memory.usage | Memory usage for the driver’s BlockManager. | mb | UpDownCounter | location, state |
| spark.driver.code_generator.compilation.average_time | Average time spent during CodeGenerator source code compilation operations. | ms | Gauge | |
| spark.driver.code_generator.compilation.count | Number of source code compilation operations performed by the CodeGenerator. | { compilation } | Counter | |
| spark.driver.code_generator.generated_class.average_size | Average class size of the classes generated by the CodeGenerator. | bytes | Gauge | |
| spark.driver.code_generator.generated_class.count | Number of classes generated by the CodeGenerator. | { class } | Counter | |
| spark.driver.code_generator.generated_method.average_size | Average method size of the classes generated by the CodeGenerator. | bytes | Gauge | |
| spark.driver.code_generator.generated_method.count | Number of methods generated by the CodeGenerator. | { method } | Counter | |
| spark.driver.code_generator.source_code.average_size | Average size of the source code generated by a CodeGenerator code generation operation. | bytes | Gauge | |
| spark.driver.code_generator.source_code.operations | Number of source code generation operations performed by the CodeGenerator. | { operation } | Counter | |
| spark.driver.dag_scheduler.job.active | Number of active jobs currently being processed by the DAGScheduler. | { job } | UpDownCounter | |
| spark.driver.dag_scheduler.job.count | Number of jobs that have been submitted to the DAGScheduler. | { job } | Counter | |
| spark.driver.dag_scheduler.stage.count | Number of stages the DAGScheduler is either running or needs to run. | { stage } | UpDownCounter | scheduler_status |
| spark.driver.dag_scheduler.stage.failed | Number of failed stages run by the DAGScheduler. | { stage } | Counter | |
| spark.driver.executor.gc.operations | Number of garbage collection operations performed by the driver. | { gc_operation } | Counter | gc_type |
| spark.driver.executor.gc.time | Total elapsed time during garbage collection operations performed by the driver. | ms | Counter | gc_type |
| spark.driver.executor.memory.execution | Amount of execution memory currently used by the driver. | bytes | UpDownCounter | location |
| spark.driver.executor.memory.jvm | Amount of memory used by the driver’s JVM. | bytes | UpDownCounter | location |
| spark.driver.executor.memory.pool | Amount of pool memory currently used by the driver. | bytes | UpDownCounter | pool_memory_type |
| spark.driver.executor.memory.storage | Amount of storage memory currently used by the driver. | bytes | UpDownCounter | location |
| spark.driver.hive_external_catalog.file_cache_hits | Number of file cache hits on the HiveExternalCatalog. | { hit } | Counter | |
| spark.driver.hive_external_catalog.files_discovered | Number of files discovered while listing the partitions of a table in the Hive metastore. | { file } | Counter | |
| spark.driver.hive_external_catalog.hive_client_calls | Number of calls to the underlying Hive Metastore client made by the Spark application. | { call } | Counter | |
| spark.driver.hive_external_catalog.parallel_listing_jobs | Number of parallel listing jobs initiated by the HiveExternalCatalog when listing partitions of a table. | { listing_job } | Counter | |
| spark.driver.hive_external_catalog.partitions_fetched | Table partitions fetched by the HiveExternalCatalog. | { partition } | Counter | |
| spark.driver.jvm_cpu_time | Current CPU time taken by the Spark driver. | ns | Counter | |
| spark.driver.live_listener_bus.dropped | Number of events that have been dropped by the LiveListenerBus. | { event } | Counter | |
| spark.driver.live_listener_bus.posted | Number of events that have been posted on the LiveListenerBus. | { event } | Counter | |
| spark.driver.live_listener_bus.processing_time.average | Average time taken for the LiveListenerBus to process an event posted to it. | ms | Gauge | |
| spark.driver.live_listener_bus.queue_size | Number of events currently waiting to be processed by the LiveListenerBus. | { event } | UpDownCounter | |
| spark.executor.disk.usage | Disk space used by this executor for RDD storage. | bytes | UpDownCounter | |
| spark.executor.gc_time | Elapsed time the JVM spent in garbage collection in this executor. | ms | Counter | |
| spark.executor.input_size | Amount of data input for this executor. | bytes | Counter | |
| spark.executor.memory.usage | Storage memory used by this executor. | bytes | UpDownCounter | |
| spark.executor.shuffle.io.size | Amount of data written and read during shuffle operations for this executor. | bytes | Counter | direction |
| spark.executor.storage_memory.usage | The executor’s storage memory usage. | bytes | UpDownCounter | location, state |
| spark.executor.task.active | Number of tasks currently running in this executor. | { task } | UpDownCounter | |
| spark.executor.task.limit | Maximum number of tasks that can run concurrently in this executor. | { task } | UpDownCounter | |
| spark.executor.task.result | Number of tasks with a specific result in this executor. | { task } | Counter | executor_task_result |
| spark.executor.time | Elapsed time the JVM spent executing tasks in this executor. | ms | Counter | |
| spark.job.stage.active | Number of active stages in this job. | { stage } | UpDownCounter | |
| spark.job.stage.result | Number of stages with a specific result in this job. | { stage } | Counter | job_result |
| spark.job.task.active | Number of active tasks in this job. | { task } | UpDownCounter | |
| spark.job.task.result | Number of tasks with a specific result in this job. | { task } | Counter | job_result |
| spark.stage.disk.spilled | The amount of disk space used for storing portions of overly large data chunks that couldn’t fit in memory in this stage. | bytes | Counter | |
| spark.stage.executor.cpu_time | CPU time spent by the executor in this stage. | ns | Counter | |
| spark.stage.executor.run_time | Amount of time spent by the executor in this stage. | ms | Counter | |
| spark.stage.io.records | Number of records written and read in this stage. | { record } | Counter | direction |
| spark.stage.io.size | Amount of data written and read at this stage. | bytes | Counter | direction |
| spark.stage.jvm_gc_time | The amount of time the JVM spent on garbage collection in this stage. | ms | Counter | |
| spark.stage.memory.peak | Peak memory used by internal data structures created during shuffles, aggregations and joins in this stage. | bytes | Counter | |
| spark.stage.memory.spilled | The amount of memory moved to disk due to size constraints (spilled) in this stage. | bytes | Counter | |
| spark.stage.shuffle.blocks_fetched | Number of blocks fetched in shuffle operations in this stage. | { block } | Counter | source |
| spark.stage.shuffle.fetch_wait_time | Time spent in this stage waiting for remote shuffle blocks. | ms | Counter | |
| spark.stage.shuffle.io.disk | Amount of data read to disk in shuffle operations (sometimes required for large blocks, as opposed to the default behavior of reading into memory). | bytes | Counter | |
| spark.stage.shuffle.io.read.size | Amount of data read in shuffle operations in this stage. | bytes | Counter | source |
| spark.stage.shuffle.io.records | Number of records written or read in shuffle operations in this stage. | { record } | Counter | direction |
| spark.stage.shuffle.io.write.size | Amount of data written in shuffle operations in this stage. | bytes | Counter | |
| spark.stage.shuffle.write_time | Time spent blocking on writes to disk or buffer cache in this stage. | ns | Counter | |
| spark.stage.status | A one-hot encoding representing the status of this stage. | { status } | UpDownCounter | stage_active, stage_complete, stage_pending, stage_failed |
| spark.stage.task.active | Number of active tasks in this stage. | { task } | UpDownCounter | |
| spark.stage.task.result | Number of tasks with a specific result in this stage. | { task } | Counter | stage_task_result |
| spark.stage.task.result_size | The amount of data transmitted back to the driver by all the tasks in this stage. | bytes | Counter | |
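
The spark.stage.status metric is described as a one-hot encoding over four boolean attributes. The sketch below shows what such an encoding looks like; it is not taken from the receiver's source. The function name one_hot_stage_status is hypothetical, the status strings are assumed to be case-insensitive forms of the four attribute names, and rejecting any other status is a choice made for this example only.

```python
# The four states correspond to the boolean attributes listed for
# spark.stage.status: active, complete, pending, failed.
STAGE_STATES = ("active", "complete", "pending", "failed")


def one_hot_stage_status(status: str) -> dict[str, bool]:
    """Map a stage status string to four booleans, exactly one of which is True.

    Hypothetical helper for illustration; statuses outside STAGE_STATES raise
    ValueError here, which may differ from how the receiver treats them.
    """
    normalized = status.lower()
    if normalized not in STAGE_STATES:
        raise ValueError(f"unsupported stage status: {status!r}")
    return {state: state == normalized for state in STAGE_STATES}


encoded = one_hot_stage_status("ACTIVE")
```

Here encoded has active set to True and the other three attributes set to False, so exactly one attribute is true for any given stage.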

Attributes

| Attribute Name | Description | Type | Values |
| --- | --- | --- | --- |
| direction | Whether the metric relates to input or output operations. | string | in, out |
| result | The result of the executor tasks for which the metric was recorded. | string | completed, failed |
| gc_type | The type of the garbage collection performed for the metric. | string | major, minor |
| result | The result of the job stages or tasks for which the metric was recorded. | string | completed, failed, skipped |
| location | The location of the memory for which the metric was recorded. | string | on_heap, off_heap |
| type | The type of pool memory for which the metric was recorded. | string | direct, mapped |
| status | The status of the DAGScheduler stages for which the metric was recorded. | string | waiting, running |
| source | The source from which data was fetched for the metric. | string | local, remote |
| active | Whether the stage for which the metric was recorded is active. | bool | |
| complete | Whether the stage for which the metric was recorded is complete. | bool | |
| failed | Whether the stage for which the metric was recorded is failed. | bool | |
| pending | Whether the stage for which the metric was recorded is pending. | bool | |
| result | The result of the stage tasks for which the metric was recorded. | string | completed, failed, killed |
| state | The state of the memory for which the metric was recorded. | string | used, free |

Resource Attributes

| Attribute Name | Description | Type | Enabled |
| --- | --- | --- | --- |
| spark.application.id | The ID of the application for which the metric was recorded. | string | |
| spark.application.name | The name of the application for which the metric was recorded. | string | |
| spark.executor.id | The ID of the executor for which the metric was recorded. | string | |
| spark.job.id | The ID of the job for which the metric was recorded. | int | |
| spark.stage.attempt.id | The ID of the stage attempt for which the metric was recorded. | int | |
| spark.stage.id | The ID of the application stage for which the metric was recorded. | int | |

Configuration

Example Configuration

# ./bin/otelcontribcol_darwin_arm64 --config ./receiver/apachesparkreceiver/testdata/config.yaml
receivers:
  apachespark:
    collection_interval: 15s
exporters:
  file:
    path: ./receiver/apachesparkreceiver/output/metrics.json

service:
  pipelines:
    metrics:
      receivers: [apachespark]
      exporters: [file]
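
To check what a pipeline like the one above produced, the exported file can be scanned for metric names. The sketch below assumes the file exporter writes one OTLP JSON payload per line with the usual resourceMetrics, scopeMetrics, metrics nesting; verify this against your exporter settings. The function name count_metric_names and the sample payload are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_metric_names(lines: Iterable[str]) -> Counter:
    """Count how often each metric name appears across OTLP JSON lines.

    Assumes each non-empty line is one JSON payload shaped as
    resourceMetrics -> scopeMetrics -> metrics -> name.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between payloads
        payload = json.loads(line)
        for resource_metrics in payload.get("resourceMetrics", []):
            for scope_metrics in resource_metrics.get("scopeMetrics", []):
                for metric in scope_metrics.get("metrics", []):
                    counts[metric["name"]] += 1
    return counts


# Made-up payload standing in for one line of the exported file.
sample_line = json.dumps({
    "resourceMetrics": [{
        "scopeMetrics": [{
            "metrics": [
                {"name": "spark.executor.gc_time"},
                {"name": "spark.executor.task.active"},
            ]
        }]
    }]
})

counts = count_metric_names([sample_line, "", sample_line])
```

To run this against real output, pass an open file handle for the path configured under the file exporter instead of the sample list.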

Last generated: 2026-04-20