
Apache Spark Receiver

Status
  • Available in: contrib
  • Maintainers: @Caleb-Hurshman, @mrsillydog
  • Source: opentelemetry-collector-contrib

Supported Telemetry

Metrics

Overview

Purpose

The purpose of this component is to monitor Apache Spark clusters and the applications running on them through the collection of performance metrics like memory utilization, CPU utilization, shuffle operations, garbage collection time, I/O operations, and more.

Prerequisites

This receiver supports Apache Spark versions:
  • 3.3.2+

Configuration

The following settings configure the connection to an Apache Spark application. All settings are optional:
  • collection_interval (default = 60s): This receiver collects metrics on an interval. This value must be a string readable by Golang’s time.ParseDuration. Valid time units are ns, us (or µs), ms, s, m, h.
  • initial_delay (default = 1s): Defines how long this receiver waits before starting.
  • endpoint (default = http://localhost:4040): Apache Spark endpoint to connect to in the form of [http][://]{host}[:{port}].
  • application_names: An array of Spark application names for which metrics should be collected. If no application names are specified, metrics will be collected for all Spark applications running on the cluster at the specified endpoint.

Example Configuration

```yaml
receivers:
  apachespark:
    collection_interval: 60s
    endpoint: http://localhost:4040
    application_names:
      - PythonStatusAPIDemo
      - PythonLR
```

The full list of settings exposed for this receiver is documented in config.go, with detailed sample configurations in testdata/config.yaml.
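The receiver gathers these metrics from Spark's monitoring REST API (for example, GET /api/v1/applications on the configured endpoint returns the running applications as JSON). As a rough sketch of how application_names filtering behaves — collect everything when the list is empty, otherwise only matching applications — one might filter that response as below; filterApps and the struct are illustrative, not receiver code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sparkApplication mirrors a subset of the JSON returned by Spark's
// monitoring REST API at /api/v1/applications.
type sparkApplication struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// filterApps decodes an /api/v1/applications response and keeps only
// the applications whose names appear in wanted; an empty/nil filter
// keeps everything (matching the receiver's documented behavior).
func filterApps(data []byte, wanted map[string]bool) ([]sparkApplication, error) {
	var apps []sparkApplication
	if err := json.Unmarshal(data, &apps); err != nil {
		return nil, err
	}
	if len(wanted) == 0 {
		return apps, nil // no names configured: collect all applications
	}
	var out []sparkApplication
	for _, a := range apps {
		if wanted[a.Name] {
			out = append(out, a)
		}
	}
	return out, nil
}

func main() {
	sample := []byte(`[
		{"id": "app-20240101000000-0000", "name": "PythonStatusAPIDemo"},
		{"id": "app-20240101000000-0001", "name": "SomeOtherJob"}
	]`)
	apps, err := filterApps(sample, map[string]bool{"PythonStatusAPIDemo": true})
	if err != nil {
		panic(err)
	}
	for _, a := range apps {
		fmt.Printf("%s (%s)\n", a.Name, a.ID)
	}
}
```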

Metrics

Details about the metrics produced by this receiver can be found in metadata.yaml.

| Enabled | Metric Name | Description | Unit | Type | Attributes |
|---|---|---|---|---|---|
| ✅ | spark.driver.block_manager.disk.usage | Disk space used by the BlockManager. | mb | UpDownCounter | |
| ✅ | spark.driver.block_manager.memory.usage | Memory usage for the driver’s BlockManager. | mb | UpDownCounter | location, state |
| ✅ | spark.driver.code_generator.compilation.average_time | Average time spent during CodeGenerator source code compilation operations. | ms | Gauge | |
| ✅ | spark.driver.code_generator.compilation.count | Number of source code compilation operations performed by the CodeGenerator. | {compilation} | Counter | |
| ✅ | spark.driver.code_generator.generated_class.average_size | Average class size of the classes generated by the CodeGenerator. | bytes | Gauge | |
| ✅ | spark.driver.code_generator.generated_class.count | Number of classes generated by the CodeGenerator. | {class} | Counter | |
| ✅ | spark.driver.code_generator.generated_method.average_size | Average method size of the classes generated by the CodeGenerator. | bytes | Gauge | |
| ✅ | spark.driver.code_generator.generated_method.count | Number of methods generated by the CodeGenerator. | {method} | Counter | |
| ✅ | spark.driver.code_generator.source_code.average_size | Average size of the source code generated by a CodeGenerator code generation operation. | bytes | Gauge | |
| ✅ | spark.driver.code_generator.source_code.operations | Number of source code generation operations performed by the CodeGenerator. | {operation} | Counter | |
| ✅ | spark.driver.dag_scheduler.job.active | Number of active jobs currently being processed by the DAGScheduler. | {job} | UpDownCounter | |
| ✅ | spark.driver.dag_scheduler.job.count | Number of jobs that have been submitted to the DAGScheduler. | {job} | Counter | |
| ✅ | spark.driver.dag_scheduler.stage.count | Number of stages the DAGScheduler is either running or needs to run. | {stage} | UpDownCounter | scheduler_status |
| ✅ | spark.driver.dag_scheduler.stage.failed | Number of failed stages run by the DAGScheduler. | {stage} | Counter | |
| ✅ | spark.driver.executor.gc.operations | Number of garbage collection operations performed by the driver. | {gc_operation} | Counter | gc_type |
| ✅ | spark.driver.executor.gc.time | Total elapsed time during garbage collection operations performed by the driver. | ms | Counter | gc_type |
| ✅ | spark.driver.executor.memory.execution | Amount of execution memory currently used by the driver. | bytes | UpDownCounter | location |
| ✅ | spark.driver.executor.memory.jvm | Amount of memory used by the driver’s JVM. | bytes | UpDownCounter | location |
| ✅ | spark.driver.executor.memory.pool | Amount of pool memory currently used by the driver. | bytes | UpDownCounter | pool_memory_type |
| ✅ | spark.driver.executor.memory.storage | Amount of storage memory currently used by the driver. | bytes | UpDownCounter | location |
| ✅ | spark.driver.hive_external_catalog.file_cache_hits | Number of file cache hits on the HiveExternalCatalog. | {hit} | Counter | |
| ✅ | spark.driver.hive_external_catalog.files_discovered | Number of files discovered while listing the partitions of a table in the Hive metastore. | {file} | Counter | |
| ✅ | spark.driver.hive_external_catalog.hive_client_calls | Number of calls to the underlying Hive Metastore client made by the Spark application. | {call} | Counter | |
| ✅ | spark.driver.hive_external_catalog.parallel_listing_jobs | Number of parallel listing jobs initiated by the HiveExternalCatalog when listing partitions of a table. | {listing_job} | Counter | |
| ✅ | spark.driver.hive_external_catalog.partitions_fetched | Table partitions fetched by the HiveExternalCatalog. | {partition} | Counter | |
| ✅ | spark.driver.jvm_cpu_time | Current CPU time taken by the Spark driver. | ns | Counter | |
| ✅ | spark.driver.live_listener_bus.dropped | Number of events that have been dropped by the LiveListenerBus. | {event} | Counter | |
| ✅ | spark.driver.live_listener_bus.posted | Number of events that have been posted on the LiveListenerBus. | {event} | Counter | |
| ✅ | spark.driver.live_listener_bus.processing_time.average | Average time taken for the LiveListenerBus to process an event posted to it. | ms | Gauge | |
| ✅ | spark.driver.live_listener_bus.queue_size | Number of events currently waiting to be processed by the LiveListenerBus. | {event} | UpDownCounter | |
| ✅ | spark.executor.disk.usage | Disk space used by this executor for RDD storage. | bytes | UpDownCounter | |
| ✅ | spark.executor.gc_time | Elapsed time the JVM spent in garbage collection in this executor. | ms | Counter | |
| ✅ | spark.executor.input_size | Amount of data input for this executor. | bytes | Counter | |
| ✅ | spark.executor.memory.usage | Storage memory used by this executor. | bytes | UpDownCounter | |
| ✅ | spark.executor.shuffle.io.size | Amount of data written and read during shuffle operations for this executor. | bytes | Counter | direction |
| ✅ | spark.executor.storage_memory.usage | The executor’s storage memory usage. | bytes | UpDownCounter | location, state |
| ✅ | spark.executor.task.active | Number of tasks currently running in this executor. | {task} | UpDownCounter | |
| ✅ | spark.executor.task.limit | Maximum number of tasks that can run concurrently in this executor. | {task} | UpDownCounter | |
| ✅ | spark.executor.task.result | Number of tasks with a specific result in this executor. | {task} | Counter | executor_task_result |
| ✅ | spark.executor.time | Elapsed time the JVM spent executing tasks in this executor. | ms | Counter | |
| ✅ | spark.job.stage.active | Number of active stages in this job. | {stage} | UpDownCounter | |
| ✅ | spark.job.stage.result | Number of stages with a specific result in this job. | {stage} | Counter | job_result |
| ✅ | spark.job.task.active | Number of active tasks in this job. | {task} | UpDownCounter | |
| ✅ | spark.job.task.result | Number of tasks with a specific result in this job. | {task} | Counter | job_result |
| ✅ | spark.stage.disk.spilled | The amount of disk space used for storing portions of overly large data chunks that couldn’t fit in memory in this stage. | bytes | Counter | |
| ✅ | spark.stage.executor.cpu_time | CPU time spent by the executor in this stage. | ns | Counter | |
| ✅ | spark.stage.executor.run_time | Amount of time spent by the executor in this stage. | ms | Counter | |
| ✅ | spark.stage.io.records | Number of records written and read in this stage. | {record} | Counter | direction |
| ✅ | spark.stage.io.size | Amount of data written and read at this stage. | bytes | Counter | direction |
| ✅ | spark.stage.jvm_gc_time | The amount of time the JVM spent on garbage collection in this stage. | ms | Counter | |
| ✅ | spark.stage.memory.peak | Peak memory used by internal data structures created during shuffles, aggregations and joins in this stage. | bytes | Counter | |
| ✅ | spark.stage.memory.spilled | The amount of memory moved to disk due to size constraints (spilled) in this stage. | bytes | Counter | |
| ✅ | spark.stage.shuffle.blocks_fetched | Number of blocks fetched in shuffle operations in this stage. | {block} | Counter | source |
| ✅ | spark.stage.shuffle.fetch_wait_time | Time spent in this stage waiting for remote shuffle blocks. | ms | Counter | |
| ✅ | spark.stage.shuffle.io.disk | Amount of data read to disk in shuffle operations (sometimes required for large blocks, as opposed to the default behavior of reading into memory). | bytes | Counter | |
| ✅ | spark.stage.shuffle.io.read.size | Amount of data read in shuffle operations in this stage. | bytes | Counter | source |
| ✅ | spark.stage.shuffle.io.records | Number of records written or read in shuffle operations in this stage. | {record} | Counter | direction |
| ✅ | spark.stage.shuffle.io.write.size | Amount of data written in shuffle operations in this stage. | bytes | Counter | |
| ✅ | spark.stage.shuffle.write_time | Time spent blocking on writes to disk or buffer cache in this stage. | ns | Counter | |
| ✅ | spark.stage.status | A one-hot encoding representing the status of this stage. | {status} | UpDownCounter | stage_active, stage_complete, stage_pending, stage_failed |
| ✅ | spark.stage.task.active | Number of active tasks in this stage. | {task} | UpDownCounter | |
| ✅ | spark.stage.task.result | Number of tasks with a specific result in this stage. | {task} | Counter | stage_task_result |
| ✅ | spark.stage.task.result_size | The amount of data transmitted back to the driver by all the tasks in this stage. | bytes | Counter | |

Attributes

| Attribute Name | Description | Type | Values |
|---|---|---|---|
| direction | Whether the metric relates to input or output operations. | string | in, out |
| executor_task_result | The result of the executor tasks for which the metric was recorded. | string | completed, failed |
| gc_type | The type of garbage collection performed for the metric. | string | major, minor |
| job_result | The result of the job stages or tasks for which the metric was recorded. | string | completed, failed, skipped |
| location | The location of the memory for which the metric was recorded. | string | on_heap, off_heap |
| pool_memory_type | The type of pool memory for which the metric was recorded. | string | direct, mapped |
| scheduler_status | The status of the DAGScheduler stages for which the metric was recorded. | string | waiting, running |
| source | The source from which data was fetched for the metric. | string | local, remote |
| stage_active | Whether the stage for which the metric was recorded is active. | bool | |
| stage_complete | Whether the stage for which the metric was recorded is complete. | bool | |
| stage_failed | Whether the stage for which the metric was recorded failed. | bool | |
| stage_pending | Whether the stage for which the metric was recorded is pending. | bool | |
| stage_task_result | The result of the stage tasks for which the metric was recorded. | string | completed, failed, killed |
| state | The state of the memory for which the metric was recorded. | string | used, free |

Resource Attributes

| Attribute Name | Description | Type | Enabled |
|---|---|---|---|
| spark.application.id | The ID of the application for which the metric was recorded. | string | ✅ |
| spark.application.name | The name of the application for which the metric was recorded. | string | ✅ |
| spark.executor.id | The ID of the executor for which the metric was recorded. | string | ✅ |
| spark.job.id | The ID of the job for which the metric was recorded. | int | ✅ |
| spark.stage.attempt.id | The ID of the stage attempt for which the metric was recorded. | int | ❌ |
| spark.stage.id | The ID of the application stage for which the metric was recorded. | int | ✅ |

Example Pipeline Configuration

```yaml
# ./bin/otelcontribcol_darwin_arm64 --config ./receiver/apachesparkreceiver/testdata/config.yaml
receivers:
  apachespark:
    collection_interval: 15s

exporters:
  file:
    path: ./receiver/apachesparkreceiver/output/metrics.json

service:
  pipelines:
    metrics:
      receivers: [apachespark]
      exporters: [file]
```

Last generated: 2026-04-13