Pipeline Metrics

This reference lists all of the metrics that Feldera exports through its /metrics endpoint in Prometheus exposition format. It is automatically generated using the documentation embedded in Prometheus output.

All of the metrics exported by a particular Feldera pipeline are labeled with the pipeline's UUID as pipeline and its name as pipeline_name. Some metrics have additional labels, as documented below.
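As a rough illustration of how the `pipeline` and `pipeline_name` labels can be used, the sketch below parses Prometheus exposition text and keeps only the samples for one pipeline. The parser is deliberately minimal (it ignores timestamps and comment lines), and the sample text, UUID, values, and the helper names `parse_samples` and `samples_for_pipeline` are hypothetical, not actual Feldera output or API.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (with backslash escapes).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape of /metrics; the UUID and values are made up.
SAMPLE = '''\
# HELP dbsp_steps_total Total number of DBSP steps executed.
# TYPE dbsp_steps_total counter
dbsp_steps_total{pipeline="0190c0de-0000-7000-8000-000000000001",pipeline_name="demo"} 42
records_processed_total{pipeline="0190c0de-0000-7000-8000-000000000001",pipeline_name="demo"} 1000
dbsp_steps_total{pipeline="0190c0de-0000-7000-8000-000000000002",pipeline_name="other"} 7
'''


def parse_samples(text):
    """Yield (name, labels, value) for each sample line in exposition text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this minimal parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def samples_for_pipeline(text, pipeline_name):
    """Return {metric name: value} for samples labeled with pipeline_name."""
    return {
        name: value
        for name, labels, value in parse_samples(text)
        if labels.get("pipeline_name") == pipeline_name
    }


demo_metrics = samples_for_pipeline(SAMPLE, "demo")
```

Metrics that carry additional labels, such as the per-connector metrics, would need those labels in the dictionary key as well; this sketch assumes one sample per metric name.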

See Monitoring and Profiling for a guide to setting up Prometheus and Grafana with Feldera. The Feldera template dashboard is a sample Grafana dashboard for Feldera.

Process Metrics

These metrics report statistics for a running Feldera pipeline process. When a pipeline process is killed and restarts from a checkpoint, the new process's metrics are for it alone, not cumulative with any previous instantiations.
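Because a restarted process starts its counters again from zero, a consumer that wants a total across restarts has to detect resets itself. The sketch below shows one common approach, similar in spirit to how Prometheus treats counter resets: any decrease between successive samples is taken to mean the process restarted. The function name `total_increase` and the sample values are hypothetical.

```python
def total_increase(samples):
    """Sum the increase of a counter over successive samples.

    A sample lower than its predecessor is treated as a reset to zero
    (for example, the pipeline process restarted), so the whole new
    value counts as increase since the reset.
    """
    total = 0.0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                total += value - previous
            else:
                total += value  # counter restarted from zero
        previous = value
    return total


# Hypothetical process_cpu_seconds_total samples with one restart
# between the second and third scrape.
cpu_seconds = total_increase([10.0, 25.0, 3.0, 8.0])
```

Any increase that happened between the last scrape before the restart and the restart itself is lost; this sketch does not try to estimate it.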

These metrics are intended to match the standard Prometheus definitions.

Name Type Description
process_cpu_seconds_total counter Total user and system CPU time spent in seconds.
process_max_fds gauge Maximum number of open file descriptors.
process_open_fds gauge Number of open file descriptors.
process_resident_memory_bytes gauge Resident set size in bytes.
process_start_time_seconds counter Start time of the process in seconds since the Unix epoch.
process_threads gauge Number of OS threads in the process.
process_virtual_memory_bytes gauge Virtual memory size in bytes.
process_virtual_memory_max_bytes gauge Maximum amount of virtual memory available in bytes.

Feldera Metrics

These metrics report statistics for Feldera operations.

Name Type Description
feldera_checkpoint_delay_seconds histogram Sub-duration of feldera_checkpoint_runtime_seconds during which pipeline execution was blocked.
feldera_checkpoint_records_processed_total counter Total number of records that had been processed when the most recent checkpoint successfully committed.
feldera_checkpoint_runtime_seconds histogram Time to run checkpoint operations, in seconds, including time that the pipeline could continue executing along with the checkpoint.
feldera_checkpoint_written_bytes histogram Amount of data written to storage during checkpoints, in bytes.

DBSP Metrics

These metrics report statistics for DBSP, the low-level mechanism on which Feldera is built.

Name Type Description
compaction_stall_duration_seconds counter Time in seconds a worker was stalled waiting for more merges to complete.
dbsp_operator_checkpoint_latency_seconds histogram The time that individual operator checkpoint operations delayed the pipeline, in seconds. (Because checkpoints run in parallel across workers, these will add up to more than feldera_checkpoint_delay_seconds.)
dbsp_runtime_elapsed_seconds_total counter Time elapsed while the pipeline is executing a step, multiplied by the number of foreground and background threads, in seconds.
dbsp_step_latency_seconds histogram Latency of DBSP steps over the last 60 seconds or 1000 steps, whichever is less, in seconds.
dbsp_steps_total counter Total number of DBSP steps executed.
output_stall_seconds gauge If the pipeline is currently stalled because one or more output connectors' output buffers were full, this is the time in seconds for which it has been stalled. If the pipeline is not currently stalled, this is zero. If this is nonzero, then the output connectors causing the stall can be identified by observing which values of output_connector_queued_records are greater than or equal to the configured maximum (which defaults to 1,000,000).
output_stall_seconds_total counter Time in seconds that the pipeline was stalled because one or more output connectors' output buffers were full. This value is greater than or equal to output_stall_seconds.
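The stall diagnosis described for output_stall_seconds can be sketched as a small check. The code assumes the caller has already collected output_connector_queued_records per `endpoint` label, and that every connector uses the same maximum; in a real deployment the maximum may be configured differently per connector. The function name and the endpoint names are hypothetical.

```python
# Default maximum stated in the metric description above.
DEFAULT_MAX_QUEUED_RECORDS = 1_000_000


def stalling_connectors(output_stall_seconds, queued_records,
                        max_queued=DEFAULT_MAX_QUEUED_RECORDS):
    """Return the endpoints whose queues are at or above the maximum.

    queued_records maps each output connector's endpoint label to its
    output_connector_queued_records value. An empty list is returned
    when the pipeline is not currently stalled.
    """
    if output_stall_seconds <= 0:
        return []
    return sorted(
        endpoint
        for endpoint, queued in queued_records.items()
        if queued >= max_queued
    )


# Hypothetical readings: the pipeline has been stalled for 12.5 seconds.
suspects = stalling_connectors(
    12.5,
    {"kafka_out": 1_000_000, "http_out": 1_200, "unnamed-1": 1_000_450},
)
```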

Record Processing

These metrics report overall counts of records as they pass through the pipeline. They accumulate across checkpoint and resume.

Name Type Description
output_buffered_batches gauge Number of batches of records currently buffered by the output connector.
records_input_buffered gauge Total amount of data currently buffered by all endpoints, in records.
records_input_buffered_bytes gauge Total amount of data currently buffered by all endpoints, in bytes.
records_input_bytes_total counter Total amount of data received from all connectors, in bytes.
records_input_total counter Total amount of data received from all connectors, in records.
records_late_total counter Number of records dropped due to LATENESS annotations.
records_processed_bytes_total counter Total amount of input processed by the pipeline, in bytes.
records_processed_total counter Total amount of input processed by the pipeline, in records.

Storage Performance

These metrics report the performance of storage, which allows Feldera to work with data larger than memory.

Name Type Description
files_created_total counter Total number of files created.
files_deleted_total counter Total number of files deleted.
storage_byte_seconds_total counter Storage usage integrated over time during this run of the pipeline, in bytes × seconds.
storage_cache_usage_bytes gauge The number of bytes of memory currently in use for caching data on storage.
storage_cache_usage_limit_bytes_total counter The limit for the number of bytes of memory for caching data on storage.
storage_read_block_bytes histogram Sizes in bytes of blocks read from storage.
storage_read_latency_seconds histogram Read latency for storage blocks in seconds.
storage_sync_latency_seconds histogram Sync latency in seconds.
storage_usage_bytes gauge The number of bytes of storage currently in use.
storage_write_block_bytes histogram Sizes in bytes of blocks written to storage.
storage_write_latency_seconds histogram Write latency for storage blocks in seconds.
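Since storage_byte_seconds_total integrates storage usage over time, dividing its increase by the elapsed time between two scrapes gives the average storage usage over that interval. The sketch below shows the arithmetic; the function name and the sample values are hypothetical, and both scrapes are assumed to come from the same run of the pipeline.

```python
def average_storage_bytes(earlier, later):
    """Average storage usage in bytes between two scrapes.

    Each argument is a (unix_time_seconds, storage_byte_seconds_total)
    pair taken from the same run of the pipeline.
    """
    (t0, byte_seconds0), (t1, byte_seconds1) = earlier, later
    if t1 <= t0:
        raise ValueError("scrapes must be in increasing time order")
    if byte_seconds1 < byte_seconds0:
        raise ValueError("counter decreased; scrapes span a restart")
    return (byte_seconds1 - byte_seconds0) / (t1 - t0)


# Hypothetical scrapes 60 seconds apart: 6e9 byte-seconds accumulated,
# which corresponds to an average of 1e8 bytes (100 MB) in use.
average = average_storage_bytes((100.0, 0.0), (160.0, 6e9))
```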

Pipeline Status

These metrics report the status of the pipeline.

Name Type Description
pipeline_complete counter Transitions from 0 to 1 when the pipeline completes.
pipeline_start_time_seconds counter Start time of the pipeline in seconds since the Unix epoch. This will be earlier than process_start_time_seconds if the pipeline resumed from a checkpoint. This will be zero if the pipeline resumed from a checkpoint produced by a pipeline too old to record its start time.
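The relationship between pipeline_start_time_seconds and process_start_time_seconds can be turned into a simple classification, sketched below. It assumes that a pipeline that did not resume from a checkpoint reports a start time no earlier than its process start time, which the description implies but does not state outright. The function name, return strings, and timestamps are hypothetical.

```python
def start_kind(pipeline_start_seconds, process_start_seconds):
    """Classify how the running pipeline process was started."""
    if pipeline_start_seconds == 0:
        # Checkpoint came from a pipeline too old to record its start time.
        return "resumed (original start time unknown)"
    if pipeline_start_seconds < process_start_seconds:
        return "resumed from checkpoint"
    # Assumption: without a resume, the pipeline starts no earlier
    # than the process that runs it.
    return "fresh start"


# Hypothetical values: the pipeline first started an hour before the
# current process did.
kind = start_kind(1_700_000_000, 1_700_003_600)
```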

Input Connectors

These metrics are per-input connector, labeled with endpoint set to the name of the input connector, which is either the name assigned in the SQL program or automatically generated as unnamed-<number>, where <number> counts starting from 1 for the first connector for a given table.

These metrics accumulate across checkpoint and resume.

For byte counters, for some input connectors, such as columnar formats, bytes are difficult to attribute accurately to records, so Feldera approximates. Feldera also approximately attributes byte counts to records when it processes only some of the records in a batch in a DBSP step. This approximation is corrected when the remainder of the batch is processed in a subsequent step, so it is invisible to users unless a pause or checkpoint happens mid-batch.

Name Type Description
input_connector_barrier gauge Whether the input connector is currently a barrier for checkpointing/suspend (1 for true, 0 for false).
input_connector_buffered_records gauge Amount of data currently buffered by an input connector, in records.
input_connector_buffered_records_bytes gauge Amount of data currently buffered by an input connector, in bytes.
input_connector_bytes_total counter Total number of bytes received by an input connector.
input_connector_completion_latency_seconds histogram Time between when the connector receives new data and when the pipeline processes this data, computes output updates, and sends these updates to all output connectors, over the last 600 seconds or 10,000 samples.
input_connector_delta_phase gauge Current phase: 0=loading_snapshot, 1=follow/streaming, 2=completed.
input_connector_delta_snapshot_completed_seconds gauge Unix epoch seconds when the snapshot phase finished (0 if not yet complete).
input_connector_delta_snapshot_records_total counter Total records loaded during the snapshot phase.
input_connector_end_of_input gauge Whether the input connector has reached end of input (1 for true, 0 for false).
input_connector_errors_parse_total counter Total number of errors encountered parsing records received by the input connector.
input_connector_errors_transport_total counter Total number of errors encountered by the input connector at the transport layer.
input_connector_extra_memory_bytes gauge Additional memory used by an input connector beyond that used for buffered records.
input_connector_processing_latency_seconds histogram Time between when the connector receives new data and when the pipeline processes this data and computes output updates, over the last 600 seconds or 10,000 samples.
input_connector_records_total counter Total number of records received by an input connector.
input_connector_running gauge Whether the input connector is running (1) or paused by the user (0).

Output Connectors

These metrics are per-output connector, labeled with endpoint set to the name of the output connector, which is either the name assigned in the SQL program or automatically generated as unnamed-<number>, where <number> counts starting from 1 for the first connector for a given view.

These metrics accumulate across checkpoint and resume.

Name Type Description
output_connector_buffered_records gauge Number of records currently buffered by the output connector.
output_connector_bytes_total counter Total number of bytes of records sent by the output connector.
output_connector_errors_encode_total counter Total number of errors encountered encoding records to send.
output_connector_errors_transport_total counter Total number of errors encountered at the transport layer sending records.
output_connector_extra_memory_bytes gauge Additional memory used by an output connector beyond that used for buffered records.
output_connector_queued_batches gauge Number of batches of records currently queued by the output connector.
output_connector_queued_records gauge Number of records currently queued by the output connector.
output_connector_records_total counter Total number of records sent by the output connector.

Checkpoint Synchronization

These metrics report the status of checkpoint synchronization.

Name Type Description
checkpoint_sync_pull_duration_seconds histogram Time taken to pull a checkpoint from object store in seconds.
checkpoint_sync_pull_failures counter Number of failures when pulling a checkpoint.
checkpoint_sync_pull_success counter Number of checkpoints pulled successfully.
checkpoint_sync_pull_transfer_speed_bytes_per_second histogram Transfer speed when pulling a checkpoint, in bytes per second.
checkpoint_sync_pull_transferred_bytes histogram Bytes transferred when pulling a checkpoint.
checkpoint_sync_push_duration_seconds histogram Time taken to push a checkpoint to object store in seconds.
checkpoint_sync_push_failures counter Number of failures when pushing a checkpoint.
checkpoint_sync_push_success counter Number of checkpoints pushed successfully.
checkpoint_sync_push_transfer_speed_bytes_per_second histogram Transfer speed when pushing a checkpoint, in bytes per second.
checkpoint_sync_push_transferred_bytes histogram Bytes transferred when pushing a checkpoint.
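The success and failure counters can be combined into a failure ratio for alerting. The sketch below assumes every push or pull attempt increments exactly one of the two counters, which is an assumption and not something this reference states. The function name and sample values are hypothetical.

```python
def failure_ratio(success_count, failure_count):
    """Fraction of checkpoint sync attempts that failed, or None if
    no attempts have been recorded yet."""
    attempts = success_count + failure_count
    if attempts == 0:
        return None
    return failure_count / attempts


# Hypothetical readings of checkpoint_sync_push_success and
# checkpoint_sync_push_failures.
push_failure_ratio = failure_ratio(3, 1)
```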

Transactions

These metrics report the status of transactions.

Name Type Description
transaction_commit_seconds histogram Transaction commit time, that is, from starting commit to finishing commit.
transaction_completed_operators gauge Number of operators that have been fully flushed while the current transaction is committing. This is 0 if no transaction is active, or if a transaction is running but has not yet started committing.
transaction_in_progress_operators gauge Number of operators that are currently being flushed while the current transaction is committing. This is 0 if no transaction is active, or if a transaction is running but has not yet started committing.
transaction_ingest_seconds histogram Transaction ingestion time, that is, from transaction start to start of commit.
transaction_remaining_operators gauge Number of operators that have not started flushing while the current transaction is committing. This is 0 if no transaction is active, or if a transaction is running but has not yet started committing.
transaction_state gauge 0 when no transaction is active, 1 when a transaction has started, 2 while a transaction is committing.
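The transaction gauges can be combined into a rough commit progress figure. The sketch below assumes that, while a transaction is committing, the completed, in-progress, and remaining operator counts together cover all operators exactly once; the reference does not state this explicitly. The names `STATE_NAMES` and `commit_progress` and the sample values are hypothetical.

```python
# Meaning of transaction_state values, as listed in the table above.
STATE_NAMES = {0: "no transaction", 1: "started", 2: "committing"}


def commit_progress(state, completed, in_progress, remaining):
    """Fraction of operators fully flushed during a commit.

    Returns None unless a transaction is committing (state 2), because
    the three operator gauges are all 0 in the other states.
    """
    if state != 2:
        return None
    total = completed + in_progress + remaining
    if total == 0:
        return None
    return completed / total


# Hypothetical readings taken while a transaction is committing.
progress = commit_progress(2, completed=25, in_progress=5, remaining=70)
```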