
High Cardinality Metrics #4442

@jpadilla


Summary

Some metric labels can take many distinct values, which can give the exported metrics high cardinality.

Motivation

The following metrics have labels that could lead to high cardinality issues.

disk_total_files_created
disk_total_files_deleted
total_input_records
input_num_parse_errors
buffered_input_records
input_total_bytes
output_buffered_batches
rss_bytes
dbsp_inputs
pipeline_completed
output_num_transport_errors
input_total_records
input_num_transport_errors
dbsp_spine_batches_per_level
total_processed_records
dbsp_outputs
cpu_msecs
output_buffered_records
output_num_encode_errors
output_transmitted_records
dbsp_total_size
dbsp_spine_ongoing_merges
input_buffered_records
output_transmitted_bytes

Labels:

  • pipeline
  • endpoint
  • worker
  • gid
  • level
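To make the concern concrete: each distinct combination of label values becomes its own time series, so the worst-case series count for a single metric is the product of the number of distinct values of each of its labels. A minimal Python sketch, where the per-label counts in the usage line are hypothetical and chosen only for illustration (not measured from any deployment):

```python
from math import prod


def series_upper_bound(label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series for one metric name.

    Every distinct combination of label values is a separate series, so the
    upper bound is the product of the number of distinct values per label.
    A metric with no labels produces exactly one series (empty product = 1).
    """
    return prod(label_cardinalities.values())


# Hypothetical sizes for illustration only: 50 pipelines, 4 endpoints, 16 workers.
example = {"pipeline": 50, "endpoint": 4, "worker": 16}
print(series_upper_bound(example))  # 3200
```

Adding one more label multiplies the total rather than adding to it, which is why labels such as worker, gid, or level matter even when each one has a modest number of values on its own.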

I'm not sure whether making these labels configurable makes sense. I also wonder whether using the pipeline name instead of the pipeline ID would be more useful.
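One possible shape for making this configurable, sketched in Python under stated assumptions: samples are represented as a hypothetical mapping from sorted `(label, value)` tuples to numeric values, labels outside an allow-list are dropped, and series that become identical after dropping are summed. The optional `pipeline_names` mapping illustrates reporting a pipeline name instead of its ID. Neither `reduce_labels` nor this sample format is part of any existing API.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: a series is identified by its sorted label pairs.
Labels = Tuple[Tuple[str, str], ...]


def reduce_labels(
    samples: Dict[Labels, float],
    keep: Set[str],
    pipeline_names: Optional[Dict[str, str]] = None,
) -> Dict[Labels, float]:
    """Drop labels not in `keep` and sum the values of series that collapse together.

    If `pipeline_names` is given, the `pipeline` label value (assumed here to be
    an ID) is replaced by its name when a mapping exists; unknown IDs are kept.
    """
    out: Dict[Labels, float] = defaultdict(float)
    for labels, value in samples.items():
        reduced = []
        for key, val in labels:
            if key not in keep:
                continue  # label not in the allow-list: aggregate it away
            if key == "pipeline" and pipeline_names is not None:
                val = pipeline_names.get(val, val)
            reduced.append((key, val))
        # Series that now share the same label set are merged by summing.
        out[tuple(sorted(reduced))] += value
    return dict(out)


# Usage with hypothetical IDs and values: drop `worker`, report pipeline by name.
samples = {
    (("pipeline", "p-1"), ("worker", "0")): 10.0,
    (("pipeline", "p-1"), ("worker", "1")): 5.0,
}
print(reduce_labels(samples, keep={"pipeline"}, pipeline_names={"p-1": "orders"}))
# {(('pipeline', 'orders'),): 15.0}
```

Summing is only meaningful for additive values such as record counts or byte totals; for non-additive gauges another aggregation (for example max) may be more appropriate, so a real implementation would probably choose the aggregation per metric.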
