Cap metric label cardinality + hoist peer-tag schema; rename to ClientStatsAggregator #11387

Draft
dougqh wants to merge 8 commits into dougqh/optimize-metric-key from dougqh/control-tag-cardinality

Conversation

dougqh (Contributor) commented May 15, 2026

Summary

Sits on top of #11382. Bounds client-stats label cardinality, reworks peer-tag handling, renames the aggregator, and adds a design doc.

  • Cardinality control: replaces the per-field DDCaches with PropertyCardinalityHandler / TagCardinalityHandler. Each has a per-field budget; once exhausted, overflow values canonicalize to a blocked_by_tracer sentinel UTF8BytesString. Handlers reset every reporting cycle (see the sketch after this list).
  • Canonicalize before hashing: AggregateTable.findOrInsert runs every label through its handler before computing the lookup hash, so cardinality-blocked values collapse into one bucket instead of fragmenting into N entries.
  • Peer tags reworked: producer captures peer-tag values only (parallel String[] to a PeerTagSchema.names array). tag:value interning happens on the aggregator thread via TagCardinalityHandler. The schema is synced once per trace via PeerTagSchema.currentSyncedTo(Set) with an identity-check fast path.
  • Rename: ConflatingMetricsAggregator -> ClientStatsAggregator.
  • Producer-side wins identified via JFR: use the cached span.kind byte ordinal through a new CoreSpan.getSpanKindString() (skips a tag-map lookup per metrics-eligible span); hoist schema.names out of capturePeerTagValues; avoid toString() allocation in isSynthetic.
  • Cleanups: fix TracerHealthMetrics.previousCounts size bug that would have silently dropped the new statsInboxFull counter; drop dead clearAggregates().
  • Docs: new docs/client_metrics_design.md covering the pipeline shape, the canonical-key trick, thread-safety contract, reporting cadence, failure modes, and benchmark numbers.
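
The cardinality-control bullet above is the core mechanism, so here is a minimal sketch of the budget-then-sentinel behavior. It is a reconstruction, not the PR's code: plain String stands in for UTF8BytesString, and the class internals are assumptions; only the sentinel value, the per-cycle reset, and the collapse semantics come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only -- field names and internals are assumptions.
final class PropertyCardinalityHandlerSketch {
  private static final String BLOCKED = "blocked_by_tracer"; // shared sentinel

  private final int limit;
  private final Map<String, String> working = new HashMap<>();

  PropertyCardinalityHandlerSketch(int limit) {
    this.limit = limit;
  }

  /** Canonicalizes a raw value; collapses to the sentinel once the budget is spent. */
  String register(String raw) {
    String canonical = working.get(raw);
    if (canonical != null) {
      return canonical; // within budget and already interned
    }
    if (working.size() >= limit) {
      return BLOCKED; // budget exhausted: every new value shares one sentinel
    }
    working.put(raw, raw); // the real code would intern a UTF8BytesString here
    return raw;
  }

  /** Called on the aggregator thread at the end of each report() cycle. */
  void reset() {
    working.clear(); // the cardinality budget refreshes per reporting interval
  }
}
```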

Benchmark

ClientStatsAggregatorDDSpanBenchmark (64 client-kind DDSpans per op, real CoreTracer):

  Stage                        µs/op
  master baseline              6.428
  stack tip before this PR     2.454
  + peer-tag schema hoist      2.410
  + cached span-kind ordinal   1.995

~3.2× faster than master end to end.

Test plan

  • :dd-trace-core:test — metrics tests pass (existing + new AggregateTableTest cases for cardinality collapse)
  • JMH benchmark numbers reproduce locally
  • No behavior change to client-stats wire payload for traces within the cardinality budget

🤖 Generated with Claude Code

dougqh and others added 8 commits May 15, 2026 15:58
Replaces the per-field DDCache layer inside AggregateEntry with the two new
cardinality handlers. Each per-field handler holds a small HashMap working
set; when its budget is exhausted, subsequent values collapse to a stable
"blocked_by_tracer" sentinel UTF8BytesString rather than growing without
bound. The handlers are reset on the aggregator thread at the end of each
report() cycle (10s default), so the cardinality budget refreshes per
reporting interval.

Caches replaced (limits preserved from the prior DDCache sizes):

  RESOURCE_HANDLER         32
  SERVICE_HANDLER          32
  OPERATION_HANDLER        64
  SERVICE_SOURCE_HANDLER   16
  TYPE_HANDLER              8
  SPAN_KIND_HANDLER        16
  HTTP_METHOD_HANDLER       8
  HTTP_ENDPOINT_HANDLER    32
  GRPC_STATUS_CODE_HANDLER 32
  PEER_TAG_HANDLERS        per-tag-name TagCardinalityHandler, each 512

Two production-only changes to the handlers as the user wrote them:

- Fixed import: datadog.collections.tagmap6lazy.TagMap doesn't exist;
  TagCardinalityHandler now imports datadog.trace.api.TagMap which has the
  Entry API the handler uses.
- Added TagCardinalityHandler.register(String) overload so AggregateEntry's
  peer-tag canonicalization doesn't have to allocate a TagMap.Entry per
  call -- the snapshot already carries peer-tag values as a flattened
  String[] {name, value, ...} (see the caller-side sketch after this list).
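
For illustration, a caller-side sketch of why the String-keyed overload matters. The handler wiring below is a hypothetical stand-in (the real handlers return UTF8BytesString), but the two-at-a-time walk over the flattened array passing a bare value string is the shape described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public final class FlatPeerTagSketch {
  public static void main(String[] args) {
    // snapshot carries peer tags as a flattened {name, value, ...} array
    String[] peerTagPairs = {"peer.service", "billing", "db.instance", "orders"};

    // stand-in for the per-tag-name handler lookup
    Map<String, UnaryOperator<String>> handlers = new HashMap<>();
    handlers.put("peer.service", v -> "peer.service:" + v);
    handlers.put("db.instance", v -> "db.instance:" + v);

    for (int i = 0; i < peerTagPairs.length; i += 2) {
      // register(String): no TagMap.Entry allocated per pair
      String encoded = handlers.get(peerTagPairs[i]).apply(peerTagPairs[i + 1]);
      System.out.println(encoded);
    }
  }
}
```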

AggregateEntry split into two construction paths:

- forSnapshot(snapshot, agg): the hot path; runs each field through the
  appropriate handler.
- of(...): test-only factory; bypasses the handlers and creates UTF8
  instances directly, so tests don't pollute static handler state. Content-
  equality on the resulting entry still matches the production-built one.

Thread-safety: handlers are HashMap-backed and not safe for concurrent
access. Both forSnapshot and resetCardinalityHandlers must be called from
the aggregator thread. After the prior commits that moved MetricKey
construction to the aggregator thread, this is the only thread that
canonicalizes; the test factory path runs on test threads but doesn't
touch the handlers.

Reset semantics: clearing the handler's working set drops the {value ->
UTF8BytesString} mapping but doesn't invalidate existing AggregateEntry
fields -- those keep their UTF8BytesString references alive on their own.
Subsequent snapshots with the same content still resolve to the existing
entries via content-equality matches(). New values after reset get freshly
allocated UTF8BytesStrings via the handler.

Known limitation (not fixed here): hashOf(SpanSnapshot) hashes from the
raw snapshot fields, not from the post-handler canonical form. So when
cardinality is exceeded, multiple distinct raw values that collapse to
the "blocked_by_tracer" sentinel still produce distinct hashes and land
in different AggregateEntry buckets -- the wire payload will carry
multiple rows that are all labeled as blocked. This is the same behavior the
prior DDCache-based design would have had at capacity. Collapsing those
into a single sentinel entry would require canonicalizing before hashing
and is a follow-up.

Tests: new CardinalityHandlerTest covers PropertyCardinalityHandler and
TagCardinalityHandler in isolation (hit/miss, over-limit blocking, reset
behavior, sentinel stability). Existing ConflatingMetricAggregatorTest /
SerializingMetricWriterTest / AggregateTableTest all pass unchanged
because the test factory bypasses handlers.

Benchmarks (2 forks x 5 iter x 15s) -- producer side unchanged because
the handlers live on the consumer thread:

  SimpleSpan bench:   3.114 +- 0.045 us/op   (prior: 3.123 +- 0.018)
  DDSpan bench:       2.364 +- 0.113 us/op   (prior: 2.412 +- 0.022)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

The prior commit ran every snapshot through the cardinality handlers but still
hashed the raw snapshot fields. When a field exceeded its cardinality budget
the handlers collapsed many distinct values to a single "blocked_by_tracer"
sentinel, but the raw hashes were still all different -- so the blocked entries
fragmented across the AggregateTable. This commit makes hash + match work off
the canonical (post-handler) UTF8BytesString fields, so blocked values land in
the same bucket and merge into one entry.

How the lookup path changes
---------------------------

A new package-private AggregateEntry.Canonical scratch buffer:

  - holds the 10 canonical UTF8BytesString refs, primitives, peerTags list,
    and the precomputed keyHash;
  - exposes populate(SpanSnapshot) which runs each field through the
    appropriate handler and computes the long hash from the canonical refs;
  - exposes matches(AggregateEntry) for content-equality lookup;
  - exposes toEntry(AggregateMetric) which copies its refs into a fresh
    AggregateEntry on miss.

AggregateTable holds one Canonical instance and reuses it per findOrInsert.
On a hit nothing is allocated -- the buffer's refs feed the bucket walk and
matches() directly. On a miss the refs are copied into the new entry and the
buffer is overwritten on the next call.
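
A reduced sketch of that lookup shape, with the field count cut from ten to two for brevity. The populate/matches/toEntry names follow the description above; the hash mixing and the stand-in handler type are assumptions.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

final class CanonicalLookupSketch {
  static final class Entry {
    final String service;   // real code: UTF8BytesString
    final String operation;
    Entry(String service, String operation) {
      this.service = service;
      this.operation = operation;
    }
  }

  static final class Canonical {
    String service;
    String operation;
    long keyHash;

    // handlers are stand-ins for Property/TagCardinalityHandler.register
    void populate(String rawService, String rawOperation,
                  UnaryOperator<String> svcHandler, UnaryOperator<String> opHandler) {
      service = svcHandler.apply(rawService);     // canonicalize FIRST...
      operation = opHandler.apply(rawOperation);
      keyHash = 31L * service.hashCode() + operation.hashCode(); // ...THEN hash
    }

    boolean matches(Entry e) {
      // content equality, so existing entries survive handler resets that
      // re-issue a fresh instance for the same content
      return Objects.equals(service, e.service) && Objects.equals(operation, e.operation);
    }

    Entry toEntry() {
      return new Entry(service, operation); // copy refs out only on a miss
    }
  }
}
```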

Hash function
-------------

hashOf now takes UTF8BytesString fields (plus primitives + peerTags list)
instead of raw CharSequence/String from the snapshot. UTF8BytesString.hashCode
returns the underlying String's hash, so:

  - content-equal entries built via AggregateEntry.of(...) (test factory,
    bypasses handlers) produce the same hash as entries built via
    Canonical.toEntry(...) (production, via handlers);
  - all values that collapsed to "blocked_by_tracer" share that sentinel
    instance and therefore that hashCode -- they land in the same bucket and
    merge into one entry.

Matches
-------

The SpanSnapshot-keyed matches() on AggregateEntry is gone. Lookup goes
through Canonical.matches(entry) which compares the buffer's UTF8 fields
against the entry's UTF8 fields via Objects.equals (content equality on
UTF8BytesString). This is needed because across handler resets the
UTF8BytesString instance referenced by an existing entry differs from the
freshly-issued instance for the same content -- content-equality lets the
existing entry survive resets.

The peerTagPairsRaw field on AggregateEntry was previously kept for matching
against snapshot.peerTagPairs (the flat String[]). Canonical.matches uses
List.equals on the encoded UTF8 peerTags directly, so peerTagPairsRaw is
dropped.

New test in AggregateTableTest -- cardinalityBlockedValuesCollapseIntoOneEntry
inserts 50 distinct services into a table whose SERVICE_HANDLER has a
cardinality limit of 32, and asserts the final size is 33 (the 32 in-budget
services plus a single collapsed "blocked_by_tracer" entry, not 50 separate
entries).
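
The arithmetic behind that assertion can be shown standalone. This toy loop mirrors the handler's budget logic rather than driving the real AggregateTable:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class CollapseAssertionSketch {
  public static void main(String[] args) {
    int limit = 32;
    Map<String, String> working = new HashMap<>();
    Set<String> buckets = new HashSet<>();
    for (int i = 0; i < 50; i++) {
      String raw = "service-" + i;
      String canonical = working.containsKey(raw) ? working.get(raw)
          : working.size() < limit ? put(working, raw) : "blocked_by_tracer";
      buckets.add(canonical); // the table keys off the canonical form
    }
    // 32 in-budget services + 1 shared sentinel bucket
    System.out.println("buckets = " + buckets.size()); // prints 33
  }

  private static String put(Map<String, String> m, String raw) {
    m.put(raw, raw);
    return raw;
  }
}
```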

Benchmarks (2 forks x 5 iter x 15s) -- producer side unchanged:

  SimpleSpan bench:   3.117 +- 0.026 us/op   (prior: 3.114 +- 0.045)
  DDSpan bench:       2.344 +- 0.114 us/op   (prior: 2.364 +- 0.113)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

…chema-indexed handlers

Replaces the producer's early {@code (name, value)}-pair encoding with a
schema-based design: peer-tag values are captured into a parallel String
array, and the consumer applies the matching {@link TagCardinalityHandler}
by index using a {@link PeerTagSchema}'s parallel name/handler arrays.

This removes the {@code Map<String, TagCardinalityHandler>} the prior commit
left in {@code AggregateEntry} -- handler lookup is now a single array
dereference instead of a hashmap probe.

PeerTagSchema
-------------

New package-private class that holds:

  - {@code String[] names}    -- peer-tag names in stable order
  - {@code TagCardinalityHandler[] handlers} -- parallel to names

Two schemas exist: a static singleton {@code INTERNAL} for the internal-kind
{@code base.service} case, and a {@code CURRENT} schema for the peer-
aggregation kinds (client/producer/consumer) that lazily refreshes when
{@code features.peerTags()} returns a different set of names.

Each {@link SpanSnapshot} captures the schema reference it was built against
so producer and consumer agree on the indexing even if {@code CURRENT}
changes between capture and consumption.

A fast-path identity check (cached last input Set instance) keeps the
{@code currentSyncedTo} call cheap: when the producer hands in the same
Set instance as last time -- the steady-state case --
{@code currentSyncedTo} returns immediately without iterating names. The
{@code matches()} loop only runs when the Set instance changes, which in
production is rare (only on remote-config reconfiguration).
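
A sketch of just that fast path. The refresh logic below is an assumption (the real class also carries the parallel handler array and the INTERNAL singleton); the single volatile read and the identity comparison are the points being illustrated.

```java
import java.util.Set;

final class PeerTagSchemaSketch {
  private static volatile PeerTagSchemaSketch CURRENT =
      new PeerTagSchemaSketch(new String[0], Set.of());

  final String[] names;
  private final Set<String> sourceSet; // last input Set, compared by identity

  private PeerTagSchemaSketch(String[] names, Set<String> sourceSet) {
    this.names = names;
    this.sourceSet = sourceSet;
  }

  static PeerTagSchemaSketch currentSyncedTo(Set<String> peerTagNames) {
    PeerTagSchemaSketch current = CURRENT; // single volatile read
    if (current.sourceSet == peerTagNames) {
      return current; // steady state: producer handed in the same Set instance
    }
    if (current.matches(peerTagNames)) {
      // same contents under a new Set instance: reuse the names array and
      // re-cache the identity so the next call takes the fast path again
      current = new PeerTagSchemaSketch(current.names, peerTagNames);
    } else {
      // genuine reconfiguration (rare: remote-config refresh)
      current = new PeerTagSchemaSketch(peerTagNames.toArray(new String[0]), peerTagNames);
    }
    CURRENT = current;
    return current;
  }

  private boolean matches(Set<String> peerTagNames) {
    if (names.length != peerTagNames.size()) {
      return false;
    }
    for (String name : names) {
      if (!peerTagNames.contains(name)) {
        return false;
      }
    }
    return true;
  }
}
```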

Snapshot shape
--------------

{@code SpanSnapshot.peerTagPairs} (a flat {@code [name0, value0, name1,
value1, ...]} array) is replaced by:

  - {@code PeerTagSchema peerTagSchema}  -- nullable; schema for the values
  - {@code String[] peerTagValues}       -- parallel to schema.names

The producer captures only values; the consumer constructs the encoded
{@code "name:value"} UTF8 forms via {@code schema.handler(i).register(value)}
on its own thread.
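
A small sketch of that division of labor. The handler type is a stand-in for TagCardinalityHandler; the parallel indexing against schema.names follows the text above.

```java
import java.util.function.UnaryOperator;

final class SplitEncodeSketch {
  public static void main(String[] args) {
    String[] names = {"peer.service", "db.instance"}; // schema.names
    // producer side: bare values only, same indexing as names
    String[] peerTagValues = {"billing", "orders"};

    // consumer side: the per-index handler builds the encoded form
    for (int i = 0; i < names.length; i++) {
      UnaryOperator<String> handler = handlerFor(names[i]);
      System.out.println(handler.apply(peerTagValues[i]));
    }
  }

  private static UnaryOperator<String> handlerFor(String name) {
    return value -> name + ":" + value; // stand-in for schema.handler(i).register(value)
  }
}
```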

Consumer-side cleanups bundled in
---------------------------------

While here, also addresses the perf review items raised against the prior
commit:

  - {@code hashOf}'s peer-tag loop is now indexed iteration; no more
    iterator allocation per snapshot.
  - {@code Canonical} now owns a reusable {@code peerTagsBuffer} ArrayList
    that's cleared+refilled per {@code populate} call -- zero allocation
    on the hit path. The buffer is copied into an immutable list only on
    miss when the entry needs to own it long-term (sketched after this
    list).
  - {@code Canonical.matches} uses indexed list comparison; no iterator
    alloc in {@code List.equals}.
  - The {@code HashMap<String, TagCardinalityHandler> PEER_TAG_HANDLERS}
    on {@code AggregateEntry} is gone, replaced by the {@link
    PeerTagSchema}'s parallel array layout.
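
A sketch of the reusable-buffer item from the list above; the class and method names are stand-ins, not the PR's identifiers.

```java
import java.util.ArrayList;
import java.util.List;

final class PeerTagsBufferSketch {
  private final List<String> peerTagsBuffer = new ArrayList<>();

  void populate(String[] encodedPeerTags) {
    peerTagsBuffer.clear(); // reuse: zero allocation on the hit path
    for (int i = 0; i < encodedPeerTags.length; i++) { // indexed, no iterator
      peerTagsBuffer.add(encodedPeerTags[i]);
    }
  }

  List<String> toOwnedList() {
    // only a table miss pays for a long-lived copy the entry can own
    return new ArrayList<>(peerTagsBuffer);
  }
}
```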

Benchmark (2 forks x 5 iter x 15s)
----------------------------------

  SimpleSpan bench:   3.165 +- 0.032 us/op   (prior: 3.117 +- 0.026)
  DDSpan bench:       2.727 +- 0.018 us/op   (prior: 2.344 +- 0.114)

Some producer-side regression from the per-snapshot schema sync (volatile
read + identity check). The fast-path identity comparison keeps it small;
hoisting the sync out of the per-snapshot loop is possible but would change
behavior in the edge case where {@code features.peerTags()} returns
different Sets within a single trace (covered by an existing test). Choosing
correctness over the marginal speedup.

Tests
-----

AggregateTableTest's snapshot builder is updated to construct a schema +
values via {@code PeerTagSchema.currentSyncedTo}, exercising the same code
path as production. Existing peer-tag test in {@code
ConflatingMetricAggregatorTest} still passes unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

The "Conflating" in the name dates from the prior design that used a Batch
pool + pending map to conflate up to 64 hits per inbox slot. That mechanism
is gone -- the producer now publishes one SpanSnapshot per span and the
consumer's AggregateTable is the conflation point. The new name matches the
existing protocol/metric terminology (HealthMetrics.onClientStat*,
stats.flush_payloads, etc.).

File renames:

  ConflatingMetricsAggregator.java         -> ClientStatsAggregator.java
  ConflatingMetricAggregatorTest.groovy    -> ClientStatsAggregatorTest.groovy
  ConflatingMetricsAggregatorBenchmark     -> ClientStatsAggregatorBenchmark
  ConflatingMetricsAggregatorDDSpan*       -> ClientStatsAggregatorDDSpan*

Plus all symbol references in MetricsAggregatorFactory and the test fixtures
that referenced the old class name.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

Small follow-ups carried over from a /techdebt pass:

- TracerHealthMetrics: previousCounts array was sized 51, but the prior
  commits added a 52nd reporter (statsInboxFull). Without this fix the new
  counter's report() call would throw ArrayIndexOutOfBoundsException; the
  Flush task swallows that exception, so the failure would be silent
  (statsInboxFull would just never make it to statsd). A toy illustration
  follows this list.

- Aggregator: removes the now-dead public clearAggregates() method. The
  ClearSignal route from ClientStatsAggregator.disable() supplanted it
  several commits ago; the method had no remaining callers.

- TagCardinalityHandler: removes the unused register(TagMap.Entry) overload
  and its isValidType helper. The String-keyed overload covers all current
  callers (AggregateEntry's peer-tag canonicalization).

- PeerTagSchema: spotless-driven javadoc reflow only.
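
To make the first item concrete, a hypothetical illustration of how a mis-sized deltas array fails silently when the flush path swallows exceptions. This is not TracerHealthMetrics itself; the array sizes and the swallow behavior come from the bullet above, everything else is invented for the example.

```java
public final class SilentCounterDropSketch {
  public static void main(String[] args) {
    long[] counts = new long[52];         // 52nd counter: statsInboxFull
    long[] previousCounts = new long[51]; // stale size from before the addition

    try {
      for (int i = 0; i < counts.length; i++) {
        long delta = counts[i] - previousCounts[i]; // throws AIOOBE at i == 51
        previousCounts[i] = counts[i];
        report(i, delta);
      }
    } catch (ArrayIndexOutOfBoundsException swallowed) {
      // mirrors the Flush task: nothing is logged, the loop just stops,
      // and the last counter never reaches statsd
    }
  }

  private static void report(int index, long delta) {
    // stand-in for the statsd write
  }
}
```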

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

ClientStatsAggregator.publish was calling features.peerTags() +
PeerTagSchema.currentSyncedTo for every span. Peer-tag configuration is
stable for the duration of a single trace publish in production --
DDAgentFeaturesDiscovery returns the same Set instance until remote-config
reconfiguration -- so the per-snapshot sync is wasted work.

Move the sync to once per publish(trace) and pass the resolved schema to
the inner publish(span, isTopLevel, peerAggSchema). INTERNAL-kind spans
still use the static PeerTagSchema.INTERNAL regardless.
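
A shape-only sketch of the hoist, assuming stand-in Span/Schema types in place of the real CoreSpan and PeerTagSchema:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

final class PublishHoistSketch {
  interface Span {}

  static final class Schema {
    static final Schema INTERNAL = new Schema();
  }

  private final Supplier<Set<String>> peerTags; // stand-in for features.peerTags()

  PublishHoistSketch(Supplier<Set<String>> peerTags) {
    this.peerTags = peerTags;
  }

  void publish(List<Span> trace) {
    // synced ONCE per trace publish, not once per span
    Schema peerAggSchema = resolve(peerTags.get());
    for (Span span : trace) {
      publish(span, /* isTopLevel */ false, peerAggSchema); // computation elided
    }
  }

  private void publish(Span span, boolean isTopLevel, Schema peerAggSchema) {
    // INTERNAL-kind spans would still ignore peerAggSchema and use
    // Schema.INTERNAL here
  }

  private Schema resolve(Set<String> names) {
    return new Schema(); // stand-in for PeerTagSchema.currentSyncedTo(names)
  }
}
```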

Behavior boundary
-----------------

Schema changes from features.peerTags() now take effect at the next
publish(trace) call rather than mid-trace. Production-equivalent (a trace
takes microseconds to milliseconds; remote-config refreshes are seconds
apart), but a Spock test that used `>>> [...]` to mock different peerTags()
returns on successive calls within one trace no longer makes sense in the
new model. That test is rewritten to assert the production-relevant case:
peer-tag NAMES are stable, peer-tag VALUES vary per span, distinct value
combinations produce distinct aggregate buckets.

Benchmark (2 forks x 5 iter x 15s)
----------------------------------

  SimpleSpan bench:   3.133 +- 0.057 us/op   (prior: 3.165 +- 0.032)
  DDSpan bench:       2.454 +- 0.082 us/op   (prior: 2.727 +- 0.018)

Recovers ~270 ns/op on the DDSpan bench -- most of the regression introduced
by the per-snapshot lookup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

JFR profiling showed ~21% of producer CPU time spent in tag-map lookups
during ClientStatsAggregator.publish. One of those lookups -- span.kind --
is redundant because DDSpanContext already caches the kind as a byte
ordinal that resolves to a String via a small array.

- Add CoreSpan.getSpanKindString() with a default that falls back to the
  tag map for non-DDSpan impls; DDSpan overrides to delegate to the
  context's cached resolution (sketched after this list).
- Hoist schema.names array out of the capturePeerTagValues loop.
- Avoid an unnecessary toString() in isSynthetic by declaring
  SYNTHETICS_ORIGIN as String and using contentEquals.
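
A sketch of the first and third items. The interface and method names follow the commit text; the bodies, the kind-name array, and the Synthetics origin value are assumptions.

```java
interface CoreSpanSketch {
  CharSequence getTag(String key);

  // default: non-DDSpan impls still pay the tag-map lookup
  default CharSequence getSpanKindString() {
    return getTag("span.kind");
  }
}

final class DDSpanSketch implements CoreSpanSketch {
  private static final String[] KIND_NAMES = {"client", "server", "producer", "consumer"};
  private final byte kindOrdinal; // cached by the context when the tag is set

  DDSpanSketch(byte kindOrdinal) {
    this.kindOrdinal = kindOrdinal;
  }

  @Override
  public CharSequence getTag(String key) {
    return null; // tag-map lookup elided
  }

  @Override
  public CharSequence getSpanKindString() {
    return KIND_NAMES[kindOrdinal]; // small-array resolve, no map probe
  }
}

final class SyntheticsSketch {
  // declared as String so no toString() is needed per check
  private static final String SYNTHETICS_ORIGIN = "synthetics";

  static boolean isSynthetic(CharSequence origin) {
    // contentEquals avoids allocating origin.toString()
    return origin != null && SYNTHETICS_ORIGIN.contentEquals(origin);
  }
}
```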

Benchmark (ClientStatsAggregatorDDSpanBenchmark):
  before: 2.410 us/op
  after:  1.995 us/op  (~17% improvement)
  vs. master baseline (6.428 us/op): now ~3.2x faster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

Captures the producer/consumer split, the canonical-key trick that makes
cardinality-blocking actually save space, the once-per-trace peer-tag
schema sync, the role of each file in datadog.trace.common.metrics, and
the rationale behind the redesign from ConflatingMetricsAggregator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

On May 15, 2026, dougqh added labels: type: enhancement, tag: performance, tag: no release notes, comp: metrics, tag: ai generated.
A review comment from dougqh (Contributor, Author) on PeerTagSchema.currentSyncedTo:

```java
 * peerTagNames} differ from the cached set. Designed to be called from the producer hot path: the
 * common case is a single volatile read and an array-length / set-contains comparison.
 */
static PeerTagSchema currentSyncedTo(Set<String> peerTagNames) {
```
I'm not crazy about this part. I'd like to figure out how to make schema regeneration clear at the feature discovery level.

