Use bitmask SpanKindFilter for per-span eligibility in metrics aggregator#11380
Draft
dougqh wants to merge 5 commits into
Conversation
ConflatingMetricsAggregator.publish does a handful of redundant operations on every span. None individually is large; together they show as ~2.5% on the existing JMH benchmark once the benchmark actually exercises span.kind.

- Dedup span.isTopLevel(): publish() reads it into a local, then shouldComputeMetric reads it again. Pass the cached value in.
- Resolve spanKind to String once: master called toString() twice per span (once inside spanKindEligible, once at the getPeerTags call site) and used HashSet contains on a CharSequence (which routes through equals on String). Normalize to String up front and reuse.
- Lazy-allocate the peer-tag list: getPeerTags() always allocated an ArrayList sized to features.peerTags() even when the span had none of those tags set. Defer allocation until the first match; return Collections.emptyList() when none hit. MetricKey already treats null/empty peerTags as emptyList, so there is no behavior change.

Drop the spanKindEligible helper -- the HashSet.contains call inlines fine in shouldComputeMetric.

Update the JMH benchmark to set span.kind=client on every span. Without it the filter path short-circuits before the peer-tag and toString work, so the wins above aren't measurable. With it:

- baseline: 6.755 us/op (CI [6.560, 6.950], stdev 0.129)
- optimized: 6.585 us/op (CI [6.536, 6.634], stdev 0.033)

2 forks x 5 iterations x 15s. ~2.5% mean improvement and much tighter fork-to-fork variance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
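The lazy-allocation change above follows a common pattern: keep the result `null` on the (common) miss path and allocate only on the first hit. A minimal sketch, assuming a hypothetical `peerTags` helper with illustrative names (the real method lives on `ConflatingMetricsAggregator` and operates on span tags, not a plain `Map`):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the lazy-allocation pattern described in the commit:
// defer the ArrayList until the first configured peer tag actually appears on
// the span, and return an immutable empty list when none do.
final class PeerTagsSketch {
  static List<String> peerTags(Map<String, Object> spanTags, List<String> configuredPeerTags) {
    List<String> result = null; // no allocation on the common miss path
    for (String tagName : configuredPeerTags) {
      Object value = spanTags.get(tagName);
      if (value != null) {
        if (result == null) {
          // allocate only on the first match, sized to the configured tag count
          result = new ArrayList<>(configuredPeerTags.size());
        }
        result.add(tagName + ":" + value);
      }
    }
    return result == null ? Collections.emptyList() : result;
  }
}
```

Because downstream code (here, MetricKey) already treats an empty list the same as a populated-but-empty one, the caller can't observe the difference; only the allocation profile changes.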
Introduce SpanKindFilter -- a tiny builder-built immutable filter whose state is an int bitmask indexed by the span.kind ordinals already cached on DDSpanContext. Each include* on the builder sets one bit (1 << ordinal); the runtime check is a single AND against (1 << span's ordinal).

CoreSpan.isKind(SpanKindFilter) is the new entry point. DDSpan overrides it to do the bit-test directly against the cached ordinal -- no virtual call, no tag-map lookup. The two existing test-only CoreSpan impls (SimpleSpan and TraceGenerator.PojoSpan, the latter in two source sets) implement isKind by reading the span.kind tag and delegating to SpanKindFilter.matches(String), which converts via DDSpanContext.spanKindOrdinalOf and does the same AND.

Refactor: DDSpanContext.setSpanKindOrdinal(String) now delegates to a new package-private static spanKindOrdinalOf(String) so the same string-to-ordinal mapping serves both the tag interceptor path and SpanKindFilter.matches.

This is groundwork -- nothing in the codebase calls isKind yet. The next commit will replace the HashSet-based eligibility checks in ConflatingMetricsAggregator with SpanKindFilter instances.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
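A condensed sketch of the bitmask idea. The names mirror the commit message but this is not the real class: the actual `SpanKindFilter` lives in dd-trace-core and gets its ordinals from `DDSpanContext.spanKindOrdinalOf`; the switch-based `ordinalOf` below is an illustrative stand-in, and the ordinal values are assumed:

```java
// Hypothetical sketch of a builder-built immutable bitmask filter: each
// include sets one bit keyed by a span.kind ordinal; matching is a single AND.
final class SpanKindFilterSketch {
  // Stand-in for DDSpanContext.spanKindOrdinalOf(String); values are assumed.
  static byte ordinalOf(String kind) {
    switch (kind) {
      case "client":   return 0;
      case "server":   return 1;
      case "producer": return 2;
      case "consumer": return 3;
      default:         return -1; // unknown kinds never match
    }
  }

  private final int kindMask;

  private SpanKindFilterSketch(int kindMask) {
    this.kindMask = kindMask;
  }

  // Fast path for callers that already hold the cached ordinal.
  boolean matches(byte ordinal) {
    return ordinal >= 0 && (kindMask & (1 << ordinal)) != 0;
  }

  // Slow path: convert the string to an ordinal, then do the same bit-test.
  boolean matches(String kind) {
    return matches(ordinalOf(kind));
  }

  static final class Builder {
    private int mask;

    Builder include(String kind) {
      byte ordinal = ordinalOf(kind);
      if (ordinal >= 0) {
        mask |= 1 << ordinal; // one bit per included kind
      }
      return this;
    }

    SpanKindFilterSketch build() {
      // Builder lives in the same outer class, so it can use the private ctor.
      return new SpanKindFilterSketch(mask);
    }
  }
}
```

The check compiles down to a load, a shift, and an AND, with no hashing and no `equals` call, which is what makes it cheaper than `HashSet.contains` on the hot path.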
Replace the two ELIGIBLE_SPAN_KINDS_FOR_* HashSet<String> constants and the SPAN_KIND_INTERNAL.equals check with three SpanKindFilter instances: METRICS_ELIGIBLE_KINDS, PEER_AGGREGATION_KINDS, INTERNAL_KIND. Eligibility checks now go through span.isKind(filter), which on DDSpan is a volatile byte read against the already-cached span.kind ordinal plus a single bit-test.

Also defer the span.kind tag read: previously it was read at the top of the publish loop and threaded through both shouldComputeMetric and the inner publish. isKind no longer needs the string, so the read can move down into the inner publish where it's still needed for the SPAN_KINDS cache key / MetricKey.

Supporting changes:
- DDSpanContext.spanKindOrdinalOf(String) is now public so non-DDSpan CoreSpan impls can compute the ordinal at tag-write time.
- SpanKindFilter gains a public matches(byte) fast-path overload that callers with a pre-computed ordinal use directly.
- SimpleSpan caches the ordinal in setTag(SPAN_KIND, ...), mirroring what TagInterceptor does for DDSpanContext, and its isKind now hits the byte fast path. Without this, the JMH benchmark (which uses SimpleSpan) would re-derive the ordinal on every isKind call and overstate the cost.

Benchmark on the bench updated in the last commit (kind=client on every span, 4 forks x 5 iter x 15s):

- prior commit: 6.585 ± 0.049 us/op
- this commit: 6.903 ± 0.096 us/op

The slight regression is a SimpleSpan-via-groovy-dispatch artifact -- the interface call to isKind through CoreSpan, then through SimpleSpan, then through SpanKindFilter.matches, doesn't fold as aggressively as a HashSet contains on a static field. In production DDSpan.isKind inlines to a context field read + ordinal byte read + bit-test, so the production path is faster than the prior HashSet approach. A DDSpan-based benchmark would show this; the existing SimpleSpan-based one doesn't.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
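The "cache the ordinal at tag-write time" part can be sketched as follows. This is a hypothetical stand-alone class with illustrative names, not the real SimpleSpan or DDSpanContext; the mask parameter stands in for what the real code keeps inside a SpanKindFilter, and the ordinal values are assumed:

```java
// Hypothetical sketch: cache the span.kind ordinal when the tag is written,
// so the eligibility check is a single byte read plus a bit-test, with no
// string comparison and no tag-map lookup on the hot path.
final class CachedKindSpan {
  private volatile byte spanKindOrdinal = -1; // -1 = no span.kind set

  void setTag(String key, Object value) {
    if ("span.kind".equals(key) && value != null) {
      // cache once at write time, mirroring what TagInterceptor does
      spanKindOrdinal = ordinalOf(value.toString());
    }
    // ... store the tag in the tag map as usual ...
  }

  // In the PR this takes a SpanKindFilter and delegates to matches(byte);
  // here the raw mask is passed directly to keep the sketch self-contained.
  boolean isKind(int kindMask) {
    byte ordinal = spanKindOrdinal;                   // single volatile byte read
    return ordinal >= 0 && (kindMask & (1 << ordinal)) != 0; // single bit-test
  }

  private static byte ordinalOf(String kind) {
    switch (kind) { // assumed ordinal table
      case "client":   return 0;
      case "server":   return 1;
      case "producer": return 2;
      case "consumer": return 3;
      default:         return -1;
    }
  }
}
```

Without this caching, a mock span would re-derive the ordinal from the tag string on every `isKind` call, which is exactly the measurement distortion the commit message warns about for the SimpleSpan benchmark.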
The existing ConflatingMetricsAggregatorBenchmark uses SimpleSpan, a groovy mock. That's enough for measuring queue/CHM/MetricKey work, but it conceals the production cost of CoreSpan.isKind: SimpleSpan's isKind goes through groovy interface dispatch into SpanKindFilter.matches, while DDSpan.isKind inlines to a context byte-read + bit-test.

This new benchmark uses real DDSpan instances created through a CoreTracer (with a NoopWriter so finishing doesn't reach the agent). Same shape as the SimpleSpan bench (64-span trace, span.kind=client, peer.hostname set).

Numbers (2 forks x 5 iter x 15s):

- master: 6.428 ± 0.189 us/op (HashSet eligibility checks)
- this branch: 6.343 ± 0.115 us/op (SpanKindFilter bitmask)

About 1.3% faster on the production path. The SimpleSpan benchmark in the same conditions shows a ~2.2% slowdown -- the mock's dispatch shape gives a misleading signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make SpanKindFilter.kindMask and its constructor private now that DDSpan.isKind no longer needs direct field access -- it delegates to SpanKindFilter.matches(byte). The Builder.build() in the same outer class still constructs instances via the private constructor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces the `Set<String>`-based span.kind eligibility checks in `ConflatingMetricsAggregator` with a new `SpanKindFilter` bitmask primitive. Each filter is an `int` mask indexed by the span.kind ordinal already cached on `DDSpanContext`; checking eligibility on `DDSpan` becomes a context byte-read + bit-test (no tag-map lookup, no HashSet hash/equals).

Layered as 5 commits so each step is reviewable in isolation:

1. Micro-optimizations on the existing path: dedup `isTopLevel()`, lazy-allocate the peer-tag list, collapse a duplicate `spanKind.toString()`. Also updates the existing JMH benchmark to set `span.kind=client` on every span so the eligibility path is actually exercised (without it the bench short-circuits before any of this work).
2. Introduces `SpanKindFilter` (builder + bitmask), adds `boolean isKind(SpanKindFilter)` to `CoreSpan`; `DDSpan` overrides with the fast path, and the test-only impls (`SimpleSpan`, two `TraceGenerator.PojoSpan`s) implement via `SpanKindFilter.matches(String)`. Refactors `DDSpanContext.setSpanKindOrdinal` to expose `spanKindOrdinalOf(String)` as a reusable static helper.
3. Replaces the `ELIGIBLE_SPAN_KINDS_FOR_*` `Set<String>` constants and the `SPAN_KIND_INTERNAL.equals` check with three `SpanKindFilter` instances. Defers the span.kind tag read until inside `publish()` where `MetricKey` still needs it. `SimpleSpan` caches the ordinal at `setTag` time so the JMH benchmark's mock isn't measuring its own dispatch shape.
4. Adds a benchmark over real spans: the existing bench uses `SimpleSpan` (groovy mock), where the new `isKind` path goes through a less-inlinable interface call. The new `ConflatingMetricsAggregatorDDSpanBenchmark` uses real `DDSpan` instances created via `CoreTracer` (with a `NoopWriter`), so the JIT exercises the production fast path.
5. Tightens encapsulation: `kindMask` and the constructor are now `private`; `DDSpan.isKind` delegates to `filter.matches(byte)` rather than reaching the field directly.

Benchmark results
`ConflatingMetricsAggregatorDDSpanBenchmark`, 2 forks × 5 iterations × 15s:

- master: 6.428 ± 0.189 us/op
- this branch: 6.343 ± 0.115 us/op

~1.3% faster on the production path, with tighter fork-to-fork variance. The CIs overlap so the headline number is within noise, but the centers move the right way and the new path is structurally cheaper (byte read + bit-test vs tag-map read + HashSet.contains).
The `SimpleSpan` benchmark in the same conditions shows a ~2-3% slowdown — that's an artifact of the groovy mock's dispatch shape, not a production cost, which is why the DDSpan-flavored benchmark was added.

Test plan
- `./gradlew :dd-trace-core:test --tests 'datadog.trace.common.metrics.*'` passes
- `./gradlew :dd-trace-core:test --tests 'datadog.trace.core.DDSpan*'` passes
- `./gradlew :dd-trace-core:compileJava :dd-trace-core:compileTestGroovy :dd-trace-core:compileJmhJava :dd-trace-core:compileTraceAgentTestGroovy` all green
- `./gradlew spotlessCheck` clean

🤖 Generated with Claude Code