Cache span.kind as byte ordinal for fast isOutbound()#11116
DDSpanContext.getTag("span.kind") was consuming ~14% of foreground CPU
in span creation stress tests. It was called from DDSpan.isOutbound()
on every root span start and finish, falling through the getTag() switch
to a full TagMap hash-table lookup with potential synchronization.
This change caches span.kind as a volatile byte ordinal on DDSpanContext
(same dual-store pattern as httpStatusCode). The TagInterceptor now
intercepts SPAN_KIND to set the ordinal, and isOutbound() does a simple
byte comparison instead of getTag() + String.equals(). Benchmark shows
isOutbound() at constant ~2.8ns regardless of span kind.
tag: no release note
tag: ai generated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
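The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class SpanKindCacheSketch, the KIND_* ordinal values, and the rule that "client" and "producer" count as outbound are all assumptions, and the sketch keeps the value in both the field and the tag map, which may differ from the real dual-store behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DDSpanContext. Field names, ordinal values,
// and the outbound rule (client or producer) are assumptions for illustration.
final class SpanKindCacheSketch {
  static final byte KIND_UNSET = 0;
  static final byte KIND_SERVER = 1;
  static final byte KIND_CLIENT = 2;
  static final byte KIND_PRODUCER = 3;
  static final byte KIND_CONSUMER = 4;
  static final byte KIND_CUSTOM = 5; // any value outside the known set

  private final Map<String, Object> tags = new HashMap<>();

  // Written whenever span.kind is set, read without locking by isOutbound().
  private volatile byte spanKindOrdinal = KIND_UNSET;

  void setTag(String key, Object value) {
    if ("span.kind".equals(key)) {
      spanKindOrdinal = toOrdinal(value);
    }
    synchronized (tags) {
      tags.put(key, value);
    }
  }

  Object getTag(String key) {
    synchronized (tags) {
      return tags.get(key);
    }
  }

  // Baseline: synchronized map lookup plus String.equals() on every call.
  boolean isOutboundViaTagLookup() {
    Object kind = getTag("span.kind");
    return "client".equals(kind) || "producer".equals(kind);
  }

  // Cached path: one volatile read and two byte comparisons.
  boolean isOutbound() {
    byte kind = spanKindOrdinal;
    return kind == KIND_CLIENT || kind == KIND_PRODUCER;
  }

  private static byte toOrdinal(Object value) {
    if (value == null) {
      return KIND_UNSET;
    }
    switch (String.valueOf(value)) {
      case "server":
        return KIND_SERVER;
      case "client":
        return KIND_CLIENT;
      case "producer":
        return KIND_PRODUCER;
      case "consumer":
        return KIND_CONSUMER;
      default:
        return KIND_CUSTOM;
    }
  }
}
```

Both isOutbound() variants return the same answer for the same tag value; only the cached one avoids the map lookup and the lock.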
      return;
    }
    // Use identity checks first (canonical constants), then fall back to equals
    if (kind == Tags.SPAN_KIND_SERVER || Tags.SPAN_KIND_SERVER.equals(kind)) {
The identity checks are good, but it might be better to skip the equals check at first. Then the fast path will consist solely of identity checks.
Only if none of the strict identity checks match would we then fall back to equality checks, and for that we could use a string switch.
Alternatively, it would be nice to create a static helper function to perform the identity & equals check.
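A rough sketch of what the two suggestions could look like: an identity-only fast path followed by a string switch, plus a small static helper for the combined identity and equals check. The name spanKindToOrdinal appears later in this thread; everything else here (the class name, the KIND_* and ORD_* constants and their values, the sameTag helper) is hypothetical and only meant to show the shape.

```java
// Sketch only: constant names, values, and ordinals are assumptions.
final class SpanKindOrdinalSketch {
  static final String KIND_SERVER = "server";
  static final String KIND_CLIENT = "client";
  static final String KIND_PRODUCER = "producer";
  static final String KIND_CONSUMER = "consumer";

  static final byte ORD_UNSET = 0;
  static final byte ORD_SERVER = 1;
  static final byte ORD_CLIENT = 2;
  static final byte ORD_PRODUCER = 3;
  static final byte ORD_CONSUMER = 4;
  static final byte ORD_CUSTOM = 5;

  // Identity first (cheap when callers pass the canonical constant),
  // then equals() for strings that are equal but not the same instance.
  static boolean sameTag(String constant, String candidate) {
    return candidate == constant || constant.equals(candidate);
  }

  static byte spanKindToOrdinal(String kind) {
    if (kind == null) {
      return ORD_UNSET;
    }
    // Fast path: reference comparisons only.
    if (kind == KIND_SERVER) return ORD_SERVER;
    if (kind == KIND_CLIENT) return ORD_CLIENT;
    if (kind == KIND_PRODUCER) return ORD_PRODUCER;
    if (kind == KIND_CONSUMER) return ORD_CONSUMER;
    // Slow path: string switch for non-canonical instances.
    switch (kind) {
      case KIND_SERVER:
        return ORD_SERVER;
      case KIND_CLIENT:
        return ORD_CLIENT;
      case KIND_PRODUCER:
        return ORD_PRODUCER;
      case KIND_CONSUMER:
        return ORD_CONSUMER;
      default:
        return ORD_CUSTOM;
    }
  }
}
```

A string built at runtime, such as new String("producer"), misses the identity checks and is resolved by the switch, so both paths return the same ordinal.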
  private volatile short httpStatusCode;

  // Cached span.kind ordinal for fast isOutbound() checks.
I'd prefer to reduce the visibility of these internal constants to package-private.
That's especially true for SPAN_KIND_UNSET and SPAN_KIND_CUSTOM because they don't correspond to actual values.
  }

  public void removeTag(String tag) {
    if (Tags.SPAN_KIND.equals(tag)) {
Might as well use the same identity comparison approach here, too.
Might be best to introduce a static helper to do the equals comparison.
          // UNSET or CUSTOM -- fall through to tag map
          Object value;
          synchronized (unsafeTags) {
            value = unsafeGetTag(key);
At this point key is known to be Tags.SPAN_KIND, so this may optimize slightly better if we use the Tags.SPAN_KIND constant here.
      case Tags.SPAN_KIND:
        {
          byte ordinal = spanKindOrdinal;
          if (ordinal != SPAN_KIND_UNSET && ordinal != SPAN_KIND_CUSTOM) {
If we tweak this to ordinal > SPAN_KIND_UNSET && ordinal < SPAN_KIND_CUSTOM, we can help the JIT with bounds check elimination.
Or, as an alternative, we could just always read from SPAN_KIND_VALUES, and only fall back to TagMap if the value read is null.
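The two read strategies suggested here might look something like the sketch below. SPAN_KIND_UNSET, SPAN_KIND_CUSTOM, and SPAN_KIND_VALUES are names from the review; the concrete ordinal values, the array layout, the class name, and the plain Map standing in for TagMap are assumptions. The range check only works if UNSET is the lowest ordinal and CUSTOM the highest, which is how this sketch lays them out.

```java
import java.util.Map;

// Sketch only: ordinal values and array layout are assumed, not taken from the PR.
final class SpanKindReadSketch {
  static final byte SPAN_KIND_UNSET = 0;
  static final byte SPAN_KIND_SERVER = 1;
  static final byte SPAN_KIND_CLIENT = 2;
  static final byte SPAN_KIND_PRODUCER = 3;
  static final byte SPAN_KIND_CONSUMER = 4;
  static final byte SPAN_KIND_CUSTOM = 5;

  // Indexed by ordinal; UNSET and CUSTOM have no canonical string.
  private static final String[] SPAN_KIND_VALUES = {
    null, "server", "client", "producer", "consumer", null
  };

  // Variant 1: an explicit range check guards the array read, which is the
  // form the reviewer suggests for bounds check elimination.
  static Object readWithRangeCheck(byte ordinal, Map<String, Object> tagMap) {
    if (ordinal > SPAN_KIND_UNSET && ordinal < SPAN_KIND_CUSTOM) {
      return SPAN_KIND_VALUES[ordinal];
    }
    return tagMap.get("span.kind"); // UNSET or CUSTOM: consult the tag map
  }

  // Variant 2: always read the array and fall back only when the slot is null.
  // Requires every ordinal to be a valid index into SPAN_KIND_VALUES.
  static Object readWithNullFallback(byte ordinal, Map<String, Object> tagMap) {
    String cached = SPAN_KIND_VALUES[ordinal];
    return cached != null ? cached : tagMap.get("span.kind");
  }
}
```

Both variants return the cached string for the four known kinds and defer to the map for UNSET and CUSTOM; whether either one is measurably faster would need to be confirmed with a benchmark.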
- Reduce visibility of SPAN_KIND_UNSET and SPAN_KIND_CUSTOM to package-private (they don't correspond to actual span.kind values)
- Restructure setSpanKind: identity checks first as a fast path, then string switch fallback for non-interned strings. Extracted into static spanKindToOrdinal() helper.
- Use identity-first comparison in removeTag() for SPAN_KIND check
- Use ordinal > SPAN_KIND_UNSET && ordinal < SPAN_KIND_CUSTOM in getTag() to help JIT with bounds check elimination
- Use Tags.SPAN_KIND constant instead of key in fallthrough unsafeGetTag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 56 metrics, 15 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.062 s) : 0, 1062079
Total [baseline] (11.127 s) : 0, 11126680
Agent [candidate] (1.064 s) : 0, 1064208
Total [candidate] (11.107 s) : 0, 11106734
section appsec
Agent [baseline] (1.254 s) : 0, 1254117
Total [baseline] (11.178 s) : 0, 11177591
Agent [candidate] (1.249 s) : 0, 1249038
Total [candidate] (11.156 s) : 0, 11155767
section iast
Agent [baseline] (1.237 s) : 0, 1236733
Total [baseline] (11.328 s) : 0, 11328499
Agent [candidate] (1.226 s) : 0, 1225780
Total [candidate] (11.298 s) : 0, 11298242
section profiling
Agent [baseline] (1.184 s) : 0, 1183637
Total [baseline] (11.042 s) : 0, 11042123
Agent [candidate] (1.182 s) : 0, 1182349
Total [candidate] (11.07 s) : 0, 11070197
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.234 ms) : 0, 1234
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (636.017 ms) : 0, 636017
BytebuddyAgent [candidate] (636.546 ms) : 0, 636546
AgentMeter [baseline] (29.565 ms) : 0, 29565
AgentMeter [candidate] (29.638 ms) : 0, 29638
GlobalTracer [baseline] (250.097 ms) : 0, 250097
GlobalTracer [candidate] (250.057 ms) : 0, 250057
AppSec [baseline] (32.542 ms) : 0, 32542
AppSec [candidate] (32.372 ms) : 0, 32372
Debugger [baseline] (60.252 ms) : 0, 60252
Debugger [candidate] (60.253 ms) : 0, 60253
Remote Config [baseline] (605.507 µs) : 0, 606
Remote Config [candidate] (611.236 µs) : 0, 611
Telemetry [baseline] (8.076 ms) : 0, 8076
Telemetry [candidate] (8.191 ms) : 0, 8191
Flare Poller [baseline] (7.403 ms) : 0, 7403
Flare Poller [candidate] (9.128 ms) : 0, 9128
section appsec
crashtracking [baseline] (1.214 ms) : 0, 1214
crashtracking [candidate] (1.229 ms) : 0, 1229
BytebuddyAgent [baseline] (664.902 ms) : 0, 664902
BytebuddyAgent [candidate] (662.421 ms) : 0, 662421
AgentMeter [baseline] (12.145 ms) : 0, 12145
AgentMeter [candidate] (12.088 ms) : 0, 12088
GlobalTracer [baseline] (250.8 ms) : 0, 250800
GlobalTracer [candidate] (249.295 ms) : 0, 249295
IAST [baseline] (24.666 ms) : 0, 24666
IAST [candidate] (24.611 ms) : 0, 24611
AppSec [baseline] (185.177 ms) : 0, 185177
AppSec [candidate] (184.729 ms) : 0, 184729
Debugger [baseline] (66.305 ms) : 0, 66305
Debugger [candidate] (65.859 ms) : 0, 65859
Remote Config [baseline] (609.675 µs) : 0, 610
Remote Config [candidate] (599.954 µs) : 0, 600
Telemetry [baseline] (8.365 ms) : 0, 8365
Telemetry [candidate] (8.347 ms) : 0, 8347
Flare Poller [baseline] (3.521 ms) : 0, 3521
Flare Poller [candidate] (3.46 ms) : 0, 3460
section iast
crashtracking [baseline] (1.243 ms) : 0, 1243
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (810.27 ms) : 0, 810270
BytebuddyAgent [candidate] (802.743 ms) : 0, 802743
AgentMeter [baseline] (11.762 ms) : 0, 11762
AgentMeter [candidate] (11.401 ms) : 0, 11401
GlobalTracer [baseline] (241.101 ms) : 0, 241101
GlobalTracer [candidate] (239.758 ms) : 0, 239758
IAST [baseline] (26.221 ms) : 0, 26221
IAST [candidate] (25.815 ms) : 0, 25815
AppSec [baseline] (31.915 ms) : 0, 31915
AppSec [candidate] (32.559 ms) : 0, 32559
Debugger [baseline] (61.322 ms) : 0, 61322
Debugger [candidate] (59.829 ms) : 0, 59829
Remote Config [baseline] (528.283 µs) : 0, 528
Remote Config [candidate] (531.484 µs) : 0, 531
Telemetry [baseline] (12.368 ms) : 0, 12368
Telemetry [candidate] (12.099 ms) : 0, 12099
Flare Poller [baseline] (3.432 ms) : 0, 3432
Flare Poller [candidate] (3.64 ms) : 0, 3640
section profiling
crashtracking [baseline] (1.171 ms) : 0, 1171
crashtracking [candidate] (1.178 ms) : 0, 1178
BytebuddyAgent [baseline] (690.495 ms) : 0, 690495
BytebuddyAgent [candidate] (690.428 ms) : 0, 690428
AgentMeter [baseline] (9.115 ms) : 0, 9115
AgentMeter [candidate] (9.026 ms) : 0, 9026
GlobalTracer [baseline] (207.277 ms) : 0, 207277
GlobalTracer [candidate] (207.15 ms) : 0, 207150
AppSec [baseline] (32.727 ms) : 0, 32727
AppSec [candidate] (32.756 ms) : 0, 32756
Debugger [baseline] (65.667 ms) : 0, 65667
Debugger [candidate] (65.276 ms) : 0, 65276
Remote Config [baseline] (577.855 µs) : 0, 578
Remote Config [candidate] (570.778 µs) : 0, 571
Telemetry [baseline] (7.811 ms) : 0, 7811
Telemetry [candidate] (7.791 ms) : 0, 7791
Flare Poller [baseline] (3.565 ms) : 0, 3565
Flare Poller [candidate] (3.514 ms) : 0, 3514
ProfilingAgent [baseline] (94.125 ms) : 0, 94125
ProfilingAgent [candidate] (93.554 ms) : 0, 93554
Profiling [baseline] (94.703 ms) : 0, 94703
Profiling [candidate] (94.117 ms) : 0, 94117
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.062 s) : 0, 1061869
Total [baseline] (8.836 s) : 0, 8836035
Agent [candidate] (1.064 s) : 0, 1064247
Total [candidate] (8.848 s) : 0, 8848050
section iast
Agent [baseline] (1.235 s) : 0, 1234909
Total [baseline] (9.597 s) : 0, 9597291
Agent [candidate] (1.231 s) : 0, 1230679
Total [candidate] (9.534 s) : 0, 9533842
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.262 ms) : 0, 1262
crashtracking [candidate] (1.253 ms) : 0, 1253
BytebuddyAgent [baseline] (636.961 ms) : 0, 636961
BytebuddyAgent [candidate] (636.538 ms) : 0, 636538
AgentMeter [baseline] (29.58 ms) : 0, 29580
AgentMeter [candidate] (29.62 ms) : 0, 29620
GlobalTracer [baseline] (250.175 ms) : 0, 250175
GlobalTracer [candidate] (250.098 ms) : 0, 250098
AppSec [baseline] (32.575 ms) : 0, 32575
AppSec [candidate] (32.579 ms) : 0, 32579
Debugger [baseline] (59.559 ms) : 0, 59559
Debugger [candidate] (59.361 ms) : 0, 59361
Remote Config [baseline] (618.465 µs) : 0, 618
Remote Config [candidate] (588.273 µs) : 0, 588
Telemetry [baseline] (8.182 ms) : 0, 8182
Telemetry [candidate] (8.039 ms) : 0, 8039
Flare Poller [baseline] (6.636 ms) : 0, 6636
Flare Poller [candidate] (9.712 ms) : 0, 9712
section iast
crashtracking [baseline] (1.247 ms) : 0, 1247
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (808.683 ms) : 0, 808683
BytebuddyAgent [candidate] (807.128 ms) : 0, 807128
AgentMeter [baseline] (11.666 ms) : 0, 11666
AgentMeter [candidate] (11.659 ms) : 0, 11659
GlobalTracer [baseline] (241.618 ms) : 0, 241618
GlobalTracer [candidate] (239.937 ms) : 0, 239937
IAST [baseline] (26.129 ms) : 0, 26129
IAST [candidate] (26.502 ms) : 0, 26502
AppSec [baseline] (33.186 ms) : 0, 33186
AppSec [candidate] (31.265 ms) : 0, 31265
Debugger [baseline] (57.676 ms) : 0, 57676
Debugger [candidate] (57.301 ms) : 0, 57301
Remote Config [baseline] (1.146 ms) : 0, 1146
Remote Config [candidate] (518.292 µs) : 0, 518
Telemetry [baseline] (13.315 ms) : 0, 13315
Telemetry [candidate] (14.981 ms) : 0, 14981
Flare Poller [baseline] (3.506 ms) : 0, 3506
Flare Poller [candidate] (3.655 ms) : 0, 3655
Load
Summary
Found 2 performance improvements and 0 performance regressions! Performance is the same for 18 metrics, 16 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section baseline
no_agent (17.208 ms) : 17037, 17379
. : milestone, 17208,
appsec (18.659 ms) : 18471, 18846
. : milestone, 18659,
code_origins (17.773 ms) : 17600, 17946
. : milestone, 17773,
iast (19.085 ms) : 18896, 19273
. : milestone, 19085,
profiling (18.341 ms) : 18157, 18524
. : milestone, 18341,
tracing (17.832 ms) : 17655, 18008
. : milestone, 17832,
section candidate
no_agent (18.109 ms) : 17925, 18293
. : milestone, 18109,
appsec (18.576 ms) : 18388, 18763
. : milestone, 18576,
code_origins (17.822 ms) : 17646, 17999
. : milestone, 17822,
iast (17.709 ms) : 17534, 17884
. : milestone, 17709,
profiling (18.088 ms) : 17910, 18265
. : milestone, 18088,
tracing (17.935 ms) : 17757, 18112
. : milestone, 17935,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section baseline
no_agent (1.25 ms) : 1237, 1263
. : milestone, 1250,
iast (3.296 ms) : 3248, 3344
. : milestone, 3296,
iast_FULL (6.084 ms) : 6022, 6146
. : milestone, 6084,
iast_GLOBAL (3.791 ms) : 3727, 3855
. : milestone, 3791,
profiling (2.013 ms) : 1994, 2031
. : milestone, 2013,
tracing (1.888 ms) : 1872, 1905
. : milestone, 1888,
section candidate
no_agent (1.238 ms) : 1225, 1250
. : milestone, 1238,
iast (3.262 ms) : 3217, 3306
. : milestone, 3262,
iast_FULL (6.004 ms) : 5943, 6064
. : milestone, 6004,
iast_GLOBAL (3.67 ms) : 3611, 3729
. : milestone, 3670,
profiling (2.119 ms) : 2100, 2137
. : milestone, 2119,
tracing (1.888 ms) : 1873, 1904
. : milestone, 1888,
Dacapo
Summary
Found 1 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 0 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section baseline
no_agent (15.163 s) : 15163000, 15163000
. : milestone, 15163000,
appsec (14.748 s) : 14748000, 14748000
. : milestone, 14748000,
iast (18.271 s) : 18271000, 18271000
. : milestone, 18271000,
iast_GLOBAL (18.417 s) : 18417000, 18417000
. : milestone, 18417000,
profiling (15.098 s) : 15098000, 15098000
. : milestone, 15098000,
tracing (15.052 s) : 15052000, 15052000
. : milestone, 15052000,
section candidate
no_agent (15.029 s) : 15029000, 15029000
. : milestone, 15029000,
appsec (14.688 s) : 14688000, 14688000
. : milestone, 14688000,
iast (18.719 s) : 18719000, 18719000
. : milestone, 18719000,
iast_GLOBAL (18.202 s) : 18202000, 18202000
. : milestone, 18202000,
profiling (14.828 s) : 14828000, 14828000
. : milestone, 14828000,
tracing (15.014 s) : 15014000, 15014000
. : milestone, 15014000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~a8f31c4270, baseline=1.62.0-SNAPSHOT~4666c89336
dateFormat X
axisFormat %s
section baseline
no_agent (1.487 ms) : 1476, 1499
. : milestone, 1487,
appsec (3.773 ms) : 3554, 3993
. : milestone, 3773,
iast (2.265 ms) : 2196, 2333
. : milestone, 2265,
iast_GLOBAL (2.324 ms) : 2254, 2394
. : milestone, 2324,
profiling (2.094 ms) : 2039, 2149
. : milestone, 2094,
tracing (2.086 ms) : 2032, 2139
. : milestone, 2086,
section candidate
no_agent (1.487 ms) : 1475, 1499
. : milestone, 1487,
appsec (2.536 ms) : 2482, 2591
. : milestone, 2536,
iast (2.282 ms) : 2212, 2351
. : milestone, 2282,
iast_GLOBAL (2.319 ms) : 2250, 2389
. : milestone, 2319,
profiling (2.101 ms) : 2046, 2156
. : milestone, 2101,
tracing (2.078 ms) : 2024, 2131
. : milestone, 2078,
Summary
- span.kind as a volatile byte ordinal on DDSpanContext, following the existing httpStatusCode dual-store pattern
- DDSpan.isOutbound() from getTag() + String.equals() to a constant-time byte comparison (~2.8ns vs ~10ns)
- SPAN_KIND interception in TagInterceptor to populate the cache on setTag()
- IsOutboundBenchmark JMH benchmark
Motivation:
DDSpanContext.getTag("span.kind") consumed ~14% of foreground CPU in a 16-thread span creation stress test. It was called from CoreTracer.onRootSpanStarted and onRootSpanFinished on every root span, falling through the getTag() switch to a full TagMap hash-table lookup with potential synchronization.
Benchmark results (JDK 1.8.0_382)
Test plan
- *DDSpan* tests pass
- *TagInterceptor* tests pass
🤖 Generated with Claude Code