Fix calltrace_storage counters accumulating unboundedly across rotations #427
Conversation
clearTableOnly() was freeing memory without decrementing the global CALLTRACE_STORAGE_BYTES/TRACES counters, causing them to grow monotonically (cumulative total ever allocated, not current live size). Add a decrement pass in clearTableOnly() using the same table iteration as collect(), computed after waitForAllRefCountsToClear() ensures no concurrent writers. Both the direct clear() path and the deferred-free processTraces() path go through clearTableOnly(), so a single fix covers both. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
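To make the fix concrete, here is a minimal, self-contained sketch of such a decrement pass. The types below (`CallTrace`, `LongHashTable` with `prev()`/`size()`, `traceBytes()`) are toy stand-ins that only mirror the names used in this PR, not the actual ddprof-lib classes; the real iteration reuses `collect()`'s slot walk.

```cpp
#include <atomic>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-ins for the real types (names are assumptions, not the actual API).
struct CallTrace { size_t num_frames; };

static std::atomic<long long> CALLTRACE_STORAGE_BYTES{0};
static std::atomic<long long> CALLTRACE_STORAGE_TRACES{0};

// Each table in the chain holds trace pointers; put() may reuse the same
// CallTrace* across chained tables, hence the dedup set below.
struct LongHashTable {
    std::vector<CallTrace*> slots;
    LongHashTable* prev_ = nullptr;
    LongHashTable* prev() const { return prev_; }
    size_t size() const { return slots.size(); }
};

static size_t traceBytes(const CallTrace* t) {
    return sizeof(CallTrace) + t->num_frames * sizeof(void*);
}

// Decrement pass: walk every slot of every table in the chain, deduplicate
// trace pointers, then subtract the freed bytes/traces from the globals.
// Only safe once waitForAllRefCountsToClear() guarantees no concurrent writers.
void decrementCounters(LongHashTable* head) {
    size_t estimated = 0;
    for (LongHashTable* t = head; t != nullptr; t = t->prev()) {
        estimated += t->size();
    }
    std::unordered_set<CallTrace*> seen;
    seen.reserve(estimated);  // bound rehashing on large tables

    long long freed_bytes = 0, freed_traces = 0;
    for (LongHashTable* t = head; t != nullptr; t = t->prev()) {
        for (CallTrace* tr : t->slots) {
            if (tr != nullptr && seen.insert(tr).second) {  // count each trace once
                freed_bytes += (long long) traceBytes(tr);
                freed_traces++;
            }
        }
    }
    CALLTRACE_STORAGE_BYTES -= freed_bytes;
    CALLTRACE_STORAGE_TRACES -= freed_traces;
}
```

After this pass the globals reflect only traces still live elsewhere, so repeated rotations no longer accumulate.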
CI Test Results: Run #24658333136 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled
Summary: Total: 32 | Passed: 32 | Failed: 0
Updated: 2026-04-20 09:42:13 UTC
- Deduplicate trace pointers in `clearTableOnly()` to avoid over-decrement when `put()` reuses the same `CallTrace*` across chained tables
- Update test assertions: counters are 0 after stop (correct for leak detection), not in range [1, 100]

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Live counters are 0 after stop because `processTraces()` decrements them during `finishChunk()`. The recording already captures pre-cleanup values (`writeCounters` runs before `writeCpool`/`processTraces`). Read those instead. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Fixes `calltrace_storage_bytes` / `calltrace_storage_traces` so they reflect current live calltrace storage instead of monotonically accumulating across `processTraces()` rotations, and updates tests to validate counter values from the JFR recording (pre-final-cleanup).
Changes:
- Decrement `calltrace_storage_*` counters when clearing calltrace hash tables to prevent unbounded accumulation across rotations.
- Gate unwind timing measurements in `profiler.cpp` behind `#ifdef COUNTERS` to avoid overhead when counters are disabled.
- Update CPU/wallclock context tests to assert recorded (JFR) counter values via a new `AbstractProfilerTest#getRecordedCounterValue()` helper.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ddprof-test/src/test/java/com/datadoghq/profiler/wallclock/BaseContextWallClockTest.java | Switches counter assertions to JFR-recorded values (pre-cleanup). |
| ddprof-test/src/test/java/com/datadoghq/profiler/cpu/ContextCpuTest.java | Switches counter assertions to JFR-recorded values (pre-cleanup). |
| ddprof-test/src/test/java/com/datadoghq/profiler/AbstractProfilerTest.java | Adds helper to read datadog.ProfilerCounter values from the recording. |
| ddprof-lib/src/main/cpp/profiler.cpp | Wraps unwind timing measurements with #ifdef COUNTERS. |
| ddprof-lib/src/main/cpp/callTraceHashTable.h / .cpp | Adds counter-decrement pass during table clears to prevent counter accumulation. |
```cpp
long long freed_traces = 0;
std::unordered_set<CallTrace*> seen;
```
`decrementCounters()` builds an `unordered_set` of seen `CallTrace*` but never reserves capacity. On large tables this will cause repeated rehashing while iterating every slot. Since `LongHashTable::size()` is available, consider summing sizes across the chain and calling `seen.reserve(estimatedDistinct)` before insertion to keep the clear/rotation overhead bounded.
Suggested change:

```cpp
long long freed_traces = 0;
size_t estimated_entries = 0;
for (LongHashTable *t = _table; t != nullptr; t = t->prev()) {
  estimated_entries += t->size();
}
std::unordered_set<CallTrace*> seen;
seen.reserve(estimated_entries);
```
Addressed — pre-computed the sum of sizes across the chain and called seen.reserve(estimated_entries) to avoid repeated rehashing on large tables.
```java
assertInRange(test.getRecordedCounterValue("calltrace_storage_traces"), 1, 100);
assertInRange(test.getRecordedCounterValue("calltrace_storage_bytes"), 1024, 8 * 1024 * 1024);
// live counters are 0 after stop (all traces freed - correct, non-leaking behaviour)
Map<String, Long> debugCounters = profiler.getDebugCounters();
```
The comment says live counters are 0 after stopProfiler(), but the test no longer asserts anything about calltrace_storage_traces/bytes in debugCounters. Since this PR changes counter semantics, it would be good to assert they are actually 0 here (and thus verify the post-cleanup state matches the intended non-leaking behavior).
Suggested change:

```java
Map<String, Long> debugCounters = profiler.getDebugCounters();
assertEquals(0, debugCounters.get("calltrace_storage_traces"));
assertEquals(0, debugCounters.get("calltrace_storage_bytes"));
```
Addressed — added assertions for calltrace_storage_traces and calltrace_storage_bytes being 0 after getDebugCounters() to verify the non-leaking post-cleanup state.
```java
assertInRange(getRecordedCounterValue("calltrace_storage_traces"), 1, 100);
assertInRange(getRecordedCounterValue("calltrace_storage_bytes"), 1024, 8 * 1024 * 1024);
// live counters are 0 after stop (all traces freed - correct, non-leaking behaviour)
Map<String, Long> debugCounters = profiler.getDebugCounters();
```
The test now validates the recorded (pre-cleanup) counter values, but it no longer asserts the live calltrace_storage_* counters after stopProfiler(). Given the fix is about counters not accumulating/leaking, consider asserting debugCounters.get("calltrace_storage_traces") and debugCounters.get("calltrace_storage_bytes") are 0 here (consistent with the nearby comment and linear_allocator_* == 0).
Suggested change:

```java
Map<String, Long> debugCounters = profiler.getDebugCounters();
assertEquals(0, debugCounters.get("calltrace_storage_traces"));
assertEquals(0, debugCounters.get("calltrace_storage_bytes"));
```
Addressed — same fix: added assertions for calltrace_storage_traces and calltrace_storage_bytes being 0 after stopProfiler(), consistent with the nearby comment and linear_allocator checks.
- callTraceHashTable.cpp:116 — pre-reserve unordered_set to avoid rehashing
- BaseContextWallClockTest.java:204 — assert calltrace_storage_* counters are 0 after stop
- ContextCpuTest.java:114 — assert calltrace_storage_* counters are 0 after stop

Co-Authored-By: muse <muse@noreply>
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
ddprof-lib/src/main/cpp/callTraceHashTable.cpp:154
- In `clearTableOnly()`, the loop that "clears previous chain pointers" doesn't actually traverse the whole chain: once `table->setPrev(nullptr)` runs, the `for` loop's increment (`table = table->prev()`) becomes `nullptr` and the loop exits after the first iteration. `prev_table` is computed but never used. Consider iterating via the saved `prev_table` (or explicitly documenting/simplifying to only break the head link) so the code matches the intent and avoids confusion.
```cpp
// Clear previous chain pointers to prevent traversal during deallocation
for (LongHashTable *table = _table; table != nullptr; table = table->prev()) {
  LongHashTable *prev_table = table->prev();
  if (prev_table != nullptr) {
    table->setPrev(nullptr); // Clear link before deallocation
  }
}
```
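A minimal sketch of how the traversal could be fixed, using a toy `LongHashTable` that carries only the chain link (the real class has much more; `clearChainLinks` is a hypothetical name): save the next link before clearing it, then step through the saved pointer so every table in the chain is visited.

```cpp
#include <cassert>

// Toy stand-in: only the chain links, not the real hash table.
struct LongHashTable {
    LongHashTable* prev_ = nullptr;
    LongHashTable* prev() const { return prev_; }
    void setPrev(LongHashTable* p) { prev_ = p; }
};

// Save prev() BEFORE nulling it, so the walk covers the whole chain instead
// of stopping after the head (the flagged bug: the loop increment re-read
// prev() after it had already been cleared).
void clearChainLinks(LongHashTable* head) {
    for (LongHashTable* table = head; table != nullptr; ) {
        LongHashTable* prev_table = table->prev();
        table->setPrev(nullptr);   // break the link
        table = prev_table;        // advance via the saved pointer
    }
}
```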
What does this PR do?:
Fixes `calltrace_storage_bytes` and `calltrace_storage_traces` counters that grew monotonically (cumulative total ever allocated) instead of reflecting the current live memory footprint.

Motivation:
`clearTableOnly()` frees the allocator chunks but never decremented the global counters. During each `processTraces()` rotation, `putWithExistingId()` incremented the counters again. Over time (e.g. ~2 rotations/min × 20 hours) the counters accumulated the cumulative total of all trace bytes ever allocated, which was misreported as 3.35 GB of live memory when the actual footprint was tens of MB.
Additional Notes:
`clearTableOnly()` is called from both the direct `clear()` path and the deferred-free path in `processTraces()`, so a single fix covers both cases. The decrement pass uses the same slot iteration as `collect()` and runs after `waitForAllRefCountsToClear()`, which guarantees no concurrent writers, so the byte/trace counts are stable and exact.

How to test the change?:
Run the profiler for several upload cycles and verify `calltrace_storage_bytes` and `calltrace_storage_traces` stay bounded rather than growing indefinitely. The values should reflect the actual live traces in the three hash tables at any point.

For Datadog employees:
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Unsure? Have a question? Request a review!