Potentially infinite loop and store trace in unowned slot in CallTraceHashTable::putWithExistingId()#578

Open
zhengyu123 wants to merge 2 commits into main from zgu/calltrace_storage
@zhengyu123 zhengyu123 commented Jun 5, 2026

What does this PR do?:
This PR fixes a corner case in CallTraceHashTable::putWithExistingId() that may result in an infinite loop and/or a trace being stored in an unowned slot.

Motivation:
Improve stability.

Additional Notes:

How to test the change?:

  • Regular CI tests
  • New test cases for this corner case.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-14915

Unsure? Have a question? Request a review!

@zhengyu123 zhengyu123 requested a review from a team as a code owner June 5, 2026 00:39
@zhengyu123 zhengyu123 marked this pull request as draft June 5, 2026 00:39

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 699ab3f1e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +485 to 487
} else {
    // No more slot, break out
    break;
P2: Record overflow when preservation exhausts slots

When the scratch table is full during liveness preservation, this new exit path just breaks out and drops the preserved trace. This can happen because putWithExistingId() never expands the scratch/standby table, while the active table can grow beyond the initial 65k slots; if more live traces are preserved than fit in scratch, the next collect() will emit only the entries that fit and no overflow sentinel because _overflow was not incremented. Please handle this the same way as put()'s full-probe path so callers can at least see the overflow trace instead of silently losing live trace continuity.
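
The behavior requested above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the actual CallTraceHashTable code: the class `FixedTable`, the method `putWithExistingKey()`, and the plain `_overflow` counter are hypothetical stand-ins, and the sketch assumes a non-zero capacity and non-zero keys. It shows a probe loop that visits each slot at most once (so it cannot spin forever) and that counts a failed insert instead of dropping it silently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity table; key 0 marks an empty slot.
// Assumes capacity > 0 and that callers never insert key 0.
class FixedTable {
public:
    explicit FixedTable(std::size_t capacity)
        : _keys(capacity, 0), _overflow(0) {}

    // Stores `key` using linear probing. Returns true if the key is stored
    // or was already present. The loop runs at most `capacity` iterations,
    // so a full table ends the probe instead of looping indefinitely.
    bool putWithExistingKey(std::uint64_t key) {
        const std::size_t capacity = _keys.size();
        std::size_t slot = key % capacity;
        for (std::size_t probes = 0; probes < capacity; ++probes) {
            if (_keys[slot] == 0) {      // free slot: claim it
                _keys[slot] = key;
                return true;
            }
            if (_keys[slot] == key) {    // already stored
                return true;
            }
            slot = (slot + 1) % capacity;
        }
        ++_overflow;                     // table full: record the loss
        return false;
    }

    // Number of inserts that failed because every slot was occupied.
    std::uint64_t overflow() const { return _overflow; }

private:
    std::vector<std::uint64_t> _keys;
    std::uint64_t _overflow;
};
```

With a capacity of 2, inserting keys 1 and 2 succeeds, inserting 3 returns false, and `overflow()` then reports 1, which a collector could use to emit an overflow marker. A concurrent table would need atomic slot claims; that is left out here.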


Contributor Author

@jbachorik Is there any reason the table is not expanded here?

dd-octo-sts Bot commented Jun 5, 2026

CI Test Results

Run: #27016835318 | Commit: 71a83dc | Duration: 15m 6s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-05 13:29:08 UTC

@zhengyu123 zhengyu123 marked this pull request as ready for review June 5, 2026 14:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53f193d0df


bool ok = done.load();
if (!ok) {
    t.detach();
P2: Avoid detaching a worker that still uses fixture state

If this timeout ever fires (the regression this test is meant to catch, or just a slow sanitizer/CI run), t.detach() lets the lambda continue executing storage->processTraces() and then write done while the test returns and TearDown() destroys storage and the stack locals captured by reference. That turns the intended timeout failure into a use-after-free/crash or a background thread corrupting later tests; use a lifetime-safe watchdog strategy instead of detaching a thread that captured fixture state.
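
One possible lifetime-safe strategy is sketched below; it is an illustration under assumptions, not the test code from this PR. The helper `runWithTimeout` and its parameters are hypothetical. The idea is that the worker receives a `std::shared_ptr` copy of everything it touches, so that state outlives the caller even if the timeout fires and the thread has to be detached. The sketch assumes `work` does not throw.

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: runs `work(*state)` on a separate thread and waits up
// to `timeout` for it to finish. Returns true if the work completed in time.
// The worker holds its own shared_ptr copy of `state`, so it never touches
// stack locals or fixture members that the caller may destroy.
template <typename State, typename Work>
bool runWithTimeout(std::shared_ptr<State> state, Work work,
                    std::chrono::milliseconds timeout) {
    std::promise<void> finished;
    std::future<void> done = finished.get_future();

    std::thread worker([state, work = std::move(work),
                        finished = std::move(finished)]() mutable {
        work(*state);          // only shared, reference-counted state is used
        finished.set_value();  // signal completion to the waiting caller
    });

    if (done.wait_for(timeout) == std::future_status::ready) {
        worker.join();         // normal path: the thread has finished
        return true;
    }
    // Timeout: the caller can report a failure. Detaching is tolerable here
    // because the lambda owns every object it still references.
    worker.detach();
    return false;
}
```

In a test, the storage object would be created with `std::make_shared` and passed as `state`, with the assertion made on the returned bool. An alternative with the same goal is to abort the whole process on timeout, which avoids leaving a background thread running at all.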


@zhengyu123 zhengyu123 requested a review from jbachorik June 5, 2026 16:58