Mark check raw file injection as @Flaky on Zulu 8 #11375

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 4 commits into master from brian.marks/flaky-zulu8-log-injection on May 15, 2026

@bm1549 bm1549 commented May 15, 2026

What Does This Do

Adds JavaVirtualMachine.isZulu8() and extends the @Flaky condition on LogInjectionSmokeTest.check raw file injection to include Zulu 8 alongside IBM 8 and Oracle JDK 8.
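
As a rough illustration of what a vendor-plus-version check like this can look like, here is a minimal sketch. It is not the actual `JavaVirtualMachine.isZulu8()` implementation, which this description does not show. The class name `ZuluDetectionSketch`, the helper signature, and the assumption that Zulu builds report a vendor string containing "Azul" are all hypothetical.

```java
import java.util.Locale;

public final class ZuluDetectionSketch {
  private ZuluDetectionSketch() {}

  /**
   * Hypothetical helper: returns true when the vendor string looks like Azul
   * and the version string is a Java 8 release (for example "1.8.0_482").
   */
  public static boolean isZulu8(String vendor, String version) {
    if (vendor == null || version == null) {
      return false;
    }
    // Assumption: Zulu builds report a vendor containing "Azul".
    boolean azulVendor = vendor.toLowerCase(Locale.ROOT).contains("azul");
    // Java 8 uses the legacy "1.8." version prefix.
    return azulVendor && version.startsWith("1.8.");
  }

  public static void main(String[] args) {
    // Evaluate the check against the JVM running this sketch.
    String vendor = System.getProperty("java.vendor");
    String version = System.getProperty("java.version");
    System.out.println("Zulu 8? " + isZulu8(vendor, version));
  }
}
```

Taking the vendor and version as parameters keeps the check testable without running on a specific JVM.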

Motivation

CI Visibility data for the last 30 days shows 23 failures of check raw file injection on master, all of them on Zulu 8 (1.8.0_482 and 1.8.0_492). Failure mode: the assertion logLines.size() == 7 fails with an actual size of 3. The failures are spread evenly across the three JUL→Log4j2 backend variants (JULInterfaceLog4j2{Backend,BackendNoTags,LatestBackend}).

Root cause analysis: a JDK 8 race between java.util.logging.LogManager.<clinit> and ClassLoader.initSystemClassLoader() when something during agent premain triggers LogManager class loading. On Zulu 8 the trigger is the backported JFR's use of java.util.logging from premain code paths. The agent's existing waitForJUL guard at Agent.java:382-396 mitigates this most of the time, but leaks at ~0.25% — when the leak fires, the user's -Djava.util.logging.manager property is silently ignored and JUL falls back to the default LogManager, so subsequent Logger.info(...) calls don't reach Log4j2's appenders.
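
The symptom described above (the `-Djava.util.logging.manager` property being silently ignored) can be observed from application code by comparing the requested manager class with the one JUL actually installed. The following is a minimal sketch, not code from this PR; the class name `JulManagerCheck` and its methods are hypothetical, and only standard `java.util.logging` calls are used.

```java
import java.util.logging.LogManager;

public final class JulManagerCheck {
  private JulManagerCheck() {}

  /**
   * Returns true when no custom manager was requested, or when the installed
   * manager class is exactly the requested one.
   */
  public static boolean matches(String requestedClassName, String actualClassName) {
    return requestedClassName == null || requestedClassName.equals(actualClassName);
  }

  /** Checks the running JVM; note that this call itself initializes LogManager. */
  public static boolean configuredManagerHonored() {
    String requested = System.getProperty("java.util.logging.manager");
    String actual = LogManager.getLogManager().getClass().getName();
    return matches(requested, actual);
  }

  public static void main(String[] args) {
    System.out.println("Requested: " + System.getProperty("java.util.logging.manager"));
    System.out.println("Installed: " + LogManager.getLogManager().getClass().getName());
    System.out.println("Honored:   " + configuredManagerHonored());
  }
}
```

When the race fires, a JVM started with the Log4j2 JUL bridge configured would be expected to report `java.util.logging.LogManager` as the installed class instead of the requested one.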

The same failure mode already justifies @Flaky on Oracle JDK 8 (same JFR backport) and IBM 8 (where OkHttp transitively loads IBMSASL → JUL); Zulu 8 belongs alongside them.

Additional Notes

  • This suppresses the flaky CI signal but does NOT fix the underlying agent race. Real-world Zulu 8 users running with a custom LogManager (Log4j2 JUL bridge, JBoss LogManager, etc.) plus dd-java-agent still experience the same intermittent JUL routing loss. A proper fix in Agent.java (e.g. pre-initializing LogManager early in premain, or fully deferring OkHttp/network init out of premain) should follow in a separate change.
  • Local repro attempts (1000 iter Mac arm64) and CI repro attempts (~140 gradle iter + 300 direct-JVM iter) did NOT capture the failure — bug rate is too low to reliably hit. Diagnosis is from CI Visibility statistical data + agent code analysis (Agent.java:376-379 explicitly documents this race) + JDK 8 source review of LogManager.<clinit> and ClassLoader.initSystemClassLoader().
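
To put the failed repro attempts in perspective, the chance of seeing zero failures in a batch of runs can be estimated as (1 - p)^n. The sketch below is illustrative only: it assumes every iteration fails independently at the ~0.25% rate observed in CI, which may well not hold on different hardware such as a Mac arm64 laptop. The class name `ReproOdds` is hypothetical.

```java
public final class ReproOdds {
  private ReproOdds() {}

  /** Probability that none of {@code iterations} independent runs fails. */
  public static double probabilityOfNoFailure(double failureRate, int iterations) {
    return Math.pow(1.0 - failureRate, iterations);
  }

  public static void main(String[] args) {
    double rate = 0.0025; // assumed per-run failure rate (~0.25%, taken from the CI data)
    int[] batches = {1000, 140, 300}; // iteration counts quoted in the repro attempts
    for (int n : batches) {
      System.out.printf("n=%d -> P(no failure)=%.3f%n", n, probabilityOfNoFailure(rate, n));
    }
  }
}
```

Under these assumptions a clean batch of 1000 iterations still happens roughly 8% of the time, and the smaller CI batches are clean far more often, so the missing repro does not rule out the race.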

Contributor Checklist

  • Format the title according to the contribution guidelines
  • Assign the type: and (comp: or inst:) labels — type: bug, comp: logging, tag: ai generated, tag: no release notes
  • Avoid using close, fix, or any linking keywords when referencing an issue
  • Update the CODEOWNERS file on source file addition, migration, or deletion — N/A (no file additions)
  • Update public documentation with any new configuration flags or behaviors — N/A (test-only change)

Jira ticket: N/A

The smoketest fails ~0.25% of the time on Zulu 8 with
`logLines.size() == 7` (got 3) — root cause is a JDK 8 race between
java.util.logging.LogManager.<clinit> and ClassLoader.initSystemClassLoader()
when a JFR-instrumented class (Zulu 8 backports JFR) loads JUL during
agent premain. The Agent.java waitForJUL guard mitigates most cases but
leaks ~0.25%. The same failure mode already justifies @Flaky on Oracle
JDK 8 (same JFR backport) and IBM 8 (IBMSASL triggers the same race);
Zulu 8 belongs alongside them.

This adds JavaVirtualMachine.isZulu8() + a test + extends the @Flaky
condition. No production code path changes — Zulu 8 users on a custom
LogManager (e.g. Log4j2 JUL bridge) are still potentially affected by
the underlying agent race; this only suppresses the flaky CI signal.
@bm1549 bm1549 added the type: bug, tag: no release notes, comp: logging, and tag: ai generated labels May 15, 2026
@bm1549 bm1549 marked this pull request as ready for review May 15, 2026 17:22
@bm1549 bm1549 requested review from a team as code owners May 15, 2026 17:22
@bm1549 bm1549 requested review from amarziali and mtoffl01 and removed request for a team May 15, 2026 17:22
@bm1549 bm1549 added this pull request to the merge queue May 15, 2026

dd-octo-sts Bot commented May 15, 2026

/merge

gh-worker-devflow-routing-ef8351 Bot commented May 15, 2026

2026-05-15 20:00:06 UTC ℹ️ Start processing command /merge


2026-05-15 20:00:11 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).


2026-05-15 21:13:48 UTC ℹ️ MergeQueue: This merge request was merged

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 15, 2026
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 19ee457 into master May 15, 2026
574 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the brian.marks/flaky-zulu8-log-injection branch May 15, 2026 21:13
@github-actions github-actions Bot added this to the 1.63.0 milestone May 15, 2026
Labels

  • comp: logging (Tracer internal logging)
  • tag: ai generated (Largely based on code generated by an AI or LLM)
  • tag: no release notes (Changes to exclude from release notes)
  • type: bug (Bug report and fix)


2 participants