JAVA-6155 add thread dumps to evergreen logs on failed test cases#1932
JAVA-6155 add thread dumps to evergreen logs on failed test cases#1932strogiyotec wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a JUnit 5 TestWatcher extension that emits a full JVM thread dump to the test logs whenever a test fails, making deadlocks and hung-thread scenarios easier to diagnose in Evergreen and locally.
Changes:
- Add
ThreadDumpOnFailureExtension(JUnit JupiterTestWatcher) that logs a thread dump on test failure. - Register the extension via
META-INF/services/...and enable JUnit Jupiter extension autodetection viajunit-platform.properties.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| driver-core/src/test/unit/com/mongodb/internal/diagnostics/ThreadDumpOnFailureExtension.java | Adds the failure hook and thread-dump collection logic. |
| driver-core/src/test/resources/META-INF/services/org.junit.jupiter.api.extension.Extension | Registers the extension for ServiceLoader-based autodetection. |
| driver-core/src/test/resources/junit-platform.properties | Enables JUnit Jupiter extension autodetection so the extension is picked up automatically. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public void testFailed(final ExtensionContext context, final Throwable cause) { | ||
| String testName = context.getDisplayName(); | ||
| String threadDump = getAllThreadsDump(); | ||
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); |
There was a problem hiding this comment.
testFailed ignores the provided cause, so the log entry won’t include the exception/stack trace that triggered the failure. Logging the throwable (e.g., using the error(String, Throwable) overload) would keep the failure reason adjacent to the thread dump in Evergreen/local logs.
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); | |
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump, cause); |
There was a problem hiding this comment.
the cause will already be provided on assertion fail I don't think duplicating it for thread dumps makes sense
| public void testFailed(final ExtensionContext context, final Throwable cause) { | ||
| String testName = context.getDisplayName(); | ||
| String threadDump = getAllThreadsDump(); | ||
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); | ||
| } | ||
|
|
||
| private static String getAllThreadsDump() { | ||
| final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean(); | ||
| ThreadInfo[] threadInfos = threadMXBean.dumpAllThreads( | ||
| threadMXBean.isObjectMonitorUsageSupported(), | ||
| threadMXBean.isSynchronizerUsageSupported() | ||
| ); |
There was a problem hiding this comment.
getAllThreadsDump() (via ThreadMXBean.dumpAllThreads) can throw at runtime (e.g., SecurityException). If this happens inside testFailed, the extension itself may error and you’ll lose the diagnostic output. Consider wrapping thread-dump generation in a try/catch and logging a clear fallback message so the original test failure reporting isn’t impacted.
There was a problem hiding this comment.
good point, wrapped it with try catch
| */ | ||
| public final class ThreadDumpOnFailureExtension implements TestWatcher { | ||
|
|
||
| private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getName()); |
There was a problem hiding this comment.
Logger naming is inconsistent with other driver-core tests: most use a short suffix like ClassName.class.getSimpleName() (e.g., ConnectionPoolTest / ConnectionPoolAsyncTest). Passing class.getName() here yields a very long logger name (org.mongodb.driver.com.mongodb...) that’s harder to filter/read in logs; consider switching to getSimpleName() or a conventional test logger suffix.
| private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getName()); | |
| private static final Logger LOGGER = Loggers.getLogger(ThreadDumpOnFailureExtension.class.getSimpleName()); |
There was a problem hiding this comment.
replaced to use simple name
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import java.lang.management.ManagementFactory; | ||
| import java.lang.management.ThreadInfo; | ||
| import java.lang.management.ThreadMXBean; | ||
|
|
||
|
|
||
| import com.mongodb.internal.diagnostics.logging.Logger; | ||
| import com.mongodb.internal.diagnostics.logging.Loggers; | ||
|
|
||
| import org.junit.jupiter.api.extension.ExtensionContext; | ||
| import org.junit.jupiter.api.extension.TestWatcher; | ||
|
|
||
|
|
There was a problem hiding this comment.
Import order/grouping here is inconsistent with the repo’s established convention (typically project com.mongodb.* imports first, then org.*, then java.*, e.g. driver-core/src/test/unit/com/mongodb/internal/connection/ConnectionPoolTest.java:19-28). Reorder the imports to match the surrounding codebase style to keep diffs consistent and reduce merge noise.
| import java.lang.management.ManagementFactory; | |
| import java.lang.management.ThreadInfo; | |
| import java.lang.management.ThreadMXBean; | |
| import com.mongodb.internal.diagnostics.logging.Logger; | |
| import com.mongodb.internal.diagnostics.logging.Loggers; | |
| import org.junit.jupiter.api.extension.ExtensionContext; | |
| import org.junit.jupiter.api.extension.TestWatcher; | |
| import com.mongodb.internal.diagnostics.logging.Logger; | |
| import com.mongodb.internal.diagnostics.logging.Loggers; | |
| import org.junit.jupiter.api.extension.ExtensionContext; | |
| import org.junit.jupiter.api.extension.TestWatcher; | |
| import java.lang.management.ManagementFactory; | |
| import java.lang.management.ThreadInfo; | |
| import java.lang.management.ThreadMXBean; |
| public void testFailed(final ExtensionContext context, final Throwable cause) { | ||
| String testName = context.getDisplayName(); | ||
| String threadDump = getAllThreadsDump(); | ||
| LOGGER.error("Test failed: " + testName + "\nThread dump:\n" + threadDump); | ||
| } |
There was a problem hiding this comment.
This logs a full JVM thread dump on every failed test. When failures cascade (e.g., environment outage causing many tests to fail), this can produce very large Evergreen logs and make the original failures harder to find. Consider throttling (e.g., only dump once per JVM via an AtomicBoolean, or once per test class) and/or adding a clear header/footer delimiter so each dump is easy to locate in the log stream.
| @@ -0,0 +1 @@ | |||
| junit.jupiter.extensions.autodetection.enabled=true | |||
There was a problem hiding this comment.
junit-platform.properties is included in driver-core’s testArtifacts JAR (see buildSrc/src/main/kotlin/conventions/test-artifacts.gradle.kts:26-29), and several other modules depend on :driver-core:testArtifacts for their tests (e.g., driver-sync/build.gradle.kts:37-39). That means this setting may unintentionally enable JUnit Jupiter extension autodetection beyond driver-core depending on classpath resource resolution. If the intent is to scope this behavior to a specific module/test task, consider enabling it via the Gradle Test task/system property instead, or placing a shared properties file in a deliberately shared test-resources location.
| junit.jupiter.extensions.autodetection.enabled=true | |
| # NOTE: | |
| # This file is included in driver-core's testArtifacts JAR and is therefore | |
| # visible on the test classpath of other modules that depend on :driver-core:testArtifacts. | |
| # Enabling JUnit Jupiter extension autodetection here would unintentionally | |
| # turn it on for those modules as well. | |
| # | |
| # If you need autodetection, configure it in the specific module or Gradle | |
| # Test task instead (e.g. via system property junit.jupiter.extensions.autodetection.enabled=true), | |
| # or in a module-local junit-platform.properties that is not shared via testArtifacts. | |
| # | |
| # Example (do NOT uncomment here unless you really want it shared globally): | |
| # junit.jupiter.extensions.autodetection.enabled=true |
There was a problem hiding this comment.
AI review (Opus 3.8 1M)
driver-core/src/test/resources/junit-platform.properties
- 🟡
[important]L1: Autodetection leaks to downstream modules viatestArtifacts.
Verified:buildSrc/.../test-artifacts.gradle.ktsbuilds the test JAR withfrom(sourceSets.test.get().output), which bundles processed test resources (including this file and theMETA-INF/services
registration) alongside compiled test classes.driver-sync/build.gradle.kts:38(and other modules) declaretestImplementation(project(path = ":driver-core", configuration = "testArtifacts")). Net effect:
junit.jupiter.extensions.autodetection.enabled=trueand theThreadDumpOnFailureExtensionget activated for every consuming module's test run, not just driver-core. Two risks: (a) unintended global
activation the author may not have planned for; (b) if any consuming module ships its ownjunit-platform.properties, JUnit Platform's classpath resolution of a singlejunit-platform.propertiesbecomes
order-dependent/non-deterministic. If global activation is actually desired, that's defensible — but it should be a deliberate, documented decision. Otherwise scope it via the GradleTesttask system
property (systemProperty("junit.jupiter.extensions.autodetection.enabled", true)) in the relevant module(s).
driver-core/src/test/unit/com/mongodb/internal/diagnostics/ThreadDumpOnFailureExtension.java
-
🟡
[important]L44-54:ThreadInfo.toString()truncates each stack to 8 frames.
The JDK'sThreadInfo.toString()is hardcoded to emit at mostMAX_FRAMES = 8stack frames per thread, then prints.... Given the stated motivation (diagnosing a deadlock), 8 frames is often too shallow
to see where threads are actually blocked. For a complete dump, iterate frames yourself, e.g. build the header fromThreadInfoand append the fullinfo.getStackTrace(). The lock/monitor info you correctly
enabled (below) also only renders for frames within that 8-frame window when usingtoString(). -
🟢
[nit]L46-49: Captures dumps only for tests that fail, not tests that hang.
TestWatcher.testFailedfires only when a test produces a failure. A true deadlock that hangs until the Gradle/JVM test timeout kills the process never reachestestFailed, so no dump is produced for that
case. Worth being aware of relative to the deadlock-diagnosis goal; a JVM-level/@Timeout-driven approach would be needed to cover hangs. Not blocking for the failure-path use case. -
🟢
[nit]L20-31: Import grouping and double blank lines.
Imports are groupedjava.*first thencom.mongodb.*/org.*, with doubled blank lines (L21-22, L29-30). I checkedconfig/checkstyle/checkstyle.xml— onlyRedundantImport/UnusedImportsare enforced,
so this will not fail CI. Purely cosmetic; tidy up the consecutive blank lines if touching the file. -
🟢
[nit]L48-52: No delimiter between concatenated thread entries.
sb.append(info)relies on eachThreadInfo.toString()trailing newline. A clear header/footer (e.g.===== THREAD DUMP (test: X) =====) would make dumps far easier to locate and bound in large Evergreen
logs. -
🎉
[praise]L46-49: Good defensivedumpAllThreadscall — passingisObjectMonitorUsageSupported()/isSynchronizerUsageSupported()captures lock-ownership info, which is exactly what's useful for
deadlock analysis. Nicely done, and theSecurityExceptionfallback keeps the extension from masking the real test failure.
Prior Review Comments
- 🎉
[praise]Extension.java:50 — Wrap thread-dump generation in try/catch. Addressed — current code catchesSecurityExceptionand returns a fallback message. - 🎉
[praise]Extension.java:36 — Logger naming should use a short suffix. Addressed — nowLoggers.getLogger(ThreadDumpOnFailureExtension.class.getSimpleName()). Confirmed against
Loggers.getLogger(prependsorg.mongodb.driver.), sogetSimpleName()is the right convention. - 💡
[suggestion]Extension.java:42 — Log thecausetoo. Disputed by the author ("the cause will already be provided on assertion fail"). Reasonable: JUnit already reports the failure cause
independently, so this is largely redundant. Minor counterpoint — logging the cause keeps it adjacent to the dump in the same log line for correlation — but not worth blocking. - 🟡
[important]junit-platform.properties:1 — testArtifacts JAR propagates this setting to other modules. Unaddressed. Merged with the L1 finding above; I independently verified the build wiring
confirms the concern. - 🟢
[nit]Extension.java:43 — Full dump on every failed test can flood logs on cascading failures. Unaddressed. Valid for failure storms; consider throttling (once per JVM viaAtomicBoolean)
and/or the delimiter suggested above. Low priority for a diagnostic util. - 🟢
[nit]Extension.java:30 — Import order inconsistent with repo convention. Unaddressed. Cosmetic only (not Checkstyle-enforced) — see nit above.
Decision:
junit-platform.propertiespropagation — confirm whether enabling autodetection + the extension across all:driver-coretestArtifactsconsumers (driver-sync, etc.) is intended. If yes, add a
comment in the properties file documenting it. If no, scope it via a per-module GradleTestsystem property instead.- 8-frame truncation — replace
ThreadInfo.toString()with explicit iteration overinfo.getStackTrace()so deep deadlock stacks aren't cut off at 8 frames.
AI review copilot (GPT 5.5)
Medium — driver-core/src/test/resources/junit-platform.properties:1Placing junit-platform.properties under driver-core/src/test/resources means it is packaged into driver-core:testArtifacts, which is consumed by driver-sync, driver-reactive-streams, driver-scala, anddriver-legacy. Those modules will unintentionally inherit JUnit extension autodetection and load ThreadDumpOnFailureExtension for all their test failures.
That changes behavior outside driver-core and could produce very large logs during broad/cascading failures. Suggested fix: avoid classpath-wide autodetection from shared test artifacts, or configure it per module via Gradle/system property if global behavior is intended.
JAVA-6155
This PR adds thread dumps whenever test fails
The motivation behind this PR was a failing test case
poolClearedExceptionMustBeRetryableClaude identified a deadlock that I wasn't able to reproduce locally
With this PR develop could easily see thread dumps of all the threads directly in evergreen and locally
Here is an example of a patch that deliberately makes a test case fail and it's output