gh-151475: Fix data race in faulthandler watchdog on free-threaded builds #152123
Open
timurmamedov1 wants to merge 1 commit into
…ded builds

Add a PyMutex to serialize dump_traceback_later() and cancel_dump_traceback_later() calls. Without this, concurrent arm/cancel from multiple threads corrupts the cancel_event/running lock handshake, causing an abort from unlocking an unheld lock.
1923738 to 5014d9f
`dump_traceback_later()` and `cancel_dump_traceback_later()` mutate the watchdog state in `_PyRuntime.faulthandler.thread` without synchronization. On free-threaded builds, concurrent calls corrupt the `cancel_event`/`running` lock handshake, since two threads can both allocate `cancel_event` (leaking one lock), or release a lock they don't hold, causing a fatal abort.

Fix by adding a `PyMutex` to `_faulthandler_runtime_state` and acquiring it in both `dump_traceback_later` and `cancel_dump_traceback_later` entry points. The mutex is acquired after argument validation (which may call into Python) but before any shared state access.

These are not hot paths and none run in signal-handler context, so a simple mutex is sufficient.
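For reviewers who want the locking order spelled out, here is a toy Python model of the pattern described above. It is not the patch and does not mirror CPython's real data structures: `WatchdogState`, `arm`, `cancel`, and `stress` are hypothetical names, `threading.Lock` stands in for `PyMutex`, and the `armed` flag is a simplification of the real `cancel_event`/`running` handshake. It only illustrates validating arguments first, then taking one mutex around every access to the shared state.

```python
import threading


class WatchdogState:
    """Toy stand-in for the watchdog state (hypothetical, not CPython's struct)."""

    def __init__(self):
        self.mutex = threading.Lock()  # plays the role of the new PyMutex
        self.cancel_event = None       # lazily allocated on first arm()
        self.allocations = 0           # how many cancel_event locks were created
        self.armed = False             # invariant: armed <=> cancel_event is held

    def arm(self, timeout):
        # Argument validation runs before the mutex is taken, because in the
        # real code validation may call back into Python.
        if timeout <= 0:
            raise ValueError("timeout must be greater than 0")
        with self.mutex:
            # Lazy allocation is safe only because the mutex is held;
            # unguarded, two threads could both allocate and leak one lock.
            if self.cancel_event is None:
                self.cancel_event = threading.Lock()
                self.allocations += 1
            self._cancel_locked()        # cancel any previous watchdog
            self.cancel_event.acquire()  # never blocks: just released or fresh
            self.armed = True

    def cancel(self):
        with self.mutex:
            self._cancel_locked()

    def _cancel_locked(self):
        # Caller must hold self.mutex. Releasing only when armed avoids
        # unlocking a lock that is not held (RuntimeError in this model).
        if self.armed:
            self.cancel_event.release()
            self.armed = False


def stress(state, n_threads=8, rounds=200):
    """Hammer arm()/cancel() from several threads and collect any exceptions."""
    errors = []

    def worker():
        try:
            for _ in range(rounds):
                state.arm(timeout=1.0)
                state.cancel()
        except Exception as exc:  # e.g. RuntimeError: release unlocked lock
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    state = WatchdogState()
    print("errors:", stress(state), "allocations:", state.allocations)
```

In this model, dropping the `with self.mutex:` blocks reintroduces both failure modes named in the description: a duplicate allocation of `cancel_event`, and a release of a lock that is not held.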
faulthandler: data races in `enable()`/`disable()` and `dump_traceback_later()` under free threading #151475