fix(concurrent): resolve ThreadPoolExecutor shutdown deadlock in signal handlers #151482
Synteri wants to merge 1 commit into
The following commit authors need to sign the Contributor License Agreement:
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates ThreadPoolExecutor shutdown/submission behavior to better handle shutdown races and avoid potential deadlocks by introducing reentrancy-aware shutdown logic.
Changes:
- Switch `_shutdown_lock` from `Lock` to `RLock` to allow re-entrant shutdown paths.
- Add a post-enqueue shutdown check in `submit()` that cancels/raises if shutdown has begun.
- Detect "reentrant" shutdown and skip synchronous thread joining in that case.
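The difference between the two lock types in the first change can be illustrated with a small sketch. The helper name `can_reacquire` is hypothetical, and the second acquire is non-blocking so that the non-reentrant case reports a failure instead of hanging the way a blocking acquire inside a signal handler would.

```python
import threading


def can_reacquire(lock):
    """Return True if the thread already holding `lock` can acquire it again."""
    with lock:
        # Non-blocking attempt: a plain Lock refuses here, whereas a blocking
        # acquire on the same thread would never return.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it


plain_result = can_reacquire(threading.Lock())
reentrant_result = can_reacquire(threading.RLock())
print(plain_result, reentrant_result)  # prints: False True
```

This only shows why the lock acquisition itself stops deadlocking with `RLock`; it says nothing about thread joining or queued tasks, which the other two changes address.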
    self._work_queue.put(w)
    if self._shutdown or _shutdown:
        f.cancel()
        raise RuntimeError('cannot schedule new futures after shutdown')
    self._adjust_thread_count()

    # Detect if we are called reentrantly (e.g. from a signal handler on a thread
    # already holding self._shutdown_lock)
    reentrant = self._shutdown_lock._is_owned()

    if wait and not reentrant:
        for t in self._threads:
            t.join()

    self._work_queue.put(None)
    if wait:

    # If we are reentrant, we cannot join threads synchronously because the current
e9ab197 to 10c03d3
Description
This PR resolves a deadlock that occurs when a `ThreadPoolExecutor` is shut down from within an OS signal handler (such as `SIGTERM` or `SIGINT`) while the main thread is interrupted inside `submit()`, which holds the executor's internal `_shutdown_lock`.

Root Cause
Because `_shutdown_lock` is a standard, non-reentrant `threading.Lock()`, a signal handler that runs synchronously on the main thread deadlocks when it tries to acquire the lock a second time.

Simply changing the lock to an `RLock()` is insufficient because it introduces correctness and task leakage issues:
- If `shutdown(wait=True)` is called reentrantly, it blocks the main thread to join worker threads, which can deadlock if those worker threads are waiting for the GIL.
- If `shutdown()` executes and terminates the worker threads, then when the signal handler exits, `submit()` resumes and places a task on the queue. This task leaks and hangs indefinitely because all worker threads are already dead.

Solution
This patch implements a safe, reentrancy-aware shutdown mechanism:
- `_shutdown_lock` changed to `threading.RLock()`: Prevents self-deadlocks on lock acquisition when executing inside signal handlers.
- `shutdown()`: Uses `self._shutdown_lock._is_owned()` to check whether the lock is already owned by the calling thread. If reentrant, it skips the synchronous `t.join()` loop to avoid blocking the interrupted thread. (Includes a safety fallback if the Python runtime does not implement `_is_owned()`.)
- `submit()`: If `self._shutdown` is set to `True` reentrantly (detected at the end of the `submit()` critical section), we cancel the future, set `w.task = None` to clear references to the user task's positional/keyword arguments (avoiding memory leaks), and raise a `RuntimeError`.

Verification
We verified the fix by simulating tight-loop signal delivery (`SIGINT`) during execution of a high-throughput `submit` loop.
- Without the fix: the signal handler deadlocks on `_shutdown_lock` and hangs the process.
- With the fix: a `RuntimeError` is raised, and the process terminates successfully with exit code `0`.
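The reentrancy-aware flow described above can be sketched on a toy pool. This is not the patched `ThreadPoolExecutor`: the `TinyPool` class, its `interrupt` parameter, and the `skipped_join` flag are hypothetical names used only for illustration, and the signal handler is simulated by a direct call made while `submit()` still holds the lock. The sketch assumes a runtime (such as CPython) where `RLock` exposes the private `_is_owned()` method.

```python
import queue
import threading
from concurrent.futures import Future


class TinyPool:
    """Hypothetical one-worker pool for illustration; not ThreadPoolExecutor."""

    def __init__(self):
        self._shutdown_lock = threading.RLock()
        self._shutdown = False
        self._work_queue = queue.SimpleQueue()
        self.skipped_join = False  # set when shutdown() takes the reentrant path
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._work_queue.get()
            if item is None:  # sentinel: stop the worker
                return
            future, fn = item
            if future.set_running_or_notify_cancel():  # skips cancelled futures
                future.set_result(fn())

    def _called_reentrantly(self):
        # RLock._is_owned() is private; assume "not reentrant" when it is missing.
        is_owned = getattr(self._shutdown_lock, "_is_owned", None)
        return bool(is_owned()) if callable(is_owned) else False

    def submit(self, fn, interrupt=None):
        with self._shutdown_lock:
            if self._shutdown:
                raise RuntimeError("cannot schedule new futures after shutdown")
            future = Future()
            self._work_queue.put((future, fn))
            if interrupt is not None:
                interrupt()  # stand-in for a signal handler firing mid-submit
            if self._shutdown:  # post-enqueue check
                future.cancel()
                raise RuntimeError("cannot schedule new futures after shutdown")
            return future

    def shutdown(self, wait=True):
        reentrant = self._called_reentrantly()  # checked before taking the lock
        with self._shutdown_lock:
            self._shutdown = True
            self._work_queue.put(None)
        if wait:
            if reentrant:
                self.skipped_join = True  # joining here would block the interrupted thread
            else:
                self._worker.join()


pool = TinyPool()
try:
    pool.submit(lambda: 1, interrupt=lambda: pool.shutdown(wait=True))
    outcome = "submitted"
except RuntimeError:
    outcome = "rejected"
print(outcome, pool.skipped_join)  # prints: rejected True
```

A real signal handler would reach `shutdown()` asynchronously rather than through an explicit callback, so this sketch exercises only the control flow (reentrancy detection, skipped join, post-enqueue rejection), not the signal timing that the PR's verification targets.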