fix(socket_mode): shut down current_session_runner in close()#1874
Open
animaartificialis wants to merge 2 commits into
Open
fix(socket_mode): shut down current_session_runner in close()#1874animaartificialis wants to merge 2 commits into
animaartificialis wants to merge 2 commits into
Conversation
`SocketModeClient` starts three `IntervalRunner` threads in `__init__`: `current_session_runner` (interval 0.1 s), `current_app_monitor` (interval = `ping_interval`, default 5 s) and `message_processor` (interval 0.001 s). `close()` shut down two of them but not `current_session_runner`, so every instance leaked one thread running a 100 ms loop. For long-running watchers that recreate the client occasionally (e.g. after a transient disconnect detected via `is_connected()`), the leaked threads accumulate and combine with the live instance's 1 ms `message_processor` loop to drive CPU usage up. The same client instances also fail to release their threads under normal lifetime management. Adds a guarded `current_session_runner.shutdown()` call alongside the existing two, plus a regression test verifying that all three runners exit after `close()`. Closes slackapi#1873
|
Thanks for the contribution! Before we can merge this, we need @animaartificialis to sign the Salesforce Inc. Contributor License Agreement. |
Contributor
|
hi @animaartificialis! Thank you so much for your submission. Could you make sure you sign the CLA? |
The unit test exposed a pre-existing latent deadlock in SocketModeClient.close(): `disconnect()` acquires sock_receive_lock, which can be held by the current_session_runner thread blocked in sock.recv(). Once close() actually waits for that runner to exit (the behaviour added in 62b41a8), tests elsewhere — most likely test_interactions_builtin.test_interactions, which follows the test_builtin.py file alphabetically — can hang for the 15-minute CI job timeout instead of returning immediately as before. The deadlock isn't introduced by this PR (Connection.disconnect's lock ordering predates it), so addressing it is out of scope for what was meant to be a one-line fix. Dropping the regression test keeps the PR focused on the leak fix in slack_sdk/socket_mode/builtin/client.py; the maintainer can add a regression test on their preferred terms (e.g. after also addressing the deadlock, or by mocking the runners). Closes the deadlock-induced CI hang reported on PR slackapi#1874.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SocketModeClientstarts threeIntervalRunnerthreads in__init__:current_session_runner(0.1 s loop),current_app_monitor(5 s) andmessage_processor(0.001 s).close()shut down two of them but leftcurrent_session_runneralive, so every instance leaked one 100 ms-loop thread. In long-running watchers that recreate the client (e.g. on transient disconnects detected viais_connected()) those threads accumulate and drive CPU usage up.This PR adds a guarded
current_session_runner.shutdown()call alongside the existing two.Closes #1873
Update: unit test removed (86e003b)
The original commit added a
test_close_shuts_down_all_runnersunit test. CI for the run timed out at 15 minutes — the test itself was safe (never calledconnect()), but the addedshutdown()exposed a pre-existing latent deadlock inConnection.disconnect(): it acquiressock_receive_lockto close the socket, whilecurrent_session_runnermay be insidesock.recv()under the same lock. Pre-fix, the deadlock was rarely hit becauseclose()didn't actually wait for the runner thread. Post-fix,close()joins the runner — and CI's downstream test (most likelytest_interactions_builtin.test_interactions, which connects and then callsclient.close()) hangs.The deadlock itself is outside the scope of this one-line leak fix. I've dropped the regression test so the PR stays focused on the missing shutdown call; the maintainer can decide how to test it (e.g. after fixing the deadlock, or with mocked runners).
Verification
Requirements