Skip to content

fix(socket_mode): shut down current_session_runner in close()#1874

Open
animaartificialis wants to merge 2 commits into
slackapi:mainfrom
animaartificialis:fix/socket-mode-current-session-runner-leak
Open

fix(socket_mode): shut down current_session_runner in close()#1874
animaartificialis wants to merge 2 commits into
slackapi:mainfrom
animaartificialis:fix/socket-mode-current-session-runner-leak

Conversation

@animaartificialis
Copy link
Copy Markdown

@animaartificialis animaartificialis commented May 13, 2026

Summary

SocketModeClient starts three IntervalRunner threads in __init__: current_session_runner (0.1 s loop), current_app_monitor (5 s) and message_processor (0.001 s). close() shut down two of them but left current_session_runner alive, so every instance leaked one 100 ms-loop thread. In long-running watchers that recreate the client (e.g. on transient disconnects detected via is_connected()) those threads accumulate and drive CPU usage up.

This PR adds a guarded current_session_runner.shutdown() call alongside the existing two.

Closes #1873

Update: unit test removed (86e003b)

The original commit added a test_close_shuts_down_all_runners unit test. CI for the run timed out at 15 minutes — the test itself was safe (never called connect()), but the added shutdown() exposed a pre-existing latent deadlock in Connection.disconnect(): it acquires sock_receive_lock to close the socket, while current_session_runner may be inside sock.recv() under the same lock. Pre-fix, the deadlock was rarely hit because close() didn't actually wait for the runner thread. Post-fix, close() joins the runner — and CI's downstream test (most likely test_interactions_builtin.test_interactions, which connects and then calls client.close()) hangs.

The deadlock itself is outside the scope of this one-line leak fix. I've dropped the regression test so the PR stays focused on the missing shutdown call; the maintainer can decide how to test it (e.g. after fixing the deadlock, or with mocked runners).

Verification

import threading, time
from slack_sdk.socket_mode.builtin.client import SocketModeClient

baseline = threading.active_count()
for _ in range(10):
    SocketModeClient(app_token="xapp-fake").close()
time.sleep(0.3)
# Before this patch: delta = 10. After: delta = 0.
print("delta =", threading.active_count() - baseline)

Requirements

  • Tests added (originally yes; removed per CI timeout — see 86e003b)
  • CLA — pending salesforce-cla bot status check

`SocketModeClient` starts three `IntervalRunner` threads in
`__init__`: `current_session_runner` (interval 0.1 s),
`current_app_monitor` (interval = `ping_interval`, default 5 s) and
`message_processor` (interval 0.001 s). `close()` shut down two of
them but not `current_session_runner`, so every instance leaked one
thread running a 100 ms loop.

For long-running watchers that recreate the client occasionally (e.g.
after a transient disconnect detected via `is_connected()`), the
leaked threads accumulate and combine with the live instance's 1 ms
`message_processor` loop to drive CPU usage up. The same client
instances also fail to release their threads under normal lifetime
management.

Adds a guarded `current_session_runner.shutdown()` call alongside the
existing two, plus a regression test verifying that all three runners
exit after `close()`.

Closes slackapi#1873
@animaartificialis animaartificialis requested a review from a team as a code owner May 13, 2026 18:15
@salesforce-cla
Copy link
Copy Markdown

Thanks for the contribution! Before we can merge this, we need @animaartificialis to sign the Salesforce Inc. Contributor License Agreement.

@hello-ashleyintech
Copy link
Copy Markdown
Contributor

hi @animaartificialis! Thank you so much for your submission. Could you make sure you sign the CLA?

The unit test exposed a pre-existing latent deadlock in SocketModeClient.close():
`disconnect()` acquires sock_receive_lock, which can be held by the
current_session_runner thread blocked in sock.recv(). Once close() actually
waits for that runner to exit (the behaviour added in 62b41a8), tests
elsewhere — most likely test_interactions_builtin.test_interactions, which
follows the test_builtin.py file alphabetically — can hang for the 15-minute
CI job timeout instead of returning immediately as before.

The deadlock isn't introduced by this PR (Connection.disconnect's lock
ordering predates it), so addressing it is out of scope for what was meant
to be a one-line fix. Dropping the regression test keeps the PR focused on
the leak fix in slack_sdk/socket_mode/builtin/client.py; the maintainer
can add a regression test on their preferred terms (e.g. after also
addressing the deadlock, or by mocking the runners).

Closes the deadlock-induced CI hang reported on PR slackapi#1874.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SocketModeClient.close() leaks current_session_runner thread (built-in client)

2 participants