fix(opencode): robust process exit detection for child processes#15757
fix(opencode): robust process exit detection for child processes#15757NachoFLizaur wants to merge 2 commits into
Conversation
|
The following comment was made by an LLM, it may be inaccurate: Based on my searches, I found one potentially related PR: Related PR Found:
All other search results only returned the current PR (#15757) itself. The searches did not find any open PRs that are direct duplicates of this process exit detection fix. PR #13186 is adjacent in addressing process-related issues but focuses on memory leaks and cleanup handlers rather than the specific exit detection problem. |
|
fixed description according to template |
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
f36ae38 to
9da397e
Compare
43f75bc to
e4b57cf
Compare
Port robust process exit detection from PR anomalyco#15757 to fix zombie/stuck child processes in containers where Bun fails to deliver exit events. - Add polling watchdog to bash tool and Process.spawn that detects process exit via kill(pid, 0) when event-loop events are missed - Add process registry (active map) with stale/reap exports for server-level watchdog to detect and clean up stuck bash processes - Improve Shell.killTree with alive() helper and proper SIGKILL escalation after SIGTERM timeout - Add session-level watchdog interval in prompt loop to periodically reap stale bash processes Based on the work in anomalyco#15757. Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Complete the port of PR anomalyco#15757 with remaining pieces: - Add stdio end event redundancy as third fallback for exit detection (fires when pipe file descriptors close, independent of exit events) - Add diagnostic log.info calls at spawn, abort, timeout, and each exit detection path for debugging container issues - Add comprehensive tests: defensive patterns, polling watchdog isolation, Shell.killTree, server-level watchdog (stale/reap), stdio end events, and Process.spawn defensive patterns - Skip truncation tests on Windows (matching upstream) Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Port robust process exit detection from PR anomalyco#15757 to fix zombie/stuck child processes in containers where Bun fails to deliver exit events. - Add polling watchdog to bash tool and Process.spawn that detects process exit via kill(pid, 0) when event-loop events are missed - Add process registry (active map) with stale/reap exports for server-level watchdog to detect and clean up stuck bash processes - Improve Shell.killTree with alive() helper and proper SIGKILL escalation after SIGTERM timeout - Add session-level watchdog interval in prompt loop to periodically reap stale bash processes Based on the work in anomalyco#15757. Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Complete the port of PR anomalyco#15757 with remaining pieces: - Add stdio end event redundancy as third fallback for exit detection (fires when pipe file descriptors close, independent of exit events) - Add diagnostic log.info calls at spawn, abort, timeout, and each exit detection path for debugging container issues - Add comprehensive tests: defensive patterns, polling watchdog isolation, Shell.killTree, server-level watchdog (stale/reap), stdio end events, and Process.spawn defensive patterns - Skip truncation tests on Windows (matching upstream) Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Port robust process exit detection from PR anomalyco#15757 to fix zombie/stuck child processes in containers where Bun fails to deliver exit events. - Add polling watchdog to bash tool and Process.spawn that detects process exit via kill(pid, 0) when event-loop events are missed - Add process registry (active map) with stale/reap exports for server-level watchdog to detect and clean up stuck bash processes - Improve Shell.killTree with alive() helper and proper SIGKILL escalation after SIGTERM timeout - Add session-level watchdog interval in prompt loop to periodically reap stale bash processes Based on the work in anomalyco#15757. Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Complete the port of PR anomalyco#15757 with remaining pieces: - Add stdio end event redundancy as third fallback for exit detection (fires when pipe file descriptors close, independent of exit events) - Add diagnostic log.info calls at spawn, abort, timeout, and each exit detection path for debugging container issues - Add comprehensive tests: defensive patterns, polling watchdog isolation, Shell.killTree, server-level watchdog (stale/reap), stdio end events, and Process.spawn defensive patterns - Skip truncation tests on Windows (matching upstream) Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Port robust process exit detection from PR anomalyco#15757 to fix zombie/stuck child processes in containers where Bun fails to deliver exit events. - Add polling watchdog to bash tool and Process.spawn that detects process exit via kill(pid, 0) when event-loop events are missed - Add process registry (active map) with stale/reap exports for server-level watchdog to detect and clean up stuck bash processes - Improve Shell.killTree with alive() helper and proper SIGKILL escalation after SIGTERM timeout - Add session-level watchdog interval in prompt loop to periodically reap stale bash processes Based on the work in anomalyco#15757. Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Complete the port of PR anomalyco#15757 with remaining pieces: - Add stdio end event redundancy as third fallback for exit detection (fires when pipe file descriptors close, independent of exit events) - Add diagnostic log.info calls at spawn, abort, timeout, and each exit detection path for debugging container issues - Add comprehensive tests: defensive patterns, polling watchdog isolation, Shell.killTree, server-level watchdog (stale/reap), stdio end events, and Process.spawn defensive patterns - Skip truncation tests on Windows (matching upstream) Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
09089e5 to
d603698
Compare
e3df5ee to
9bc816b
Compare
|
Automated PR Cleanup Thank you for contributing to opencode. Due to the high volume of PRs from users and AI agents, we periodically close older PRs using automated criteria so maintainers can focus review time on the most active and community-supported contributions. This PR was closed because it matched the following cleanup criteria:
PRs created within the last month are not affected by this cleanup. If you believe this PR was closed incorrectly, or if you are still actively working on it, please leave a comment explaining why it should be reopened. A maintainer can review and reopen it if appropriate. Thanks again for taking the time to contribute. |
Issue for this PR
Closes #14769
Possible related issue that might get fixed by this: #15675 (it seems related based on the description)
Type of change
What does this PR do?
The bash tool and
Process.spawnrely solely on Bun'sproc.once("exit")to detect when a child process finishes. In containerized environments, this event is not reliably delivered (due to Bun), causing the completion promise to hang forever, tool calls get permanently stuck atstatus=running.The fix removes the runtime dependency on Bun for process exit detection, replacing it with a multi-layer strategy that uses OS-level primitives:
closeevent: independent signal path fromexitin Bun's internalsproc.exitCodeandprocess.kill(pid, 0)to detect dead processes via OS-level signals, completely bypassing Bun's event systemprompt.ts): external safety net that force-completes processes stuck past their timeoutAll detection paths funnel through a single
done()function guarded by aresolvedboolean, ensuring exactly-once Promise resolution.Also improves
Shell.killTree()with analive()check before escalating from SIGTERM to SIGKILL.How did you verify your code works?
Polling watchdog isolation tests directly prove exit detection works WITHOUT exit/close events firing (simulates the exact Bun failure mode).
Regression tests verify normal bash tool execution is unaffected by the additional detection layers. Shell.killTree tests verify SIGTERM->alive check->SIGKILL escalation.
I also tested it in containerized opencode serve mode, bash tool calls now complete reliably instead of hanging.
Screenshots / recordings
No UI changes.
Checklist