fix: improve background process handling for agent tools#23132
Merged
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
Models frequently use shell '&' instead of run_in_background=true, causing processes to fork and exit immediately as untracked orphans. This stems from two issues: no guidance in tool descriptions, and no process group isolation in the agent process manager. Changes: 1. Add description tags to ExecuteArgs so the model sees explicit guidance on when to use run_in_background and why shell '&' is broken. 2. Update the execute tool description to reinforce this guidance. 3. Detect trailing '&' in commands and auto-promote to background mode, stripping the '&'. This is defense-in-depth for when the model ignores the description. 4. Add process group isolation (Setpgid) to spawned processes via platform-specific files (proc_other.go / proc_windows.go), matching what the SSH subsystem already does with Setsid. 5. Signal the process group (-pid) instead of just the PID when delivering kill/terminate signals, ensuring child processes (e.g. from shell pipelines) are also cleaned up.
5d9769f to
edb20e6
Compare
sreya
approved these changes
Mar 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Models frequently use shell
&instead ofrun_in_background=truewhen starting long-running processes through/agents, causing them to die shortly after starting. This happens because:ExecuteArgsstruct had zerodescriptiontags. The model sawrun_in_background: boolean (optional)with no explanation of when/why to use it.&is silently broken —sh -c "command &"forks the process, the shell exits immediately, and the forked child becomes an orphan not tracked by the process manager.Setsid: trueon spawned processes, but the agent process manager set noSysProcAttrat all. Signals only hit the top-levelsh, not child processes.Investigation
Compared our implementation against openai/codex and coder/mux:
session_idrun_in_backgroundwith rich descriptionrun_in_backgroundwith no description&handlingsetsid()+killpg()detached: true+killProcessTree()setsid()on every spawnset -m; nohup ... setsidfor backgroundSysProcAttrat allkillpg(pgid, sig)— entire groupkill -15 -\$pid— negative PIDproc.cmd.Process.Signal()— PID onlyChanges
Fix 1: Add descriptions to
ExecuteArgs(highest impact)The model now sees explicit guidance: "Use for long-running processes like dev servers, file watchers, or builds. Do NOT use shell & — it will not work correctly."
Fix 2: Update tool description
The top-level execute tool description now reinforces: "Use run_in_background=true for long-running processes. Never use shell '&' for backgrounding."
Fix 3: Detect trailing
&and auto-promote to backgroundDefense-in-depth: if the model still uses
command &, we strip the&and promote torun_in_background=trueautomatically. Correctly distinguishes&from&&.Fix 4: Process group isolation (
Setpgid)New platform-specific files (
proc_other.go/proc_windows.go) following the same pattern asagentssh/exec_other.go. Every spawned process gets its own process group.Fix 5: Process group signaling
signal()now usessyscall.Kill(-pid, sig)on Unix to signal the entire process group, ensuring child processes from shell pipelines are also cleaned up.Testing
All existing
agent/agentproctests pass. Both packages compile cleanly.