
fix: improve background process handling for agent tools #23132

Merged
kylecarbs merged 1 commit into main from fix/agent-background-process-handling on Mar 16, 2026
Conversation

@kylecarbs
Member

Problem

Models frequently use shell & instead of run_in_background=true when starting long-running processes through /agents, so those processes die shortly after starting. This happens because:

  1. No guidance in tool schema — The ExecuteArgs struct had zero description tags. The model saw run_in_background: boolean (optional) with no explanation of when/why to use it.
  2. Shell & is silently broken: sh -c "command &" forks the process, the shell exits immediately, and the forked child becomes an orphan not tracked by the process manager.
  3. No process group isolation — The SSH subsystem sets Setsid: true on spawned processes, but the agent process manager set no SysProcAttr at all. Signals only hit the top-level sh, not child processes.

Investigation

Compared our implementation against openai/codex and coder/mux:

| Aspect | codex | mux | coder/coder (before) |
| --- | --- | --- | --- |
| Background flag | Yield/resume with session_id | run_in_background with rich description | run_in_background with no description |
| & handling | setsid() + killpg() | detached: true + killProcessTree() | Nothing; orphaned children escape |
| Process isolation | setsid() on every spawn | set -m; nohup ... setsid for background | No SysProcAttr at all |
| Signal delivery | killpg(pgid, sig) (entire group) | kill -15 -$pid (negative PID) | proc.cmd.Process.Signal() (PID only) |

Changes

Fix 1: Add descriptions to ExecuteArgs (highest impact)

The model now sees explicit guidance: "Use for long-running processes like dev servers, file watchers, or builds. Do NOT use shell & — it will not work correctly."

Fix 2: Update tool description

The top-level execute tool description now reinforces: "Use run_in_background=true for long-running processes. Never use shell '&' for backgrounding."

Fix 3: Detect trailing & and auto-promote to background

Defense-in-depth: if the model still uses command &, we strip the & and promote to run_in_background=true automatically. The check distinguishes a trailing & from &&.
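
One way such a check could look is sketched below. This is not the implementation merged in this PR: promoteTrailingAmpersand is a hypothetical name, and the sketch deliberately ignores shell quoting beyond a backslash-escaped \&.

```go
package main

import (
	"fmt"
	"strings"
)

// promoteTrailingAmpersand (hypothetical) reports whether command ends in a
// single backgrounding '&'. If it does, the command is returned with that '&'
// removed so the caller can run it with run_in_background=true instead.
func promoteTrailingAmpersand(command string) (string, bool) {
	trimmed := strings.TrimSpace(command)
	switch {
	case !strings.HasSuffix(trimmed, "&"):
		return command, false
	case strings.HasSuffix(trimmed, "&&"), strings.HasSuffix(trimmed, `\&`):
		// "&&" is the AND operator and "\&" is a literal ampersand.
		return command, false
	}
	stripped := strings.TrimSpace(strings.TrimSuffix(trimmed, "&"))
	if stripped == "" {
		return command, false
	}
	return stripped, true
}

func main() {
	for _, c := range []string{"npm run dev &", "make && make test", "sleep 10 & "} {
		stripped, background := promoteTrailingAmpersand(c)
		fmt.Printf("%q -> %q (background=%v)\n", c, stripped, background)
	}
}
```

Only the final token is inspected, so a command such as make && make test & would still be promoted, while && in the middle of a command is left alone.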

Fix 4: Process group isolation (Setpgid)

New platform-specific files (proc_other.go / proc_windows.go) follow the same pattern as agentssh/exec_other.go. Every spawned process gets its own process group.

Fix 5: Process group signaling

signal() now uses syscall.Kill(-pid, sig) on Unix to signal the entire process group, ensuring child processes from shell pipelines are also cleaned up.

Testing

All existing agent/agentproc tests pass. Both packages compile cleanly.

@github-actions

github-actions Bot commented Mar 16, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Models frequently use shell '&' instead of run_in_background=true,
causing processes to fork and exit immediately as untracked orphans.
This stems from two issues: no guidance in tool descriptions, and
no process group isolation in the agent process manager.

Changes:

1. Add description tags to ExecuteArgs so the model sees explicit
   guidance on when to use run_in_background and why shell '&' is
   broken.

2. Update the execute tool description to reinforce this guidance.

3. Detect trailing '&' in commands and auto-promote to background
   mode, stripping the '&'. This is defense-in-depth for when the
   model ignores the description.

4. Add process group isolation (Setpgid) to spawned processes via
   platform-specific files (proc_other.go / proc_windows.go),
   matching what the SSH subsystem already does with Setsid.

5. Signal the process group (-pid) instead of just the PID when
   delivering kill/terminate signals, ensuring child processes
   (e.g. from shell pipelines) are also cleaned up.
kylecarbs force-pushed the fix/agent-background-process-handling branch from 5d9769f to edb20e6 on March 16, 2026 17:49
kylecarbs merged commit 6972d07 into main on Mar 16, 2026
25 checks passed
kylecarbs deleted the fix/agent-background-process-handling branch on March 16, 2026 20:22
github-actions bot locked and limited conversation to collaborators on Mar 16, 2026