feat: improve sub-agent orchestration tools by mafredri · Pull Request #26673 · coder/coder

mafredri · 2026-06-24T18:42:09Z

The orchestration tools (spawn_agent, wait_agent, message_agent, close_agent) work, but they communicate badly, so the orchestrator acts on the wrong story and abandons work that is still running. This is measured, not hypothesized: in the personal-agent chat snapshots ~23% of wait_agent calls time out and ~10% of delegated agents are never cleanly collected. A retry-disciplined harness orphans ~0% of agents with the same descriptions, which places the fix in the tool responses and the missing guidance, not in the descriptions alone.

What this does:

list_agents (new, root-only): paginated (limit/offset, default 10, total/has_more), most-recently-active first, archived excluded. Lets the orchestrator recover its fleet after a compaction drops the spawned chat_ids. Available in plan mode since it is read-only.
wait_agent payloads: on timeout it returns an informational (non-error) payload with status, timed_out, and retry guidance instead of a bare error; on error status it returns a structured, recoverability-aware payload (last_error, report, guidance) so transient failures are resumed via message_agent rather than read as terminal. The recording-on-timeout behavior is unchanged.
close_agent renamed to interrupt_agent: matches the codebase vocabulary (InterruptChat, ErrInterrupted, StatusInterrupted) and stops implying destruction; the response returns "interrupted" instead of "terminated". A hidden close_agent alias (ToolNameAliases on ExecuteLocalToolsOptions, resolved once in executeSingleTool) keeps old histories dispatching without advertising the old name.
Descriptions: message_agent now explains queue-by-default and interrupt: true; wait_agent and spawn_agent explain that agents persist and can be reused.
Hygiene: a <subagent-orchestration> section in the system prompt and a sentence in the spawn_agent description so spawned agents are not abandoned in a working state.
Frontend: renders interrupt_agent and list_agents tool calls; close_agent is kept for rendering existing history.
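A hidden alias only needs to be resolved once, before dispatch, so that a close_agent call replayed from an old history reaches the interrupt_agent handler while the model is only ever shown the canonical name. The sketch below assumes the alias table is a plain old-name to new-name map; the `ToolNameAliases` and `executeSingleTool` names come from this PR, but the map shape and the `resolveToolName` helper are hypothetical illustrations, not the actual implementation.

```go
package main

import "fmt"

// ToolNameAliases maps a legacy tool name to its canonical name.
// Hypothetical shape: in the PR the field lives on ExecuteLocalToolsOptions.
type ToolNameAliases map[string]string

// resolveToolName returns the canonical name for an incoming tool call.
// Names without an alias are returned unchanged. Aliases are not listed
// as tools, so the model only ever sees the canonical name.
func resolveToolName(name string, aliases ToolNameAliases) string {
	if canonical, ok := aliases[name]; ok {
		return canonical
	}
	return name
}

func main() {
	aliases := ToolNameAliases{"close_agent": "interrupt_agent"}
	handlers := map[string]func() string{
		"interrupt_agent": func() string { return "interrupted" },
	}
	// Both the legacy and the canonical name dispatch to the same handler.
	for _, called := range []string{"close_agent", "interrupt_agent"} {
		name := resolveToolName(called, aliases)
		fmt.Printf("%s -> %s: %s\n", called, name, handlers[name]())
	}
}
```

Resolving in one place keeps the alias from leaking into tool listings or per-tool code paths, which matches the "single localized field" rationale in D7.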

The backend chatd package tests and the touched frontend tests pass; gofmt, go vet, and the full pre-commit (gen/fmt/lint/build) are green. One pre-existing Storybook story (MCP Tool Completed) fails independently of this change; it touches none of the files changed here.

Implementation plan and key decisions

Five slices: (1) rename + hidden alias + response field, (2) description rewrites and the wait_agent timeout/error payloads, (3) list_agents backend, (4) frontend descriptor, (5) hygiene guidance.

Notable decisions:

D9 (timeout payload): the orchestrator decides whether to give up based on the tool response, not the description, so a timeout returns an informational payload carrying the agent's status, not an error.
D12 (error payload): an errored, non-archived agent resumes when messaged, so surface last_error and let the orchestrator judge recoverability rather than auto-classifying.
D11 (list_agents shape): cap with limit/offset like read_file, fixed updated_at DESC sort with an id tiebreak, no order_by (no built-in tool exposes a sort). Sorting and paging happen in the handler so the shared GetChildChatsByParentIDs query (used by the chats sidebar) is untouched.
D7 (alias): included. Without it a stray close_agent would only cost one self-correcting step, but the mechanism is a single localized field, so old histories dispatch cleanly.

Line anchors in the source plan had drifted from coder/coder HEAD, so line numbers were re-confirmed before editing. Two siblings the plan did not name were also updated: chatprompt.isSubagentLifecycleToolName and the plan-mode help text in subagent_catalog.go. TestWaitAgentTimeoutLeavesRecordingRunning encoded the old timeout-as-error behavior and was updated to the new contract.

The full deep-plan, decision log, and product analysis live in a personal, gitignored repo and are not linked here.

Implements CODAGT-512.

🤖 This PR was created with the help of Coder Agents, and will be reviewed by a human. 🏂🏻

The orchestration tools work, but they tell the orchestrator the wrong story, so it acts on the misframing. wait_agent says "final response" and returns a bare error on timeout, which reads as failure rather than "still working." spawn_agent frames a spawn-wait-done lifecycle, so the orchestrator never learns that agents persist and can be reused. close_agent sounds like it destroys an agent when it only interrupts it, message_agent hides its queue and interrupt behavior, and there is no way to list spawned agents to recover the fleet after a compaction.

This is measured, not hypothesized: in the personal-agent chat snapshots ~23% of wait_agent calls time out and ~10% of delegated agents are never cleanly collected. A retry-disciplined harness orphans ~0% of agents with the same descriptions, so the decision to give up is driven by the tool response and the missing hygiene guidance, not by the description.

The fixes land where the decision is made. wait_agent returns an informational timeout payload and a recoverability-aware error payload instead of a bare error. A new list_agents tool recovers the fleet. close_agent becomes interrupt_agent, with a hidden alias so old histories still dispatch. The descriptions and a <subagent-orchestration> section in the system prompt teach persistence, queuing, and not abandoning a working agent.

Implements CODAGT-512.

linear-code · 2026-06-24T18:42:13Z

CODAGT-512

github-actions Bot assigned mafredri Jun 24, 2026

mafredri wants to merge 1 commit into main from mathias/codagt-512-improve-sub-agent-orchestration-tools-list_agents-rename
