feat(opencode): task signals — structured returns, terse completion, sparse context, wake-on-message by iceteaSA · Pull Request #35400 · anomalyco/opencode

iceteaSA · 2026-07-05T09:56:15Z

Issue for this PR

Related to #19215 (agent-team coordination) — the wake-on-message piece. The other three features are new task-tool capabilities without a tracking issue.

Stacked PR. This sits on top of #32693 (session-to-session), which sits on #32517 → #32425 → #32192. Against dev the diff shows a large commit/file count, but only the top ~14 commits (task-tool + tests) are new here — everything below is the #32693 stack. Review against #32693, or wait for the base stack to land. Of the four features, only wake-on-message depends on the base stack (it uses the Messaging service from #32192/#32517); task_return, terse completion, and sparse context are self-contained and would apply on dev directly.

Type of change

New feature

What does this PR do?

Four controls over what flows in and out of a Task dispatch. All default to today's behavior.

task_return tool: structured child results. A subagent calls task_return({ result: {...} }) to set a free-form JSON object (≤4KB) on its own session row; it lands in the completion frame and on a new optional result field of the task.completed event. This is the output counterpart to the metadata input param: reviewers/coordinators can read a machine-readable result instead of regex-parsing a "VERDICT:" line out of prose. An oversize result is a model-visible tool error, not a silent truncation. The result also renders in the tool call itself, not only the completion frame.
Terse completion frames — completion: "full" | "terse". A background subagent's full completion body is injected verbatim into the parent, which rewrites the parent's cache tail every completion — on orchestration-heavy runs that's dozens of 2–6K-token cache busts. Terse mode replaces the frame body with a digest: the structured task_return result, the last 500 chars of the final message (verdict lines live at the tail), and a pointer to the child session for the full output. The parent can always resume the child to read the rest.
Sparse dispatch context — context: "full" | "sparse". A general-style dispatch pays ~35K first-turn input tokens; ~12–14K of it is instructional payload a scoped child never uses. Sparse keeps the project AGENTS.md chain, the agent prompt, tool schemas, and env; it drops global instruction files, the skills block, and the MCP-instructions block, and skips assembling them rather than discarding the output. Plugin system-prompt transforms still run (they're post-assembly), so plugins keep their say.
Wake-on-message — wake_on_message: true. Today a sibling/coordinator message to an idle child sits undelivered until something else wakes it. With this flag, an idle child is woken to drain its inbox when a message lands. Opt-in per dispatch (an always-on wake reintroduces interrupt storms), budgeted at 5 wakes per run (refreshed when the parent resumes the child), and never wakes a child whose ancestor session chain is gone. The inbox drops an identical sender+body redelivery within a bounded window before it can re-wake, so distinct messages each wake but a duplicate does not.
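As a rough illustration of the first two controls, the sketch below caps a structured result and builds a terse digest from the three pieces named above (structured result, last 500 chars, session pointer). All names (`validateResult`, `renderFrame`, the frame layout) are hypothetical and not taken from the PR, and measuring the 4KB cap on the serialized JSON bytes is an assumption.

```typescript
// Hypothetical sketch; none of these names come from the PR itself.
const MAX_RESULT_BYTES = 4 * 1024 // the "≤4KB" cap; measured on serialized JSON (assumption)
const TAIL_CHARS = 500 // terse mode keeps the last 500 chars of the final message

type TaskResult = Record<string, unknown>

// Reject an oversize result with an error rather than truncating it silently.
function validateResult(result: TaskResult): TaskResult {
  const bytes = new TextEncoder().encode(JSON.stringify(result)).length
  if (bytes > MAX_RESULT_BYTES) {
    throw new Error(`task_return result is ${bytes} bytes; the limit is ${MAX_RESULT_BYTES}`)
  }
  return result
}

interface Completion {
  sessionID: string
  finalMessage: string
  result?: TaskResult
}

// "full" returns the body verbatim; "terse" returns result + tail + pointer.
function renderFrame(c: Completion, mode: "full" | "terse"): string {
  if (mode === "full") return c.finalMessage
  const parts: string[] = []
  if (c.result !== undefined) parts.push(`result: ${JSON.stringify(c.result)}`)
  parts.push(`tail: ${c.finalMessage.slice(-TAIL_CHARS)}`)
  parts.push(`full output: resume session ${c.sessionID}`)
  return parts.join("\n")
}
```

Keeping the tail rather than the head follows the observation above that verdict lines live at the end of the final message.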

Terse and sparse both read from a new task config family (task.completion / task.context) with precedence global config → project config → agent frontmatter → dispatch param, so an orchestration-heavy setup sets the default once and overrides per dispatch.
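The precedence chain can be read as "later layers override earlier ones". A minimal sketch under that reading, with hypothetical type and function names (the PR description does not show the resolver itself):

```typescript
// Hypothetical sketch of the precedence chain; names are illustrative only.
type CompletionMode = "full" | "terse"
type ContextMode = "full" | "sparse"

interface TaskSettings {
  completion?: CompletionMode
  context?: ContextMode
}

interface SettingLayers {
  global?: TaskSettings // global config
  project?: TaskSettings // project config
  agent?: TaskSettings // agent frontmatter
  dispatch?: TaskSettings // per-dispatch param
}

// Both settings default to today's behavior.
const DEFAULTS: Required<TaskSettings> = { completion: "full", context: "full" }

function resolveTaskSettings(layers: SettingLayers): Required<TaskSettings> {
  // Lowest precedence first; each later layer overrides only the keys it sets.
  const ordered = [layers.global, layers.project, layers.agent, layers.dispatch]
  const resolved = { ...DEFAULTS }
  for (const layer of ordered) {
    if (layer?.completion !== undefined) resolved.completion = layer.completion
    if (layer?.context !== undefined) resolved.context = layer.context
  }
  return resolved
}
```

For example, a global `completion: "terse"` with a dispatch-level `completion: "full"` resolves to `"full"` for that one dispatch, while `context` keeps whatever the lower layers chose.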

How did you verify your code works?

Red-first throughout.

Full suite green: bun test across test/tool/ test/session/ test/messaging/ test/s2s/ — 795 pass / 0 fail; typecheck clean on opencode, core, schema, tui. Two migrations (result, context_mode), each column mapped through all six parallel session-row sites (V1 fromRow/toRow, projector, V1 SessionInfo, V2 fromRow, V2 Info) — a missed site is silent data loss, and one was caught on the V2 read path in review before it shipped.
Wake-on-message — the instance-context trap. The first implementation forked the wake via a bare Effect.runFork, which runs against the default runtime and does not carry the per-request InstanceRef the Messaging/SessionStatus services resolve through — so prompt.loop would die on a missing-InstanceRef defect in production while every spy-handler unit test passed green. A live probe (child-ref= MISSING) confirmed it. The fix captures the instance context with attach(...).pipe(Effect.forkIn(scope)) (the pattern the s2s poller uses) and adds an integration test that stands up the real SessionPrompt layer and drives a real prompt.loop drain — it goes red if either the registration or the context-carrying fork is reverted.
Mutation-checked seams: the terse-frame branch, the sparse skills/mcp-drop branch, and the wake idle-predicate each fail their test when reverted.

Live measurements (rebuilt binary). End to end on real dispatches:

Sparse context: a general dispatch assembled 38,266 tokens on full vs 26,524 on sparse — ~31% off, matching the predicted ~12K drop (global instructions + skills + MCP blocks). context_mode round-trips the session row, so a sparse child stays sparse across resume.
Terse completion: an A/B on the same child confirmed the terse frame drops the early body and keeps the last 500 chars + the session pointer. Across 111 real completion frames this session, terse would save ~7.1K tokens / ~35% of frame volume — concentrated, though: the average frame was only 183 tokens (small returns dominate) and a single 5,967-token frame was ~5.8K of that total. Terse pays off on the occasional large return, not uniformly per call.
task_return: a live dispatch set a structured result that persisted to the session.result column ({"verdict":"ok",...}) and rendered in both the completion frame and the tool call.
Wake-on-message: a background child was prompted once, went idle, and a sibling message to its inbox woke it with no resume from the parent — the database shows three assistant turns on that child and it echoed the exact message body back. The wake fires on the enqueue.
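The percentages quoted in these measurements can be re-derived from the raw counts. The sketch below only restates the arithmetic; the 183-token average is a rounded figure, so the frame-volume total is approximate.

```typescript
// Re-deriving the quoted percentages from the counts given in the text.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100
}

// Sparse context: 38,266 assembled tokens on full vs 26,524 on sparse.
const sparseDrop = 38266 - 26524 // 11,742 tokens, i.e. the "~12K drop"
const sparsePct = percentReduction(38266, 26524) // about 30.7, quoted as ~31%

// Terse completion: 111 frames at an average of 183 tokens each.
const totalFrameTokens = 111 * 183 // 20,313 tokens (approximate)
const terseShare = (7100 / totalFrameTokens) * 100 // about 35.0, quoted as ~35%
```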

Independent cross-process corroboration. An independent OpenCode configuration deployment running as a separate process verified several of these against this feature branch, from the other side:

A downstream consumer plugin (33/33 tests, typecheck clean) reads the parsed task.completed result field. In one run a reviewer emitted only a task_return — no prose "VERDICT:" line — and the plugin still wrote a correct scored row (REJECT / must=3 / should=1) computed purely from the structured result, with no regex text to fall back on. The same task_return object appeared in all three sinks — completion frame, session.result column, and the parsed event field — and the enrichment payload was well-formed with no off-contract fields (small sample).
An independent sparse A/B on a different agent measured ~22% assembled-context reduction (76,937 → 59,769) — the same order as the 31% above, on a different process. It also surfaced a caveat: sparse cold-starts the prompt cache. The first sparse dispatch pays its input fresh (28,153 tokens vs a warm full dispatch's 265) and only collapses to ~120 on an identical second sparse call, so sparse's saving is amortization-gated — order-of-ten dispatches to break even at these cache rates. Tool registration survived sparse (a sparse child still discovered and used a custom tool whose doc was dropped).
An independent wake round-trip extends the single observation above. A worker dispatched with wake_on_message: true returned to completed, then a sibling coordinator sent four distinct work items fire-and-forget. The worker woke four times and processed all four in send order — including two sent in the same second, delivered ordered with none dropped — reaching 13 assistant turns on a child prompted once. Its own message-sequence counter ran 1→4 across the wakes, so session context persisted between wakes rather than resetting per delivery, and the worker collected results to a file the parent read rather than messaging back.
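The wake rules exercised in these runs (opt-in, idle-only, a budget of 5 refreshed on parent resume, no wake without a live ancestor chain, in-order drain) can be modeled roughly as follows. This is a simplified in-memory sketch with hypothetical names; it assumes a message is always stored even when no wake fires, and it does not model the child's idle/busy transitions.

```typescript
// Hypothetical model of the wake policy; not the PR's Messaging service.
const WAKE_BUDGET = 5 // wakes per run, per the description

interface Child {
  id: string
  status: "idle" | "busy"
  wakeOnMessage: boolean // opt-in per dispatch
  wakesLeft: number
  ancestorsAlive: boolean // false once the ancestor session chain is gone
  inbox: string[]
}

function shouldWake(c: Child): boolean {
  return c.wakeOnMessage && c.status === "idle" && c.wakesLeft > 0 && c.ancestorsAlive
}

// Always stores the message; returns true only when a wake is triggered.
function enqueue(c: Child, body: string): boolean {
  c.inbox.push(body)
  if (!shouldWake(c)) return false
  c.wakesLeft -= 1
  return true
}

// Drains in FIFO order, matching "processed all four in send order".
function drain(c: Child): string[] {
  return c.inbox.splice(0)
}

// The budget is refreshed when the parent resumes the child.
function onParentResume(c: Child): void {
  c.wakesLeft = WAKE_BUDGET
}
```

With the budget exhausted, a sixth message is still queued but does not wake the child until the parent resumes it.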

Limits: terse frame-token savings were measured only here, not independently reproduced. The numbers above are assembled-token and frame-volume counts, not a direct measurement of cache-tail-rewrite cost on a long-lived parent, which the tool-test harness can't observe. Duplicate delivery was exercised only through the enqueue dedup (identical sender+body within the window); a duplicate outside the window, or two distinct items with identical body text, is not covered — a concern only for mutating work modes.
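To make the dedup limit concrete, here is a minimal sketch of a sender+body dedup keyed on a time window. The time-based window and the `createDedup` name are assumptions for illustration; the commit log mentions an LRU dedup, so the real window may be bounded by entry count instead.

```typescript
// Hypothetical time-window dedup; the real inbox may bound the window differently.
function createDedup(windowMs: number) {
  const lastSeen = new Map<string, number>()
  // Returns true when the message should be delivered (and may wake the child).
  return function accept(sender: string, body: string, now: number): boolean {
    const key = JSON.stringify([sender, body])
    const seenAt = lastSeen.get(key)
    if (seenAt !== undefined && now - seenAt < windowMs) return false // duplicate inside the window
    lastSeen.set(key, now)
    return true
  }
}
```

In this model an identical sender+body pair sent after the window has elapsed is delivered again, which is the uncovered case noted above.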

Screenshots / recordings

Not a UI change of note (the task_return result now shows in the tool call output; no new UI surface).

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR (the base-stack commits in the diff are the declared #32693 dependency, feat(opencode): session-to-session messaging; see the note at the top)

Experimental capability for a parent agent or human operator to steer, gracefully cancel, or hard-abort a specific running Task subagent mid-run, without affecting the parent or sibling subagents.

Core:
- Interrupt service (session/interrupt.ts): process-local registry holding one pending interrupt per child plus a terminal record; steer/cancel frame renderers and a visible-marker renderer, both with origin attribution (user vs parent); reason length-capped and XML-escaped at every sink (frames AND the visible marker).
- The child consumes pending interrupts at the runLoop turn boundary: steer injects a <steer> frame and a visible "Steered by ..." marker and continues; cancel injects <cancel> + a visible marker, records a terminal, and force-breaks within a grace window. abortChild writes a visible "Aborted by ..." marker (model/agent derived from the child's latest user message), records a terminal, and cancels the BackgroundJob.

Agent tools (gated by permission.interrupt):
- task_steer / task_cancel / task_abort (origin=parent).

Human paths:
- POST /session/:id/interrupt (intent steer|cancel|abort, origin=user), restricted to subagent sessions, gated by the experimental flag, and rejecting non-running children.
- TUI: esc on a subagent opens a Steer/Cancel/Abort menu, then a reason prompt; markers render as "... by user". Bound at the session route via a uniquely-named gather bucket (the keymap gather() caches by name).

Visible interrupt markers render as a distinct "Interrupt" line (tagged via part.metadata.interrupt), not as user prose.

Whole feature gated by OPENCODE_EXPERIMENTAL_SUBAGENT_INTERRUPT (off by default): agent tools, HTTP endpoint, and TUI affordance.

Limitations: agent-driven steer/cancel applies to background children only (a foreground child blocks the parent turn); cancel is boundary-soft (use task_abort / Abort for a child stuck in a long tool call).
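The "length-capped and XML-escaped at every sink" rule can be sketched as one shared sanitizer used by both the frame renderer and the visible marker. The cap value, the `origin` attribute, and all function names below are hypothetical; the commit message does not give the actual frame layout.

```typescript
// Hypothetical sketch; the real cap value and frame layout are not given in the commit.
const MAX_REASON_CHARS = 200 // assumed cap, for illustration only

type Origin = "user" | "parent"

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
}

// Cap first, then escape, so an entity is never cut in half.
function sanitizeReason(reason: string): string {
  return escapeXml(reason.slice(0, MAX_REASON_CHARS))
}

// Model-facing frame: the reason cannot close the element or open a new one.
function steerFrame(origin: Origin, reason: string): string {
  return `<steer origin="${origin}">${sanitizeReason(reason)}</steer>`
}

// Human-facing marker goes through the same sanitizer.
function visibleMarker(kind: "Steered" | "Cancelled" | "Aborted", origin: Origin, reason: string): string {
  return `${kind} by ${origin}: ${sanitizeReason(reason)}`
}
```

Routing both sinks through one function is what keeps a reason such as `</steer><cancel>` from being interpreted as a second frame.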

The sender-echo markers duplicated information already shown by the message tool call itself (✉ Sent to parent / ✉ Replied to subagent sat right under the visible tool call), and the subagent's "Reply from parent" marker was written twice — once by the parent's reply branch and again by the subagent's own send path. Keep only the incoming markers: the parent sees "✉ Message from subagent", the recipient subagent sees "✉ Reply from parent", each once. Drop the now-unused marker direction field.

experimentalS2S runtime flag; s2s_inbox/s2s_token/s2s_allow tables (hand-written migration + Drizzle s2s.sql.ts mirror so fresh-DB CREATE and upgrade paths agree); session_slug_unique migration neutralized to DROP INDEX (slugs are not unique). Store is one statement per method: atomic single-winner claims via UPDATE…RETURNING with drained_at IS NULL / accepted_by IS NULL guards, TTL enforced in the claimToken WHERE clause, and deleteInbox so a delivered row is hard-deleted (distinct from a merely-claimed crashed row). S2SCapsule v1 envelope with forward/back-compat serde and optional sender_name. UUIDv7 generator.

…up wiring

Per-instance wake poller (C′) lazily forked from SessionPrompt.loop via attach so it captures the live fiber's InstanceRef; runLoop turn-boundary drain (D) of s2s_inbox in-context; 60s reaper that reopens ONLY crashed claims (delivered rows are deleted). LayerNode.group exposes only DIRECT children, so S2SStore/Messaging/SessionStatus are spliced as direct members of every prompt-serving group (app httpapi + control-plane workspace); this is what made cross-process delivery actually work. marker.ts: shared Marker.render + escapeAttr (escapes " ' for untrusted attribute values so a peer cannot break out of the <external-context> name=/session= attributes); escape() for element content/visible markers. Slug-decoupled: Messaging.enqueue lazily inits the inbox queue and registerSlug is dropped from the loop, so s2s rides session_id only and the slug registry stays coordinator-messaging-owned. s2s frames carry the sender session name + addressable session_id.
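The attribute-escaping point can be illustrated with a small sketch. The function names follow the commit message, but the bodies and the exact entity choices are assumptions, as is the `renderExternalContext` helper.

```typescript
// Sketch of content vs attribute escaping; bodies are assumed, not copied from marker.ts.
function escape(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
}

// Attribute values also need quotes escaped, otherwise a peer-chosen name
// could close the attribute and inject another one.
function escapeAttr(s: string): string {
  return escape(s).replace(/"/g, "&quot;").replace(/'/g, "&#39;")
}

// Hypothetical renderer for an <external-context> frame.
function renderExternalContext(name: string, session: string, body: string): string {
  return `<external-context name="${escapeAttr(name)}" session="${escapeAttr(session)}">${escape(body)}</external-context>`
}
```

A sender name like `x" session="evil` then stays inside the `name` attribute instead of overriding `session`.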

s2s tool (invite/accept/msg/leave/relay) gated behind experimentalS2S: single-use 10-min invite tokens, durable bidirectional s2s_allow consent, peers addressed by globally-unique session_id (accept reports the inviter's id). Same-process sends hit the in-process inbox; cross-process persist to s2s_inbox for the recipient's poller. Outbound 50/hr is a SOFT per-process throttle (documented as such in code + s2s.txt); the durable cross-process bound is the recipient INBOX_CAP (exact now that delivered rows are deleted). Registry wires S2SStore into ToolRegistry; message tool gains peer-slug send (message_allow). TUI renders the ✉ inbox marker (session name + id) and the session-list surface.
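The single-use, 10-minute invite token can be modeled in memory as a guarded claim. This is only an analogue of the guarded UPDATE…RETURNING described in the store commit: here the check-and-set is atomic because JavaScript runs it on one thread, whereas the PR relies on the database for atomicity across processes. All names are hypothetical.

```typescript
// In-memory stand-in for the token store; not the PR's S2SStore.
const INVITE_TTL_MS = 10 * 60 * 1000 // "single-use 10-min invite tokens"

interface Invite {
  token: string
  inviter: string // session_id of the inviting session
  createdAt: number
  acceptedBy?: string
}

function createInviteStore() {
  const invites = new Map<string, Invite>()
  return {
    invite(token: string, inviter: string, now: number): void {
      invites.set(token, { token, inviter, createdAt: now })
    },
    // Succeeds at most once per token, and only inside the TTL.
    // Returns the inviter's id so accept can report it, or undefined on failure.
    claim(token: string, accepter: string, now: number): string | undefined {
      const invite = invites.get(token)
      if (invite === undefined) return undefined
      if (invite.acceptedBy !== undefined) return undefined // already claimed
      if (now - invite.createdAt > INVITE_TTL_MS) return undefined // expired
      invite.acceptedBy = accepter
      return invite.inviter
    },
  }
}
```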

- interrupt.ts: remove defaultLayer (deleted at dev), fix node to object form
- interrupt.test.ts: migrate from EventV2Bridge.defaultLayer to LayerNode.compile
- task-interrupt.test.ts: migrate from Layer.mergeAll(defaultLayer) to LayerNode group/compile
- task.test.ts: add Interrupt.node to test group (registry gained the dep)

- Rewrite LayerNode.make positional→object form in poller/store
- Add defaultLayer re-exports (raw layer) to 37 modules
- Export layer variable for modules used via .layer access
- Rewrite coordinator-messaging + s2s tests to LayerNode.group pattern
- Switch tests to testEffectShared for Database memoMap sharing
- Add NodePath to runLoopInfra for CrossSpawnSpawner deps
- Create task-event.ts schema + manifest registration
- Regenerate SDK types (task.completed, messaging.peer_sent, s2s.delivered)
- Fix topology-repro.test.ts positional LayerNode.make + buildLayer→compile

…ator suites

SessionProjector.node added to the s2s/coordinator runLoop harnesses (session readback needs projection at current dev), the reaper harness gets an explicit EventV2 layer, the poller runLoop shares the file-level :memory: database via node replacement, and the fork-fiber-sensitive suites (poller, coordinator runLoop) build per-test layers (testEffect) instead of a shared memoMap build so forked poller/drain fibers see the same instances as the test assertions.

Drops the defaultLayer = layer re-exports from 28 modules nothing consumes; the 8 that remain (Agent, Config, Session, Truncate, Messaging, S2SStore, EventV2Bridge, CrossSpawnSpawner) back the three s2s test harnesses that still compose with Layer.mergeAll. Follow-up: convert those harnesses to LayerNode and delete the bridge entirely.

Add optional slug, agent, model, variant, elapsedMs, tokens, and cost fields to the task.completed event. A shared completedPayload helper reads session metadata and sums tokens/cost from child assistant messages, used at all 9 publish sites instead of duplicated logic.

Wrap completedPayload assembly in Effect.exit so any defect during session/message reads falls back to the base payload instead of killing the publish path. Use optional chaining and nullish defaults for token/cost field access to tolerate missing or malformed assistant rows. Rewrite test to observe the actual published event via Deferred + listen instead of relying on message persistence alone.
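The fallback behavior can be sketched without Effect. The version below uses a plain try/catch as a stand-in for Effect.exit, and the row and payload shapes are assumptions for illustration rather than the PR's actual types.

```typescript
// Stand-in sketch: try/catch replaces Effect.exit; shapes are hypothetical.
interface AssistantRow {
  tokens?: { input?: number; output?: number }
  cost?: number
}

interface BasePayload {
  sessionID: string
}

interface EnrichedPayload extends BasePayload {
  tokens?: number
  cost?: number
}

// Optional chaining plus nullish defaults tolerate missing or malformed rows.
function sumUsage(rows: Array<AssistantRow | null | undefined>): { tokens: number; cost: number } {
  let tokens = 0
  let cost = 0
  for (const row of rows) {
    tokens += (row?.tokens?.input ?? 0) + (row?.tokens?.output ?? 0)
    cost += row?.cost ?? 0
  }
  return { tokens, cost }
}

// Any failure while reading rows falls back to the base payload,
// so enrichment can never block the publish itself.
function completedPayload(
  base: BasePayload,
  readRows: () => Array<AssistantRow | null | undefined>,
): EnrichedPayload {
  try {
    return { ...base, ...sumUsage(readRows()) }
  } catch {
    return base
  }
}
```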

Three blocking bugs in wake-on-message (97ae2bd), all in the instance-context/fork wiring:

1. registerWakeHandler was called at layer-build time without yield*, so the Effect was constructed and discarded. Moved the (now yield*ed) registration into loop()'s body, which always runs inside a fiber that has InstanceRef from the caller (HTTP request / CLI run), mirroring how the C-prime wake-poller is wired in the same function. Re-registering on every loop() call is idempotent.
2. Messaging.wakeIfIdle forked the handler via bare Effect.runFork, which starts a fresh top-level runtime with an empty context and drops InstanceRef, so the forked prompt.loop died the instant it touched any InstanceState-scoped service. Replaced both call sites with attach(handler(target)).pipe(Effect.forkIn(scope)): attach re-provides the caller fiber's InstanceRef/WorkspaceRef before the fork runs, and scope is a new layer-lifetime Scope so the fiber doesn't leak past Messaging's lifetime.
3. wake.test.ts only exercised spy handlers against a minimal Messaging-only layer. It never ran the real registration line or a real prompt.loop, so 786 tests passed over an entirely dead feature. Added wake-real-path.test.ts, which builds the full run-loop layer, drives a real prompt.loop, sets a real wake policy, and enqueues through the real Messaging.enqueue; it asserts the drain's observable side effect (the ✉ inbox marker) appears without a second explicit loop() call. Verified red/green against both bugs individually before landing the fix.

Pre-existing base drift, not introduced by task-signals: the hardcoded Latest.size assertion was already failing at base 11e2b8a (Received 97, Expected 88) — the fork's accumulated messaging/interrupt/s2s/task events grew the manifest without updating this constant. Empirically verified identical size (97) at base and HEAD. Fixed here so the integration branches go green.

iceteaSA added 30 commits July 2, 2026 08:32

feat(core): add background-job message channel

bd20636

feat(opencode): add Messaging service for agent-to-agent replies

34d072f

fix(opencode): forward background-job message channel through wrapper

a0f0f9f

fix(opencode): make agent-messaging send/message interruption-safe

ce73a14

feat: add experimentalAgentMessaging flag and message permission key

e936543

feat(opencode): add message tool

27d4f42

feat(opencode): register message tool and yield subagent messages to parent

31c1aab

fix(opencode): harden agent messaging (authz, send race, body escaping, delivery errors)

5af7506

fix(opencode): close send publish interruption window; add messaging integration test

0943acc

chore(sdk): regenerate for agent messaging

5306e4e

feat(tui): visible transcript markers for agent messages

e275410

refactor(tui,session): extract one shared Marker.render({kind}) helper before adding inbox markers

ea05b93

feat(messaging): slug→SessionID registry + per-child allow-list storage

53ff727

feat(messaging): per-session FIFO inbox with budget, cap, LRU dedup, tree-cap

10f49bf

feat(messaging): bounded Inbox.await for coordinator fan-in

6b18f5c

feat(message): peer-slug sibling send with allow-list + parentID validation, fire-and-forget only

e278e42

test(tool): refresh parameters snapshot for message_allow addition

c86c454

feat(session): drain coordinator inbox at runLoop boundary (batched, skip-on-cancel)

b8fad1b

test(message): e2e collaborative-implementer coordinator-messaging flow + gating verification

56b1581

docs(messaging,message): document awaitInbox bounded-behavior + fix branch comment

8954050

fix(s2s): bounded SQLITE_BUSY retry on inbox writes + GC orphaned rows on session deletion

930f0b3

feat(opencode): emit messaging.peer_sent + s2s.delivered + task.completed events for comms dashboard (cherry picked from commit 939ffdd61748fed5ae41429be9d0b80e9ea3992a)

8538c92

fix: adapt agent-messaging to current dev APIs

83a88ca

fix(opencode): repair message test layer composability for dev APIs

b50ed6d

iceteaSA added 23 commits July 2, 2026 08:59

refactor(schema): move messaging events to the event manifest

ada1e95

refactor(schema): move interrupt events to the event manifest

4f33c64

chore(sdk): regenerate

27db15c

test(opencode): cover task.completed enrichment fallback

11e2b8a

feat(core): add result column to session

b2d7624

chore: generate

84e2d98

fix(core): read result on the v2 session path

0a4383c

feat(opencode): add task_return tool for structured child results

2f3508a

test(opencode): cover task_return oversize error is model-visible

c2fc8a0

feat(opencode): terse completion mode for task dispatches

a6dc248

feat(opencode): sparse dispatch context mode

3f5c9da

chore: generate

990d7a2

fix(opencode): test sparse assembly and skip skills/mcp work in sparse

cde10b8

feat(opencode): wake-on-message for idle task children

97ae2bd

chore(sdk): regenerate

d712253

feat(opencode): show task_return result in the tool call output

4283a72

iceteaSA wants to merge 53 commits into anomalyco:dev from iceteaSA:task-signals
