feat(opencode): task signals — structured returns, terse completion, sparse context, wake-on-message#35400
Draft
iceteaSA wants to merge 53 commits into
Experimental capability for a parent agent or human operator to steer, gracefully cancel, or hard-abort a specific running Task subagent mid-run, without affecting the parent or sibling subagents.

Core:
- Interrupt service (session/interrupt.ts): process-local registry holding one pending interrupt per child plus a terminal record; steer/cancel frame renderers and a visible-marker renderer, both with origin attribution (user vs parent); reason length-capped and XML-escaped at every sink (frames AND the visible marker).
- The child consumes pending interrupts at the runLoop turn boundary: steer injects a <steer> frame and a visible "Steered by ..." marker and continues; cancel injects <cancel> + a visible marker, records a terminal, and force-breaks within a grace window. abortChild writes a visible "Aborted by ..." marker (model/agent derived from the child's latest user message), records a terminal, and cancels the BackgroundJob.

Agent tools (gated by permission.interrupt):
- task_steer / task_cancel / task_abort (origin=parent).

Human paths:
- POST /session/:id/interrupt (intent steer|cancel|abort, origin=user), restricted to subagent sessions, gated by the experimental flag, and rejecting non-running children.
- TUI: esc on a subagent opens a Steer/Cancel/Abort menu, then a reason prompt; markers render as "... by user". Bound at the session route via a uniquely-named gather bucket (the keymap gather() caches by name).

Visible interrupt markers render as a distinct "Interrupt" line (tagged via part.metadata.interrupt), not as user prose.

Whole feature gated by OPENCODE_EXPERIMENTAL_SUBAGENT_INTERRUPT (off by default): agent tools, HTTP endpoint, and TUI affordance.

Limitations: agent-driven steer/cancel applies to background children only (a foreground child blocks the parent turn); cancel is boundary-soft (use task_abort / Abort for a child stuck in a long tool call).
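The commit message describes a registry that holds one pending interrupt per child, consumed at a turn boundary, with the reason capped and XML-escaped. A minimal sketch of that shape follows. It is not the contents of session/interrupt.ts: the class and function names, the 500-character cap, the `origin` attribute layout, and the "newer request replaces an unconsumed one" rule are all assumptions made for illustration.

```typescript
// Sketch only: names, the cap value, and the frame attribute layout are
// assumptions, not the contents of session/interrupt.ts.
type Intent = "steer" | "cancel";
type Origin = "user" | "parent";

interface PendingInterrupt {
  intent: Intent;
  origin: Origin;
  reason: string;
}

const MAX_REASON_CHARS = 500; // hypothetical cap

function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Cap first, then escape, so truncation can never cut an entity in half.
function sanitizeReason(reason: string): string {
  return escapeXml(reason.slice(0, MAX_REASON_CHARS));
}

// Renders the frame injected into the child, e.g. <steer origin="parent">...</steer>.
function renderFrame(interrupt: PendingInterrupt): string {
  const { intent, origin, reason } = interrupt;
  return `<${intent} origin="${origin}">${sanitizeReason(reason)}</${intent}>`;
}

class InterruptRegistry {
  private readonly pending = new Map<string, PendingInterrupt>();

  // One slot per child: a newer request replaces an unconsumed one (assumption).
  request(childId: string, interrupt: PendingInterrupt): void {
    this.pending.set(childId, interrupt);
  }

  // Called by the child at a turn boundary; consuming clears the slot.
  take(childId: string): PendingInterrupt | undefined {
    const found = this.pending.get(childId);
    this.pending.delete(childId);
    return found;
  }
}
```

Because the slot is keyed by child id, an interrupt aimed at one child is invisible to the parent and to siblings, which is the isolation property the commit calls out.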
…g, delivery errors)
The sender-echo markers duplicated information already shown by the message tool call itself (✉ Sent to parent / ✉ Replied to subagent sat right under the visible tool call), and the subagent's "Reply from parent" marker was written twice — once by the parent's reply branch and again by the subagent's own send path. Keep only the incoming markers: the parent sees "✉ Message from subagent", the recipient subagent sees "✉ Reply from parent", each once. Drop the now-unused marker direction field.
…r before adding inbox markers
…dation, fire-and-forget only
…ow + gating verification
experimentalS2S runtime flag; s2s_inbox/s2s_token/s2s_allow tables (hand-written migration + Drizzle s2s.sql.ts mirror so fresh-DB CREATE and upgrade paths agree); session_slug_unique migration neutralized to DROP INDEX (slugs are not unique). Store is one statement per method: atomic single-winner claims via UPDATE…RETURNING with drained_at IS NULL / accepted_by IS NULL guards, TTL enforced in the claimToken WHERE clause, and deleteInbox so a delivered row is hard-deleted (distinct from a merely-claimed crashed row). S2SCapsule v1 envelope with forward/back-compat serde and optional sender_name. UUIDv7 generator.
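The single-winner claim above relies on one guarded statement: the guard and the write happen together, so two claimants cannot both succeed. The following is an in-memory model of that behavior, offered as a sketch rather than the store itself. The class name, the `expiresAt` field, and the SQL shown in the comment (including the `expires_at` column) are assumptions; only the `accepted_by IS NULL` guard and the TTL-in-WHERE idea come from the commit message.

```typescript
// In-memory model of a guarded single-winner claim. The class shape and
// expiresAt are hypothetical; the accepted_by guard and TTL check mirror the text.
interface TokenRow {
  token: string;
  inviter: string;
  expiresAt: number; // epoch milliseconds
  acceptedBy: string | null;
}

class TokenStoreModel {
  private readonly rows = new Map<string, TokenRow>();

  insert(row: TokenRow): void {
    this.rows.set(row.token, { ...row });
  }

  // Models roughly (column names partly assumed):
  //   UPDATE s2s_token SET accepted_by = :claimant
  //   WHERE token = :token AND accepted_by IS NULL AND expires_at > :now
  //   RETURNING *
  // A claimant that gets no row back has lost the race or hit an expired token.
  claimToken(token: string, claimant: string, now: number): TokenRow | undefined {
    const row = this.rows.get(token);
    if (row === undefined || row.acceptedBy !== null || row.expiresAt <= now) {
      return undefined;
    }
    row.acceptedBy = claimant;
    return { ...row };
  }
}
```

Putting the TTL in the same predicate as the claim means an expired token is indistinguishable from an already-claimed one to the caller, so there is no separate "check expiry, then claim" window to race through.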
…up wiring

Per-instance wake poller (C′) lazily forked from SessionPrompt.loop via attach so it captures the live fiber's InstanceRef; runLoop turn-boundary drain (D) of s2s_inbox in-context; 60s reaper that reopens ONLY crashed claims (delivered rows are deleted).

LayerNode.group exposes only DIRECT children, so S2SStore/Messaging/SessionStatus are spliced as direct members of every prompt-serving group (app httpapi + control-plane workspace); this is what made cross-process delivery actually work.

marker.ts: shared Marker.render + escapeAttr (escapes " ' for untrusted attribute values so a peer cannot break out of the <external-context> name=/session= attributes); escape() for element content/visible markers.

Slug-decoupled: Messaging.enqueue lazily inits the inbox queue and registerSlug is dropped from the loop, so s2s rides session_id only and the slug registry stays coordinator-messaging-owned. s2s frames carry the sender session name + addressable session_id.
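The split between escape() for element content and escapeAttr for attribute values can be illustrated as follows. This is a sketch under assumptions, not marker.ts: the function bodies, the exact entity choices, and the `renderExternalContext` helper are hypothetical; only the `<external-context>` element and its `name=` / `session=` attributes are taken from the commit message.

```typescript
// Sketch only: function bodies and renderExternalContext are hypothetical.

// Element content: neutralize markup characters.
function escape(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values additionally need both quote characters escaped, otherwise
// a peer-controlled name could close the attribute and inject a new one.
function escapeAttr(value: string): string {
  return escape(value).replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

function renderExternalContext(name: string, session: string, body: string): string {
  return (
    `<external-context name="${escapeAttr(name)}" session="${escapeAttr(session)}">` +
    `${escape(body)}</external-context>`
  );
}
```

For example, a sender name of `x" evil="1` stays inside the `name` attribute as `x&quot; evil=&quot;1` instead of producing a second attribute.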
s2s tool (invite/accept/msg/leave/relay) gated behind experimentalS2S: single-use 10-min invite tokens, durable bidirectional s2s_allow consent, peers addressed by globally-unique session_id (accept reports the inviter's id). Same-process sends hit the in-process inbox; cross-process persist to s2s_inbox for the recipient's poller. Outbound 50/hr is a SOFT per-process throttle (documented as such in code + s2s.txt); the durable cross-process bound is the recipient INBOX_CAP (exact now that delivered rows are deleted). Registry wires S2SStore into ToolRegistry; message tool gains peer-slug send (message_allow). TUI renders the ✉ inbox marker (session name + id) and the session-list surface.
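A soft per-process throttle of the kind described (50 outbound sends per hour) could look like the sketch below. The commit does not say whether the window is sliding or fixed, so the sliding window here is an assumption, as are the class and method names. Being process-local, it resets on restart and is not shared across processes, which is why the text treats the recipient-side INBOX_CAP as the durable bound.

```typescript
// Sketch only: a sliding-window limiter. The window style and all names are
// assumptions; the 50-per-hour default mirrors the figure in the commit message.
class SoftThrottle {
  private readonly limit: number;
  private readonly windowMs: number;
  private sentAt: number[] = [];

  constructor(limit: number = 50, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the send if the caller is under the limit.
  tryAcquire(now: number = Date.now()): boolean {
    // Forget sends that have aged out of the window.
    this.sentAt = this.sentAt.filter((t) => now - t < this.windowMs);
    if (this.sentAt.length >= this.limit) {
      return false;
    }
    this.sentAt.push(now);
    return true;
  }
}
```

A caller would check `tryAcquire()` before each outbound send and surface a tool error when it returns false.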
…s on session deletion
…eted events for comms dashboard (cherry picked from commit 939ffdd61748fed5ae41429be9d0b80e9ea3992a)
- interrupt.ts: remove defaultLayer (deleted at dev), fix node to object form
- interrupt.test.ts: migrate from EventV2Bridge.defaultLayer to LayerNode.compile
- task-interrupt.test.ts: migrate from Layer.mergeAll(defaultLayer) to LayerNode group/compile
- task.test.ts: add Interrupt.node to test group (registry gained the dep)

- Rewrite LayerNode.make positional→object form in poller/store
- Add defaultLayer re-exports (raw layer) to 37 modules
- Export layer variable for modules used via .layer access
- Rewrite coordinator-messaging + s2s tests to LayerNode.group pattern
- Switch tests to testEffectShared for Database memoMap sharing
- Add NodePath to runLoopInfra for CrossSpawnSpawner deps
- Create task-event.ts schema + manifest registration
- Regenerate SDK types (task.completed, messaging.peer_sent, s2s.delivered)
- Fix topology-repro.test.ts positional LayerNode.make + buildLayer→compile
…ator suites

SessionProjector.node added to the s2s/coordinator runLoop harnesses (session readback needs projection at current dev), the reaper harness gets an explicit EventV2 layer, the poller runLoop shares the file-level :memory: database via node replacement, and the fork-fiber-sensitive suites (poller, coordinator runLoop) build per-test layers (testEffect) instead of a shared memoMap build so forked poller/drain fibers see the same instances as the test assertions.
Drops the defaultLayer = layer re-exports from 28 modules nothing consumes; the 8 that remain (Agent, Config, Session, Truncate, Messaging, S2SStore, EventV2Bridge, CrossSpawnSpawner) back the three s2s test harnesses that still compose with Layer.mergeAll. Follow-up: convert those harnesses to LayerNode and delete the bridge entirely.
Add optional slug, agent, model, variant, elapsedMs, tokens, and cost fields to the task.completed event. A shared completedPayload helper reads session metadata and sums tokens/cost from child assistant messages, used at all 9 publish sites instead of duplicated logic.
Wrap completedPayload assembly in Effect.exit so any defect during session/message reads falls back to the base payload instead of killing the publish path. Use optional chaining and nullish defaults for token/cost field access to tolerate missing or malformed assistant rows. Rewrite test to observe the actual published event via Deferred + listen instead of relying on message persistence alone.
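The defensive pattern in the last two commits (sum tokens and cost with nullish defaults, and fall back to the base payload if the reads fail) can be sketched in plain TypeScript. The real code wraps the assembly in Effect.exit; the try/catch below is a stand-in for that, and every type and field name here is hypothetical.

```typescript
// Sketch only: try/catch stands in for Effect.exit, and all field names
// (tokens.input, tokens.output, cost, sessionID) are assumptions.
interface AssistantRow {
  tokens?: { input?: number; output?: number } | null;
  cost?: number | null;
}

interface BasePayload {
  sessionID: string;
}

interface EnrichedPayload extends BasePayload {
  tokens?: number;
  cost?: number;
}

type Rows = ReadonlyArray<AssistantRow | null | undefined>;

// Optional chaining plus nullish defaults tolerate missing or malformed rows.
function sumUsage(rows: Rows): { tokens: number; cost: number } {
  let tokens = 0;
  let cost = 0;
  for (const row of rows) {
    tokens += (row?.tokens?.input ?? 0) + (row?.tokens?.output ?? 0);
    cost += row?.cost ?? 0;
  }
  return { tokens, cost };
}

// Any failure while reading rows degrades to the base payload, so the
// publish path still emits an event.
function completedPayload(base: BasePayload, readRows: () => Rows): EnrichedPayload {
  try {
    return { ...base, ...sumUsage(readRows()) };
  } catch {
    return base;
  }
}
```

The point of the fallback is that enrichment is optional: a consumer that only needs to know the task finished should never miss the event because a token sum could not be computed.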
Three blocking bugs in wake-on-message (97ae2bd), all in the instance-context/fork wiring:

1. registerWakeHandler was called at layer-build time without yield*, so the Effect was constructed and discarded. Moved the (now yield*ed) registration into loop()'s body, which always runs inside a fiber that has InstanceRef from the caller (HTTP request / CLI run), mirroring how the C-prime wake-poller is wired in the same function. Re-registering on every loop() call is idempotent.
2. Messaging.wakeIfIdle forked the handler via bare Effect.runFork, which starts a fresh top-level runtime with an empty context and drops InstanceRef, so the forked prompt.loop died the instant it touched any InstanceState-scoped service. Replaced both call sites with attach(handler(target)).pipe(Effect.forkIn(scope)): attach re-provides the caller fiber's InstanceRef/WorkspaceRef before the fork runs, and scope is a new layer-lifetime Scope so the fiber doesn't leak past Messaging's lifetime.
3. wake.test.ts only exercised spy handlers against a minimal Messaging-only layer. It never ran the real registration line or a real prompt.loop, so 786 tests passed over an entirely dead feature. Added wake-real-path.test.ts, which builds the full run-loop layer, drives a real prompt.loop, sets a real wake policy, and enqueues through the real Messaging.enqueue; it asserts the drain's observable side effect (the ✉ inbox marker) appears without a second explicit loop() call.

Verified red/green against both bugs individually before landing the fix.
Pre-existing base drift, not introduced by task-signals: the hardcoded Latest.size assertion was already failing at base 11e2b8a (Received 97, Expected 88) — the fork's accumulated messaging/interrupt/s2s/task events grew the manifest without updating this constant. Empirically verified identical size (97) at base and HEAD. Fixed here so the integration branches go green.
Issue for this PR
Related to #19215 (agent-team coordination) — the wake-on-message piece. The other three features are new task-tool capabilities without a tracking issue.
Type of change
What does this PR do?
Four controls over what flows in and out of a Task dispatch. All default to today's behavior.
- `task_return` tool: structured child results. A subagent calls `task_return({ result: {...} })` to set a free-form JSON object (≤4KB) on its own session row; it lands in the completion frame and on a new optional `result` field of the `task.completed` event. This is the output symmetry to the `metadata` input param: reviewers/coordinators can read a machine result instead of regex-parsing a "VERDICT:" line out of prose. Oversize is a model-visible tool error, not a silent truncation. The result also renders in the tool call itself, not only the completion frame.
- `completion: "full" | "terse"`. A background subagent's full completion body is injected verbatim into the parent, which rewrites the parent's cache tail every completion; on orchestration-heavy runs that's dozens of 2–6K-token cache busts. Terse mode replaces the frame body with a digest: the structured `task_return` result, the last 500 chars of the final message (verdict lines live at the tail), and a pointer to the child session for the full output. The parent can always resume the child to read the rest.
- `context: "full" | "sparse"`. A `general`-style dispatch pays ~35K first-turn input tokens; ~12–14K of it is instructional payload a scoped child never uses. Sparse keeps the project AGENTS.md chain, the agent prompt, tool schemas, and env; it drops global instruction files, the skills block, and the MCP-instructions block, and skips assembling them rather than discarding the output. Plugin system-prompt transforms still run (they're post-assembly), so plugins keep their say.
- `wake_on_message: true`. Today a sibling/coordinator message to an idle child sits undelivered until something else wakes it. With this flag, an idle child is woken to drain its inbox when a message lands. Opt-in per dispatch (an always-on wake reintroduces interrupt storms), budgeted at 5 wakes per run (refreshed when the parent resumes the child), and never wakes a child whose ancestor session chain is gone. The inbox drops an identical sender+body redelivery within a bounded window before it can re-wake, so distinct messages each wake but a duplicate does not.

Terse and sparse both read from a new `task` config family (`task.completion` / `task.context`) with precedence global config → project config → agent frontmatter → dispatch param, so an orchestration-heavy setup sets the default once and overrides per dispatch.

How did you verify your code works?
Red-first throughout.
- `bun test` across `test/tool/ test/session/ test/messaging/ test/s2s/`: 795 pass / 0 fail; typecheck clean on `opencode`, `core`, `schema`, `tui`.
- Two migrations (`result`, `context_mode`), each column mapped through all six parallel session-row sites (V1 fromRow/toRow, projector, V1 SessionInfo, V2 fromRow, V2 Info). A missed site is silent data loss, and one was caught on the V2 read path in review before it shipped.
- The wake handler was originally forked with bare `Effect.runFork`, which runs against the default runtime and does not carry the per-request `InstanceRef` the `Messaging`/`SessionStatus` services resolve through, so `prompt.loop` would die on a missing-`InstanceRef` defect in production while every spy-handler unit test passed green. A live probe (`child-ref= MISSING`) confirmed it. The fix captures the instance context with `attach(...).pipe(Effect.forkIn(scope))` (the pattern the s2s poller uses) and adds an integration test that stands up the real `SessionPrompt` layer and drives a real `prompt.loop` drain; it goes red if either the registration or the context-carrying fork is reverted.

Live measurements (rebuilt binary). End to end on real dispatches:

- A `general` dispatch assembled 38,266 tokens on full vs 26,524 on sparse, ~31% off, matching the predicted ~12K drop (global instructions + skills + MCP blocks).
- `context_mode` round-trips the session row, so a sparse child stays sparse across resume.
- `task_return`: a live dispatch set a structured result that persisted to the `session.result` column (`{"verdict":"ok",...}`) and rendered in both the completion frame and the tool call.

Independent cross-process corroboration. An independent OpenCode configuration deployment running as a separate process verified several of these against this feature branch, from the other side:

- `task.completed` `result` field. In one run a reviewer emitted only a `task_return` (no prose "VERDICT:" line) and the plugin still wrote a correct scored row (REJECT / must=3 / should=1) computed purely from the structured result, with no regex text to fall back on. The same `task_return` object appeared in all three sinks (completion frame, `session.result` column, and the parsed event field), and the enrichment payload was well-formed with no off-contract fields (small sample).
- A worker dispatched with `wake_on_message: true` returned to `completed`, then a sibling coordinator sent four distinct work items fire-and-forget. The worker woke four times and processed all four in send order, including two sent in the same second (delivered in order with none dropped), reaching 13 assistant turns on a child prompted once. Its own message-sequence counter ran 1→4 across the wakes, so session context persisted between wakes rather than resetting per delivery, and the worker collected results to a file the parent read rather than messaging back.

Limits: terse frame-token savings were measured only here, not independently reproduced. The numbers above are assembled-token and frame-volume counts, not a direct measurement of cache-tail-rewrite cost on a long-lived parent, which the tool-test harness can't observe. Duplicate delivery was exercised only through the enqueue dedup (identical sender+body within the window); a duplicate outside the window, or two distinct items with identical body text, is not covered, which is a concern only for mutating work modes.
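The wake budget and the enqueue dedup discussed in this section interact, so a small model may help when reasoning about the uncovered cases. The sketch below is an assumption-laden illustration, not the Messaging implementation: the class name, the 30-second default window, the rule that a message arriving with no budget left is queued without a wake, and the treatment of the child as always idle are all hypothetical. Only the budget of 5, the refresh on parent resume, and the identical sender+body dedup come from the description.

```typescript
// Sketch only: models an always-idle child. The window length and the
// "queued without wake" behavior when the budget is spent are assumptions.
type EnqueueOutcome = "woken" | "queued" | "duplicate";

class WakeInboxModel {
  private readonly lastAccepted = new Map<string, number>();
  private readonly budget: number;
  private readonly windowMs: number;
  private wakesLeft: number;
  readonly pending: Array<{ sender: string; body: string }> = [];

  constructor(budget: number = 5, windowMs: number = 30_000) {
    this.budget = budget;
    this.windowMs = windowMs;
    this.wakesLeft = budget;
  }

  enqueue(sender: string, body: string, now: number): EnqueueOutcome {
    // JSON encoding keeps ("a", "b|c") and ("a|b", "c") from colliding.
    const key = JSON.stringify([sender, body]);
    const previous = this.lastAccepted.get(key);
    if (previous !== undefined && now - previous < this.windowMs) {
      return "duplicate"; // dropped before it can trigger another wake
    }
    this.lastAccepted.set(key, now);
    this.pending.push({ sender, body });
    if (this.wakesLeft === 0) {
      return "queued"; // delivered on the next drain, but no wake is spent
    }
    this.wakesLeft -= 1;
    return "woken";
  }

  // Called when the parent resumes the child.
  refreshBudget(): void {
    this.wakesLeft = this.budget;
  }
}
```

In this model, two distinct work items that happen to share sender and body text inside the window collapse into one, which is the uncovered case the limits paragraph flags for mutating work modes.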
Screenshots / recordings
Not a UI change of note (the `task_return` result now shows in the tool call output; no new UI surface).

Checklist