diff --git a/docs/ai-chat/changelog.mdx b/docs/ai-chat/changelog.mdx
index a972ca368a..08a1832ff6 100644
--- a/docs/ai-chat/changelog.mdx
+++ b/docs/ai-chat/changelog.mdx
@@ -4,6 +4,54 @@ sidebarTitle: "Changelog"
description: "Pre-release updates for AI chat agents."
---
+
+
+## HITL continuations — slim wire by default + field-level merge
+
+`chat.addToolOutput(...)` and `chat.addToolApproveResponse(...)` continuations on reasoning-heavy agent loops used to fail in one of two ways. Either the wire body crossed the `/in/append` cap (encrypted reasoning blobs plus tool input routinely exceed 512 KiB), or apps that slimmed the wire as a workaround landed a tool call with no `arguments` on the next LLM step, because the per-turn merge replaced the hydrated message wholesale instead of overlaying only the new tool-state advance. Both failure modes are fixed.
+
+The transport (`TriggerChatTransport.sendMessages`, `AgentChat.sendRaw`) now slims the assistant message itself on `submit-message` turns whose assistant carries resolved or approval-responded tool parts. The wire shape ships as `{ id, role: "assistant", parts }`, where `parts` holds only the resolved tool parts: `state` plus `output` / `errorText` / `approval`, depending on the new state. Everything else (reasoning blobs, prior text, tool `input`, provider metadata) is reconstructed server-side from `hydrateMessages` or the durable snapshot. Continuation payloads typically drop from 600 KiB – 1 MiB to ~1 KiB.
+
+The per-turn merge now overlays only the tool-part state advances (`output-available` / `output-error` / `approval-responded` / `output-denied`) from the wire copy onto the matching hydrated entry. Hydrated `input`, text, reasoning, and provider metadata stay put. The agent still accepts a fuller `UIMessage` on the wire (the merge only reads the resolved fields), so custom transports that ship more don't break — they just waste bytes.
+
+### `hydrateMessages` upsert-by-id
+
+If your `hydrateMessages` hook persists the incoming message, **upsert by id** — don't unconditionally push. HITL continuations ship the existing assistant's id with a slim payload; a blind `stored.push(newMsg)` duplicates the row in the chain you return, the merge updates the first match, and the slim duplicate hits `toModelMessages` with no `input`.
+
+A new `upsertIncomingMessage` helper is exported from `@trigger.dev/sdk/ai` to handle this for the common case:
+
+```ts
+import { chat, upsertIncomingMessage } from "@trigger.dev/sdk/ai";
+
+chat.agent({
+ hydrateMessages: async ({ chatId, trigger, incomingMessages }) => {
+ const record = await db.chat.findUnique({ where: { id: chatId } });
+ const stored = record?.messages ?? [];
+ if (upsertIncomingMessage(stored, { trigger, incomingMessages })) {
+ await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
+ }
+ return stored;
+ },
+});
+```
+
+The helper pushes fresh user messages, no-ops on HITL continuations (so the runtime can overlay the new tool-state advance), and skips non-`submit-message` triggers. It returns `true` if it mutated `stored`. The examples in [lifecycle hooks](/ai-chat/lifecycle-hooks#hydratemessages), [Database persistence](/ai-chat/patterns/database-persistence#alternative-hydratemessages), and [Persistence and replay](/ai-chat/patterns/persistence-and-replay) have all been updated. Custom hydrate logic (branching, rollback, etc.) can still write the upsert by hand; the helper is a convenience for the common shape.
+
+### `onValidateMessages` slim wire caveat
+
+The slim wire is what arrives in `onValidateMessages` on HITL turns. `validateUIMessages` from `ai` rejects the slim shape (the AI SDK schema requires `input` on resolved tool parts), so filter to user messages first (or skip validation entirely on those turns). See the updated example in [lifecycle hooks](/ai-chat/lifecycle-hooks#onvalidatemessages).
+
+### `/in/append` 413 + precise cap
+
+In parallel:
+
+- The 413 response now carries CORS headers, so browser fetches can read the status instead of failing as opaque `TypeError: Failed to fetch`. App-side retry-on-disconnect loops no longer spin forever on a permanently-rejected payload.
+- The per-record cap is now computed precisely against S2's actual ceiling instead of the conservative 512 KiB floor. Legitimate ~600 – 900 KiB tool outputs (search results, file content) now succeed; pathological all-quote content that would double under JSON escape still rejects cleanly with a clear error.
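A retry loop can act on the now-readable status by treating `413` as permanent. The snippet below is a minimal sketch, assuming the status semantics listed in the client protocol table (`500` is transient and safe to retry; `401`, `403`, `409`, and `413` are not). `shouldRetryAppend` is a hypothetical name for illustration, not an SDK export.

```typescript
// Hypothetical helper (not part of the SDK): decide whether a failed
// `/in/append` call is worth retrying, based on the documented statuses.
function shouldRetryAppend(status: number): boolean {
  // 500 is documented as a transient backend failure that is safe to retry.
  // 401 / 403 / 409 / 413 describe a request that will keep failing as-is.
  // Assumption: any other status is also treated as non-retryable here.
  return status === 500;
}

// Example: a 413 stops the loop, a 500 allows another attempt.
const retryOversized = shouldRetryAppend(413);
const retryTransient = shouldRetryAppend(500);
```

A real loop would likely also cap the number of attempts and back off between them; that policy is left out of the sketch.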
+
+See the updated [413 row in the client protocol](/ai-chat/client-protocol#step-3-send-messages-stops-and-actions).
+
+
+
## v4.5.0-rc.1 — two bug fixes
diff --git a/docs/ai-chat/client-protocol.mdx b/docs/ai-chat/client-protocol.mdx
index 548f428339..0a94327b78 100644
--- a/docs/ai-chat/client-protocol.mdx
+++ b/docs/ai-chat/client-protocol.mdx
@@ -692,7 +692,7 @@ The body is a JSON-serialized [`ChatInputChunk`](#chatinputchunk) — a tagged u
| `401` | Missing or invalid `Authorization` header. |
| `403` | Token doesn't carry `write:sessions:{externalId}`. |
| `409` | The session is closed — `{ "ok": false, "error": "Cannot append to a closed session" }`. |
-| `413` | Body exceeds 512 KiB. A normal `kind: "message"` payload is a few KB; if you hit this you're shipping more than one message per record. |
+| `413` | Body exceeds 1 MiB **or** the wrapped record would exceed S2's ~1 MiB per-record metered ceiling. A normal `kind: "message"` payload is a few KB; if you hit this you're shipping more than one message per record or pushing a single tool output that's itself oversized. Carries CORS headers so browser fetches can read the status. |
| `500` | Transient backend failure on the durable stream. Safe to retry — appends are idempotent on `(externalId, X-Part-Id)` if you set the optional `X-Part-Id` request header (the built-in clients set it from a UUID). |
@@ -851,7 +851,7 @@ The agent trims trailing assistant messages from its accumulator and re-streams
### Tool approval responses
-When a tool requires approval (`needsApproval: true`), the agent streams the tool call with an `approval-requested` state and completes the turn. After the user approves or denies, send the **updated assistant message** (with `approval-responded` tool parts) back as a `kind: "message"` chunk — singular, not the full chain:
+When a tool requires approval (`needsApproval: true`), the agent streams the tool call with an `approval-requested` state and completes the turn. After the user approves or denies, send the **updated assistant message** back as a `kind: "message"` chunk — singular, not the full chain. The minimum shape the agent reads is just the resolved tool parts:
```json
{
@@ -861,12 +861,10 @@ When a tool requires approval (`needsApproval: true`), the agent streams the too
"id": "asst-msg-1",
"role": "assistant",
"parts": [
- { "type": "text", "text": "I'll send that email for you." },
{
"type": "tool-sendEmail",
"toolCallId": "call-1",
"state": "approval-responded",
- "input": { "to": "user@example.com", "subject": "Hello" },
"approval": { "id": "approval-1", "approved": true }
}
]
@@ -878,7 +876,11 @@ When a tool requires approval (`needsApproval: true`), the agent streams the too
}
```
-The agent matches the incoming message by `id` against the rebuilt accumulator. If a match is found, it **replaces** the existing message instead of appending.
+The agent matches the incoming message by `id` against the rebuilt accumulator (or hydrated chain) and **overlays the tool-state advance** onto the matching entry — `state` plus `output` / `errorText` / `approval`, depending on the new state. Hydrated `input`, text, reasoning, and provider metadata stay put. This is what makes the slim shape above sufficient: the agent rebuilds everything else from the snapshot or from your `hydrateMessages` hook.
+
+The same shape applies to HITL `addToolOutput` answers: substitute `state: "output-available"` and an `output` field for the approval pair above. Single-tool HITL `addToolOutput` continuation payloads are typically ~1 KiB on the wire.
+
+The built-in transports (`TriggerChatTransport`, `AgentChat`) ship the slim shape by default on `submit-message` continuations. Custom transports can ship a fuller `UIMessage` — the agent still only reads the resolved tool-part fields — but the slim shape is the most efficient and avoids brushing the per-record cap on reasoning-heavy turns.
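For a custom transport, the slimming step might look like the sketch below. It is an illustration only: the `Part` and `Message` types are simplified stand-ins for the AI SDK's `UIMessage`, and `slimAssistantMessage` is a hypothetical name, not the function the built-in transports use.

```typescript
// Simplified, hypothetical shapes; the real `UIMessage` type is richer.
type Part = {
  type: string;
  text?: string;
  toolCallId?: string;
  state?: string;
  input?: unknown;
  output?: unknown;
  errorText?: string;
  approval?: { id: string; approved: boolean };
};
type Message = { id: string; role: string; parts: Part[] };

// Tool-part states that count as a resolution on the wire.
const RESOLVED_STATES = [
  "output-available",
  "output-error",
  "approval-responded",
  "output-denied",
];

// Keep only resolved tool parts, and on each keep only the fields the
// agent reads: identity, `state`, and `output` / `errorText` / `approval`.
function slimAssistantMessage(message: Message): Message {
  const parts = message.parts
    .filter((p) => p.state !== undefined && RESOLVED_STATES.includes(p.state))
    .map((p) => {
      const slim: Part = { type: p.type, toolCallId: p.toolCallId, state: p.state };
      if (p.output !== undefined) slim.output = p.output;
      if (p.errorText !== undefined) slim.errorText = p.errorText;
      if (p.approval !== undefined) slim.approval = p.approval;
      return slim;
    });
  return { id: message.id, role: message.role, parts };
}
```

Text parts, reasoning, and tool `input` are dropped here because the agent restores them from the snapshot or from `hydrateMessages`.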
The message `id` must match the one the agent assigned during streaming. `TriggerChatTransport` keeps IDs in sync automatically. Custom transports should use the `messageId` from the stream's `start` chunk.
@@ -938,7 +940,7 @@ To bridge that gap, the head-start route handler ships **full UIMessage history*
Two reasons this exception is safe:
-1. **The route handler runs against the customer's own HTTP endpoint**, not `/realtime/v1/sessions/{id}/in/append`. The 512 KiB body cap on the realtime route doesn't apply.
+1. **The route handler runs against the customer's own HTTP endpoint**, not `/realtime/v1/sessions/{id}/in/append`. The per-record cap on the realtime route doesn't apply.
2. **`headStartMessages` is only honored on `trigger: "handover-prepare"`**. The runtime ignores the field on every other trigger — the one-message-per-record rule still holds for normal turns.
After turn 1 completes, the snapshot is written and turn 2+ run as a normal single-message-per-record chat.
@@ -1067,7 +1069,7 @@ No. `seq_num` is monotonic across the entire session — turn 1 might emit seq 0
-512 KiB. A typical `kind: "message"` is a few KB. If you're brushing the cap you're shipping more than one message per record, which the protocol forbids. The headStart path (`trigger: "handover-prepare"`) sends through the customer's own HTTP route handler, not `.in/append`, so the cap doesn't apply there.
+The HTTP body is capped at 1 MiB as a DoS guard. The actual ceiling is at the storage layer: each `.in/append` becomes a single S2 record, metered as `8 + body_bytes_after_JSON_wrap`, capped at 1 MiB. So the practical limit on the raw HTTP body is roughly 1023 KiB for content with low JSON-escape overhead (ASCII, base64) and roughly 512 KiB for content that escapes heavily (all quotes / backslashes). A typical `kind: "message"` is a few KiB. If you're brushing the cap, you're either shipping a single tool output that's itself oversized (see [Large payloads](/ai-chat/patterns/large-payloads)) or shipping more than one message per record, which the protocol forbids. The 413 response carries CORS headers so browser fetches can read the status. The headStart path (`trigger: "handover-prepare"`) sends through the customer's own HTTP route handler, not `.in/append`, so the cap doesn't apply there.
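To see why the practical limit depends on escaping, the metering formula can be approximated client-side. The sketch below assumes the "JSON wrap" behaves like `JSON.stringify` applied to the raw body string (quotes and backslashes double, two surrounding quotes are added); the real server-side wrapper may add more envelope bytes, so treat the result as an estimate. All names here are hypothetical.

```typescript
const RECORD_CAP_BYTES = 1024 * 1024; // 1 MiB per-record ceiling
const RECORD_OVERHEAD_BYTES = 8; // the constant term in the metering formula

// UTF-8 byte length without relying on TextEncoder.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const next = text.charCodeAt(i + 1); // NaN past the end
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // valid surrogate pair encodes as one 4-byte sequence
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Assumption: wrapping is modelled as JSON string escaping of the body.
function estimateMeteredBytes(rawBody: string): number {
  return RECORD_OVERHEAD_BYTES + utf8ByteLength(JSON.stringify(rawBody));
}

function fitsRecordCap(rawBody: string): boolean {
  return estimateMeteredBytes(rawBody) <= RECORD_CAP_BYTES;
}

// 1000 KiB of plain ASCII fits; 512 KiB of quotes doubles and does not.
const asciiFits = fitsRecordCap("a".repeat(1000 * 1024));
const quotesFit = fitsRecordCap('"'.repeat(512 * 1024));
```

Under this model `asciiFits` is `true` and `quotesFit` is `false`, which matches the two ends of the range described above.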
## See also
diff --git a/docs/ai-chat/lifecycle-hooks.mdx b/docs/ai-chat/lifecycle-hooks.mdx
index c6ea62cbc8..f1e9ce9361 100644
--- a/docs/ai-chat/lifecycle-hooks.mdx
+++ b/docs/ai-chat/lifecycle-hooks.mdx
@@ -242,7 +242,11 @@ import { validateUIMessages } from "ai";
export const myChat = chat.agent({
id: "my-chat",
onValidateMessages: async ({ messages }) => {
- return validateUIMessages({ messages, tools: chatTools });
+ const userMessages = messages.filter((m) => m.role === "user");
+ if (userMessages.length > 0) {
+ await validateUIMessages({ messages: userMessages, tools: chatTools });
+ }
+ return messages;
},
run: async ({ messages, signal }) => {
return streamText({ model: anthropic("claude-sonnet-4-5"), messages, tools: chatTools, abortSignal: signal });
@@ -250,6 +254,10 @@ export const myChat = chat.agent({
});
```
+
+ On HITL continuations (`addToolOutput` / `addToolApproveResponse`) the assistant entry in `messages` is **slim** — `state` + `output` / `errorText` / `approval` only, no `input` or other parts. `validateUIMessages` against the AI SDK schema rejects that shape (the schema requires `input` on resolved tool parts), so filter to user messages first (or skip validation entirely on those turns). The example above does the filter.
+
+
`onValidateMessages` fires **before** `onTurnStart` and message accumulation. If you need to validate messages loaded from a database, do the loading in `onChatStart` or `onPreload` and let `onValidateMessages` validate the full incoming set each turn.
@@ -272,16 +280,15 @@ Use this when the backend should be the source of truth for message history: abu
| `previousRunId` | `string \| undefined` | The previous run ID (if continuation) |
```ts
+import { chat, upsertIncomingMessage } from "@trigger.dev/sdk/ai";
+
export const myChat = chat.agent({
id: "my-chat",
hydrateMessages: async ({ chatId, trigger, incomingMessages }) => {
const record = await db.chat.findUnique({ where: { id: chatId } });
const stored = record?.messages ?? [];
- // Append the new user message and persist
- if (trigger === "submit-message" && incomingMessages.length > 0) {
- const newMsg = incomingMessages[incomingMessages.length - 1]!;
- stored.push(newMsg);
+ if (upsertIncomingMessage(stored, { trigger, incomingMessages })) {
await db.chat.update({
where: { id: chatId },
data: { messages: stored },
@@ -296,9 +303,13 @@ export const myChat = chat.agent({
});
```
+`upsertIncomingMessage` (exported from `@trigger.dev/sdk/ai`) handles the three cases that matter — fresh user messages get pushed, HITL continuations (`addToolOutput` / `addToolApproveResponse`) no-op because the incoming wire shares the existing assistant's id and the runtime overlays the new tool-state advance onto that entry, and non-`submit-message` triggers (`regenerate-message` / `action`) skip persistence. It returns `true` when it mutated `stored`, so the caller knows whether to persist.
+
+If you need branching, rollback, or other custom hydrate logic, you can still write the upsert by hand — `upsertIncomingMessage` is a convenience for the common case, not the only supported shape.
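As a starting point for hand-written logic, the three cases can be sketched as follows. This is an illustrative approximation of the behavior described above, not the SDK's implementation; `upsertByIdSketch` and the simplified `Message` type are hypothetical.

```typescript
// Simplified, hypothetical message shape for illustration.
type Message = { id: string; role: string; parts: unknown[] };

// Returns true when `stored` was mutated, so the caller knows to persist.
function upsertByIdSketch(
  stored: Message[],
  trigger: string,
  incomingMessages: Message[]
): boolean {
  // Non-`submit-message` triggers (regenerate, action) skip persistence.
  if (trigger !== "submit-message" || incomingMessages.length === 0) return false;

  const incoming = incomingMessages[incomingMessages.length - 1]!;

  // HITL continuation: the id already exists, so leave the hydrated row
  // alone and let the runtime overlay the new tool state onto it.
  if (stored.some((m) => m.id === incoming.id)) return false;

  // Fresh message: append it.
  stored.push(incoming);
  return true;
}
```

Branching or rollback logic would replace the final `push` with whatever placement your history model needs; the id check is the part that prevents the slim duplicate.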
+
**Lifecycle position:** `onValidateMessages` → **`hydrateMessages`** → `onChatStart` (chat's first message only) → `onTurnStart` → `run()`
-After the hook returns, any incoming wire message whose ID matches a hydrated message is auto-merged. This makes [tool approvals](/ai-chat/frontend#tool-approvals) work transparently with hydration.
+After the hook returns, the runtime overlays the wire's tool-state advances (`output-available` / `output-error` / `approval-responded` / `output-denied`) onto matching hydrated entries by id. Everything else on the hydrated entry — text, reasoning, tool `input`, providerMetadata — stays put. This makes [tool approvals](/ai-chat/frontend#tool-approvals) and HITL `addToolOutput` continuations work transparently: ship a slim resolution on the wire, the agent merges the new state onto your DB-backed copy.
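The overlay can be pictured with the sketch below. It is a simplified model of the merge described above, written under assumptions: the `Part` and `Message` types are stand-ins for the AI SDK's `UIMessage`, and `overlayToolState` is a hypothetical name, not the runtime's internal function.

```typescript
// Simplified, hypothetical shapes; the real `UIMessage` type is richer.
type Part = {
  type: string;
  text?: string;
  toolCallId?: string;
  state?: string;
  input?: unknown;
  output?: unknown;
  errorText?: string;
  approval?: { id: string; approved: boolean };
};
type Message = { id: string; role: string; parts: Part[] };

const ADVANCE_STATES = [
  "output-available",
  "output-error",
  "approval-responded",
  "output-denied",
];

// Copy only the tool-state advance from the wire message onto the hydrated
// entry with the same id. Text, reasoning, and tool `input` are untouched.
function overlayToolState(hydrated: Message[], wire: Message): Message[] {
  return hydrated.map((msg) => {
    if (msg.id !== wire.id) return msg;
    const parts = msg.parts.map((part) => {
      const advance = wire.parts.find(
        (p) =>
          p.toolCallId !== undefined &&
          p.toolCallId === part.toolCallId &&
          p.state !== undefined &&
          ADVANCE_STATES.includes(p.state)
      );
      if (!advance) return part;
      const merged: Part = { ...part, state: advance.state };
      if (advance.output !== undefined) merged.output = advance.output;
      if (advance.errorText !== undefined) merged.errorText = advance.errorText;
      if (advance.approval !== undefined) merged.approval = advance.approval;
      return merged;
    });
    return { ...msg, parts };
  });
}
```

Because the merge reads only the resolved fields, a fuller wire message produces the same result as a slim one in this model.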
`hydrateMessages` also fires for [action](/ai-chat/actions) turns (`trigger: "action"`) with empty `incomingMessages`. This lets the action handler work with the latest DB state.
diff --git a/docs/ai-chat/patterns/database-persistence.mdx b/docs/ai-chat/patterns/database-persistence.mdx
index 5ee32f8a6b..0bfc447f31 100644
--- a/docs/ai-chat/patterns/database-persistence.mdx
+++ b/docs/ai-chat/patterns/database-persistence.mdx
@@ -178,14 +178,19 @@ For apps that need the backend to be the single source of truth for message hist
With hydration, the hook loads messages from your database on every turn. The frontend's messages are ignored (except for the new user message, which arrives in `incomingMessages`):
```ts
+import { chat, upsertIncomingMessage } from "@trigger.dev/sdk/ai";
+
export const myChat = chat.agent({
id: "my-chat",
hydrateMessages: async ({ chatId, trigger, incomingMessages }) => {
const record = await db.chat.findUnique({ where: { id: chatId } });
const stored = record?.messages ?? [];
- if (trigger === "submit-message" && incomingMessages.length > 0) {
- stored.push(incomingMessages[incomingMessages.length - 1]!);
+ // `upsertIncomingMessage` pushes a fresh user message and no-ops
+ // on HITL continuations (the runtime overlays the new tool-state
+ // advance onto the existing entry). See lifecycle hooks for the
+ // full pattern: /ai-chat/lifecycle-hooks#hydratemessages
+ if (upsertIncomingMessage(stored, { trigger, incomingMessages })) {
await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
}
diff --git a/docs/ai-chat/patterns/persistence-and-replay.mdx b/docs/ai-chat/patterns/persistence-and-replay.mdx
index f1008dda26..4e1bdf4084 100644
--- a/docs/ai-chat/patterns/persistence-and-replay.mdx
+++ b/docs/ai-chat/patterns/persistence-and-replay.mdx
@@ -131,7 +131,7 @@ If `onAction` mutates `chat.history.*` and then the run crashes before the next
When the customer registers a [`hydrateMessages`](/ai-chat/lifecycle-hooks#hydratemessages) hook, the runtime trusts the hook to be the source of truth for history. Snapshot read and replay are **skipped entirely** at boot. The hook fires per turn, returns the canonical chain from the customer's database, and the accumulator is set to whatever the hook returned.
```ts
-import { chat } from "@trigger.dev/sdk/ai";
+import { chat, upsertIncomingMessage } from "@trigger.dev/sdk/ai";
import { db } from "@/lib/db";
export const myChat = chat.agent({
@@ -139,8 +139,9 @@ export const myChat = chat.agent({
hydrateMessages: async ({ chatId, trigger, incomingMessages }) => {
const stored = (await db.chat.findUnique({ where: { id: chatId } }))?.messages ?? [];
- if (trigger === "submit-message" && incomingMessages.length > 0) {
- stored.push(incomingMessages[0]!);
+ // See lifecycle-hooks for the full upsert pattern + rationale:
+ // /ai-chat/lifecycle-hooks#hydratemessages
+ if (upsertIncomingMessage(stored, { trigger, incomingMessages })) {
await db.chat.update({ where: { id: chatId }, data: { messages: stored } });
}
diff --git a/docs/ai-chat/patterns/trusted-edge-signals.mdx b/docs/ai-chat/patterns/trusted-edge-signals.mdx
index 1dd5f97d3f..181a389547 100644
--- a/docs/ai-chat/patterns/trusted-edge-signals.mdx
+++ b/docs/ai-chat/patterns/trusted-edge-signals.mdx
@@ -115,7 +115,7 @@ The body is a JSON-serialized `ChatInputChunk`. The proxy parses it, checks `kin
}
```
-Both bodies stay well under the [512 KiB cap on `/in/append`](/ai-chat/client-protocol#step-3-send-messages-stops-and-actions) — a typical trust object is ~200 bytes.
+Both bodies stay well under the [per-record cap on `/in/append`](/ai-chat/client-protocol#step-3-send-messages-stops-and-actions) — a typical trust object is ~200 bytes.
Other paths — `.out` SSE, `/api/v1/auth/jwt/claims`, anything else — pass through the proxy untouched. The SSE stream in particular must not be buffered; preserve the response body as-is.