fix(sdk): chat HITL continuations no longer break the next LLM call #3719
ericallam wants to merge 4 commits into
🦋 Changeset detected
Latest commit: 073dbc2
The changes in this PR will be included in the next version bump. This PR includes changesets to release 32 packages.
Walkthrough
This PR implements HITL message continuation handling for chat agents on reasoning-heavy turns. Outgoing assistant messages with advanced tool parts are slimmed to minimal resolution fields for
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/trigger-sdk/test/mockChatAgent.test.ts`:
- Around line 478-479: Replace the runtime dynamic imports of z and tool by
using static top-level imports: remove the `const { z } = await import("zod")`
and `const { tool } = await import("ai")` lines and add `import { z } from
"zod"` and `import { tool } from "ai"` at the top of the test file; if the file
already has static imports from "ai", merge or deduplicate so `tool` is imported
from that existing import instead of re-importing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b055c4c5-8daa-48de-a39d-85dbb9393238
📒 Files selected for processing (2)
packages/trigger-sdk/src/v3/ai.ts
packages/trigger-sdk/test/mockChatAgent.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
- GitHub Check: packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (9)
packages/trigger-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic imports. Only use dynamic `import()` when circular dependencies cannot be resolved otherwise, code splitting is needed for performance, or the module must be loaded conditionally at runtime.
Import from `@trigger.dev/core` using subpaths only - never import from the root.
When writing Trigger.dev tasks, always import from `@trigger.dev/sdk`. Never use `@trigger.dev/sdk/v3` or deprecated `client.defineJob`.
Add agentcrumbs markers (`// @crumbs` or `#region @crumbs`) as you write code, not just when debugging. They stay on the branch throughout development and are stripped by `agentcrumbs strip` before merge.
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use vitest for all tests in the Trigger.dev repository
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
packages/trigger-sdk/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (packages/trigger-sdk/CLAUDE.md)
Always import from `@trigger.dev/sdk`. Never use `@trigger.dev/sdk/v3` (deprecated path alias)
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}
📄 CodeRabbit inference engine (AGENTS.md)
Code formatting must be enforced using Prettier before committing
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
**/*.test.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptive describe and it blocks
Unit tests should use vitest framework
Tests should avoid mocks or stubs and use helpers from `@internal/testcontainers` when Redis or Postgres are needed
**/*.test.{ts,tsx,js,jsx}: Never mock anything in tests - use testcontainers instead.
Test files should be placed next to source files (e.g., `MyService.ts` -> `MyService.test.ts`).
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/**/*
📄 CodeRabbit inference engine (CLAUDE.md)
When modifying any public package (`packages/*` or `integrations/*`), add a changeset using `pnpm run changeset:add`. Default to patch for bug fixes and minor changes.
Files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
🧠 Learnings (10)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In this repo’s trigger.dev codebase, the “never mock — use testcontainers” guideline should only be applied to integration tests that talk to real external services (e.g., Redis, Postgres, S2). For unit tests that validate in-memory logic (e.g., deduplication/cache behavior in StandardRealtimeStreamsManager and similar module-boundary call counting), it is allowed to use Vitest mocks like `vi.fn()` and to stub/mock `ApiClient` objects to count calls or simulate in-process collaborators. Do not flag `vi.fn()`-based mocks as policy violations in these unit-test scenarios; reserve the rule for true external-service integration tests.
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In the triggerdotdev/trigger.dev repo, the policy “Never mock anything — use testcontainers instead” should only be enforced for integration tests that interact with real external services (e.g., Redis, Postgres) via actual infrastructure. For unit tests that exercise pure in-memory logic (e.g., cache semantics) it is OK to stub collaborators such as `ApiClient` using Vitest (`vi.fn()`) to assert call counts or control behavior. Do not flag `vi.fn()`-based `ApiClient` stubs in unit tests as violations of the testcontainers policy.
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
📚 Learning: 2026-05-19T22:37:47.286Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3671
File: packages/trigger-sdk/test/recovery-boot.test.ts:456-457
Timestamp: 2026-05-19T22:37:47.286Z
Learning: In `packages/trigger-sdk` (Trigger.dev SDK), `logger.warn` (and other SDK logger methods) should route to the Trigger.dev structured logger sink, not to `console.warn`. In SDK tests, `vi.spyOn(console, "warn")` (or similar console spies) should only be used to suppress stray console output; reviewers should not suggest asserting on `console.warn` spies to verify SDK-internal warning/fallback log behavior. Use the SDK’s structured-logger outputs/capture approach instead of console spies.
Applied to files:
packages/trigger-sdk/test/mockChatAgent.test.ts
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-03-31T21:37:27.212Z
Learnt from: isshaddad
Repo: triggerdotdev/trigger.dev PR: 3283
File: docs/migration-n8n.mdx:19-21
Timestamp: 2026-03-31T21:37:27.212Z
Learning: When reviewing code in `packages/trigger-sdk/src/v3`, treat `tasks.triggerAndWait()` and `tasks.batchTriggerAndWait()` as real exported APIs. They are defined in `shared.ts` and re-exported via the `tasks` object in `tasks.ts`, and they take the task ID string as their first argument (not a task instance). This is distinct from the instance methods `yourTask.triggerAndWait()` and `yourTask.batchTriggerAndWait()`. Do not flag calls to `tasks.triggerAndWait()` or `tasks.batchTriggerAndWait()` as non-existent or incorrectly invoked.
Applied to files:
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-05-17T08:08:12.370Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3644
File: packages/trigger-sdk/src/v3/ai.ts:8695-8746
Timestamp: 2026-05-17T08:08:12.370Z
Learning: In the Trigger v3 session resume/streams logic, ensure session resumption uses sequence cursors rather than timestamps. Specifically: for each turn-complete control record written to `session.out`, include a `session-in-event-id` header whose value is the committed-consume cursor (`session.in.lastDispatchedSeqNum`). On boot/resume, scan `session.out` for the latest turn-complete record, read the `session-in-event-id` header, and seed the `sessionStreams` manager for `.in` using both `lastSeqNum` and `lastDispatchedSeqNum` so previously processed user messages are not replayed. Do not use `setMinTimestamp`/`lastOutTimestamp` for resume ordering in this flow.
Applied to files:
packages/trigger-sdk/src/v3/ai.ts
📚 Learning: 2026-05-18T14:19:56.437Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3655
File: packages/trigger-sdk/src/v3/ai.ts:8667-8731
Timestamp: 2026-05-18T14:19:56.437Z
Learning: In the Trigger SDK (v3) when making raw `fetch` calls to the Trigger API (including override paths such as `createChatStartSessionAction`), set the request headers to match `ApiClient`: `Content-Type`, `Authorization`, and `x-trigger-source: "sdk"`. Also forward the current preview branch by setting `x-trigger-branch` to `apiClientManager.branchName`. Prefer using the shared `overrideRequestHeaders(accessToken)` helper instead of manually constructing headers, so requests route correctly to preview environments.
Applied to files:
packages/trigger-sdk/src/v3/ai.ts
🔇 Additional comments (2)
packages/trigger-sdk/src/v3/ai.ts (1)
2177-2187: LGTM!
packages/trigger-sdk/test/mockChatAgent.test.ts (1)
472-477: LGTM!
Also applies to: 480-556, 635-638
Summary
Multi-step reasoning agents with HITL tools (OpenAI Responses with `store: false`, Anthropic extended thinking, etc.) failed on `chat.addToolOutput(...)` continuations: either the wire payload blew the `.in/append` cap (reasoning blobs + tool inputs routinely > 512 KiB), or app-side slimming workarounds got overwritten server-side and the next LLM call landed a tool call with no `arguments`. Both modes are fixed.
Design
The per-turn merge in `chat.agent` now overlays only the tool-part state advances (`output-available` / `output-error` / `approval-responded` / `output-denied`) from the wire copy onto the hydrated/snapshot chain. Previously it replaced the entire message, which dropped `input`, reasoning, and text from the LLM's view whenever the wire was slim.
In parallel, `TriggerChatTransport.sendMessages` and `AgentChat.sendRaw` now slim the assistant message themselves on `submit-message` continuations: ship `{ id, role, parts: [<resolved tool part only>] }`, with everything else reconstructed server-side from `hydrateMessages` or the durable snapshot. Continuation payloads drop from 600 KiB – 1 MiB to ~1 KiB.
`references/ai-chat` `aiChatHydrated.hydrateMessages` now upserts by id instead of pushing. With slim continuations, a blind push duplicates the assistant id in the returned chain: the merge updates the first match, the slim duplicate goes straight to `toModelMessages` with no `input`, and the LLM 4xx's. This is the canonical pattern customers should mirror in their own hydrate implementations.
Test plan
`references/ai-chat`: 19 customer-side smoke tests green; HITL wire bodies confirmed at ~1 KiB (was 600 KiB+); no provider 4xx errors across OpenAI Responses or Anthropic
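The overlay merge and the upsert-by-id pattern described under Design can be illustrated with a minimal sketch. This is not the SDK's implementation: the `Message`, `ToolPart`, and `OtherPart` types and the `overlayToolAdvances` and `upsertById` helpers are hypothetical, simplified stand-ins, and merging inside the upsert (rather than leaving the merge to `chat.agent`) is an assumption made only to keep the example self-contained.

```typescript
// Hypothetical, simplified message shapes; the real SDK types are richer.
type ToolPart = {
  type: string; // e.g. "tool-askUser" (illustrative name)
  toolCallId: string;
  state: string;
  input?: unknown;
  output?: unknown;
};
type OtherPart = { type: "text" | "reasoning"; text: string };
type Part = ToolPart | OtherPart;
type Message = { id: string; role: "user" | "assistant"; parts: Part[] };

// The tool-part states the PR description lists as "state advances".
const ADVANCED_STATES = new Set([
  "output-available",
  "output-error",
  "approval-responded",
  "output-denied",
]);

function isToolPart(part: Part): part is ToolPart {
  return "toolCallId" in part;
}

// Overlay only advanced tool parts from the slim wire copy onto the stored
// message, so reasoning, text, and tool `input` are never dropped.
function overlayToolAdvances(stored: Message, wire: Message): Message {
  const advances = new Map<string, ToolPart>();
  for (const part of wire.parts) {
    if (isToolPart(part) && ADVANCED_STATES.has(part.state)) {
      advances.set(part.toolCallId, part);
    }
  }
  return {
    ...stored,
    parts: stored.parts.map((part): Part => {
      if (!isToolPart(part)) return part;
      const advance = advances.get(part.toolCallId);
      // Assumption: the stored `input` always wins over the slim wire copy.
      return advance ? { ...part, ...advance, input: part.input } : part;
    }),
  };
}

// Upsert by id instead of pushing, so a slim continuation never leaves a
// second message with the same id in the returned chain.
function upsertById(chain: Message[], incoming: Message): Message[] {
  let found = false;
  const next = chain.map((message) => {
    if (message.id !== incoming.id) return message;
    found = true;
    return overlayToolAdvances(message, incoming);
  });
  return found ? next : [...chain, incoming];
}

// Usage: a stored assistant turn plus a slim continuation for the same id.
const storedTurn: Message = {
  id: "a1",
  role: "assistant",
  parts: [
    { type: "reasoning", text: "thinking..." },
    {
      type: "tool-askUser",
      toolCallId: "call_1",
      state: "input-available",
      input: { question: "Proceed?" },
    },
  ],
};
const slimContinuation: Message = {
  id: "a1",
  role: "assistant",
  parts: [
    { type: "tool-askUser", toolCallId: "call_1", state: "output-available", output: "yes" },
  ],
};
const mergedChain = upsertById([storedTurn], slimContinuation);
```

A blind `chain.push(slimContinuation)` would instead leave two messages with id `a1`, which is the duplication the description warns about.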