feat(webapp): mollifier API mutations on buffered runs#3756
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
| }); | ||
| if (buffered) { | ||
| taskRun = buffered as unknown as TaskRun; | ||
| } |
There was a problem hiding this comment.
Mirrored from CodeRabbit on the original PR (#3709, discussion r3290070588):
Validate buffered fallback payload before treating it as TaskRun.
This branch trusts a synthetic/buffered object via as unknown as TaskRun and sends it directly into ReplayTaskRunService. Add a zod parse at this boundary (for the fields replay actually needs) before assignment to avoid runtime shape mismatches from buffer snapshots.
As per coding guidelines: "Use zod for validation in packages/core and apps/webapp".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/webapp/app/routes/api.v1.runs`.$runParam.replay.ts around lines 50 - 62,
The buffered fallback result is unsafely cast to TaskRun and passed into
ReplayTaskRunService; add a Zod validation step to assert the subset of fields
ReplayTaskRunService actually consumes (validate the `buffered` object returned
by findRunByIdWithMollifierFallback), parse it with a schema in apps/webapp or
packages/core, and only assign to taskRun after zod.parse succeeds (avoid using
`as unknown as TaskRun`); reference the buffered variable, the
findRunByIdWithMollifierFallback call, TaskRun type, and the
ReplayTaskRunService usage so the validated shape matches what the service
expects.
There was a problem hiding this comment.
Addressed in f4b6064. Added a BufferedReplayInputSchema Zod schema covering the subset of TaskRun fields ReplayTaskRunService.call actually reads (id, friendlyId, runtimeEnvironmentId, taskIdentifier, payload, payloadType, queue, isTest, traceId, spanId, engine, runTags + nullable concurrencyKey/workerQueue/machinePreset/realtimeStreamsVersion). safeParse against the buffered fallback before the cast; on schema-fail we log the issue list and the route 404s rather than letting a half-shaped object reach the service.
There was a problem hiding this comment.
@d-cs, confirmed — the implementation in the current file looks great. BufferedReplayInputSchema covers exactly the fields ReplayTaskRunService.call consumes, safeParse gates the cast, and a schema mismatch now surfaces as a logged 404 rather than a silent runtime failure deep in the service.
🧠 Learnings used
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.
Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.
Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
1838229 to
af0aeeb
Compare
b8ead31 to
109fbd7
Compare
af0aeeb to
857bba3
Compare
109fbd7 to
c7a66bd
Compare
857bba3 to
21babc8
Compare
094d006 to
9914976
Compare
0919f7a to
f36c576
Compare
f4b6064 to
0547ba9
Compare
c8ab214 to
047b240
Compare
0547ba9 to
0708ce5
Compare
047b240 to
e57bc5e
Compare
0708ce5 to
396552e
Compare
| // The snapshot doesn't carry rootTaskRunId; fall back to parent | ||
| // as a rough proxy (matches the existing service's nil-coalesce | ||
| // behaviour where rootTaskRun defaults to the parent). Phase D | ||
| // / future work could thread rootTaskRunId through the snapshot. | ||
| routeOperationsToRun(bufferedEntry.parentTaskRunId, body.rootOperations, env), | ||
| ]); |
There was a problem hiding this comment.
🔴 rootOperations silently routed to parent run instead of actual root run
The buffer path sends body.rootOperations to bufferedEntry.parentTaskRunId (line 166), identical to how parentOperations are routed (line 161). The PG-side service (app/services/metadata/updateMetadata.server.ts:344) routes rootOperations to taskRun.rootTaskRun?.id ?? taskRun.id — i.e. the actual root run, falling back to the current run itself. The SyntheticRun type already carries rootTaskRunFriendlyId (app/v3/mollifier/readFallback.server.ts:88), which is populated from the snapshot (readFallback.server.ts:206), but it is never used here.
The comment on lines 162-165 incorrectly claims this "matches the existing service's nil-coalesce behaviour where rootTaskRun defaults to the parent." The PG service falls back to taskRun.id (self), not to parent.
For task hierarchies with depth ≥ 2 (e.g. grandchild → child → root), metadata.root.set(...) from a buffered grandchild will silently mutate the child's (parent's) metadata instead of the root's.
| // The snapshot doesn't carry rootTaskRunId; fall back to parent | |
| // as a rough proxy (matches the existing service's nil-coalesce | |
| // behaviour where rootTaskRun defaults to the parent). Phase D | |
| // / future work could thread rootTaskRunId through the snapshot. | |
| routeOperationsToRun(bufferedEntry.parentTaskRunId, body.rootOperations, env), | |
| ]); | |
| // The snapshot doesn't carry rootTaskRunId as an internal ID, | |
| // but does carry rootTaskRunFriendlyId. Use it for root ops | |
| // (the PG service uses taskRun.rootTaskRun?.id ?? taskRun.id). | |
| // If rootTaskRunFriendlyId is absent, skip routing root ops | |
| // rather than misrouting them to the parent. | |
| routeOperationsToRun(bufferedEntry.rootTaskRunFriendlyId, body.rootOperations, env), |
Was this helpful? React with 👍 or 👎 to provide feedback.
| @@ -0,0 +1,186 @@ | |||
| import { describe, expect, it, vi } from "vitest"; | |||
|
|
|||
| vi.mock("~/db.server", () => ({ prisma: {}, $replica: {} })); | |||
There was a problem hiding this comment.
🟡 New test files use vi.mock() and stubs, violating mandatory testing rules
All four new test files (mollifierApplyMetadataMutation.test.ts, mollifierMutateWithFallback.test.ts, mollifierResetIdempotencyKey.test.ts, mollifierResolveRunForMutation.test.ts) extensively use vi.mock("~/db.server"), vi.fn(), and hand-rolled stubs. This violates the explicit rules in tests.md ("Do not mock anything"), CLAUDE.md ("Never mock anything — use testcontainers instead"), and AGENTS.md. Per REVIEW.md calibration, this is 🟡 for isolated unit tests. Pre-existing mollifier test files in apps/webapp/test/ follow the same pattern, so the convention is established within this subsystem but still rule-violating.
Prompt for agents
All four new mollifier test files (mollifierApplyMetadataMutation.test.ts, mollifierMutateWithFallback.test.ts, mollifierResetIdempotencyKey.test.ts, mollifierResolveRunForMutation.test.ts) use vi.mock and vi.fn for Redis/Postgres stubs, violating the repo-wide rule to never mock anything and instead use testcontainers from @internal/testcontainers. The source modules under test (applyMetadataMutation.server.ts, mutateWithFallback.server.ts, etc.) already accept injected dependencies (buffer, prismaWriter, prismaReplica, etc.), which makes them testable without module-level mocks. The recommended approach is to use redisTest/postgresTest/containerTest helpers from @internal/testcontainers to stand up real Redis and Postgres instances and pass them as dependency injections, eliminating all vi.mock calls. Note that pre-existing mollifier test files in apps/webapp/test/ follow the same mock pattern, so a cleanup pass across all of them may be warranted.
Was this helpful? React with 👍 or 👎 to provide feedback.
| bufferPatch: { | ||
| type: "set_delay", | ||
| delayUntil: delayUntil.toISOString(), | ||
| }, |
There was a problem hiding this comment.
🚩 Reschedule buffer path skips DELAYED status precondition check
The PG path delegates to RescheduleTaskRunService.call (app/v3/services/rescheduleTaskRun.server.ts:10) which enforces taskRun.status !== "DELAYED" before proceeding. The buffer path applies the set_delay patch unconditionally via mutateWithFallback without any equivalent status check. For a buffered run that was originally triggered without a delay (and thus has no delayUntil in its snapshot), a reschedule API call would succeed in the buffer path, injecting a delayUntil into the snapshot. When the drainer materializes this run, it may create it with an unintended delay. This is a behavior inconsistency between PG and buffer paths but may be acceptable since buffered runs are pre-execution and conceptually "pending" — a delay on a not-yet-started run is arguably valid. Worth confirming with the drainer's handling of delayUntil in snapshots.
Was this helpful? React with 👍 or 👎 to provide feedback.
Cancel, replay, reschedule, metadata, tags, and idempotency-key-reset now succeed against a run that's still in the mollifier buffer. Mutations are applied to the buffered snapshot via Lua CAS; the drainer carries the mutation forward when it replays. Stacked on the reads PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- metadata route: drop the \`as unknown as Parameters<...>\` cast on the parent/root operations path. Widen \`routeOperationsToRun\`'s env parameter to \`AuthenticatedEnvironment\` so the service's typed signature carries through; the caller always has the full env in scope. - replay route: validate the buffered fallback against a Zod \`BufferedReplayInputSchema\` covering the fields \`ReplayTaskRunService.call\` actually reads (id, friendlyId, runtimeEnvironmentId, taskIdentifier, payload, payloadType, queue, isTest, traceId, spanId, engine, runTags + nullable concurrencyKey/workerQueue/machinePreset/realtimeStreamsVersion). Schema-fail logs the issue list and 404s rather than passing a half-shaped object into the service. - resetIdempotencyKey: distinguish "PG-empty + buffer-cleared-nothing" (genuine 404) from "PG-empty + buffer-unreachable" (partial outage — 503 with retry hint). The previous behaviour silently returned 404 on outage, hiding the partial failure and leaving a buffered key effectively un-reset. New regression test covers all four branches (PG-hit + buffer-throws, PG-empty + buffer-hit, PG-empty + buffer-clean-miss, PG-empty + buffer-outage, mollifier-disabled). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
396552e to
eb2a777
Compare
- metadata route was routing rootOperations to bufferedEntry.parentTaskRunId with a comment claiming PG's nil-coalesce defaults to parent. PG actually defaults to taskRun.id (self), so a buffered grandchild metadata.root.set() was silently mutating the child's metadata instead of the root's. SyntheticRun already carries rootTaskRunFriendlyId from the snapshot — use it, falling back to the run itself (matching PG) when absent. - reschedule route's PG path delegates to RescheduleTaskRunService which enforces `status !== "DELAYED"` and 422s otherwise. The buffer path had no equivalent guard, so a customer could inject delayUntil into the snapshot of an undelayed buffered run and the drainer would materialise it with an unintended delay. Added a pre-fetch through findRunByIdWithMollifierFallback and 422 when the buffered run has no delayUntil. SyntheticRun doesn't carry a "DELAYED" status enum (only QUEUED|FAILED|CANCELED) so the gate reads the snapshot's delayUntil field directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Cancel, replay, reschedule, metadata, tags, and idempotency-key-reset now succeed against a run that's still in the mollifier buffer. Mutations are applied to the buffered snapshot via Lua CAS; the drainer carries the mutation forward when it replays.
Primitives added:
mutateWithFallback— PG-first / buffer-fallback resolver with bounded-wait safety net for entries that transition mid-mutation.applyMetadataMutation— buffered metadata PUT mirroring the PG-side retry loop with CAS atomicity.resolveRunForMutation— discriminated-union resolver used by routefindResourceso the route builder's pre-action 404 check sees buffered runs.Routes wired (whole files, no GET/POST splits):
api.v2.runs.\$runParam.cancel.tsapi.v1.runs.\$runParam.replay.tsapi.v1.runs.\$runParam.reschedule.tsapi.v1.runs.\$runId.metadata.tsapi.v1.runs.\$runId.tags.tsresetIdempotencyKey.server.tsStacked on the reads PR.
Test plan