Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .server-changes/idempotent-background-worker-registration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
area: webapp
type: fix
---

Make `POST /api/v1/deployments/:deploymentId/background-workers` idempotent (for sequential requests only) so client-side retries no longer collide on the `BackgroundWorker` `(project, env, version)` unique index. Helps make deployments more resilient against the class of indexing failures that surfaces in the dashboard as "Indexing timed out", e.g. during transient database issues.
56 changes: 27 additions & 29 deletions apps/webapp/app/v3/services/createBackgroundWorker.server.ts
Original file line number Diff line number Diff line change
Expand Up @@ -34,31 +34,8 @@ import { tryCatch } from "@trigger.dev/core/v3";
import { engine } from "../runEngine.server";
import { scheduleEngine } from "../scheduleEngine.server";

/**
* Strip BackgroundWorkerMetadata down to the slice that's actually read after
* storage. Everything else is duplicated to dedicated columns/tables
* (BackgroundWorker.{contentHash,cliVersion,sdkVersion,runtime,runtimeVersion},
* BackgroundWorkerTask, BackgroundWorkerFile, TaskQueue, Prompt). Today the
* only post-write reader is changeCurrentDeployment.server.ts, which feeds
* tasks[].schedule into syncDeclarativeSchedules. packageVersion, contentHash,
* and tasks[].filePath are kept solely to satisfy BackgroundWorkerMetadata's
* required fields when the column is parsed back.
*/
export function stripBackgroundWorkerMetadataForStorage(
metadata: BackgroundWorkerMetadata
): Prisma.InputJsonValue {
return {
packageVersion: metadata.packageVersion,
contentHash: metadata.contentHash,
tasks: metadata.tasks
.filter((t) => t.schedule)
.map((t) => ({
id: t.id,
filePath: t.filePath,
schedule: t.schedule,
})),
};
}
import { stripBackgroundWorkerMetadataForStorage } from "./stripBackgroundWorkerMetadataForStorage.server";
export { stripBackgroundWorkerMetadataForStorage };

export class CreateBackgroundWorkerService extends BaseService {
private readonly _taskMetaCache: TaskMetadataCache;
Expand Down Expand Up @@ -356,8 +333,13 @@ async function createWorkerTask(
prisma: PrismaClientOrTransaction,
tasksToBackgroundFiles?: Map<string, string>
): Promise<TaskMetadataEntry | null> {
// Hoisted so the P2002 catch branch can return the same entry shape.
let queue: TaskQueue | undefined;
let resolvedTriggerSource: "SCHEDULED" | "AGENT" | "STANDARD" | undefined;
let resolvedTtl: string | null | undefined;

try {
let queue = queues.find((queue) => queue.name === task.queue?.name);
queue = queues.find((queue) => queue.name === task.queue?.name);

if (!queue) {
// Create a TaskQueue
Expand All @@ -374,14 +356,14 @@ async function createWorkerTask(
);
}

const resolvedTriggerSource =
resolvedTriggerSource =
task.triggerSource === "schedule"
? ("SCHEDULED" as const)
: task.triggerSource === "agent"
? ("AGENT" as const)
: ("STANDARD" as const);

const resolvedTtl =
resolvedTtl =
typeof task.ttl === "number" ? stringifyDuration(task.ttl) ?? null : task.ttl ?? null;

await prisma.backgroundWorkerTask.create({
Expand Down Expand Up @@ -418,10 +400,26 @@ async function createWorkerTask(
if (error instanceof Prisma.PrismaClientKnownRequestError) {
// The error code for unique constraint violation in Prisma is P2002
if (error.code === "P2002") {
logger.warn("Task already exists", {
// Retry landing after the first attempt's row was already written.
const existing = await prisma.backgroundWorkerTask.findFirst({
where: { workerId: worker.id, slug: task.id },
select: { id: true },
});

logger.warn("Attempted to recreate background worker task", {
task,
worker,
});

if (existing && queue && resolvedTriggerSource && resolvedTtl !== undefined) {
return {
slug: task.id,
ttl: resolvedTtl,
triggerSource: resolvedTriggerSource,
queueId: queue.id,
queueName: queue.name,
};
}
} else {
logger.error("Prisma Error creating background worker task", {
error: {
Expand Down
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 #failBackgroundWorkerDeployment lacks status guard, can regress deployment from DEPLOYING to FAILED on overlapping retries

The BUILDING→DEPLOYING transition was correctly guarded with updateMany + status: "BUILDING" (line 196-209), but #failBackgroundWorkerDeployment still uses an unconditional update (line 235). Before this PR, a retry would fail fast on P2002 at backgroundWorker.create and never reach the error handlers. Now, findOrCreateBackgroundWorker allows retries to proceed past worker creation into file/resource/schedule creation. If a client-side timeout causes the first request (A) and the retry (B) to overlap on the server, and B encounters a transient error in createBackgroundFiles/createWorkerResources/syncDeclarativeSchedules, B calls #failBackgroundWorkerDeployment which unconditionally sets the deployment to FAILED — even if A has already transitioned it to DEPLOYING. A's subsequent updateMany then finds 0 rows (status is FAILED, not BUILDING) and silently skips the transition, leaving the deployment stuck in FAILED.

Concrete scenario timeline
  1. Request A: creates worker, begins creating files/resources
  2. Client timeout → Request B starts
  3. Request B: findOrCreateBackgroundWorker returns existing worker
  4. Request A: updateMany succeeds → deployment now DEPLOYING
  5. Request B: transient DB error in createBackgroundFiles
  6. Request B: #failBackgroundWorkerDeployment → unconditional update sets FAILED
  7. Deployment stuck in FAILED despite A's successful work

(Refers to lines 234-252)

Prompt for agents
The #failBackgroundWorkerDeployment method at line 234-252 in createDeploymentBackgroundWorkerV4.server.ts uses an unconditional prisma.workerDeployment.update to set status to FAILED. This should use the same optimistic concurrency control pattern as the BUILDING→DEPLOYING transition at lines 196-209: use updateMany with a status: BUILDING guard in the where clause, so that if the deployment has already moved past BUILDING (e.g. to DEPLOYING by a concurrent first request), the FAILED transition is a no-op rather than a state regression. The TimeoutDeploymentService.dequeue call can remain unconditional since dequeuing an already-dequeued timer is harmless. Note that the method currently throws the error after the update; this control flow should be preserved.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
import { CreateBackgroundWorkerRequestBody, logger, tryCatch } from "@trigger.dev/core/v3";
import { BackgroundWorkerId } from "@trigger.dev/core/v3/isomorphic";
import type { BackgroundWorker, PrismaClientOrTransaction, WorkerDeployment } from "@trigger.dev/database";
import type {
BackgroundWorker,
PrismaClientOrTransaction,
WorkerDeployment,
} from "@trigger.dev/database";
import { AuthenticatedEnvironment } from "~/services/apiAuth.server";
import { type TaskMetadataCache } from "~/services/taskMetadataCache.server";
import { taskMetadataCacheInstance } from "~/services/taskMetadataCacheInstance.server";
import { BaseService, ServiceValidationError } from "./baseService.server";
import {
createBackgroundFiles,
createWorkerResources,
stripBackgroundWorkerMetadataForStorage,
syncDeclarativeSchedules,
} from "./createBackgroundWorker.server";
import { findOrCreateBackgroundWorker } from "./createDeploymentBackgroundWorkerV4/findOrCreateBackgroundWorker.server";
import { TimeoutDeploymentService } from "./timeoutDeployment.server";
import { env } from "~/env.server";

Expand Down Expand Up @@ -50,6 +53,11 @@ export class CreateDeploymentBackgroundWorkerServiceV4 extends BaseService {
});

if (!deployment) {
logger.warn("createDeploymentBackgroundWorker: deployment not found", {
deploymentId,
environmentId: environment.id,
projectId: environment.projectId,
});
return;
}

Expand All @@ -70,25 +78,21 @@ export class CreateDeploymentBackgroundWorkerServiceV4 extends BaseService {
}

if (deployment.status !== "BUILDING") {
logger.warn("createDeploymentBackgroundWorker: deployment not in BUILDING state", {
deploymentId,
deploymentStatus: deployment.status,
environmentId: environment.id,
projectId: environment.projectId,
});
return;
}

const backgroundWorker = await this._prisma.backgroundWorker.create({
data: {
...BackgroundWorkerId.generate(),
version: deployment.version,
runtimeEnvironmentId: environment.id,
projectId: environment.projectId,
metadata: stripBackgroundWorkerMetadataForStorage(body.metadata),
contentHash: body.metadata.contentHash,
cliVersion: body.metadata.cliPackageVersion,
sdkVersion: body.metadata.packageVersion,
supportsLazyAttempts: body.supportsLazyAttempts,
engine: body.engine,
runtime: body.metadata.runtime,
runtimeVersion: body.metadata.runtimeVersion,
},
});
const backgroundWorker = await findOrCreateBackgroundWorker(
environment,
deployment,
body,
this._prisma
);
Comment on lines +90 to +95
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 409 ServiceValidationError from findOrCreateBackgroundWorker doesn't fail the deployment

When findOrCreateBackgroundWorker throws a 409 ServiceValidationError (content hash mismatch at findOrCreateBackgroundWorker.server.ts:33-37), the error propagates to the API route handler which returns the 409 to the client. However, #failBackgroundWorkerDeployment is never called for this error path — the deployment stays in BUILDING and will eventually time out via TimeoutDeploymentService. This is the same behavior as the old code (P2002 from create would also propagate without failing the deployment). It's not a regression but could be improved by wrapping this call site to transition the deployment to FAILED on 409.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.


//upgrade the project to engine "V2" if it's not already
if (environment.project.engine === "V1" && body.engine === "V2") {
Expand Down Expand Up @@ -188,10 +192,11 @@ export class CreateDeploymentBackgroundWorkerServiceV4 extends BaseService {
throw serviceError;
}

// Link the deployment with the background worker
await this._prisma.workerDeployment.update({
// Guarded BUILDING → DEPLOYING transition. `updateMany` for optimistic concurrency control
const { count: updatedCount } = await this._prisma.workerDeployment.updateMany({
where: {
id: deployment.id,
status: "BUILDING",
},
data: {
status: "DEPLOYING",
Expand All @@ -203,6 +208,18 @@ export class CreateDeploymentBackgroundWorkerServiceV4 extends BaseService {
},
});

if (updatedCount === 0) {
logger.warn(
"createDeploymentBackgroundWorker: deployment no longer in BUILDING state, skipping DEPLOYING transition",
{
deploymentId,
environmentId: environment.id,
projectId: environment.projectId,
}
);
return backgroundWorker;
}

await TimeoutDeploymentService.enqueue(
deployment.id,
"DEPLOYING",
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
import type { CreateBackgroundWorkerRequestBody } from "@trigger.dev/core/v3";
import { BackgroundWorkerId } from "@trigger.dev/core/v3/isomorphic";
import type {
BackgroundWorker,
PrismaClientOrTransaction,
WorkerDeployment,
} from "@trigger.dev/database";
import type { AuthenticatedEnvironment } from "~/services/apiAuth.server";
import { ServiceValidationError } from "../common.server";
import { stripBackgroundWorkerMetadataForStorage } from "../stripBackgroundWorkerMetadataForStorage.server";

/**
* Idempotent on `(project, environment, version)` for sequential calls, not concurrent calls.
*/
export async function findOrCreateBackgroundWorker(
environment: AuthenticatedEnvironment,
deployment: WorkerDeployment,
body: CreateBackgroundWorkerRequestBody,
prisma: PrismaClientOrTransaction
): Promise<BackgroundWorker> {
const existing = await prisma.backgroundWorker.findFirst({
where: {
projectId: environment.projectId,
runtimeEnvironmentId: environment.id,
version: deployment.version,
},
});

if (existing && existing.contentHash === body.metadata.contentHash) {
return existing;
}

if (existing) {
throw new ServiceValidationError(
"A background worker for this deployment version already exists with a different content hash",
409
);
}

return prisma.backgroundWorker.create({
data: {
...BackgroundWorkerId.generate(),
version: deployment.version,
runtimeEnvironmentId: environment.id,
projectId: environment.projectId,
metadata: stripBackgroundWorkerMetadataForStorage(body.metadata),
contentHash: body.metadata.contentHash,
cliVersion: body.metadata.cliPackageVersion,
sdkVersion: body.metadata.packageVersion,
supportsLazyAttempts: body.supportsLazyAttempts,
engine: body.engine,
runtime: body.metadata.runtime,
runtimeVersion: body.metadata.runtimeVersion,
},
});
Comment on lines +40 to +55
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 findOrCreateBackgroundWorker has unhandled P2002 from create on concurrent calls

The findOrCreateBackgroundWorker function (findOrCreateBackgroundWorker.server.ts:40-55) uses a check-then-create pattern against the @@unique([projectId, runtimeEnvironmentId, version]) constraint. If two requests concurrently find no existing worker and both attempt create, one will get a P2002 unique constraint violation that is not caught. The function's JSDoc says "for sequential calls, not concurrent calls" so this is by design, but note that client-side HTTP retries after a timeout can cause server-side concurrency (the server continues processing the first request after the client disconnects). In that scenario, the uncaught P2002 would propagate as a 500 to the retry request. This is actually safer than the alternative (both proceeding), as it prevents state corruption — but the error message will be an opaque Prisma error rather than a user-friendly message.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
import { BackgroundWorkerMetadata } from "@trigger.dev/core/v3";
import { Prisma } from "@trigger.dev/database";

/**
* Strip BackgroundWorkerMetadata down to the slice that's actually read after
* storage. Everything else is duplicated to dedicated columns/tables
* (BackgroundWorker.{contentHash,cliVersion,sdkVersion,runtime,runtimeVersion},
* BackgroundWorkerTask, BackgroundWorkerFile, TaskQueue, Prompt). Today the
* only post-write reader is changeCurrentDeployment.server.ts, which feeds
* tasks[].schedule into syncDeclarativeSchedules. packageVersion, contentHash,
* and tasks[].filePath are kept solely to satisfy BackgroundWorkerMetadata's
* required fields when the column is parsed back.
*/
export function stripBackgroundWorkerMetadataForStorage(
metadata: BackgroundWorkerMetadata
): Prisma.InputJsonValue {
return {
packageVersion: metadata.packageVersion,
contentHash: metadata.contentHash,
tasks: metadata.tasks
.filter((t) => t.schedule)
.map((t) => ({
id: t.id,
filePath: t.filePath,
schedule: t.schedule,
})),
};
}
Loading