
feat: cancel pending prebuilds from non-active template versions #20387

Merged
ssncferreira merged 23 commits into main from ssncferreira/feat-cancel-pending-prebuilds
Oct 24, 2025

Conversation

@ssncferreira
Contributor

@ssncferreira ssncferreira commented Oct 20, 2025

Description

This PR introduces an optimization to automatically cancel pending prebuild-related jobs from non-active template versions in the reconciliation loop.

Problem

Currently, when a template is configured with more prebuild instances than available provisioners, the provisioner queue can become flooded with pending prebuild jobs. This issue is worsened when provisioning/deprovisioning operations take a long time.

When the prebuild reconciliation loop generates jobs faster than provisioners can process them, pending jobs accumulate in the queue. Since prebuilt workspaces should always run the latest active template version, pending prebuild jobs from non-active versions become obsolete once a new version is promoted.

Solution

The reconciliation loop cancels pending prebuild-related jobs from non-active template versions that match the following criteria:

  • Build number: 1 (initial build created by the reconciliation loop)
  • Job status: pending
  • Not yet picked up by a provisioner (worker_id is NULL)
  • Owned by the prebuilds system user
  • Workspace transition: start

This keeps the queue free of stale prebuild jobs. Each such job would provision a workspace on an outdated template version, which would then have to be deprovisioned.
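
For illustration only, the criteria above can be read as a single predicate over a job. The sketch below is not the code from this PR: the `Job` struct, its field names, and `shouldCancel` are hypothetical stand-ins, and the real filtering happens in SQL.

```go
package main

import "fmt"

// Job is a hypothetical, simplified view of a provisioner job joined with its
// workspace build. Field names are illustrative, not the actual schema.
type Job struct {
	BuildNumber      int
	Status           string  // e.g. "pending", "running"
	WorkerID         *string // nil until a provisioner picks the job up
	OwnedByPrebuilds bool    // owned by the prebuilds system user
	Transition       string  // e.g. "start", "stop", "delete"
	VersionActive    bool    // job targets the active template version
}

// shouldCancel reports whether a job meets every criterion listed above.
func shouldCancel(j Job) bool {
	return !j.VersionActive &&
		j.BuildNumber == 1 &&
		j.Status == "pending" &&
		j.WorkerID == nil &&
		j.OwnedByPrebuilds &&
		j.Transition == "start"
}

func main() {
	worker := "provisioner-1"
	stale := Job{BuildNumber: 1, Status: "pending", OwnedByPrebuilds: true, Transition: "start"}

	// Same job, but already picked up by a provisioner: it must be left alone.
	claimed := stale
	claimed.WorkerID = &worker

	fmt.Println(shouldCancel(stale), shouldCancel(claimed)) // prints: true false
}
```

Requiring all conditions at once is what keeps the cancellation narrow: a job that a provisioner has already claimed, or one that belongs to a user-owned workspace, fails at least one check and is never touched.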

Changes

  • Added new SQL query CountPendingNonActivePrebuilds to identify presets with pending jobs from non-active versions
  • Added new SQL query UpdatePrebuildProvisionerJobWithCancel to cancel jobs for a specific preset
  • Added new reconciliation action type ActionTypeCancelPending to handle the cancellation logic
  • Cancellation is non-blocking: failures to cancel prebuild jobs are logged as errors and don't prevent other reconciliation actions

Follow-up PR

Canceling pending prebuild jobs leaves workspaces in a Canceled state. While no Terraform resources need to be destroyed (since jobs were canceled before provisioning started), these database records should still be cleaned up. This will be addressed in a follow-up PR.

Closes: #20242

@ssncferreira ssncferreira force-pushed the ssncferreira/feat-cancel-pending-prebuilds branch 6 times, most recently from f1a8e32 to 9cfb23a Compare October 21, 2025 11:58
@ssncferreira ssncferreira force-pushed the ssncferreira/feat-cancel-pending-prebuilds branch from 9cfb23a to 29cc88e Compare October 21, 2025 17:28
@ssncferreira ssncferreira force-pushed the ssncferreira/feat-cancel-pending-prebuilds branch from 73cf135 to 51fa3ef Compare October 21, 2025 18:04
Comment thread coderd/templateversions.go Outdated
Comment thread coderd/templateversions.go Outdated
@ssncferreira ssncferreira marked this pull request as ready for review October 21, 2025 19:04
Member

@johnstcn johnstcn left a comment

Mostly LGTM, but I'm not 100% convinced about the row locking?

Comment thread coderd/database/queries/provisionerjobs.sql Outdated
Comment thread coderd/database/queries/templates.sql Outdated
Comment thread coderd/database/modelmethods.go Outdated
Comment thread coderd/database/dbauthz/dbauthz.go Outdated
Comment thread coderd/database/dbauthz/dbauthz.go Outdated
func (q *querier) UpdatePrebuildProvisionerJobWithCancel(ctx context.Context, arg database.UpdatePrebuildProvisionerJobWithCancelParams) ([]uuid.UUID, error) {
	// This is a system-only operation for canceling pending prebuild-related jobs
	// when a new template version is promoted.
	if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceSystem); err != nil {
Contributor

note(non-blocking): just want to note rbac.ResourceSystem usage here

Member

If we give subjectPrebuildsOrchestrator the ability to read/update provisioner jobs, we should be able to use rbac.ResourceProvisionerJob here.

Contributor Author

Yeah, permissions are a bit tricky here.

There is a ResourceProvisionerJobs that we could use here, but that would mean that all users with this permission would be able to execute it.

UpdateProvisionerJobWithCancelByID, for instance, checks the Update permission on the workspace. Since this is a prebuild, we could also do:
if err := q.authorizeContext(ctx, policy.ActionUpdate, rbac.ResourcePrebuiltWorkspace); err != nil {
Maybe this one makes the most sense 🤔

Member

That's not a bad approximation actually 👍

Contributor Author

Added the rbac.ResourcePrebuiltWorkspace update check in dcb6de1

Comment thread coderd/database/queries/provisionerjobs.sql Outdated
Comment thread coderd/database/queries/provisionerjobs.sql Outdated
Comment thread coderd/database/queries/provisionerjobs.sql Outdated
Comment thread coderd/templateversions.go Outdated
@ssncferreira ssncferreira removed the request for review from SasSwart October 22, 2025 10:25
@ssncferreira
Contributor Author

Note: Moving this PR back to draft.

After internal discussion, we've decided to move the logic to cancel pending prebuilds to the prebuilds reconciliation loop rather than handling it in the patchActiveTemplateVersion endpoint.

This PR will be updated to reflect these changes:

  • Prebuild cancellation logic will be moved to the reconciliation loop
  • Implementation for deleting canceled prebuilds will be handled in a separate PR for simplicity

The PR will be marked ready for review once the new implementation is in place.

@ssncferreira ssncferreira marked this pull request as draft October 22, 2025 10:58
@ssncferreira ssncferreira marked this pull request as ready for review October 23, 2025 10:28
Comment on lines +300 to +301
-- Cancels all pending provisioner jobs for prebuilt workspaces on a specific preset from an
-- inactive template version.
Member

nit: This comment is slightly misleading: the query doesn't check that the preset relates to an inactive template version. It's currently the responsibility of the caller to verify that @preset_id is not related to the current active version ID.

Contributor Author

That is a good catch! I've updated the query to filter out active template versions: 0497f8b

Comment thread coderd/prebuilds/preset_snapshot_test.go
Comment thread coderd/database/queries/prebuilds.sql
func TestCancelPendingPrebuilds(t *testing.T) {
	t.Parallel()

	for _, tt := range []struct {
Member

nit, non-blocking: Since these tests all boil down to "only these jobs should be canceled and nothing else", I think we could collapse this to a single test case with all of the "should not cancel" data in one. The rationale for this is twofold:

  1. It reduces the number of test databases we have to create
  2. It gives us better assurance that only the jobs we care about are being canceled in the presence of more antagonist data.

@ssncferreira ssncferreira changed the title from "feat: cancel pending prebuilds on template publish" to "feat: cancel pending prebuilds from non-active template versions" Oct 23, 2025
Contributor

@SasSwart SasSwart left a comment

One nit/question. Otherwise, looks good!

Comment thread coderd/database/queries/prebuilds.sql Outdated
@ssncferreira ssncferreira merged commit f6e86c6 into main Oct 24, 2025
25 checks passed
@ssncferreira ssncferreira deleted the ssncferreira/feat-cancel-pending-prebuilds branch October 24, 2025 14:27
@github-actions github-actions Bot locked and limited conversation to collaborators Oct 24, 2025
Development

Successfully merging this pull request may close these issues.

Cancel pending prebuild-related jobs from previous version

4 participants