Skip to content

fix: align autostart tests with persisted next_start_at#26037

Draft
zedkipp wants to merge 1 commit into
mainfrom
zedkipp/fix-autostart-next-start-at-race
Draft

fix: align autostart tests with persisted next_start_at#26037
zedkipp wants to merge 1 commit into
mainfrom
zedkipp/fix-autostart-next-start-at-race

Conversation

@zedkipp
Copy link
Copy Markdown
Contributor

@zedkipp zedkipp commented Jun 3, 2026

TestExecutorAutostartOK and its sibling positive autostart tests computed the autobuild tick from sched.Next(workspace.LatestBuild.CreatedAt). The server sets workspace.next_start_at from the build's completion time, and sched.Next is strictly monotonic, so when build creation and completion straddle the schedule's next fire time the test's tick is strictly less than the persisted next_start_at. The executor's eligibility query (next_start_at <= currentTick) then filters the workspace out and the test fails with an empty transitions map.

Add coderdtest.NextAutostartTick(t, workspace) which returns *workspace.NextStartAt with a non-nil assertion, and use it in the affected positive autostart paths in coderd/autobuild, coderd, and enterprise/coderd. Negative tests, mocked-clock prebuild paths, and paths that assert no transition are unchanged.

Generated with assistance from Coder Agents.

TestExecutorAutostartOK and its sibling positive autostart tests
computed the autobuild tick from sched.Next(workspace.LatestBuild.CreatedAt).
The server sets workspace.next_start_at from the build's completion time,
and sched.Next is strictly monotonic, so when build creation and completion
straddle the schedule's next fire time the test's tick is strictly less
than the persisted next_start_at. The executor's eligibility query
(next_start_at <= currentTick) then filters the workspace out and the
test fails with an empty transitions map.

Add coderdtest.NextAutostartTick(t, workspace) which returns
*workspace.NextStartAt with a non-nil assertion, and use it in the
affected positive autostart paths in coderd/autobuild, coderd, and
enterprise/coderd. Negative tests, mocked-clock prebuild paths, and
paths that assert no transition are unchanged.

Closes coder/internal#1456

Generated with assistance from Coder Agents.
@zedkipp
Copy link
Copy Markdown
Contributor Author

zedkipp commented Jun 4, 2026

/coder-agents-review

@coder-agents-review
Copy link
Copy Markdown
Contributor

coder-agents-review Bot commented Jun 4, 2026

Chat: Review in progress | View chat
Requested: 2026-06-04 16:44 UTC by @zedkipp

deep-review v0.7.1 | Round 1 | 167ac7b..387fc54

Last posted: Round 1, 4 findings (1 P3, 2 Nit, 1 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Open lifecycle_executor_test.go:1243 Missed sibling: TestExecutorRequireActiveVersion still uses old sched.Next pattern R1 Netero, Bisky P3, Mafu-san P2, Ryosuke P3, Kite P3 Yes
CRF-2 Nit Open coderdtest.go:1846 Doc comment on NextAutostartTick is verbose for a 3-line function R1 Gon P2 (downgraded: comment is correct and prevents race reintroduction; verbosity is style) Yes
CRF-3 Nit Open enterprise/coderd/workspaces_test.go:1520 Pre-existing typo "Kick of" should be "Kick off" R1 Gon Yes
CRF-4 Note Open coderdtest.go:1860 require.NotNil called from non-test goroutines at 6 call sites; follows existing pattern R1 Meruem Yes

Contested and acknowledged

(none)

Round log

Round 1

Panel. 1 P3, 2 Nit, 1 Note. Reviewed against 167ac7b..387fc54.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-targeted fix. The core insight is right: tests should consume the system's computed schedule state rather than recomputing it from a different base time. The NextAutostartTick helper is minimal, well-guarded, and the enterprise fix at line 1548 (capturing ws from MustTransitionWorkspace) shows the author traced the data dependency correctly. The root cause analysis in the commit message is precise.

As Ryosuke put it: "don't recompute what the system already computed."

1 P3, 2 Nit, 1 Note. The P3 is a missed sibling that's subject to the same race this PR fixes.


coderd/autobuild/lifecycle_executor_test.go:1243

P3 [CRF-1] TestExecutorRequireActiveVersion still uses the old sched.Next(ws.LatestBuild.CreatedAt) pattern in a positive autostart test. Line 1247 asserts require.Len(t, stats.Transitions, 1), it uses a real provisioner with no mocked clock, and it's subject to the same straddling race as the eight sites this PR fixes. When the stop build at line 1235 straddles the hour boundary, the executor filters the workspace out and the test fails with an empty transitions map.

Fix: tickTime := coderdtest.NextAutostartTick(t, ws). The workspace is refreshed from MustTransitionWorkspace at line 1235, so ws.NextStartAt is populated.

This is the gap between "fix siblings" and "fix the one I found."

(Netero P3, Bisky P3, Mafu-san P2, Ryosuke P3, Kite P3)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment on lines +1846 to +1857
// NextAutostartTick returns the value to send on the autobuild executor's
// ticker channel to trigger an autostart for the given workspace. The
// executor treats the received value as the current time, and its
// eligibility query passes when next_start_at <= currentTick. Returning the
// workspace's persisted NextStartAt guarantees that condition.
//
// Recomputing the tick from the latest build's CreatedAt is racy: the server
// sets next_start_at from the build's completion time, and sched.Next is
// strictly monotonic, so when build creation and completion straddle the
// schedule's next fire time the persisted value advances past
// sched.Next(build.CreatedAt) and the workspace is filtered out of the
// eligibility query.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-2] The doc comment is 12 lines for a 3-line function. The first paragraph restates what the code does ("Returning the workspace's persisted NextStartAt guarantees that condition" duplicates the return statement), and the second paragraph includes intermediate detail ("sched.Next is strictly monotonic") that the conclusion already implies.

Gon proposed a tighter 5-line version that preserves the trap documentation:

// NextAutostartTick returns workspace.NextStartAt for use as the autobuild
// tick. The executor's eligibility query checks next_start_at <= tick.
// Computing from build.CreatedAt is racy: next_start_at derives from build
// completion time, so it can advance past sched.Next(build.CreatedAt) and
// the workspace misses the eligibility window.

Leorio disagrees, arguing the length is proportionate to the subtlety. Downgraded from P2; the comment is correct and prevents the race from being reintroduced.

(Gon P2, Leorio dissent)

🤖

@@ -1518,7 +1518,7 @@ func TestWorkspaceAutobuild(t *testing.T) {
require.NoError(t, err)

// Kick of an autostart build.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-3] "Kick of" should be "Kick off". Pre-existing typo, one-character fix. (Gon)

🤖

// eligibility query.
func NextAutostartTick(t testing.TB, workspace codersdk.Workspace) time.Time {
t.Helper()
require.NotNil(t, workspace.NextStartAt,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-4] require.NotNil calls t.FailNow(), which invokes runtime.Goexit(). Six call sites launch NextAutostartTick inside a go func(). If NextStartAt were nil, the goroutine would exit without sending on tickCh, hanging the test goroutine on <-statsCh until timeout rather than failing with the assertion message.

This follows the existing pattern (UpdateProvisionerLastSeenAt already calls require.NoError from these same goroutines), and NextStartAt is nil only when test setup is broken. The consequence is a confusing timeout message rather than incorrect behavior.

(Meruem)

🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant