fix: align autostart tests with persisted next_start_at#26037
Conversation
TestExecutorAutostartOK and its sibling positive autostart tests computed the autobuild tick from sched.Next(workspace.LatestBuild.CreatedAt). The server sets workspace.next_start_at from the build's completion time, and sched.Next is strictly monotonic, so when build creation and completion straddle the schedule's next fire time the test's tick is strictly less than the persisted next_start_at. The executor's eligibility query (next_start_at <= currentTick) then filters the workspace out and the test fails with an empty transitions map. Add coderdtest.NextAutostartTick(t, workspace) which returns *workspace.NextStartAt with a non-nil assertion, and use it in the affected positive autostart paths in coderd/autobuild, coderd, and enterprise/coderd. Negative tests, mocked-clock prebuild paths, and paths that assert no transition are unchanged. Closes coder/internal#1456 Generated with assistance from Coder Agents.
|
/coder-agents-review |
|
Chat: Review in progress | View chat deep-review v0.7.1 | Round 1 | Last posted: Round 1, 4 findings (1 P3, 2 Nit, 1 Note), COMMENT. Review Finding inventoryFindings
Contested and acknowledged(none) Round logRound 1Panel. 1 P3, 2 Nit, 1 Note. Reviewed against 167ac7b..387fc54. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Clean, well-targeted fix. The core insight is right: tests should consume the system's computed schedule state rather than recomputing it from a different base time. The NextAutostartTick helper is minimal, well-guarded, and the enterprise fix at line 1548 (capturing ws from MustTransitionWorkspace) shows the author traced the data dependency correctly. The root cause analysis in the commit message is precise.
As Ryosuke put it: "don't recompute what the system already computed."
1 P3, 2 Nit, 1 Note. The P3 is a missed sibling that's subject to the same race this PR fixes.
coderd/autobuild/lifecycle_executor_test.go:1243
P3 [CRF-1] TestExecutorRequireActiveVersion still uses the old sched.Next(ws.LatestBuild.CreatedAt) pattern in a positive autostart test. Line 1247 asserts require.Len(t, stats.Transitions, 1), it uses a real provisioner with no mocked clock, and it's subject to the same straddling race as the eight sites this PR fixes. When the stop build at line 1235 straddles the hour boundary, the executor filters the workspace out and the test fails with an empty transitions map.
Fix: tickTime := coderdtest.NextAutostartTick(t, ws). The workspace is refreshed from MustTransitionWorkspace at line 1235, so ws.NextStartAt is populated.
This is the gap between "fix siblings" and "fix the one I found."
(Netero P3, Bisky P3, Mafu-san P2, Ryosuke P3, Kite P3)
🤖
🤖 This review was automatically generated with Coder Agents.
| // NextAutostartTick returns the value to send on the autobuild executor's | ||
| // ticker channel to trigger an autostart for the given workspace. The | ||
| // executor treats the received value as the current time, and its | ||
| // eligibility query passes when next_start_at <= currentTick. Returning the | ||
| // workspace's persisted NextStartAt guarantees that condition. | ||
| // | ||
| // Recomputing the tick from the latest build's CreatedAt is racy: the server | ||
| // sets next_start_at from the build's completion time, and sched.Next is | ||
| // strictly monotonic, so when build creation and completion straddle the | ||
| // schedule's next fire time the persisted value advances past | ||
| // sched.Next(build.CreatedAt) and the workspace is filtered out of the | ||
| // eligibility query. |
There was a problem hiding this comment.
Nit [CRF-2] The doc comment is 12 lines for a 3-line function. The first paragraph restates what the code does ("Returning the workspace's persisted NextStartAt guarantees that condition" duplicates the return statement), and the second paragraph includes intermediate detail ("sched.Next is strictly monotonic") that the conclusion already implies.
Gon proposed a tighter 5-line version that preserves the trap documentation:
// NextAutostartTick returns workspace.NextStartAt for use as the autobuild
// tick. The executor's eligibility query checks next_start_at <= tick.
// Computing from build.CreatedAt is racy: next_start_at derives from build
// completion time, so it can advance past sched.Next(build.CreatedAt) and
// the workspace misses the eligibility window.Leorio disagrees, arguing the length is proportionate to the subtlety. Downgraded from P2; the comment is correct and prevents the race from being reintroduced.
(Gon P2, Leorio dissent)
🤖
| @@ -1518,7 +1518,7 @@ func TestWorkspaceAutobuild(t *testing.T) { | |||
| require.NoError(t, err) | |||
|
|
|||
| // Kick of an autostart build. | |||
There was a problem hiding this comment.
Nit [CRF-3] "Kick of" should be "Kick off". Pre-existing typo, one-character fix. (Gon)
🤖
| // eligibility query. | ||
| func NextAutostartTick(t testing.TB, workspace codersdk.Workspace) time.Time { | ||
| t.Helper() | ||
| require.NotNil(t, workspace.NextStartAt, |
There was a problem hiding this comment.
Note [CRF-4] require.NotNil calls t.FailNow(), which invokes runtime.Goexit(). Six call sites launch NextAutostartTick inside a go func(). If NextStartAt were nil, the goroutine would exit without sending on tickCh, hanging the test goroutine on <-statsCh until timeout rather than failing with the assertion message.
This follows the existing pattern (UpdateProvisionerLastSeenAt already calls require.NoError from these same goroutines), and NextStartAt is nil only when test setup is broken. The consequence is a confusing timeout message rather than incorrect behavior.
(Meruem)
🤖
TestExecutorAutostartOK and its sibling positive autostart tests computed the autobuild tick from sched.Next(workspace.LatestBuild.CreatedAt). The server sets workspace.next_start_at from the build's completion time, and sched.Next is strictly monotonic, so when build creation and completion straddle the schedule's next fire time the test's tick is strictly less than the persisted next_start_at. The executor's eligibility query (next_start_at <= currentTick) then filters the workspace out and the test fails with an empty transitions map.
Add coderdtest.NextAutostartTick(t, workspace) which returns *workspace.NextStartAt with a non-nil assertion, and use it in the affected positive autostart paths in coderd/autobuild, coderd, and enterprise/coderd. Negative tests, mocked-clock prebuild paths, and paths that assert no transition are unchanged.
Generated with assistance from Coder Agents.