Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion provisionerd/provisionerd_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,14 @@ func TestProvisionerd(t *testing.T) {
},
},
})
assert.NoError(t, err)
// The daemon is closed immediately after acquisition, which
// cancels the acquire RPC. dRPC can report context.Canceled
// from Send even after the job was successfully delivered, so
// tolerate it. If the job was never delivered, the test fails
// waiting on completeChan instead.
if !xerrors.Is(err, context.Canceled) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-1] Six sibling inline acquireJobWithCancel handlers use bare assert.NoError(t, err) after stream.Send without the context.Canceled tolerance:

  • RunningPeriodicUpdate (line 266)
  • Shutdown (line 744)
  • ShutdownFromJob (line 827)
  • ReconnectAndFail (line 930)
  • ReconnectAndComplete (line 1025)
  • UpdatesBeforeComplete (line 1124)

"The dRPC race is inherent to the transport layer. The acquireOne helper's own comment (line 1404) says it can happen 'in unit tests that complete,' not only when the daemon is closed immediately. CloseCancelsJob maximizes the race window by closing inside the parse/init callback, but any test whose cleanup cancels the stream while dRPC is flushing the Send response can hit it."

The fix correctly targets the one test with a documented flake. The other six have a much smaller race window because they don't close the daemon during acquisition. Shutdown and ShutdownFromJob are next-most-likely since they explicitly shut down during the test.

A small helper like assertSendNoError(t, err) that encapsulates the tolerance would eliminate the class rather than waiting for individual flake reports, but that is a separate scope decision. (Meruem P3, Mafu-san Note, Mafuuu Note, Leorio Note)

🤖

assert.NoError(t, err)
}
return nil
},
updateJob: noopUpdateJob,
Expand Down
Loading