test(agent): de-flake TestAgent_Session_EnvironmentVariables#26638
Draft
dylanhuff-at-coder wants to merge 2 commits into
Draft
test(agent): de-flake TestAgent_Session_EnvironmentVariables#26638dylanhuff-at-coder wants to merge 2 commits into
dylanhuff-at-coder wants to merge 2 commits into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
De-flakes
TestAgent_Session_EnvironmentVariables, which failed only on Windows CI with every subtest erroring at 0.00s with "EOF".Root cause
The test drove one long-lived shell (
cmd.exe/sh) through a single shared stdin/stdout and a single sharedbufio.Scanneracross every subtest. On Windows the shell's channel half-closes early; the next subtest's stdin write then returnsio.EOFand fails immediately, cascading to all subtests. The old scanner guard also let a clean EOF pass without ever asserting the value was found.Fix
Give each variable its own short-lived
ssh.Sessionand read it withsession.Output, re-applying the two session-scoped vars viaSetenv, and assert each value withrequire.Contains. This removes the shared mutable state that caused the cascade and closes the silent false-negative, matching the one-shot pattern already used by the sibling env-var tests.TestAgent_Session_SecretInjectionshared the same anti-pattern and gets the identical refactor. Test-only; no production behavior changes.Closes https://linear.app/codercom/issue/PLAT-310/flake-testagent-session-environmentvariables
Diagnosis and decisions
bufio.Scanner.Err()maps a cleanio.EOFto nil and the ssh read path returns only theio.EOFsentinel, so the scan guard could never emit it. The real failure was the unguarded stdin write inechoEnvhitting a dead shared channel (line 433 at the failing commit550aa6d).updateCommandEnv), so fresh sessions are safe. OnlyMY_SESSIONandMY_SESSION_MANIFESTneed a per-sessionSetenv, andMY_SESSION_MANIFESTstill assertsfalseto prove manifest-overrides-session.-count=100and-race, andmake pre-commitis clean. The Windowscmd.exebranch mirrors the already-passing sibling tests, sotest-go-pg windows-2022is the real proof of the de-flake.cmd.exefallback (usershell_windows.goComSpec) and the broader CI PATH/mise work are tracked separately.