Skip to content

fix(coderd/coderdtest/oidctest): scope IDP NotFound errors to IDP paths#25876

Draft
mafredri wants to merge 1 commit into
mainfrom
fix/flake-oidc-fullname-claims
Draft

fix(coderd/coderdtest/oidctest): scope IDP NotFound errors to IDP paths#25876
mafredri wants to merge 1 commit into
mainfrom
fix/flake-oidc-fullname-claims

Conversation

@mafredri
Copy link
Copy Markdown
Member

Problem

TestUserOIDC/FullNameFromClaims flakes in CI (test-go-pg-17) when the full test suite runs in parallel. The test itself passes (OIDC flow completes, user created correctly), but the FakeIDP's mux.NotFound handler calls t.Errorf when it receives a request at a non-IDP path like /api/v2/organizations/{uuid}/provisionerdaemons/serve.

The same class of bug caused #13788 (/derp path) and coder/internal#280 (/derp path).

Root Cause

The mux.NotFound handler in idp.go calls t.Errorf unconditionally for any unrecognized path. When the IDP runs as a real HTTP server (WithServing()), OS port reuse can route stale connections from other test processes to the IDP's port. The t.Errorf then fails the IDP's owning test for a request that did not originate from it.

Coverage confirms the handler body is cov0 during correct test execution (never reached by any legitimate IDP request):

<span class="cov8">mux.NotFound(func(... r *http.Request) </span>
<span class="cov0">{
    f.logger.Error(...)
    t.Errorf("unexpected request to IDP at path %q. Not supported", ...)
}</span>)

37 tests use WithServing(), all vulnerable.

Fix

Check whether the request path starts with a known IDP route prefix (/oauth2/, /.well-known/, /login/, /external-auth-validate/). These match all routes registered in httpHandler.

  • IDP paths: t.Errorf + logger.Error (preserves real protocol violation detection)
  • Non-IDP paths: t.Logf + logger.Warn (visible in output but does not fail the test)

This also addresses the slogtest failure vector (line 1417 logger.Error downgraded to Warn for non-IDP paths), fixing the class seen in #13788 and coder/internal#280 where WithLogging(t, nil) caused additional test failures.

Debugger evidence

Two-breakpoint dlv session (BP1 at provisionerd.go:237, BP2 at idp.go:1418) on the regression test:

Stop 1 (provisionerd retry loop):

> [Breakpoint 1] provisionerd.(*Server).connect()
    ./provisionerd/provisionerd.go:237

(dlv) print err
error(*codersdk.Error) *{
  statusCode: 404,
  url: "http://localhost:41629/api/v2/organizations/0b75b936-2bf7-4fb7-9.../provisionerdaemons/serve",
}

Stop 2 (IDP NotFound handler):

> [Breakpoint 2] FakeIDP.httpHandler.func11()
    ./coderd/coderdtest/oidctest/idp.go:1418

(dlv) print r.URL.Path
"/api/v2/organizations/0b75b936-2bf7-4fb7-9d84-fc33d5c380b4/provi...+19 more"
(dlv) print r.Host
"localhost:41629"

Stack:
0  FakeIDP.httpHandler.func11        idp.go:1418
1  net/http.HandlerFunc.ServeHTTP    server.go:2286
2  chi.(*Mux).routeHTTP              mux.go:485       <-- chi: no matching route
7  net/http.(*conn).serve            server.go:2073   <-- IDP HTTP server

Goroutines:
  Goroutine 174 - net/http.(*persistConn).roundTrip       <-- provisionerd HTTP client
  Goroutine 175 - provisionerd.(*Server).client [select]  <-- provisionerd connect loop
* Goroutine 110 - FakeIDP.httpHandler.func11              <-- NotFound handler (breakpoint)

Same port (41629), same org UUID (0b75b936), proving the causal chain: provisionerd retry -> IDP port -> chi no-match -> NotFound -> t.Errorf.

🤖 Generated by Coder Agents on behalf of @mafredri. This PR will be reviewed by a human. 🏂🏻

The FakeIDP mux.NotFound handler called t.Errorf for any unrecognized
HTTP request path. When the IDP runs as a real HTTP server
(WithServing), OS port reuse can route stale connections from other
test processes (provisionerd reconnects, DERP) to the IDP port. The
t.Errorf then fails the IDP owning test for a request that did not
originate from it.

Check whether the request path starts with a known IDP route prefix
(/oauth2/, /.well-known/, /login/, /external-auth-validate/). IDP
paths still get t.Errorf and logger.Error (real protocol violations).
Non-IDP paths get t.Logf and logger.Warn (cross-test contamination,
visible in logs but does not fail the test).

Add a regression test that uses provisionerd.New with
codersdk.Client.ServeProvisionerDaemon as the dialer to reproduce the
port-reuse mechanism through real production code paths.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant