feat: runtime user secrets injection into workspaces#24313
Conversation
3e48793 to
21f3120
Compare
Documentation CheckUpdates Needed
New Documentation Needed
Automated review via Coder Tasks |
|
Planning to omit public docs changes from this PR. The parallel |
johnstcn
left a comment
There was a problem hiding this comment.
🤖
Clean implementation — good separation of concerns, correct env var precedence chain, thorough unit tests for writeSecretFiles. The file permissions and the decision to never log secret values are done right.
2 P1s (secret leakage in debug/logs), 2 P2s, 2 P3s, 3 nits across 9 inline comments.
This review contains findings that may need attention before merge.
f3b05b6 to
6b154df
Compare
johnstcn
left a comment
There was a problem hiding this comment.
🤖
Thanks for addressing all 9 findings from round 1 — every resolution verified. Good fixes across the board.
Fresh review on the fixes themselves found 3 P2s, 3 P3s, and 1 observation across 7 inline comments. The top items are a perm-on-overwrite gap in WriteFile semantics that partially undercuts the 0o600 guarantee, and a synchronous-filesystem-blocks-network-init concern from moving WriteSecretFiles before manifestOK.complete(nil).
| a.logger.Warn(ctx, "failed to resolve home directory for secret files", slog.Error(err)) | ||
| } | ||
| WriteSecretFiles(ctx, a.logger, a.filesystem, homeDir, manifest.Secrets) | ||
| manifestOK.complete(nil) |
There was a problem hiding this comment.
P2 Moving WriteSecretFiles before manifestOK.complete(nil) puts synchronous filesystem I/O on the critical path for tailnet init, with no context or timeout (Edge Case Analyst P2, Go Architect Obs)
manifestOKgatescreateOrUpdateNetwork(line 1442), which in turn gates coordination, DERP subscription, and the stats loop (1186-1206). Before this PR, writes happened aftermanifestOK.complete, so a hung disk only affected file-based secrets. Now a hung or very slow filesystem — NFS/FUSE home mount, misconfigured overlay, disk full with sync writes — blocks the whole chain indefinitely.WriteSecretFilescallsfs.MkdirAllandafero.WriteFilewithout a context; per-secret errors are logged and skipped, but a syscall that never returns is not "an error".
Go Architect adds: the function takes
ctx context.Contextbut never checks it. If the agent'sstopCtxis cancelled mid-write (graceful shutdown racing with a slow filesystem), the loop keeps running. Aselect { case <-ctx.Done(): return; default: }at the top of each iteration would make the function respect cancellation without adding meaningful complexity.
The ordering fix is correct — the alternative (SSH sessions racing missing secret files) is worse. Two refinements worth considering: (a) bounded timeout / ctx check inside the loop so a hung FS can't block network init forever, or (b) detach the write into a goroutine that only the startup-script path waits on, preserving the guarantee the fix established without putting writes on the critical path. Common case is fine; the tail-latency case is the new blast radius.
There was a problem hiding this comment.
I revisited this. I reconsidered doing the file I/O before signalling manifestOK.complete. I don't think it makes much sense to block networking init for the secret file I/O considering some secret files may be large cert chains. So I moved the file I/O back to immediately after manifestOK.complete is signalled.
There will be a theoretical race where an SSH session that connects and reads a secret file before writes finish would see stale or missing content, but in practice SSH requires network init + coordination before any connection arrives, which should take far longer than file writes.
I have documented the assumption above. If the race proves to be a real issue, we can synchronize things better.
| ) | ||
| continue | ||
| } | ||
| filePath = filepath.Join(homeDir, filePath[2:]) |
There was a problem hiding this comment.
I believe that this join opens up an issue around secrets overwriting each other. File-path uniqueness is enforced on the raw stored string, but the agent normalizes before writing (~/ expansion plus filepath.Clean). So two accepted secrets such as ~/.aws/credentials and /home/coder/.aws/credentials collapse to the same on-disk path and one silently overwrites the other.
There was a problem hiding this comment.
Good callout. This is a bit challenging because we don't know the workspace home dir before it starts (not easily anyway). I added logging when this happens + test in this PR. It's something we'll need to document for the feature as well.
We could even warn about this in the UI if two paths look like potential conflicts after resolution, but we can cross that bridge when we get there.
Injects user secrets into workspace agents at runtime via the agent manifest. Secrets with an environment variable name are set as environment variables in every agent session and startup script. Secrets with a file path are written to disk before startup scripts run. - Fetch user secrets in GetManifest and convert to proto - Add WorkspaceSecret type and proto conversion helpers to agentsdk - Write secret files eagerly on manifest fetch (0600 perms, 0700 dirs) - Inject secret env vars per-session in updateCommandEnv - Expand ~/paths using caller-resolved home directory - Log file write errors without blocking workspace startup
cdb1ed5 to
27872c5
Compare
| if original, ok := seen[filePath]; ok { | ||
| // Known shortcoming: the winning secret is determined by the order | ||
| // of secrets in the manifest, which is currently alphabetical by | ||
| // secret name from ListUserSecretsWithValues. This ordering is not | ||
| // user-controllable and has no semantic meaning; users should avoid | ||
| // path collisions rather than rely on which secret wins. | ||
| logger.Warn(ctx, "multiple secrets resolve to the same file path; later secret in manifest order will win (not user-controllable)", | ||
| slog.F("resolved_path", filePath), | ||
| slog.F("first_file_path", original), | ||
| slog.F("conflicting_file_path", secret.FilePath), | ||
| ) | ||
| } | ||
| seen[filePath] = secret.FilePath |
There was a problem hiding this comment.
This is unfortunate. Ideally there would be some way of surfacing this to the user more visibly in the UI.
| if ok && mpSafe != nil { | ||
| mpSafe.Secrets = nil | ||
| } | ||
| a.logger.Critical(ctx, "failed to convert manifest", slog.F("manifest", mpSafe), slog.Error(err)) |
There was a problem hiding this comment.
We'll probably want to do another pass of any place we log the manifest.
Would it make sense to to mark this with debug_redact?
There was a problem hiding this comment.
Unfortunately debug_redact isn't supported in Go yet 😢
Rather than manually stripping out the secrets on every agent.Manifest.Load() call, I decided to strip them off as soon as the agent receives the manifest and to store the secrets separately from the manifest received by the agent. I think this ended up being a much better approach to prevent future accidental leakage.
27872c5 to
1d0a535
Compare

Injects user secrets into workspace agents at runtime via the agent manifest. Secrets with an environment variable name are set as environment variables in every agent session and startup script. Secrets with a file path are written to disk before startup scripts run.