Skip to content

feat: runtime user secrets injection into workspaces#24313

Merged
zedkipp merged 2 commits into
mainfrom
zedkipp/plat-75-runtime-secrets-injection
Apr 17, 2026
Merged

feat: runtime user secrets injection into workspaces#24313
zedkipp merged 2 commits into
mainfrom
zedkipp/plat-75-runtime-secrets-injection

Conversation

@zedkipp
Copy link
Copy Markdown
Contributor

@zedkipp zedkipp commented Apr 13, 2026

Injects user secrets into workspace agents at runtime via the agent manifest. Secrets with an environment variable name are set as environment variables in every agent session and startup script. Secrets with a file path are written to disk before startup scripts run.

  • Fetch user secrets in GetManifest and convert to proto
  • Add WorkspaceSecret type and proto conversion helpers to agentsdk
  • Write secret files eagerly on manifest fetch (0600 perms, 0700 dirs)
  • Inject secret env vars per-session in updateCommandEnv
  • Expand ~/paths using caller-resolved home directory
  • Log file write errors without blocking workspace startup

@zedkipp zedkipp changed the title feat(agent): runtime user secrets injection into workspaces feat: runtime user secrets injection into workspaces Apr 13, 2026
@zedkipp zedkipp force-pushed the zedkipp/plat-75-runtime-secrets-injection branch 4 times, most recently from 3e48793 to 21f3120 Compare April 13, 2026 20:50
@zedkipp zedkipp requested review from dylanhuff-at-coder, f0ssel and johnstcn and removed request for dylanhuff-at-coder April 13, 2026 21:23
@zedkipp zedkipp marked this pull request as ready for review April 13, 2026 21:25
@coder-tasks
Copy link
Copy Markdown
Contributor

coder-tasks Bot commented Apr 13, 2026

Documentation Check

Updates Needed

  • docs/admin/security/secrets.md - Add a new "User Secrets" section documenting the runtime injection feature: how user secrets are stored per-user (via API at /users/{user}/secrets), how to configure env_name for environment variable injection and file_path for file injection, the ~/ path expansion behavior, and the precedence rule (user secrets override template-defined env vars but are overridden by agent bootstrap vars like CODER_AGENT_TOKEN).

New Documentation Needed

  • docs/admin/security/secrets.md (or a new dedicated page) - The user secrets CRUD API has existed for a while, but this PR activates runtime injection into workspace agents. Users now have a first-class way to store personal secrets (tokens, credentials) that are automatically injected into every workspace as env vars or files — this behavior deserves a dedicated explanation separate from the existing "Dynamic Secrets" (Terraform-based) and "SSH Keys" sections.

Automated review via Coder Tasks

@zedkipp
Copy link
Copy Markdown
Contributor Author

zedkipp commented Apr 13, 2026

Planning to omit public docs changes from this PR. The parallel coder secrets CLI PR starts some of the suggested docs. We'll add comprehensive docs in the near future.

Copy link
Copy Markdown
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖

Clean implementation — good separation of concerns, correct env var precedence chain, thorough unit tests for writeSecretFiles. The file permissions and the decision to never log secret values are done right.

2 P1s (secret leakage in debug/logs), 2 P2s, 2 P3s, 3 nits across 9 inline comments.

This review contains findings that may need attention before merge.

Comment thread agent/agent.go
Comment thread agent/agent.go Outdated
Comment thread agent/agent.go
Comment thread agent/agent.go
Comment thread coderd/agentapi/manifest_test.go
Comment thread agent/write_secret_files_test.go
Comment thread codersdk/agentsdk/convert.go Outdated
Comment thread coderd/agentapi/manifest.go Outdated
Comment thread agent/agent.go Outdated
@zedkipp zedkipp force-pushed the zedkipp/plat-75-runtime-secrets-injection branch 4 times, most recently from f3b05b6 to 6b154df Compare April 16, 2026 01:16
Copy link
Copy Markdown
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖

Thanks for addressing all 9 findings from round 1 — every resolution verified. Good fixes across the board.

Fresh review on the fixes themselves found 3 P2s, 3 P3s, and 1 observation across 7 inline comments. The top items are a perm-on-overwrite gap in WriteFile semantics that partially undercuts the 0o600 guarantee, and a synchronous-filesystem-blocks-network-init concern from moving WriteSecretFiles before manifestOK.complete(nil).

Comment thread agent/agent.go
Comment thread agent/agent.go Outdated
a.logger.Warn(ctx, "failed to resolve home directory for secret files", slog.Error(err))
}
WriteSecretFiles(ctx, a.logger, a.filesystem, homeDir, manifest.Secrets)
manifestOK.complete(nil)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Moving WriteSecretFiles before manifestOK.complete(nil) puts synchronous filesystem I/O on the critical path for tailnet init, with no context or timeout (Edge Case Analyst P2, Go Architect Obs)

manifestOK gates createOrUpdateNetwork (line 1442), which in turn gates coordination, DERP subscription, and the stats loop (1186-1206). Before this PR, writes happened after manifestOK.complete, so a hung disk only affected file-based secrets. Now a hung or very slow filesystem — NFS/FUSE home mount, misconfigured overlay, disk full with sync writes — blocks the whole chain indefinitely. WriteSecretFiles calls fs.MkdirAll and afero.WriteFile without a context; per-secret errors are logged and skipped, but a syscall that never returns is not "an error".

Go Architect adds: the function takes ctx context.Context but never checks it. If the agent's stopCtx is cancelled mid-write (graceful shutdown racing with a slow filesystem), the loop keeps running. A select { case <-ctx.Done(): return; default: } at the top of each iteration would make the function respect cancellation without adding meaningful complexity.

The ordering fix is correct — the alternative (SSH sessions racing missing secret files) is worse. Two refinements worth considering: (a) bounded timeout / ctx check inside the loop so a hung FS can't block network init forever, or (b) detach the write into a goroutine that only the startup-script path waits on, preserving the guarantee the fix established without putting writes on the critical path. Common case is fine; the tail-latency case is the new blast radius.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revisited this. I reconsidered doing the file I/O before signalling manifestOK.complete. I don't think it makes much sense to block networking init for the secret file I/O considering some secret files may be large cert chains. So I moved the file I/O back to immediately after manifestOK.complete is signalled.

There will be a theoretical race where an SSH session that connects and reads a secret file before writes finish would see stale or missing content, but in practice SSH requires network init + coordination before any connection arrives, which should take far longer than file writes.

I have documented the assumption above. If the race proves to be a real issue, we can synchronize things better.

Comment thread agent/agent.go Outdated
Comment thread agent/agent.go Outdated
Comment thread agent/agent_test.go
Comment thread agent/write_secret_files_test.go
Comment thread agent/agent.go Outdated
Comment thread codersdk/agentsdk/convert_test.go
Comment thread agent/agent.go
)
continue
}
filePath = filepath.Join(homeDir, filePath[2:])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this join opens up an issue around secrets overwriting each other. File-path uniqueness is enforced on the raw stored string, but the agent normalizes before writing (~/ expansion plus filepath.Clean). So two accepted secrets such as ~/.aws/credentials and /home/coder/.aws/credentials collapse to the same on-disk path and one silently overwrites the other.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout. This is a bit challenging because we don't know the workspace home dir before it starts (not easily anyway). I added logging when this happens + test in this PR. It's something we'll need to document for the feature as well.

We could even warn about this in the UI if two paths look like potential conflicts after resolution, but we can cross that bridge when we get there.

Comment thread agent/agent.go
Injects user secrets into workspace agents at runtime via the agent
manifest. Secrets with an environment variable name are set as
environment variables in every agent session and startup script.
Secrets with a file path are written to disk before startup scripts
run.

 - Fetch user secrets in GetManifest and convert to proto
 - Add WorkspaceSecret type and proto conversion helpers to agentsdk
 - Write secret files eagerly on manifest fetch (0600 perms, 0700 dirs)
 - Inject secret env vars per-session in updateCommandEnv
 - Expand ~/paths using caller-resolved home directory
 - Log file write errors without blocking workspace startup
@zedkipp zedkipp force-pushed the zedkipp/plat-75-runtime-secrets-injection branch 3 times, most recently from cdb1ed5 to 27872c5 Compare April 16, 2026 21:17
Comment thread agent/agent.go
Comment on lines +1636 to +1648
if original, ok := seen[filePath]; ok {
// Known shortcoming: the winning secret is determined by the order
// of secrets in the manifest, which is currently alphabetical by
// secret name from ListUserSecretsWithValues. This ordering is not
// user-controllable and has no semantic meaning; users should avoid
// path collisions rather than rely on which secret wins.
logger.Warn(ctx, "multiple secrets resolve to the same file path; later secret in manifest order will win (not user-controllable)",
slog.F("resolved_path", filePath),
slog.F("first_file_path", original),
slog.F("conflicting_file_path", secret.FilePath),
)
}
seen[filePath] = secret.FilePath
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate. Ideally there would be some way of surfacing this to the user more visibly in the UI.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We are planning surface these conflicts alongside what secrets were injected and where. Something like this:
Screenshot 2026-04-17 at 11 38 41 AM

I'm planning to tackle this in a subsequent PR.

Comment thread agent/agent.go Outdated
if ok && mpSafe != nil {
mpSafe.Secrets = nil
}
a.logger.Critical(ctx, "failed to convert manifest", slog.F("manifest", mpSafe), slog.Error(err))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably want to do another pass of any place we log the manifest.
Would it make sense to to mark this with debug_redact?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately debug_redact isn't supported in Go yet 😢

Rather than manually stripping out the secrets on every agent.Manifest.Load() call, I decided to strip them off as soon as the agent receives the manifest and to store the secrets separately from the manifest received by the agent. I think this ended up being a much better approach to prevent future accidental leakage.

@zedkipp zedkipp force-pushed the zedkipp/plat-75-runtime-secrets-injection branch from 27872c5 to 1d0a535 Compare April 17, 2026 21:21
@zedkipp zedkipp merged commit 72f35e1 into main Apr 17, 2026
26 checks passed
@zedkipp zedkipp deleted the zedkipp/plat-75-runtime-secrets-injection branch April 17, 2026 22:55
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants