feat(cli): Add support for delay_login_until_ready#5851
Conversation
| cliflag.BoolVarP(cmd.Flags(), &forwardGPG, "forward-gpg", "G", "CODER_SSH_FORWARD_GPG", false, "Specifies whether to forward the GPG agent. Unsupported on Windows workspaces, but supports all clients. Requires gnupg (gpg, gpgconf) on both the client and workspace. The GPG agent must already be running locally and will not be started for you. If a GPG agent is already running in the workspace, it will be attempted to be killed.") | ||
| cliflag.StringVarP(cmd.Flags(), &identityAgent, "identity-agent", "", "CODER_SSH_IDENTITY_AGENT", "", "Specifies which identity agent to use (overrides $SSH_AUTH_SOCK), forward agent must also be enabled") | ||
| cliflag.DurationVarP(cmd.Flags(), &wsPollInterval, "workspace-poll-interval", "", "CODER_WORKSPACE_POLL_INTERVAL", workspacePollInterval, "Specifies how often to poll for workspace automated shutdown.") | ||
| cliflag.BoolVarP(cmd.Flags(), &skipDelayLoginUntilReady, "skip-delay-login-until-ready", "", "CODER_SSH_SKIP_DELAY_LOGIN_UNTIL_READY", false, "Specifies whether to login to a workspace that has not finished starting up (only applicable when the delay login until ready option is enabled).") |
There was a problem hiding this comment.
Should we go with something simpler, shorter here? Maybe --no-delay-login, --no-wait. I'd like to leave room for the opposite flag as well --[no-]wait to allow waiting even when the template doesn't specify it.
| } | ||
| case codersdk.WorkspaceAgentLifecycleReady: | ||
| return nil | ||
| default: |
There was a problem hiding this comment.
Yay or nay?
| default: | |
| case codersdk.WorkspaceAgentLifecycleStartError: | |
| showMessage() | |
| return errors.New("...") | |
| default: |
There was a problem hiding this comment.
I think we should fail-open here. Let folks try to connect, but warn that it might not work.
There was a problem hiding this comment.
Yup, I agree. Let's make it so!
There was a problem hiding this comment.
Done, this is what it looks like now:
❯ coder ssh test
> The workspace agent ran into a problem during startup.
See troubleshooting instructions at:
https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates
Agent startup script exited with non-zero status, use --no-wait to login anyway.
Run 'coder ssh --help' for usage.
And with --no-wait we still warn about the problem:
❯ coder ssh test --no-wait
> The workspace agent ran into a problem during startup.
See troubleshooting instructions at:
https://coder.com/docs/coder-oss/latest/templates#troubleshooting-templates
mafredri@test ~
❯
|
|
||
| func waitingMessage(agent codersdk.WorkspaceAgent) (m *message) { | ||
| m = &message{ | ||
| Prompt: "Don't panic, your workspace is booting up!", |
There was a problem hiding this comment.
Can we add directions on how to bypass this e.g.
To skip this check, use the `--no-wait` argument
(Or whatever we end up calling it)
There was a problem hiding this comment.
Not for this message (this state is before agent is connected so we can't connect yet).
We could add it for the other message, but I wonder if it would encourage bypassing the prompt, esp. without understanding what the flag does. You just go Ctrl+C -> --no-wait yay I'm in (and not everything is working). Would it suffice to explain this in the troubleshooting link?
I think this need will also be alleviated by streaming the startup log once we have it available.
There was a problem hiding this comment.
Would it suffice to explain this in the troubleshooting link?
That's fine either
johnstcn
left a comment
There was a problem hiding this comment.
I think this is definitely a good step in troubleshooting failed workspace builds.
One possible avenue of future work I see is adding WorkspaceAgentDownloading / WorkspaceAgentDownloaded states. We could add a random pre-shared nonce (even the build ID could work here) to the workspace agent downloaded URL and use that to determine if the workspace attempted to download the agent.
|
Updated messaging: |
| m.Prompt = "The workspace is taking longer than expected to get ready, the agent startup script is still executing." | ||
| case codersdk.WorkspaceAgentLifecycleStartError: | ||
| m.Spin = "" | ||
| m.Prompt = "The workspace ran into a problem while getting ready, the agent startup script exited with non-zero status." |
There was a problem hiding this comment.
I like this copy! Once we start streaming build logs, I imagine we can also link to view the logs + the troubleshooting URL?
There was a problem hiding this comment.
Yes definitely! We can also stream the log in the terminal while the user is waiting, like 5 latest rows (keeps updating). Maybe that'd enabled/disabled via flag.
This PR adds support to
sshandcoder sshfordelay_login_until_ready.Ref: #5749
Here are examples of when
coder_agent.delay_login_until_ready = true(for false, behavior is as before, sans formatting changes):On 30+s start,
start_timeout=>start_error:Ultimately in this case, the user can log in withcoder ssh --skip-delay-login-until-ready.We might want to change behavior ofstart_errorto exit with non-zero status instead blocking.