feat(agent/agentcontainers): recreate devcontainers concurrently#18042
Conversation
d17bca2 to
0ebc262
Compare
This change introduces a refactor of the devcontainers recreation logic which is now handled asynchronously rather than being request scoped. The response was consequently changed from "No Content" to "Accepted" to reflect this. A new `Status` field was introduced to the devcontainer struct which replaces `Running` (bool). This reflects that the devcontainer can now be in various states (starting, running, stopped or errored). The status field also protects against multiple concurrent recrations, as long as they are initiated via the API. Updates #16424
0ebc262 to
9328228
Compare
| // @Param workspaceagent path string true "Workspace agent ID" format(uuid) | ||
| // @Param container path string true "Container ID or name" | ||
| // @Success 204 | ||
| // @Success 202 {object} codersdk.Response |
There was a problem hiding this comment.
Note: the response code change here does not appear to affect any endpoints currently in use by the FE.
| err = json.NewDecoder(rec.Body).Decode(&resp) | ||
| require.NoError(t, err, "unmarshal response failed after recreation") | ||
| require.Len(t, resp.Devcontainers, 1, "expected one devcontainer in response after recreation") | ||
| assert.Equal(t, codersdk.WorkspaceAgentDevcontainerStatusRunning, resp.Devcontainers[0].Status, "devcontainer is not stopped after recreation") |
There was a problem hiding this comment.
nit: this could also be a require?
| err = json.NewDecoder(rec.Body).Decode(&resp) | ||
| require.NoError(t, err, "unmarshal response failed after error") | ||
| require.Len(t, resp.Devcontainers, 1, "expected one devcontainer in response after error") | ||
| assert.Equal(t, codersdk.WorkspaceAgentDevcontainerStatusError, resp.Devcontainers[0].Status, "devcontainer is not in an error state after up failure") |
There was a problem hiding this comment.
nit: this could also be a require?
There was a problem hiding this comment.
To what benefit? Assert is useful in that it can allow multiple conditions to fail and give you a larger picture of what's wrong. Require is ofc needed whenever you need that condition to be true, like nil and lengths checks so that the rest of the code doesn't panic.
There was a problem hiding this comment.
My nit is very much stylistic here; having an assert on the last line below one or more require doesn't help us against panics or any of the above you mentioned. Feel free to ignore!
| // The devcontainer state must be set to starting and the recreateWg must be | ||
| // incremented before calling this function. | ||
| func (api *API) recreateDevcontainer(dc codersdk.WorkspaceAgentDevcontainer, configPath string) { | ||
| defer api.recreateWg.Done() |
There was a problem hiding this comment.
I'm nervous that a hanging request to recreateDevcontainer could make closing the devcontainers API hang on api.wg.Wait(). I wonder if it makes sense to have a 'graceful' shutdown context and a 'hard' shutdown context that will forcefully cancel all in-flight recreation waitgroups after a certain period of time elapses?
There was a problem hiding this comment.
This uses api.ctx which is essentially that "hard" shutdown context, there isn't any reason to have a graceful one AFAICT.
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.
A new
Statusfield was introduced to the devcontainer struct whichreplaces
Running(bool). This reflects that the devcontainer can nowbe in various states (starting, running, stopped or errored).
The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.
Updates #16424