Fix mount manager activation error when already exists#13126
Fix mount manager activation error when already exists#13126dmcgowan wants to merge 1 commit intocontainerd:mainfrom
Conversation
| cur, err := sbkt.CreateBucket([]byte{byte(i)}) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Is there a guaranteed upper bound on the number of system mounts? Casting the loop index to a single byte truncates the value, which limits the maximum number of system mounts to 256 before silently wrapping around to zero and causing CreateBucket to fail. Furthermore, since BoltDB iterates keys lexicographically, a naive integer-to-string conversion would alter the expected read sequence (e.g., "10" sorts before "2"), potentially disrupting mount layering order. It might be safer to format the index as a zero-padded string to preserve both unbounded capacity and the correct iteration sequence.
| cur, err := sbkt.CreateBucket([]byte{byte(i)}) | |
| if err != nil { | |
| return err | |
| } | |
| cur, err := sbkt.CreateBucket([]byte(fmt.Sprintf("%04x", i))) | |
| if err != nil { | |
| return err | |
| } |
| if len(m.Options) > 0 { | ||
| if err := bkt.Put(bucketKeyOptions, []byte(strings.Join(m.Options, ","))); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Are there any scenarios where a system mount option might natively contain a comma (e.g., a complex assignment like lowerdir=a,b)? If an option contains a comma, joining and splitting by , will incorrectly divide it into disjoint slice elements. Since mount options are passed as C-strings to the kernel and cannot contain null characters, it is generally safer to use a null byte \x00 as the slice delimiter to perfectly preserve the original array boundaries.
| if len(m.Options) > 0 { | |
| if err := bkt.Put(bucketKeyOptions, []byte(strings.Join(m.Options, ","))); err != nil { | |
| return err | |
| } | |
| if len(m.Options) > 0 { | |
| if err := bkt.Put(bucketKeyOptions, []byte(strings.Join(m.Options, "\x00"))); err != nil { | |
| return err | |
| } | |
| } |
| if v := bkt.Get(bucketKeyOptions); len(v) > 0 { | ||
| m.Options = strings.Split(string(v), ",") | ||
| } |
There was a problem hiding this comment.
If the delimiter in putSystemMount is updated to prevent boundary collisions, this reader will also need to be updated to match the new delimiter. This ensures that any option strings containing commas are preserved accurately.
| if v := bkt.Get(bucketKeyOptions); len(v) > 0 { | |
| m.Options = strings.Split(string(v), ",") | |
| } | |
| if v := bkt.Get(bucketKeyOptions); len(v) > 0 { | |
| m.Options = strings.Split(string(v), "\x00") | |
| } |
|
beep boop Please treat my comments as suggestions and advisory, not mandates. Since I am an LLM and my output is used here directly, you should evaluate carefully and see if my suggestions make sense. Feel free to directly resolve any you disagree with. This is not an automated action. This bot does not read or reply to comments. |
cd4569f to
ae95987
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves mount-manager activation behavior when an activation record for the same name already exists, allowing callers (notably the v2 task manager) to reuse a fully-active mount and to recover from stale/incomplete activation records left behind by crashes.
Changes:
- Update v2 task creation to handle
ErrAlreadyExistsfrom mount activation by querying mountInfo()and continuing with existing mounts. - Make mount manager
Activate()differentiate “fully active” vs “stale/incomplete” existing records, reclaim stale records, and serialize same-name activations to avoid races. - Persist and reload “system mounts” in the mount manager so
Info()can return complete activation info; add comprehensive tests for these scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| core/runtime/v2/task_manager.go | Reuses existing mount activation on ErrAlreadyExists by calling m.mounts.Info() and updating opts.Rootfs. |
| core/mount/manager/manager.go | Adds keyed per-name activation locking; distinguishes active vs stale records; cleans stale targets; persists/reads system mounts. |
| core/mount/manager/manager_test.go | Adds tests for already-exists activation, stale/incomplete recovery, Info(), system mount persistence, and same-name concurrency. |
| core/mount/manager/buckets.go | Extends BoltDB schema keys to support persisting system mounts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Correctly handle cases where the mount activation still exists: - If activation is fully activate, then just return already exists and allow the caller to return error or call Info to continue. - If activation is stale or incomplete due to crash during activation, overwrite the identifier and cleanup the incomplete activation during activate. Signed-off-by: Derek McGowan <derek@mcg.dev>
ae95987 to
80ec03f
Compare
Correctly handle cases where the mount activation still exists:
The issue can arise when restarting a task in which the previous mount not completely deactivated. Since mounts are their own component, there is no guarantee that a mount will get immediately cleaned up when a task is removed (such as with an unclean shutdown). Same if the process is killed during activation, activating a mount requires a two stage transaction to claim an identifier, perform the mounts, then register the mounts back with the identifier. If an unclean shutdown occurs between the stages, an ID can be left claimed without any active mounts. Those can just be reclaimed on activate with any previous mount state cleaned up during GC or during activate.