Skip to content

Fix mount manager activation error when already exists#13126

Open
dmcgowan wants to merge 1 commit intocontainerd:mainfrom
dmcgowan:handle-mount-already-exists
Open

Fix mount manager activation error when already exists#13126
dmcgowan wants to merge 1 commit intocontainerd:mainfrom
dmcgowan:handle-mount-already-exists

Conversation

@dmcgowan
Copy link
Copy Markdown
Member

Correctly handle cases where the mount activation still exists:

  • If activation is fully activate, then just return already exists and allow the caller to return error or call Info to continue.
  • If activation is stale or incomplete due to crash during activation, overwrite the identifier and cleanup the incomplete activation during activate.

The issue can arise when restarting a task in which the previous mount not completely deactivated. Since mounts are their own component, there is no guarantee that a mount will get immediately cleaned up when a task is removed (such as with an unclean shutdown). Same if the process is killed during activation, activating a mount requires a two stage transaction to claim an identifier, perform the mounts, then register the mounts back with the identifier. If an unclean shutdown occurs between the stages, an ID can be left claimed without any active mounts. Those can just be reclaimed on activate with any previous mount state cleaned up during GC or during activate.

@github-project-automation github-project-automation Bot moved this to Needs Triage in Pull Request Review Mar 26, 2026
@dmcgowan dmcgowan added cherry-pick/2.2.x Change to be cherry picked to release/2.2 branch and removed kind/bug area/runtime Runtime size/L labels Mar 26, 2026
@dmcgowan dmcgowan added this to the 2.3 milestone Mar 26, 2026
@dmcgowan dmcgowan moved this from Needs Triage to Needs Reviewers in Pull Request Review Mar 26, 2026
github-actions Bot added a commit to dmcgowan/containerd that referenced this pull request Apr 4, 2026
github-actions Bot added a commit to dmcgowan/containerd that referenced this pull request Apr 5, 2026
github-actions Bot added a commit to dmcgowan/containerd that referenced this pull request Apr 6, 2026
github-actions Bot added a commit to dmcgowan/containerd that referenced this pull request Apr 7, 2026
github-actions Bot added a commit to dmcgowan/containerd that referenced this pull request Apr 8, 2026
github-actions Bot added a commit to dmcgowan/containerd that referenced this pull request Apr 9, 2026
Comment thread core/mount/manager/manager.go Outdated
Comment thread core/mount/manager/manager_test.go Outdated
Comment on lines +541 to +544
cur, err := sbkt.CreateBucket([]byte{byte(i)})
if err != nil {
return err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a guaranteed upper bound on the number of system mounts? Casting the loop index to a single byte truncates the value, which limits the maximum number of system mounts to 256 before silently wrapping around to zero and causing CreateBucket to fail. Furthermore, since BoltDB iterates keys lexicographically, a naive integer-to-string conversion would alter the expected read sequence (e.g., "10" sorts before "2"), potentially disrupting mount layering order. It might be safer to format the index as a zero-padded string to preserve both unbounded capacity and the correct iteration sequence.

Suggested change
cur, err := sbkt.CreateBucket([]byte{byte(i)})
if err != nil {
return err
}
cur, err := sbkt.CreateBucket([]byte(fmt.Sprintf("%04x", i)))
if err != nil {
return err
}

Comment on lines +628 to +631
if len(m.Options) > 0 {
if err := bkt.Put(bucketKeyOptions, []byte(strings.Join(m.Options, ","))); err != nil {
return err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any scenarios where a system mount option might natively contain a comma (e.g., a complex assignment like lowerdir=a,b)? If an option contains a comma, joining and splitting by , will incorrectly divide it into disjoint slice elements. Since mount options are passed as C-strings to the kernel and cannot contain null characters, it is generally safer to use a null byte \x00 as the slice delimiter to perfectly preserve the original array boundaries.

Suggested change
if len(m.Options) > 0 {
if err := bkt.Put(bucketKeyOptions, []byte(strings.Join(m.Options, ","))); err != nil {
return err
}
if len(m.Options) > 0 {
if err := bkt.Put(bucketKeyOptions, []byte(strings.Join(m.Options, "\x00"))); err != nil {
return err
}
}

Comment on lines +642 to +644
if v := bkt.Get(bucketKeyOptions); len(v) > 0 {
m.Options = strings.Split(string(v), ",")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the delimiter in putSystemMount is updated to prevent boundary collisions, this reader will also need to be updated to match the new delimiter. This ensures that any option strings containing commas are preserved accurately.

Suggested change
if v := bkt.Get(bucketKeyOptions); len(v) > 0 {
m.Options = strings.Split(string(v), ",")
}
if v := bkt.Get(bucketKeyOptions); len(v) > 0 {
m.Options = strings.Split(string(v), "\x00")
}

@prakleumas
Copy link
Copy Markdown

beep boop
Hello! I am @samuelkarp's friendly code review robot 🤖! Sam asked me to take a look and post my comments directly on GitHub.

Please treat my comments as suggestions and advisory, not mandates. Since I am an LLM and my output is used here directly, you should evaluate carefully and see if my suggestions make sense. Feel free to directly resolve any you disagree with.

This is not an automated action. This bot does not read or reply to comments.

Copilot AI review requested due to automatic review settings April 20, 2026 17:59
@dmcgowan dmcgowan force-pushed the handle-mount-already-exists branch from cd4569f to ae95987 Compare April 20, 2026 17:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves mount-manager activation behavior when an activation record for the same name already exists, allowing callers (notably the v2 task manager) to reuse a fully-active mount and to recover from stale/incomplete activation records left behind by crashes.

Changes:

  • Update v2 task creation to handle ErrAlreadyExists from mount activation by querying mount Info() and continuing with existing mounts.
  • Make mount manager Activate() differentiate “fully active” vs “stale/incomplete” existing records, reclaim stale records, and serialize same-name activations to avoid races.
  • Persist and reload “system mounts” in the mount manager so Info() can return complete activation info; add comprehensive tests for these scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
core/runtime/v2/task_manager.go Reuses existing mount activation on ErrAlreadyExists by calling m.mounts.Info() and updating opts.Rootfs.
core/mount/manager/manager.go Adds keyed per-name activation locking; distinguishes active vs stale records; cleans stale targets; persists/reads system mounts.
core/mount/manager/manager_test.go Adds tests for already-exists activation, stale/incomplete recovery, Info(), system mount persistence, and same-name concurrency.
core/mount/manager/buckets.go Extends BoltDB schema keys to support persisting system mounts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread core/mount/manager/manager.go
Comment thread core/runtime/v2/task_manager.go
Comment thread core/mount/manager/manager.go Outdated
Comment thread core/mount/manager/manager.go Outdated
Correctly handle cases where the mount activation still exists:
- If activation is fully activate, then just return already exists and
  allow the caller to return error or call Info to continue.
- If activation is stale or incomplete due to crash during activation,
  overwrite the identifier and cleanup the incomplete activation during
  activate.

Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan dmcgowan force-pushed the handle-mount-already-exists branch from ae95987 to 80ec03f Compare April 21, 2026 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-pick/2.2.x Change to be cherry picked to release/2.2 branch size/L

Projects

Status: Needs Reviewers

Development

Successfully merging this pull request may close these issues.

4 participants