Skip to content

fix(skills): stage updates in a temp dir and swap in-place#13449

Open
SamMorrowDrums wants to merge 4 commits into
trunkfrom
sammorrowdrums/fix-skill-update-in-place
Open

fix(skills): stage updates in a temp dir and swap in-place#13449
SamMorrowDrums wants to merge 4 commits into
trunkfrom
sammorrowdrums/fix-skill-update-in-place

Conversation

@SamMorrowDrums
Copy link
Copy Markdown
Contributor

@SamMorrowDrums SamMorrowDrums commented May 18, 2026

Closes #13370.

Problem

gh skill update could relocate skills and delete the original install directory:

  • With --dir, namespaced layouts ({dir}/{namespace}/{name}/SKILL.md) had their install base computed by walking filepath.Dir twice, putting the new install one level too shallow. The post-install migration step then RemoveAlled the original namespaced directory, taking the user's files with it.
  • Without --dir, the same migration cleanup quietly relocated namespaced skills found under an agent install directory to a flat layout, even though users expect updates to land where the skill currently lives.

Either path breaks symlinks, mounts, and any external references that point at the skill directory.

Fix

Each update is staged into a sibling directory of the existing skill dir (same filesystem, so renames are atomic). On success the contents are swapped in via per-entry rename:

  1. Move every existing entry from the skill dir into a sibling backup dir.
  2. Move every staged entry into the skill dir.
  3. Remove the backup.

If any step fails, the rollback path removes any freshly installed entries and moves the originals back from the backup.

This guarantees:

  • All updates happen in-place, regardless of layout (flat vs. namespaced) and regardless of whether the skill was found via an agent host or via --dir.
  • The skill directory's inode is preserved, so symlinks, mounts, and external references continue to resolve.
  • Stale files from the previous version are removed.
  • A failure at any point (install, read, rename) leaves the existing skill completely untouched.

Tests

  • New acceptance test acceptance/testdata/skills/skills-update-inplace.txtar plants a namespaced skill, runs gh skill update --dir, and asserts the skill remains at the original namespaced path with no flat-path duplicate.
  • New unit test TestSwapDirectoryContents_PreservesDestInode asserts the skill directory's identity is preserved across a successful swap (via os.SameFile).
  • New unit test TestSwapDirectoryContents_RollsBackOnFailure exercises the rollback path and asserts the original content (including nested subdirs) is fully restored.
  • The "namespaced skill with --dir" unit test is updated to assert in-place behavior and stale-file cleanup.
  • The existing install_failure_during_update_reports_error_and_continues unit test had a harness shortcut that silently skipped its verify block whenever wantErr was set. That shortcut is removed so the "preserve original on install failure" assertion is now actually enforced.

Previously, `gh skill update --dir` walked two directory levels up for
namespaced skill layouts when computing the install base, causing the
skill to be relocated and the original directory to be removed by the
post-install migration step. The same migration also relocated
namespaced skills found under agent install directories during a normal
update, even though users expect updates to land in the same place the
skill was discovered.

Update now stages each install into a private temp directory and, on
success, atomically swaps the contents into the existing skill
directory. This:

- Always updates in-place, regardless of whether the skill lives under
  an agent host directory or a custom `--dir`, and regardless of
  whether the layout is flat or namespaced.
- Preserves the skill directory's inode so symlinks, mounts, and
  external references continue to resolve.
- Removes stale files left over from the previous version.
- Leaves the existing skill completely untouched if the install fails
  partway through.

The previous test harness silently skipped `verify` whenever `wantErr`
was set, which masked the existing failure-preservation assertion.
That harness shortcut is removed so the assertion is enforced.

Adds an acceptance test that plants a namespaced skill, runs an
update with `--dir`, and asserts the skill remains at its original
namespaced path with no flat-path duplicate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SamMorrowDrums SamMorrowDrums marked this pull request as ready for review May 18, 2026 13:09
@SamMorrowDrums SamMorrowDrums requested a review from a team as a code owner May 18, 2026 13:09
Copilot AI review requested due to automatic review settings May 18, 2026 13:09
@SamMorrowDrums SamMorrowDrums requested a review from a team as a code owner May 18, 2026 13:09
@SamMorrowDrums SamMorrowDrums requested a review from BagToad May 18, 2026 13:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes gh skill update so updates are applied in-place (preserving the existing skill directory) by staging installs in a temporary directory and then replacing the contents of the installed skill directory, preventing accidental relocation/deletion of namespaced installs (closes #13370).

Changes:

  • Reworks update flow to stage an updated skill in a temp directory and then copy/move files into the existing install directory (no directory relocation/removal).
  • Updates unit tests to assert in-place behavior (including stale-file cleanup) and ensures error-path verify blocks actually run.
  • Adds an acceptance test covering in-place updates for namespaced skills when using --dir.
Show a summary per file
File Description
pkg/cmd/skills/update/update.go Replaces “install then possibly delete/migrate” logic with a staging + in-place content replacement approach.
pkg/cmd/skills/update/update_test.go Updates expectations for in-place behavior; fixes test harness so verify runs even when wantErr is set.
acceptance/testdata/skills/skills-update-inplace.txtar New acceptance test ensuring namespaced --dir updates don’t relocate or delete the original directory.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (3)

pkg/cmd/skills/update/update.go:469

  • The implementation deletes the existing contents of u.local.dir before moving/copying staged files in. If RemoveAll or any subsequent moveOrCopy/copyPath call fails partway through (permissions, disk full, interrupted copy, etc.), the skill directory can be left partially updated or empty, which contradicts the function contract/comments (“failed install leaves the existing skill completely untouched”) and the PR description. To make this failure-safe, consider a transactional approach (e.g., rename existing entries into a temporary backup dir first, populate from staging, and restore from backup on error; ideally stage on the same filesystem so renames can be atomic).
	if err := os.MkdirAll(u.local.dir, 0o755); err != nil {
		return fmt.Errorf("could not ensure skill directory %s: %w", u.local.dir, err)
	}
	existing, err := os.ReadDir(u.local.dir)
	if err != nil {
		return fmt.Errorf("could not read skill directory %s: %w", u.local.dir, err)
	}
	for _, entry := range existing {
		if err := os.RemoveAll(filepath.Join(u.local.dir, entry.Name())); err != nil {
			return fmt.Errorf("could not clean skill directory %s: %w", u.local.dir, err)
		}
	}

	staged, err := os.ReadDir(stagedSkillDir)
	if err != nil {
		return fmt.Errorf("could not read staged skill directory %s: %w", stagedSkillDir, err)
	}
	for _, entry := range staged {
		src := filepath.Join(stagedSkillDir, entry.Name())
		dst := filepath.Join(u.local.dir, entry.Name())
		if err := moveOrCopy(src, dst); err != nil {
			return fmt.Errorf("could not move %s into place: %w", entry.Name(), err)
		}
	}

pkg/cmd/skills/update/update.go:481

  • moveOrCopy falls back to copyPath for any os.Rename error. That can mask real failures (e.g., permission denied, destination already exists, invalid path) and may produce partial copies while still discarding the original error context. It’s safer to only fall back for cross-device rename failures (EXDEV) and otherwise return the rename error.
// moveOrCopy renames src to dst, falling back to a recursive copy when the
// rename crosses filesystem boundaries (e.g. when TMPDIR lives on a separate
// volume from the skill directory).
func moveOrCopy(src, dst string) error {
	if err := os.Rename(src, dst); err == nil {
		return nil
	}
	return copyPath(src, dst)
}

pkg/cmd/skills/update/update.go:515

  • copyPath reads entire files into memory via os.ReadFile before writing. If a skill includes large files, this can cause unnecessary memory spikes. Consider switching to streaming copy (open src/dst and io.Copy) while still applying the desired permissions.
	default:
		data, err := os.ReadFile(src)
		if err != nil {
			return err
		}
		return os.WriteFile(dst, data, info.Mode().Perm())
	}
  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread pkg/cmd/skills/update/update.go
Address review feedback on the in-place update implementation:

- Stage into a sibling of the existing skill directory so every rename
  during the swap is intra-filesystem and atomic. This eliminates the
  cross-device copy fallback and the associated memory-spike concern
  (no more os.ReadFile/os.WriteFile of arbitrary skill files).
- Replace the "clear-then-fill" sequence with a backup-and-swap: move
  existing entries into a sibling backup dir, then move staged entries
  into place. If any step fails, restore from backup. This makes the
  "failed install leaves the existing skill untouched" guarantee hold
  even when the failure occurs mid-swap rather than only during install.
- Drop the moveOrCopy/copyPath helpers entirely.

Adds two unit tests: TestSwapDirectoryContents_PreservesDestInode
asserts the directory's identity is preserved across a successful
swap, and TestSwapDirectoryContents_RollsBackOnFailure exercises the
rollback path by pointing the swap at a non-existent staged dir,
verifying original content (including nested subdirs) is fully
restored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match prior-art style: leading docstring on helper tests (see
TestDeduplicateByName_Namespaced) and use assert.Empty for clearer
failure output instead of a bare t.Errorf in a loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@tommaso-moro tommaso-moro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gh skill update --dir relocates the skill and deletes the original install directory

3 participants