fix(windows): fix stale venv deletion and CUDA torch stripping on Win…#4877

Draft
JyothishArumugam wants to merge 4 commits into unslothai:main from JyothishArumugam:fix/windows-stale-venv-cuda-torch

@JyothishArumugam

Fixes #4701

Changes

  • Replace Remove-Item with cmd /c rd /s /q for venv deletion on Windows.
    PowerShell's Remove-Item fails with "Access Denied" on .exe files inside
    freshly created venvs due to Windows file handle locking, even when no
    process is running.

  • Install unsloth with --no-deps then re-pin CUDA torch separately.
    uv's dependency resolver strips the +cuXXX CUDA suffix from torch when
    resolving unsloth's dependencies in the same invocation, causing setup.ps1
    to always detect "torch cpu != required cuXXX" and attempt a venv rebuild.

  • Since --no-deps skips dependency installation, we explicitly call Find-NoTorchRuntimeFile
    to locate the runtime requirements file and install from it, so the environment is
    fully populated before the final CUDA re-pinning.
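
The first change above can be illustrated with a minimal Python sketch. It is not the code in install.ps1 or setup.ps1; the function name `remove_venv` is hypothetical, and the fallback to `shutil.rmtree` on non-Windows platforms is an assumption added only so the sketch runs anywhere.

```python
import os
import shutil
import subprocess
from pathlib import Path


def remove_venv(venv_dir: Path) -> bool:
    """Delete a virtual environment directory; return True if it is gone.

    Hypothetical helper, not part of the Unsloth installer.
    """
    if not venv_dir.exists():
        return True
    if os.name == "nt":
        # Windows: delegate to cmd's built-in `rd`, as the PR proposes.
        # /s removes the whole tree, /q suppresses the confirmation prompt.
        subprocess.run(
            ["cmd", "/c", "rd", "/s", "/q", str(venv_dir)],
            check=False,
        )
    else:
        # Assumption for portability: a plain recursive delete elsewhere.
        shutil.rmtree(venv_dir, ignore_errors=True)
    # Verify on disk rather than trusting the exit code alone.
    return not venv_dir.exists()
```

Checking the filesystem after the call lets the caller decide whether to abort or continue when the directory could not be removed.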

Tested on

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2f7701053

Comment thread install.ps1 Outdated
Comment on lines +879 to +883
$baseInstallExit = Invoke-InstallCommand { uv pip install --python $VenvPython --no-deps --upgrade-package unsloth "$PackageName" unsloth-zoo }
if ($baseInstallExit -eq 0) {
    # Install runtime deps from requirements file if present
    $RuntimeReq = Find-NoTorchRuntimeFile
    if ($RuntimeReq) {
P2: Preserve dependency install for custom --package values

This branch now installs "$PackageName" with --no-deps and then tries to recover dependencies via Find-NoTorchRuntimeFile, which only works when the installed package contains Unsloth’s studio/backend/requirements/no-torch-runtime.txt. For the documented custom-package flow (--package roland-sloth), that file may not exist, so the install can succeed with missing runtime deps; because SKIP_STUDIO_BASE=1 is set later, setup won’t reinstall them. This is a regression from the previous direct install path that let pip/uv resolve the custom package dependencies normally.

Author

I have updated the installation logic so that the --no-deps workaround applies only to the default unsloth package. For custom --package installs, the script now uses standard dependency resolution while explicitly ensuring the unsloth CLI is installed, to prevent the 'missing CLI' error.
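
The branching described in this reply could look roughly like the sketch below. All names (`InstallPlan`, `plan_install`, `DEFAULT_PACKAGE`) are hypothetical, and installing `unsloth` alongside the custom package is an assumption about how the CLI is kept available; the updated install.ps1 may do this differently.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the default value of --package is "unsloth".
DEFAULT_PACKAGE = "unsloth"


@dataclass(frozen=True)
class InstallPlan:
    packages: tuple[str, ...]  # packages passed to the installer
    no_deps: bool              # True -> skip dependency resolution


def plan_install(package_name: str) -> InstallPlan:
    """Pick an install strategy for the given --package value."""
    if package_name == DEFAULT_PACKAGE:
        # Default package: apply the --no-deps workaround; runtime
        # dependencies are installed separately from a requirements file.
        return InstallPlan(packages=("unsloth", "unsloth-zoo"), no_deps=True)
    # Custom package: let the resolver handle its dependencies, and add
    # unsloth explicitly so the CLI entry point exists (assumption).
    return InstallPlan(packages=(package_name, "unsloth"), no_deps=False)
```

With this split, a custom package such as `roland-sloth` never depends on a bundled requirements file being present.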

@rolandtannous
Collaborator

@JyothishArumugam Yes for number 1.
I don't really understand the need for number 2. The current install has been tested and works fine on Windows machines. Can you please explain why it is needed?

@rolandtannous
Collaborator

Submit the fix for "Access Denied" separately so it can be tested and merged.
For changing the way we install, see #4701 (comment).

@JyothishArumugam
Author

JyothishArumugam commented Apr 6, 2026

@rolandtannous

The reason for the second fix (the --no-deps + re-pinning strategy) is to address a specific uv resolver behavior on Windows GPU machines.

When installing unsloth and torch+cuXXX in a single pass, uv often 'simplifies' the dependency tree by stripping the +cuXXX suffix, which results in a Torch CPU installation. This causes a regression where setup.ps1 detects the wrong version and triggers an infinite venv-rebuild loop.

By installing the package first with --no-deps and then explicitly re-pinning the CUDA version from the specific Torch index, we ensure the GPU 'engine' is correctly installed and the re-installation loop is broken.
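
As a rough illustration of the two-step strategy described here, the sketch below only builds the two `uv pip install` command lines and does not run them. The function name and parameters are hypothetical, and the exact re-pin command used in the PR is not shown in this thread, so the second command is an assumption based on the PyTorch wheel index layout (`https://download.pytorch.org/whl/<cuda_tag>`).

```python
from __future__ import annotations


def build_install_commands(
    venv_python: str, package: str, torch_version: str, cuda_tag: str
) -> list[list[str]]:
    """Return argv lists for a --no-deps install followed by a torch re-pin."""
    base = ["uv", "pip", "install", "--python", venv_python]
    # Step 1: install the package without resolving its dependencies,
    # so the resolver cannot swap out the already-installed torch.
    install_pkg = base + [
        "--no-deps", "--upgrade-package", "unsloth", package, "unsloth-zoo",
    ]
    # Step 2 (assumed form): pin torch to the CUDA build from the
    # matching PyTorch index.
    repin_torch = base + [
        "--index-url", f"https://download.pytorch.org/whl/{cuda_tag}",
        f"torch=={torch_version}+{cuda_tag}",
    ]
    return [install_pkg, repin_torch]
```

For example, `build_install_commands(r"C:\venv\Scripts\python.exe", "unsloth", "2.6.0", "cu124")` yields one command carrying `--no-deps` and a second one ending in `torch==2.6.0+cu124`.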

@rolandtannous
Collaborator

@JyothishArumugam Please read the comment I linked on the issue.
Windows GPU machines don't have the issue you are describing. It's particular to your machine, and I'd be more than happy to help you troubleshoot it. However, NVIDIA-powered Windows machines are confirmed to work by internal tests and user tests. If the hypothesis were true, installation would break completely on those machines, and that's not the case.
Your fix breaks the package and its particular dependency requirements.

Decouple the two fixes. Push the venv fix separately so I can test and merge. Thank you.

@JyothishArumugam force-pushed the fix/windows-stale-venv-cuda-torch branch from df38778 to 55207be on April 7, 2026 18:07
@JyothishArumugam
Author

@rolandtannous Thank you for the response. I want to respectfully clarify why Fix 1 alone
does not solve the issue, and why Fix 2 is needed.

Why Fix 1 alone is not enough

Fix 1 (replacing Remove-Item with cmd rd) fixes the deletion failure,
but the deletion is only attempted because setup.ps1 detects
"torch cpu != required cuXXX". If torch is correctly installed as CUDA,
setup.ps1 never tries to delete the venv at all.

So the real question is: why does setup.ps1 always see "torch cpu"?

Why Fix 2 is needed — this is a uv resolver issue, not machine-specific

I added verbose logging to trace exactly what happens during install:

  1. Phase 1: torch==2.6.0+cu124 is installed correctly
  2. Phase 2: uv pip install --upgrade-package unsloth "$PackageName" is run
  3. uv's resolver, when satisfying unsloth's dependencies, pulls torch
    from default PyPI instead of the CUDA index
  4. Result: torch==2.6.0+cu124 is silently replaced with torch==2.10.0
    (no CUDA suffix, CPU-only)
  5. setup.ps1 sees "torch cpu != required cu124" and tries to rebuild the venv
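
The check in step 5 can be sketched in Python as a comparison between the local version suffix of the installed torch and the required CUDA tag. This is an illustration only, not the logic in setup.ps1; the function names are hypothetical, and treating a version with no `+` suffix as "cpu" is an assumption that mirrors the "torch cpu" wording in the log message.

```python
def torch_build_tag(version: str) -> str:
    """Extract the build tag from a torch version string.

    "2.6.0+cu124" -> "cu124"; a version without a local suffix is
    treated as a CPU build (assumption).
    """
    _, sep, local = version.partition("+")
    return local if sep and local else "cpu"


def is_stale(installed_version: str, required_tag: str) -> bool:
    """True when the installed torch build does not match the required tag."""
    return torch_build_tag(installed_version) != required_tag
```

Under these assumptions, `is_stale("2.10.0", "cu124")` is true while `is_stale("2.6.0+cu124", "cu124")` is false, which matches the sequence traced above.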

This is documented uv behavior — when --index-url is used in one invocation
and a subsequent invocation resolves the same package from default PyPI, the
CUDA build suffix is stripped. This is not specific to my GPU or CUDA version.

@JackieJK faces the exact same issue

Looking at @JackieJK's log in #4701:
[INFO] Stale venv detected (torch cpu != required cu130) -- rebuilding...
[ERROR] Could not remove stale venv: Access to path 'python.exe' is denied.

This is identical to my error — just with cu130 instead of cu124.
Two different machines, two different Python managers (Scoop vs PyManager),
two different CUDA versions — but the exact same failure sequence:

  1. torch CUDA gets stripped → setup.ps1 detects "torch cpu" → tries to delete venv → deletion fails

If this were machine-specific, the error message would differ. The fact that
both of us hit the identical "torch cpu != required cuXXX" message strongly
suggests this is a uv resolver behavior triggered under certain Windows
PATH configurations.

What I propose

I am happy to decouple the PRs as requested:

  • PR A (Fix 1 only): Remove-Item → cmd rd (safe, isolated, ready to merge)
  • PR B (Fix 2 only): torch re-pin after unsloth install — addresses the root cause

But I want to flag that merging only Fix 1 will not resolve the installation
failure for users like @JackieJK and myself. The venv deletion fix only makes
the error message cleaner — it doesn't stop setup.ps1 from attempting the
rebuild in the first place.

Happy to provide any additional logs or reproduce steps to help investigate.


Development

Successfully merging this pull request may close these issues.

[Bug] Unsloth Studio installation fails on Windows 11 with scoop
