
[Studio] Fix GPU detection for AMD/Intel — add Vulkan VRAM fallback#4874

Draft
HellBoxyz wants to merge 1 commit into unslothai:main from HellBoxyz:fix/vulkan-gpu-memory-detection

Conversation


@HellBoxyz HellBoxyz commented Apr 6, 2026

Problem

Unsloth Studio doesn't detect a GPU on AMD/Intel systems. The VRAM detection (_get_gpu_free_memory()) relies solely on nvidia-smi, so on non-NVIDIA hardware it returns an empty list. As a result:

  • Studio thinks there is no GPU at all
  • Context length stays at the full native value (e.g. 128K) with no auto-reduction
  • The KV cache doesn't fit in VRAM and spills into system RAM
  • Inference is slow because data is constantly shuttled between the GPU and system RAM

Fix

Add a vulkaninfo fallback that kicks in when nvidia-smi is not available:

  • Parses Vulkan memory heap budgets (VK_EXT_memory_budget)
  • Correctly handles multi-GPU systems and GPUs with multiple DEVICE_LOCAL heaps
  • nvidia-smi still has priority — zero impact on NVIDIA setups
  • When nvidia-smi succeeds (returncode 0), its result is authoritative — an empty list means no visible GPUs, with no fallback to Vulkan

Before / After

Before (AMD GPU):

GPUs free: [], selected: None, fit: True
→ 128K context, KV cache in RAM, slow

After (AMD GPU):

Vulkan GPU memory detected: GPU0=7382MiB
GPUs free: [(0, 7382)], selected: [0], fit: False
→ Context auto-reduced to fit VRAM, everything on GPU
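The auto-reduction behind the "After" case can be sketched roughly like this. Everything here is an illustrative assumption: the function names, the halving strategy, and the KV-cache constants are generic transformer estimates, not Studio's actual sizing logic.

```python
def kv_cache_mib(n_ctx: int, n_layers: int = 30, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elt: int = 2) -> float:
    """Rough KV cache size: K and V tensors per layer per token (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elt * n_ctx / (1024 ** 2)


def auto_reduce_context(native_ctx: int, free_mib: int,
                        model_mib: int, min_ctx: int = 2048) -> int:
    """Halve the context until model weights + KV cache fit in free VRAM."""
    ctx = native_ctx
    while ctx > min_ctx and model_mib + kv_cache_mib(ctx) > free_mib:
        ctx //= 2
    return ctx


# With 7382 MiB free and a ~4.9 GB model, 128K halves down to 16K:
# auto_reduce_context(131072, 7382, 4915) -> 16384
```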

Tested on

  • AMD Radeon RX 5700 XT (8 GB), Windows 11, Vulkan 1.4.341
  • Model: gemma-4-E4B-it Q4_K_XL (4.8 GB)
  • Context properly auto-reduced, full GPU offload with -ngl -1
  • 12 unit tests covering parser + orchestrator edge cases

@HellBoxyz HellBoxyz requested a review from rolandtannous as a code owner April 6, 2026 13:57
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a fallback mechanism for detecting free GPU memory using vulkaninfo, enabling support for AMD, Intel, and other Vulkan-compatible hardware when nvidia-smi is unavailable. The review feedback identifies a logic error in the parsing of vulkaninfo output, where multiple memory heaps are incorrectly treated as distinct GPUs, and provides a more robust implementation that groups heaps by physical device.

Comment on lines +405 to +424
# Split output into per-heap blocks at each "\tmemoryHeaps[N]:"
# marker, then check each block for DEVICE_LOCAL flag and budget.
heap_sections = re.split(r"(?=\tmemoryHeaps\[\d+\]:)", output)
budget_re = re.compile(r"budget\s*=\s*(\d+)")

gpus: list[tuple[int, int]] = []
gpu_idx = 0
for section in heap_sections:
    if not section.strip().startswith("memoryHeaps["):
        continue
    if "MEMORY_HEAP_DEVICE_LOCAL_BIT" not in section:
        continue
    budget_m = budget_re.search(section)
    if not budget_m:
        continue
    budget_bytes = int(budget_m.group(1))
    free_mib = budget_bytes // (1024 * 1024)
    if free_mib > 0:
        gpus.append((gpu_idx, free_mib))
        gpu_idx += 1
Contributor


Severity: high

The current parsing logic for vulkaninfo output is not robust for all systems. It treats every device-local memory heap as a separate GPU, which is incorrect for multi-GPU systems or single GPUs that expose multiple device-local heaps. This can lead to misreporting the number of GPUs and their available memory, causing issues with GPU selection and model offloading.

A more robust approach is to group memory heaps by physical device and report the largest available memory budget for each. This ensures that each physical GPU is represented as a single entry with its correct available VRAM.

# Split output by physical device. vulkaninfo typically separates devices
# with headers like "GPU0", "GPU1", etc. on their own lines.
# The lookahead (?=...) keeps the delimiter.
device_sections = re.split(r"(?=^GPU\d+\n)", output, flags=re.MULTILINE)
if len(device_sections) > 1:
    # Filter out any non-GPU sections (like the header before GPU0)
    device_sections = [s for s in device_sections if s.strip().startswith("GPU")]
# If no GPUn headers, device_sections contains the whole output as one element.

budget_re = re.compile(r"budget\s*=\s*(\d+)")
gpus: list[tuple[int, int]] = []

for gpu_idx, device_section in enumerate(device_sections):
    # For each physical device, find the largest device-local memory heap budget.
    # A single GPU can have multiple device-local heaps.
    max_free_mib = 0
    heap_sections = re.split(r"(?=\tmemoryHeaps\[\d+\]:)", device_section)
    for section in heap_sections:
        if "MEMORY_HEAP_DEVICE_LOCAL_BIT" in section:
            budget_m = budget_re.search(section)
            if budget_m:
                budget_bytes = int(budget_m.group(1))
                free_mib = budget_bytes // (1024 * 1024)
                if free_mib > max_free_mib:
                    max_free_mib = free_mib
    
    if max_free_mib > 0:
        gpus.append((gpu_idx, max_free_mib))
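Wrapped into a self-contained function, the suggested grouping logic behaves like this on a synthetic excerpt. The sample text below is invented to match the format the regexes assume; real vulkaninfo output differs in detail.

```python
import re


def parse_vulkan_budgets(output: str) -> list[tuple[int, int]]:
    """Group DEVICE_LOCAL heap budgets by GPUn header; keep the largest per device."""
    device_sections = re.split(r"(?=^GPU\d+\n)", output, flags=re.MULTILINE)
    device_sections = [s for s in device_sections if s.strip().startswith("GPU")]
    budget_re = re.compile(r"budget\s*=\s*(\d+)")
    gpus: list[tuple[int, int]] = []
    for gpu_idx, device_section in enumerate(device_sections):
        max_free_mib = 0
        for section in re.split(r"(?=\tmemoryHeaps\[\d+\]:)", device_section):
            if "MEMORY_HEAP_DEVICE_LOCAL_BIT" in section:
                m = budget_re.search(section)
                if m:
                    max_free_mib = max(max_free_mib, int(m.group(1)) // (1024 * 1024))
        if max_free_mib > 0:
            gpus.append((gpu_idx, max_free_mib))
    return gpus


# Synthetic two-GPU sample: GPU0 has two device-local heaps, GPU1 has one.
sample = (
    "GPU0\n"
    "\tmemoryHeaps[0]:\n"
    "\t\tbudget = 7740588032\n"
    "\t\tflags: MEMORY_HEAP_DEVICE_LOCAL_BIT\n"
    "\tmemoryHeaps[1]:\n"
    "\t\tbudget = 268435456\n"
    "\t\tflags: MEMORY_HEAP_DEVICE_LOCAL_BIT\n"
    "GPU1\n"
    "\tmemoryHeaps[0]:\n"
    "\t\tbudget = 1073741824\n"
    "\t\tflags: MEMORY_HEAP_DEVICE_LOCAL_BIT\n"
)
print(parse_vulkan_budgets(sample))  # [(0, 7382), (1, 1024)]
```

GPU0's two heaps collapse into one entry (the larger budget wins) instead of being miscounted as two GPUs.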

@HellBoxyz HellBoxyz changed the title from "[Studio] Add vulkaninfo fallback for GPU memory detection on AMD/Intel GPUs" to "[Studio] Fix GPU detection for AMD/Intel — add Vulkan VRAM fallback" on Apr 6, 2026
_get_gpu_free_memory() relied exclusively on nvidia-smi, returning an
empty list on non-NVIDIA systems. This caused the VRAM-aware context
auto-reduction logic to be skipped entirely: models launched with full
native context (e.g. 128K+), KV caches spilled into system RAM, and
inference performance degraded significantly.

Add a vulkaninfo fallback that parses VK_EXT_memory_budget heap data
to detect DEVICE_LOCAL VRAM budget on AMD, Intel, and any Vulkan-capable
GPU. Handles multi-GPU systems (split by GPU device headers) and GPUs
with multiple DEVICE_LOCAL heaps (takes largest budget per device).

nvidia-smi retains priority — zero impact on NVIDIA setups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@HellBoxyz HellBoxyz force-pushed the fix/vulkan-gpu-memory-detection branch from 23d2dc0 to a73223d on April 6, 2026 16:10
@HellBoxyz HellBoxyz requested a review from danielhanchen as a code owner April 6, 2026 16:10
@rolandtannous
Collaborator

This is a duplicate of #4720

@rolandtannous rolandtannous marked this pull request as draft April 6, 2026 18:27
