
[Studio] Fix GPU detection for AMD/Intel — add Vulkan VRAM fallback#4874

Draft
HellBoxyz wants to merge 1 commit into unslothai:main from HellBoxyz:fix/vulkan-gpu-memory-detection

Conversation


@HellBoxyz HellBoxyz commented Apr 6, 2026

Problem

Unsloth Studio doesn't detect a GPU on AMD/Intel systems. The VRAM detection (_get_gpu_free_memory()) relies solely on nvidia-smi, so on non-NVIDIA hardware it returns an empty list. As a result:

  • Studio thinks there is no GPU at all
  • Context length stays at the full native value (e.g. 128K) with no auto-reduction
  • The KV cache doesn't fit in VRAM and spills into system RAM
  • Inference is slow because data is constantly shuttled between the GPU and system RAM

Fix

Add a vulkaninfo fallback that kicks in when nvidia-smi is not available:

  • Parses Vulkan memory heap budgets (VK_EXT_memory_budget)
  • Correctly handles multi-GPU systems and GPUs with multiple DEVICE_LOCAL heaps
  • nvidia-smi still has priority — zero impact on NVIDIA setups
  • When nvidia-smi succeeds (returncode 0), its result is authoritative — an empty list means no visible GPUs, with no fallback to Vulkan

Before / After

Before (AMD GPU):

GPUs free: [], selected: None, fit: True
→ 128K context, KV cache in RAM, slow

After (AMD GPU):

Vulkan GPU memory detected: GPU0=7382MiB
GPUs free: [(0, 7382)], selected: [0], fit: False
→ Context auto-reduced to fit VRAM, everything on GPU
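The auto-reduction behind the "After" case can be sketched roughly like this. Everything here is an illustrative assumption: the function names, the halving strategy, and the KV-cache constants are generic transformer estimates, not Studio's actual sizing logic.

```python
def kv_cache_mib(n_ctx: int, n_layers: int = 30, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elt: int = 2) -> float:
    """Rough KV cache size: K and V tensors per layer per token (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elt * n_ctx / (1024 ** 2)


def auto_reduce_context(native_ctx: int, free_mib: int,
                        model_mib: int, min_ctx: int = 2048) -> int:
    """Halve the context until model weights + KV cache fit in free VRAM."""
    ctx = native_ctx
    while ctx > min_ctx and model_mib + kv_cache_mib(ctx) > free_mib:
        ctx //= 2
    return ctx


# With 7382 MiB free and a ~4.9 GB model, 128K halves down to 16K:
# auto_reduce_context(131072, 7382, 4915) -> 16384
```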

Tested on

  • AMD Radeon RX 5700 XT (8 GB), Windows 11, Vulkan 1.4.341
  • Model: gemma-4-E4B-it Q4_K_XL (4.8 GB)
  • Context properly auto-reduced, full GPU offload with -ngl -1
  • 12 unit tests covering parser + orchestrator edge cases

@HellBoxyz HellBoxyz requested a review from rolandtannous as a code owner April 6, 2026 13:57
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a fallback mechanism for detecting free GPU memory using vulkaninfo, enabling support for AMD, Intel, and other Vulkan-compatible hardware when nvidia-smi is unavailable. The review feedback identifies a logic error in the parsing of vulkaninfo output, where multiple memory heaps are incorrectly treated as distinct GPUs, and provides a more robust implementation that groups heaps by physical device.

Comment on lines +405 to +424
# Split output into per-heap blocks at each "\tmemoryHeaps[N]:"
# marker, then check each block for DEVICE_LOCAL flag and budget.
heap_sections = re.split(r"(?=\tmemoryHeaps\[\d+\]:)", output)
budget_re = re.compile(r"budget\s*=\s*(\d+)")

gpus: list[tuple[int, int]] = []
gpu_idx = 0
for section in heap_sections:
    if not section.strip().startswith("memoryHeaps["):
        continue
    if "MEMORY_HEAP_DEVICE_LOCAL_BIT" not in section:
        continue
    budget_m = budget_re.search(section)
    if not budget_m:
        continue
    budget_bytes = int(budget_m.group(1))
    free_mib = budget_bytes // (1024 * 1024)
    if free_mib > 0:
        gpus.append((gpu_idx, free_mib))
        gpu_idx += 1
Contributor


Severity: high

The current parsing logic for vulkaninfo output is not robust for all systems. It treats every device-local memory heap as a separate GPU, which is incorrect for multi-GPU systems or single GPUs that expose multiple device-local heaps. This can lead to misreporting the number of GPUs and their available memory, causing issues with GPU selection and model offloading.

A more robust approach is to group memory heaps by physical device and report the largest available memory budget for each. This ensures that each physical GPU is represented as a single entry with its correct available VRAM.

# Split output by physical device. vulkaninfo typically separates devices
# with headers like "GPU0", "GPU1", etc. on their own lines.
# The lookahead (?=...) keeps the delimiter.
device_sections = re.split(r"(?=^GPU\d+\n)", output, flags=re.MULTILINE)
if len(device_sections) > 1:
    # Filter out any non-GPU sections (like the header before GPU0)
    device_sections = [s for s in device_sections if s.strip().startswith("GPU")]
# If no GPUn headers, device_sections contains the whole output as one element.

budget_re = re.compile(r"budget\s*=\s*(\d+)")
gpus: list[tuple[int, int]] = []

for gpu_idx, device_section in enumerate(device_sections):
    # For each physical device, find the largest device-local memory heap budget.
    # A single GPU can have multiple device-local heaps.
    max_free_mib = 0
    heap_sections = re.split(r"(?=\tmemoryHeaps\[\d+\]:)", device_section)
    for section in heap_sections:
        if "MEMORY_HEAP_DEVICE_LOCAL_BIT" in section:
            budget_m = budget_re.search(section)
            if budget_m:
                budget_bytes = int(budget_m.group(1))
                free_mib = budget_bytes // (1024 * 1024)
                if free_mib > max_free_mib:
                    max_free_mib = free_mib
    
    if max_free_mib > 0:
        gpus.append((gpu_idx, max_free_mib))
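Wrapped into a self-contained function, the suggested grouping logic behaves like this on a synthetic excerpt. The sample text below is invented to match the format the regexes assume; real vulkaninfo output differs in detail.

```python
import re


def parse_vulkan_budgets(output: str) -> list[tuple[int, int]]:
    """Group DEVICE_LOCAL heap budgets by GPUn header; keep the largest per device."""
    device_sections = re.split(r"(?=^GPU\d+\n)", output, flags=re.MULTILINE)
    device_sections = [s for s in device_sections if s.strip().startswith("GPU")]
    budget_re = re.compile(r"budget\s*=\s*(\d+)")
    gpus: list[tuple[int, int]] = []
    for gpu_idx, device_section in enumerate(device_sections):
        max_free_mib = 0
        for section in re.split(r"(?=\tmemoryHeaps\[\d+\]:)", device_section):
            if "MEMORY_HEAP_DEVICE_LOCAL_BIT" in section:
                m = budget_re.search(section)
                if m:
                    max_free_mib = max(max_free_mib, int(m.group(1)) // (1024 * 1024))
        if max_free_mib > 0:
            gpus.append((gpu_idx, max_free_mib))
    return gpus


# Synthetic two-GPU sample: GPU0 has two device-local heaps, GPU1 has one.
sample = (
    "GPU0\n"
    "\tmemoryHeaps[0]:\n"
    "\t\tbudget = 7740588032\n"
    "\t\tflags: MEMORY_HEAP_DEVICE_LOCAL_BIT\n"
    "\tmemoryHeaps[1]:\n"
    "\t\tbudget = 268435456\n"
    "\t\tflags: MEMORY_HEAP_DEVICE_LOCAL_BIT\n"
    "GPU1\n"
    "\tmemoryHeaps[0]:\n"
    "\t\tbudget = 1073741824\n"
    "\t\tflags: MEMORY_HEAP_DEVICE_LOCAL_BIT\n"
)
print(parse_vulkan_budgets(sample))  # [(0, 7382), (1, 1024)]
```

GPU0's two heaps collapse into one entry (the larger budget wins) instead of being miscounted as two GPUs.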

@HellBoxyz HellBoxyz changed the title from "[Studio] Add vulkaninfo fallback for GPU memory detection on AMD/Intel GPUs" to "[Studio] Fix GPU detection for AMD/Intel — add Vulkan VRAM fallback" on Apr 6, 2026
_get_gpu_free_memory() relied exclusively on nvidia-smi, returning an
empty list on non-NVIDIA systems. This caused the VRAM-aware context
auto-reduction logic to be skipped entirely: models launched with full
native context (e.g. 128K+), KV caches spilled into system RAM, and
inference performance degraded significantly.

Add a vulkaninfo fallback that parses VK_EXT_memory_budget heap data
to detect DEVICE_LOCAL VRAM budget on AMD, Intel, and any Vulkan-capable
GPU. Handles multi-GPU systems (split by GPU device headers) and GPUs
with multiple DEVICE_LOCAL heaps (takes largest budget per device).

nvidia-smi retains priority — zero impact on NVIDIA setups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@HellBoxyz HellBoxyz force-pushed the fix/vulkan-gpu-memory-detection branch from 23d2dc0 to a73223d on April 6, 2026 16:10
@HellBoxyz HellBoxyz requested a review from danielhanchen as a code owner April 6, 2026 16:10
@rolandtannous
Collaborator

This is a duplicate of #4720

@rolandtannous rolandtannous marked this pull request as draft April 6, 2026 18:27
