GitHub - ivanopcode/devnote-override-macos-metal-vram-cap: Dev Note — Override macOS Metal “VRAM” Cap on Apple Silicon for Local LLMs

Dev Note — Override macOS Metal “VRAM” Cap on Apple Silicon for Local LLMs

Based on: Greg Stencel, Apple Silicon limitations with usage on local LLM
Source: https://stencel.io/posts/apple-silicon-limitations-with-usage-on-local-llm%20.html

Why this matters

On Apple Silicon, Metal exposes a recommendedMaxWorkingSetSize (~75% of unified RAM, somewhat less on smaller configurations). Tools like Ollama/llama.cpp (MPS/Metal) treat this as a hard ceiling, e.g. ~96 GB on a 128 GB M1 Ultra. You can raise this cap with an undocumented sysctl (kernel parameter) so that larger models fit fully on the GPU.

Prerequisites

Apple Silicon Mac with unified memory.
macOS Sonoma (14) or later recommended; Ventura and Monterey are supported with a different key.
Admin privileges for sudo.
Willingness to accept unsupported/undocumented tweaks (may impact stability).

Quick Start (Sonoma 14+)

Increase Metal’s working set so larger LLMs stay on-GPU:

# Example: set ~120 GB on a 128 GB machine (122880 MB)
sudo sysctl iogpu.wired_limit_mb=122880

No reboot required. See Verify.

Choose a Safe Limit

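Leave 8–16 GB for macOS and other processes to avoid memory pressure. On a 32 GB machine that much headroom would put the cap at or below the default, so the 32 GB example in the table leaves only 2–4 GB; treat it as an upper bound.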

| Total RAM | Typical Default Cap (~75%) | Example Raised Cap | MB Value |
|---|---|---|---|
| 32 GB | ~21–24 GB | ~28–30 GB | 28672–30720 |
| 64 GB | ~48 GB | ~56 GB | 57344 |
| 128 GB | ~96 GB | ~120 GB | 122880 |

Formula: desired_MB = desired_GiB * 1024
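The formula can be sketched as a pair of shell helpers. The function names `gib_to_mb` and `cap_with_headroom_mb` are hypothetical, introduced here only for illustration; they print a number and do not change any system setting.

```shell
# Convert a whole number of GiB to the MB value used by iogpu.wired_limit_mb.
gib_to_mb() {
    echo $(( $1 * 1024 ))
}

# Hypothetical helper: cap = total RAM minus headroom, both given in GiB.
cap_with_headroom_mb() {
    total_gib=$1
    headroom_gib=$2
    gib_to_mb $(( total_gib - headroom_gib ))
}

gib_to_mb 120              # prints 122880
cap_with_headroom_mb 64 8  # prints 57344
```

The result can then be passed to the sysctl, for example `sudo sysctl iogpu.wired_limit_mb="$(gib_to_mb 120)"`.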

Set the Limit (Sonoma 14+)

Use the MB-based sysctl:

# Check current value (0 = system default ~75%)
sysctl iogpu.wired_limit_mb

# Set new cap (e.g., 120 GB)
sudo sysctl iogpu.wired_limit_mb=122880

Set the Limit (Ventura/Monterey)

Older macOS used a bytes key:

# Bytes value (example ~56 GB)
# 56 * 1024 * 1024 * 1024 = 60129542144
sudo sysctl debug.iogpu.wired_limit=60129542144
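Since the key and the unit both depend on the macOS release, a small helper can build the right assignment. This is a minimal sketch: `wired_limit_assignment` is a hypothetical name, and the version split (14 and later use MB, earlier releases use bytes) is taken from this note rather than from Apple documentation. The function only prints the assignment.

```shell
# Print the sysctl assignment for a macOS major version and a cap in GiB.
# Assumption (from this note): 14+ uses iogpu.wired_limit_mb (MB),
# older releases use debug.iogpu.wired_limit (bytes).
wired_limit_assignment() {
    major=$1
    gib=$2
    if [ "$major" -ge 14 ]; then
        echo "iogpu.wired_limit_mb=$(( gib * 1024 ))"
    else
        echo "debug.iogpu.wired_limit=$(( gib * 1024 * 1024 * 1024 ))"
    fi
}

wired_limit_assignment 13 56   # prints debug.iogpu.wired_limit=60129542144
wired_limit_assignment 14 120  # prints iogpu.wired_limit_mb=122880
```

On a Mac, the major version can be read with `sw_vers -productVersion | cut -d. -f1`, and the printed assignment passed to `sudo sysctl`.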

Verify

Ollama / llama.cpp logs should now report a higher value in the line ggml_metal_init: recommendedMaxWorkingSetSize = XXXXX MB
Read back the sysctl:
```
sysctl iogpu.wired_limit_mb
```
Watch Activity Monitor → Memory Pressure while running the model.
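The first two checks can be scripted roughly as follows. Both function names are hypothetical, and the sample log line is illustrative: the exact ggml_metal_init format may differ between llama.cpp versions, so treat the pattern as an assumption.

```shell
# Compare a reported cap (MB) with the expected one.
# A reported value of 0 means the system default is in effect.
check_wired_limit() {
    reported=$1
    expected=$2
    if [ "$reported" -eq "$expected" ]; then
        echo "ok: cap is ${reported} MB"
    elif [ "$reported" -eq 0 ]; then
        echo "default: no override set"
        return 1
    else
        echo "mismatch: got ${reported} MB, expected ${expected} MB"
        return 1
    fi
}

# Pull the MB figure out of a ggml_metal_init log line read from stdin.
# The log format is an assumption and may differ between versions.
metal_working_set_mb() {
    sed -n 's/.*recommendedMaxWorkingSetSize *= *\([0-9][0-9.]*\) *MB.*/\1/p'
}

check_wired_limit 122880 122880   # prints ok: cap is 122880 MB
echo "ggml_metal_init: recommendedMaxWorkingSetSize = 122880.00 MB" | metal_working_set_mb
```

On a Mac, the reported value would come from `sysctl -n iogpu.wired_limit_mb`, which prints the number without the key name.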

Persist Across Reboots (Optional)

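Editing /etc/sysctl.conf requires root; if the write is blocked on your system, SIP may be the cause. Newer macOS releases may not read this file at boot, so confirm the value after rebooting. Consider re-running the one-liner manually when needed instead.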

# /etc/sysctl.conf (create if missing)
# Sonoma 14+ (MB)
iogpu.wired_limit_mb=122880

Reboot to apply.
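To avoid duplicate entries when the value changes later, the line can be set idempotently. The sketch below uses a hypothetical helper, `set_sysctl_conf`, that takes the file path as a parameter so it can be tried on a scratch copy before touching /etc/sysctl.conf.

```shell
# Set or replace a key=value line in a sysctl.conf-style file.
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp "${TMPDIR:-/tmp}/sysctl.XXXXXX") || return 1
    # Keep every line except an existing assignment of exactly this key.
    if [ -f "$file" ]; then
        awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
    fi
    echo "${key}=${value}" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration on a scratch file: the second call replaces the first value.
scratch=$(mktemp "${TMPDIR:-/tmp}/sysctl.XXXXXX")
set_sysctl_conf "$scratch" iogpu.wired_limit_mb 98304
set_sysctl_conf "$scratch" iogpu.wired_limit_mb 122880
cat "$scratch"   # prints iogpu.wired_limit_mb=122880
rm -f "$scratch"
```

Once the scratch copy looks right, it can be copied into place with `sudo cp`.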

Revert

# Reset to system default (~75%)
sudo sysctl iogpu.wired_limit_mb=0

# If persisted, remove the line from /etc/sysctl.conf and reboot.

Operational Tips & Cautions

Do not set to 100% of RAM—reserve headroom (8–16 GB).
If Memory Pressure turns yellow/red or system starts swapping, lower the cap.
Changes are unsupported by Apple; future macOS updates may alter behavior.
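The headroom rule can be checked before applying a value. This is a minimal sketch with a hypothetical function name; the 8 GiB default mirrors the lower end of the range suggested above and is not an Apple-documented threshold.

```shell
# Succeed if a cap (MB) leaves at least the requested headroom (GiB, default 8).
# total_bytes is the figure `sysctl -n hw.memsize` reports on macOS.
cap_leaves_headroom() {
    total_bytes=$1
    cap_mb=$2
    min_headroom_gib=${3:-8}
    total_mb=$(( total_bytes / 1024 / 1024 ))
    headroom_mb=$(( total_mb - cap_mb ))
    [ "$headroom_mb" -ge $(( min_headroom_gib * 1024 )) ]
}

# 137438953472 bytes = 128 GiB
if cap_leaves_headroom 137438953472 122880; then
    echo "120 GiB cap: ok"
fi
if ! cap_leaves_headroom 137438953472 126976; then
    echo "124 GiB cap: too high"
fi
```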

Notes for Local LLM Usage

Raising the cap helps keep entire models + KV cache on GPU for faster tokens/sec.
Quantization (e.g., 4–8 bit) reduces footprint; balance context length vs memory.
Hybrid fallback is possible (layers that do not fit on the GPU run on the CPU) but slows generation.
Keep Ollama/llama.cpp updated; Metal/MPS backends evolve frequently.
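A rough footprint estimate shows how quantization and context length trade off against the cap. The sketch below assumes weights take parameters × bits / 8 bytes and that the KV cache stores keys and values for every layer; real runtimes add quantization metadata and scratch buffers, so the figures are a lower bound. The function names and model dimensions are illustrative, not taken from any particular model.

```shell
# Approximate size of the weights in MiB, ignoring quantization overhead.
weights_mib() {
    params_millions=$1
    bits_per_weight=$2
    echo $(( params_millions * 1000000 * bits_per_weight / 8 / 1048576 ))
}

# Approximate KV cache size in MiB: keys and values (factor 2) for every layer.
kv_cache_mib() {
    layers=$1; kv_heads=$2; head_dim=$3; context=$4; bytes_per_elem=$5
    echo $(( 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1048576 ))
}

# Illustrative numbers only: 70,000M parameters at 4 bits, 80 layers,
# 8 KV heads of dimension 128, 4096-token context, 2-byte cache entries.
w=$(weights_mib 70000 4)
kv=$(kv_cache_mib 80 8 128 4096 2)
echo "weights=${w} MiB kv=${kv} MiB total=$(( w + kv )) MiB"
# prints weights=33378 MiB kv=1280 MiB total=34658 MiB
```

Comparing the total with the chosen iogpu.wired_limit_mb value gives a quick indication of whether the model is likely to stay fully on the GPU.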

Credit: This procedure and context are derived from Greg Stencel’s post: Apple Silicon limitations with usage on local LLM — https://stencel.io/posts/apple-silicon-limitations-with-usage-on-local-llm%20.html
