Add GGUF fit-target control and wire to llama-server --fit-target #4882
aiSynergy37 wants to merge 5 commits into unslothai:main
Conversation
Hey team,

Tried chat-loading a local GGUF file for unsloth gemma-4, which was found automatically under the finetunes list in Models.

- First load: full gemma-3-31b context length by default + bf16 KV cache, with CPU offload. Chat works fine.
- After tweaking KV to q8_0 and reducing the context size to 100k, the reload command in the terminal automatically falls back to 8096 context and q8_0. It is not respecting the context set from the UI, but rather falling back to a safe KV/context combination that fits in VRAM. (When cranking the context above that it displays a warning about spilling to RAM, but even clicking Apply doesn't force the setting chosen in the UI.)

I think there will be dependencies between applying this PR and fixing that issue.
Expose `fit_target` from Chat settings through the API and GGUF load/status responses, and pass `--fit-target` when `--fit` is active. Includes backend regression coverage.

Fixes unslothai#4857
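The llama-server wiring described above can be sketched roughly like this (a minimal illustration; the helper name and the assumption that `fit_target` arrives as an optional integer are mine, not the PR's actual code):

```python
from typing import List, Optional

def llama_server_fit_args(fit_enabled: bool, fit_target: Optional[int]) -> List[str]:
    """Extra CLI args for llama-server: --fit-target is only passed
    when --fit is active, matching the behavior described in the PR."""
    if not fit_enabled or fit_target is None:
        return []  # Auto preset / fit disabled: let llama-server decide
    return ["--fit-target", str(fit_target)]
```

Keeping the flag conditional on `--fit` avoids handing llama-server an option that has no effect on its own.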
Force-pushed from e11db97 to 18ad8a7

for more information, see https://pre-commit.ci
Follow-up fix pushed in 18ad8a7 to address the context fallback concern:

- When users request an explicit context and also set `fit_target`, we now keep the requested context and use `--fit` instead of silently capping the context down to a GPU-only safe value.
- Added runtime regression test `test_explicit_fit_target_keeps_requested_context` in `studio/backend/tests/test_native_context_length.py` to lock in this behavior.

This makes the new Fit Target control effective in tight VRAM scenarios where an explicit context previously got reduced before launch.
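The decision in the first bullet might look roughly like this (an illustrative sketch; `resolve_context` and `gpu_safe_ctx` are hypothetical names, not the backend's actual API):

```python
from typing import Optional, Tuple

def resolve_context(requested_ctx: Optional[int],
                    fit_target: Optional[int],
                    gpu_safe_ctx: int) -> Tuple[int, bool]:
    """Return (context_to_launch_with, use_fit_flag)."""
    if requested_ctx is None:
        return gpu_safe_ctx, False                  # nothing explicit: safe default
    if fit_target is not None:
        return requested_ctx, True                  # keep the request, lean on --fit
    return min(requested_ctx, gpu_safe_ctx), False  # previous capping behavior
```

The point is that an explicit context plus a fit target no longer triggers the silent cap; fitting is delegated to llama-server instead.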
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fded3c9b42
```
chat_template_override: chatTemplateOverride,
cache_type_kv: kvCacheDtype,
speculative_type: speculativeType,
fit_target: fitTarget,
```
Preserve fit_target across failed model-switch rollback
Adding fit_target to the primary load request here introduces a new rollback gap: if loading the new model fails after unloading the previous GGUF model, the catch-path reloads the previous model without fit_target, so the restored model silently comes back with different VRAM/offload behavior. This only shows up when a non-default fit target was active and a subsequent load fails, but in that case users lose their runtime tuning unexpectedly.
Addressed review note: preserve `fit_target` across failed model-switch rollback.

Update in commit 38ff2ca captures `previousLoadedFitTarget` before unload and passes it during the rollback `loadModel(...)`, so a failed switch restores the previous GGUF model with the same fit-target tuning.
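A rough sketch of the rollback shape (a Python stand-in for illustration only; the runtime object and its method names are hypothetical, and the real change lives in the frontend's `loadModel` path):

```python
def switch_model(runtime, new_model: str, new_fit_target):
    """Switch GGUF models, restoring the previous model's fit-target
    tuning if the new load fails."""
    previous_model = runtime.loaded_model
    previous_fit_target = runtime.loaded_fit_target  # capture BEFORE unload
    runtime.unload()
    try:
        runtime.load(new_model, fit_target=new_fit_target)
    except Exception:
        if previous_model is not None:
            # Roll back with the original fit target instead of the default,
            # so VRAM/offload behavior is unchanged after a failed switch.
            runtime.load(previous_model, fit_target=previous_fit_target)
        raise
```

Capturing the fit target before the unload is what closes the gap the review flagged: after unload, the runtime no longer knows the prior tuning.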
Summary

- Add `fit_target` to the inference load API request/response and status response for GGUF
- Pass `fit_target` through the backend GGUF load path into `LlamaCppBackend.load_model(...)`
- When `--fit on` is used, pass `--fit-target <value>` to llama-server
- New `Fit Target` control in Chat Settings (GGUF) with presets: Auto / 64 / 128 / 256 / 512
- Surface the active fit target in `/api/inference/status`

Why

Issue #4857 asks for a Studio UI control to tune llama-server `--fit-target` for tight VRAM cases where the default fit behavior leaves too much GPU memory unused.
Validation

- `pytest studio/backend/tests/test_native_context_length.py -v` (38 passed)
- `python -m compileall studio/backend/core/inference/llama_cpp.py studio/backend/models/inference.py studio/backend/routes/inference.py`
- `tsc` is unavailable (frontend deps/tools not installed)

Fixes #4857