Pull requests: ggml-org/llama.cpp
ggml: add DeepSeek V4 hyperconnection + KV ops (CPU)
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#23122
opened May 15, 2026 by
cchuter
server : fix prompt-cache reuse for hybrid/recurrent models
examples
server
#23121
opened May 15, 2026 by
bjahoor
1 task done
ci : fix release symlinks
devops
improvements to build systems and github actions
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
#23119
opened May 15, 2026 by
CISC
Member
webui: Fix handling of MCP resource template parameters
examples
server/webui
server
#23117
opened May 15, 2026 by
kubawoo
server: honour per-request reasoning_budget_tokens in chat completions
examples
server
testing
Everything test related
#23116
opened May 15, 2026 by
bernardladenthin
metal: reuse K/V in flash-attn vec for spec-decode
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#23114
opened May 15, 2026 by
forforever73
Contributor
[DRAFT] Support for Zaya1 8B model (depends on PR #22833)
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
python
python script changes
testing
Everything test related
#23112
opened May 15, 2026 by
Juste-Leo2
Contributor
Draft
vendor : update cpp-httplib to 0.45.0
python
python script changes
script
Script related
#23103
opened May 15, 2026 by
cabelo
Contributor
CUDA: Continue directly including cuda/iterator
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#23102
opened May 15, 2026 by
ORippler
Collaborator
[SYCL] Level Zero detection in ggml_sycl_init
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#23097
opened May 15, 2026 by
sanmai
Contributor
Add TheRock 7.13 build target
devops
improvements to build systems and github actions
#23091
opened May 15, 2026 by
superm1
Contributor
Performance optimization for UMA and host-visible buffers in the Vulkan backend
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#23083
opened May 15, 2026 by
winstonma
mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check
model
Model specific
python
python script changes
#23082
opened May 15, 2026 by
limloop
ggml-hexagon: add PAD op HVX kernel
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
#23078
opened May 14, 2026 by
pdhinaka
Contributor
convert : filter lora tensor names
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
python
python script changes
#23077
opened May 14, 2026 by
CISC
Member
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming
build
Compilation issues
devops
improvements to build systems and github actions
examples
script
Script related
server/webui
server
#23064
opened May 14, 2026 by
allozaur
Contributor
convert : allow dequantizing some fp8 models
python
python script changes
#23062
opened May 14, 2026 by
cora4
download: add option to skip_download
examples
server
#23059
opened May 14, 2026 by
ngxson
Contributor
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#23056
opened May 14, 2026 by
TheBlueMatt
Contributor
NvFP4 quantized LM head support
model
Model specific
#23046
opened May 14, 2026 by
ynankani
Contributor
opencl: allow loading precompiled binary kernels from library
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
model: discover hybrid layer types via tensor presence
model
Model specific
#23037
opened May 14, 2026 by
generalchuckles-cm
ggml-cuda: per-arch MMQ config variable refactor into tabular structure
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#23031
opened May 14, 2026 by
chrismcmacken
chat : add Nemotron Nano v2 specialized parser
testing
Everything test related
#23029
opened May 14, 2026 by
marcusds