1027 commits
49b1e73
docs: Add cuda 12.5 to README.md (#1750)
Smartappli Sep 20, 2024
1324c0c
chore(deps): bump actions/cache from 3 to 4 (#1751)
dependabot[bot] Sep 20, 2024
4744551
feat: Update llama.cpp
abetlen Sep 22, 2024
926b414
feat: Update llama.cpp
abetlen Sep 25, 2024
b3dfb42
chore: Bump version
abetlen Sep 25, 2024
8e07db0
fix: install build dependency
abetlen Sep 25, 2024
65222bc
fix: install build dependency
abetlen Sep 25, 2024
9992c50
fix: Fix speculative decoding
abetlen Sep 26, 2024
11d9562
misc: Rename all_text to remaining_text (#1658)
xu-song Sep 26, 2024
e975dab
fix: Additional fixes for speculative decoding
abetlen Sep 26, 2024
dca0c9a
feat: Update llama.cpp
abetlen Sep 26, 2024
01c7607
feat: Expose libggml in internal APIs (#1761)
abetlen Sep 26, 2024
57e70bb
feat: Update llama.cpp
abetlen Sep 29, 2024
7c4aead
chore: Bump version
abetlen Sep 29, 2024
7403e00
feat: Update llama.cpp
abetlen Oct 22, 2024
e712cff
feat: Update llama.cpp
abetlen Oct 31, 2024
cafa33e
feat: Update llama.cpp
abetlen Nov 15, 2024
d1cb50b
Add missing ggml dependency
abetlen Nov 16, 2024
2796f4e
Add all missing ggml dependencies
abetlen Nov 16, 2024
7ecdd94
chore: Bump version
abetlen Nov 16, 2024
f3fb90b
feat: Update llama.cpp
abetlen Nov 28, 2024
7ba257e
feat: Update llama.cpp
abetlen Dec 6, 2024
9d06e36
fix(ci): Explicitly install arm64 python version
abetlen Dec 6, 2024
fb0b8fe
fix(ci): Explicitly set cmake osx architecture
abetlen Dec 6, 2024
72ed7b8
fix(ci): Explicitly test on arm64 macos runner
abetlen Dec 6, 2024
8988aaf
fix(ci): Use macos-14 runner
abetlen Dec 6, 2024
f11a781
fix(ci): Use macos-13 runner
abetlen Dec 6, 2024
9a09fc7
fix(ci): Debug print python system architecture
abetlen Dec 6, 2024
a412ba5
fix(ci): Update config
abetlen Dec 6, 2024
df05096
fix(ci): Install with regular pip
abetlen Dec 6, 2024
1cd3f2c
fix(ci): gg
abetlen Dec 6, 2024
b34f200
fix(ci): Use python3
abetlen Dec 6, 2024
d8cc231
fix(ci): Use default architecture chosen by action
abetlen Dec 6, 2024
d5d5099
fix(ci): Update CMakeLists.txt for macos
abetlen Dec 6, 2024
4f17ae5
fix(ci): Remove cuda version 12.5.0 incompatibility with VS (#1838)
pabl-o-ce Dec 6, 2024
991d9cd
fix(ci): Remove CUDA 12.5 from index
abetlen Dec 6, 2024
2795303
chore(deps): bump pypa/cibuildwheel from 2.21.1 to 2.22.0 (#1844)
dependabot[bot] Dec 6, 2024
2523472
fix: Fix pickling of Llama class by setting seed from _seed member. C…
abetlen Dec 6, 2024
d553a54
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Dec 6, 2024
ddac04c
chore(deps): bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0…
dependabot[bot] Dec 6, 2024
fa04cdc
fix logit-bias type hint (#1802)
ddh0 Dec 6, 2024
38fbd29
docs: Remove ref to llama_eval in llama_cpp.py docs (#1819)
richdougherty Dec 6, 2024
4192210
fix: make content not required in ChatCompletionRequestAssistantMessa…
feloy Dec 6, 2024
77a12a3
fix: Re-add suport for CUDA 12.5, add CUDA 12.6 (#1775)
Smartappli Dec 6, 2024
073b7e4
fix: added missing exit_stack.close() to /v1/chat/completions (#1796)
Ian321 Dec 6, 2024
9bd0c95
fix: Avoid thread starvation on many concurrent requests by making us…
gjpower Dec 6, 2024
1ea6154
fix(docs): Update development instructions (#1833)
Florents-Tselai Dec 6, 2024
d610477
fix(examples): Refactor Batching notebook to use new sampler chain AP…
lukestanley Dec 6, 2024
4f0ec65
fix: chat API logprobs format (#1788)
domdomegg Dec 6, 2024
df136cb
misc: Update development Makefile
abetlen Dec 6, 2024
6889429
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Dec 6, 2024
b9b50e5
misc: Update run server command
abetlen Dec 6, 2024
5585f8a
feat: Update llama.cpp
abetlen Dec 9, 2024
61508c2
Add CUDA 12.5 and 12.6 to generated output wheels
abetlen Dec 9, 2024
a9fe0f8
chore: Bump version
abetlen Dec 9, 2024
ca80802
fix(ci): hotfix for wheels
abetlen Dec 9, 2024
002f583
chore: Bump version
abetlen Dec 9, 2024
ea4d86a
fix(ci): update macos runner image to non-deprecated version
abetlen Dec 9, 2024
afedfc8
fix: add missing await statements for async exit_stack handling (#1858)
gjpower Dec 9, 2024
801a73a
feat: Update llama.cpp
abetlen Dec 9, 2024
803924b
chore: Bump version
abetlen Dec 9, 2024
2bc1d97
feat: Update llama.cpp
abetlen Dec 19, 2024
c9dfad4
feat: Update llama.cpp
abetlen Dec 30, 2024
1d5f534
feat: Update llama.cpp
abetlen Jan 8, 2025
e8f14ce
fix: streaming resource lock (#1879)
gjpower Jan 8, 2025
0580cf2
chore: Bump version
abetlen Jan 8, 2025
80be68a
feat: Update llama.cpp
abetlen Jan 29, 2025
0b89fe4
feat: Update llama.cpp
abetlen Jan 29, 2025
14879c7
fix(ci): Fix the CUDA workflow (#1894)
oobabooga Jan 29, 2025
4442ff8
fix: error showing time spent in llama perf context print (#1898)
shakalaca Jan 29, 2025
710e19a
chore: Bump version
abetlen Jan 29, 2025
344c106
feat: Update llama.cpp
abetlen Mar 12, 2025
e232fae
feat: Update llama.cpp
abetlen Mar 12, 2025
37eb5f0
chore: Bump version
abetlen Mar 12, 2025
99f2ebf
feat: Update llama.cpp
abetlen Apr 11, 2025
4c6514d
feat: Update llama.cpp
abetlen May 8, 2025
cb2edb9
chore: Bump version
abetlen May 8, 2025
b1d23df
hotfix: Disable curl support
abetlen May 8, 2025
0d475d7
feat: Update llama.cpp
abetlen Jul 1, 2025
51dce74
misc: Fix support for new parameters, deprecate rpc_servers parameter
abetlen Jul 1, 2025
5a635f4
fix(minor): Fix type hint for older versions of python
abetlen Jul 1, 2025
0dec788
fix: Fix missing deprecated symbols on windows with missing LLAMA_API…
abetlen Jul 1, 2025
cd548bd
feat: Add support for new mtmd api, add Qwen2.5-VL chat handler
abetlen Jul 3, 2025
07a979f
fix: Use num_threads from llama model for mtmd
abetlen Jul 3, 2025
6f3f0bf
docs: Add Qwen2.5-VL to README
abetlen Jul 3, 2025
9770b84
chore: Bump version
abetlen Jul 3, 2025
9e5a4ea
fix: Update reference to in Llama.embed. Closes #2037
abetlen Jul 5, 2025
ae54cde
fix(ci): Update cuda build action to use ubuntu 22.04
abetlen Jul 5, 2025
083fcf6
fix(ci): Add git to package list
abetlen Jul 5, 2025
11d28df
fix(ci): Remove macos-13 builds to fix cross compilation error
abetlen Jul 5, 2025
1580839
chore: Bump version
abetlen Jul 5, 2025
82ad829
fix(ci): update runners for cpu builds
abetlen Jul 5, 2025
7011bc1
fix(ci): Update docker runner
abetlen Jul 6, 2025
b39e9d4
feat: Update llama.cpp
abetlen Jul 6, 2025
98fda8c
fix(ci): Temporarily disable windows cuda wheels
abetlen Jul 6, 2025
8866fbd
chore: Bump version
abetlen Jul 6, 2025
cce4887
fix(ci): Fix macos cpu builds
abetlen Jul 6, 2025
a99fd21
feat: Update llama.cpp
abetlen Jul 15, 2025
c8579d7
fix: Better chat format for Qwen2.5-VL (#2040)
alcoftTAO Jul 15, 2025
d9749cb
chore: Bump version
abetlen Jul 15, 2025
95292e3
feat: Update llama.cpp
abetlen Jul 16, 2025
e1af05f
chore: Bump version
abetlen Jul 18, 2025
4f26028
feat: Update llama.cpp
abetlen Aug 7, 2025
d12ca47
misc: Update pypi downloads badge
abetlen Aug 7, 2025
68e89e8
misc: Add Python 3.13 classifier tag
abetlen Aug 7, 2025
af63792
feat: Add gpt-oss chat format support through strftime_now in chat fo…
abetlen Aug 7, 2025
30ddd56
fix: rename op_offloat to op_offload in llama.py (#2046)
sergey21000 Aug 7, 2025
dfc9bf5
chore: Bump version
abetlen Aug 7, 2025
ce6fd8b
feat: Update llama.cpp
abetlen Aug 15, 2025
c37132b
chore: Bump version
abetlen Aug 15, 2025
ca3b00a
fix(ci): Rename `huggingface-cli` to `hf` (#2149)
abetlen Mar 22, 2026
9f661ff
fix(ci): Fix macos tests, support both Intel and Apple Silicon testin…
abetlen Mar 22, 2026
a9b4a06
misc: Add Ruff formatting (#2148)
abetlen Mar 22, 2026
18aa31e
feat: Update llama.cpp to ggerganov/llama.cpp@49bfddeca18e62fa3d39114…
abetlen Mar 23, 2026
e1f8ac0
ci: add riscv64 wheel builds to release workflow (#2139)
gounthar Mar 23, 2026
11e7a55
fix: Qwen 3.5 support (#2152)
abetlen Mar 23, 2026
a6b1807
chore: Bump version (#2153)
abetlen Mar 23, 2026
f0391c5
fix(ci): release wheel workflow (#2154)
abetlen Mar 24, 2026
909ebf1
fix(ci): cuda wheel workflow (#2155)
abetlen Mar 24, 2026
ccc6bc0
fix(ci): docker build workflow (#2156)
abetlen Mar 24, 2026
7b38c31
feat: expose attention_type parameter in Llama.__init__ (#2143)
jamesbiederbeck Mar 24, 2026
d6f46a5
chore: bump version (#2157)
abetlen Mar 24, 2026
5f9c231
fix(ci): reduce CUDA binary wheel size only including cubins for curr…
abetlen Mar 25, 2026
ac59e5a
fix: handle embedding models without KV memory (#2160)
abetlen Mar 25, 2026
c670222
feat: Update llama.cpp to ggerganov/llama.cpp@c0159f9c1f874da15e94f37…
abetlen Mar 25, 2026
f54421b
Bump version to 0.3.19 (#2162)
abetlen Mar 25, 2026
fcd932a
fix(ci): publish distinct manylinux and musllinux cpu wheels (#2165)
abetlen Mar 29, 2026
7613aca
ci: publish release wheels as py3-none (#2166)
abetlen Mar 29, 2026
7257ba9
feat(server): add model-load chat_template_kwargs (#2168)
abetlen Mar 30, 2026
100b275
feat: Update llama.cpp to ggerganov/llama.cpp@f49e9178767d557a522618b…
abetlen Apr 3, 2026
08e088c
fix(misc): replace deprecated llama.cpp references (#2170)
abetlen Apr 3, 2026
02d6bee
chore: bump version to 0.3.20 (#2171)
abetlen Apr 3, 2026
1bcc5bc
feat: Update llama.cpp to ggerganov/llama.cpp@3bd9aa1f9 (#2176)
abetlen Apr 8, 2026
1b1a320
feat: Update llama.cpp to ggerganov/llama.cpp@227ed28e1 (#2182)
abetlen Apr 13, 2026
d87bf08
feat: Update llama.cpp to ggerganov/llama.cpp@f53577432 (#2189)
abetlen Apr 27, 2026
511b3f4
fix(ci): Build one arm64 py3 release wheel (#2191)
abetlen Apr 27, 2026
c8075d1
chore: bump version to 0.3.21 (#2192)
abetlen Apr 27, 2026
195cc59
fix(ci): Repair py3 CPU release wheels (#2193)
abetlen Apr 27, 2026
d2bcbac
fix(ci): Scope CPU release wheel selectors by OS (#2194)
abetlen Apr 27, 2026
c6dc905
fix(docs): update mkdocstrings inventories config (#2195)
abetlen Apr 27, 2026
587d94a
feat: Update llama.cpp to ggerganov/llama.cpp@63d93d173 (#2197)
abetlen May 2, 2026
d2113a1
feat(ci): re-enable Windows CUDA wheels (#2198)
abetlen May 2, 2026
9cf0ce7
chore: bump version to 0.3.22 (#2200)
abetlen May 2, 2026
2bfd80c
fix(ci): pass CUDA unsupported compiler flag during detection (#2201)
abetlen May 2, 2026
04a3638
fix(ci): pass CUDA compiler arg for Windows detection (#2202)
abetlen May 2, 2026
bc6ff9f
fix(ci): install CUDA CCCL headers for wheel builds (#2203)
abetlen May 2, 2026
14d7846
fix(ci): skip unsupported Windows CUDA versions (#2204)
abetlen May 2, 2026
90e8df9
fix(_internals): use n_tokens0 offset when enabling last-token logits…
Anai-Guo May 4, 2026
128c331
fix: configure n_seq_max for batched embeddings (#2206)
abetlen May 8, 2026
f774690
feat: update llama.cpp to 5d6f18a63 (#2207)
abetlen May 8, 2026
f8c1f36
fix(embed): mark all tokens as output to suppress llama.cpp 'overridi…
Anai-Guo May 11, 2026
5684112
feat: update llama.cpp to 7d442abf (#2214)
abetlen May 11, 2026
4a1a8ec
chore: bump version to 0.3.23 (#2215)
abetlen May 11, 2026
95ccb19
fix(embedding): set kv_unified=True when embedding=True to enable bat…
SanjanaB123 May 13, 2026
7664a3e
feat: Update llama.cpp to ggerganov/llama.cpp@91e84fed6 (#2218)
abetlen May 15, 2026
c7bea71
chore: migrate llama.cpp submodule to ggml-org (#2034)
shalinib-ibm May 15, 2026
5dd9b1c
feat: Update llama.cpp to b9a2170fc (#2223)
abetlen May 18, 2026
52fe54b
feat: Update llama.cpp to c0c7e147e (#2228)
abetlen May 23, 2026
3bda091
docs: add contributing guide (#2229)
abetlen May 24, 2026
2c455a5
feat: Update llama.cpp to d749821db (#2233)
abetlen May 31, 2026
f160bf7
Fix: model fails to load when chat template uses HuggingFace generati…
tobocop2 May 31, 2026
b91460b
feat: enable arm64 musl builds (#2221)
acon96 May 31, 2026
6bdab5d
fix: suppress stdout and stderr in Jupyter notebooks (#2181)
Anai-Guo May 31, 2026
fdf38b3
fix: avoid cleanup errors for partially initialized LlamaModel (#2173)
usernames122 May 31, 2026
e8ee64b
feat: add Jinja2 loop controls to chat templates (#2018)
handshape Jun 1, 2026
5848020
fix: use env var configured multimodal library override paths when lo…
navratil-matej Jun 1, 2026
84bc143
fix: match Transformers `tojson` in chat template rendering (#1486)
CISC Jun 1, 2026
33bf9d2
fix: correct typo in comments and settings description (#2121)
thecaptain789 Jun 1, 2026
73ee7cd
fix(docs): remove double word typo in README (#1791)
Victoran0 Jun 1, 2026
cdb7a75
fix: clear prompt for recurrent / hybrid models when only a partial p…
avion23 Jun 1, 2026
e3aa6b5
docs: fix typo in README (#2072)
ImadSaddik Jun 1, 2026
8687122
docs: fix NanoLlava chat handler name in README (#2059)
anakin87 Jun 1, 2026
da07e46
docs: update llama.cpp build docs link (#2056)
SleepyYui Jun 1, 2026
52cf747
docs: update ROCm install instructions (#1867)
agronholm Jun 1, 2026
c3adb35
server types: Move 'model' parameter to clarify it is used (#1786)
domdomegg Jun 1, 2026
2024060
feat: update llama.cpp to af6528e6d (#2235)
abetlen Jun 1, 2026
26633bd
chore: bump version to 0.3.24 (#2236)
abetlen Jun 1, 2026
c7af423
fix(ci): add Pascal compute capability targets to CUDA wheel builds (…
abetlen Jun 1, 2026
43c92a7
feat(ci): add CUDA 11.8 wheel builds (#2238)
abetlen Jun 1, 2026
718a1ca
feat(ci): add CUDA 13 wheel builds (#2239)
abetlen Jun 1, 2026
927b574
docs: add Python 3.14 classifier (#2240)
abetlen Jun 1, 2026
a9b480f
feat: add Gemma 4 multimodal chat support (#2241)
abetlen Jun 1, 2026
4b66c45
feat: update llama.cpp to 210a6570c (#2242)
abetlen Jun 2, 2026
f1bfa11
chore: bump version to 0.3.25 (#2243)
abetlen Jun 2, 2026
d185d64
fix: handle additional `from_pretrained` files in subfolders (#2085)
TNing Jun 2, 2026
bbdc851
chore(deps): bump pypa/cibuildwheel from 2.22.0 to 3.4.1 (#2249)
dependabot[bot] Jun 3, 2026
dad5d0a
chore(deps): bump actions/cache from 4 to 5 (#2248)
dependabot[bot] Jun 3, 2026
aa944e4
ci: cache embedding test model (#2250)
abetlen Jun 3, 2026
b439a84
chore(deps): bump actions/upload-artifact from 4 to 7 (#2245)
dependabot[bot] Jun 3, 2026
f8bd67d
chore(deps): bump docker/setup-buildx-action from 3 to 4 (#2246)
dependabot[bot] Jun 3, 2026
6e6c4e6
chore(deps): bump actions/setup-python from 5 to 6 (#2247)
dependabot[bot] Jun 3, 2026
ab7a9b0
feat(ci): add Vulkan wheel builds (#2251)
abetlen Jun 3, 2026
3754c04
feat(ci): add ROCm wheel builds (#2252)
abetlen Jun 3, 2026
ddaac10
feat: update llama.cpp (#2253)
abetlen Jun 3, 2026
8d2d269
feat: update llama.cpp (#2254)
abetlen Jun 3, 2026
cc2efc5
feat: update llama.cpp to ggml-org/llama.cpp@94a220cd6 (#2255)
abetlen Jun 3, 2026
e2d148a
feat: update llama.cpp to ggml-org/llama.cpp@e3ba22d6c (#2262)
abetlen Jun 4, 2026
d099001
chore(deps): bump actions/download-artifact from 4 to 8 (#2257)
dependabot[bot] Jun 4, 2026
df45432
chore(deps): bump actions/checkout from 4 to 6 (#2258)
dependabot[bot] Jun 4, 2026
b46bccf
chore(deps): bump docker/login-action from 3 to 4 (#2259)
dependabot[bot] Jun 4, 2026
927dde2
chore(deps): bump actions/deploy-pages from 4 to 5 (#2260)
dependabot[bot] Jun 4, 2026
9f6efb0
chore(deps): bump conda-incubator/setup-miniconda from 3.1.0 to 4.0.1…
dependabot[bot] Jun 4, 2026
2dae477
feat: Generic Multimodal Chat Handler (#2256)
abetlen Jun 4, 2026
8edcd15
chore(deps): bump docker/build-push-action from 6 to 7 (#2263)
dependabot[bot] Jun 4, 2026
64c0175
chore(deps): bump actions/upload-pages-artifact from 3 to 5 (#2264)
dependabot[bot] Jun 4, 2026
6bccad5
chore(deps): bump actions/configure-pages from 5 to 6 (#2265)
dependabot[bot] Jun 4, 2026
23fe09f
chore(deps): bump softprops/action-gh-release from 2 to 3 (#2266)
dependabot[bot] Jun 4, 2026
9013c1d
chore(deps): bump docker/setup-qemu-action from 3 to 4 (#2267)
dependabot[bot] Jun 4, 2026
c2e22ae
feat: update llama.cpp to ggml-org/llama.cpp@7c158fbb4 (#2268)
abetlen Jun 5, 2026
5151ac7
chore: bump version to 0.3.26 (#2269)
abetlen Jun 5, 2026
78ac75e
fix(ci): repair release wheel workflows (#2270)
abetlen Jun 5, 2026
7c86eae
fix(ci): allow empty wheel indexes (#2271)
abetlen Jun 5, 2026
6721989
fix(ci): index all CUDA wheel variants (#2272)
abetlen Jun 5, 2026
4684985
fix(ci): build one riscv64 release wheel (#2273)
abetlen Jun 5, 2026
8949066
docs: add Gemma 4 Colab notebook (#2274)
abetlen Jun 5, 2026
7a2a36d
docs: fix Gemma 4 Colab notebook (#2275)
abetlen Jun 5, 2026
7f16fe1
docs: add Gemma 4 QAT Colab notebook (#2276)
abetlen Jun 6, 2026
ed83366
feat: update llama.cpp to 5a69c9743 (#2277)
abetlen Jun 6, 2026
66635a0
feat(example): Updated server example (batch processing, `/v1/respons…
abetlen Jun 7, 2026
cf18830
feat: update llama.cpp to ggml-org/llama.cpp@465b1f0e7 (#2278)
abetlen Jun 7, 2026
380177b
chore: bump version to 0.3.27 (#2279)
abetlen Jun 7, 2026
fe927bd
feat(example): add OpenAI-compatible embeddings endpoint (#2281)
abetlen Jun 7, 2026
db66da3
feat: update llama.cpp to ggml-org/llama.cpp@9e3b928fd (#2282)
abetlen Jun 7, 2026
fddee27
feat(example): align server MTP support with llama.cpp (#2283)
abetlen Jun 7, 2026
8e470ac
chore: bump version to 0.3.28 (#2284)
abetlen Jun 7, 2026
a72325b
fix(example): avoid duplicate streamed response deltas (#2285)
abetlen Jun 8, 2026
411e0f4
fix(example): derive streaming response parser boundaries from schema…
abetlen Jun 8, 2026
7eb494d
fix(ci): repair Linux accelerator wheels (#2286)
abetlen Jun 8, 2026
d4ac2c2
fix(example): support multi-step Responses tool streaming (#2288)
abetlen Jun 8, 2026
e8191f0
fix(example): correct GPT-OSS tool calling config for server example …
abetlen Jun 8, 2026
e107999
feat: update llama.cpp to 8f83d6c27 (#2290)
abetlen Jun 8, 2026
051dda2
feat(example): support server video inputs and Gemma text tool calls …
abetlen Jun 9, 2026
0edb5d8
feat: update llama.cpp to ggml-org/llama.cpp@e3471b3e7 (#2294)
abetlen Jun 9, 2026
b5eefc8
feat: update llama.cpp to ggml-org/llama.cpp@76da2450a (#2295)
abetlen Jun 10, 2026
19ea70c
feat: update llama.cpp to ggml-org/llama.cpp@ac4cddeb0 (#2297)
abetlen Jun 11, 2026
65b50ca
feat: update llama.cpp to ggml-org/llama.cpp@3e7bd4f39 (#2298)
abetlen Jun 12, 2026
565d3c5
feat: update llama.cpp to ggml-org/llama.cpp@f05cf4676 (#2300)
abetlen Jun 13, 2026
a52702f
feat(example): use MTMD batch encoding (#2301)
abetlen Jun 13, 2026
ddc0d15
chore: bump version to 0.3.29 (#2302)
abetlen Jun 13, 2026
e807092
fix(ci): skip mtmd CLI wrappers in package builds (#2303)
abetlen Jun 13, 2026
3850aff
fix(ci): use C++ compiler for Docker builds (#2304)
abetlen Jun 14, 2026
541b08c
feat: update llama.cpp to ggml-org/llama.cpp@6e9007ae6 (#2307)
abetlen Jun 15, 2026
824565a
feat: update llama.cpp to 6eab47181 (#2308)
abetlen Jun 15, 2026
822146b
feat: update llama.cpp to e3a74b299 (#2310)
abetlen Jun 16, 2026
a804233
feat: add Pyodide wheel support (#2309)
abetlen Jun 16, 2026
ddb6a05
chore: bump version to 0.3.30 (#2311)
abetlen Jun 16, 2026
7440aaa
feat: update llama.cpp to f449e0553 (#2312)
abetlen Jun 20, 2026
b11fe07
chore: bump version to 0.3.31 (#2317)
abetlen Jun 20, 2026
9be3cd1
fix: preserve recurrent/hybrid model state when the full prompt is al…
allthatido Jun 22, 2026
4bee85b
feat: update llama.cpp to 92e854ab8 (#2318)
abetlen Jun 23, 2026
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -64,7 +64,7 @@ Try the following:
1. `git clone https://github.com/abetlen/llama-cpp-python`
2. `cd llama-cpp-python`
3. `rm -rf _skbuild/` # delete any old builds
4. `python setup.py develop`
4. `python -m pip install .`
5. `cd ./vendor/llama.cpp`
6. Follow [llama.cpp's instructions](https://github.com/ggerganov/llama.cpp#build) to `cmake` llama.cpp
7. Run llama.cpp's `./main` with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, [log an issue with llama.cpp](https://github.com/ggerganov/llama.cpp/issues)
10 changes: 9 additions & 1 deletion .github/dependabot.yml
@@ -8,4 +8,12 @@ updates:
- package-ecosystem: "pip" # See documentation for possible values
directory: "/" # Location of package manifests
schedule:
interval: "weekly"
interval: "daily"
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
- package-ecosystem: "docker"
directory: "/"
schedule:
interval: "daily"
202 changes: 180 additions & 22 deletions .github/workflows/build-and-release.yaml
@@ -11,63 +11,221 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
os: [ubuntu-22.04, windows-2022, macos-14, macos-15]

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v6
with:
submodules: "true"
submodules: "recursive"

# Used to host cibuildwheel
- uses: actions/setup-python@v3
- uses: actions/setup-python@v6
with:
python-version: "3.9"

- name: Install cibuildwheel
run: python -m pip install cibuildwheel==2.12.1
- name: Install dependencies (Linux/MacOS)
if: runner.os != 'Windows'
run: |
python -m pip install --upgrade pip
python -m pip install uv
RUST_LOG=trace python -m uv pip install -e .[all] --verbose
shell: bash

- name: Install dependencies
- name: Install dependencies (Windows)
if: runner.os == 'Windows'
env:
RUST_LOG: trace
run: |
python -m pip install --upgrade pip
python -m pip install -e .[all]
python -m pip install uv
python -m uv pip install -e .[all] --verbose
shell: cmd

- name: Build wheels
uses: pypa/cibuildwheel@v3.4.1
env:
# Keep repair disabled by default for non-Linux platforms in this job.
CIBW_REPAIR_WHEEL_COMMAND: ""
# Linux needs auditwheel repair so manylinux and musllinux wheels are
# published with distinct platform tags instead of generic linux tags.
CIBW_REPAIR_WHEEL_COMMAND_LINUX: "LD_LIBRARY_PATH=/project/llama_cpp/lib auditwheel repair -w {dest_dir} {wheel}"
# cibuildwheel v3 defaults to manylinux_2_28 images whose current
# GCC toolchain emits symbols newer than the policy allows.
CIBW_MANYLINUX_X86_64_IMAGE: "manylinux2014"
# The release wheel is tagged py3-none, so one build per platform
# covers all supported Python versions and avoids duplicate names.
CIBW_BUILD_LINUX: "cp38-*"
CIBW_BUILD_MACOS: "cp39-*"
CIBW_BUILD_WINDOWS: "cp39-*"
# Skip cibuildwheel's default i686 sidecar and keep Linux release
# wheels on a portable x86_64 CPU baseline.
CIBW_ARCHS_LINUX: "auto64"
CIBW_ARCHS_WINDOWS: "AMD64"
CIBW_ENVIRONMENT_LINUX: CMAKE_ARGS="-DGGML_NATIVE=off"
# Keep macOS release wheels on a portable CPU baseline instead of
# inheriting the hosted runner's native flags.
CIBW_ENVIRONMENT_MACOS: CMAKE_ARGS="-DGGML_NATIVE=off"
with:
package-dir: .
output-dir: wheelhouse

- uses: actions/upload-artifact@v7
with:
name: wheels-${{ matrix.os }}
path: ./wheelhouse/*.whl

build_wheels_arm64:
name: Build arm64 wheels
runs-on: ubuntu-24.04-arm
steps:
- uses: actions/checkout@v6
with:
submodules: "recursive"

- name: Build wheels
uses: pypa/cibuildwheel@v3.4.1
env:
CIBW_SKIP: "pp*"
CIBW_REPAIR_WHEEL_COMMAND: "LD_LIBRARY_PATH=$PWD/llama_cpp/lib auditwheel repair -w {dest_dir} {wheel}"
CIBW_ARCHS: "aarch64"
# Keep this consistent with the x86_64 Linux release wheels.
CIBW_MANYLINUX_AARCH64_IMAGE: "manylinux2014"
# Keep native arm64 builds on a portable CPU baseline instead of
# tuning wheels to the hosted runner.
CIBW_ENVIRONMENT: CMAKE_ARGS="-DGGML_NATIVE=off"
# The release wheel is tagged py3-none, so one build covers all
# supported Python versions and avoids duplicate wheel names.
CIBW_BUILD: "cp38-*"
with:
output-dir: wheelhouse

- name: Upload wheels as artifacts
uses: actions/upload-artifact@v7
with:
name: wheels_arm64
path: ./wheelhouse/*.whl

build_wheels_riscv64:
name: Build riscv64 wheel
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
submodules: "recursive"

- name: Set up QEMU
uses: docker/setup-qemu-action@v4
with:
platforms: linux/riscv64

- name: Build wheels
run: python -m cibuildwheel --output-dir wheelhouse
uses: pypa/cibuildwheel@v3.4.1
env:
CIBW_SKIP: "*musllinux* pp*"
CIBW_REPAIR_WHEEL_COMMAND: ""
CIBW_ARCHS: "riscv64"
# Build riscv64 wheels against a conservative baseline instead of
# enabling RVV-related extensions from the build container.
CIBW_ENVIRONMENT: CMAKE_ARGS="-DGGML_NATIVE=off -DGGML_RVV=off -DGGML_RV_ZFH=off -DGGML_RV_ZVFH=off -DGGML_RV_ZICBOP=off -DGGML_RV_ZIHINTPAUSE=off"
# The release wheel is tagged py3-none, so one riscv64 build is
# enough and avoids duplicate same-name release artifacts.
CIBW_BUILD: "cp310-*"
with:
output-dir: wheelhouse

- name: Upload wheels as artifacts
uses: actions/upload-artifact@v7
with:
name: wheels_riscv64
path: ./wheelhouse/*.whl
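
The `CIBW_BUILD` and `CIBW_SKIP` values in the job above are space-separated glob patterns matched against build identifiers. The sketch below approximates that selection with Python's `fnmatch`. It is an assumption-laden illustration, not cibuildwheel's actual matcher (which supports more syntax), and the identifiers and the `selected` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def selected(identifier, build, skip=""):
    """Return True if an identifier matches a build pattern and no skip pattern.

    Approximation only: patterns are split on whitespace and matched
    with plain shell-style globbing.
    """
    is_built = any(fnmatchcase(identifier, p) for p in build.split())
    is_skipped = any(fnmatchcase(identifier, p) for p in skip.split())
    return is_built and not is_skipped


# Illustrative identifiers; the real list depends on the cibuildwheel version.
identifiers = [
    "cp310-manylinux_riscv64",
    "cp310-musllinux_riscv64",
    "cp311-manylinux_riscv64",
    "pp310-manylinux_riscv64",
]

# Same selector strings as the riscv64 job in this diff.
kept = [i for i in identifiers if selected(i, build="cp310-*", skip="*musllinux* pp*")]
print(kept)  # ['cp310-manylinux_riscv64']
```

Under these assumptions only one identifier survives, which matches the stated intent of building a single riscv64 release wheel.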

build_wheels_pyodide:
name: Build Pyodide wheel
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
submodules: "recursive"

- uses: actions/setup-python@v6
with:
python-version: "3.12"

- name: Build wheel
uses: pypa/cibuildwheel@v4.1.0
env:
CIBW_PLATFORM: "pyodide"
CIBW_BUILD: "cp314-pyodide_wasm32"
CIBW_BUILD_VERBOSITY: "1"
CIBW_REPAIR_WHEEL_COMMAND: ""
CIBW_BEFORE_TEST: "curl -L --fail --retry 3 -o /tmp/stories260K.gguf https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf"
CIBW_TEST_COMMAND: "python -c \"import llama_cpp.mtmd_cpp as mtmd; from llama_cpp import Llama; print('mtmd marker', mtmd.mtmd_default_marker().decode()); llm = Llama(model_path='/tmp/stories260K.gguf', n_ctx=64, n_batch=8, n_threads=1, verbose=False); print('loaded', llm.n_vocab(), llm.n_ctx()); print('generated', llm('Once upon a', max_tokens=1, temperature=0)['choices'][0]['text'])\""
CMAKE_ARGS: "-DLLAMA_WASM_MEM64=OFF -DEMSCRIPTEN_SYSTEM_PROCESSOR=wasm32 -DGGML_NATIVE=OFF -DGGML_OPENMP=OFF -DGGML_METAL=OFF -DGGML_BLAS=OFF -DGGML_CUDA=OFF -DGGML_HIP=OFF -DGGML_VULKAN=OFF -DGGML_OPENCL=OFF -DGGML_RPC=OFF -DLLAMA_CURL=OFF -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_TOOLS=OFF -DLLAMA_BUILD_SERVER=OFF"
with:
output-dir: wheelhouse

- uses: actions/upload-artifact@v3
- name: Upload wheels as artifacts
uses: actions/upload-artifact@v7
with:
name: wheels_pyodide
path: ./wheelhouse/*.whl

build_sdist:
name: Build source distribution
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v6
with:
submodules: "recursive"

- uses: actions/setup-python@v6
with:
submodules: "true"
- uses: actions/setup-python@v3
- name: Install dependencies
python-version: "3.9"

- name: Install dependencies (Linux/MacOS)
if: runner.os != 'Windows'
run: |
python -m pip install --upgrade pip build
python -m pip install -e .[all]
python -m pip install --upgrade pip
python -m pip install uv
RUST_LOG=trace python -m uv pip install -e .[all] --verbose
python -m uv pip install build
shell: bash

- name: Install dependencies (Windows)
if: runner.os == 'Windows'
env:
RUST_LOG: trace
run: |
python -m pip install --upgrade pip
python -m pip install uv
python -m uv pip install -e .[all] --verbose
python -m uv pip install build
shell: cmd

- name: Build source distribution
run: |
python -m build --sdist
- uses: actions/upload-artifact@v3

- uses: actions/upload-artifact@v7
with:
name: sdist
path: ./dist/*.tar.gz

release:
name: Release
needs: [build_wheels, build_sdist]
needs: [build_wheels, build_wheels_arm64, build_wheels_riscv64, build_wheels_pyodide, build_sdist]
if: startsWith(github.ref, 'refs/tags/')
runs-on: ubuntu-latest

steps:
- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v8
with:
name: artifact
merge-multiple: true
path: dist
- uses: softprops/action-gh-release@v1

- uses: softprops/action-gh-release@v3
with:
files: dist/*
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
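
The workflow comments above say the release wheel is tagged `py3-none`, so building under more than one interpreter would produce duplicate file names. The following sketch shows why, using the standard wheel file name layout (`name-version-python-abi-platform.whl`). The package name, version, and the `retag_py3_none` helper are hypothetical; how the project actually applies the tag is not shown in this diff.

```python
def wheel_tags(filename):
    """Split a wheel file name into (python tag, abi tag, platform tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    # The last three dash-separated fields are always the compatibility tags.
    return tuple(parts[-3:])


def retag_py3_none(filename):
    """Hypothetical helper: rewrite the python/abi tags to py3-none."""
    parts = filename[: -len(".whl")].split("-")
    parts[-3], parts[-2] = "py3", "none"
    return "-".join(parts) + ".whl"


# Illustrative outputs of building under two interpreters on one platform.
per_interpreter = [
    "example_pkg-1.0.0-cp38-cp38-manylinux2014_x86_64.whl",
    "example_pkg-1.0.0-cp39-cp39-manylinux2014_x86_64.whl",
]

retagged = [retag_py3_none(name) for name in per_interpreter]
collisions = len(retagged) - len(set(retagged))

print(wheel_tags(retagged[0]))  # ('py3', 'none', 'manylinux2014_x86_64')
print(collisions)               # 1
```

Because both builds collapse to the same name once retagged, restricting each platform to one interpreter (for example `cp38-*` on Linux) avoids uploading two artifacts with identical names.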
41 changes: 30 additions & 11 deletions .github/workflows/build-docker.yaml
@@ -9,32 +9,51 @@ permissions:
jobs:
docker:
name: Build and push Docker image
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v6
with:
submodules: "true"
submodules: "recursive"

- name: Set image tag
run: |
if [[ "${GITHUB_REF_TYPE}" == "tag" ]]; then
image_tag="${GITHUB_REF_NAME}"
else
image_tag="${GITHUB_REF_NAME//\//-}"
fi
echo "IMAGE_TAG=$image_tag" >> "$GITHUB_ENV"

- name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v4

- name: Login to GitHub Container Registry
uses: docker/login-action@v2
uses: docker/login-action@v4
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Build and push
uses: docker/build-push-action@v4
id: docker_build
uses: docker/build-push-action@v7
with:
context: .
file: "docker/simple/Dockerfile"
push: true # push to registry
pull: true # always fetch the latest base images
platforms: linux/amd64,linux/arm64 # build for both amd64 and arm64
tags: ghcr.io/abetlen/llama-cpp-python:latest
push: ${{ startsWith(github.ref, 'refs/tags/') }}
pull: true
platforms: linux/amd64,linux/arm64
tags: |
ghcr.io/abetlen/llama-cpp-python:latest
ghcr.io/abetlen/llama-cpp-python:${{ env.IMAGE_TAG }}
build-args: |
BUILDKIT_INLINE_CACHE=1

- name: Publish to GitHub Tag
if: steps.docker_build.outputs.digest && startsWith(github.ref, 'refs/tags/')
run: |
echo "Docker image published for tag: ${{ github.ref_name }}"
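
The "Set image tag" step and the `push:` condition in this workflow can be read as two small functions. The sketch below restates them in Python for clarity. The function names are hypothetical, and the case-insensitive prefix check reflects my understanding of how GitHub's `startsWith` expression behaves, so treat it as an assumption.

```python
def image_tag(ref_type, ref_name):
    """Mirror the shell step: tags are used as-is, branch names are sanitized.

    The bash expansion ${GITHUB_REF_NAME//\\//-} replaces every "/" with "-",
    since "/" is not allowed inside a Docker image tag.
    """
    if ref_type == "tag":
        return ref_name
    return ref_name.replace("/", "-")


def should_push(ref):
    """Mirror `startsWith(github.ref, 'refs/tags/')` (assumed case-insensitive)."""
    return ref.lower().startswith("refs/tags/")


print(image_tag("tag", "v0.3.31"))             # v0.3.31
print(image_tag("branch", "feature/foo/bar"))  # feature-foo-bar
print(should_push("refs/tags/v0.3.31"))        # True
print(should_push("refs/heads/main"))          # False
```

Read this way, branch builds still exercise the multi-platform build but only tagged refs push an image, and every pushed image carries both `latest` and the version tag.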