Tags: reffdev/node-llama-cpp
feat(minor): customize `postinstall` behavior (withcatai#582)
* feat: customize `postinstall` behavior
* feat: experimental support for context KV cache type configurations
* feat: support `NVFP4` quants
feat: automatic checkpoints for models that need it (withcatai#573)
* feat: automatic checkpoints for models that need it (such as Qwen 3.5 due to its hybrid architecture)
* feat(`QwenChatWrapper`): Qwen 3.5 support
* feat(`inspect gpu` command): detect and report missing prebuilt binary modules and custom npm registry
* feat: initial disk cache dir option for future optimizations (disabled for now)
* fix: Qwen 3.5 memory estimation
* fix: grammar use with HarmonyChatWrapper
* fix: add mistral think segment detection
* fix: compress excessively long segments from the current response on context shift instead of throwing an error
* fix: default thinking budget to 75% of the context size to prevent low-quality responses
* fix: bugs
feat(`getLlama`): `build: "autoAttempt"` (withcatai#564)
* feat(`getLlama`): `build: "autoAttempt"`
* feat: get rid of octokit
* fix(CLI): disable Direct I/O by default
* fix: Bun segmentation fault on process exit with undisposed `Llama`
* fix: detect glibc inside Nix
* fix: stricter CI build flag
* chore: update `simple-git`
* chore: switch off of `tsconfig.json` deprecated configs
* docs: clarify `getLlama`'s `build` option logic
feat: Exclude Top Choices (XTC) (withcatai#553)
* feat: Exclude Top Choices (XTC) support
* feat: DRY (Don't Repeat Yourself) repeat penalty support
* feat: Tiny Aya support
* fix: adjust the default VRAM padding config to reserve enough memory for compute buffers
* fix: adapt to breaking `llama.cpp` changes
* fix: support function call syntax with optional whitespace prefix
* fix: find the provided cmake path
* fix: change the default value of `useDirectIo` to `false`
* fix: Vulkan device dedupe
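For readers unfamiliar with the XTC entry above, the following is a minimal sketch of the general Exclude Top Choices idea: with some probability, every candidate token at or above a threshold is dropped except the least likely of them. It is an illustration only, not the code from this release or from `llama.cpp`; the names `Candidate` and `applyXtc` and the injectable `rng` parameter are hypothetical, and renormalizing the remaining probabilities is left out.

```typescript
// Hypothetical types and names for illustration; not node-llama-cpp's API.
type Candidate = { token: number; prob: number };

/**
 * Sketch of Exclude Top Choices (XTC).
 * With chance `probability`, removes every candidate whose prob is at or
 * above `threshold`, except the least probable of those candidates.
 * `rng` is injectable so the behavior can be made deterministic in tests.
 */
function applyXtc(
    candidates: Candidate[],
    threshold: number,
    probability: number,
    rng: () => number = Math.random
): Candidate[] {
    // Work on a copy sorted from most to least probable.
    const sorted = [...candidates].sort((a, b) => b.prob - a.prob);

    // Nothing to exclude when disabled or when there are too few candidates.
    if (probability <= 0 || sorted.length < 2) return sorted;

    // Only apply the exclusion on a fraction of sampling steps.
    if (rng() >= probability) return sorted;

    // Index of the least probable candidate that still meets the threshold.
    let lastAbove = -1;
    sorted.forEach((candidate, index) => {
        if (candidate.prob >= threshold) lastAbove = index;
    });

    // Fewer than two candidates meet the threshold: keep everything.
    if (lastAbove < 1) return sorted;

    // Drop the top choices, keeping the last one that met the threshold.
    return sorted.slice(lastAbove);
}
```

For example, with candidate probabilities 0.5, 0.3, 0.15, and 0.05, a threshold of 0.1, and a probability of 1, the sketch keeps only the 0.15 and 0.05 candidates. With a threshold of 0.4 only one candidate qualifies, so nothing is removed.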
fix: adapt to `llama.cpp` changes (withcatai#547)
* fix: adapt to `llama.cpp` changes
* fix: change the level of common logs
feat(`LlamaCompletion`): `stopOnAbortSignal` (withcatai#538)
* feat(`LlamaCompletion`): `stopOnAbortSignal`
* feat(`LlamaModel`): `useDirectIo`
* fix: support new CUDA 13.1 archs
* fix: build the prebuilt binaries with CUDA 13.1 instead of 13.0
* docs: stopping a text completion generation