📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching
Implementation of PagedAttention from the vLLM paper - a breakthrough attention algorithm that treats the KV cache like virtual memory. Eliminates memory fragmentation, increases batch sizes, and dramatically improves LLM serving throughput.
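The "KV cache as virtual memory" idea above can be sketched in a few lines: each sequence keeps a block table mapping logical token positions to fixed-size physical blocks in a shared pool, so memory is allocated one block at a time and never fragments. This is a minimal illustrative sketch, not vLLM's actual API; all names, sizes, and the NumPy layout are assumptions made here for clarity.

```python
import numpy as np

BLOCK_SIZE = 4          # tokens per KV-cache block (vLLM defaults to 16)
NUM_BLOCKS = 8          # size of the physical block pool (illustrative)
HEAD_DIM = 2            # tiny head dimension, enough for the sketch

# Physical KV cache: a fixed pool of equal-sized blocks, like page frames.
kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM))
free_blocks = list(range(NUM_BLOCKS))

class Sequence:
    """Per-sequence block table: logical block index -> physical block id."""
    def __init__(self):
        self.block_table = []
        self.num_tokens = 0

    def append_kv(self, kv_vec):
        # Allocate a new physical block only when the last one is full,
        # so at most one partially filled block exists per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(free_blocks.pop())
        block = self.block_table[self.num_tokens // BLOCK_SIZE]
        kv_pool[block, self.num_tokens % BLOCK_SIZE] = kv_vec
        self.num_tokens += 1

    def gather_kv(self):
        # Attention reads KV through the block table (a "page table" walk),
        # so physical blocks need not be contiguous.
        rows = [kv_pool[self.block_table[t // BLOCK_SIZE], t % BLOCK_SIZE]
                for t in range(self.num_tokens)]
        return np.stack(rows)

seq = Sequence()
for t in range(6):
    seq.append_kv(np.full(HEAD_DIM, float(t)))
print(len(seq.block_table))   # 6 tokens with block size 4 -> 2 blocks
print(seq.gather_kv().shape)  # (6, 2)
```

Because every allocation is a whole block, the only waste is the tail of the last block per sequence, which is how PagedAttention keeps fragmentation near zero.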
(Experimental) A high-throughput and memory-efficient inference and serving engine for LLMs with an optimized GB10 kernel
High-performance On-Device MoA (Mixture of Agents) Engine in C++. Optimized for CPU inference with RadixCache & PagedAttention. (Tiny-MoA Native)
An LLM inference engine built from scratch in PyTorch with custom GPT2/LLaMA transformers, KV cache, paged KV cache, continuous batching, and A100 benchmarks
Discrete-event simulator for LLM inference serving — PagedAttention memory management and continuous batching
LangChain integration for Parallel Context-of-Experts Decoding (PCED)
🤖 Enhance task management with Tiny MoA, a GPU-free multi-agent system that plans, reasons, and collaborates efficiently in real time.
vLLM - High-throughput, memory-efficient LLM inference engine with PagedAttention, continuous batching, CUDA/HIP optimization, quantization (GPTQ/AWQ/INT4/INT8/FP8), tensor/pipeline parallelism, OpenAI-compatible API, multi-GPU/TPU/Neuron support, prefix caching, and multi-LoRA capabilities
High-Performance LLM Inference Engine with PagedAttention & Continuous Batching | memory waste <5%, throughput +50%
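Several of the engines above advertise continuous (iteration-level) batching: rather than waiting for an entire batch to finish, finished sequences are evicted and queued requests admitted between every decode step. This is a toy scheduler sketch under those assumptions; the class and method names are illustrative, not any engine's real API.

```python
from collections import deque

class ContinuousBatcher:
    """Toy continuous-batching scheduler (illustrative, not vLLM's)."""
    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.waiting = deque()   # requests not yet admitted
        self.running = []        # requests currently decoding

    def submit(self, req_id, tokens_to_generate):
        self.waiting.append({"id": req_id, "left": tokens_to_generate})

    def step(self):
        # Admit waiting requests into any free batch slots before decoding.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        # One decode iteration: every running sequence emits one token.
        for req in self.running:
            req["left"] -= 1
        # Evict finished sequences immediately, freeing their slots.
        finished = [r["id"] for r in self.running if r["left"] == 0]
        self.running = [r for r in self.running if r["left"] > 0]
        return finished

batcher = ContinuousBatcher(max_batch_size=2)
batcher.submit("a", 1)
batcher.submit("b", 3)
batcher.submit("c", 2)
done = []
while batcher.waiting or batcher.running:
    done.extend(batcher.step())
print(done)  # ['a', 'b', 'c'] - "c" starts as soon as "a" finishes
```

The key contrast with static batching: request "c" does not wait for "b" to finish, so GPU batch slots stay full and throughput rises.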