
SnapKV

We introduce SnapKV, an innovative, out-of-the-box KV cache compression method.

Figures: comprehensive experiment results on LongBench; pressure test results on Needle-in-a-Haystack.

Quick Start

Use SnapKV-optimized Models

SnapKV-optimized models all live in the `models` folder and can be imported and used exactly like the baseline models. For example:

import torch
import transformers
from models.modeling_mistral import MistralForCausalLM as SnapKVMistralForCausalLM

# model_name: path or Hub ID of the base Mistral checkpoint
model = SnapKVMistralForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    use_flash_attention_2=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name,
    padding_side="right",
    use_fast=False,
)

Customize Your SnapKV-optimized Models

SnapKV can be easily integrated with other models. You can follow the comments marked with `[SnapKV]` in the existing models to construct your own. The detailed algorithm of SnapKV is in `snapkv_utils.py`.
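To give a feel for what the integration hooks do, here is a deliberately simplified, pure-Python sketch of SnapKV-style selection: attention weights from the last few prompt tokens (the "observation window") vote for which earlier KV positions to keep. This is a hypothetical illustration, not the repository's actual implementation; the function name `snapkv_select` and its signature are made up for this example, and the real code in `snapkv_utils.py` operates on batched per-head tensors with pooling.

```python
def snapkv_select(attn, window_size, max_capacity):
    """Pick which positions to keep in the compressed KV cache.

    attn: one row per query in the observation window (the last
          `window_size` prompt tokens); each row holds attention
          weights over all positions in the sequence.
    Returns sorted indices of kept positions: the highest-scoring
    prefix positions plus the observation window itself.
    """
    seq_len = len(attn[0])
    prefix_len = seq_len - window_size
    # Vote: total attention each prefix position receives from the
    # observation-window queries.
    scores = [sum(row[j] for row in attn) for j in range(prefix_len)]
    # Budget for prefix positions after reserving the window.
    budget = max_capacity - window_size
    top = sorted(range(prefix_len), key=lambda j: scores[j], reverse=True)[:budget]
    # Always keep the observation-window tokens themselves.
    return sorted(top) + list(range(prefix_len, seq_len))
```

For instance, with a window of 2 queries over a 5-token prompt and a cache budget of 3, the prefix position the window attends to most is retained alongside the window tokens, and the rest of the cache is dropped.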

Motivation

The observations and motivations behind SnapKV can be found in the `observation` folder.
