cache_interface

User Controllable Caching

This is an example to demonstrate user controllable caching (e.g., specify whether to cache a request or not).

Prerequisites

Your server should have at least 1 GPU.

This will use the port 8000 for 1 vllm.

Steps

Start the vllm engine at port 8000:

CUDA_VISIBLE_DEVICES=0 LMCACHE_USE_EXPERIMENTAL=True LMCACHE_CONFIG_FILE=example.yaml vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096  --gpu-memory-utilization 0.8 --port 8000 --kv-transfer-config '{"kv_connector":"LMCacheConnector", "kv_role":"kv_both"}'

Send a request to vllm engine with store_cache: True:

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "Explain the significance of KV cache in language models.",
    "max_tokens": 10,
    "store_cache": True,
  }'

You should be able to see logs indicating the KV cache is stored:

DEBUG LMCache: Store skips 0 tokens and then stores 13 tokens [2025-03-02 21:58:55,147]

Send request to vllm engine 2:

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "What's the weather today in Chicago?",
    "max_tokens": 10,
    "store_cache": False,
  }'

You should be able to see logs indicating the KV cache is NOT stored:

DEBUG LMCache: User has specified not to store the cache [2025-03-02 21:54:58,380]

Note that cache is stored by default.

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
example.yaml		example.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

User Controllable Caching

Prerequisites

Steps

FilesExpand file tree

cache_interface

Directory actions

More options

Directory actions

More options

Latest commit

History

cache_interface

Folders and files

parent directory

README.md

User Controllable Caching

Prerequisites

Steps