LMCache Pin/Persistence

This example demonstrates how to pin (persist) a request's KV cache in an LMCacheEngine from outside the engine.

Prerequisites

Your server should have at least 1 GPU.

This example uses port 8000 for one vLLM engine and port 8001 for LMCache. The controller started in the steps below listens on port 9000, with its monitor on port 9001.

Steps

Start the vllm engine at port 8000:

CUDA_VISIBLE_DEVICES=0 LMCACHE_CONFIG_FILE=example.yaml \
  vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.8 \
  --port 8000 \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'

Start the lmcache controller at port 9000 and the monitor at port 9001:

lmcache_controller --host localhost --port 9000 --monitor-port 9001

Send a request to the vLLM engine:

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "Explain the significance of KV cache in language models.",
    "max_tokens": 10
  }'

Pin the request's KV cache by sending its token IDs to the controller:

curl -X POST http://localhost:9000/pin \
  -H "Content-Type: application/json" \
  -d '{
    "tokens": [128000, 849, 21435, 279, 26431, 315, 85748, 6636, 304, 4221, 4211, 13]
  }'
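The token IDs in the pin request appear to be the tokenized form of the prompt sent in the previous step (128000 is the Llama 3 begin-of-text token). The sketch below shows one way to obtain them and issue the same pin request from Python. It assumes the vLLM server exposes a `POST /tokenize` endpoint that returns a `tokens` list; the helper names (`build_json_request`, `post_json`, `tokenize_prompt`, `pin_tokens`) are hypothetical and not part of LMCache or vLLM.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000"        # vLLM engine from the steps above
CONTROLLER_URL = "http://localhost:9000"  # LMCache controller from the steps above
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url: str, payload: dict) -> dict:
    """Send a JSON POST request and decode the JSON response body."""
    with urllib.request.urlopen(build_json_request(url, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def tokenize_prompt(prompt: str) -> list[int]:
    """Ask the vLLM server for the prompt's token IDs.

    Assumption: the server exposes POST /tokenize and replies with
    a JSON object containing a "tokens" list.
    """
    reply = post_json(f"{VLLM_URL}/tokenize", {"model": MODEL, "prompt": prompt})
    return reply["tokens"]


def pin_tokens(token_ids: list[int]) -> dict:
    """Send the same body as the curl command above: {"tokens": [...]}."""
    return post_json(f"{CONTROLLER_URL}/pin", {"tokens": token_ids})
```

With both servers running, `pin_tokens(tokenize_prompt("Explain the significance of KV cache in language models."))` would send a request equivalent to the curl command, provided the tokenizer output matches the IDs listed there.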

The controller should return a message indicating that the KV cache has been pinned:

{"success": true, "event_id": "xxx"}
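When scripting this step, the response can be checked programmatically. The following is a minimal sketch that relies only on the two fields shown in the sample response, `success` and `event_id`; the function name `parse_pin_response` is hypothetical.

```python
import json


def parse_pin_response(body: str) -> str:
    """Return the event_id from a pin response, or raise if pinning failed."""
    reply = json.loads(body)
    # Treat a missing or false "success" field as a failure.
    if not reply.get("success"):
        raise RuntimeError(f"pin request was not successful: {reply}")
    return reply["event_id"]
```

Raising on failure, rather than returning `None`, keeps a failed pin from passing silently in a longer script.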
