
LMCache Lookup

This example shows how to check, from outside the engine, whether a request's KV cache exists in an LMCacheEngine, and how to pin it.

Prerequisites

Your server should have at least 1 GPU.

This example uses port 8000 for the vLLM engine, port 9000 for the LMCache controller, and port 9001 for the controller's monitor.

Steps

Start the vLLM engine on port 8000:

CUDA_VISIBLE_DEVICES=0 LMCACHE_CONFIG_FILE=example.yaml \
  vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.8 \
  --port 8000 \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'
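
vLLM can take a while to load the model before it accepts requests. Below is a minimal sketch that polls the server until it answers before you continue. It assumes your vLLM version exposes GET /health on the serving port. wait_for_http is a hypothetical helper name and is not part of vLLM or LMCache.

```shell
#!/bin/sh
# Hypothetical helper: poll an HTTP endpoint until it returns a 2xx status
# or the retry budget runs out. Assumes vLLM serves GET /health.
wait_for_http() {
  wfh_url=$1
  wfh_retries=${2:-60}   # number of attempts
  wfh_delay=${3:-5}      # seconds to sleep between attempts
  wfh_i=0
  while [ "$wfh_i" -lt "$wfh_retries" ]; do
    # -f: non-zero exit on HTTP errors; -s: no progress output;
    # --max-time: give up on a single attempt after 5 seconds.
    if curl -fs --max-time 5 -o /dev/null "$wfh_url"; then
      return 0
    fi
    wfh_i=$((wfh_i + 1))
    sleep "$wfh_delay"
  done
  return 1
}
```

For example, wait_for_http http://localhost:8000/health 60 5 && echo "vLLM is ready" waits for roughly five minutes before giving up.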

Start the LMCache controller on port 9000 and its monitor on port 9001:

lmcache_controller --host localhost --port 9000 --monitor-port 9001

Send a request to the vLLM engine:

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "Explain the significance of KV cache in language models.",
    "max_tokens": 10
  }'
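
To send other prompts, or to reuse this prompt in the lookup step, you can build the payload from shell variables instead of hand-editing JSON. The sketch below does this with the hypothetical helpers json_escape and completion_payload. json_escape only escapes backslashes and double quotes, so prompts that contain newlines or other control characters need a proper JSON tool such as jq.

```shell
#!/bin/sh
# Hypothetical helpers for building the /v1/completions request body.

# Escape backslashes first, then double quotes, so the text is safe inside
# a JSON string. Control characters (e.g. newlines) are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# $1 = model name, $2 = prompt text, $3 = max_tokens
completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %s}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$3"
}

MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
PROMPT="Explain the significance of KV cache in language models."
```

Usage: curl -X POST http://localhost:8000/v1/completions -H "Content-Type: application/json" -d "$(completion_payload "$MODEL" "$PROMPT" 10)" sends the same request as above.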

Send a lookup request to the LMCache controller:

curl -X POST http://localhost:9000/lookup \
  -H "Content-Type: application/json" \
  -d '{
    "tokens": [128000, 849, 21435, 279, 26431, 315, 85748, 6636, 304, 4221, 4211, 13]
  }'
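
The token IDs in the lookup request must match how the engine tokenizes the prompt, so copying them by hand is error-prone. One option is to fetch the IDs from vLLM and forward them to the controller. This assumes your vLLM build serves a /tokenize endpoint that returns compact, single-line JSON with a "tokens" array, including the begin-of-text token by default. lookup_payload_from_tokenize and lookup_prompt are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers: fetch token IDs from vLLM's /tokenize endpoint
# (assumed available) and forward them to the controller's /lookup endpoint.

# Read a /tokenize response on stdin and print a /lookup payload.
# Assumes single-line JSON containing "tokens": [id, id, ...].
lookup_payload_from_tokenize() {
  tok_arr=$(sed -n 's/.*"tokens": *\(\[[^]]*]\).*/\1/p')
  # Fail if no "tokens" array was found (e.g. an error response).
  [ -n "$tok_arr" ] || return 1
  printf '{"tokens": %s}\n' "$tok_arr"
}

# $1 = prompt text (must not contain double quotes, backslashes, or newlines).
lookup_prompt() {
  payload=$(curl -s -X POST http://localhost:8000/tokenize \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\", \"prompt\": \"$1\"}" \
    | lookup_payload_from_tokenize) || return 1
  curl -s -X POST http://localhost:9000/lookup \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```

Usage: lookup_prompt "Explain the significance of KV cache in language models." The prompt is inserted into the JSON body unescaped, which is why it must not contain double quotes, backslashes, or newlines.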

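The tokens field holds the token IDs of the prompt sent above, starting with the Llama 3 begin-of-text token 128000. The controller responds with cache information for that token sequence.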

You should see a response like:

{"event_id": "xxx", "lmcache_default_instance": ("LocalCPUBackend", 12)}

Here, lmcache_default_instance is the instance_id, and ("LocalCPUBackend", 12) gives the cache location within that instance and the matched prefix length. In this case the KV cache is held by LocalCPUBackend, and 12 tokens of the prefix matched, which covers all 12 tokens sent in the lookup. event_id identifies the controller operation and can be ignored for lookups.
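
To use the lookup result in a script, you can extract the backend and matched prefix length for one instance and compare the length with the number of tokens you sent. The sketch below uses the hypothetical helpers parse_lookup and classify_hit. It accepts both the tuple form shown above and a JSON list form, in case the response is serialized as a list. The full/partial/miss labels are this sketch's own terminology, not LMCache's.

```shell
#!/bin/sh
# Hypothetical helpers for interpreting a controller /lookup response.

# $1 = instance_id; reads the response on stdin; prints "<backend> <length>".
# Matches both ("Backend", N) and ["Backend", N]; prints nothing if the
# instance is absent from the response.
parse_lookup() {
  sed -n "s/.*\"$1\": *[([] *\"\([^\"]*\)\", *\([0-9][0-9]*\) *[])].*/\1 \2/p"
}

# $1 = matched prefix length, $2 = number of tokens sent in the lookup.
classify_hit() {
  if [ "$1" -ge "$2" ]; then
    echo full
  elif [ "$1" -gt 0 ]; then
    echo partial
  else
    echo miss
  fi
}
```

For the response above, parse_lookup lmcache_default_instance prints LocalCPUBackend 12, and classify_hit 12 12 prints full because all 12 tokens sent in the lookup were matched.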
