README.md

reflex-server

reflex-server owns the HTTP gateway (Axum) and builds the reflex binary.

It depends on the core library crate (reflex-cache) for the cache tiers, storage, embeddings, and vector DB integration.

Run (Dev)

# Requires Qdrant running (gRPC on 6334)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

RUST_LOG=debug cargo run -p reflex-server

Run (Release)

cargo run -p reflex-server --release

Docker Compose

From repo root:

docker compose up -d

Endpoints

GET /healthz
GET /ready
POST /v1/chat/completions (OpenAI-compatible)

Response Status

Responses include X-Reflex-Status:

hit-l1-exact: exact request match
hit-l3-verified: semantic hit verified by L3
miss: forwarded to provider and stored

The response choices[].message.content is Tauq-encoded.

Configuration

Most commonly used env vars:

Variable	Default	Notes
`REFLEX_PORT`	`8080`	HTTP port
`REFLEX_BIND_ADDR`	`127.0.0.1`	Bind address
`REFLEX_QDRANT_URL`	`http://localhost:6334`	Qdrant gRPC
`REFLEX_STORAGE_PATH`	`./.data`	Storage base path
`REFLEX_L1_CAPACITY`	`10000`	L1 capacity
`REFLEX_MODEL_PATH`	(unset)	Unset = stub embedder
`REFLEX_RERANKER_PATH`	(unset)	Optional reranker
`REFLEX_RERANKER_THRESHOLD`	`0.70`	L3 threshold
`REFLEX_MOCK_PROVIDER`	(unset)	Set to bypass real provider calls

Point Your Agent

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=sk-your-key

Python (openai SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-your-key")
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quicksort"}],
)

GPU Features

Build/run with one of:

--features metal (Apple Silicon)
--features cuda (NVIDIA)
--features cpu (docs.rs-style CPU flag; usually you can omit)

Example:

cargo run -p reflex-server --release --no-default-features --features metal

Tests

cargo test -p reflex-server

Real integration tests (needs Qdrant):

REFLEX_QDRANT_URL=http://localhost:6334 cargo test -p reflex-server --test integration_real

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reflex-server

Run (Dev)

Run (Release)

Docker Compose

Endpoints

Response Status

Configuration

Point Your Agent

Python (openai SDK)

GPU Features

Tests

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

reflex-server

Run (Dev)

Run (Release)

Docker Compose

Endpoints

Response Status

Configuration

Point Your Agent

Python (openai SDK)

GPU Features

Tests