Learn how to compress and optimize your prompt contexts using ContextLab's built-in strategies.
ContextLab provides four compression strategies:
- Deduplication: Remove near-duplicate chunks
- Summarization: Condense content via LLM or extractive methods
- Sliding Window: Keep only recent or most salient chunks
- Hybrid: Combine multiple strategies
The dedup strategy removes redundant chunks based on embedding similarity.
CLI:

```bash
contextlab compress <run_id> --strategy dedup --dedup-threshold 0.95
```

Python:

```python
from contextlab import analyze, compress

# Analyze first
report = await analyze(paths=["docs/*.md"])

# Compress
result = await compress(
    chunks=report.chunks,
    strategy="dedup",
    model="gpt-4o-mini",
    threshold=0.95,  # Similarity threshold (0-1)
)

print(f"Removed {result.original_chunks - result.compressed_chunks} duplicate chunks")
print(f"Compression ratio: {result.compression_ratio:.2%}")
```

Deduplication works well for:

- Document collections with repeated content
- Multiple sources covering similar topics
- FAQ lists with similar questions
`threshold` (default: 0.95): Similarity threshold for considering chunks duplicates.

- 0.95+: Near-identical text only
- 0.8-0.95: Similar meaning, different wording
- <0.8: May remove genuinely different content
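To make the threshold concrete, here is a minimal sketch of embedding-based deduplication, assuming you already have one embedding vector per chunk; it illustrates the idea and is not ContextLab's internal implementation.

```python
import numpy as np

def dedup_by_similarity(embeddings: list[np.ndarray], threshold: float = 0.95) -> list[int]:
    """Return indices of chunks to keep, dropping near-duplicates (sketch only)."""
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        # A chunk is a duplicate if it is too similar to any chunk already kept.
        is_duplicate = any(
            float(emb @ embeddings[j] / (np.linalg.norm(emb) * np.linalg.norm(embeddings[j]))) >= threshold
            for j in kept
        )
        if not is_duplicate:
            kept.append(i)
    return kept
```

Raising the threshold keeps more borderline chunks; lowering it removes content more aggressively, at the risk of dropping genuinely different material.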
The summarize strategy condenses chunks while preserving key information.
Extractive (no LLM calls):

```python
result = await compress(
    chunks=report.chunks,
    strategy="summarize",
    model="gpt-4o-mini",
    target_ratio=0.5,  # Compress to 50% of original
    use_llm=False,     # Use extractive method
)
```

LLM-based:

```python
result = await compress(
    chunks=report.chunks,
    strategy="summarize",
    model="gpt-4o-mini",
    target_ratio=0.5,
    use_llm=True,  # Requires OpenAI API key
)
```

Summarization works well for:

- Long documents that can be condensed
- Verbose content with key takeaways
- When meaning preservation is more important than exact wording
`target_ratio` (default: 0.5): Target size as a fraction of the original (0-1)

`use_llm` (default: True): Use LLM-based summarization rather than the extractive method
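For intuition, the extractive path can be thought of as scoring sentences and keeping the top ones until the target ratio is reached. The sketch below is a naive illustration under that assumption, not ContextLab's actual extractive summarizer.

```python
import re

def extractive_summary(text: str, target_ratio: float = 0.5) -> str:
    """Keep high-scoring sentences until ~target_ratio of the text remains (sketch only)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Naive scoring: longer sentences that appear earlier score higher.
    ranked = sorted(enumerate(sentences), key=lambda p: len(p[1]) / (1 + p[0]), reverse=True)
    budget = int(len(text) * target_ratio)
    kept, used = set(), 0
    for idx, sent in ranked:
        if used + len(sent) <= budget:
            kept.add(idx)
            used += len(sent)
    # Re-emit the kept sentences in their original order.
    return " ".join(s for i, s in enumerate(sentences) if i in kept)
```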
The sliding_window strategy keeps only a subset of chunks, based on recency or salience.
Keep the most recent N chunks:
```python
result = await compress(
    chunks=report.chunks,
    strategy="sliding_window",
    model="gpt-4o-mini",
    window_size=10,
    mode="recent",
)
```

Keep the top N chunks by salience:

```python
result = await compress(
    chunks=report.chunks,
    strategy="sliding_window",
    model="gpt-4o-mini",
    window_size=10,
    mode="salient",
)
```

The sliding window strategy works well for:

- Chat/conversation history (recent mode)
- Knowledge base queries (salient mode)
- When you need a fixed-size context
`window_size` (default: 5): Number of chunks to keep

`mode` (default: "recent"): Either "recent" or "salient"
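The difference between the two modes can be sketched as follows; the `Chunk` class and its `salience` field are stand-ins for illustration, not ContextLab's data model.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    salience: float  # hypothetical per-chunk score for this sketch

def sliding_window(chunks: list[Chunk], window_size: int = 5, mode: str = "recent") -> list[Chunk]:
    if mode == "recent":
        # Keep the last N chunks in order.
        return chunks[-window_size:]
    # "salient": keep the N highest-scoring chunks, preserving original order.
    top = set(map(id, sorted(chunks, key=lambda c: c.salience, reverse=True)[:window_size]))
    return [c for c in chunks if id(c) in top]
```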
The hybrid strategy chains multiple strategies for optimal compression.
CLI:

```bash
contextlab compress <run_id> \
  --strategy hybrid \
  --dedup-threshold 0.95 \
  --window-size 20 \
  --summarize
```

Python:

```python
result = await compress(
    chunks=report.chunks,
    strategy="hybrid",
    model="gpt-4o-mini",
    dedup_threshold=0.95,  # First: remove duplicates
    window_size=20,        # Second: keep top 20 salient
    summarize=True,        # Third: summarize remaining
    target_ratio=0.7,      # Target 70% of original
)
```

The hybrid strategy applies its transformations in order:
- Deduplication: Remove near-duplicates
- Windowing (if `window_size` is specified): Keep the top N chunks by salience
- Summarization (if `summarize=True`): Condense the remaining chunks
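Conceptually, each stage receives the previous stage's output. The sketch below shows that ordering with hypothetical stage functions; it is not the library's internal pipeline.

```python
def hybrid_pipeline(chunks, dedup_fn, window_fn, summarize_fn,
                    dedup_threshold=0.95, window_size=20, target_ratio=0.7):
    chunks = dedup_fn(chunks, threshold=dedup_threshold)       # 1. remove near-duplicates
    chunks = window_fn(chunks, window_size=window_size)        # 2. keep the most salient chunks
    chunks = summarize_fn(chunks, target_ratio=target_ratio)   # 3. condense what remains
    return chunks
```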
Hybrid compression works well for:

- Complex documents requiring multiple optimizations
- When you need aggressive compression
- Production systems with tight token budgets
After compression, optimize chunk selection for a specific token limit:
```python
from contextlab import optimize

# Compress first
result = await compress(
    chunks=report.chunks,
    strategy="hybrid",
    model="gpt-4o-mini",
)

# Then optimize under budget
plan = await optimize(
    report,             # Original report
    limit=8000,         # Token limit
    strategy="greedy",  # or "ilp" for optimal solution
)

print(f"Selected {len(plan.kept_chunks)} chunks under {plan.limit} tokens")
print(f"Total relevance: {plan.total_relevance:.3f}")
```
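For intuition, the greedy strategy can be pictured as packing chunks by relevance-per-token until the budget is exhausted, while an ILP formulation searches for the exact best subset. The sketch below illustrates the greedy idea only; the per-chunk `tokens` and `relevance` fields are assumptions for this example, not ContextLab's API.

```python
def greedy_select(chunks, limit: int):
    """Pick chunks with the best relevance-per-token ratio that fit under the limit (sketch)."""
    ranked = sorted(chunks, key=lambda c: c.relevance / max(c.tokens, 1), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        if used + chunk.tokens <= limit:
            kept.append(chunk)
            used += chunk.tokens
    return kept, used
```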
Run all strategies and compare results:

```python
strategies = ["dedup", "summarize", "sliding_window", "hybrid"]
results = {}

for strategy in strategies:
    result = await compress(
        chunks=report.chunks,
        strategy=strategy,
        model="gpt-4o-mini",
    )
    results[strategy] = result

# Compare
for name, result in results.items():
    print(f"{name:15} → {result.compressed_tokens:5} tokens "
          f"({result.compression_ratio:.1%})")
```

Best practices:

- Start with dedup: Always remove duplicates first
- Test compression ratios: Start conservative (0.7-0.8), then increase
- Validate results: Check that compressed content maintains meaning
- Use hybrid for production: Combines best of all strategies
- Monitor quality: Track relevance scores after compression
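One way to follow the last point is to compare relevance before and after compression. The snippet below is a rough sketch that assumes each chunk exposes a `relevance` score; adapt it to whatever scores your analysis report actually provides.

```python
def average_relevance(chunks) -> float:
    """Mean relevance across chunks (assumes a per-chunk `relevance` attribute)."""
    return sum(c.relevance for c in chunks) / max(len(chunks), 1)

# Example: compare the analysis report's chunks with the optimized plan's kept chunks.
# before = average_relevance(report.chunks)
# after = average_relevance(plan.kept_chunks)
# print(f"Average relevance before: {before:.3f}, after: {after:.3f}")
```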
A complete workflow that combines analysis, compression, and budget optimization:

```python
from contextlab import analyze, compress, optimize

async def optimize_context(documents: list[str], limit: int):
    # Step 1: Analyze
    report = await analyze(text="\n\n".join(documents))

    # Step 2: Compress with the hybrid strategy
    result = await compress(
        chunks=report.chunks,
        strategy="hybrid",
        dedup_threshold=0.95,
        window_size=50,
        summarize=True,
        target_ratio=0.6,
    )

    # Step 3: Optimize under the token budget
    plan = await optimize(report, limit=limit)

    # Step 4: Reconstruct the optimized context
    optimized_text = "\n\n".join(c.text for c in plan.kept_chunks)
    return optimized_text, plan

# Use it
optimized, plan = await optimize_context(my_documents, limit=8000)
print(f"Final context: {plan.final_tokens} tokens (limit: {plan.limit})")
```