
Tutorial 2: Compression Strategies

Learn how to compress and optimize your prompt contexts using ContextLab's built-in strategies.

Overview

ContextLab provides four compression strategies:

  1. Deduplication: Remove near-duplicate chunks
  2. Summarization: Condense content via LLM or extractive methods
  3. Sliding Window: Keep only recent or most salient chunks
  4. Hybrid: Combine multiple strategies

Deduplication

Remove redundant chunks based on embedding similarity.

CLI

contextlab compress <run_id> --strategy dedup --dedup-threshold 0.95

Python SDK

from contextlab import analyze, compress

# Analyze first
report = await analyze(paths=["docs/*.md"])

# Compress
result = await compress(
    chunks=report.chunks,
    strategy="dedup",
    model="gpt-4o-mini",
    threshold=0.95  # Similarity threshold (0-1)
)

print(f"Removed {result.original_chunks - result.compressed_chunks} duplicate chunks")
print(f"Compression ratio: {result.compression_ratio:.2%}")

When to use

  • Document collections with repeated content
  • Multiple sources covering similar topics
  • FAQ lists with similar questions

Parameters

  • threshold (default: 0.95): Similarity threshold above which two chunks are treated as duplicates
    • 0.95+: Near-identical text only
    • 0.8-0.95: Similar meaning, different wording
    • <0.8: May remove genuinely different content
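The threshold is compared against the cosine similarity of chunk embeddings. As an illustration of the underlying idea (a minimal sketch with toy 2-D vectors, not ContextLab's actual implementation), a dedup pass looks roughly like this:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dedup(embeddings: list[list[float]], threshold: float = 0.95) -> list[int]:
    """Return indices of chunks to keep; a chunk is dropped if it is
    too similar to any already-kept chunk."""
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy vectors: the first two are near-identical, the third is orthogonal.
vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
print(dedup(vectors, threshold=0.95))  # → [0, 2]
```

Lowering the threshold makes the pass more aggressive: at 0.8 it would still keep indices 0 and 2 here, but on real embeddings it starts merging paraphrases, not just near-identical text.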

Summarization

Condense chunks while preserving key information.

Extractive Summarization (No API Key Required)

result = await compress(
    chunks=report.chunks,
    strategy="summarize",
    model="gpt-4o-mini",
    target_ratio=0.5,  # Compress to 50% of original
    use_llm=False      # Use extractive method
)

LLM-based Summarization

result = await compress(
    chunks=report.chunks,
    strategy="summarize",
    model="gpt-4o-mini",
    target_ratio=0.5,
    use_llm=True  # Requires OpenAI API key
)

When to use

  • Long documents that can be condensed
  • Verbose content with key takeaways
  • When meaning preservation is more important than exact wording

Parameters

  • target_ratio (default: 0.5): Target size as fraction of original (0-1)
  • use_llm (default: True): Use LLM-based summarization; set to False to use the extractive method (no API key needed)
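Extractive summarization keeps a subset of the original sentences rather than generating new text. A rough sketch of the idea, scoring sentences by average word frequency (ContextLab's actual extractive method may differ):

```python
from collections import Counter

def extractive_summary(text: str, target_ratio: float = 0.5) -> str:
    """Keep the highest-scoring sentences until roughly target_ratio
    of the sentence count remains, preserving document order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Score each word by how often it appears across the document.
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    scores = [sum(freqs[w.lower()] for w in s.split()) / len(s.split())
              for s in sentences]
    n_keep = max(1, round(len(sentences) * target_ratio))
    # Indices of the top-scoring sentences, restored to document order.
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_keep])
    return ". ".join(sentences[i] for i in top) + "."

text = "Cats sleep a lot. Cats also purr. The weather report mentions rain."
print(extractive_summary(text, target_ratio=0.5))
# → "Cats sleep a lot. Cats also purr."
```

The off-topic weather sentence scores lowest (its words appear nowhere else) and is dropped first, which is why extractive methods work best on verbose content with a clear topical core.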

Sliding Window

Keep only a subset of chunks based on recency or salience.

Recent Mode

Keep the most recent N chunks:

result = await compress(
    chunks=report.chunks,
    strategy="sliding_window",
    model="gpt-4o-mini",
    window_size=10,
    mode="recent"
)

Salient Mode

Keep the top N chunks by salience:

result = await compress(
    chunks=report.chunks,
    strategy="sliding_window",
    model="gpt-4o-mini",
    window_size=10,
    mode="salient"
)

When to use

  • Chat/conversation history (recent mode)
  • Knowledge base queries (salient mode)
  • When you need a fixed-size context

Parameters

  • window_size (default: 5): Number of chunks to keep
  • mode (default: "recent"): "recent" or "salient"
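The two modes can be pictured as simple list operations over (text, salience) pairs. A toy sketch — the salience scores here are invented; ContextLab computes them during analysis:

```python
# Each chunk pairs text with a salience score from analysis.
chunks = [("intro", 0.2), ("setup", 0.4), ("key finding", 0.9),
          ("aside", 0.1), ("conclusion", 0.7)]

def sliding_window(chunks, window_size=3, mode="recent"):
    if mode == "recent":
        return chunks[-window_size:]             # last N chunks, in order
    # "salient": top N by score, restored to document order
    top = sorted(chunks, key=lambda c: -c[1])[:window_size]
    return [c for c in chunks if c in top]

print([t for t, _ in sliding_window(chunks, 3, "recent")])
# → ['key finding', 'aside', 'conclusion']
print([t for t, _ in sliding_window(chunks, 3, "salient")])
# → ['setup', 'key finding', 'conclusion']
```

Note the difference: recent mode keeps the low-salience "aside" because it is near the end, while salient mode drops it in favor of "setup".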

Hybrid Strategy

Combine multiple strategies for optimal compression.

CLI

contextlab compress <run_id> \
    --strategy hybrid \
    --dedup-threshold 0.95 \
    --window-size 20 \
    --summarize

Python SDK

result = await compress(
    chunks=report.chunks,
    strategy="hybrid",
    model="gpt-4o-mini",
    dedup_threshold=0.95,  # First: remove duplicates
    window_size=20,        # Second: keep top 20 salient
    summarize=True,        # Third: summarize remaining
    target_ratio=0.7       # Target 70% of original
)

Pipeline

The hybrid strategy applies transformations in order:

  1. Deduplication: Remove near-duplicates
  2. Windowing (if window_size specified): Keep top N by salience
  3. Summarization (if summarize=True): Condense remaining chunks
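Conceptually, the hybrid strategy is function composition over the chunk list. The ordering above can be sketched as follows — the three stage functions here are deliberately trivial stand-ins (exact-match dedup, prefix windowing, truncation) for the real embedding-, salience-, and LLM-based implementations:

```python
# Stand-in stages: real implementations would use embedding similarity,
# salience scores, and an LLM respectively.
def deduplicate(chunks, threshold):
    seen, out = set(), []
    for c in chunks:
        if c not in seen:        # exact match stands in for similarity >= threshold
            seen.add(c)
            out.append(c)
    return out

def top_by_salience(chunks, n):
    return chunks[:n]            # stand-in: assume chunks pre-sorted by salience

def condense(chunk):
    return chunk[:20]            # stand-in: truncate instead of summarizing

def hybrid(chunks, dedup_threshold=0.95, window_size=None, summarize=False):
    """Apply the hybrid stages in order, skipping the optional ones."""
    chunks = deduplicate(chunks, dedup_threshold)       # 1. drop near-duplicates
    if window_size is not None:
        chunks = top_by_salience(chunks, window_size)   # 2. keep top N
    if summarize:
        chunks = [condense(c) for c in chunks]          # 3. condense what remains
    return chunks

docs = ["alpha alpha alpha alpha alpha", "alpha alpha alpha alpha alpha",
        "beta", "gamma"]
print(hybrid(docs, window_size=2, summarize=True))
```

The ordering matters: deduplicating first means the window and summarizer never waste capacity on redundant chunks.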

When to use

  • Complex documents requiring multiple optimizations
  • When you need aggressive compression
  • Production systems with tight token budgets

Budget Optimization

After compression, optimize chunk selection for a specific token limit:

from contextlab import optimize

# Compress first
result = await compress(
    chunks=report.chunks,
    strategy="hybrid",
    model="gpt-4o-mini"
)

# Then optimize under budget
plan = await optimize(
    report,  # Original report
    limit=8000,  # Token limit
    strategy="greedy"  # or "ilp" for optimal solution
)

print(f"Selected {len(plan.kept_chunks)} chunks under {plan.limit} tokens")
print(f"Total relevance: {plan.total_relevance:.3f}")
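Greedy selection under a token budget can be sketched as ranking chunks by relevance per token and adding them until the limit is reached. This is a simplification for intuition — the real optimizer (and its "ilp" mode, which solves the selection exactly) may differ:

```python
def greedy_select(chunks, limit):
    """chunks: list of (text, tokens, relevance) triples.
    Pick chunks by relevance density until the token budget is spent."""
    ranked = sorted(chunks, key=lambda c: -(c[2] / c[1]))
    kept, used = [], 0
    for text, tokens, relevance in ranked:
        if used + tokens <= limit:   # skip chunks that would bust the budget
            kept.append(text)
            used += tokens
    return kept, used

chunks = [("A", 400, 0.9), ("B", 300, 0.2), ("C", 500, 0.8), ("D", 200, 0.6)]
kept, used = greedy_select(chunks, limit=1000)
print(kept, used)  # → ['D', 'A', 'B'] 900
```

Note that greedy skips "C" (too large once "D" and "A" are in) and backfills with the lower-density "B"; an ILP solver would weigh such trade-offs globally instead.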

Comparing Strategies

Run all strategies and compare results:

strategies = ["dedup", "summarize", "sliding_window", "hybrid"]
results = {}

for strategy in strategies:
    result = await compress(
        chunks=report.chunks,
        strategy=strategy,
        model="gpt-4o-mini"
    )
    results[strategy] = result

# Compare
for name, result in results.items():
    print(f"{name:15}{result.compressed_tokens:5} tokens "
          f"({result.compression_ratio:.1%})")

Best Practices

  1. Start with dedup: Always remove duplicates first
  2. Test compression ratios: Start conservative (0.7-0.8), then increase
  3. Validate results: Check that compressed content maintains meaning
  4. Use hybrid for production: Combines best of all strategies
  5. Monitor quality: Track relevance scores after compression

Example: End-to-End Pipeline

from contextlab import analyze, compress, optimize

async def optimize_context(documents: list[str], limit: int):
    # Step 1: Analyze
    report = await analyze(text="\n\n".join(documents))

    # Step 2: Compress with hybrid strategy
    result = await compress(
        chunks=report.chunks,
        strategy="hybrid",
        dedup_threshold=0.95,
        window_size=50,
        summarize=True,
        target_ratio=0.6
    )

    # Step 3: Optimize under budget
    plan = await optimize(report, limit=limit)

    # Step 4: Reconstruct optimized context
    optimized_text = "\n\n".join(c.text for c in plan.kept_chunks)

    return optimized_text, plan

# Use it
optimized, plan = await optimize_context(my_documents, limit=8000)
print(f"Optimized context: {plan.final_tokens} tokens (limit: {plan.limit})")

Next Steps