Problem • How it works • Quick start • Configuration • Comparison
Long-running AI agents hit the context window wall. The built-in SummarizationMiddleware does a single-pass summary with a generic prompt — it works, but it loses critical details and has no lightweight fallbacks.
compact-middleware takes the battle-tested compaction pipeline from Claude Code and makes it a composable DeepAgents middleware. One import, and your agents handle 10x longer conversations without blowing the context window.
| What goes wrong | Built-in middleware | compact-middleware |
|---|---|---|
| Summary loses file paths, code, user feedback | Generic prompt | 9-section structured prompt |
| Every compaction = expensive LLM call | Yes | 3 free levels tried first |
| Recently-read files vanish after compaction | Yes | Auto-restores top 5 files + active plan |
| Compaction fails and retries forever | Yes | Circuit breaker after 3 failures |
| No way to clear stale tool results cheaply | Yes | Time-based microcompaction (free) |
| Token counting is pure heuristic | Yes | Hybrid: real API usage + heuristic tail |
Lightweight levels run every turn (each with its own trigger), matching Claude Code's `query.ts` pipeline. The expensive LLM summarization only fires when tokens exceed the global threshold:
```mermaid
flowchart TD
    A["Every turn"] --> B{"1. COLLAPSE"}
    B -->|"Group consecutive read/search\ninto badge summaries"| C{"2. TRUNCATE"}
    C -->|"Shorten large tool args\nin old messages"| D{"3. MICROCOMPACT"}
    D -->|"Clear stale tool results\n(time gap > 60 min)"| E{"Token threshold\nexceeded?"}
    E -->|No| F["Done — no LLM call needed"]
    E -->|Yes| G{"4. SUMMARIZE"}
    G -->|"9-section structured summary\n+ restore files & plan"| H["Done"]
    style F fill:#2d6a4f,color:#fff
    style H fill:#2d6a4f,color:#fff
    style B fill:#1a1a2e,color:#fff
    style C fill:#1a1a2e,color:#fff
    style D fill:#1a1a2e,color:#fff
    style G fill:#e63946,color:#fff
```
Levels 1-3 are free (no LLM call) and run unconditionally — each has its own internal trigger (group size, arg length, time gap). Only level 4 is gated by the global token threshold.
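The ordering above can be sketched in a few lines of Python. This is a hedged illustration of the cascade logic only, not the library's actual internals; `levels` is a hypothetical mapping from level name to a message-transforming function:

```python
# Illustrative sketch (hypothetical names, not the package's real API):
# the three free levels always run, each applying its own internal trigger;
# only the expensive SUMMARIZE level is gated by the global token threshold.
def run_cascade(messages, token_count, threshold, levels):
    # Levels 1-3: free, no LLM call, run unconditionally every turn
    for level in ("collapse", "truncate", "microcompact"):
        messages = levels[level](messages)
    # Level 4: LLM summarization, only when over the global threshold
    if token_count > threshold:
        messages = levels["summarize"](messages)
    return messages
```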
```shell
pip install compact-middleware

# Or from source
pip install git+https://github.com/emanueleielo/compact-middleware.git
```

```python
from deepagents import create_deep_agent
from compact_middleware import CompactionMiddleware, CompactionToolMiddleware

mw = CompactionMiddleware(
    model="anthropic:claude-sonnet-4-6",
    backend=backend,
)
tool_mw = CompactionToolMiddleware(mw)  # optional: lets the agent compact manually

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[read_file, edit_file, execute],
    system_prompt="You are a coding assistant.",
    backend=backend,
    middleware=[mw, tool_mw],
)

# That's it — compaction triggers automatically at ~85% context usage
result = agent.invoke({"messages": [("human", "Refactor the auth module")]})
```

```python
from deepagents import create_deep_agent
from deepagents.backends import FilesystemBackend
from langchain.chat_models import init_chat_model
from compact_middleware import (
    CompactionConfig,
    CompactionMiddleware,
    CompactionToolMiddleware,
)

backend = FilesystemBackend(root_dir="/data/workspace")
model = init_chat_model("anthropic:claude-sonnet-4-6")

mw = CompactionMiddleware(model=model, backend=backend)
tool_mw = CompactionToolMiddleware(mw)

agent = create_deep_agent(
    model=model,
    tools=[search_tool, execute_tool, edit_file_tool],
    system_prompt="You are a senior engineer.",
    backend=backend,
    middleware=[mw, tool_mw],
    memory=["/memory/AGENTS.md"],
    interrupt_on={"edit_file": True},
)

# Async works too — all operations have async variants
result = await agent.ainvoke({"messages": [("human", "Add pagination to the API")]})
```

The defaults match Claude Code's production settings. Override what you need:
```python
from compact_middleware import CompactionConfig, CompactionMiddleware
from compact_middleware.config import (
    CollapseConfig,
    MicrocompactConfig,
    RestorationConfig,
    TokenBudgetConfig,
    TruncateArgsConfig,
)

config = CompactionConfig(
    # --- When to compact ---
    trigger=("fraction", 0.80),   # at 80% of context window (default: 0.85)
    keep=("messages", 10),        # keep last 10 messages after compaction

    # --- Circuit breaker ---
    max_consecutive_failures=5,   # default: 3

    # --- Custom summary instructions ---
    custom_instructions="Focus on code diffs and test output. Include file paths verbatim.",
    suppress_follow_up_questions=True,  # resume without asking "where were we?"
)

mw = CompactionMiddleware(model=model, backend=backend, config=config)
```

```python
config = CompactionConfig(
    microcompact=MicrocompactConfig(
        enabled=True,               # default
        gap_threshold_minutes=30,   # clear after 30 min gap (default: 60)
        keep_recent=3,              # always keep last 3 results (default: 5)
        compactable_tools={         # which tools' results can be cleared
            "read_file", "execute", "grep", "glob",
            "web_search", "web_fetch", "edit_file", "write_file",
        },
    ),
)
```

```python
config = CompactionConfig(
    truncate_args=TruncateArgsConfig(
        trigger=("fraction", 0.80),  # when to start truncating
        max_length=1_000,            # chars per arg value (default: 2000)
        truncate_all_tools=True,     # all tools, not just write/edit (default)
    ),
)
```

```python
config = CompactionConfig(
    collapse=CollapseConfig(
        enabled=True,
        min_group_size=3,  # need 3+ consecutive reads to collapse (default: 2)
        collapse_tools={"read_file", "grep", "glob", "web_search"},
    ),
)
```

```python
config = CompactionConfig(
    restoration=RestorationConfig(
        enabled=True,
        max_files=3,               # re-read top 3 recent files (default: 5)
        file_budget_chars=30_000,  # total budget for restored content
        per_file_chars=10_000,     # max per file
        restore_plans=True,        # re-attach active plan state
    ),
)
```

```python
config = CompactionConfig(
    token_budget=TokenBudgetConfig(
        per_tool_chars=50_000,      # max chars per tool result
        per_message_chars=200_000,  # aggregate max per message turn
    ),
)
```

The `trigger` and `keep` parameters accept three formats:

```python
# Absolute token count
trigger=("tokens", 170_000)

# Fraction of context window (requires model with known context size)
trigger=("fraction", 0.85)

# Message count
trigger=("messages", 50)

# Multiple triggers (any fires)
trigger=[("fraction", 0.85), ("messages", 100)]
```

| Feature | SummarizationMiddleware | compact-middleware |
|---|---|---|
| Summary prompt | Generic | 9-section structured |
| Pre-summarization optimization | — | Collapse + Truncate + Microcompact |
| Partial compaction (prefix/suffix) | — | Yes |
| Post-compaction restoration | — | Files + Plans |
| Circuit breaker | — | Configurable |
| PTL error recovery | — | Head truncation + retry |
| Token counting | Heuristic only | Hybrid (real API + heuristic) |
| Argument truncation | write_file, edit_file | All tools |
| Time-based clearing | — | Configurable gap threshold |
| Message collapsing | — | Consecutive read/search |
| Custom summary instructions | — | Yes |
| Async support | Yes | Yes (concurrent offload + summary) |
Unlike a generic "summarize this conversation", the compaction prompt enforces 9 sections that preserve what agents actually need:
- Primary Request & Intent — what the user asked for
- Key Technical Concepts — frameworks, patterns, technologies
- Files & Code Sections — paths, snippets, why they matter
- Errors & Fixes — what broke and how it was resolved
- Problem Solving — debugging strategies and troubleshooting
- All User Messages — verbatim non-tool messages (catches intent drift)
- Pending Tasks — what's still open
- Current Work — exactly where things left off
- Optional Next Step — with direct quotes to prevent task drift
An `<analysis>` scratchpad is used during generation for quality, then stripped from the final summary.
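Stripping such a scratchpad can be done with a single regex pass over the model output. The `strip_analysis` helper below is a hypothetical sketch of the idea, not the package's actual implementation:

```python
import re

# Hypothetical post-processing step: drop the <analysis>...</analysis>
# scratchpad the model emits before the summary proper, keeping only the
# final 9-section summary text.
def strip_analysis(summary: str) -> str:
    cleaned = re.sub(r"<analysis>.*?</analysis>", "", summary, flags=re.DOTALL)
    return cleaned.strip()
```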
Most middleware estimates tokens with `len(text) / 4`. That's a guess.
compact-middleware uses a hybrid approach ported from Claude Code:
```
Messages: [Human] [AI] [Tool] [AI with usage={input:45000, output:1200}] [Human] [AI]
                                ^                                           ^     ^
                    real: 46,200 tokens                              heuristic only
                    (from API response)                              (for these 2)
```
It walks messages backwards, finds the last `AIMessage` with real API token usage (`response_metadata.usage`), and estimates only the messages after it. Falls back to pure heuristic when no API response is available.
Works with Anthropic, OpenAI, and any LangChain-compatible provider.
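The backwards walk can be sketched as follows, using simplified dict-shaped messages instead of the real LangChain message classes (all names here are illustrative, not the package's API):

```python
# Hedged sketch of the hybrid approach: take real token counts from the
# last provider-reported usage, then apply the chars/4 heuristic only to
# the messages that came after it.
def estimate_tokens(messages):
    def heuristic(text):
        return len(text) // 4  # rough chars-per-token guess

    real, tail_start = 0, 0
    # Walk backwards to the most recent message with real API usage
    for i in range(len(messages) - 1, -1, -1):
        usage = messages[i].get("usage")
        if usage:  # numbers reported by the provider, not estimated
            real = usage["input"] + usage["output"]
            tail_start = i + 1
            break
    # Heuristic applies only to the tail after that message
    tail = sum(heuristic(m.get("text", "")) for m in messages[tail_start:])
    return real + tail
```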
```
compact_middleware/
├── middleware.py     CompactionMiddleware + CompactionToolMiddleware
├── decision.py       Multi-level cascade engine
├── compaction.py     LLM summarization (full + partial)
├── tokens.py         Hybrid token counting (real API + heuristic)
├── prompts.py        9-section prompt templates
├── microcompact.py   Time-based tool result clearing
├── collapse.py       Message collapsing
├── truncation.py     Argument truncation
├── restoration.py    Post-compaction file/plan restoration
├── config.py         All configuration dataclasses
└── state.py          State schema (TypedDict events)
```
```shell
git clone https://github.com/emanueleielo/compact-middleware
cd compact-middleware
pip install -e ".[dev]"

pytest -v                # run tests
ruff check .             # lint
mypy compact_middleware  # typecheck
```