15 changes: 9 additions & 6 deletions .github/workflows/installation-test.yml
@@ -73,8 +73,9 @@ jobs:
       shell: bash
       run: |
         set -eo pipefail
-        whl_name=$(ls *server*.whl)
-        python -m pip install --find-links . "$whl_name"
+        common_whl=$(ls memmachine_common-*.whl | head -1)
+        server_whl=$(ls memmachine_server-*.whl | head -1)
+        python -m pip install --find-links . "$common_whl" "$server_whl"

#
# ─────────────────────────────────────
@@ -105,8 +106,9 @@ jobs:
       run: |
         set -eo pipefail
         export PYTHONUTF8=1
-        whl_name=$(ls *server*.whl)
-        python -m pip install --find-links . "$whl_name"
+        common_whl=$(ls memmachine_common-*.whl | head -1)
+        server_whl=$(ls memmachine_server-*.whl | head -1)
+        python -m pip install --find-links . "$common_whl" "$server_whl"

#
# ─────────────────────────────────────
@@ -124,8 +126,9 @@ jobs:
       shell: bash
       run: |
         set -eo pipefail
-        whl_name=$(ls *server*.whl)
-        python -m pip install --find-links . "$whl_name"
+        common_whl=$(ls memmachine_common-*.whl | head -1)
+        server_whl=$(ls memmachine_server-*.whl | head -1)
+        python -m pip install --find-links . "$common_whl" "$server_whl"

#
# ─────────────────────────────────────
5 changes: 3 additions & 2 deletions .github/workflows/test-server-package.yml
@@ -52,8 +52,9 @@ jobs:
       shell: bash
       run: |
         set -eo pipefail
-        whl_name=$(ls dist/memmachine_server-*.whl | head -1)
-        pip install --find-links dist/ "$whl_name"
+        common_whl=$(ls dist/memmachine_common-*.whl | head -1)
+        server_whl=$(ls dist/memmachine_server-*.whl | head -1)
+        pip install --find-links dist/ "$common_whl" "$server_whl"

- name: Test server package imports
shell: bash
4 changes: 4 additions & 0 deletions .gitignore
@@ -3,6 +3,7 @@ configuration.yml
 config.yml
 cfg.yml
 sqlitetest.db
+.memmachine_skill_cache.json
 
 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -223,3 +224,6 @@ site/

 # Ignore documentation generated by extensions
 .spelling
+
+# Evaluation results
+evaluation/retrieval_skill/result/
2 changes: 1 addition & 1 deletion docs/install_guide/introduction.mdx
@@ -83,4 +83,4 @@ Find specific configuration and integration details for using MemMachine in popu
<Card title="LlamaIndex" icon="wand-magic-sparkles" href="/install_guide/integrate/LlamaIndex">
**Data-Driven Memory.** Add persistent context to your LlamaIndex chat engines, enabling agents to recall user preferences across diverse data sources.
</Card>
-</Columns>
\ No newline at end of file
+</Columns>
50 changes: 46 additions & 4 deletions docs/open_source/configuration.mdx
@@ -145,22 +145,64 @@ To see a complete example of a potential `cfg.yml` file, check out [GPU-based Sa
 retrieval_agent:
   llm_model: my-agent-llm
   reranker: my-rrf-reranker
+  agent_session_provider: openai
+  agent_session_timeout_seconds: 180
+  agent_session_max_combined_calls: 10
+  agent_session_log_raw_output: true
+  agent_session_max_retry_interval_seconds: 120
+  openai_native_agent_environment: local
 ```
 </Tab>
 <Tab title="With Comments">
 ```yaml
 retrieval_agent:
   llm_model: my-agent-llm # ID of the Language Model from 'resources.language_models'
   reranker: my-rrf-reranker # ID of the Reranker from 'resources.rerankers'
+  agent_session_provider: openai # Provider runtime for provider-native retrieval sessions
+  agent_session_timeout_seconds: 180 # Global timeout budget per retrieval-agent session
+  agent_session_max_combined_calls: 10 # Total memmachine_search call budget across the session
+  agent_session_log_raw_output: true # Emit raw provider payloads to debug logs
+  agent_session_max_retry_interval_seconds: 120 # Retry backoff cap for provider session calls
+  openai_native_agent_environment: local # OpenAI attachment shell environment (local | container_auto)
 ```
 </Tab>
 </Tabs>
 
 **Parameter Descriptions:**
-| Parameter | Description | Default |
-|-----------------------|-----------------------------------------------------------------------------|--------------|
-| `llm_model` | The language model ID used by retrieval-agent routing/rewrite steps. | Auto-resolved from configured memory models when omitted. |
-| `reranker` | The reranker ID used by retrieval-agent result reranking. | Auto-resolved from episodic long-term-memory reranker when omitted. |
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `llm_model` | Language-model ID used to enable the retrieval-agent path. When omitted, MemMachine falls back to the short-term-memory model, then the semantic-memory model if available. | Auto-resolved |
+| `reranker` | Reranker ID used for retrieval-agent result reranking. When omitted, MemMachine falls back to the episodic long-term-memory reranker. | Auto-resolved |
+| `agent_session_provider` | Provider runtime used for the provider-native retrieval session. Supported values: `openai`, `anthropic`. | `openai` |
+| `agent_session_timeout_seconds` | Global timeout budget for one retrieval-agent session. | `180` |
+| `agent_session_max_combined_calls` | Combined call budget for top-level and branch `memmachine_search` calls in one session. | `10` |
+| `agent_session_log_raw_output` | Whether to write raw provider payloads to debug logs. Use with care in production. | `true` |
+| `agent_native_bundle_root` | Optional directory used to materialize markdown bundles for provider attachments. | System temp dir |
+| `agent_session_max_retry_interval_seconds` | Maximum retry backoff interval for provider session retries. | `120` |
+| `openai_native_agent_environment` | Shell environment type used for OpenAI provider attachments. Supported values: `local`, `container_auto`. | `local` |
+| `anthropic_model` | Anthropic model ID used when `agent_session_provider=anthropic`. | `claude-sonnet-4-5` |
+| `anthropic_api_key` | Anthropic API key used when `agent_session_provider=anthropic`. Supports `$ENV` and `${ENV}` syntax. | `null` |
+| `anthropic_base_url` | Optional Anthropic API base URL override. | `null` |
+| `anthropic_max_output_tokens` | Max output tokens per Anthropic Messages call during a retrieval session. | `2048` |
+
+<Note>
+`retrieval_agent` is only used when a client sends `agent_mode=true`.
+Even if you run the live provider session with Anthropic, `llm_model` and
+`reranker` still need to resolve successfully so MemMachine can enable the
+retrieval-agent path.
+</Note>
+
+Example Anthropic configuration:
+
+```yaml
+retrieval_agent:
+  llm_model: my-agent-llm
+  reranker: my-rrf-reranker
+  agent_session_provider: anthropic
+  anthropic_model: claude-sonnet-4-5
+  anthropic_api_key: ${ANTHROPIC_API_KEY}
+  anthropic_max_output_tokens: 2048
+```
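The `$ENV` / `${ENV}` expansion documented for `anthropic_api_key` can be sketched as follows. This is an illustrative substitute, not MemMachine's actual loader; `resolve_env` is a hypothetical helper name.

```python
import os
import re

def resolve_env(value: str) -> str:
    """Expand $VAR and ${VAR} references from the environment (sketch)."""
    pattern = re.compile(r"\$\{(\w+)\}|\$(\w+)")
    # Unset variables expand to the empty string in this sketch; a real
    # loader might instead raise or leave the reference untouched.
    return pattern.sub(
        lambda m: os.environ.get(m.group(1) or m.group(2), ""), value
    )

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-example"
print(resolve_env("${ANTHROPIC_API_KEY}"))  # sk-ant-example
```

Both spellings from the table (`$ENV` and `${ENV}`) resolve to the same environment lookup.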
</Accordion>

<Accordion title="Semantic Memory">
142 changes: 80 additions & 62 deletions docs/open_source/retrieval_agent_architecture.mdx
@@ -4,97 +4,115 @@ description: "How MemMachine uses intelligent orchestration to solve complex, mu
icon: "brain-circuit"
---

-Sometimes a single vector search isn't enough. `agent_mode` adds an intelligent orchestration layer to episodic long-term memory retrieval, moving beyond simple similarity to active reasoning. Use `agent_mode=true` in your memory search APIs to handle complex questions that require more than one step to answer.
+Sometimes a single vector search is not enough. When you pass
+`agent_mode=true`, MemMachine switches from the default episodic search path to
+the top-level `RetrievalAgent`.
 
-## How it Works
+The current retrieval agent is a provider-native, multi-turn search loop. It
+runs inside one LLM session, uses `memmachine_search` as its only external tool,
+and can enter an internal chain-of-query (`coq`) branch for multi-hop
+questions.
+
+## How It Works

<Steps>
-  <Step title="Smart Routing">
-    The `ToolSelectAgent` analyzes your query to decide the best path forward: a direct lookup, splitting the query into parts, or following a chain of evidence.
+  <Step title="Provider-Native Session">
+    MemMachine uploads a markdown agent bundle and runs the retrieval loop
+    inside a provider-native session. OpenAI Responses is the default runtime;
+    Anthropic Messages can be enabled in `retrieval_agent` configuration.
   </Step>
+  <Step title="Single Tool Contract">
+    The top-level agent uses only one external tool: `memmachine_search`.
+    Routing is internal to the session. There is no separate routing tool call
+    exposed to the client.
+  </Step>
-  <Step title="Specialized Execution">
-    The orchestrator hands the query to a specialized agent that can dig through your memory based on the specific complexity of your question.
+  <Step title="Internal COQ Branch">
+    For multi-hop questions, the agent can switch to attached `coq` guidance.
+    That branch keeps the same session alive and resolves dependency hops in
+    order, still using only `memmachine_search`.
+  </Step>
   </Step>
-  <Step title="Evidence Aggregation">
-    MemMachine gathers all the findings, reranks them for precision, and returns the results along with clear metrics on how it found the answer.
+  <Step title="Merge, Rerank, Return">
+    MemMachine merges semantic and episodic evidence across hops, reranks the
+    final long-term episodic results, and returns both the answer context and a
+    `retrieval_trace`.
</Step>
</Steps>

![Retrieval Agent Orchestration](/images/intelligent_orchestration.png)
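The budgeted, single-tool loop described in the steps above can be sketched as follows. All names here (`run_retrieval_session`, `stub_search`) are hypothetical; only the `memmachine_search` tool contract and the combined call budget come from the documentation.

```python
def run_retrieval_session(query, search, max_calls=10):
    """Sketch of a session loop: one external tool, a shared call budget."""
    calls, evidence = 0, []
    pending = [query]                      # hops still to resolve
    while pending and calls < max_calls:   # combined memmachine_search budget
        hop = pending.pop(0)
        calls += 1
        result = search(hop)               # the only external tool
        evidence.append(result)
        # Multi-hop (coq-style) branches surface as follow-up queries.
        pending.extend(result.get("follow_ups", []))
    return {"evidence": evidence, "memory_search_called": calls}

def stub_search(hop):
    # Stand-in for memmachine_search: one dependency hop, then done.
    return {"text": hop, "follow_ups": ["hop-2"] if hop == "hop-1" else []}

trace = run_retrieval_session("hop-1", stub_search)
print(trace["memory_search_called"])  # 2
```

The budget check before each call is what enforces `agent_session_max_combined_calls` across both the top-level loop and any branch hops.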

-## Agent Taxonomy
-
-Not all queries are created equal. We use different agents to handle different levels of complexity:
+## Current Components
 
-| Agent | Role | Best For |
+| Component | Role | Notes |
 | :--- | :--- | :--- |
-| `MemMachineAgent` | Direct Retrieval | Simple, one-shot lookups. |
-| `SplitQueryAgent` | Parallel Search | Queries with multiple independent entities or constraints. |
-| `ChainOfQueryAgent` | Multi-hop Retrieval | Complex relationship chains where facts depend on each other. |
+| `RetrievalAgent` | Top-level orchestrator | Activated by `agent_mode=true`. |
+| `memmachine_search` | Only external tool | Used for every retrieval hop. |
+| `coq` | Internal chain-of-query branch | Used for multi-hop dependency resolution. |

----
+## When To Use `agent_mode`
 
-## Why Agentic Retrieval Matters
+`agent_mode` is most useful when the answer cannot be recovered from one direct
+lookup, for example:
 
-Standard vector search works great when a query maps directly to a single memory. However, real-world questions are often "messy."
+- Multi-hop relationship chains such as `person -> spouse -> employer`.
+- Comparison questions that require resolving both sides before deciding.
+- Queries where the final attribute depends on an intermediate identity lookup.
+- Cases where you want the retrieval trace for debugging and evaluation.
 
-`agent_mode` is designed for scenarios that require:
-- **Multi-hop chains:** Where Fact B can't be found until you find Fact A.
-- **Relationship traversal:** Jumping across entities (e.g., `Person` -> `Organization` -> `Role`).
-- **Mixed constraints:** Filtering by time, location, and role simultaneously in steps.
-- **Sufficiency checks:** Ensuring the agent doesn't stop until it actually has enough evidence.
+<Note>
+Setting `agent_mode=false` keeps the standard `EpisodicMemory` search path. It
+is faster for simple lookups and should remain the default when you do not need
+multi-step reasoning.
+</Note>

-### The "Spouse" Problem (Multi-hop struggle)
-Imagine asking: *"What is the current company of the spouse of the CEO of Acme?"*
+## Example: Resolving an Intermediate Hop
 
-A standard search might over-focus on "Acme" and "CEO," completely missing the spouse's data because that entity hasn't been identified yet.
+Question:
 
-### How MemMachine Fixes This
-1. **Detection:** `ToolSelectAgent` sees the complexity and routes to a chain-based strategy.
-2. **Iteration:** `ChainOfQueryAgent` finds the CEO first, identifies the spouse, and then searches for that spouse's company.
-3. **Verification:** At each step, the agent checks if it has enough info to move forward.
-4. **Ranking:** All gathered evidence is combined and ranked to give you the most relevant answer.
+> "Where was the father of Rembrandt's wife born?"
 
-<Note>
-Setting `agent_mode=false` (default) uses the standard `EpisodicMemory` path. This is faster for simple queries but may struggle with multi-step reasoning.
-</Note>
+A direct search can easily over-focus on Rembrandt or his wife. The retrieval
+agent instead works hop-by-hop:
 
-### Workflow Diagram
+1. Search for Rembrandt's wife.
+2. Resolve her father.
+3. Search for that father's birthplace.
+4. Stop only when the final asked attribute is supported.

-This diagram shows how our Intelligent Orchestration resolves these patterns by branching between standard and agentic paths:
+## Configuration
 
-![Retrieval Agent Workflow Diagram](/images/retrieval-agent-workflow.png)
+Runtime configuration lives under the top-level `retrieval_agent` section in
+`cfg.yml`. Key settings include:
 
----
+- `llm_model` and `reranker` for the retrieval path.
+- `agent_session_provider` to choose `openai` or `anthropic`.
+- Session budgets such as `agent_session_timeout_seconds` and
+  `agent_session_max_combined_calls`.
+- Provider-specific settings such as `anthropic_model` and
+  `anthropic_api_key`.
 
-## Configuration & Extension
+See the [Configuration](/open_source/configuration) page for the full field list
+and examples.
 
-You can fine-tune how these agents behave using the `extra_params` dictionary.
+## Search API Example

 ```python
-# Example: Tuning agent behavior for higher precision
-memory.search(
-    query="Find the CEO of Acme's spouse's current company",
+results = memory.search(
+    query="Where was the father of Rembrandt's wife born?",
     agent_mode=True,
-    extra_params={
-        "max_attempts": 3,        # How many hops to allow
-        "confidence_score": 0.85  # Threshold for stopping early
-    }
 )
-```
-
-<Tip>**Pro Tip:** If the `ChainOfQueryAgent` isn't navigating your data correctly, check the `selected_tool` and `confidence_scores` in your metadata. You can often fix "lost" agents by providing more context in the `combined_prompt` within your config.</Tip>
-
-------
+
+print(results.content.retrieval_trace)
+```

-## Metrics and Telemetry
+## Retrieval Trace
 
-Transparency is key. We provide detailed metrics so you can see exactly how the agent "thought" through your query:
+When `agent_mode=true`, MemMachine can return a `retrieval_trace` describing
+how the search ran.
 
-| **Metric** | **Purpose** |
-| ---------------------- | ------------------------------------------------------------ |
-| `selected_tool` | Identifies which agent was chosen to handle the heavy lifting. |
-| `queries` | Shows the specific sub-queries generated during the process. |
-| `memory_search_called` | The total number of times the agent hit the database. |
-| `llm_time` | How long the orchestration/reasoning steps took. |
-| `confidence_scores` | The certainty level for each hop in a chain. |
+| Metric | Purpose |
+| :--- | :--- |
+| `selected_agent` | Internal route label, typically `memmachine_search` or `coq`. |
+| `selected_agent_name` | Human-readable route label such as `MemMachineSearch` or `ChainOfQueryAgent`. |
+| `memory_search_called` | Total number of `memmachine_search` calls executed. |
+| `orchestrator_step_count` | Number of orchestration steps completed in the session. |
+| `orchestrator_trace` | Full event trace for debugging and evaluation. |
+| `stage_results` | Optional intermediate hop summaries when stage-result mode is enabled. |
+| `stage_result_memory_returned` | Indicates whether confident stage summaries were returned as memory lines. |
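The trace fields tabled above can be consumed programmatically; a minimal sketch, assuming the trace is exposed as a dict-like object (the field names come from the docs, the dict shape is an assumption):

```python
# Illustrative trace payload using the documented field names.
trace = {
    "selected_agent": "coq",
    "selected_agent_name": "ChainOfQueryAgent",
    "memory_search_called": 3,
    "orchestrator_step_count": 5,
}

# Build a one-line summary for logs or evaluation dashboards.
summary = (
    f"{trace['selected_agent_name']} issued "
    f"{trace['memory_search_called']} memmachine_search calls over "
    f"{trace['orchestrator_step_count']} orchestration steps"
)
print(summary)
```

Comparing `memory_search_called` against the configured `agent_session_max_combined_calls` is a quick way to spot sessions that exhausted their budget.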