AbstractOperators for agentic patterns by mahikaw · Pull Request #1784 · NVIDIA/NeMo-Retriever

mahikaw · 2026-04-02T18:55:25Z

Description

This PR introduces two new agentic retrieval operators built on top of AbstractOperator,

SubQueryGeneratorOperator — expands each query row into multiple sub-query rows via an LLM. Supports three built-in strategies: decompose (split into sub-aspects), hyde (generate hypothetical answer passages),
and multi_perspective (rephrase from diverse angles). Downstream retrieval operators can then independently retrieve per sub-query, enabling RRF-style aggregation.
SelectionAgentOperator — re-ranks a set of retrieved candidate documents using an agentic LLM loop. The LLM is given the query and candidate docs, then calls a log_selected_documents tool to produce a final
ranked list. Supports a think tool for extended reasoning.
Adds openai>=1.0 as an optional [llm] dependency in pyproject.toml (lazily imported so workers stay serializable).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

copy-pr-bot · 2026-04-02T18:55:28Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-04-09T23:35:22Z

Greptile Summary

This PR introduces three new agentic retrieval operators (SubQueryGeneratorOperator, SelectionAgentOperator, ReActAgentOperator) and a RRFAggregatorOperator, plus a new invoke_chat_completion_step helper in chat_completions.py. All previously raised P0/P1 issues (logger NameError, missing exc_info=True, _parse_invoke_urls/_post_with_retries not imported, bare KeyError on env vars, unguarded LLM calls, E402 import ordering) have been addressed. Remaining findings are P2 style and consistency concerns.

Confidence Score: 5/5

Safe to merge — all prior P0/P1 issues are resolved; remaining findings are P2 style concerns.

All previously flagged critical and warning-level issues have been fixed in this revision. Remaining issues are minor: an unused parameter, an exception-handler inconsistency in the single-query fast path, and a MagicMock without spec=. None of these affect correctness or data integrity.

react_agent_operator.py (unused credential param in _call_retriever, single-query fast path); test_agentic_operators.py (MagicMock spec)

Important Files Changed

Filename	Overview
nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py	New ReActAgentOperator with full ReAct loop, tool specs, and prompt rendering. Minor concerns: unused credential parameter in _call_retriever and single-query fast path lacks the same exception safety net as the multi-query path.
nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py	New SelectionAgentOperator with agentic LLM loop. All previous issues (logger NameError, E402, missing exc_info, empty choices guard, hallucination warning) are resolved.
nemo_retriever/src/nemo_retriever/graph/subquery_operator.py	New SubQueryGeneratorOperator with three built-in strategies. _parse_json_list fence-stripping bug fixed, _resolve_api_key moved above try block, specific exception handling with exc_info=True throughout.
nemo_retriever/src/nemo_retriever/graph/rrf_aggregator_operator.py	New RRFAggregatorOperator implementing standard reciprocal rank fusion. Pure pandas logic, no external dependencies, well-tested.
nemo_retriever/src/nemo_retriever/nim/chat_completions.py	Adds invoke_chat_completion_step for single-step tool-aware calls. Previously flagged NameError for _parse_invoke_urls/_post_with_retries is resolved via module-level import on line 20.
nemo_retriever/tests/test_agentic_operators.py	New test file covering RRF, prompt rendering, SelectionAgent, ReActAgent, _parse_json_list, SubQuery preprocess, _build_system_prompt, and _generate_one. MagicMock at line 293 should use spec= to catch API drift.

Sequence Diagram

sequenceDiagram
    participant User
    participant SubQuery as SubQueryGeneratorOperator
    participant ReAct as ReActAgentOperator
    participant LLM as LLM Endpoint
    participant Retriever as retriever_fn
    participant RRF as RRFAggregatorOperator
    participant Selection as SelectionAgentOperator

    Note over User,Selection: Path A — SubQuery expansion
    User->>SubQuery: query DataFrame
    SubQuery->>LLM: decompose / hyde / multi_perspective
    LLM-->>SubQuery: JSON list of sub-queries
    SubQuery-->>User: exploded sub-query DataFrame

    Note over User,Selection: Path B — ReAct + RRF + Selection
    User->>ReAct: query DataFrame
    loop ReAct loop (max_steps)
        ReAct->>LLM: messages + [think, retrieve, final_results] tools
        LLM-->>ReAct: tool_calls
        alt retrieve
            ReAct->>Retriever: (subquery, top_k)
            Retriever-->>ReAct: [{doc_id, text, score}]
        else final_results
            ReAct-->>ReAct: record final_doc_ids, break
        end
    end
    ReAct-->>RRF: (query_id, step_idx, doc_id, rank) rows
    RRF-->>Selection: fused (query_id, doc_id, rrf_score, text)
    loop Selection loop (max_steps)
        Selection->>LLM: messages + [think, log_selected_documents] tools
        LLM-->>Selection: tool_calls
        alt log_selected_documents
            Selection-->>Selection: validate doc_ids, break
        end
    end
    Selection-->>User: (query_id, doc_id, rank, message)

Prompt To Fix All With AI

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 437-441

Comment:
**Single-query fast path skips the exception safety net present in the multi-query path**

When `len(query_rows) == 1`, `_run_single_query` is called directly with no surrounding `try/except`. The multi-query path (lines 446-463) wraps `future.result()` with handlers for `TimeoutError`, `RuntimeError`, `requests.RequestException`, `json.JSONDecodeError`, and a broad `except Exception` fallback, ensuring one failing query never kills the batch. In the single-query branch any unhandled exception from `_run_single_query` propagates straight through `process()`. While the inner loop already catches most expected errors, aligning both paths prevents silent behavioral divergence for unexpected exceptions.

```suggestion
        if len(query_rows) == 1:
            # Fast path: single query, no threading overhead
            qid, qtxt = query_rows[0]
            try:
                rows.extend(self._run_single_query(qid, qtxt, api_key))
            except Exception as exc:
                logger.warning("ReActAgentOperator: query %r failed: %s", qid, exc, exc_info=True)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 657-693

Comment:
**Unused credential parameter in `_call_retriever`**

The third parameter (`Optional[str]`) is accepted but never used inside the method — `self._retriever_fn` is called with only `query_text` and `fetch_k`. Every call site in `_run_single_query` passes the resolved credential, which is silently dropped. Either remove the parameter (the retriever is expected to close over its own auth) or document why it is forwarded but unused, to avoid misleading future readers.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/tests/test_agentic_operators.py
Line: 293

Comment:
**`MagicMock` without `spec=` will hide future `retriever_fn` contract drift**

Per the mocking-discipline rule, `MagicMock(return_value=...)` without `spec=` or `spec_set=` accepts any attribute access or call signature on the mock, so a change to the expected `retriever_fn(query_text, top_k)` signature would go undetected. Use `spec=` with the concrete callable type or a `Callable` annotation to catch API drift early.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (16): Last reviewed commit: "more exception detailing" | Re-trigger Greptile}

jperez999

Great first run. Lets add some unit tests for expected data the might come in (and error edge cases) for each of the operators.

mahikaw · 2026-04-15T19:01:31Z

@greptileai this pr should aim to use NIM instead of openAI APIs. with that updated understanding, re-review it.

greptile-apps · 2026-04-22T16:44:09Z

Want your agent to iterate on Greptile's feedback? Try greploops.

…ors, vanilla retrieval

…de benchmark numbers

…completion_step

…AgentOperator, RRFAggregatorOperator

mahikaw force-pushed the mahikaw/subquery_abstract_operator branch 2 times, most recently from 68db72d to e8d2518 Compare April 3, 2026 18:30

mahikaw changed the title ~~[WIP] AbstractOperator for subquery decomposition~~ [WIP] AbstractOperators for agentic patterns Apr 3, 2026

mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from b7f7371 to 6bec8c0 Compare April 9, 2026 23:29

mahikaw marked this pull request as ready for review April 9, 2026 23:30

mahikaw requested review from a team as code owners April 9, 2026 23:30

mahikaw requested a review from jdye64 April 9, 2026 23:30

mahikaw changed the title ~~[WIP] AbstractOperators for agentic patterns~~ AbstractOperators for agentic patterns Apr 9, 2026

greptile-apps Bot reviewed Apr 9, 2026

View reviewed changes

jperez999 requested changes Apr 14, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated

greptile-apps Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py Outdated

greptile-apps Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py Outdated

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated

greptile-apps Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py Outdated

greptile-apps Bot reviewed Apr 21, 2026

View reviewed changes

Comment thread nemo_retriever/tests/test_agentic_operators.py

mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 0116107 to b635f58 Compare April 21, 2026 18:29

greptile-apps Bot reviewed Apr 21, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/nim/chat_completions.py

greptile-apps Bot reviewed Apr 22, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py

greptile-apps Bot reviewed Apr 22, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated

mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 93993e8 to 499803f Compare April 22, 2026 16:51

greptile-apps Bot reviewed Apr 22, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py

mahikaw requested a review from jperez999 April 22, 2026 20:15

jperez999 requested changes Apr 22, 2026

View reviewed changes

Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py

Mahika Wason and others added 3 commits April 22, 2026 15:40

Subquery generator AbstractOperator init

5f1af2f

adding selection agent operator: doc retrieval ranking

1284d31

e2e example script: graph-based ingestion, graph-based agentic operat…

a75148c

…ors, vanilla retrieval

mahikaw added 15 commits April 22, 2026 15:40

cleanup: renaming to be more contextual

21b1fbf

fix: pre-commit failures

a4e7971

Removing sample pipeline from this PR, to be added separately alongsi…

82a0f3a

…de benchmark numbers

added ReactAgentOperator and removed openai API dep

94bc907

precommit fixes

5651ff2

added RRFAggregatorOperator

b6db032

updated SelectionAgentOperator to use NIMs

009b95a

updated SubQueryGeneratorOperator to use NIMs, and added invoke_chat_…

47c4309

…completion_step

Added unit tests for agentic operators: SelectionAgentOperator, ReAct…

44fdc17

…AgentOperator, RRFAggregatorOperator

Adding parallel tool call params and other clean up

a4ba6d6

addressed review comments: cleanup + added unit tests

1adceec

more fixes

e7e1446

minor logger placement change

94bc16b

formatting fixes: precommit

031c806

more error handling

04f1a1d

mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 76d47fa to d8c2d33 Compare April 22, 2026 22:40

added exception detailling

bf5850f

mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from d8c2d33 to bf5850f Compare April 22, 2026 22:46

more exception detailing

c1f28be

mahikaw requested a review from jperez999 April 22, 2026 23:05

Conversation

mahikaw commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

copy-pr-bot Bot commented Apr 2, 2026

Uh oh!

greptile-apps Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jperez999 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mahikaw commented Apr 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented Apr 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mahikaw commented Apr 2, 2026 •

edited

Loading

greptile-apps Bot commented Apr 9, 2026 •

edited

Loading