Skip to content

AbstractOperators for agentic patterns#1784

Open
mahikaw wants to merge 20 commits intomainfrom
mahikaw/subquery_abstract_operator
Open

AbstractOperators for agentic patterns#1784
mahikaw wants to merge 20 commits intomainfrom
mahikaw/subquery_abstract_operator

Conversation

@mahikaw
Copy link
Copy Markdown
Collaborator

@mahikaw mahikaw commented Apr 2, 2026

Description

This PR introduces two new agentic retrieval operators built on top of AbstractOperator,

  • SubQueryGeneratorOperator — expands each query row into multiple sub-query rows via an LLM. Supports three built-in strategies: decompose (split into sub-aspects), hyde (generate hypothetical answer passages),
    and multi_perspective (rephrase from diverse angles). Downstream retrieval operators can then independently retrieve per sub-query, enabling RRF-style aggregation.
  • SelectionAgentOperator — re-ranks a set of retrieved candidate documents using an agentic LLM loop. The LLM is given the query and candidate docs, then calls a log_selected_documents tool to produce a final
    ranked list. Supports a think tool for extended reasoning.
  • Adds openai>=1.0 as an optional [llm] dependency in pyproject.toml (lazily imported so workers stay serializable).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch 2 times, most recently from 68db72d to e8d2518 Compare April 3, 2026 18:30
@mahikaw mahikaw changed the title [WIP] AbstractOperator for subquery decomposition [WIP] AbstractOperators for agentic patterns Apr 3, 2026
@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from b7f7371 to 6bec8c0 Compare April 9, 2026 23:29
@mahikaw mahikaw marked this pull request as ready for review April 9, 2026 23:30
@mahikaw mahikaw requested review from a team as code owners April 9, 2026 23:30
@mahikaw mahikaw requested a review from jdye64 April 9, 2026 23:30
@mahikaw mahikaw changed the title [WIP] AbstractOperators for agentic patterns AbstractOperators for agentic patterns Apr 9, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 9, 2026

Greptile Summary

This PR introduces three new agentic retrieval operators (SubQueryGeneratorOperator, SelectionAgentOperator, ReActAgentOperator) and a RRFAggregatorOperator, plus a new invoke_chat_completion_step helper in chat_completions.py. All previously raised P0/P1 issues (logger NameError, missing exc_info=True, _parse_invoke_urls/_post_with_retries not imported, bare KeyError on env vars, unguarded LLM calls, E402 import ordering) have been addressed. Remaining findings are P2 style and consistency concerns.

Confidence Score: 5/5

Safe to merge — all prior P0/P1 issues are resolved; remaining findings are P2 style concerns.

All previously flagged critical and warning-level issues have been fixed in this revision. Remaining issues are minor: an unused parameter, an exception-handler inconsistency in the single-query fast path, and a MagicMock without spec=. None of these affect correctness or data integrity.

react_agent_operator.py (unused credential param in _call_retriever, single-query fast path); test_agentic_operators.py (MagicMock spec)

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py New ReActAgentOperator with full ReAct loop, tool specs, and prompt rendering. Minor concerns: unused credential parameter in _call_retriever and single-query fast path lacks the same exception safety net as the multi-query path.
nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py New SelectionAgentOperator with agentic LLM loop. All previous issues (logger NameError, E402, missing exc_info, empty choices guard, hallucination warning) are resolved.
nemo_retriever/src/nemo_retriever/graph/subquery_operator.py New SubQueryGeneratorOperator with three built-in strategies. _parse_json_list fence-stripping bug fixed, _resolve_api_key moved above try block, specific exception handling with exc_info=True throughout.
nemo_retriever/src/nemo_retriever/graph/rrf_aggregator_operator.py New RRFAggregatorOperator implementing standard reciprocal rank fusion. Pure pandas logic, no external dependencies, well-tested.
nemo_retriever/src/nemo_retriever/nim/chat_completions.py Adds invoke_chat_completion_step for single-step tool-aware calls. Previously flagged NameError for _parse_invoke_urls/_post_with_retries is resolved via module-level import on line 20.
nemo_retriever/tests/test_agentic_operators.py New test file covering RRF, prompt rendering, SelectionAgent, ReActAgent, _parse_json_list, SubQuery preprocess, _build_system_prompt, and _generate_one. MagicMock at line 293 should use spec= to catch API drift.

Sequence Diagram

sequenceDiagram
    participant User
    participant SubQuery as SubQueryGeneratorOperator
    participant ReAct as ReActAgentOperator
    participant LLM as LLM Endpoint
    participant Retriever as retriever_fn
    participant RRF as RRFAggregatorOperator
    participant Selection as SelectionAgentOperator

    Note over User,Selection: Path A — SubQuery expansion
    User->>SubQuery: query DataFrame
    SubQuery->>LLM: decompose / hyde / multi_perspective
    LLM-->>SubQuery: JSON list of sub-queries
    SubQuery-->>User: exploded sub-query DataFrame

    Note over User,Selection: Path B — ReAct + RRF + Selection
    User->>ReAct: query DataFrame
    loop ReAct loop (max_steps)
        ReAct->>LLM: messages + [think, retrieve, final_results] tools
        LLM-->>ReAct: tool_calls
        alt retrieve
            ReAct->>Retriever: (subquery, top_k)
            Retriever-->>ReAct: [{doc_id, text, score}]
        else final_results
            ReAct-->>ReAct: record final_doc_ids, break
        end
    end
    ReAct-->>RRF: (query_id, step_idx, doc_id, rank) rows
    RRF-->>Selection: fused (query_id, doc_id, rrf_score, text)
    loop Selection loop (max_steps)
        Selection->>LLM: messages + [think, log_selected_documents] tools
        LLM-->>Selection: tool_calls
        alt log_selected_documents
            Selection-->>Selection: validate doc_ids, break
        end
    end
    Selection-->>User: (query_id, doc_id, rank, message)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 437-441

Comment:
**Single-query fast path skips the exception safety net present in the multi-query path**

When `len(query_rows) == 1`, `_run_single_query` is called directly with no surrounding `try/except`. The multi-query path (lines 446-463) wraps `future.result()` with handlers for `TimeoutError`, `RuntimeError`, `requests.RequestException`, `json.JSONDecodeError`, and a broad `except Exception` fallback, ensuring one failing query never kills the batch. In the single-query branch any unhandled exception from `_run_single_query` propagates straight through `process()`. While the inner loop already catches most expected errors, aligning both paths prevents silent behavioral divergence for unexpected exceptions.

```suggestion
        if len(query_rows) == 1:
            # Fast path: single query, no threading overhead
            qid, qtxt = query_rows[0]
            try:
                rows.extend(self._run_single_query(qid, qtxt, api_key))
            except Exception as exc:
                logger.warning("ReActAgentOperator: query %r failed: %s", qid, exc, exc_info=True)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 657-693

Comment:
**Unused credential parameter in `_call_retriever`**

The third parameter (`Optional[str]`) is accepted but never used inside the method — `self._retriever_fn` is called with only `query_text` and `fetch_k`. Every call site in `_run_single_query` passes the resolved credential, which is silently dropped. Either remove the parameter (the retriever is expected to close over its own auth) or document why it is forwarded but unused, to avoid misleading future readers.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/tests/test_agentic_operators.py
Line: 293

Comment:
**`MagicMock` without `spec=` will hide future `retriever_fn` contract drift**

Per the mocking-discipline rule, `MagicMock(return_value=...)` without `spec=` or `spec_set=` accepts any attribute access or call signature on the mock, so a change to the expected `retriever_fn(query_text, top_k)` signature would go undetected. Use `spec=` with the concrete callable type or a `Callable` annotation to catch API drift early.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (16): Last reviewed commit: "more exception detailing" | Re-trigger Greptile

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py
Copy link
Copy Markdown
Collaborator

@jperez999 jperez999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great first run. Lets add some unit tests for expected data the might come in (and error edge cases) for each of the operators.

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
@mahikaw
Copy link
Copy Markdown
Collaborator Author

mahikaw commented Apr 15, 2026

@greptileai this pr should aim to use NIM instead of openAI APIs. with that updated understanding, re-review it.

Comment thread nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py Outdated
Comment thread nemo_retriever/tests/test_agentic_operators.py
@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 0116107 to b635f58 Compare April 21, 2026 18:29
Comment thread nemo_retriever/src/nemo_retriever/nim/chat_completions.py
Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 22, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

Comment thread nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py Outdated
@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 93993e8 to 499803f Compare April 22, 2026 16:51
Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py
@mahikaw mahikaw requested a review from jperez999 April 22, 2026 20:15
Comment thread nemo_retriever/src/nemo_retriever/graph/subquery_operator.py
@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from 76d47fa to d8c2d33 Compare April 22, 2026 22:40
@mahikaw mahikaw force-pushed the mahikaw/subquery_abstract_operator branch from d8c2d33 to bf5850f Compare April 22, 2026 22:46
@mahikaw mahikaw requested a review from jperez999 April 22, 2026 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants