Conversation
68db72d to
e8d2518
Compare
b7f7371 to
6bec8c0
Compare
Greptile SummaryThis PR introduces three new agentic retrieval operators (
|
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py | New ReActAgentOperator with full ReAct loop, tool specs, and prompt rendering. Minor concerns: unused credential parameter in _call_retriever and single-query fast path lacks the same exception safety net as the multi-query path. |
| nemo_retriever/src/nemo_retriever/graph/selection_agent_operator.py | New SelectionAgentOperator with agentic LLM loop. All previous issues (logger NameError, E402, missing exc_info, empty choices guard, hallucination warning) are resolved. |
| nemo_retriever/src/nemo_retriever/graph/subquery_operator.py | New SubQueryGeneratorOperator with three built-in strategies. _parse_json_list fence-stripping bug fixed, _resolve_api_key moved above try block, specific exception handling with exc_info=True throughout. |
| nemo_retriever/src/nemo_retriever/graph/rrf_aggregator_operator.py | New RRFAggregatorOperator implementing standard reciprocal rank fusion. Pure pandas logic, no external dependencies, well-tested. |
| nemo_retriever/src/nemo_retriever/nim/chat_completions.py | Adds invoke_chat_completion_step for single-step tool-aware calls. Previously flagged NameError for _parse_invoke_urls/_post_with_retries is resolved via module-level import on line 20. |
| nemo_retriever/tests/test_agentic_operators.py | New test file covering RRF, prompt rendering, SelectionAgent, ReActAgent, _parse_json_list, SubQuery preprocess, _build_system_prompt, and _generate_one. MagicMock at line 293 should use spec= to catch API drift. |
Sequence Diagram
sequenceDiagram
participant User
participant SubQuery as SubQueryGeneratorOperator
participant ReAct as ReActAgentOperator
participant LLM as LLM Endpoint
participant Retriever as retriever_fn
participant RRF as RRFAggregatorOperator
participant Selection as SelectionAgentOperator
Note over User,Selection: Path A — SubQuery expansion
User->>SubQuery: query DataFrame
SubQuery->>LLM: decompose / hyde / multi_perspective
LLM-->>SubQuery: JSON list of sub-queries
SubQuery-->>User: exploded sub-query DataFrame
Note over User,Selection: Path B — ReAct + RRF + Selection
User->>ReAct: query DataFrame
loop ReAct loop (max_steps)
ReAct->>LLM: messages + [think, retrieve, final_results] tools
LLM-->>ReAct: tool_calls
alt retrieve
ReAct->>Retriever: (subquery, top_k)
Retriever-->>ReAct: [{doc_id, text, score}]
else final_results
ReAct-->>ReAct: record final_doc_ids, break
end
end
ReAct-->>RRF: (query_id, step_idx, doc_id, rank) rows
RRF-->>Selection: fused (query_id, doc_id, rrf_score, text)
loop Selection loop (max_steps)
Selection->>LLM: messages + [think, log_selected_documents] tools
LLM-->>Selection: tool_calls
alt log_selected_documents
Selection-->>Selection: validate doc_ids, break
end
end
Selection-->>User: (query_id, doc_id, rank, message)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 437-441
Comment:
**Single-query fast path skips the exception safety net present in the multi-query path**
When `len(query_rows) == 1`, `_run_single_query` is called directly with no surrounding `try/except`. The multi-query path (lines 446-463) wraps `future.result()` with handlers for `TimeoutError`, `RuntimeError`, `requests.RequestException`, `json.JSONDecodeError`, and a broad `except Exception` fallback, ensuring one failing query never kills the batch. In the single-query branch any unhandled exception from `_run_single_query` propagates straight through `process()`. While the inner loop already catches most expected errors, aligning both paths prevents silent behavioral divergence for unexpected exceptions.
```suggestion
if len(query_rows) == 1:
# Fast path: single query, no threading overhead
qid, qtxt = query_rows[0]
try:
rows.extend(self._run_single_query(qid, qtxt, api_key))
except Exception as exc:
logger.warning("ReActAgentOperator: query %r failed: %s", qid, exc, exc_info=True)
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/react_agent_operator.py
Line: 657-693
Comment:
**Unused credential parameter in `_call_retriever`**
The third parameter (`Optional[str]`) is accepted but never used inside the method — `self._retriever_fn` is called with only `query_text` and `fetch_k`. Every call site in `_run_single_query` passes the resolved credential, which is silently dropped. Either remove the parameter (the retriever is expected to close over its own auth) or document why it is forwarded but unused, to avoid misleading future readers.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/tests/test_agentic_operators.py
Line: 293
Comment:
**`MagicMock` without `spec=` will hide future `retriever_fn` contract drift**
Per the mocking-discipline rule, `MagicMock(return_value=...)` without `spec=` or `spec_set=` accepts any attribute access or call signature on the mock, so a change to the expected `retriever_fn(query_text, top_k)` signature would go undetected. Use `spec=` with the concrete callable type or a `Callable` annotation to catch API drift early.
How can I resolve this? If you propose a fix, please make it concise.Reviews (16): Last reviewed commit: "more exception detailing" | Re-trigger Greptile
jperez999
left a comment
There was a problem hiding this comment.
Great first run. Lets add some unit tests for expected data the might come in (and error edge cases) for each of the operators.
|
@greptileai this pr should aim to use NIM instead of openAI APIs. with that updated understanding, re-review it. |
0116107 to
b635f58
Compare
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
93993e8 to
499803f
Compare
…de benchmark numbers
…AgentOperator, RRFAggregatorOperator
76d47fa to
d8c2d33
Compare
d8c2d33 to
bf5850f
Compare
Description
This PR introduces two new agentic retrieval operators built on top of AbstractOperator,
and multi_perspective (rephrase from diverse angles). Downstream retrieval operators can then independently retrieve per sub-query, enabling RRF-style aggregation.
ranked list. Supports a think tool for extended reasoning.
Checklist