Conversation
Greptile SummaryThis PR refactors
|
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/audio/asr_actor.py | Refactors ASRActor into _ASRBaseActor + ASRCPUActor/ASRGPUActor + ArchetypeOperator; _transcribe_local and the local-model path in _transcribe_one are unreachable dead code left over from the refactoring. |
| nemo_retriever/src/nemo_retriever/audio/init.py | Adds ASRCPUActor and ASRGPUActor to the public all export; straightforward and correct. |
| nemo_retriever/tests/test_asr_actor.py | New and updated tests cover remote/local/GPU actor paths, segment_audio, and archetype resolution; good coverage of the happy paths. |
| nemo_retriever/tests/test_operator_flags_and_cpu_actors.py | Adds ASRCPUActor/ASRGPUActor/ASRActor to the GPU/CPU flag and archetype classification tests; all assertions are accurate. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["ASRActor(params)\n(ArchetypeOperator)"] --> B{"prefers_cpu_variant?\n(_use_remote(params))"}
B -- "Yes (URL set)" --> C["ASRCPUActor\n→ _get_client() → remote gRPC"]
B -- "No (local model)" --> D{"GPU available?"}
D -- "Yes" --> E["ASRGPUActor\n→ _create_local_model()"]
D -- "No" --> F["ASRCPUActor\n→ _create_local_model()"]
C --> G["_call_remote_batch()\n→ _transcribe_one() per row\n→ _transcribe_remote()"]
E --> H["_call_local_batch()\nbatched transcribe()"]
F --> H
G --> I["_build_output_rows()"]
H --> I
Comments Outside Diff (3)
-
nemo_retriever/src/nemo_retriever/audio/asr_actor.py, line 372-376 (link)_transcribe_localis unreachable dead code_transcribe_oneis only ever called from_call_remote_batch(line 177), which itself is only called whenself._client is not None(line 166). Therefore theelsebranch here — and the entire_transcribe_localmethod — can never be reached in normal execution. This appears to be an incomplete refactoring: local-model transcription is now handled exclusively in_call_local_batch, but the per-row fallback in_transcribe_onewas not removed.Consider either removing
_transcribe_localand theelsebranch, or restructuring_call_remote_batchto reuse_call_local_batchfor rows where the remote call fails.Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py Line: 372-376 Comment: **`_transcribe_local` is unreachable dead code** `_transcribe_one` is only ever called from `_call_remote_batch` (line 177), which itself is only called when `self._client is not None` (line 166). Therefore the `else` branch here — and the entire `_transcribe_local` method — can never be reached in normal execution. This appears to be an incomplete refactoring: local-model transcription is now handled exclusively in `_call_local_batch`, but the per-row fallback in `_transcribe_one` was not removed. Consider either removing `_transcribe_local` and the `else` branch, or restructuring `_call_remote_batch` to reuse `_call_local_batch` for rows where the remote call fails. How can I resolve this? If you propose a fix, please make it concise.
-
nemo_retriever/src/nemo_retriever/audio/asr_actor.py, line 238-247 (link)Silent row truncation if model returns fewer transcripts than paths
zip(rows_list, transcripts)silently drops tail rows whentranscriptsis shorter thanrows_list. This can happen ifself._model.transcribeskips or omits results for empty-string paths ("") that are appended topaths_for_modelwhen path resolution fails (e.g. bothpathandbytesare missing). Adding a length check with a warning would surface this early.Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py Line: 238-247 Comment: **Silent row truncation if model returns fewer transcripts than paths** `zip(rows_list, transcripts)` silently drops tail rows when `transcripts` is shorter than `rows_list`. This can happen if `self._model.transcribe` skips or omits results for empty-string paths (`""`) that are appended to `paths_for_model` when path resolution fails (e.g. both `path` and `bytes` are missing). Adding a length check with a warning would surface this early. How can I resolve this? If you propose a fix, please make it concise.
-
nemo_retriever/src/nemo_retriever/audio/asr_actor.py, line 420-433 (link)apply_asr_to_dfkwargs are accepted but ignoredThe
**kwargs: Anyparameter is declared but never forwarded toactor()orASRParams. Either pass it through or remove the parameter to avoid silently swallowing caller-supplied options.Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py Line: 420-433 Comment: **`apply_asr_to_df` kwargs are accepted but ignored** The `**kwargs: Any` parameter is declared but never forwarded to `actor()` or `ASRParams`. Either pass it through or remove the parameter to avoid silently swallowing caller-supplied options. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py
Line: 372-376
Comment:
**`_transcribe_local` is unreachable dead code**
`_transcribe_one` is only ever called from `_call_remote_batch` (line 177), which itself is only called when `self._client is not None` (line 166). Therefore the `else` branch here — and the entire `_transcribe_local` method — can never be reached in normal execution. This appears to be an incomplete refactoring: local-model transcription is now handled exclusively in `_call_local_batch`, but the per-row fallback in `_transcribe_one` was not removed.
Consider either removing `_transcribe_local` and the `else` branch, or restructuring `_call_remote_batch` to reuse `_call_local_batch` for rows where the remote call fails.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py
Line: 238-247
Comment:
**Silent row truncation if model returns fewer transcripts than paths**
`zip(rows_list, transcripts)` silently drops tail rows when `transcripts` is shorter than `rows_list`. This can happen if `self._model.transcribe` skips or omits results for empty-string paths (`""`) that are appended to `paths_for_model` when path resolution fails (e.g. both `path` and `bytes` are missing). Adding a length check with a warning would surface this early.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py
Line: 420-433
Comment:
**`apply_asr_to_df` kwargs are accepted but ignored**
The `**kwargs: Any` parameter is declared but never forwarded to `actor()` or `ASRParams`. Either pass it through or remove the parameter to avoid silently swallowing caller-supplied options.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Merge branch 'main' into archetype" | Re-trigger Greptile
Greptile SummaryThis PR refactors
|
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/audio/asr_actor.py | Introduces ASRCPUActor (remote gRPC or CPU-local fallback), ASRGPUActor (GPU-local only), and ASRActor archetype. Two issues: (1) _call_local_batch lacks per-row error recovery unlike _call_remote_batch, and empty-path strings can crash the whole batch; (2) _transcribe_local and the else branch in _transcribe_one are dead code. |
| nemo_retriever/src/nemo_retriever/audio/init.py | Adds ASRCPUActor and ASRGPUActor to public exports; straightforward change. |
| nemo_retriever/tests/test_asr_actor.py | Good coverage for CPU/GPU actor split, archetype resolution, remote vs. local model path, empty batch, segmentation, and rejection of remote config on GPU actor. |
| nemo_retriever/tests/test_operator_flags_and_cpu_actors.py | Adds ASRCPUActor/ASRGPUActor to the flags-and-archetype test suite; clean additions consistent with existing patterns. |
Comments Outside Diff (2)
-
nemo_retriever/src/nemo_retriever/audio/asr_actor.py, line 238-246 (link)Missing per-row error isolation in local batch path
The remote batch path (
_call_remote_batch) wraps each row in atry/exceptso one bad file only skips that row. The local batch path has no equivalent guard: ifself._model.transcribe(paths_for_model)raises on any input (including the""strings appended for rows where bothpathandrawwere unavailable — line 235), the entire batch crashes. Additionally, when the model returns fewer transcripts than input paths (e.g., it silently skips empty-path entries), the barezipon line 246 silently truncates output rows, causing undetected data loss.Consider filtering empty paths before calling the model, and using
itertools.zip_longest(with a fallback of"") to ensure every input row produces an output row.Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py Line: 238-246 Comment: **Missing per-row error isolation in local batch path** The remote batch path (`_call_remote_batch`) wraps each row in a `try/except` so one bad file only skips that row. The local batch path has no equivalent guard: if `self._model.transcribe(paths_for_model)` raises on *any* input (including the `""` strings appended for rows where both `path` and `raw` were unavailable — line 235), the entire batch crashes. Additionally, when the model returns fewer transcripts than input paths (e.g., it silently skips empty-path entries), the bare `zip` on line 246 silently truncates output rows, causing undetected data loss. Consider filtering empty paths before calling the model, and using `itertools.zip_longest` (with a fallback of `""`) to ensure every input row produces an output row. How can I resolve this? If you propose a fix, please make it concise.
-
nemo_retriever/src/nemo_retriever/audio/asr_actor.py, line 270-287 (link)_transcribe_localis unreachable dead code_transcribe_localis only invoked from theelsebranch of_transcribe_one(lines 372–376), but_transcribe_oneis exclusively called from_call_remote_batch, which is itself guarded byif self._client is not Noneinprocess. This means theelsebranch in_transcribe_one— and therefore_transcribe_localentirely — can never execute. The local single-row transcription logic is also re-implemented (with differences) inside_call_local_batch, adding to the confusion.Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py Line: 270-287 Comment: **`_transcribe_local` is unreachable dead code** `_transcribe_local` is only invoked from the `else` branch of `_transcribe_one` (lines 372–376), but `_transcribe_one` is exclusively called from `_call_remote_batch`, which is itself guarded by `if self._client is not None` in `process`. This means the `else` branch in `_transcribe_one` — and therefore `_transcribe_local` entirely — can never execute. The local single-row transcription logic is also re-implemented (with differences) inside `_call_local_batch`, adding to the confusion. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py
Line: 238-246
Comment:
**Missing per-row error isolation in local batch path**
The remote batch path (`_call_remote_batch`) wraps each row in a `try/except` so one bad file only skips that row. The local batch path has no equivalent guard: if `self._model.transcribe(paths_for_model)` raises on *any* input (including the `""` strings appended for rows where both `path` and `raw` were unavailable — line 235), the entire batch crashes. Additionally, when the model returns fewer transcripts than input paths (e.g., it silently skips empty-path entries), the bare `zip` on line 246 silently truncates output rows, causing undetected data loss.
Consider filtering empty paths before calling the model, and using `itertools.zip_longest` (with a fallback of `""`) to ensure every input row produces an output row.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/audio/asr_actor.py
Line: 270-287
Comment:
**`_transcribe_local` is unreachable dead code**
`_transcribe_local` is only invoked from the `else` branch of `_transcribe_one` (lines 372–376), but `_transcribe_one` is exclusively called from `_call_remote_batch`, which is itself guarded by `if self._client is not None` in `process`. This means the `else` branch in `_transcribe_one` — and therefore `_transcribe_local` entirely — can never execute. The local single-row transcription logic is also re-implemented (with differences) inside `_call_local_batch`, adding to the confusion.
How can I resolve this? If you propose a fix, please make it concise.Reviews (2): Last reviewed commit: "Merge branch 'main' into archetype" | Re-trigger Greptile
Description
Fix ASR actor to have both CPU(url) and GPU(model) run capability with archetype
Checklist