python: add support for adhoc query as pyarrow table #5814
monochromatti wants to merge 3 commits into feldera:main
4065f37 to edcaa7e
mythical-fred
left a comment
LGTM — but see inline: there is an existing open PR covering the same feature.
Hi @monochromatti, this looks good. Thanks a lot for your contribution. @abhizer, can you review this?
I'd like input on whether to return Generator[pyarrow.RecordBatch, ...] or a pyarrow.Table directly. The latter is the current state of the PR, but after some thought, generating batches seems more in line with similar existing functionality and better suited to big payloads.
abhizer
left a comment
Thank you!
As a heads-up, the reason we didn't merge the prior PR is that the server intermittently sent bad data and we were unable to figure out why.
We normally return a generator, and it might be a good idea to keep this behavior consistent.
edcaa7e to dd5c74e
@monochromatti please re-request a review from @abhizer when this is ready again.
379bfe8 to 5f06e6a
5f06e6a to 2cc02ae
Rebased on main to resolve a uv.lock conflict.
Sorry, I might be missing something, but does the PR still require an approval to run CI?
Done!
The "Pre Merge Queue Tasks" CI failure looks transient: the failing step is the Rust build check, but this PR has no Rust changes. The same step failed and then passed for other PRs around the same time. Could someone re-trigger CI?
CI is still showing a failure on "Pre Merge Queue Tasks" from Apr 4; it looks like nobody has re-triggered it yet. Could someone queue a fresh run? This is a Python-only PR, and that step has been failing transiently for unrelated Rust check reasons.
You might have to run "ruff format" for it to pass the pre-merge queue.
2cc02ae to d0a2187
Signed-off-by: Mattias Matthiesen <mattias.matthiesen@eviny.no>
d0a2187 to 541b6b7
Updated the PR body and resolved the uv.lock conflict (exclude-newer timestamp). @abhizer
Thank you!
Ran tests locally against a running Feldera API.
From python/:
tests/runtime_aggtest):
uv run python -m pytest tests/ --ignore=tests/runtime_aggtest -ra
122 passed, 45 skipped
uv run python -m pytest tests/platform/test_shared_pipeline.py::TestPipeline::test_adhoc_query_arrow -q
uv run python -m pytest tests/unit/test_query_as_arrow.py -q
Checklist
Breaking Changes?
Mark if you think the answer is yes for any of these components:
Describe Incompatible Changes
None.
Summary
This PR adds Arrow IPC query support to the Python SDK so ad-hoc query results can be consumed as streamed PyArrow record batches.
What changed
FelderaClient.query_as_arrow(pipeline_name, query) -> Generator[pyarrow.RecordBatch, None, None]
Pipeline.query_arrow(query) -> Generator[pyarrow.RecordBatch, None, None]
pip install "feldera[arrow]"
Notes
stream=True) and yielded batch-by-batch.
pyarrow.Table when desired via pyarrow.Table.from_batches(...).