feat: pass calling SessionContext to Python UDTF callbacks#1555

Open
timsaucer wants to merge 4 commits into
apache:mainfrom
timsaucer:feat/df54-followups-wave3

@timsaucer
Member

Which issue does this PR close?

Related to #1533

Rationale for this change

Upstream DataFusion deprecated `call()` on table functions in favor of `call_with_args`, which supports far more features. `main` already uses `call_with_args`, but only with default table function arguments. This PR exposes the full functionality of the new function signature.

What changes are included in this PR?

  • Enhance table functions to receive the table function arguments, including the calling session.
  • Add unit tests.

Are there any user-facing changes?

Yes, this changes the expected arguments to pure Python table functions.

DataFusion 53 added `TableFunctionImpl::call_with_args(TableFunctionArgs)`
where `TableFunctionArgs` carries both the positional expression
arguments and the calling `&dyn Session`. The pure-Python UDTF path
previously discarded everything but the exprs.

Thread the session through when the user callback's signature opts in
by declaring a `session` keyword parameter (or `**kwargs`). At call
time we downcast the `&dyn Session` to its canonical `SessionState`
impl and build a fresh `SessionContext` over the same Arc-shared state,
exposed to Python as a `datafusion.SessionContext` wrapper. Existing
callbacks whose signatures do not declare `session` continue to be
called with the positional expression arguments only — no behavior
change for current users.
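
The signature-based opt-in described above could be sketched in plain Python roughly as follows. The helper name `accepts_session` and the example callbacks are hypothetical and only illustrate the detection rule (a keyword-capable `session` parameter or `**kwargs`); this is not the code from the PR.

```python
import inspect
from typing import Any, Callable


def accepts_session(func: Callable[..., Any]) -> bool:
    """Return True if ``func`` can receive a ``session`` keyword argument.

    Hypothetical helper: mirrors the opt-in rule in the commit message.
    """
    try:
        params = inspect.signature(func).parameters.values()
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature; treat as opt-out.
        return False
    for param in params:
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            # ``**kwargs`` swallows any keyword, including ``session``.
            return True
        if param.name == "session" and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Illustrative callbacks (names are assumptions, not from the PR).
def legacy(n): ...
def opted_in(n, *, session=None): ...
def catch_all(n, **kwargs): ...
def positional_only(session, /): ...
```

Under this rule `legacy` keeps receiving only the positional expression arguments, while `opted_in` and `catch_all` would also be handed the session. A positional-only `session` parameter, as in `positional_only`, cannot be filled by keyword and is therefore not detected.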

Note: a UDTF body cannot drive a fresh `ctx.sql(...).collect()` on the
passed-in session because the outer SQL execution already holds the
tokio runtime. Use the session for metadata access (catalogs, UDF
lookups, config) rather than nested DataFrame collection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The doc comment implied a foreign FFI session was a real input. No
current path reaches a pure-Python UDTF with a non-SessionState
session: the SQL planner and __call__ both hand a SessionState, and a
ForeignSession would only arrive via FFI-export of the UDTF, which
datafusion-python does not do. Reword to state the guard is defensive
and rewrap the error string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

timsaucer marked this pull request as ready for review May 27, 2026 23:17
timsaucer mentioned this pull request May 27, 2026
11 tasks
Contributor

ntjohnson1 left a comment

LGTM. The `accepts_session` bit feels a little like a weird workaround, but the motivation makes sense and the implementation is reasonable.

Comment thread crates/core/src/udtf.rs Outdated
 impl PyTableFunction {
     #[new]
-    #[pyo3(signature=(name, func, session))]
+    #[pyo3(signature=(name, func, session, accepts_session=false))]

Since this is an internal class I wonder if we need this extra argument. It feels like potentially the presence of the session could already encode this, but maybe that's more confusing than just adding the extra argument.

timsaucer and others added 2 commits May 28, 2026 17:27
Replaces signature sniffing with an explicit ``with_session=True`` kwarg
on ``TableFunction`` / ``udtf``. Avoids name-based detection footguns
(positional-only ``session`` params, accidental ``**kwargs`` opt-in,
shadowing by unrelated params) and makes author intent visible at
registration. Also documents the feature in the UDTF user guide.

Rust field renamed ``accepts_session`` -> ``inject_session_on_call`` to
match the Python-side opt-in semantics.
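
A minimal sketch of the explicit opt-in, assuming a stand-in wrapper class: `TableFunctionSketch` and its `invoke` method are hypothetical and are not the datafusion-python `TableFunction`. Only the `with_session=True` flag name comes from the commit message. The point is that the flag, not the callback's parameter names, decides whether the session is passed.

```python
from typing import Any, Callable


class TableFunctionSketch:
    """Hypothetical stand-in for a UDTF wrapper with an explicit opt-in flag."""

    def __init__(
        self, name: str, func: Callable[..., Any], *, with_session: bool = False
    ) -> None:
        self.name = name
        self._func = func
        self._with_session = with_session

    def invoke(self, args: tuple, session: Any) -> Any:
        # The session is forwarded only when the author opted in at registration.
        if self._with_session:
            return self._func(*args, session=session)
        return self._func(*args)


def kwargs_but_no_opt_in(n, **kwargs):
    # Under signature sniffing, ``**kwargs`` would have opted in by accident.
    return (n, sorted(kwargs))


def wants_session(n, session):
    return (n, session)


plain = TableFunctionSketch("plain", kwargs_but_no_opt_in)
aware = TableFunctionSketch("aware", wants_session, with_session=True)
```

Here `plain.invoke((3,), "ctx")` calls the callback without a `session` keyword even though it declares `**kwargs`, while `aware.invoke((3,), "ctx")` forwards the session because the flag was set.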

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Raise TypeError when with_session=True is combined with an FFI-exported
table function (one exposing __datafusion_table_function__). The Rust
FFI branch does not consult the flag, so it would silently be dropped;
guard both TableFunction.__init__ and the udtf() convenience entry.
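
The guard might look roughly like the sketch below. The function name `check_with_session`, the error text, and the `FfiExported` class are assumptions for illustration. Only the `__datafusion_table_function__` attribute and the `TypeError` come from the commit message.

```python
from typing import Any


def check_with_session(func: Any, with_session: bool) -> None:
    """Reject ``with_session=True`` for objects exposing the FFI hook.

    Hypothetical helper; the real check lives in ``TableFunction.__init__``
    and ``udtf()`` according to the commit message.
    """
    if with_session and hasattr(func, "__datafusion_table_function__"):
        # Fail loudly instead of silently ignoring the flag on the FFI path.
        raise TypeError(
            "with_session=True is not supported for FFI-exported table functions"
        )


class FfiExported:
    """Stand-in for an FFI-exported table function (illustrative only)."""

    def __datafusion_table_function__(self):
        return None  # placeholder; a real export would return an FFI capsule


def pure_python(n):
    return n
```

Calling `check_with_session(FfiExported(), True)` raises `TypeError`, whereas a pure-Python callable passes the check with either flag value.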

Qualify the doc claim that mutations through the injected session
propagate to the caller: registry mutations do (shared Arc registries),
but config changes do not (SessionConfig is cloned). Mirror the caveat
in TableFunction.__init__ per the user-guide caveats convention.
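
The propagation caveat can be illustrated with a toy Python model. It is not the Rust implementation: `StateSketch`, `derive_context`, and the `batch_size` key are hypothetical. The shared dictionary plays the role of the Arc-shared registries and the deep copy plays the role of the cloned `SessionConfig`.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class StateSketch:
    """Toy model of session state (hypothetical, for illustration only)."""

    registry: dict = field(default_factory=dict)  # models Arc-shared registries
    config: dict = field(default_factory=dict)    # models SessionConfig


def derive_context(state: StateSketch) -> StateSketch:
    # Share the registry object, but give the new context its own config copy.
    return StateSketch(registry=state.registry, config=copy.deepcopy(state.config))


caller = StateSketch(config={"batch_size": 8192})
injected = derive_context(caller)

injected.registry["my_udf"] = object()  # visible to the caller (shared object)
injected.config["batch_size"] = 1024    # not visible to the caller (copied)
```

After these two mutations the caller sees the new registry entry, but its own `batch_size` is unchanged.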

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>