Skip to content

feat: Experimental transpilation of unannotated python callables#17419

Draft
TrevorBergeron wants to merge 1 commit into
mainfrom
tbergeron_transpile_path
Draft

feat: Experimental transpilation of unannotated python callables#17419
TrevorBergeron wants to merge 1 commit into
mainfrom
tbergeron_transpile_path

Conversation

@TrevorBergeron

Copy link
Copy Markdown
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an experimental Python transpiler feature, allowing standard Python callables to be transpiled and executed within DataFrame.map, DataFrame.apply, Series.apply, and Series.combine. It replaces the func_to_op helper with func_to_expr, which generates a new CallableExpression to manage argument binding. Feedback on the changes highlights critical validation gaps in both DataFrame.apply and CallableExpression.apply where missing checks for unexpected keyword arguments, duplicate arguments, or extra positional arguments could lead to silent correctness bugs or cryptic errors.

Comment on lines +4745 to +4761
# Bind the extra arguments (args and kwargs) starting from parameter 1
bindings = {}
# Positional arguments:
for idx, val in enumerate(args):
param_name = callable_expr.arg_specs[idx + 1].name
bindings[param_name] = val
# Keyword arguments:
for key, val in kwargs.items():
bindings[key] = val

# Bind defaults for other parameters (excluding the first 'row' parameter)
for spec in callable_expr.arg_specs[1:]:
if (
spec.name not in bindings
and spec.default_value is not inspect.Parameter.empty
):
bindings[spec.name] = spec.default_value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The manual binding of args and kwargs lacks validation for extra positional arguments and unexpected keyword arguments. This can lead to cryptic IndexError exceptions or silent correctness bugs where unexpected keyword arguments accidentally override free variables (columns) in the expression.

                # Bind the extra arguments (args and kwargs) starting from parameter 1
                bindings = {}
                allowed_params = callable_expr.arg_specs[1:]
                allowed_names = {spec.name for spec in allowed_params}
                func_name = getattr(func, "__name__", "callable")

                # Positional arguments:
                for idx, val in enumerate(args):
                    if idx >= len(allowed_params):
                        raise TypeError(
                            f"too many positional arguments for {func_name}: expected {len(allowed_params)}, got {len(args)}"
                        )
                    param_name = allowed_params[idx].name
                    bindings[param_name] = val

                # Keyword arguments:
                for key, val in kwargs.items():
                    if key not in allowed_names:
                        raise TypeError(
                            f"{func_name}() got an unexpected keyword argument '{key}'"
                        )
                    if key in bindings:
                        raise TypeError(
                            f"{func_name}() got multiple values for argument '{key}'"
                        )
                    bindings[key] = val

                # Bind defaults for other parameters (excluding the first 'row' parameter)
                for spec in allowed_params:
                    if (
                        spec.name not in bindings
                        and spec.default_value is not inspect.Parameter.empty
                    ):
                        bindings[spec.name] = spec.default_value

Comment on lines +76 to +105
bindings = {}
pos_idx = 0

def to_expr(val):
if isinstance(val, ex.Expression):
return val
return ex.const(val)

for spec in self.arg_specs:
if spec.is_varargs:
raise NotImplementedError(
"varargs in compiled python functions is not supported"
)

This should handle anything that might be passed to eg map, combine, other pandas methods that take a function.
if pos_idx < len(args):
bindings[spec.name] = to_expr(args[pos_idx])
pos_idx += 1
elif spec.name in kwargs:
bindings[spec.name] = to_expr(kwargs[spec.name])
elif spec.default_value is not inspect.Parameter.empty:
bindings[spec.name] = to_expr(spec.default_value)
else:
raise TypeError(f"missing required argument: '{spec.name}'")

It should raise a TypeError if the object is not a supported type.
if pos_idx < len(args):
raise TypeError(
f"too many positional arguments: expected {len(self.arg_specs)}, got {len(args)}"
)

Args:
op: The object to convert.
return self.expr.bind_variables(bindings)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

CallableExpression.apply does not validate unexpected keyword arguments or check if a keyword argument has already been supplied as a positional argument. This can lead to silent correctness bugs where keyword arguments are ignored or unexpected arguments are accepted.

        bindings = {}
        pos_idx = 0
        arg_names = {spec.name for spec in self.arg_specs}

        # Validate unexpected keyword arguments
        for key in kwargs:
            if key not in arg_names:
                raise TypeError(f"got an unexpected keyword argument '{key}'")

        def to_expr(val):
            if isinstance(val, ex.Expression):
                return val
            return ex.const(val)

        for spec in self.arg_specs:
            if spec.is_varargs:
                raise NotImplementedError(
                    "varargs in compiled python functions is not supported"
                )

            if pos_idx < len(args):
                if spec.name in kwargs:
                    raise TypeError(f"got multiple values for keyword argument '{spec.name}'")
                bindings[spec.name] = to_expr(args[pos_idx])
                pos_idx += 1
            elif spec.name in kwargs:
                bindings[spec.name] = to_expr(kwargs[spec.name])
            elif spec.default_value is not inspect.Parameter.empty:
                bindings[spec.name] = to_expr(spec.default_value)
            else:
                raise TypeError(f"missing required argument: '{spec.name}'")

        if pos_idx < len(args):
            raise TypeError(
                f"too many positional arguments: expected {len(self.arg_specs)}, got {len(args)}"
            )

        return self.expr.bind_variables(bindings)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant