feat: Experimental transpilation of unannotated python callables#17419
feat: Experimental transpilation of unannotated python callables#17419TrevorBergeron wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental Python transpiler feature, allowing standard Python callables to be transpiled and executed within DataFrame.map, DataFrame.apply, Series.apply, and Series.combine. It replaces the func_to_op helper with func_to_expr, which generates a new CallableExpression to manage argument binding. Feedback on the changes highlights critical validation gaps in both DataFrame.apply and CallableExpression.apply where missing checks for unexpected keyword arguments, duplicate arguments, or extra positional arguments could lead to silent correctness bugs or cryptic errors.
| # Bind the extra arguments (args and kwargs) starting from parameter 1 | ||
| bindings = {} | ||
| # Positional arguments: | ||
| for idx, val in enumerate(args): | ||
| param_name = callable_expr.arg_specs[idx + 1].name | ||
| bindings[param_name] = val | ||
| # Keyword arguments: | ||
| for key, val in kwargs.items(): | ||
| bindings[key] = val | ||
|
|
||
| # Bind defaults for other parameters (excluding the first 'row' parameter) | ||
| for spec in callable_expr.arg_specs[1:]: | ||
| if ( | ||
| spec.name not in bindings | ||
| and spec.default_value is not inspect.Parameter.empty | ||
| ): | ||
| bindings[spec.name] = spec.default_value |
There was a problem hiding this comment.
The manual binding of args and kwargs lacks validation for extra positional arguments and unexpected keyword arguments. This can lead to cryptic IndexError exceptions or silent correctness bugs where unexpected keyword arguments accidentally override free variables (columns) in the expression.
# Bind the extra arguments (args and kwargs) starting from parameter 1
bindings = {}
allowed_params = callable_expr.arg_specs[1:]
allowed_names = {spec.name for spec in allowed_params}
func_name = getattr(func, "__name__", "callable")
# Positional arguments:
for idx, val in enumerate(args):
if idx >= len(allowed_params):
raise TypeError(
f"too many positional arguments for {func_name}: expected {len(allowed_params)}, got {len(args)}"
)
param_name = allowed_params[idx].name
bindings[param_name] = val
# Keyword arguments:
for key, val in kwargs.items():
if key not in allowed_names:
raise TypeError(
f"{func_name}() got an unexpected keyword argument '{key}'"
)
if key in bindings:
raise TypeError(
f"{func_name}() got multiple values for argument '{key}'"
)
bindings[key] = val
# Bind defaults for other parameters (excluding the first 'row' parameter)
for spec in allowed_params:
if (
spec.name not in bindings
and spec.default_value is not inspect.Parameter.empty
):
bindings[spec.name] = spec.default_value| bindings = {} | ||
| pos_idx = 0 | ||
|
|
||
| def to_expr(val): | ||
| if isinstance(val, ex.Expression): | ||
| return val | ||
| return ex.const(val) | ||
|
|
||
| for spec in self.arg_specs: | ||
| if spec.is_varargs: | ||
| raise NotImplementedError( | ||
| "varargs in compiled python functions is not supported" | ||
| ) | ||
|
|
||
| This should handle anything that might be passed to eg map, combine, other pandas methods that take a function. | ||
| if pos_idx < len(args): | ||
| bindings[spec.name] = to_expr(args[pos_idx]) | ||
| pos_idx += 1 | ||
| elif spec.name in kwargs: | ||
| bindings[spec.name] = to_expr(kwargs[spec.name]) | ||
| elif spec.default_value is not inspect.Parameter.empty: | ||
| bindings[spec.name] = to_expr(spec.default_value) | ||
| else: | ||
| raise TypeError(f"missing required argument: '{spec.name}'") | ||
|
|
||
| It should raise a TypeError if the object is not a supported type. | ||
| if pos_idx < len(args): | ||
| raise TypeError( | ||
| f"too many positional arguments: expected {len(self.arg_specs)}, got {len(args)}" | ||
| ) | ||
|
|
||
| Args: | ||
| op: The object to convert. | ||
| return self.expr.bind_variables(bindings) |
There was a problem hiding this comment.
CallableExpression.apply does not validate unexpected keyword arguments or check if a keyword argument has already been supplied as a positional argument. This can lead to silent correctness bugs where keyword arguments are ignored or unexpected arguments are accepted.
bindings = {}
pos_idx = 0
arg_names = {spec.name for spec in self.arg_specs}
# Validate unexpected keyword arguments
for key in kwargs:
if key not in arg_names:
raise TypeError(f"got an unexpected keyword argument '{key}'")
def to_expr(val):
if isinstance(val, ex.Expression):
return val
return ex.const(val)
for spec in self.arg_specs:
if spec.is_varargs:
raise NotImplementedError(
"varargs in compiled python functions is not supported"
)
if pos_idx < len(args):
if spec.name in kwargs:
raise TypeError(f"got multiple values for keyword argument '{spec.name}'")
bindings[spec.name] = to_expr(args[pos_idx])
pos_idx += 1
elif spec.name in kwargs:
bindings[spec.name] = to_expr(kwargs[spec.name])
elif spec.default_value is not inspect.Parameter.empty:
bindings[spec.name] = to_expr(spec.default_value)
else:
raise TypeError(f"missing required argument: '{spec.name}'")
if pos_idx < len(args):
raise TypeError(
f"too many positional arguments: expected {len(self.arg_specs)}, got {len(args)}"
)
return self.expr.bind_variables(bindings)
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕