refactor(bigframes): Extract json conversions to distinct ops#17473
refactor(bigframes): Extract json conversions to distinct ops#17473TrevorBergeron wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors JSON casting by replacing the generic AsTypeOp with explicit ToJSON and JSONDecode operations across the compilation backends (Ibis, Polars, and SQLGlot). Feedback suggests optimizing DataFrame.astype to avoid a performance regression caused by column-by-column assignment on non-JSON types, and improving type safety in the Polars compiler by using a more specific type annotation for the ToJSON operation.
| result = self.copy() | ||
| for col in result.columns: | ||
| result[col] = result[col].astype(dtype, errors=errors) | ||
| return result |
There was a problem hiding this comment.
Looping over columns and performing individual series assignments (result[col] = result[col].astype(...)) introduces a significant performance regression for non-JSON data types. In bigframes, column-by-column assignment creates multiple intermediate DataFrame states and nested expression trees, which can lead to slow compilation times and complex SQL queries.
We can avoid this overhead by fast-pathing the common cases where no JSON columns are involved, or where all columns are being cast to/from JSON, applying the operation to all columns at once using _apply_unary_op.
safe = errors == "null"
dtype = bigframes.dtypes.bigframes_type(dtype)
if dtype == bigframes.dtypes.JSON_DTYPE:
return self._apply_unary_op(ops.ToJSON(safe=safe))
if all(t == bigframes.dtypes.JSON_DTYPE for t in self.dtypes):
return self._apply_unary_op(ops.JSONDecode(to_type=dtype, safe=safe))
if not any(t == bigframes.dtypes.JSON_DTYPE for t in self.dtypes):
return self._apply_unary_op(ops.AsTypeOp(to_type=dtype, safe=safe))
result = self.copy()
for col in result.columns:
result[col] = result[col].astype(dtype, errors=errors)
return result| return input.str.json_decode(_DTYPE_MAPPING[op.to_type]) | ||
|
|
||
| @compile_op.register(json_ops.ToJSON) | ||
| def _(self, op: ops.ScalarOp, input: pl.Expr) -> pl.Expr: |
There was a problem hiding this comment.
For consistency and better type safety, annotate the op parameter with the specific json_ops.ToJSON type instead of the generic ops.ScalarOp, matching the pattern used in other registered operations (such as ops.ToArrayOp and json_ops.JSONDecode).
| def _(self, op: ops.ScalarOp, input: pl.Expr) -> pl.Expr: | |
| def _(self, op: json_ops.ToJSON, input: pl.Expr) -> pl.Expr: |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕