-
Notifications
You must be signed in to change notification settings - Fork 1.7k
refactor(bigframes): Extract json conversions to distinct ops #17473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -442,17 +442,16 @@ def astype( | |
| if errors not in ["raise", "null"]: | ||
| raise ValueError("Arg 'error' must be one of 'raise' or 'null'") | ||
|
|
||
| safe_cast = errors == "null" | ||
|
|
||
| if isinstance(dtype, dict): | ||
| result = self.copy() | ||
| for col, to_type in dtype.items(): | ||
| result[col] = result[col].astype(to_type) | ||
| result[col] = result[col].astype(to_type, errors=errors) | ||
| return result | ||
|
|
||
| dtype = bigframes.dtypes.bigframes_type(dtype) | ||
|
|
||
| return self._apply_unary_op(ops.AsTypeOp(dtype, safe_cast)) | ||
| result = self.copy() | ||
| for col in result.columns: | ||
| result[col] = result[col].astype(dtype, errors=errors) | ||
| return result | ||
|
Comment on lines
+451
to
+454
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looping over columns and performing individual series assignments ( We can avoid this overhead by fast-pathing the common cases where no JSON columns are involved, or where all columns are being cast to/from JSON, applying the operation to all columns at once using safe = errors == "null"
dtype = bigframes.dtypes.bigframes_type(dtype)
if dtype == bigframes.dtypes.JSON_DTYPE:
return self._apply_unary_op(ops.ToJSON(safe=safe))
if all(t == bigframes.dtypes.JSON_DTYPE for t in self.dtypes):
return self._apply_unary_op(ops.JSONDecode(to_type=dtype, safe=safe))
if not any(t == bigframes.dtypes.JSON_DTYPE for t in self.dtypes):
return self._apply_unary_op(ops.AsTypeOp(to_type=dtype, safe=safe))
result = self.copy()
for col in result.columns:
result[col] = result[col].astype(dtype, errors=errors)
return result |
||
|
|
||
| def _should_sql_have_index(self) -> bool: | ||
| """Should the SQL we pass to BQML and other I/O include the index?""" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency and better type safety, annotate the
opparameter with the specificjson_ops.ToJSONtype instead of the genericops.ScalarOp, matching the pattern used in other registered operations (such asops.ToArrayOpandjson_ops.JSONDecode).