Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.
✅ PR preview is ready!
Summary
This PR addresses two distinct compatibility issues under #14573:

- Arrow type downcast for custom component v1 serialization (`component_arrow.py`): Pandas 3.x defaults to `large_string` (backed by PyArrow's `StringDtype`), which older Arrow JS libraries bundled in third-party custom components cannot decode. A new `_convert_df_to_compat_arrow_bytes` function downcasts `large_string`, `large_binary`, and `large_list` types to their standard counterparts before IPC serialization.
- Normalize modern CSS color formats (`utils.ts`): Modern CSS color formats (`oklch`, `oklab`, `lch`, `lab`, `color()`) are valid in browsers but crash color2k's parser when used in Streamlit custom themes. A new `normalizeColor` function uses a canvas 2D context to let the browser convert these formats to hex/rgba before passing them to color2k.
Reviewer consensus: All three reviewers (claude-4.6-opus-high-thinking, gemini-3.1-pro, gpt-5.3-codex-high) completed their reviews and approved. No critical or blocking issues were identified.
Code Quality
All three reviewers agree the changes are well-structured, cleanly scoped, and follow existing codebase patterns. The frontend normalizeColor function uses a clean two-phase approach (fast path via color2k, fallback via canvas), and the backend _downcast_arrow_large_types is cleanly recursive.
Two minor cross-cutting observations were noted:
- The `_convert_df_to_compat_arrow_bytes` function partially duplicates the `try/except` pattern from `dataframe_util.convert_pandas_df_to_arrow_bytes`. A shared helper could reduce this, though it is non-blocking. (Raised by 2/3 reviewers.)
- The docstring for `_convert_df_to_compat_arrow_bytes` slightly misrepresents the delegation chain. (Raised by 1/3 reviewers.)
Test Coverage
All reviewers agreed the test coverage is adequate for the scope of this PR:
- Backend: Three tests cover `large_string` downcast, `large_binary` downcast, and preservation of standard types.
- Frontend: Three tests cover standard color passthrough, OKLCH normalization via canvas mock, and fallback when canvas context is unavailable.
- E2E: No new E2E tests, which all reviewers found acceptable given the nature of the changes.
Agreed gap (3/3 reviewers): No test covers the `large_list` / `LargeListType` recursive downcast path. Adding a test with e.g. `pa.large_list(pa.large_string())` would improve confidence.
Backwards Compatibility
All reviewers unanimously agree the changes are fully backwards compatible:
- The Arrow downcast only applies to v1 custom component marshalling, not the main dataframe element path.
- `normalizeColor` returns the original color when it already parses correctly (fast path) or when normalization fails (fallback).
Security & Risk
No security concerns identified by any reviewer. The changes are purely data-format and presentation-layer fixes with no new dependencies, no external communication, and no modification of security-sensitive surfaces.
Regression risk is low — the Arrow downcast is scoped to v1 custom components only, and the color normalization is defensive with graceful fallback on any failure.
External test recommendation
- Recommend external_test: No
- Triggered categories: None
- Evidence:
- `lib/streamlit/components/v1/component_arrow.py`: Internal Arrow IPC serialization changes only; no routing, auth, WebSocket, embedding, or cross-origin changes.
- `frontend/lib/src/theme/utils.ts`: Local theme color normalization; no network, iframe, storage, or security header changes.
- Confidence: High (unanimous across all 3 reviewers)
Accessibility
No accessibility impact. The frontend changes are limited to the theme utility layer (color normalization). No UI components, ARIA attributes, focus behavior, or interactive elements are modified. (Unanimous agreement.)
Recommendations
- Improve canvas color normalization robustness: The current sentinel-based `#000000` check has an edge case with colors that legitimately resolve to black. Using a two-sentinel approach eliminates this. (Raised by 2/3 reviewers; see inline comment.)
- Add a test for `large_list` downcast: The recursive path in `_downcast_arrow_large_types` for `LargeListType` and nested `ListType` is untested. (Raised by 3/3 reviewers.)
- Consider handling `pa.DictionaryType`: Categorical columns could have large value types. (Raised by 1/3 reviewers; see inline comment.)
- Add logging in `_convert_df_to_compat_arrow_bytes` error handler: Match the observability pattern from `convert_pandas_df_to_arrow_bytes`. (Raised by 1/3 reviewers.)
Verdict
APPROVED: Well-scoped, low-risk bug fixes for pandas 3.x Arrow compatibility in v1 custom components and modern CSS color format crashes in theming. All three reviewers approved unanimously. Both changes are backwards compatible, properly defensive, and adequately tested. The recommendations above are non-blocking improvements.
Consolidated AI review by claude-4.6-opus-high-thinking. Source reviews: claude-4.6-opus-high-thinking, gemini-3.1-pro, gpt-5.3-codex-high.
This review also includes 6 inline comment(s) on specific code lines.
```typescript
ctx.fillStyle = "#000000"
ctx.fillStyle = color
// If the browser didn't accept the color, fillStyle stays as "#000000".
if (
```
suggestion: The sentinel-based black detection has a subtle edge case: a modern CSS color that legitimately resolves to black (e.g., `oklch(0 0 0)`) would be incorrectly treated as "rejected by the browser" and returned as the original unparsable string. A more robust approach is to test against two different initial colors:

```typescript
ctx.fillStyle = "#123456"
ctx.fillStyle = color
const color1 = ctx.fillStyle
ctx.fillStyle = "#654321"
ctx.fillStyle = color
const color2 = ctx.fillStyle
if (color1 !== color2) {
  return color
}
return color1
```

This removes the need for the `#000000`/black special-casing entirely. (Raised by 2/3 reviewers.)
```python
        return pa.list_(_downcast_arrow_large_types(field_type.value_type))
    if isinstance(field_type, pa.ListType):
        return pa.list_(_downcast_arrow_large_types(field_type.value_type))
    return field_type
```
suggestion: Consider also handling `pa.DictionaryType` (used by pandas for Categorical columns) whose values could be a large type:

```python
if isinstance(field_type, pa.DictionaryType):
    return pa.dictionary(
        field_type.index_type,
        _downcast_arrow_large_types(field_type.value_type),
        field_type.ordered,
    )
```

```python
        assert len(proto.data) > 0
        assert len(proto.columns) > 0


    def test_marshall_downcasts_large_string_types(self):
```
suggestion: `_downcast_arrow_large_types` also handles `pa.LargeListType` and nested `pa.ListType` with large value types (lines 54–57 of component_arrow.py), but no tests cover those branches. Consider adding a test with e.g. `pa.large_list(pa.large_string())` to verify the recursive downcast. (Raised by 3/3 reviewers.)
```python
        pa.ArrowTypeError,
        pa.ArrowInvalid,
        pa.ArrowNotImplementedError,
    ):
```
suggestion: The existing `convert_pandas_df_to_arrow_bytes` in `dataframe_util.py` logs an info-level message when Arrow conversion fails. Consider adding a similar `_LOGGER.info(...)` here to preserve the same observability for custom component marshalling failures.
```python
def _convert_df_to_compat_arrow_bytes(df: DataFrame) -> bytes:
    """Convert a DataFrame to Arrow IPC bytes with large types downcast.

    This wraps ``dataframe_util.convert_pandas_df_to_arrow_bytes`` and
```
nitpick: The docstring says this function "wraps `dataframe_util.convert_pandas_df_to_arrow_bytes`", but it actually reimplements the table-creation and error-handling logic and only delegates to `convert_arrow_table_to_arrow_bytes` for final serialization. Consider updating to say "mirrors the conversion logic of ... and additionally ..." for accuracy.
| """ | ||
| import pyarrow as pa | ||
|
|
||
| try: |
There was a problem hiding this comment.
thought: This `try/except` block duplicates the pattern from `dataframe_util.convert_pandas_df_to_arrow_bytes`. Extracting a shared `convert_pandas_df_to_arrow_table(df) -> pa.Table` helper in `dataframe_util.py` could reduce this duplication. Non-blocking for this PR.
Adds `normalizeColor()` that converts CSS Color Level 4 formats (`oklch`, `oklab`, `lch`, `lab`) to hex via browser canvas, preventing color2k parse failures that crash the app on load. Fixes #14573
8cb78ed to 9a3f707
The AI review job failed to complete. Please check the workflow run for details. You can retry by adding the 'ai-review' label again or manually triggering the workflow.
FYI: Related to #12681
This pull request introduces a new utility function, `normalizeColor`, to standardize CSS color strings (including modern formats like OKLCH) to a browser-supported format. The function is integrated throughout the theme utilities to ensure consistent color parsing and handling. Comprehensive unit tests for `normalizeColor` are also added to verify its behavior under various scenarios.

Color normalization improvements:
- Added a `normalizeColor` function in `utils.ts` to convert modern CSS color formats to a browser-supported format using a canvas context, with fallback to the original string if normalization fails.
- Updated the `parseColor` function to use `normalizeColor` when returning colors, ensuring all parsed colors are normalized.
- Updated theme creation functions (`createTheme` and `createSidebarTheme`) to normalize background and secondary background colors, improving compatibility and consistency for user-supplied or computed color values. [1] [2]

Testing enhancements:
- Added unit tests for `normalizeColor` in `utils.test.ts`, covering standard colors, OKLCH normalization via canvas, and fallback behavior when the canvas context is unavailable.
- Added `normalizeColor` to the test imports to enable direct testing.