Update AVG(integer) rule: rewrite to AVG(CAST(col AS DOUBLE)) for integer inputs
- AVG(INT/BIGINT/SMALLINT/TINYINT) is safely rewritable via CAST to DOUBLE
- Remove known_unsupported entries for AVG(INT) tests now covered by the rewrite
(cast_to_date_003, select_no_from_002, grouping_sets_002, cube_agg_002, null-handling_015)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
python/felderize/spark/data/skills/spark_skills.md (3 additions, 3 deletions)
@@ -20,10 +20,10 @@ These are systematic differences between Spark and Feldera to be aware of during
 - **[GBD-WHITESPACE] Whitespace definition:** Spark treats `' '` (space), `\t` (tab), `\n` (newline), `\r` (carriage return), and other Unicode whitespace as "whitespace" in any operation that involves trimming or whitespace-awareness. Feldera follows the SQL standard and only considers ASCII space (0x20) as whitespace. This affects `TRIM`, `LTRIM`, `RTRIM`, `CAST(str AS BOOLEAN)`, and any other function that implicitly strips whitespace. If the input may contain `\t` or `\n` at the edges, the results will differ.

-- **[GBD-INT-DIV] Integer division:** When both operands are integers, Spark returns DOUBLE (e.g. `95/100 = 0.95`); Feldera performs integer division (e.g. `95/100 = 0`). Cast at least one operand to DOUBLE when fractional results are needed.
+- **[GBD-INT-DIV] Integer division:** When both operands are integers, Spark returns DECIMAL (e.g. `95/100 = 0.95`); Feldera performs integer division (e.g. `95/100 = 0`). Cast at least one operand to DECIMAL when fractional results are needed.

 - **[GBD-AGG-TYPE] Aggregate return types on numeric inputs:** Spark often widens numeric aggregates to DOUBLE regardless of input type; Feldera follows the SQL standard and preserves the input type. Key cases:
-- `AVG(integer_col)` — Spark returns DOUBLE (`AVG(1,2)` = `1.5`); Feldera returns INT (`AVG(1,2)` = `1`). No rewrite possible.
+- `AVG(integer_col)` — Spark returns DOUBLE (`AVG(1,2)` = `1.5`); Feldera returns INT (`AVG(1,2)` = `1`). **Rewrite: `AVG(CAST(col AS DOUBLE))`** only when the input type is confirmed integer (INT, BIGINT, SMALLINT, TINYINT) — derive from schema or column definition. If the type cannot be determined, leave as-is and flag [GBD-AGG-TYPE].
 - `STDDEV_SAMP/STDDEV_POP(decimal_col)` — Spark widens to DOUBLE; Feldera preserves DECIMAL scale. No rewrite possible.
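The updated `AVG(integer_col)` rule amounts to a three-way decision on the column's declared type. The following is a minimal Python sketch of that decision, not code from this repository: `rewrite_avg`, `INTEGER_TYPES`, and the `schema` mapping are hypothetical names introduced for illustration, and how decimal inputs should be flagged is not modeled here.

```python
from typing import Dict, Optional, Tuple

# Integer types listed in the rule text. The constant name is hypothetical.
INTEGER_TYPES = {"INT", "BIGINT", "SMALLINT", "TINYINT"}


def rewrite_avg(column: str, schema: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (expression, flag) for AVG(column).

    `schema` is an assumed mapping from column name to declared SQL type.
    """
    col_type = schema.get(column)
    if col_type is None:
        # Type cannot be determined: leave as-is and flag it.
        return f"AVG({column})", "GBD-AGG-TYPE"
    if col_type.upper() in INTEGER_TYPES:
        # Confirmed integer input: cast so the result is DOUBLE, as in Spark.
        return f"AVG(CAST({column} AS DOUBLE))", None
    # Known non-integer type: leave the expression unchanged.
    return f"AVG({column})", None
```

For example, `rewrite_avg("qty", {"qty": "BIGINT"})` yields the cast form, while a column missing from the schema is returned unchanged together with the `GBD-AGG-TYPE` flag.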
@@ -163,7 +163,7 @@ These Spark functions exist in Feldera — translate directly:
 | Spark | Feldera | Notes |
 |-------|---------|-------|
-| `AVG(col)` | Same | → [GBD-AGG-TYPE]: integer input returns INT not DOUBLE; decimal input preserves scale |
+| `AVG(col)` | `AVG(CAST(col AS DOUBLE))` if col is integer type; `AVG(col)` otherwise | Integer input: rewrite to return DOUBLE matching Spark. Decimal/float: leave as-is → [GBD-AGG-TYPE] scale mismatch |
 | `STDDEV_SAMP(col)` | Same | → [GBD-AGG-TYPE]: decimal input preserves scale, not widened to DOUBLE |
 | `STDDEV_POP(col)` | Same | → [GBD-AGG-TYPE]: decimal input preserves scale, not widened to DOUBLE |
 | `every(col)` | Same | Alias for `bool_and` — supported in Feldera |
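The numeric effect behind the `AVG` row can be reproduced with plain arithmetic. The sketch below emulates the two behaviors in Python; it runs neither Spark nor Feldera. The helper names are hypothetical, and floor division stands in for an integer-typed average only under the assumption of non-negative inputs.

```python
def avg_integer(values):
    # Integer-preserving average: the result stays an integer.
    # Floor division is used as a stand-in; assumes non-negative inputs.
    return sum(values) // len(values)


def avg_double(values):
    # Average after casting each input to a floating-point value,
    # analogous to AVG(CAST(col AS DOUBLE)).
    return sum(float(v) for v in values) / len(values)


print(avg_integer([1, 2]))  # 1
print(avg_double([1, 2]))   # 1.5
```

The same pattern applies to the `95/100` example in [GBD-INT-DIV]: in Python, `95 // 100` gives `0`, while `95 / 100` gives `0.95`.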