
Commit aaf02dd

wilmaontherun and claude committed
Update AVG(integer) rule: rewrite to AVG(CAST(col AS DOUBLE)) for integer inputs
- AVG(INT/BIGINT/SMALLINT/TINYINT) is safely rewritable via CAST to DOUBLE
- Remove known_unsupported entries for AVG(INT) tests now covered by the rewrite (cast_to_date_003, select_no_from_002, grouping_sets_002, cube_agg_002, null-handling_015)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b438037 commit aaf02dd

2 files changed: +447 -3 lines changed


python/felderize/spark/data/skills/spark_skills.md

Lines changed: 3 additions & 3 deletions
@@ -20,10 +20,10 @@ These are systematic differences between Spark and Feldera to be aware of during

 - **[GBD-WHITESPACE] Whitespace definition:** Spark treats `' '` (space), `\t` (tab), `\n` (newline), `\r` (carriage return), and other Unicode whitespace as "whitespace" in any operation that involves trimming or whitespace-awareness. Feldera follows the SQL standard and only considers ASCII space (0x20) as whitespace. This affects `TRIM`, `LTRIM`, `RTRIM`, `CAST(str AS BOOLEAN)`, and any other function that implicitly strips whitespace. If the input may contain `\t` or `\n` at the edges, the results will differ.

-- **[GBD-INT-DIV] Integer division:** When both operands are integers, Spark returns DOUBLE (e.g. `95/100 = 0.95`); Feldera performs integer division (e.g. `95/100 = 0`). Cast at least one operand to DOUBLE when fractional results are needed.
+- **[GBD-INT-DIV] Integer division:** When both operands are integers, Spark returns DECIMAL (e.g. `95/100 = 0.95`); Feldera performs integer division (e.g. `95/100 = 0`). Cast at least one operand to DECIMAL when fractional results are needed.

 - **[GBD-AGG-TYPE] Aggregate return types on numeric inputs:** Spark often widens numeric aggregates to DOUBLE regardless of input type; Feldera follows the SQL standard and preserves the input type. Key cases:
-  - `AVG(integer_col)` — Spark returns DOUBLE (`AVG(1,2)` = `1.5`); Feldera returns INT (`AVG(1,2)` = `1`). No rewrite possible.
+  - `AVG(integer_col)` — Spark returns DOUBLE (`AVG(1,2)` = `1.5`); Feldera returns INT (`AVG(1,2)` = `1`). **Rewrite: `AVG(CAST(col AS DOUBLE))`** only when the input type is confirmed integer (INT, BIGINT, SMALLINT, TINYINT) — derive from schema or column definition. If the type cannot be determined, leave as-is and flag [GBD-AGG-TYPE].
   - `STDDEV_SAMP/STDDEV_POP(decimal_col)` — Spark widens to DOUBLE; Feldera preserves DECIMAL scale. No rewrite possible.
   - `AVG(decimal_col)` — Spark returns `DECIMAL(p+4, s+4)`; Feldera returns `DECIMAL(p,s)` (same scale). No rewrite possible.

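The numeric difference described in this hunk can be illustrated with plain arithmetic. The following Python sketch is not Spark or Feldera code: `trunc_div`, `avg_integer_preserving`, and `avg_as_double` are hypothetical names, and truncation toward zero is assumed as the model for integer division and for an integer-typed `AVG`.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed model of SQL integer division)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def avg_integer_preserving(values: list[int]) -> int:
    """Models AVG over integers when the result keeps the integer input type."""
    return trunc_div(sum(values), len(values))


def avg_as_double(values: list[int]) -> float:
    """Models AVG(CAST(col AS DOUBLE)): every input is cast to float before averaging."""
    floats = [float(v) for v in values]
    return sum(floats) / len(floats)


if __name__ == "__main__":
    print(avg_integer_preserving([1, 2]))  # integer-typed average of 1 and 2
    print(avg_as_double([1, 2]))           # double-typed average of 1 and 2
    print(trunc_div(95, 100), 95 / 100)    # integer vs fractional division
```

Under these assumptions the integer-preserving average of `1` and `2` is `1`, while the cast-first version yields `1.5`, which is the mismatch the rewrite is meant to close.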

@@ -163,7 +163,7 @@ These Spark functions exist in Feldera — translate directly:

 | Spark | Feldera | Notes |
 |-------|---------|-------|
-| `AVG(col)` | Same |[GBD-AGG-TYPE]: integer input returns INT not DOUBLE; decimal input preserves scale |
+| `AVG(col)` | `AVG(CAST(col AS DOUBLE))` if col is integer type; `AVG(col)` otherwise | Integer input: rewrite to return DOUBLE matching Spark. Decimal/float: leave as-is → [GBD-AGG-TYPE] scale mismatch |
 | `STDDEV_SAMP(col)` | Same |[GBD-AGG-TYPE]: decimal input preserves scale, not widened to DOUBLE |
 | `STDDEV_POP(col)` | Same |[GBD-AGG-TYPE]: decimal input preserves scale, not widened to DOUBLE |
 | `every(col)` | Same | Alias for `bool_and` — supported in Feldera |
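The updated `AVG(col)` row amounts to a type-dependent rewrite. A minimal Python sketch of that decision follows; it is an illustration, not the felderize implementation. The function `rewrite_avg`, the regex, and the `schema` mapping (lowercase column name to declared SQL type) are all hypothetical, and only bare column arguments such as `AVG(qty)` are handled.

```python
import re

# Integer types named in the rule; anything else is left untouched.
INTEGER_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT"}

# Matches AVG(<bare identifier>) only; expressions and qualified names are ignored.
AVG_CALL = re.compile(r"\bAVG\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)", re.IGNORECASE)


def rewrite_avg(sql: str, schema: dict[str, str]) -> tuple[str, list[str]]:
    """Return the rewritten SQL plus [GBD-AGG-TYPE] flags for calls left as-is."""
    flags: list[str] = []

    def replace(match: re.Match) -> str:
        col = match.group(1)
        col_type = schema.get(col.lower())
        if col_type is not None and col_type.upper() in INTEGER_TYPES:
            # Confirmed integer input: cast so the average is computed as DOUBLE.
            return f"AVG(CAST({col} AS DOUBLE))"
        reason = "type unknown" if col_type is None else f"non-integer type {col_type}"
        flags.append(f"[GBD-AGG-TYPE] AVG({col}) left as-is: {reason}")
        return match.group(0)

    return AVG_CALL.sub(replace, sql), flags


if __name__ == "__main__":
    print(rewrite_avg("SELECT AVG(qty) FROM t", {"qty": "INT"}))
    print(rewrite_avg("SELECT AVG(price) FROM t", {"price": "DECIMAL(10,2)"}))
```

Because the pattern requires a bare identifier inside the parentheses, an already rewritten `AVG(CAST(qty AS DOUBLE))` is not matched again, so applying the sketch twice gives the same result.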
