Skip to content

Commit 7d56201

Browse files
committed
revised samples and skills
1 parent b10f7a4 commit 7d56201

File tree

10 files changed

+34
-19
lines changed

10 files changed

+34
-19
lines changed

python/felderize/spark/cli.py

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -173,12 +173,8 @@ def example(name: str | None, validate: bool, json_output: bool, no_docs: bool,
173173
if not name:
174174
click.echo("Available examples:\n")
175175
for ex_name, files in pairs.items():
176-
if isinstance(files, Path):
177-
preview = files.read_text().strip().split("\n")[0]
178-
click.echo(f" {ex_name:20s} {preview} [combined]")
179-
else:
180-
preview = files[0].read_text().strip().split("\n")[0]
181-
click.echo(f" {ex_name:20s} {preview}")
176+
tag = "[combined]" if isinstance(files, Path) else "[schema+query]"
177+
click.echo(f" {ex_name:20s} {tag}")
182178
click.echo("\nRun one with: felderize example <name>")
183179
return
184180

python/felderize/spark/data/samples/aggregation_date_trunc.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,4 +32,3 @@ FROM orders WHERE status IN ('PAID', 'SHIPPED')
3232
GROUP BY region, TIMESTAMP_TRUNC(created_at, MONTH);
3333
```
3434

35-
Rewrites: `STRING``VARCHAR`, remove `USING parquet`, `CREATE OR REPLACE TEMP VIEW``CREATE VIEW`, `date_trunc('MONTH', x)``TIMESTAMP_TRUNC(x, MONTH)`.

python/felderize/spark/data/samples/array_lambda.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,4 +16,3 @@ SELECT row_id, TRANSFORM(nums, x -> x + 1) AS nums_plus_one
1616
FROM collection_events;
1717
```
1818

19-
Rewrites: `ARRAY<INT>``INT ARRAY`. Lambda syntax preserved.

python/felderize/spark/data/samples/array_map_functions.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,4 +32,3 @@ SELECT user_id, CARDINALITY(tags) AS tag_count,
3232
FROM session_profiles;
3333
```
3434

35-
Rewrites: `ARRAY<STRING>``VARCHAR ARRAY`, `size()``CARDINALITY()`, `element_at(map, key)``map[key]`.

python/felderize/spark/data/samples/datediff.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,4 +14,3 @@ Feldera:
1414
MAX(TIMESTAMPDIFF(DAY, shipped_at, delivered_at)) AS max_days
1515
```
1616

17-
Rewrites: `datediff(end, start)``TIMESTAMPDIFF(DAY, start, end)` (argument order reversed).

python/felderize/spark/data/samples/left_semi_join.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,4 +20,3 @@ INNER JOIN order_facts o ON c.customer_id = o.customer_id
2020
WHERE o.amount >= 500;
2121
```
2222

23-
Rewrites: `LEFT SEMI JOIN``INNER JOIN` + `SELECT DISTINCT`.

python/felderize/spark/data/samples/null_safe_equality.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,4 +14,3 @@ Feldera:
1414
ON p.email <=> e.email AND p.phone <=> e.phone
1515
```
1616

17-
Rewrites: none — `<=>` is supported in Feldera.

python/felderize/spark/data/samples/nvl_coalesce.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,4 +16,3 @@ SELECT row_id, grp, COALESCE(amount, discount) AS resolved_amount
1616
FROM scalar_function_rows;
1717
```
1818

19-
Rewrites: `nvl(a, b)``COALESCE(a, b)`. Same for `ifnull(a, b)`.

python/felderize/spark/data/samples/row_number_topk.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,4 +21,3 @@ SELECT category, product_id, revenue FROM (
2121
) ranked WHERE rn <= 3;
2222
```
2323

24-
ROW_NUMBER with TopK filter (`WHERE rn <= N`) passes through directly.

python/felderize/spark/data/skills/spark_skills.md

Lines changed: 32 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -255,7 +255,32 @@ Results are always `VARIANT` — cast to get a concrete type: `CAST(v['age'] AS
255255
| `json_array_length(s)` | `CARDINALITY(CAST(PARSE_JSON(s) AS VARIANT ARRAY))` | |
256256

257257
Notes:
258-
- Parse once, reuse: `SELECT PARSE_JSON(col) AS v, CAST(v['key'] AS VARCHAR) AS k` — lateral column aliases supported.
258+
- **Parse once, reuse — for simple SELECT (no GROUP BY).** When extracting multiple fields from the same JSON column without aggregation, call `PARSE_JSON` once as a lateral alias:
259+
260+
```sql
261+
-- Simple SELECT: parse once, reuse alias
262+
SELECT
263+
PARSE_JSON(payload) AS v,
264+
CAST(v['user_id'] AS VARCHAR) AS user_id,
265+
CAST(v['amount'] AS DOUBLE) AS amount,
266+
CAST(v['currency'] AS VARCHAR) AS currency
267+
FROM raw_events;
268+
```
269+
270+
- **With GROUP BY: use a CTE to pre-parse.** The lateral alias pattern breaks with GROUP BY because `payload` must be in the GROUP BY or an aggregate. Pre-parse in a CTE instead. The CTE goes *inside* `CREATE VIEW ... AS`, not before it:
271+
272+
```sql
273+
CREATE VIEW summary AS
274+
WITH parsed AS (
275+
SELECT *, PARSE_JSON(payload) AS v FROM raw_events
276+
)
277+
SELECT
278+
CAST(v['user_id'] AS VARCHAR) AS user_id,
279+
COUNT(*) AS cnt
280+
FROM parsed
281+
GROUP BY CAST(v['user_id'] AS VARCHAR);
282+
```
283+
259284
- For performance-critical paths, `CREATE TYPE` + `jsonstring_as_<type>` is more efficient than `PARSE_JSON` + bracket access.
260285

261286
#### Math functions
@@ -590,10 +615,12 @@ SELECT id, UPPER(name) AS name, SOUNDEX(name) AS sound FROM users
590615

591616
#### Bitwise (scalar)
592617

593-
| Function | Notes |
594-
|----------|-------|
595-
| `bit_and(a, b)`, `bit_or(a, b)`, `bit_xor(a, b)` | Scalar 2-arg form not supported; only aggregate `BIT_AND/OR/XOR(col)` exists |
596-
| `shiftleft`, `shiftright`, `shiftrightunsigned` | No bitwise shift syntax in Feldera |
618+
| Spark | Feldera | Notes |
619+
|-------|---------|-------|
620+
| `bit_and(a, b)` | `a & b` | Use bitwise operator |
621+
| `bit_or(a, b)` | `a \| b` | Use bitwise operator |
622+
| `bit_xor(a, b)` | `a ^ b` | Use bitwise operator |
623+
| `shiftleft`, `shiftright`, `shiftrightunsigned` || No bitwise shift syntax in Feldera |
597624

598625
#### Math
599626

0 commit comments

Comments
 (0)