Add medallion architecture demo by anandbraman · Pull Request #6389 · feldera/feldera

anandbraman · 2026-06-04T01:01:12Z

Ecommerce Medallion Architecture demo pipeline. Renamed the other demo files so that the medallion architecture demo shows up second

Describe Manual Test Plan

Ran the uv run crates/pipeline-manager/demos/run.py --api-url http://localhost:8080 command successfully

Checklist

Unit tests added/updated
Integration tests added/updated
Documentation updated
Changelog updated

Breaking Changes?

Mark if you think the answer is yes for any of these components:

OpenAPI / REST HTTP API / feldera-types / manager (What is a breaking change?)
Feldera SQL (Syntax, Semantics)
feldera-sqllib (incl. dependencies fxp, etc.) (What is a breaking change?)
Python SDK (What is a breaking change?)
fda (CLI arguments)
Adapters (including configuration)
Storage Format / Checkpoints
Others (specify)

Describe Incompatible Changes

Ecommerce Medallion Architecture demo pipeline. Renamed the other demo files so that the medallion architecture demo shows up second Signed-off-by: Anand Raman <anand.raman@feldera.com>

mythical-fred

Nice addition — medallion architecture is a great story to tell, and the bronze→silver→gold layering with LOCAL VIEWS for silver and MATERIALIZED for gold is exactly the right shape. The demo prose at the top is well written and the INSERT INTO bronze_suppliers ... 10000 ... trick to flip gold_realtime_inventory_alerts is a clean way to make incrementality visible in milliseconds.

A few semantic things to fix before merging, plus some cosmetic items. None of them break the demo, but a couple of them make the metrics technically wrong, which is bad in a showcase.

Main issues

gold_inventory_risk.avg_daily_units is SUM(quantity) / 30.0 over all-time orders, not over the last 30 days. There's no date filter on the sales subquery — SUM(quantity) covers the entire history of silver_confirmed_order_items. Dividing by 30 is then meaningless: if the dataset has 90 days of history, the "average daily units" is 3× too high; if it has 10 days, 3× too low. Either filter to a recent window (WHERE order_created_at > NOW() - INTERVAL '30' DAY) or divide by the actual number of days observed (COUNT(DISTINCT DATE_TRUNC(order_created_at, DAY))). As written, the CRITICAL/WARNING/OK classification is calibrated against a fictitious denominator. Inline comment with details.
gold_supplier_performance groups on (supplier_name, supplier_country, lead_time_days) rather than on supplier_id. This is the only join key all the way down — orders_by_supplier / products_by_supplier / orders_count_by_supplier / silver_inventory_by_supplier all hang off it. Two different suppliers that happen to share a (name, country, lead_time) tuple will silently collapse into one row, and silver_inventory_by_supplier further only groups by supplier_name, which compounds it. Use supplier_id as the grain throughout and carry the descriptive columns along for display.
The connector story disagrees with the connector config. The header says "fed by a Delta snapshot connector and then kept current by the CDC stream", and several Gold-view comments reference "MERGE commits transition order statuses (the core incremental demo)". But every bronze_* table only has a single mode: snapshot delta_table_input connector — no follower CDC stream. After the initial snapshot the only way to see incrementality is the ad-hoc INSERT trick shown in the prose. Either drop the CDC language or add the CDC follower connector (mode: cdc / transaction_mode: cdc).

Smaller things

silver_orders_enriched inner-joins to the aggregated order_items subquery, so orders with zero qualifying line items disappear before reaching gold — including the cancelled-but-empty case. If that's intentional (only emit orders that actually have line items), worth a one-line comment; otherwise LEFT JOIN and COALESCE the aggregates.
silver_inventory_current.total_sold mixes two event types into one number: SUM(ABS(...)) WHERE event_type='sale_reserve' minus SUM(...) WHERE event_type='cancellation_restock'. The sign convention buried in quantity_change (sale_reserve presumably stored negative? cancellation_restock positive?) isn't documented anywhere in the SQL. Even one comment about the sign of quantity_change per event_type would save the next reader.
Formatting is inconsistent — first column of several CREATE TABLEs starts at column 0 (event_id, order_item_id, product_id, inventory_event_id) while later columns are indented 4 spaces; the 'skip_unused_columns'/'connectors' keys also drift between column 0 and indented. Running the file through a formatter would help. The two ORDER BYs in gold_weekly_revenue_trend and gold_cancellation_impact window specs sit at column 0 inside an already-indented OVER (...) clause — same fix.
The DDL has no WHERE filter on bronze_clickstream_events anywhere downstream — none of silver or gold references it. If it's there as future material for someone to demo from, fine; if it's leftover, drop it (it has the largest row count and the heaviest connector, so it costs the most to ingest).
Prose nit: "Append-and-update tables" in the bronze description, but the bronze tables are declared with PRIMARY KEY (so they're really upsert tables). "Append-only" in the inline comment a few lines down then contradicts the prose. Pick one.

Will be happy to re-approve once 1, 2, and 3 are addressed.

mythical-fred · 2026-06-04T04:08:10Z

+    JOIN (
+        SELECT product_id,
+            SUM(quantity) AS units_sold_recent,
+            SUM(quantity) / 30.0 AS avg_daily_units,


SUM(quantity) here is over all of silver_confirmed_order_items, with no WHERE order_created_at > NOW() - INTERVAL '30' DAY and no GROUP BY ... DATE_TRUNC(...). Dividing the lifetime total by 30 doesn't produce "average daily units" — it scales whatever history the demo dataset happens to contain by a fixed denominator. The downstream days_of_stock_remaining = total_stock / avg_daily_units and the CRITICAL/WARNING thresholds (< lead_time_days * 1.5 / * 3.0) inherit the same bias. Either bound the window or divide by the actual number of distinct days observed.

mythical-fred · 2026-06-04T04:08:10Z

+    SUM(total_restocked) AS total_restocked,
+    SUM(total_returned) AS total_returned
+FROM silver_inventory_current
+GROUP BY supplier_name;


Grouping on supplier_name alone (and on (supplier_name, supplier_country, lead_time_days) in gold_supplier_performance) silently merges any two suppliers that share those values. Use supplier_id as the grain throughout silver and gold; carry name/country/lead_time as descriptive columns.

lalithsuresh · 2026-06-04T12:13:40Z

+-- Instead of recomputing silver and gold layers on a schedule, Feldera will maintain everything incrementally, using a fraction of the compute required by batch engines.
+--
+-- ### Bronze — Raw Ingestion (`CREATE TABLE`)
+-- Append-and-update tables that mirror the source systems with no transformation. Each is fed by a Delta snapshot connector and then kept current by the CDC stream.


I assume the cdc stream is not always on right in the sandbox? How do we capture this visually?

the cdc stream is a python script that reads the files from s3 and then pushes the changes to the pipeline. Makes it interactive for the users. I'll add a link to the docs which contains the script and the spark notebook for comparison shortly!

Added demo and demo assets - spark notebook, script to push cdc changes. Made docs updates to sidebar and added a 3 part demo Signed-off-by: Anand Raman <anand.raman@feldera.com>

added medallion architecture demo

ca79740

Ecommerce Medallion Architecture demo pipeline. Renamed the other demo files so that the medallion architecture demo shows up second Signed-off-by: Anand Raman <anand.raman@feldera.com>

mythical-fred suggested changes Jun 4, 2026

View reviewed changes

lalithsuresh approved these changes Jun 4, 2026

View reviewed changes

anandbraman added 2 commits June 4, 2026 17:24

added documentation for the medallion architecture demo

081848c

Added demo and demo assets - spark notebook, script to push cdc changes. Made docs updates to sidebar and added a 3 part demo Signed-off-by: Anand Raman <anand.raman@feldera.com>

wording

ffac1ab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add medallion architecture demo#6389

Add medallion architecture demo#6389
anandbraman wants to merge 3 commits into
mainfrom
medallion-architecture-demo

anandbraman commented Jun 4, 2026

Uh oh!

mythical-fred left a comment

Uh oh!

mythical-fred Jun 4, 2026

Uh oh!

mythical-fred Jun 4, 2026

Uh oh!

Uh oh!

lalithsuresh Jun 4, 2026

Uh oh!

anandbraman Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

anandbraman commented Jun 4, 2026

Describe Manual Test Plan

Checklist

Breaking Changes?

Describe Incompatible Changes

Uh oh!

mythical-fred left a comment

Choose a reason for hiding this comment

Main issues

Smaller things

Uh oh!

mythical-fred Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

mythical-fred Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lalithsuresh Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

anandbraman Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants