|
| 1 | +# Zero ETL Materialized View templates |
| 2 | + |
| 3 | +This directory holds one generated SQL ERB template per Redshift materialized view in the Zero ETL |
| 4 | +analytics export. For each model that declares `export_to_analytics`, there are up to two files: |
| 5 | + |
| 6 | +- `<table>.sql.erb` — the **non-PII** view, created as `learning_platform_<env>.<table>`. |
| 7 | +- `<table>_pii.sql.erb` — the **PII** view, created as `learning_platform_<env>_pii.<table>`. |
| 8 | + |
| 9 | +The `<%=environment_type%>` ERB placeholders are filled in at provision time (`test`, `production`), |
| 10 | +so one template serves every environment. The materialized views read the Learning Platform MySQL |
| 11 | +data that Zero ETL continuously replicates into Redshift. |
| 12 | + |
| 13 | +## These files are generated — do not edit them by hand |
| 14 | + |
| 15 | +They are produced by `Cdo::Aws::Redshift::MaterializedViewManager.generate_all_ddl_templates` from |
| 16 | +the current ActiveRecord models. Regeneration is triggered: |
| 17 | + |
| 18 | +- automatically by `rake db:migrate` (development), so a migration that reshapes an exported table |
| 19 | + shows up here as a template diff; |
| 20 | +- on demand by `bundle exec rake analytics_export:generate_materialized_view_templates`; |
| 21 | +- and at the start of `analytics_export:provision_materialized_views`. |
| 22 | + |
| 23 | +To change a view, change its source — a Rails migration (columns) or the model's `data_classification` |
| 24 | +declaration (which columns land in which view) — then regenerate. Editing a `.sql.erb` directly will |
| 25 | +be overwritten on the next regeneration. |
| 26 | + |
| 27 | +Which columns appear is governed by each column's data classification: |
| 28 | + |
| 29 | +- non-PII view (`learning_platform_<env>`): `:public` and `:confidential` columns. |
| 30 | +- PII view (`learning_platform_<env>_pii`): `:public`, `:confidential`, and `:restricted` columns. |
| 31 | +- `:highly_restricted` columns appear in neither view. |
| 32 | + |
| 33 | +A model whose non-PII (or PII) projection has no columns produces no corresponding template — and a |
| 34 | +template that no longer maps to any view is pruned during regeneration, so a deleted file here means |
| 35 | +that view no longer exists. |
| 36 | + |
| 37 | +## Committing these files does NOT change Redshift |
| 38 | + |
| 39 | +Merging a change to this directory does **not** update the materialized views provisioned in the |
| 40 | +Redshift cluster. These templates only **record the pending change** so it is visible in code review |
| 41 | +and git history. Redshift cannot `ALTER` a materialized view, so any column or classification change |
| 42 | +requires a coordinated `DROP` + `CREATE` of the affected views — which drops and recomputes them and |
| 43 | +can briefly interrupt the data team's downstream dbt models and reports. We therefore never deploy |
| 44 | +these changes automatically. |
| 45 | + |
| 46 | +## Deploying the change to Redshift |
| 47 | + |
| 48 | +Before (or shortly after) merging a Pull Request that changes this directory, coordinate with the |
| 49 | +data analytics ("RED") team and the Infrastructure Engineering team to schedule the rebuild. An |
| 50 | +Infrastructure Engineer with admin AWS credentials then runs the provision task on their workstation: |
| 51 | + |
| 52 | + VALIDATE (preview — submits nothing to Redshift): |
| 53 | + |
| 54 | + code-dot-org $ export AWS_PROFILE=codeorg-admin |
| 55 | + code-dot-org/dashboard $ DRY_RUN=1 bundle exec rake 'analytics_export:provision_materialized_views[production]' |
| 56 | + # Review the Add / Update / Drop plan. It should list only the views changed in this branch. |
| 57 | + |
| 58 | + EXECUTE (DROP + CREATE the changed views, drop orphans): |
| 59 | + |
| 60 | + code-dot-org/dashboard $ bundle exec rake 'analytics_export:provision_materialized_views[production]' |
| 61 | + |
| 62 | +`provision_materialized_views` rebuilds only the views whose DDL actually changed (it compares a hash |
| 63 | +stored on each view) and drops views no longer backed by a model. After it completes, the views are |
| 64 | +empty until refreshed; `analytics_export:refresh_materialized_views[production]` (or the daily export |
| 65 | +job) populates them. Use `analytics_export:materialized_view_status[production]` to check state. |
| 66 | + |
| 67 | +See `dashboard/lib/tasks/analytics_exportable.rake` for the full task list and |
| 68 | +`lib/cdo/aws/redshift/materialized_view_manager.rb` for the generator and provisioner. |
0 commit comments