Populate ClickHouse analytics tables when seeding preview projects #1471
Preview-mode deployments don't run the external-db-sync pipeline, so the ClickHouse analytics_internal.* tables stayed empty and the project overview dashboard reported 0 total users, 0 monthly active users, and no live users on the globe.

- seedDummyAnalyticsMirrorTables: mirror seeded users / teams / contact channels into analytics_internal.users/teams/contact_channels so the metrics endpoint reports real totals.
- seedDummyLiveTokenRefreshEvents: emit recent $token-refresh events across distinct countries so the overview globe shows live users.
- bulkRandomTimestampOnDay and the page-view/click timestamps: clamp so seeded events are never dated in the future.
- buildTokenRefreshClickhouseRow: shared helper for the $token-refresh ClickHouse row shape.
- create-project: pre-warm the ClickHouse connection so the seeding inserts don't pay the cold-start cost.
- projects-metrics: type the ClickHouse .json() results.

Also includes a seeding performance optimization that skips redundant idempotency lookups when seeding a brand-new project.
📝 Walkthrough

Adds a freshProject flag across seeders, reuses and pre-warms a ClickHouse admin client for preview seeding, batches team creation, centralizes token-refresh ClickHouse row construction, clamps generated timestamps to avoid future events, and reorders orchestration to defer payments.

Changes: Preview seeding performance and idempotency optimization
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Greptile Summary

This PR fixes the preview-mode dashboard showing zero users/activity by directly seeding ClickHouse
Confidence Score: 4/5

Safe to merge; the new seeding functions are preview/demo-only and don't touch production user data paths.

The implementation is well-structured: mirror table inserts use `sync_sequence_id = 0` so real pipeline rows always win under `FINAL`, timestamp clamping prevents the future-dated-event bug it was designed to fix, and the fresh-project fast path correctly skips idempotency overhead. Two minor quality concerns: the ClickHouse pre-warm creates a throwaway client (the service wake-up benefit is real, but per-connection TLS is not amortized), and the `pvTime`/`clickTime` clamp to `now.getTime()` clusters multiple demo events at the exact same millisecond.

seed-dummy-data.ts: the two new seeding functions and the timestamp clamping logic are the most behaviorally dense additions and worth a careful read.
Sequence Diagram

sequenceDiagram
participant Route as create-project/route.tsx
participant Seed as seedDummyProject
participant PG as Postgres
participant CH as ClickHouse
Route->>CH: SELECT 1 (pre-warm, unawaited)
Route->>Seed: seedDummyProject()
Seed->>PG: seedDummyTeams / seedDummyUsers
Seed->>PG: seedBulkSignupsAndActivity (async)
Seed->>PG: seedDummyEmails / SessionActivity / SessionReplays
Note over CH: service already awake
Seed->>CH: bulk $token-refresh / $page-view / $click events
Seed->>PG: await bulkSignupsPromise
par mirror tables
Seed->>PG: findMany users/teams/contacts
Seed->>CH: INSERT analytics_internal.users/teams/contact_channels
and live events
Seed->>PG: findMany projectUsers (non-anon, take 8)
Seed->>CH: INSERT 8 $token-refresh events at ~now
end
Seed->>Seed: seedDummyTransactions (fire-and-forget in preview)
Seed-->>Route: projectId
Route->>CH: await clickhouseWarmup (already resolved)
Route-->>Route: return project_id
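
The confidence note above says seeded mirror rows use `sync_sequence_id = 0` so that real pipeline rows win under `FINAL`. The following is a minimal in-memory sketch of that resolution rule, not ClickHouse code. It assumes `sync_sequence_id` acts as the table's version column; the `MirrorRow` shape and the `resolveFinal` function are hypothetical names used only for illustration.

```typescript
// Hypothetical row shape; only `sync_sequence_id` is named in the PR.
type MirrorRow = { id: string; displayName: string; sync_sequence_id: number };

// In-memory model of "keep the highest version per key", which is how a
// ReplacingMergeTree with a version column resolves duplicates under FINAL.
// Ties go to the later row, mirroring last-insert-wins.
function resolveFinal(rows: MirrorRow[]): Map<string, MirrorRow> {
  const winners = new Map<string, MirrorRow>();
  for (const row of rows) {
    const current = winners.get(row.id);
    if (current === undefined || row.sync_sequence_id >= current.sync_sequence_id) {
      winners.set(row.id, row);
    }
  }
  return winners;
}

// A real pipeline row (version 42) and a seeded placeholder (version 0)
// for the same user, plus a user that only exists as a seeded row.
const resolved = resolveFinal([
  { id: "u1", displayName: "Real user", sync_sequence_id: 42 },
  { id: "u1", displayName: "Seeded placeholder", sync_sequence_id: 0 },
  { id: "u2", displayName: "Seeded only", sync_sequence_id: 0 },
]);
```

Under this model the placeholder for `u1` loses to the real row regardless of insert order, while `u2` keeps its seeded row because nothing with a higher version exists.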
Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/backend/src/app/api/latest/internal/preview/create-project/route.tsx:44-46
**Warmup discards the warmed client**
`getClickhouseAdminClient()` calls `createClient(...)` on every invocation, so the client used for `SELECT 1` is immediately abandoned. Every subsequent call inside `seedDummyProject` — `seedDummyAnalyticsMirrorTables`, `seedDummyLiveTokenRefreshEvents`, etc. — creates its own fresh HTTP client and pays its own TLS negotiation. The warmup still triggers the ClickHouse Cloud service wake-up (which is indeed the dominant ~0.7 s cost), so the net effect is positive; but the per-client TLS round-trips are not amortized the way the comment implies. If the full TLS cost matters, consider caching a singleton client and passing it into `seedDummyProject`.
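
One way to read the "cache a singleton client" suggestion is a small memoizing wrapper around the client factory. The sketch below is an assumption about the shape of such a fix, not the repository's code: `memoizeClient` and `FakeClient` are hypothetical, and the fake factory stands in for whatever `getClickhouseAdminClient()` actually constructs.

```typescript
// Generic memoizer: the first call runs `create`, later calls reuse the result.
function memoizeClient<T>(create: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: create() };
    }
    return cached.value;
  };
}

// Stand-in for a real client factory; counts how many clients get built.
let clientsCreated = 0;
type FakeClient = { id: number };
const getSharedClient = memoizeClient<FakeClient>(() => ({ id: ++clientsCreated }));

// The warm-up and a later seeder both ask for a client and get the same one.
const warmupClient = getSharedClient();
const seederClient = getSharedClient();
```

Passing the shared client into the seeders explicitly, as the review suggests, would make the reuse visible at the call site instead of relying on module-level state.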
### Issue 2 of 2
apps/backend/src/lib/seed-dummy-data.ts:1932-1934
**Event timestamp clustering at `now`**
Both page-view and click offsets can push a same-day `visitTime` past the current moment, so the `Math.min` clamp collapses them to the exact same `now.getTime()` value. In practice this means dozens of `$page-view` and `$click` events share one millisecond, producing an unnatural spike at the seeding instant in the analytics time-series. Clamping to `now - 1` (or a small random jitter below `now`) would preserve the visual spread without ever producing future-dated events.
```suggestion
// Clamp to `now - 1ms`: visitTime is already clamped, but adding the
// offset can push a same-day event past `now` into the future. Using
// `now - 1` keeps the timestamp strictly in the past so it doesn't
// appear as a "current" spike in the analytics time-series.
const pvTime = new Date(Math.min(visitTime.getTime() + pvOffset, now.getTime() - 1));
```
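
The review also mentions "a small random jitter below `now`" as an alternative to the fixed `now - 1` clamp. A minimal sketch of that variant follows, assuming timestamps are handled as epoch milliseconds. `clampBeforeNow`, its default 60-second window, and the injectable `random` parameter are hypothetical and not part of the PR.

```typescript
// Clamp a candidate timestamp (ms since epoch) so it lands strictly before `nowMs`.
// Candidates already in the past are returned unchanged; candidates at or after
// `nowMs` are moved into the window [nowMs - maxJitterMs, nowMs - 1].
function clampBeforeNow(
  candidateMs: number,
  nowMs: number,
  maxJitterMs: number = 60_000,
  random: () => number = Math.random,
): number {
  if (candidateMs < nowMs) return candidateMs;
  // random() is in [0, 1), so the jitter is an integer in 1..maxJitterMs.
  const jitter = 1 + Math.floor(random() * maxJitterMs);
  return nowMs - jitter;
}

// Usage: an event that would have landed 5 s in the future is pulled back
// to somewhere within the last minute.
const nowMs = Date.now();
const pvTime = new Date(clampBeforeNow(nowMs + 5_000, nowMs));
```

Spreading clamped events over a window keeps them strictly in the past while avoiding a pile-up on a single millisecond; the window size is a presentation choice rather than a correctness requirement.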
Reviews (1): Last reviewed commit: "Populate ClickHouse analytics tables whe..."
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/backend/src/app/api/latest/internal/preview/create-project/route.tsx`:
- Around line 39-46: The clickhouseWarmup promise currently swallows errors via
.then(() => undefined, () => undefined) so failures never surface to the later
await and seedDummyProject can't detect ClickHouse issues; change the warm-up to
propagate failures (e.g. remove the rejection handler so the rejected promise
bubbles) or explicitly log and rethrow the error from the rejection handler
returned by getClickhouseAdminClient().command({ query: "SELECT 1" }) so await
clickhouseWarmup will fail and upstream code (seedDummyProject) can handle the
error.
In `@apps/backend/src/lib/seed-dummy-data.ts`:
- Around line 2049-2053: freshProject is used to skip idempotency but ClickHouse
writes remain append-only, so rerunning seeds duplicates analytics; modify
seed-dummy-data.ts to make reseeds idempotent by deleting or replacing the
previously-seeded rows in analytics_internal.events when freshProject is false:
add a targeted DELETE (or invoke a ReplacingMergeTree/replace path) for the
seeded event signatures/tokens before inserting in
seedDummySessionActivityEvents and seedBulkSignupsAndActivity, using the same
identifying fields/tokens those functions generate so you only remove the prior
seed rows for that project rather than all events.
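
The first inline comment above asks for a warm-up that logs and rethrows instead of swallowing errors. A rough sketch of that pattern is shown here under stated assumptions: `startWarmup`, `ping`, and `logError` are hypothetical names, and `ping` stands in for the `SELECT 1` command issued against ClickHouse.

```typescript
// Start a warm-up without awaiting it. Failures are logged and rethrown so a
// later `await` on the returned promise still rejects.
function startWarmup(
  ping: () => Promise<unknown>,
  logError: (message: string, error: unknown) => void,
): Promise<void> {
  const warmup = ping().then(
    () => undefined,
    (error: unknown) => {
      logError("ClickHouse warm-up failed", error);
      throw error;
    },
  );
  // Attach a no-op handler on a *derived* promise so a failure that happens
  // before anyone awaits `warmup` is not reported as an unhandled rejection.
  // `warmup` itself is unaffected and still rejects for later awaiters.
  warmup.catch(() => undefined);
  return warmup;
}
```

Whether the route should fail when the warm-up fails is a design decision the review leaves to the author; this sketch only shows how to keep the error observable.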
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 25e6e66f-9d5a-4737-83f1-08daebd95936
📒 Files selected for processing (3)
apps/backend/src/app/api/latest/internal/preview/create-project/route.tsx
apps/backend/src/app/api/latest/internal/projects-metrics/route.tsx
apps/backend/src/lib/seed-dummy-data.ts
- Reuse a single ClickHouse client across the preview create-project route's warm-up and every analytics seeder, so the connection/TLS handshake is established once instead of per seeder.
- Format $token-refresh event_at consistently in buildTokenRefreshClickhouseRow so the historical and live seeders write identical timestamp strings.
- Clear a project's previously-seeded analytics_internal.events rows before reseeding an existing project, so reseeds refresh analytics instead of duplicating them (the events table is append-only).
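
The clear-before-reseed behavior described in the last item can be modeled in memory as follows. This is an illustrative sketch only: `AppendOnlyEvents`, `reseedEvents`, and the `seeded` marker are hypothetical, and how the real seeder identifies its own rows in `analytics_internal.events` is not shown in this conversation.

```typescript
// Hypothetical event row; `seeded` stands in for whatever identifies seed rows.
type EventRow = { projectId: string; eventType: string; seeded: boolean };

// In-memory stand-in for an append-only events table.
class AppendOnlyEvents {
  private rows: EventRow[] = [];
  insert(rows: EventRow[]): void {
    this.rows.push(...rows);
  }
  deleteWhere(predicate: (row: EventRow) => boolean): void {
    this.rows = this.rows.filter((row) => !predicate(row));
  }
  count(projectId: string): number {
    return this.rows.filter((row) => row.projectId === projectId).length;
  }
}

function reseedEvents(
  table: AppendOnlyEvents,
  projectId: string,
  freshProject: boolean,
  rows: EventRow[],
): void {
  // A brand-new project has nothing to clear, so the delete is skipped.
  if (!freshProject) {
    table.deleteWhere((row) => row.projectId === projectId && row.seeded);
  }
  table.insert(rows);
}
```

Scoping the delete to one project's seeded rows is what keeps a reseed from touching other projects or any real events recorded for the same project.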
🧹 Nitpick comments (1)
apps/backend/src/lib/seed-dummy-data.ts (1)
430-432: 💤 Low value

Consider defensive check per coding guidelines.

The non-null assertion is structurally safe since `Promise.all` preserves array length, but per coding guidelines, explicit checks are preferred.

🔧 Optional defensive fix

```diff
 teamsToCreate.forEach((team, index) => {
-  teamNameToId.set(team.displayName, createdTeams[index]!.id);
+  const created = createdTeams[index] ?? throwErr(`Team creation result missing at index ${index}`);
+  teamNameToId.set(team.displayName, created.id);
 });
```

As per coding guidelines: "Code defensively; prefer `?? throwErr(...)` over non-null assertions with explicit error messages"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/backend/src/lib/seed-dummy-data.ts` around lines 430 - 432, The loop using teamsToCreate.forEach sets teamNameToId with createdTeams[index]!.id using a non-null assertion; replace that with a defensive nullish-coalescing check that throws a clear error if createdTeams[index] is missing (e.g., use createdTeams[index] ?? throwErr("…") before accessing .id), referring to teamsToCreate, createdTeams, teamNameToId, displayName and id so the assignment never uses the ! operator and fails loudly with an explicit message when the created team is absent.
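
For readers unfamiliar with the `?? throwErr(...)` idiom, here is a self-contained sketch of why it type-checks. The `throwErr` below is a local stand-in whose signature is an assumption (the real helper may differ), and `buildTeamNameToId` is a hypothetical wrapper around the loop under review.

```typescript
// Local stand-in for the `throwErr` helper the guideline refers to.
// Returning `never` lets it sit on the right-hand side of `??` without
// widening the resulting type.
function throwErr(message: string): never {
  throw new Error(message);
}

type Team = { id: string };

function buildTeamNameToId(
  teamsToCreate: { displayName: string }[],
  createdTeams: Team[],
): Map<string, string> {
  const teamNameToId = new Map<string, string>();
  teamsToCreate.forEach((team, index) => {
    // Fails loudly with a descriptive message instead of relying on `!`.
    const created = createdTeams[index] ?? throwErr(`Team creation result missing at index ${index}`);
    teamNameToId.set(team.displayName, created.id);
  });
  return teamNameToId;
}
```

If the two arrays ever drift out of sync, the error names the offending index rather than surfacing later as a property access on `undefined`.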
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e8721cf-1448-4e13-8572-404f6c9f5eee
📒 Files selected for processing (3)
apps/backend/src/app/api/latest/internal/preview/create-project/route.tsx
apps/backend/src/lib/clickhouse.tsx
apps/backend/src/lib/seed-dummy-data.ts
…t/route.tsx Co-authored-by: Konsti Wohlwend <n2d4xc@gmail.com>
The function was used on line 48 but never imported, causing a runtime ReferenceError. Add the import alongside the other stack-shared imports.
Summary
In preview-mode deployments (`NEXT_PUBLIC_STACK_IS_PREVIEW=true`) the project overview dashboard reported 0 total users, 0 monthly active users, and no live users on the globe. The internal metrics endpoint reads user/team totals from the ClickHouse `analytics_internal.*` tables and "live users" from recent `$token-refresh` events, but those tables are normally filled by the external-db-sync pipeline, which does not run in preview deployments, so they were empty.

This makes the preview/demo dummy-data seeder populate ClickHouse directly:

- `seedDummyAnalyticsMirrorTables`: mirrors the seeded users / teams / contact channels into `analytics_internal.users/teams/contact_channels` so the metrics endpoint reports real totals.
- `seedDummyLiveTokenRefreshEvents`: emits recent `$token-refresh` events across distinct countries so the overview globe shows live users.
- `bulkRandomTimestampOnDay` and the page-view/click timestamps are clamped so seeded events are never dated in the future (future-dated events permanently matched the unbounded "live users" query).
- `buildTokenRefreshClickhouseRow`: shared helper for the `$token-refresh` ClickHouse row shape.
- `create-project`: pre-warms the ClickHouse connection so the seeding inserts don't pay the cold-start cost.
- `projects-metrics`: types the ClickHouse `.json()` results (fixes a `tsc` error).

Also bundles a seeding performance optimization that skips redundant idempotency lookups when seeding a brand-new project.

Notes:

- Mirror-table rows are inserted with `sync_sequence_id = 0` so that if the external-db-sync pipeline ever does run for the project, any real update supersedes the seeded placeholder under `ReplacingMergeTree` + `FINAL`.

Test plan

- `pnpm --filter @stackframe/backend typecheck` passes
- `pnpm --filter @stackframe/backend lint` passes
- `analytics_internal.users/teams/contact_channels` populated for the seeded project
- `$token-refresh` events in `analytics_internal.events`

Summary by CodeRabbit