This document is a technical and delivery plan for recreating DeepNotes in a new codebase while preserving user data and crypto semantics and planning a coordinated cutover; it does not promise wire compatibility with the legacy tRPC or KeyDB stack (see Decided stack below). It complements the product and architecture summaries in NON_TECHNICAL_OVERVIEW.md and TECHNICAL_OVERVIEW.md, and reflects a thorough read of the current monorepo layout, API boundaries, and operational patterns.
Audience: engineers and technical leads who will scope work, own compatibility, and sequence migration.
Non-goals here: detailed UI redesign, pricing, or product roadmap—only what is required to restart the implementation safely.
The following are set decisions for the new implementation (they override earlier “strawman” text elsewhere in this doc where they conflict).
| Area | Choice |
|---|---|
| HTTP API | No tRPC. Use a conventional HTTP API (e.g. REST under /api/... or a small set of versioned resource routes) with request/response schemas in code (Zod or Valibot) and a published OpenAPI spec. The client uses plain fetch or a thin typed client generated from that spec. |
| ORM / SQL | Drizzle for schema, migrations, and queries in the new project—replacing Knex + Objection from the current codebase. |
| Cache / sessions | Redis (standard open-source Redis or a managed Redis-compatible service). Not KeyDB as a hard dependency. |
| Key rotation | Removed from the product and codebase to reduce complexity. No user/group “rotate keys” WebSocket flows; no scheduled page key re-encryption in collab (next_key_rotation_date and related logic in the current app are dropped, not reimplemented). Existing ciphertext in Postgres that was encrypted with the current keys remains valid; you simply stop the rotation machinery. Revisit only if a future security incident requires forced re-keying. |
| Mobile / IAP billing | RevenueCat is out of scope for the new stack (no webhook, no client integration). Stripe remains the web subscription source of truth where billing applies. |
| Legacy wire compatibility | The tRPC (/trpc/...) wire protocol and superjson-shaped payloads are not a compatibility target. Backward compatibility for this plan means data (Postgres + decryptable page blobs with existing keys) and, where you choose to keep them, WebSocket protocols for realtime and collab—not HTTP API parity with the old app. Migration = one coordinated cutover to the new client and API (or a temporary legacy API gateway, which you are not committing to by default). |
| Hosting | Cloudflare as the primary production surface: Workers for the HTTP API (and Worker-backed WebSocket entrypoints where appropriate), Cloudflare Pages (or Workers static assets) for the built SPA and vite-ssg marketing output. PostgreSQL stays a separate managed database (not D1 for the main app DB unless you later commit to a full Postgres→D1 migration—out of scope for this plan); connect from Workers via Hyperdrive so each request uses a pooled connection instead of paying full TCP/TLS/auth cost on every isolate. Redis has no first-party Cloudflare equivalent—use Upstash Redis, Redis Cloud, or another Redis-compatible TCP or HTTP service reachable from Workers, with credentials in Wrangler secrets / dashboard. |
Redis vs KeyDB caveat: the current DataAbstraction layer in @stdlib/data defines Redis commands such as expiremember, which is a KeyDB / Redis Stack–style extension, not part of standard Redis. A migration to “normal Redis” must reimplement TTL and cache invalidation using standard commands (e.g. per-key EXPIRE, key naming, or hashes without field-level EXPIRE unless you accept HEXPIRE only on Redis 8+ or similar—decide in implementation).
Workers are not full Node by default: prefer frameworks that fit the Workers model (Hono is a strong default on Workers). If you standardize on Fastify, validate Wrangler compatibility_flags (e.g. nodejs_compat) and dependency support early—some npm packages assume long-lived processes or Node APIs Workers do not provide.
| Topic | Implication |
|---|---|
| Postgres + Drizzle | Point Drizzle/pg / Postgres.js at the Hyperdrive connection string (create a new client per request; Hyperdrive pools upstream). Avoid opening a raw remote Postgres connection from every Worker invocation without Hyperdrive—latency and connection limits will hurt. |
| Redis | Use an external Redis-compatible service (e.g. Upstash). Wire REDIS_URL (or vendor-specific HTTP APIs) via env bindings/secrets; document any TCP vs HTTP client choice for Workers. |
| Realtime / collab WebSockets | Stateful rooms (Yjs, presence, fan-out) map naturally to Durable Objects (WebSocket hibernation APIs where you need many idle connections). A plain Worker fetch upgrade can work for thin proxies, but collab-scale state belongs in DOs or a dedicated service—decide per protocol and load-test. |
| Static SPA + SSG | Pages (linked to the repo build) or Workers static assets for dist/; keep API on a Workers route or subdomain (api.…) so cookies, CORS, and Stripe webhooks have a clear origin story. |
| Scheduler / cron | Replace long-running scheduler processes with Cron Triggers on Workers where the job is idempotent and short; heavier work can enqueue to a queue (Queues) or stay on a small VM if you outgrow Worker CPU limits. |
| Observability | Workers Logs, Tail Workers, and tracing integrations replace “SSH and Prometheus on a box” for the edge tier; keep /metrics on any non-Worker services you retain, or adopt CF-compatible metrics. |
| Limits and cost | CPU time, concurrent requests, Durable Object billing, and Hyperdrive query limits are product inputs—set SLOs and load-test collab early. |
| Secrets | Wrangler secrets / dashboard for production; never ship DB or signing keys in the client bundle. |
| AGPL | Hosting on Cloudflare does not change AGPL-3.0 obligations; source remains available per license. |
Local dev: keep Docker Compose (Postgres + Redis) for laptops and CI; use wrangler dev with Hyperdrive local configuration to approximate production DB behavior.
| Goal | Meaning in practice |
|---|---|
| New project | A separate repository or clearly isolated worktree, with a modern default toolchain (lockfile, Node LTS, CI) chosen deliberately—not inherited from 2022-era constraints. |
| Backward compatible (data + crypto) | PostgreSQL data that existing users rely on: rows and bytea blobs remain readable after the migration, using the same client-side and server-stored key material as today, without the old tRPC HTTP contract. Cookie + JWT patterns can stay familiar to users, but the JSON bodies and paths of the new HTTP API are new. Realtime and collab WebSocket binary protocols are optional compatibility targets: simplest path is a new client written against documented protocols, whether or not they are byte-for-byte identical to the old server. |
| Better maintenance | Clear module boundaries, OpenAPI as the contract, Drizzle migrations, test coverage where risk is high (auth, crypto, payments, data transitions), and faster dev feedback (no default tsx --inspect-brk in hot paths, modern bundler, smaller “forked dependency” surface). |
License and obligations: the project is AGPL-3.0 (LICENSE); a restart does not change copyleft or deployment obligations. Keep compliance visible in the new repo.
The existing stack is already described accurately in TECHNICAL_OVERVIEW.md. These points matter most for a restart:
- Monorepo: pnpm workspaces + Turbo; TypeScript project references build server apps and shared
packages/*, while the Quasar client is compiled by Vite/Quasar (not the roottscgraph for the SPA code). - Client (
@deepnotes/client): Vue 3.2 + forked Quasar (@deepnotes/quasar,@deepnotes/quasar-app-vite), Vite ~2.9, Pinia, Tiptap + Yjs + SyncedStore; depends on workspace@deepnotes/app-serverfor the tRPCAppRoutertype. - App server (legacy): Fastify + tRPC v10 on
/trpc+ separate WebSocket handlers undersrc/websocket/(group invites, page moves, password/email, key rotation—dropped in the new stack, etc.); Stripe + (legacy) RevenueCat webhooks; Objection + Knex + Postgres; KeyDB viaDataAbstractionin@stdlib/data(large, cross-cutting). - Other services:
realtime-server(msgpackr-style live protocol),collab-server(Yjs-aligned binary protocol, path e.g.COLLAB_SERVER_URL/page:{pageId}on the client),scheduler,managerCLI. - Schema management:
postgres-init.sqlis a full dump-style artifact; there is no in-repo chain of versioned SQL migrations (a major operational and compatibility risk for any “new project” that must coexist with production).
- Type coupling: the client imports
@deepnotes/app-serverto shareAppRouter, wiring UI code to the entire server package graph. The new stack replaces that with OpenAPI-generated (or Zod-shared) client types. - Data layer:
DataAbstractioncentralizes Redis + pub/sub + in-process LRU and domain-specific hashes; it is powerful but hard to replace incrementally and scatters “truth” about cache key semantics. - Split protocols: business logic is split between tRPC and two WebSocket systems (app + realtime + collab), with sensitive flows (password change, legacy key rotation, etc.) on app-server WebSockets—duplication and more surfaces to regression-test. The restart consolidates HTTP behind one REST+OpenAPI layer and drops key-rotation paths entirely.
- Forked/patched dependencies: the client list includes
@deepnotes/*scoping for Quasar, Vite app plugin,superjson,ioredis,html2canvas, tiptap collaboration cursor, anddotenv-expandpatches—each is ongoing security and upgrade debt.
- Vite 2 and old Quasar app pipelines are far behind current Vite performance and ecosystem; 4 GB heap in client build scripts is a red flag.
- pnpm 7.6.0 and root
engines.node:>=14are out of step with modern LTS and with CI (e.g. Node 18/20 in places per docs)—harmonize early in a new project. - Client is outside
tsconfig.packages.jsonreferences, so the typecheck story is split (Quasar/Vue vs. packages), which often hurts “instant” feedback in the IDE.
- Server
devscripts in several apps usetsx --inspect-brk(breakpoint at startup unless changed)—fine for one-off debugging, expensive as a daily default. - Sparse automated tests in
apps/clientandapps/app-server(documented in TECHNICAL_OVERVIEW.md); most coverage lives in@stdlib/*and@deeplib/misc, so refactors are high manual verification cost.
- Migrations: without a repeatable, ordered migration story, any new service that shares the same DB is gambling on one-off DBA steps.
- Multi-service deployment: five long-running entrypoints (client builds aside) increase coordination cost; a restart is an opportunity to document and eventually consolidate (only after contracts are clear).
- OpenAPI 3 spec (generated or hand-maintained) + Zod/Valibot as the single source of truth for request/response bodies.
- Drizzle
schema.ts(or split table files) drives migrations; the OpenAPI and DB layer stay in sync by convention and CI checks (or codegen from one source—pick one pattern and stick to it).
Legacy tRPC (reference only for migration and parity checklists, not a wire target): the old app exposed ~60+ procedure implementations under apps/app-server/src/trpc/api/, with routers for users (account, pages), sessions, groups, pages. When porting behavior, use that tree and router.ts as a checklist of features to re-cover in REST, not as something to emulate on the wire.
- The old tRPC paths (
POST /trpc/...,superjson) and relying on@deepnotes/app-server’sAppRouterin the client go away. - Feature parity (login, page CRUD, groups, etc.) is implemented as new REST (or resource-style) endpoints. Old mobile/desktop builds that still call
/trpcwould need a dedicated compatibility gateway—out of scope unless you add it explicitly for a straggler user cohort.
srv/fastify registers hand-written handlers under apps/app-server/src/websocket/ (groups, pages/move, user account, key rotation for users and groups). For the new stack:
- Reimplement the flows you still need (e.g. password change, page move, group invites) on new WebSocket paths or the REST layer, with fresh message shapes documented next to OpenAPI.
- Do not reimplement user or group rotate keys (see Decided stack above).
- If you need one “sensitive” channel, prefer fewer long-lived WebSockets and more idempotent REST for anything that is not high-frequency.
- Realtime: JWT from cookies on HTTP upgrade; msgpackr-based custom protocol;
/metricson the HTTP side. Reference:apps/client/src/code/areas/realtime/. - Collab: per-page Yjs-style updates + awareness; URL pattern
…/page:{pageId}; binary messages via lib0 encodings (apps/client/src/code/pages/page/collab/websocket.ts). The legacy collab-server also encodes scheduled key rotation (next_key_rotation_date); the new service omits that path entirely (no background re-encryption; keys stay stable).
Capture message types (@deeplib/misc collab message enums) and on-the-wire order as golden fixtures for whatever you keep byte-compatible; otherwise treat as reimplement with a versioned collab protocol and one released client.
- HTTP-only cookies;
accessTokenandloggedIncan remain the names for WS auth on upgrade if you want minimal client change in those layers. - Redis (standard): session invalidation flags, rate limits, and any cache that replaces the old DataAbstraction-style projections—implemented with portable commands only (see Redis vs KeyDB caveat above). Replace
KEYDB_HOSTS/KEYDB_PASSWORDwith a conventionalREDIS_URL(orREDIS_HOST/PORT/PASSWORD) in the new app’stemplate.env.
- Drizzle should reflect the migrated schema: start from
postgres-init.sql, import or transcribe intoschema.ts, and emit migrations. Columns tied only to rotation (e.g.next_key_rotation_dateonpagesin the old model) can become unused and later dropped in a separate migration, or be left nullable inert during the first cut. - E2E: NON_TECHNICAL_OVERVIEW.md and the Whitepaper remain the product story; the new codebase needs tests for encrypt/decrypt and page load, without rotation code paths.
- Stripe (web) only for the new stack—webhook signing, customer portal, checkout. RevenueCat is not reimplemented. Existing subscribers who only used mobile IAP will need a one-time migration story (e.g. link to Stripe, grace period, or support-led)—set explicitly in product, not in this plan. Device / session tables are modeled in Drizzle as needed.
-
Contracts: OpenAPI + Zod/Valibot
A small@deepnotes/api(name TBD) package contains route handlers’ input/output types and a published OpenAPI document. The Drizzle package stays separate to avoid server importing UI and vice versa. For realtime / collab, add a short appendix (or separate JSON spec) for message kinds and field order. -
Drizzle from day one
One migration chain:drizzle-kit(or equivalent) applied to Postgres; devs never rely on a single frozenpostgres-init.sqlfor drift long term—use it only as the import source for the firstschema.ts. -
HTTP server
Hono on Cloudflare Workers is the default alignment with the Cloudflare hosting decision (same codebase path for REST, middleware, and fetch-handler tests). Fastify remains viable for Node-only targets (e.g. local scripts, a secondary deployment) if the team splits stacks—avoid assuming Fastify plugins work unchanged on Workers without verification. REST routes, no tRPC plugin. Cookie + JWT middleware shared with WebSocket upgrade paths. Rate limiting backed by Redis (see hosting table). -
Redis
Local / CI: Redis 7+ (or LTS) indocker-compose. Production (Cloudflare): managed Redis-compatible service (see Hosting row)—no KeyDB module assumptions. Replaces DataAbstraction with narrower, explicit repositories (cache-aside or simple keys + pub/sub if still needed for multi-instance cache coherence). -
New client application
- Vite 6+ + Vue 3.5+ as a standard SPA (using
vite-ssgfor marketing page SEO). Nuxt SSR is explicitly rejected because DeepNotes is end-to-end encrypted; the server cannot decrypt user content to render it for SEO anyway. - Feature-based folder structure (e.g.,
src/features/auth,src/features/editor) to co-locate components, API clients, and tests for better maintainability. fetch+ openapi-typescript (or hey-api client) generated from the spec—notrpcclient, nosuperjson, no Quasar.- Tiptap + Yjs for the editor if you want to cap risk; collab-server either forked to strip rotation or rewritten against the same Yjs wire.
- Capacitor for mobile and Tauri v2 (or Electron) for desktop after the web app is solid. Decoupling the UI from the native wrappers avoids the heavy Quasar build matrix.
- Vite 6+ + Vue 3.5+ as a standard SPA (using
-
CI/CD and observability
One CI, Node LTS matrix, E2E smoke. Production deploy: Wrangler (or Pages Git integration) to Cloudflare; preview deployments per PR where useful. Prometheus/metricson any long-lived non-Worker services; for Workers, use Cloudflare logging/metrics (and Tail / observability products) as the primary edge story.
- From the legacy
app-server, list tRPC procedures and WebSocket handlers and map them to proposed REST + WS resource names; produce a skeleton OpenAPI (endpoints can501at first). - Transcribe
postgres-init.sqlinto a Drizzle schema and generate migration 0001 (or squash later—goal is a repeatable chain). - Document cookie names, JWT claims, and CORS origins.
- List which
@deepnotes/*forks the new client can avoid entirely.
Exit: OpenAPI v0 + Drizzle schema in source control; feature checklist derived from the old tRPC tree.
Only if you still touch the old monorepo: remove default --inspect-brk, add minimal tests, and align Node/pnpm. Do not invest in extracting tRPC types for the new world—favor Phase 0 instead.
Exit: optional; can be skipped if the team goes straight to the new repository.
- New repo: pnpm + Turborepo 2 (or Nx)—Node 22/24 LTS.
- Docker compose: Postgres + Redis (not KeyDB). New env file with
REDIS_URL-style settings. - Cloudflare:
wrangler.toml(or Wrangler JSON), Hyperdrive config pointing at the same Postgres URL used locally (or a branch DB), Pages project for the client build output; document preview vs production env vars. - CI green: lint, typecheck,
drizzle-kit check, unit smoke; optional deploy job to a Cloudflare preview environment.
- Implement auth and sessions (cookies + JWT) and core pages / groups routes against Drizzle; add realtime and collab services without key-rotation and without tRPC.
- Load tests on collab and realtime only after the protocol is frozen.
- Stripe webhooks; no RevenueCat.
- Feature slice: auth → page list → single page → Yjs collab → groups subset.
- Reuse or port
@stdlib/crypto,@deeplib/miscwhere domain-stable; delete dead code as you go. - Electron and Capacitor after web parity (they multiply CI cost).
- Staged rollout: canary users, then full redirect; old
/trpcstack retired when no supported client still calls it (or keep a read-only legacy deployment for a defined window). - Decommission the old monorepo only when error rates, Stripe, E2E, and data checks (random page decrypt) are green.
| Risk | Mitigation |
|---|---|
| REST hand-written drift vs OpenAPI | Generate types from the spec (or use Zod-to-OpenAPI) and test 4xx/5xx contracts in CI. |
| Collab binary protocol mismatch or dropped rotation | Decide byte parity vs bump collab v2; for v2, ship one new client and retire old together. Document that rotation is no longer a safety valve—rely on strong at-rest and transport crypto without periodic rekey. |
Redis lacks KeyDB’s expiremember and similar |
Redesign those fields as first-class keys or standard hash + TTL; benchmark before cutover. |
| Stripe-only after dropping RevenueCat | User comms and support scripts for any IAP-only customers; one-time data fix if users has provider-specific fields. |
| Dropping key rotation in collab with live old servers | Cut over collab and app together so no mixed fleet runs incompatible rotation expectations. |
| 2FA and group password flows | Still re-test hard; rotation removal does not remove all crypto edge cases. |
| Migration mistakes on live Postgres | Staged env + backup + runbook; Drizzle migrations reviewed like production DDL. |
| Mobile and desktop matrices | Defer Capacitor/Tauri matrix; get web SPA (with vite-ssg for SEO) solid first. |
| Worker CPU time and DO costs under collab load | Load-test Durable Object fan-out and Hyperdrive early; model worst-case concurrent pages and websocket churn. |
| Framework assumes full Node | Prefer Hono on Workers; gate Fastify (or heavy native deps) behind a verified Workers profile or a non-CF deployment path. |
- OpenAPI is the source of truth for public HTTP; client uses generated types or shared Zod.
- Drizzle migrations apply from empty DB to current schema deterministically; production upgrade path is documented.
- < 2 s cold
devAPI start (noinspect-brkby default) on a standard laptop. - Collab + realtime each have at least one integration test against Redis + in-memory or dockerized deps.
- No tRPC and no
superjsonin the new default stack. No RevenueCat. Key rotation code paths are absent and the team signed off on IAP / Stripe user handling. - Zero undocumented framework forks in the new default client, or a short exception list with an owner.
- Cloudflare: API + static/SSG deploy documented; Hyperdrive + external Postgres + Redis proven in staging; collab/realtime path chosen (DO vs separate service) and load-tested.
- NON_TECHNICAL_OVERVIEW.md — product, privacy, plans, and limitations.
- TECHNICAL_OVERVIEW.md — architecture map, commands, and known caveats (the tRPC/KeyDB/RevenueCat parts describe the old app).
template.env— legacy env names; the new app introducesREDIS_*, drops KeyDB- specific names, and does not add RevenueCat variables.apps/app-server/src/trpc/router.tsandapps/app-server/src/trpc/api/**— legacy procedure checklist for feature parity, not a wire spec.apps/app-server/src/websocket/**— legacy WS; user/group rotate-keys are out of scope for the new product.postgres-init.sql— import baseline for Drizzleschema.ts.- Up-to-date Drizzle (schema + migrations) documentation for the version you pin (e.g. via the Context7 MCP in Cursor if available).
This restart is intentionally not tRPC- or KeyDB-compatible on the wire. Success depends on OpenAPI + REST, Drizzle migrations, vanilla Redis, a simpler crypto story (no key rotation, no RevenueCat), and a coordinated rollout of the new HTTP stack with realtime/collab and clients that no longer expect /trpc or scheduled re-keying. Production targets Cloudflare (Workers + Pages, Hyperdrive to Postgres, external Redis, Durable Objects where stateful WebSockets need them). Treat the old monorepo as a behavioral reference and a one-time source of schema and test vectors, then retire it when parity and data checks are proven.