
fix(core): tolerate orphan part projection from cascade-delete race #33134

Open
randomvariable wants to merge 1 commit into anomalyco:dev from randomvariable:fix-orphan-part-projection



randomvariable commented Jun 20, 2026


Issue for this PR

Closes #31990

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Occasionally the app crashes while projecting session events into SQLite, dumping a raw stack to the TUI with no log line. The failing query is an UPSERT into part during step/turn finish, failing with a foreign-key ConstraintError.

The cause is a cascade-delete race. MessageUpdated and PartUpdated are separate durable events, each committed in its own BEGIN IMMEDIATE transaction (event.ts), so they are not atomic with respect to each other. part.message_id is a NOT NULL FK to message.id with onDelete: cascade, and SQLite checks the FK at COMMIT. The sequence is: message M is committed; another fiber then publishes MessageRemoved(M) (from a revert, removeMessage, a compaction prune, or the summarize fork), which cascade-deletes M and its parts; meanwhile the aborting fiber's cleanup() still flushes a trailing snapshot patch part that references M. That part's insert fails the FK check at COMMIT, Effect.orDie turns the failure into a defect, and the fiber crashes.
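
The sequence can be reproduced with a small in-memory model. This is only a sketch: the `Store` class, its method names, and `ForeignKeyError` are hypothetical stand-ins for the message and part tables, and the FK check is modelled at write time rather than at COMMIT to keep the example short.

```typescript
// Hypothetical in-memory stand-in for the message/part tables; not the project's schema code.
class ForeignKeyError extends Error {}

class Store {
  readonly messages = new Set<string>();
  readonly parts = new Map<string, string>(); // partID -> messageID

  // Each method call models one durable event committed in its own transaction.
  messageUpdated(messageID: string): void {
    this.messages.add(messageID);
  }

  // Models onDelete: cascade, removing the message and every part that references it.
  messageRemoved(messageID: string): void {
    this.messages.delete(messageID);
    this.parts.forEach((parent, partID) => {
      if (parent === messageID) this.parts.delete(partID);
    });
  }

  // Models the upsert into part; the FK check rejects a missing parent row.
  partUpdated(partID: string, messageID: string): void {
    if (!this.messages.has(messageID)) {
      throw new ForeignKeyError(`part ${partID} references missing message ${messageID}`);
    }
    this.parts.set(partID, messageID);
  }
}

const store = new Store();
store.messageUpdated("M");    // fiber A: message M is committed
store.partUpdated("p1", "M"); // fiber A: an earlier part is committed
store.messageRemoved("M");    // fiber B: revert or prune cascade-deletes M and p1

let raceError: unknown;
try {
  store.partUpdated("p2", "M"); // fiber A cleanup: trailing snapshot patch part
} catch (error) {
  raceError = error; // without a guard, this is the failure that becomes a defect
}
```

Because every event commits separately, nothing prevents the removal from landing between the message commit and the trailing part write.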

Two changes:

  • event.ts: log eventID/eventType/aggregateID via Effect.tapError before Effect.orDie at the durable-commit funnel, so DB commit failures are visible in logs instead of only as a raw TUI stack. Crash behaviour is unchanged. tapError is sufficient here because the deferred-constraint commit failure surfaces as a typed SqlError in the error channel (the effect-sqlite layer rolls back and re-fails), not as a defect.
  • session/projector.ts: in the PartUpdated projection, skip and warn when the parent message row is gone, mirroring the FK's own cascade intent. This fabricates nothing and is replay-safe — under replay events apply in seq order, so the same MessageRemoved → late PartUpdated sequence deterministically hits the skip. The early return also skips the usage accounting, which is correct since no part is persisted.
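
A minimal sketch of the skip-and-warn guard and why it is deterministic under replay. The event shapes, the `Projection` state, and the `project`/`replay` helpers are hypothetical illustrations and not the actual projector.ts code; usage accounting is omitted.

```typescript
// Hypothetical shapes; names are illustrative, not the project's actual projector API.
type SessionEvent =
  | { type: "MessageUpdated"; messageID: string }
  | { type: "MessageRemoved"; messageID: string }
  | { type: "PartUpdated"; partID: string; messageID: string };

interface Projection {
  messages: Set<string>;
  parts: Map<string, string>; // partID -> messageID
  warnings: string[];
}

function project(state: Projection, event: SessionEvent): void {
  switch (event.type) {
    case "MessageUpdated":
      state.messages.add(event.messageID);
      return;
    case "MessageRemoved":
      // Mirror onDelete: cascade by dropping the message and its parts.
      state.messages.delete(event.messageID);
      state.parts.forEach((parent, partID) => {
        if (parent === event.messageID) state.parts.delete(partID);
      });
      return;
    case "PartUpdated":
      // Guard: the parent row is gone, so skip the write and record a warning.
      if (!state.messages.has(event.messageID)) {
        state.warnings.push(`skipped orphan part ${event.partID} for message ${event.messageID}`);
        return;
      }
      state.parts.set(event.partID, event.messageID);
      return;
  }
}

// Replay applies events strictly in sequence order against fresh state.
function replay(events: SessionEvent[]): Projection {
  const state: Projection = {
    messages: new Set<string>(),
    parts: new Map<string, string>(),
    warnings: [],
  };
  for (const event of events) project(state, event);
  return state;
}

const eventLog: SessionEvent[] = [
  { type: "MessageUpdated", messageID: "M" },
  { type: "MessageRemoved", messageID: "M" },
  { type: "PartUpdated", partID: "p1", messageID: "M" }, // late orphan part
];

const firstRun = replay(eventLog);
const secondRun = replay(eventLog);
```

Both runs end with no part rows and the same single warning, since the outcome depends only on the order of events in the log.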

This makes the orphan benign and logged. It does not stop the source from emitting the stray part; once the new log confirms which delete vector fires in practice, a source-side guard is a sensible follow-up.

How did you verify your code works?

  • Added a test in packages/core/test/session-projector.test.ts that projects MessageUpdated(M) → MessageRemoved(M) → a late PartUpdated(part, messageID=M) and asserts that no defect is raised and the part row is absent.
  • bun typecheck clean in packages/core.
  • bun test test/session-projector.test.ts → 11 pass / 0 fail.

Screenshots / recordings

Not a UI change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

A concurrent MessageRemoved cascade-delete can remove an assistant
message (and its parts) while an aborting fiber still flushes a trailing
snapshot patch part. The PartUpdated projection then upserts a part whose
message_id FK has no parent row; SQLite enforces the FK at COMMIT, the
transaction re-fails as a SqlError, and Effect.orDie turns it into a
defect that crashes the prompt fiber and prints a raw stack to the TUI
with no log line.

- event.ts: log (eventID/eventType/aggregateID) before orDie at the
  durable-commit funnel so DB commit failures are observable instead of
  surfacing only as a raw TUI stack.
- projector.ts: in the PartUpdated projection, skip-and-warn when the
  parent message row is absent, mirroring the FK's onDelete: cascade
  intent. Replay-safe: fabricates nothing, deterministic under seq order.

Fixes anomalyco#31990
github-actions bot added and then removed the needs:compliance label Jun 20, 2026
@github-actions

Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍



Development

Successfully merging this pull request may close these issues.

[Bug] SQLite UPSERT into part table fails during step-finish event projection
