fix(core): tolerate orphan part projection from cascade-delete race #33134
Open
randomvariable wants to merge 1 commit into
A concurrent `MessageRemoved` cascade-delete can remove an assistant message (and its parts) while an aborting fiber still flushes a trailing snapshot `patch` part. The `PartUpdated` projection then upserts a part whose `message_id` FK has no parent row. SQLite enforces the FK at COMMIT, the transaction re-fails as a `SqlError`, and `Effect.orDie` turns it into a defect that crashes the prompt fiber and prints a raw stack to the TUI with no log line.

- `event.ts`: log (`eventID`/`eventType`/`aggregateID`) before `orDie` at the durable-commit funnel, so DB commit failures are observable instead of surfacing only as a raw TUI stack.
- `projector.ts`: in the `PartUpdated` projection, skip and warn when the parent message row is absent, mirroring the FK's `onDelete: cascade` intent. Replay-safe: it fabricates nothing and is deterministic under `seq` order.

Fixes anomalyco#31990
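A minimal sketch of the ordering in the `event.ts` change, written in plain TypeScript so it runs without the `effect` package. `Result`, `tapError`, `orDie`, `SqlError`, and `commitDurable` are hypothetical stand-ins for `Effect.tapError`, `Effect.orDie`, the effect-sqlite error type, and the durable-commit funnel; they are not the project's code. The point it illustrates is that the log must sit before `orDie`: the typed failure is observed while it is still in the error channel, then converted into the same defect as before.

```typescript
// Hypothetical stand-in for Effect's typed error channel.
type Result<A, E> = { ok: true; value: A } | { ok: false; error: E };

// Hypothetical stand-in for the typed SqlError a failed COMMIT re-fails with.
class SqlError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "SqlError";
  }
}

// The event context logged at the durable-commit funnel.
interface EventMeta {
  eventID: string;
  eventType: string;
  aggregateID: string;
}

// Analogue of Effect.tapError: run a side effect on failure, pass the result through unchanged.
function tapError<A, E>(r: Result<A, E>, f: (e: E) => void): Result<A, E> {
  if (!r.ok) f(r.error);
  return r;
}

// Analogue of Effect.orDie: a typed failure becomes an unrecoverable defect (here, a throw).
function orDie<A, E>(r: Result<A, E>): A {
  if (!r.ok) throw r.error;
  return r.value;
}

// Hypothetical funnel: log the event context first, then die, so crash behaviour is unchanged.
function commitDurable<A>(
  meta: EventMeta,
  commit: () => Result<A, SqlError>,
  log: (line: string) => void,
): A {
  const logged = tapError(commit(), (e) =>
    log(
      `durable commit failed eventID=${meta.eventID} eventType=${meta.eventType} ` +
        `aggregateID=${meta.aggregateID}: ${e.message}`,
    ),
  );
  return orDie(logged);
}

// Example: a COMMIT that fails its FK check is logged with context, then still crashes.
const lines: string[] = [];
try {
  commitDurable(
    { eventID: "evt_1", eventType: "PartUpdated", aggregateID: "ses_1" },
    () => ({ ok: false, error: new SqlError("FOREIGN KEY constraint failed") }),
    (line) => lines.push(line),
  );
} catch (e) {
  console.log((e as Error).name, lines[0]);
}
```

Swapping the order (dying first, then trying to log) would never reach the log, because the defect bypasses the typed error channel that a `tapError`-style hook observes.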
Issue for this PR
Closes #31990
Type of change
What does this PR do?
Occasionally the app crashes while projecting session events into SQLite, dumping a raw stack to the TUI with no log line. The failing query is an UPSERT into `part` during step/turn finish, which fails with a foreign-key `ConstraintError`. The cause is a cascade-delete race.
`MessageUpdated` and `PartUpdated` are separate durable events, each committed in its own `BEGIN IMMEDIATE` transaction (`event.ts`), so they aren't atomic. `part.message_id` is a NOT NULL FK to `message.id` with `onDelete: cascade`, and SQLite checks FKs at COMMIT.

The sequence: message `M` is committed, then another fiber publishes `MessageRemoved(M)` (a revert, `removeMessage`, compaction prune, or the summarize fork), which cascade-deletes `M` and its parts. Meanwhile, the aborting fiber's `cleanup()` still flushes a trailing snapshot `patch` part referencing `M`. That part's insert fails the FK check at COMMIT, `Effect.orDie` turns the failure into a defect, and the fiber crashes.

This PR makes two changes:
- `event.ts`: log `eventID`/`eventType`/`aggregateID` via `Effect.tapError` before `Effect.orDie` at the durable-commit funnel, so DB commit failures are visible in logs instead of only as a raw TUI stack. Crash behaviour is unchanged. `tapError` is sufficient here because the deferred-constraint commit failure surfaces as a typed `SqlError` in the error channel (the effect-sqlite layer rolls back and re-fails), not as a defect.
- `session/projector.ts`: in the `PartUpdated` projection, skip and warn when the parent message row is gone, mirroring the FK's own cascade intent. This fabricates nothing and is replay-safe: under replay, events apply in `seq` order, so the same `MessageRemoved` → late `PartUpdated` sequence deterministically hits the skip. The early return also skips the usage accounting, which is correct since no part is persisted.

This makes the orphan benign and logged. It does not stop the source from emitting the stray part; once the new log confirms which delete vector fires in practice, a source-side guard is a sensible follow-up.
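The `PartUpdated` guard and its replay behaviour can be modeled with an in-memory store. This is a sketch under stated assumptions: the event shapes, `Store`, `commit`, `project`, `replay`, and the `tokens` usage field are hypothetical simplifications rather than the real `session/projector.ts`, and the FK check is approximated by scanning parts when each per-event transaction commits.

```typescript
// Hypothetical, simplified event shapes; the real events carry more fields.
type SessionEvent =
  | { seq: number; type: "MessageUpdated"; messageID: string }
  | { seq: number; type: "MessageRemoved"; messageID: string }
  | { seq: number; type: "PartUpdated"; partID: string; messageID: string; tokens: number };

// In-memory stand-in for the message/part tables plus a usage counter.
interface Store {
  messages: Set<string>;
  parts: Map<string, string>; // partID -> message_id
  usageTokens: number;
}

// Models the FK check on part.message_id when a transaction commits.
function commit(s: Store): void {
  for (const [partID, messageID] of s.parts) {
    if (!s.messages.has(messageID)) {
      throw new Error(`FOREIGN KEY constraint failed (part ${partID})`);
    }
  }
}

function project(s: Store, e: SessionEvent, warn: (msg: string) => void): void {
  switch (e.type) {
    case "MessageUpdated":
      s.messages.add(e.messageID);
      return;
    case "MessageRemoved":
      s.messages.delete(e.messageID);
      // onDelete: cascade removes the message's parts too.
      for (const [partID, messageID] of s.parts) {
        if (messageID === e.messageID) s.parts.delete(partID);
      }
      return;
    case "PartUpdated":
      if (!s.messages.has(e.messageID)) {
        // The guard: skip and warn instead of upserting an orphan.
        // Returning early also skips usage accounting, since nothing is persisted.
        warn(`skipping part ${e.partID}: parent message ${e.messageID} not found`);
        return;
      }
      s.parts.set(e.partID, e.messageID);
      s.usageTokens += e.tokens;
      return;
  }
}

// Applies events in seq order, one "transaction" (project, then commit) per event.
function replay(events: SessionEvent[], warn: (msg: string) => void): Store {
  const s: Store = { messages: new Set(), parts: new Map(), usageTokens: 0 };
  for (const e of [...events].sort((a, b) => a.seq - b.seq)) {
    project(s, e, warn);
    commit(s);
  }
  return s;
}

// The race from this PR, in seq order: M committed, M removed, then a late part for M.
const raceLog: SessionEvent[] = [
  { seq: 1, type: "MessageUpdated", messageID: "M" },
  { seq: 2, type: "MessageRemoved", messageID: "M" },
  { seq: 3, type: "PartUpdated", partID: "P", messageID: "M", tokens: 10 },
];
const warnings: string[] = [];
const store = replay(raceLog, (w) => warnings.push(w));
console.log(store.parts.has("P"), store.usageTokens, warnings.length); // prints: false 0 1
```

Because `replay` sorts by `seq` before projecting, every replay of the same log reaches the late `PartUpdated` after `MessageRemoved` and takes the same skip branch, which is the determinism property the description relies on.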
How did you verify your code works?
- A test in `packages/core/test/session-projector.test.ts` that projects `MessageUpdated(M)` → `MessageRemoved(M)` → a late `PartUpdated(part, messageID=M)` and asserts that no defect is raised and the part row is absent.
- `bun typecheck` is clean in `packages/core`.
- `bun test test/session-projector.test.ts` → 11 pass / 0 fail.

Screenshots / recordings
Not a UI change.
Checklist