feat: add chat debug retention purge#24943
Conversation
Docs preview📖 View docs preview for |
|
/coder-agents-review |
There was a problem hiding this comment.
Solid addition. The retention mechanism is well-structured: independent config key, clean updated_at retention clock, advisory-locked purge inside the existing purgeTick transaction, cascade-aware batch delete, and symmetric frontend/SDK/backend types. The test coverage is strong, especially TestPurgeChatDebugRuns which exercises all three critical paths (old-run deletion with cascade, disabled-retention no-op, and chat-FK cascade). The story selector fix from /retention/i to "Enable conversation retention" shows genuine reasoning about interaction effects. Melody's wire-format chain walk confirmed end-to-end agreement across all four layers. Knuckle confirmed both indexes serve distinct access patterns correctly.
Three P3 findings (one convergent across five reviewers), three nits, and one note.
Pariston's differential diagnosis: "Four framings for the problem this PR addresses. Each names a different root cause and implies a different fix. The author's implemented framing combines all four."
PS. DEREM-1 (Netero first-pass, not re-posted here) identified a missing test for the GetChatDebugRetentionDays error path. If DEREM-2 is addressed by joining the error into chatConfigErr, DEREM-1's test should assert success="false" rather than success="true".
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All seven R1 findings addressed cleanly in cf43b15. The DEREM-2 fix (chatDebugRetentionErr joined into chatConfigErr) is well-structured: chatRetentionConfigErr gates chat purge, chatDebugRetentionErr gates debug purge independently, chatConfigErr (all three) is returned to surface failures via the iteration metric. The new FailedChatDebugRetentionRead test correctly asserts success="false". The DebugRetentionLoadError story correctly verifies the fallback-to-default behavior. Keep-in-sync comments are now bidirectional.
One new P3 and two nits.
Mafuuu, tracing the upgrade note fix: "By the time the new server is accepting HTTP requests, GetChatDebugRetentionDays has already been called."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All three R2 findings addressed in 90cdbb0. The upgrade note now correctly documents the direct SQL mechanism with proper timing. The batch size comment is properly scoped. The error join is simplified.
One P3 and one nit.
Chopper, on the test gap: "A regression where the chat purge guard is changed from chatRetentionConfigErr == nil to chatConfigErr == nil would make debug retention failure also suppress chat purge. The test would still pass."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Both R3 findings addressed in 2f8dba0. The DEREM-16 fix is well-done: DoAndReturn invokes the tx callback, DeleteOldChats and DeleteOldChatFiles expectations verify chat purge continues when debug retention read fails, and the success=true metric was moved outside the InTx closure to eliminate the double-recording noted in R3.
However, the metric move introduced a regression: one new P3.
Meruem: "The original placement was correct: it sat after recordsPurged inside the closure, which is only reached when the lock is held and all work is done. However, the original code had a different bug. The move fixed that. The residual problem is the lock-not-acquired case."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
DEREM-17 fix verified by all four reviewers. The lockAcquired bool correctly gates the success metric: Meruem traced all four scenarios (lock miss, lock+success, lock+configErr, lock+txErr) and confirmed each produces the right metric state. Mafuuu's contract tracing confirmed auth boundaries, cascade behavior, and the three-way error split are all correct.
One new P3.
Mafuuu's lifecycle trace: "lockAcquired = false path (replica): purgeTick returns nil, doTick records no metric. Correct."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
DEREM-18 fix verified: registry passed to real-DB test, counter asserted > 0 with two old runs deleted. All prior findings remain closed.
One new P3 extending the DEREM-16 precedent to the reverse direction.
Mafuuu: "DEREM-16 established the contract that config-read failures for one subsystem must not suppress purges for orthogonal subsystems. The symmetric path is untested."
coderd/database/dbpurge/dbpurge_test.go:235
P3 [DEREM-19] FailedChatRetentionRead and FailedChatAutoArchiveRead both mock InTx with Return(nil) (callback never executes) and mock GetChatDebugRetentionDays to return 0 (debug purge disabled). Neither test verifies that debug purge continues when chat/auto-archive config reads fail but debug retention is enabled.
DEREM-16 established the contract that config-read failures for one subsystem must not suppress purges for orthogonal subsystems, and applied DoAndReturn to FailedChatDebugRetentionRead to verify chat purge continues when debug retention read fails. The symmetric path is untested: a regression changing the debug purge guard from chatDebugRetentionErr == nil to chatConfigErr == nil would suppress debug purge on any config error, and no test would catch it.
Fix: in FailedChatRetentionRead, use DoAndReturn for InTx, set GetChatDebugRetentionDays to return a non-zero value, and add a DeleteOldChatDebugRuns expectation. (Mafuuu)
🤖
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All 19 findings across 7 rounds are closed (15 author-fixed, 4 orchestrator-dropped). No new findings from Netero or the panel.
The PR is clean. The feature is well-structured: independent retention config, correct updated_at retention clock, advisory-locked purge with proper metric gating, cascade-aware batch delete, symmetric types across Go/SDK/TypeScript, and thorough test coverage. The fix chain across rounds improved the code substantially: metric observability (DEREM-2), upgrade documentation (DEREM-12), test contract fidelity (DEREM-16, DEREM-19), multi-replica metric correctness (DEREM-17), and counter assertion coverage (DEREM-18).
Chopper's final verification: "The fix addresses the root cause, not just the symptom."
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
f1d7b79 to
c905792
Compare
Adds configurable retention for chat debug data, including the purge query, updated_at index, site config, experimental API, SDK types, frontend lifecycle setting, and docs.
The purge deletes debug runs older than the configured retention window and relies on existing cascades to delete steps. The default retention is 30 days, and setting the value to 0 disables the purge.