Skip to content

feat: add chat debug retention purge#24943

Merged
ibetitsmike merged 17 commits into
mainfrom
mike/chatd-purge-1v9q
May 5, 2026
Merged

feat: add chat debug retention purge#24943
ibetitsmike merged 17 commits into
mainfrom
mike/chatd-purge-1v9q

Conversation

@ibetitsmike
Copy link
Copy Markdown
Collaborator

@ibetitsmike ibetitsmike commented May 4, 2026

Mux is acting on Mike's behalf.

Adds configurable retention for chat debug data, including the purge query, updated_at index, site config, experimental API, SDK types, frontend lifecycle setting, and docs.

The purge deletes debug runs older than the configured retention window and relies on existing cascades to delete steps. The default retention is 30 days, and setting the value to 0 disables the purge.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Docs preview

📖 View docs preview for docs/ai-coder/agents/platform-controls/chat-debug-retention.md

@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid addition. The retention mechanism is well-structured: independent config key, clean updated_at retention clock, advisory-locked purge inside the existing purgeTick transaction, cascade-aware batch delete, and symmetric frontend/SDK/backend types. The test coverage is strong, especially TestPurgeChatDebugRuns which exercises all three critical paths (old-run deletion with cascade, disabled-retention no-op, and chat-FK cascade). The story selector fix from /retention/i to "Enable conversation retention" shows genuine reasoning about interaction effects. Melody's wire-format chain walk confirmed end-to-end agreement across all four layers. Knuckle confirmed both indexes serve distinct access patterns correctly.

Three P3 findings (one convergent across five reviewers), three nits, and one note.

Pariston's differential diagnosis: "Four framings for the problem this PR addresses. Each names a different root cause and implies a different fix. The author's implemented framing combines all four."

PS. DEREM-1 (Netero first-pass, not re-posted here) identified a missing test for the GetChatDebugRetentionDays error path. If DEREM-2 is addressed by joining the error into chatConfigErr, DEREM-1's test should assert success="false" rather than success="true".

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/dbpurge/dbpurge_test.go
Comment thread coderd/database/dbpurge/dbpurge.go Outdated
Comment thread coderd/database/dbauthz/dbauthz.go
Comment thread coderd/exp_chats.go Outdated
Comment thread coderd/database/dbpurge/dbpurge_test.go Outdated
Comment thread coderd/database/dbpurge/dbpurge.go
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All seven R1 findings addressed cleanly in cf43b15. The DEREM-2 fix (chatDebugRetentionErr joined into chatConfigErr) is well-structured: chatRetentionConfigErr gates chat purge, chatDebugRetentionErr gates debug purge independently, chatConfigErr (all three) is returned to surface failures via the iteration metric. The new FailedChatDebugRetentionRead test correctly asserts success="false". The DebugRetentionLoadError story correctly verifies the fallback-to-default behavior. Keep-in-sync comments are now bidirectional.

One new P3 and two nits.

Mafuuu, tracing the upgrade note fix: "By the time the new server is accepting HTTP requests, GetChatDebugRetentionDays has already been called."

🤖 This review was automatically generated with Coder Agents.

Comment thread docs/ai-coder/agents/platform-controls/chat-debug-retention.md Outdated
Comment thread coderd/database/dbpurge/dbpurge.go Outdated
Comment thread coderd/database/dbpurge/dbpurge.go Outdated
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three R2 findings addressed in 90cdbb0. The upgrade note now correctly documents the direct SQL mechanism with proper timing. The batch size comment is properly scoped. The error join is simplified.

One P3 and one nit.

Chopper, on the test gap: "A regression where the chat purge guard is changed from chatRetentionConfigErr == nil to chatConfigErr == nil would make debug retention failure also suppress chat purge. The test would still pass."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/dbpurge/dbpurge_test.go
Comment thread coderd/database/dbpurge/dbpurge_test.go Outdated
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both R3 findings addressed in 2f8dba0. The DEREM-16 fix is well-done: DoAndReturn invokes the tx callback, DeleteOldChats and DeleteOldChatFiles expectations verify chat purge continues when debug retention read fails, and the success=true metric was moved outside the InTx closure to eliminate the double-recording noted in R3.

However, the metric move introduced a regression: one new P3.

Meruem: "The original placement was correct: it sat after recordsPurged inside the closure, which is only reached when the lock is held and all work is done. However, the original code had a different bug. The move fixed that. The residual problem is the lock-not-acquired case."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/dbpurge/dbpurge.go Outdated
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEREM-17 fix verified by all four reviewers. The lockAcquired bool correctly gates the success metric: Meruem traced all four scenarios (lock miss, lock+success, lock+configErr, lock+txErr) and confirmed each produces the right metric state. Mafuuu's contract tracing confirmed auth boundaries, cascade behavior, and the three-way error split are all correct.

One new P3.

Mafuuu's lifecycle trace: "lockAcquired = false path (replica): purgeTick returns nil, doTick records no metric. Correct."

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/dbpurge/dbpurge_test.go
@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEREM-18 fix verified: registry passed to real-DB test, counter asserted > 0 with two old runs deleted. All prior findings remain closed.

One new P3 extending the DEREM-16 precedent to the reverse direction.

Mafuuu: "DEREM-16 established the contract that config-read failures for one subsystem must not suppress purges for orthogonal subsystems. The symmetric path is untested."


coderd/database/dbpurge/dbpurge_test.go:235

P3 [DEREM-19] FailedChatRetentionRead and FailedChatAutoArchiveRead both mock InTx with Return(nil) (callback never executes) and mock GetChatDebugRetentionDays to return 0 (debug purge disabled). Neither test verifies that debug purge continues when chat/auto-archive config reads fail but debug retention is enabled.

DEREM-16 established the contract that config-read failures for one subsystem must not suppress purges for orthogonal subsystems, and applied DoAndReturn to FailedChatDebugRetentionRead to verify chat purge continues when debug retention read fails. The symmetric path is untested: a regression changing the debug purge guard from chatDebugRetentionErr == nil to chatConfigErr == nil would suppress debug purge on any config error, and no test would catch it.

Fix: in FailedChatRetentionRead, use DoAndReturn for InTx, set GetChatDebugRetentionDays to return a non-zero value, and add a DeleteOldChatDebugRuns expectation. (Mafuuu)

🤖

🤖 This review was automatically generated with Coder Agents.

@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 19 findings across 7 rounds are closed (15 author-fixed, 4 orchestrator-dropped). No new findings from Netero or the panel.

The PR is clean. The feature is well-structured: independent retention config, correct updated_at retention clock, advisory-locked purge with proper metric gating, cascade-aware batch delete, symmetric types across Go/SDK/TypeScript, and thorough test coverage. The fix chain across rounds improved the code substantially: metric observability (DEREM-2), upgrade documentation (DEREM-12), test contract fidelity (DEREM-16, DEREM-19), multi-replica metric correctness (DEREM-17), and counter assertion coverage (DEREM-18).

Chopper's final verification: "The fix addresses the root cause, not just the symptom."

🤖 This review was automatically generated with Coder Agents.

@ibetitsmike
Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/database/dbpurge/dbpurge.go Outdated
Comment thread docs/ai-coder/agents/platform-controls/chat-debug-retention.md Outdated
Comment thread coderd/exp_chats.go
@ibetitsmike ibetitsmike requested a review from ThomasK33 May 5, 2026 13:02
@ibetitsmike ibetitsmike force-pushed the mike/chatd-purge-1v9q branch from f1d7b79 to c905792 Compare May 5, 2026 20:13
@ibetitsmike ibetitsmike marked this pull request as ready for review May 5, 2026 20:37
@ibetitsmike ibetitsmike merged commit 2874d4b into main May 5, 2026
36 checks passed
@ibetitsmike ibetitsmike deleted the mike/chatd-purge-1v9q branch May 5, 2026 20:37
@github-actions github-actions Bot locked and limited conversation to collaborators May 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants