feat(site): add chat debug API layer and panel utilities#23919
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a6066419d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
71a8438 to
d6da1db
Compare
350e732 to
f5c8803
Compare
764e78c to
579e483
Compare
f5c8803 to
42058d9
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42058d9449
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
f29f721 to
cffa02e
Compare
d06c9e8 to
648c30f
Compare
cffa02e to
517ce22
Compare
648c30f to
7a26ad4
Compare
517ce22 to
833bf86
Compare
7a26ad4 to
3894e00
Compare
833bf86 to
b9a629f
Compare
|
@codex review |
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
mafredri
left a comment
There was a problem hiding this comment.
Round 30. BLOCKED.
DEREM-7 (empty-string tool_call_id dedup collision) fixed in 2a7158a with || undefined normalization and a regression test. Clean fix.
DEREM-8 (P3, interruptChat missing debug run invalidation) remains silent. interruptChat at chats.ts:987 is the only debug-run-producing mutation without void invalidateChatDebugRuns(queryClient, chatId). The other four mutations (createChatMessage, editChatMessage, promoteChatQueuedMessage, regenerateChatTitle) all have it. No author response, no code change.
Further review blocked until DEREM-8 gets a response: fix, file a ticket, or explain why interruptChat should not invalidate. This needs a human decision.
🤖 This review was automatically generated with Coder Agents.
|
DEREM-8 fix: |
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
mafredri
left a comment
There was a problem hiding this comment.
Round 31. All 16 findings resolved. CI green. No P0-P2 from any reviewer.
DEREM-8 (interruptChat invalidation) fixed in 3447e85. The five debug-run-producing mutations now all invalidate chatDebugRunsKey consistently. Netero's final scan found only style nits. The panel (4 reviewers) found no new defects -- Hisoka and Kite returned "No findings."
Bisky identified 3 P3 test coverage gaps (null defaults for coerceStepResponse, JSON-string options/policy, warnings extraction). These are genuine missing test branches but the existing 84-test suite covers the production code paths comprehensively, and the missing tests are for defensive branches that return safe defaults. Not blocking.
16 findings raised across this review. 12 fixed in code, 4 contested and verified closed by the panel. Every finding got a response. The code is solid: 1249 lines of defensive coercion logic, 676 lines of tests, base64 mojibake prevention, tool call deduplication with empty-ID normalization, consistent cache invalidation across all 5 mutations, and a knip exclusion with a TODO for removal.
No findings. ♠ -- Hisoka, after pulling every thread in 1249 lines of coercion logic
🤖 This review was automatically generated with Coder Agents.
DanielleMaywood
left a comment
There was a problem hiding this comment.
I trust that the utilities are being used btw, I just don't trust they deserve to all be utilities
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Round 33. All 16 findings resolved. No new defects. CI failing but appears to be a base-branch issue (18 checks failed including gen, lint, fmt, all Go tests).
New commit 1faa441ccb23 addresses human reviewer feedback: debug logging queries extracted to chatDebugLogging.ts, verbose inline comments trimmed, 252 lines of new tests added covering all previously-flagged gaps. The test suite now has 100 tests for debugPanelUtils.ts and 67 for chats.test.ts.
Panel verdict: Bisky called the test suite "genuine gems, every one." Mafuuu: "No findings." Hisoka pulled every thread, verified all 16 prior findings resolved, probed the chatKey/chatDebugRunsKey coupling and confirmed all 17 invalidation calls carry exact: true. No reviewer found any P0-P3.
16 findings raised across this review history. 12 fixed in code, 4 contested and verified closed by the panel. The PR ships 1249 lines of defensive JSONB coercion logic, 928 lines of tests, typed API methods, React Query cache invalidation wired to all 5 debug-run-producing mutations, and a cleanly extracted chatDebugLogging.ts module. Ready for merge pending CI.
Pulled every thread. Traced callers, ran both test suites, inspected invalidation key hierarchies, tested coercion edge cases against the type system. The coupling exists but the invariant holds. ♠ -- Hisoka
🤖 This review was automatically generated with Coder Agents.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe2c57d68d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Round 34. No PR code changes since R33 (rebase only). All 16 findings remain resolved. CI still failing (base-branch issue, 20 checks).
Panel (4 reviewers): Bisky, Mafuuu, Hisoka all returned "No findings." Zoro noted the export * from "./chatDebugLogging" barrel re-export is unique in the queries directory (Nit, not blocking). No P0-P3.
The review is complete. Ready for merge pending CI.
🤖 This review was automatically generated with Coder Agents.
Merge activity
|
Change-Id: I6b56825b5a1c767300fdb052cbe4b42df3915ece Signed-off-by: Thomas Kosiewski <tk@coder.com>
`createChatMessage` and `promoteChatQueuedMessage` were awaiting `invalidateChatDebugRuns` inside `onSuccess`, which made their `mutateAsync` promises pend until the Debug panel's active refetch completed. `sendMessage`/promote callers in `AgentChatPage` await that promise before running send-side UI updates, so an open Debug panel was adding avoidable latency to every send and promote. Switch both handlers to fire-and-forget `void` invalidation, matching the pattern already used in `editChatMessage.onSettled`. Change-Id: I8edb16124cadf7eb984c0ab4d0ba299156cbe3eb Signed-off-by: Thomas Kosiewski <tk@coder.com>
`tryDecodeBase64` used `atob` directly, which is lenient and happily turned plain-text strings like `"test"` or `"abcd"` (valid base64 alphabet, length divisible by 4) into mojibake whenever the body was not JSON. Gate the decode with a strict base64 regex and a fatal UTF-8 decode so only truly base64-encoded UTF-8 payloads are rewritten; everything else falls through and the raw string is preserved. Add regression tests in `normalizeAttempts` covering both the preservation path and the happy-path decode of non-JSON base64 text. Change-Id: I691dce0d2f68ab9d271a88dc295eb2f1eacd195f Signed-off-by: Thomas Kosiewski <tk@coder.com>
Streamed responses can carry a `tool_input` streaming delta followed by a finalized `tool_call` summary for the same call ID. Because `isToolCallPartType` accepts both variants, the content loop emitted two rows for the same invocation in the Debug panel. Collapse by call ID inside `coerceStepResponse`: track per-ID index and whether we already have a finalized entry, prefer the finalized payload when a partial was seen first, and ignore stray partials that arrive after the finalized form. Entries without an ID fall through unchanged. Adds regression tests covering the tool_input→tool_call upgrade path, the tool_call→tool_input stability path, and multi-call ordering. Change-Id: Ifb8fe32ce9f8fcec040c7dc2364c00ef8751dc0d Signed-off-by: Thomas Kosiewski <tk@coder.com>
The regression test for the tool_input → tool_call upgrade path used
`'{"query":"fo'` as the partial streaming fixture. `typos` treats `fo`
as a suggestion for `of`/`for`/`do`/`go`/`to`, failing the repo-wide
typos lint step.
Shorten the partial value to `'{"query":"f'` — the test semantic is
unchanged (partial arguments get replaced by the finalized payload)
and the shorter fragment no longer trips the typos rule.
Change-Id: Ief69a7ae91680d69d513b992a12a62a1e0fba45f
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Review round 23 flagged en-dash characters (U+2013) in debug utils comments and the knip whole-file exclusion with no tracking comment. Switch those dashes to ASCII `--` and add a TODO noting that the knip ignore is only needed until the Debug panel components land in PRs 8/9. Change-Id: I7fc85b275f9ea4659d2eb62a12d47eaab95aa90a Signed-off-by: Thomas Kosiewski <tk@coder.com>
coerceRunSummary and coerceStepRequest's options/policy inputs can arrive as JSON-string wrappers after a json.RawMessage round-trip. Run them through deepParse before the isRecord check so the wrapped values are exposed instead of silently dropped. Change-Id: I9dae8bc6afa8ba77b5337a38ad957397a3ee0884 Signed-off-by: Thomas Kosiewski <tk@coder.com>
Go's zero value for string is "" and ChatStreamToolCall.tool_call_id serializes without omitempty, so unset IDs arrive as "" on the wire. Treating "" as a real dedup key collapsed distinct tool calls onto the same Map slot, silently dropping the second call. Normalize at the two extraction sites so the dedup guard and downstream React keys treat it as "no id". Change-Id: I43e38f584ec617223e3853d54b62ed9dfa792924 Signed-off-by: Thomas Kosiewski <tk@coder.com>
interruptChat was the only debug-run-producing mutation that didn't kick the Debug panel query on success. The other four (createChatMessage, editChatMessage, promoteChatQueuedMessage, regenerateChatTitle) all invalidate chatDebugRunsKey. Without it, the panel shows stale "running" for up to the 5s poll interval after the user interrupts. Change-Id: I086ca5dde03439f6bc48458b91f9d0f0e81041fa Signed-off-by: Thomas Kosiewski <tk@coder.com>
Drop inline debug-run invalidation comments (now self-evident from `invalidateChatDebugRuns` call), move debug-logging queries into `chatDebugLogging.ts` so the @public JSDoc annotations can be dropped, and re-export them from `chats.ts` so upstack imports continue to resolve. Change-Id: I461a27027797b579ee148be7b9b84bf48bbf6a87 Signed-off-by: Thomas Kosiewski <tk@coder.com>

Summary
Add API client methods, React Query builders, and unit tests for the chat debug endpoints. Add
debugPanelUtilswith coercion helpers that transform raw debug step data into structured display models for the Debug panel, and wire debug run streaming into the chat store.This is PR 7/9 in the chat debug logging stack.
Changes
site/src/api/api.ts): typed methods for all debug endpoints — list runs, list steps, get/set deployment logging, get/set user logging, set per-chat override.site/src/api/queries/chats.ts):chatDebugRuns,chatDebugSteps,chatDebugLoggingConfig,userDebugLoggingConfigquery/mutation factories withrefetchInterval: 5000for live polling.site/src/pages/AgentsPage/components/RightPanel/DebugPanel/debugPanelUtils.ts):coerceStepRequest/coerceStepResponsethat recursively parse nested JSONB intoStepRequestViewModel/StepResponseViewModelwithMessagePart,ToolDef, andToolCallParttypes. IncludesformatTokenSummary(compact3→5 toknotation) andcompactDurationhelpers.Stack overview
Generated with
mux• Model:anthropic:claude-opus-4-6• Thinking:xhigh