feat(coderd/x/chatd): capture generic error debug logs by default#26557
feat(coderd/x/chatd): capture generic error debug logs by default#26557johnstcn wants to merge 1 commit into
Conversation
Persist a minimal chat_turn debug run+step when a turn fails with an unclassified (generic) terminal error, even when debug logging is off. Show the Debug tab whenever a chat has captured runs.
There was a problem hiding this comment.
Code Review
Reviewed errors-only default capture for chatd debug logs and the frontend Debug tab visibility change. The design is well-structured: lazy run materialization via errorRunEnsurer, clean gating on StatusError && ChatErrorKindGeneric, and the sync.Once dedup in the ensurer is correct. One bug found.
Found 1 issue.
| } | ||
|
|
||
| stepNum := nextStepNumber(rc.RunID) | ||
| step, err := h.svc.CreateStep(ctx, CreateStepParams{ |
There was a problem hiding this comment.
🟡 IMPORTANT: captureDeferredError passes the raw ctx to CreateStep and UpdateStep, but finish (the full-recording path) wraps it with stepFinalizeContext(ctx) before DB calls:
// finish does this:
updateCtx, cancel := stepFinalizeContext(ctx)
defer cancel()
h.svc.UpdateStep(updateCtx, ...)stepFinalizeContext calls context.WithoutCancel(ctx) with a timeout, so finalization survives parent context cancellation. Without this, if the HTTP request context is canceled right after the model returns a generic error (e.g., client drops the connection at the same moment the provider sends a 400), both CreateStep and UpdateStep will fail with a canceled context error and the capture is silently skipped.
This is particularly likely in the stream path where captureDeferredError is called inside the iterator closure: by the time the stream error propagates and finalize runs, the request context may already be in the process of cancellation.
| step, err := h.svc.CreateStep(ctx, CreateStepParams{ | |
| stepCtx, cancel := stepFinalizeContext(ctx) | |
| defer cancel() | |
| stepNum := nextStepNumber(rc.RunID) | |
| step, err := h.svc.CreateStep(stepCtx, CreateStepParams{ |
What
Capture chat debug logs for unexpected errors even when debug logging is off, and surface the Debug tab whenever a chat has any captured runs.
Two behaviors:
Errors-only default capture. When full debug logging is disabled, a chat turn that fails with an unclassified error still records a minimal debug run. Capture is gated to:
StatusErroronly (interruptions, cancellations, and step-level timeouts are skipped).chaterror.Classify(err).Kind == ChatErrorKindGenericonly (auth, config, usage_limit, rate_limit, overloaded, timeout, etc. are skipped as known operational conditions).The run is lazily materialized on the first qualifying error via a context-threaded ensurer, so error-free chats write nothing. Captured data is minimal: the run plus one failing step with the normalized error payload only (no
RecordHTTP, request/response/usage/attempts).Full debug logging behavior is unchanged (eager run/step creation, full payloads, all outcomes).
Debug tab visibility. The tab now shows when
debugLoggingEnabled || hasDebugRuns, reusing the existingchatDebugRunsquery (shared key dedupes with the panel). NouseEffectadded.Testing
chat_turnerror run+step.chatdebugunit tests covering generic-capture, non-generic skip, success skip, cancellation skip, no-ensurer skip, and at-most-once run creation.go test ./coderd/x/chatd/...,go vet,golangci-lint, frontendtsc+ biome + circular-deps + react-compiler + knip, andmake pre-commitall green.Deferred (documented
TODO(chatd-debug))Kind-accurate errors-only capture for internal runs (
KindQuickgen/KindTitleGeneration/KindCompaction) is left as a smaller follow-up. The user-visible signal is already satisfied: a failing turn (including a compaction failure, which propagates to the chat_turn error ensurer) surfaces the Debug tab. See theTODO(chatd-debug)notes inquickgen.goandchatloop/compaction.go.Implementation plan
The full plan also scoped kind-accurate internal-run capture (quickgen/compaction/title) for the errors-only level. That was intentionally deferred to keep this change focused on the top-level chat_turn path plus the frontend; see the
TODO(chatd-debug)notes.