Skip to content

feat: add list_templates ranking telemetry#26458

Open
jaaydenh wants to merge 28 commits into
mainfrom
chatd-list-templates-telemetry
Open

feat: add list_templates ranking telemetry#26458
jaaydenh wants to merge 28 commits into
mainfrom
chatd-list-templates-telemetry

Conversation

@jaaydenh

Copy link
Copy Markdown
Contributor

Summary

Adds observability for the list_templates ranking decision (from #25037) and its downstream acceptance, so the ranking thesis can be evaluated with data instead of speculation. The affinity score stays internal to the tool response; this telemetry surfaces it in logs and metrics, never to the model.

Stacked on #25037 (base: workspace-bdje).

What's added

Tier 1 — decision log. One structured Info log (list_templates decision) per call: correlation IDs (chat_id, owner_id, tool_call_id), query_present/query_len (raw query text is never logged), candidate/visible/returned counts, top + runner-up query/affinity scores, the affinity gap, the confidence thresholds, ranking_version, the outcome, a recommendation_reason, and signals_load_failed. The reason comes straight from selectTemplateRecommendation so the log cannot drift from the actual decision branch.

Tier 2 — Prometheus metrics (prefix coderd_chatd_):

Metric Type Labels
list_templates_outcome_total counter outcome = recommended / ask_user / no_matches / no_templates
list_templates_signals_failures_total counter
list_templates_affinity_gap histogram recommended
template_recommendation_followup_total counter outcome (see Tier 3)

The affinity-gap histogram is only observed when affinity is the deciding signal (no query, or the top two share a query tier), so the gap is never negative or misleading when query relevance dominates.

Tier 3 — closed loop (acceptance). An in-memory RecommendationTracker (TTL + max-entries eviction, consume-once) records the recommendation + shown page per chat in list_templates, and create_workspace classifies the template it builds into accepted_recommendation, overrode_with_listed_template, created_listed_without_recommendation, created_unlisted_template, or no_recent_list_templates. Classification runs only on genuine creation (after the idempotent early-return path).

Downstream quality ("was the workspace used or deleted within N days") is intentionally left reconstructable from durable data (the persisted transcript + chats.workspace_id) rather than built as an always-on background collector, which would be disproportionate for an experimental feature and would duplicate gauges across replicas.

Where to view it

  • Logs (Tier 1 + Tier 3): coderd server log stream, messages list_templates decision and create_workspace recommendation follow-up (info level).
  • Metrics (Tier 2 + Tier 3): coderd Prometheus endpoint (--prometheus-enable, default 127.0.0.1:2112).

Design notes / limitations

  • No import cycle: chatloop imports chattool, so the metrics hooks are small interfaces declared in chattool that *chatloop.Metrics satisfies structurally, wired from chatd.go.
  • Tier 3 acceptance is best-effort: in-memory state is lost on restart and not shared across replicas, so a create handled on a different replica than its list classifies as no_recent_list_templates. Treat the acceptance metric as directional; the durable source of truth is the transcript + DB.
  • ranking_version is a hand-maintained constant; bump it when the scoring formula or thresholds change.

Testing

  • go test ./coderd/x/chatd/chattool/... ./coderd/x/chatd/chatloop/... (incl. new tracker classification/TTL/eviction tests, metric-gating tests, reason coverage, and list_templates + create_workspace integration tests); race-clean.
  • make gen regenerated the Prometheus metrics docs for the four new metrics.
  • Pre-commit (gen/fmt/lint/build) passing.

jaaydenh and others added 25 commits May 7, 2026 12:06
…main

# Conflicts:
#	coderd/x/chatd/chattool/createworkspace.go
Replace the lexicographic template-ranking comparator in list_templates
with a frecency score (frequency discounted by recency), per reviewer
feedback.

- Add GetTemplateRankingSignalsByOwnerID, returning the user's recent
  active and recently-deleted workspace counts, last usage, and the count
  of distinct active developers in the org. Recently-deleted workspaces
  now contribute (recovering history the deleted=false filter discarded),
  scoped to a lookback window, and the prebuilds system user is excluded
  from the org popularity count. Replaces
  GetWorkspaceUsageGroupedByTemplateIDByOwnerID.
- Compute the affinity score in Go
  (Wp*(active + Wd*deleted)*0.5^(age/half_life) + Wo*ln(1+org_devs))
  because sqlc cannot reliably compile the parameterized decay
  expression; the query returns the raw signals. Weights, half-life, and
  lookback are explicit constants.
- Recommendation confidence is now a single score comparison: a decisive
  query match recommends on its own, otherwise the top score must clear a
  floor derived from the active-developer minimum and lead the runner-up
  by a derived margin. Stale-but-frequent usage no longer recommends.
- Replace the AsSystemRestricted call for the cross-user org count with a
  narrow dbauthz wrapper checking workspace-owner read plus a
  template-metadata read.
- Clarify list_templates/read_template guidance in the detached prompt.
Address the latest PR review feedback for frecency-based template ranking:

- Authorize template ranking signals against the exact candidate template IDs
  using the same prepared-filter semantics as GetTemplatesWithFilter, so users
  with template ACL access keep ranking signals without broad org-wide template
  read.
- Split the dbauthz mock coverage for org-scoped and any-organization calls;
  normalize duplicate subtest suffixes in the dbauthz method-test harness.
- Surface deleted-only personal usage evidence in list_templates output with a
  recently-deleted count and last-used timestamp.
- Assert the raw SQL query returns the maximum last_used_at value.
- Clarify detached prompt guidance for user_selection_required and conditional
  read_template usage.
- Document ListTemplatesOptions requirements and defaults.
Address CRF-26 by adding list_templates operation context to the user-facing
asOwner authorization error response.
Keep TestChatSystemPrompt's detached workspace awareness expectation in sync
with the updated list_templates guidance.
…truction

list_templates previously taught the model a four-value selection_hint
enum, an eleven-value recommendation_reason, per-item relevance_signals,
and a derived user_selection_required flag, re-explained in two tool
descriptions and two prompt sections. The ranked output now carries a
single fixed next_step instruction at the point of use plus an optional
recommended_template_id, and the per-item payload keeps only raw usage
evidence. Ranking, confidence thresholds, and authorization are
unchanged. Prompts reference the field through chattool.NextStepField so
they cannot drift.

Documents how agent tool calls work, including template selection and
workspace creation, in docs/ai-coder/agents/tools.
Asserts the recommendation fires for recent deleted-only personal usage
(CRF-28), adds the both-above-floor-with-large-gap recommendation case
(CRF-29), and rewords the affinity-score comments to drop the inaccurate
sqlc fragility claim; the score lives in Go so the ranking policy and its
confidence thresholds stay in one place.
Shows a concrete query and ranked two-template response in the agent
tools doc, and clarifies that templates are scored independently with
only raw evidence fields returned.
Add observability for the list_templates ranking decision and its
downstream acceptance.

Tier 1: a structured decision log per call with correlation IDs,
ranking inputs (scores, gap, thresholds), outcome, and reason. The
raw query text is never logged, only its presence and length.

Tier 2: Prometheus metrics for recommendation outcome, signal-load
failures, and the affinity gap between the top two candidates.

Tier 3: an in-memory, best-effort RecommendationTracker correlates a
recommendation with the template create_workspace later builds,
recorded as template_recommendation_followup_total plus a log.

The affinity score stays internal to the response; the telemetry
surfaces it in logs and metrics without exposing it to the model.

Regenerates the Prometheus metrics docs for the new metrics.
@github-actions

Copy link
Copy Markdown

Docs preview

📖 View docs preview for docs/admin/integrations/prometheus.md

@jaaydenh jaaydenh changed the title feat(coderd/x/chatd): add list_templates ranking telemetry feat: add list_templates ranking telemetry Jun 17, 2026
@jaaydenh

Copy link
Copy Markdown
Contributor Author

/coder-agents-review
@codex review

@coder-agents-review

coder-agents-review Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Chat: Review in progress | View chat
Requested: 2026-06-18 12:41 UTC by @jaaydenh
Spend: $68.59 / $100.00

Review history
  • R1 (2026-06-17): 14 reviewers, 6 Nit, 2 Note, 3 P3, COMMENT. Review
  • R2 (2026-06-18): 12 reviewers, 9 Nit, 2 Note, 3 P3, COMMENT. Review

deep-review v0.8.0 | Round 3 | a5a4c49..ab4fb75

Last posted: Round 3, 15 findings (4 P3, 9 Nit, 2 Note), COMMENT. Review

Finding inventory

Law analysis

Effective LOC: 2797. Head SHA: 137ccf9. Verdict: Don't split. Enforcement: Advisory. The three core concerns (scoring, telemetry, tracker) form a linear dependency chain. 11 lines of unrelated cleanup are independently mergeable but trivial.

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Author fixed (137ccf9) listtemplates.go:258 Pagination overwrites tracker listed set, misclassifying prior-page templates as unlisted R1 Hisoka P3, Chopper P3, Ryosuke P3, Knov P3, Pariston P3, Mafuuu Note Yes
CRF-2 P3 Author fixed (137ccf9) recommendationtracker_internal_test.go:109 Eviction test only asserts oldest evicted, never asserts newer entries survived R1 Bisky Yes
CRF-3 P3 Author fixed (137ccf9) listtemplates.go:448 "Tier 1/2/3" labels in comments reference vocabulary defined only in the PR description R1 Gon Yes
CRF-4 Nit Author fixed (137ccf9) listtemplates.go:428 listTemplatesOutcome default case silently maps unknown NextStep values to ask_user R1 Mafuuu Nit, Chopper Note Yes
CRF-5 Nit Author fixed (137ccf9) chatloop/metrics.go:141 template_recommendation_followup_total help text omits outcome label values R1 Leorio Yes
CRF-6 Nit Author fixed (137ccf9) createworkspace.go:396 Follow-up log missing owner_id unlike companion decision log R1 Chopper Yes
CRF-7 Nit Author fixed (137ccf9) recommendationtracker.go:24 recommendationFollowupListedNoRec abbreviation inconsistent with siblings R1 Gon Yes
CRF-8 Nit Author fixed (137ccf9) recommendationtracker_internal_test.go:101 Use for range maxEntries (Go 1.22+) R1 Ging-Go Yes
CRF-9 Note Author accepted R2 (wiring proven by integration test, classification unit-tested) createworkspace_internal_test.go:2114 Only accepted_recommendation followup path integration-tested through create_workspace R1 Bisky Yes
CRF-10 Nit Author fixed (137ccf9) chatd.go:207 Field comment restates RecommendationTracker type doc R1 Gon Yes
CRF-11 Note Author accepted R2 (counter intentionally per-call, inflation is interpretation caveat) listtemplates.go:246 Outcome counter increments on every page fetch, not just initial ranking R1 Knov Yes
CRF-12 Nit Author contested; panel closed R3 (5/5 accept scope defense) dbauthz_test.go:3609 Three GetTemplateRankingSignalsByOwnerID subtests share one name, defeating individual addressability R2 Netero Yes
CRF-13 Nit Author fixed (ab4fb75) listtemplates.go:457 Comment says affinity score "stays internal" but function logs it; means "not in tool result" R2 Gon Nit, Leorio Nit Yes
CRF-14 Nit Author fixed (ab4fb75) chatloop/metrics.go:42 Section heading restates what field names already show R2 Gon Yes
CRF-15 P3 Open recommendationtracker_internal_test.go:42 AccumulatesListedAcrossPages test proves page-1 survival, not page-2 accumulation R3 Bisky Yes

Contested and acknowledged

CRF-9 (Note, createworkspace_internal_test.go:2114) - Only accepted_recommendation followup integration-tested

  • Finding: Only the accepted_recommendation followup path is exercised through the full create_workspace handler. The other four classification outcomes are covered by tracker unit tests only.
  • Author defense: The handler wiring is proven by this integration test and the four other classifications are unit-tested in recommendationtracker_internal_test.go (now including cross-page accumulation), so this is not a coverage gap.
  • Author accepted: The integration test proves wiring. The classification logic is non-trivial (5 branches) and fully unit-tested. Exercising all 5 through the handler would test the wiring 5 times with the same result.

CRF-12 (Nit, dbauthz_test.go:3609) - Three subtests share one s.Run name

  • Finding: Three GetTemplateRankingSignalsByOwnerID subtests use the same s.Run name, defeating individual addressability. The setup_test.go #NN-suffix workaround was introduced to accommodate this.
  • Author defense: The test file is identical to origin/main, introduced by the already-merged feat: rank chat workspace templates #25037. This PR's scope is telemetry in coderd/x/chatd. The cleanup belongs in a standalone follow-up to keep scope bounded.
  • Status: Contested. Panel closed R3 (5/5 accept). File has zero changed lines in this PR; code is on main from feat: rank chat workspace templates #25037.

CRF-11 (Note, listtemplates.go:246) - Outcome counter increments on every page fetch

  • Finding: A three-page session produces three list_templates_outcome_total increments for the same recommendation, inflating the recommended:ask_user ratio.
  • Author defense: The counter is intentionally per-call (the help text says "Total list_templates calls"), so pagination inflating the recommended numerator is an interpretation caveat rather than a bug. The CRF-1 fix corrects the related follow-up classification across pages.
  • Author accepted: The per-call semantics are accurate and the metric contract is honest. The inflation is a known property, not a defect.

Round log

Round 1

Panel (14 reviewers). 3 P3, 6 Nit, 2 Note. Reviewed against 9a91632..3017d21.

Round 2

Panel (11 reviewers). CRF-1 through CRF-8, CRF-10 verified fixed. CRF-9, CRF-11 accepted. 3 new Nit (CRF-12, CRF-13, CRF-14). 0 new P0-P3. Reviewed against f1ce101..137ccf9.

Round 3

Panel (5 reviewers). CRF-12 contested, panel closed (5/5 accept scope defense). CRF-13, CRF-14 verified fixed. 1 new P3 (CRF-15). Reviewed against a5a4c49..ab4fb75.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3017d21d08

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chattool/listtemplates.go

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean telemetry design: three-tier observability (structured decision log, Prometheus metrics, acceptance tracker) that is purely additive, does not alter tool behavior, and degrades gracefully when partial. The RecommendationTracker is well-bounded (TTL, max entries, consume-once), the interface split avoids the import cycle correctly, and the test suite covers every classification path with real DB integration tests and mock clocks for deterministic TTL. Netero (mechanical first pass) found no issues.

3 P3, 6 Nit, 2 Note.

The main convergent finding (5 of 14 reviewers flagged independently): Record overwrites the tracker's listed set on each page fetch instead of accumulating, so templates from earlier pages are misclassified as "unlisted" rather than "overrode_with_listed_template." The primary signal (accepted_recommendation) is unaffected. The fix is straightforward: union listed IDs across pages or gate Record on page 1.

"I tried to build a case that it doesn't, and I couldn't." (Pariston, on whether the in-memory tracker justifies its complexity over the durable source of truth)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chattool/listtemplates.go Outdated
Comment thread coderd/x/chatd/chattool/recommendationtracker_internal_test.go
Comment thread coderd/x/chatd/chattool/listtemplates.go Outdated
Comment thread coderd/x/chatd/chattool/listtemplates.go Outdated
Comment thread coderd/x/chatd/chatloop/metrics.go Outdated
Comment thread coderd/x/chatd/chattool/recommendationtracker.go Outdated
Comment thread coderd/x/chatd/chattool/recommendationtracker_internal_test.go Outdated
Comment thread coderd/x/chatd/chattool/createworkspace_internal_test.go
Comment thread coderd/x/chatd/chatd.go Outdated
Comment thread coderd/x/chatd/chattool/listtemplates.go
Base automatically changed from workspace-bdje to main June 18, 2026 05:41
Resolve reviewer findings on the telemetry PR:

- RecommendationTracker.Record accumulates listed template IDs across
  pages of the same result, so a follow-up build from a later page is
  classified as listed rather than unlisted (CRF-1).
- list_templates records the affinity gap only when ranking signals
  loaded; a failed load leaves affinity scores at zero and would
  pollute the histogram (Codex P2).
- listTemplatesOutcome maps NextStepAskUser explicitly and routes
  unknown next steps to an "unknown" bucket (CRF-4).
- template_recommendation_followup_total help text enumerates its
  outcome values; metrics docs regenerated (CRF-5).
- create_workspace follow-up log includes owner_id (CRF-6).
- Rename recommendationFollowupListedNoRec to
  recommendationFollowupListedNoRecommendation (CRF-7).
- Drop "Tier" labels from comments and tighten the tracker field
  comment (CRF-3, CRF-10).
- Eviction test asserts a newer entry survives and uses range-over-int;
  add coverage for cross-page accumulation and the unknown outcome
  (CRF-2, CRF-8).

Copy link
Copy Markdown
Contributor Author

Pushed 137ccf9 addressing the round-1 review feedback. Summary:

Codex P2list_templates now records the affinity gap only when ranking signals loaded successfully; a failed load no longer adds zero-gap samples to the histogram.

CRF-1 (P3)RecommendationTracker.Record accumulates listed template IDs across pages of the same result instead of overwriting, so a follow-up build from a later page classifies as overrode_with_listed_template.

CRF-2 (P3) — eviction test now asserts a newer entry survives.

CRF-3 (P3) — dropped the Tier 1/2/3 labels from code comments.

CRF-4,5,6,7,8,10 (Nit) — explicit ask_user case + unknown fallback bucket; followup metric help text enumerates outcomes (docs regenerated); follow-up log includes owner_id; renamed recommendationFollowupListedNoRec…NoRecommendation; for range maxEntries; tightened the tracker field comment.

CRF-9, CRF-11 (Note) — acknowledged with rationale on the threads; no code change (intentional per-call counter, classification paths unit-tested).

Added tests: cross-page accumulation, eviction survivor, gap-skipped-on-signal-failure, and the outcome mapping incl. the unknown bucket. All inline threads addressed and resolved.

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 137ccf9457

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chattool/listtemplates.go

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2. All 3 P3 findings from R1 verified fixed in 137ccf9. All 6 Nits fixed. Both Notes acknowledged with rationale. Clean fixes, no regressions, no sibling issues introduced.

The base rebase brought the ranking and scoring logic into the diff (from the stacked PR). 11 reviewers covered the full scope: ranking formula, confidence thresholds, recommendation logic, telemetry, tracker, dbauthz authorization, docs. No P0-P3 issues found. 3 new Nits.

"I tried to build simpler alternatives... I could not build a simpler alternative that preserves the decision quality. The solution is proportional to the problem." (Pariston, stress-testing the design)

3 Nit.

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/dbauthz/dbauthz_test.go
Comment thread coderd/x/chatd/chattool/listtemplates.go
Comment thread coderd/x/chatd/chatloop/metrics.go Outdated
jaaydenh added 2 commits June 18, 2026 12:22
Resolve conflicts in listtemplates.go, listtemplates_internal_test.go,
and listtemplates_test.go by taking the branch version: main's squashed
"rank chat workspace templates" (#25037) is byte-identical to this
branch's pre-telemetry base for those files, so the branch already
contains all of main's changes there plus the telemetry additions.
- Reword recordListTemplatesTelemetry doc: the affinity score is "not
  included in the tool result shown to the model" rather than "stays
  internal", which misread as not logged (CRF-13).
- Drop the redundant "list_templates ranking telemetry" section heading
  in the Metrics struct; the field names already convey it (CRF-14).
- Document the intentional query-tie recommendation rationale: a query
  match is the baseline confidence signal there, so affinity only breaks
  the tie and need not clear the no-query affinity floor (Codex P2).

Copy link
Copy Markdown
Contributor Author

Pushed ab4fb75. Two changes since the last review:

Merged latest main (b7d60a0). The ranking feature #25037 is now squash-merged into main, so this branch's base moved to main. Conflicts in listtemplates.go, listtemplates_internal_test.go, and listtemplates_test.go were resolved by taking the branch version: main's squashed feature is byte-identical to this branch's pre-telemetry base for those files, so the branch already contained all of main's changes there plus the telemetry. Verified diff(main, HEAD) == diff(pre-telemetry, HEAD) for each. make gen shows no drift and the repo builds.

Round 2 feedback:

  • CRF-13 (Nit) — reworded the recordListTemplatesTelemetry doc: the affinity score is "not included in the tool result shown to the model" rather than "stays internal."
  • CRF-14 (Nit) — removed the redundant Metrics section heading.
  • Codex P2 (query-tie affinity floor) — declined the behavior change: this branch is identical to origin/main (shipped in feat: rank chat workspace templates #25037), and the distinction is intentional (a query tie has query relevance as the baseline signal; affinity only breaks the tie). Added a comment documenting it. A floor change would be a separate ranking-semantics PR.
  • CRF-12 (Nit, dbauthz_test.go) — declined: that test is pre-existing feat: rank chat workspace templates #25037 code, not part of this telemetry PR's delta (identical to origin/main). Better as a standalone cleanup of that test.

All four review threads addressed and resolved.

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab4fb75581

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// their listed IDs instead of overwriting. Page 1, a missing entry, or a
// changed recommendation starts fresh.
if page > 1 {
if entry, ok := t.entries[chatID]; ok && entry.recommendedID == recommendedID {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Do not revive expired list_templates history

When a chat requests page > 1 after the 30-minute tracker TTL has elapsed, this branch updates recordedAt and keeps the old page-1 listed set instead of treating the entry as expired. That makes a later create_workspace classify templates shown only in the stale result as freshly listed, corrupting the follow-up telemetry the TTL is meant to bound.

Useful? React with 👍 / 👎.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 3. CRF-13 and CRF-14 verified fixed (ab4fb75). CRF-12 (contested): panel closed 5/5, the file has zero changed lines in this PR's diff and the code exists identically on main from #25037.

1 new P3.

"The contest is valid; the cleanup belongs in a standalone PR targeting the code that introduced it." (Mafu-san, on CRF-12)

🤖 This review was automatically generated with Coder Agents.

// page-1 template is still "listed" rather than "unlisted".
tr.Record(chat, rec, []uuid.UUID{rec, page1}, 1)
tr.Record(chat, rec, []uuid.UUID{page2}, 2)
require.Equal(t, recommendationFollowupOverrodeListed, tr.Classify(chat, page1))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-15] AccumulatesListedAcrossPages proves page-1 IDs survived the page-2 Record, not that page-2 IDs were accumulated. (Bisky)

The test records page 1 with {rec, page1}, then page 2 with {page2}, and classifies with page1. That proves page-1 IDs survive the page-2 Record (the union-not-overwrite property), but page1 was already in the set from the first Record call. The accumulation loop body in Record could be deleted entirely (keeping only the early-return), and this assertion still passes.

Fix: classify with page2 instead of page1. That proves the loop added page-2 IDs to the listed set:

require.Equal(t, recommendationFollowupOverrodeListed, tr.Classify(chat, page2))

🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant