Skip to content

feat(coderd/database): add chat_message_search_text function#26968

Draft
johnstcn wants to merge 3 commits into
mainfrom
cian/codagt-721-add-chat_message_search_text-extraction-function
Draft

feat(coderd/database): add chat_message_search_text function#26968
johnstcn wants to merge 3 commits into
mainfrom
cian/codagt-721-add-chat_message_search_text-extraction-function

Conversation

@johnstcn

@johnstcn johnstcn commented Jul 2, 2026

Copy link
Copy Markdown
Member

Adds an IMMUTABLE PARALLEL SAFE SQL function chat_message_search_text(content jsonb) RETURNS text (migration 000537) that extracts the space-joined text of all type='text' parts from a chat message's JSONB content, in order. Non-array content (legacy content_version=0 scalar strings) returns NULL, as does an array with no text parts. Includes a Postgres-backed test exercising the function directly.

This is groundwork for chat history search (CODAGT-591): a later change (CODAGT-722) builds a partial expression GIN index over to_tsvector('simple', chat_message_search_text(content)). No index, trigger, or callers are added here.

Assumptions: single-space separator for tokenization; NULL (not empty string) for empty/scalar/NULL input; legacy content is intentionally excluded from search per the RFC. GetChatUserPromptsByChatID has a similar but not identical extraction (empty separator, extra filtering), so it is not rewritten here.

Closes https://linear.app/codercom/issue/CODAGT-721

Implementation plan

CODAGT-721: Add chat_message_search_text extraction function

Issue: https://linear.app/codercom/issue/CODAGT-721
Parent: CODAGT-591 (Implement Chat History Search)
Branch: cian/codagt-721-add-chat_message_search_text-extraction-function
Deliverable: one draft PR containing a single SQL migration pair plus tests.

Context

The chat search RFC (Notion: "Chat Messages Search", post-PRD follow-up) settled on
a partial expression GIN index over
to_tsvector('simple', chat_message_search_text(content)) on chat_messages,
built at runtime with CREATE INDEX CONCURRENTLY by a dbpurge-style task
(CODAGT-722). This issue lands only the IMMUTABLE extraction function the index
expression needs. No index, no trigger, no stored column, no callers.

Requirements (from issue + RFC)

  • New migration pair under coderd/database/migrations (next number after
    000535; use create_migration.sh).
  • chat_message_search_text(content jsonb) RETURNS text, IMMUTABLE,
    single statement, no CONCURRENTLY.
  • .down.sql drops the function.
  • Semantics (matches the RFC sketch and the existing
    GetChatUserPromptsByChatID extraction in coderd/database/queries/chats.sql:422):
    • If content is NULL or jsonb_typeof(content) <> 'array' (legacy
      content_version=0 scalar strings): return NULL. This intentionally excludes
      legacy message content per the RFC.
    • Otherwise: string_agg(part->>'text', ' ' ORDER BY ordinality) over
      jsonb_array_elements(content) where part->>'type' = 'text'.
    • Only type='text' parts. Reasoning, tool-call/result, file, source,
      context-file, and skill parts are excluded (role/visibility/deleted
      filtering belongs in the index predicate, CODAGT-722, not here).
    • Array with no text parts: string_agg over zero rows returns NULL. Fine.
  • Implementation notes:
    • LANGUAGE sql (not plpgsql): a single-expression SQL function is enough,
      inlinable, and trivially IMMUTABLE. IMMUTABLE + PARALLEL SAFE.
    • IMMUTABLE is a promise, not enforced; the function reads only its
      argument, so the promise holds.

Red: tests first

Migration-level DB tests. Where: a new test in
coderd/database/migrations/migrate_test.go style is overkill; instead add a
Postgres-backed test (e.g. coderd/database/chat_message_search_text_test.go
or extend an existing querier test file per .claude/docs/DATABASE.md and
TESTING.md conventions; confirm placement against how other SQL functions are
tested during implementation). Test executes
SELECT chat_message_search_text($1::jsonb) directly. Cases:

  1. Array with one text part → that text.
  2. Array with text parts interleaved with non-text parts (reasoning,
    tool-call) → text parts joined by a single space, in order.
  3. Scalar JSONB ("hello") → NULL.
  4. Empty array ([]) → NULL.
  5. NULL input → NULL.
  6. Array element missing type or missing text → ignored / NULL text
    skipped (string_agg skips NULLs).

Also: TestMigrateUpDownUp style idempotence is already covered by the
existing migration framework; no extra work unless make test says otherwise.

Green: implementation

  1. ./coderd/database/migrations/create_migration.sh chat_message_search_text

  2. Up migration:

    CREATE FUNCTION chat_message_search_text(content jsonb) RETURNS text
    LANGUAGE sql IMMUTABLE PARALLEL SAFE AS $$
        SELECT string_agg(part->>'text', ' ' ORDER BY ordinality)
        FROM jsonb_array_elements(content) WITH ORDINALITY AS t(part, ordinality)
        WHERE jsonb_typeof(content) = 'array'
          AND part->>'type' = 'text'
    $$;

    Note: jsonb_array_elements on a scalar raises an error, so the scalar
    guard cannot live only in the WHERE clause of a FROM over it. Use
    CASE WHEN jsonb_typeof(content) = 'array' THEN (subquery) END or plpgsql
    with an IF, as in the RFC sketch. Decide during implementation; the tests
    pin the behavior either way.

  3. Down migration: DROP FUNCTION chat_message_search_text(jsonb);

  4. make gen (dump.sql picks up the function; no sqlc changes expected).

  5. Run the new tests + make test RUN=<new test>.

Refactor

  • Compare the final function against GetChatUserPromptsByChatID; do NOT
    rewrite that query to use the new function in this PR (out of scope), but
    note the follow-up if the duplication is exact.
  • Re-check comment quality in the migration (explain the scalar guard and why
    only type='text' parts are indexed).

Verification

  • make gen produces no diff outside the migration + dump.sql.
  • make lint, make test RUN=... pass.
  • Migration up/down/up applies cleanly (covered by migration tests).

Assumptions (flag in PR)

  • Separator is a single space (RFC sketch used ' '; the existing
    user-prompts query uses ''). Space is correct for tokenization.
  • Empty array returns NULL, not empty string. Acceptance criteria says
    "null/empty"; NULL is what string_agg gives and is fine for
    to_tsvector (the index expression handles NULL).
  • Legacy (non-array) content returns NULL by design; RFC explicitly excludes
    legacy message content from search.

This PR was generated by Coder Agents on behalf of @johnstcn.

@linear-code

linear-code Bot commented Jul 2, 2026

Copy link
Copy Markdown

CODAGT-721

@johnstcn johnstcn force-pushed the cian/codagt-721-add-chat_message_search_text-extraction-function branch from 8ab2d7e to 3b783dc Compare July 2, 2026 16:01
@johnstcn

johnstcn commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-07-03 09:30 UTC by @johnstcn
Spend: $27.45 / $100.00

Review history
  • R1 (2026-07-02): 17 reviewers, 3 Nit, 2 Note, COMMENT. Review
  • R2 (2026-07-03): 9 reviewers, 4 Nit, 2 Note, COMMENT. Review

deep-review v0.9.0 | Round 2 | 1989db0..e198c2b

Last posted: Round 2, 6 findings (4 Nit, 2 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 Nit Author fixed (e198c2b) 000537_chat_message_search_text.up.sql:1 Migration comment includes "what" the code already shows; could trim to "why" only R1 Gon P2 (downgraded; Leorio, Kite, Mafu-san praised same comment) Yes
CRF-2 Nit Author fixed (e198c2b) PR description PR description references migration 000536 but actual file is 000537 R1 Leorio P3, Mafu-san Nit No
CRF-3 Nit Author fixed (e198c2b) 000537_chat_message_search_text.down.sql:1 Bare DROP FUNCTION; convention since 000338 is DROP FUNCTION IF EXISTS R1 Chopper Yes
CRF-4 Note Author fixed (e198c2b) chat_message_search_text_test.go:56 No test for array with only non-text parts (different path to same NULL result) R1 Bisky Yes
CRF-5 Note Author accepted R2 (intentional semantic split; consistency pinned by migration tests, cross-check in CODAGT-722/724) 000537_chat_message_search_text.up.sql:9 Four extraction implementations coexist with different separators (landscape note) R1 Robin Yes
CRF-6 Nit Open chat_message_search_text_test.go:23 Test struct field comment restates sql.NullString semantics R2 Gon Yes
CRF-7 Nit Open chat_message_search_text_test.go:24 Test struct field comment restates sql.NullString semantics R2 Gon Yes

Contested and acknowledged

CRF-5 (Note, 000537_chat_message_search_text.up.sql:9) - Four extraction implementations with different separators

  • Finding: Four text extraction implementations coexist with subtly different separators (space, empty string, space, none) across SQL function, SQL query, and two Go helpers.
  • Author accepted: The Go/SQL split is irreducible since Go sites operate on in-memory parts. The two SQL sites have intentionally different semantics: separator and per-message filtering are behavior contracts, not accidental drift. Consistency between SQL extraction and search behavior is pinned by migration tests, and a cross-check will land where index and query meet (CODAGT-722/724).

Round log

Round 1

Panel (17 reviewers). Netero: no findings. 3 Nit, 2 Note. 0 dropped. Reviewed against 1989db0..3b783dc.

Round 2

Panel (9 reviewers). Netero: no findings. All R1 findings addressed. 2 new Nit. 0 dropped. Reviewed against 1989db0..e198c2b.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped PR: a single SQL extraction function with comprehensive Postgres-backed tests, correctly decomposed from the larger chat search work (CODAGT-591 -> CODAGT-722 -> this). 12 of 17 reviewers found no issues. Netero's mechanical sweep was clean.

3 Nits, 2 Notes. No blocking issues.

The PR description references "migration 000536" but the actual file is 000537 (renumbered in the second commit). Worth updating so future readers can find the migration file.

"No mocks, no fakes, no costume jewelry. Every row proves real behavior." -Bisky, on the test suite

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/database/migrations/000537_chat_message_search_text.up.sql Outdated
Comment thread coderd/database/migrations/000537_chat_message_search_text.down.sql Outdated
Comment thread coderd/database/chat_message_search_text_test.go
Comment thread coderd/database/migrations/000537_chat_message_search_text.up.sql
Comment thread coderd/database/migrations/000537_chat_message_search_text.up.sql Outdated
@johnstcn

johnstcn commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All five R1 findings addressed. Comment trimmed, IF EXISTS added, OnlyNonTextParts test added, PR description updated, extraction landscape acknowledged. All fixes verified by 9 reviewers against e198c2b.

2 new Nits (comment style). No blocking issues.

"Everything holds." -Hisoka

🤖 This review was automatically generated with Coder Agents.


cases := []struct {
name string
content sql.NullString // JSONB input; invalid means SQL NULL.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-6] Comment // JSONB input; invalid means SQL NULL. restates what sql.NullString already encodes. No other test in coderd/database/ annotates sql.Null* fields this way. The $1::jsonb cast in the query two lines below provides the JSONB context. (Gon)

🤖

cases := []struct {
name string
content sql.NullString // JSONB input; invalid means SQL NULL.
want sql.NullString // expected text; invalid means SQL NULL.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-7] Same: want sql.NullString and the field name say everything. (Gon)

🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant