Skip to content

fix(billing): prevent deadlock with timeout#4949

Merged
icecrasher321 merged 1 commit into
stagingfrom
fix/deadlock-billing-tx
Jun 10, 2026
Merged

fix(billing): prevent deadlock with timeout#4949
icecrasher321 merged 1 commit into
stagingfrom
fix/deadlock-billing-tx

Conversation

@icecrasher321

Copy link
Copy Markdown
Collaborator

Summary

Fix pool-starvation deadlock in /api/billing/update-cost by resolving billing context before the per-event-key advisory-lock transaction, and bound the lock wait with a 3s lock_timeout (on a 64-bit hashtextextended key) so stuck flushes fail fast for the Go retry/dead-letter path instead of pinning pooled connections.

Type of Change

  • Bug fix

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 10, 2026 8:03pm

Request Review

@cursor

cursor Bot commented Jun 10, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches billing ledger writes and lock semantics on a hot internal path; behavior should be equivalent when locks clear quickly, but lock timeout can surface new retryable failures under contention.

Overview
Fixes pool starvation on cumulative usage flushes (recordCumulativeUsage, used by /api/billing/update-cost) by moving resolveBillingContext outside the advisory-lock transaction and passing the pre-resolved entity/period into the first-flush recordUsage insert, so subscription lookups no longer run on the global pool while a connection holds the per-event-key lock.

Inside the transaction, sets a 3s lock_timeout before acquiring the lock and switches the lock key to pg_advisory_xact_lock(hashtextextended(...)) so stuck waits fail fast (retryable) instead of pinning pooled connections. Vitest coverage asserts call order, stamped billing fields on insert, and the lock/timeout SQL.

Reviewed by Cursor Bugbot for commit 7084891. Configure here.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7084891. Configure here.

Comment thread apps/sim/lib/billing/core/usage-log.ts
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a pool-starvation deadlock in /api/billing/update-cost by moving resolveBillingContext outside the transaction so subscription lookups no longer compete for pooled connections while the advisory lock is held, and bounds the lock wait with a 3-second lock_timeout using a 64-bit hashtextextended key.

  • usage-log.ts: billing context is now resolved before db.transaction(), the advisory lock switches from 32-bit hashtext to 64-bit hashtextextended, and set_config('lock_timeout', '3000ms', true) is added to fail stuck flushes fast (SQLSTATE 55P03) within the Go retry budget.
  • usage-log.test.ts: three new tests verify pre-transaction ordering of the subscription lookup, that the pre-resolved context is stamped on the first-flush insert, and that the lock timeout and 64-bit hash SQL are executed inside the transaction.

Confidence Score: 3/5

The deadlock fix is sound, but every top-up flush (the majority of calls after the first) now performs a subscription lookup that is discarded unused, adding extra DB load on a high-traffic billing endpoint.

Moving billing context resolution out of the transaction correctly eliminates the pool-starvation deadlock and the lock changes are technically correct. However, the unconditional pre-transaction resolveBillingContext call means every UPDATE-path flush (all subsequent flushes for a multi-leg request) incurs a getHighestPrioritySubscription query that wasn't there before and whose result is never used. On a high-frequency billing path this adds measurable extra load per request. The test helper also accesses a Drizzle-internal property, making the new lock verification tests fragile to Drizzle upgrades.

apps/sim/lib/billing/core/usage-log.ts — the unconditional billing context resolution before the transaction.

Important Files Changed

Filename Overview
apps/sim/lib/billing/core/usage-log.ts Moves billing context resolution before the transaction (fixing pool-starvation deadlock) and adds a 3s lock_timeout with a 64-bit hashtextextended advisory lock; introduces an unnecessary subscription lookup on every subsequent flush (UPDATE path) where the context is not consumed.
apps/sim/lib/billing/core/usage-log.test.ts Adds three new tests covering pre-transaction billing context resolution, context stamping on first-flush insert, and lock timeout/hash verification; the executedSqlContaining helper reads Drizzle's internal .strings property which is fragile.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant recordCumulativeUsage
    participant resolveBillingContext
    participant DB_Pool
    participant Transaction

    Caller->>recordCumulativeUsage: "{userId, eventKey, cost, ...}"
    note over recordCumulativeUsage: BEFORE (old): billingContext resolved inside tx
    note over recordCumulativeUsage: AFTER (new): billingContext resolved BEFORE tx
    recordCumulativeUsage->>resolveBillingContext: resolveBillingContext(userId)
    resolveBillingContext->>DB_Pool: getHighestPrioritySubscription(userId)
    DB_Pool-->>resolveBillingContext: subscription
    resolveBillingContext-->>recordCumulativeUsage: "{billingEntity, billingPeriod}"

    recordCumulativeUsage->>DB_Pool: db.transaction()
    DB_Pool->>Transaction: acquire connection
    Transaction->>Transaction: "SET lock_timeout = '3000ms' (tx-local)"
    Transaction->>Transaction: pg_advisory_xact_lock(hashtextextended(eventKey, 0))
    Transaction->>Transaction: SELECT existing row
    alt existing row found (UPDATE path)
        Transaction->>Transaction: UPDATE cost to newTotal
        note over Transaction: billingContext resolved above — unused here
    else first flush (INSERT path)
        Transaction->>Transaction: INSERT with pre-resolved billingEntity/billingPeriod
    end
    Transaction-->>DB_Pool: release connection + advisory lock
    DB_Pool-->>recordCumulativeUsage: result
    recordCumulativeUsage-->>Caller: "{billed, delta, total}"
Loading

Reviews (1): Last reviewed commit: "fix(billing): prevent deadlock with time..." | Re-trigger Greptile

Comment thread apps/sim/lib/billing/core/usage-log.ts
Comment thread apps/sim/lib/billing/core/usage-log.test.ts
@icecrasher321 icecrasher321 merged commit 20dd654 into staging Jun 10, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/deadlock-billing-tx branch June 10, 2026 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant