improvement(metrics): emit hosted-key metrics to CloudWatch instead of OTel #4914
PR Summary
Medium Risk
Overview
The new implementation buffers metric points in-process and flushes them asynchronously (5s interval, 1000-point threshold, and awaited drains on SIGTERM/SIGINT/beforeExit). It is a no-op when
Reviewed by Cursor Bugbot for commit bbe5f42.
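The awaited drains on SIGTERM/SIGINT/beforeExit mentioned in the summary could look roughly like the sketch below. This is not the code from the PR: `registerDrainHandlers`, `ExitSource`, and the `exit(0)` status are hypothetical names and choices made for illustration, and the flush function is injected so the sketch does not depend on any AWS client.

```typescript
// Minimal slice of Node's `process` that this sketch needs; the real
// `process` object satisfies it structurally.
interface ExitSource {
  once(event: string, listener: () => void): unknown
}

type Flush = () => Promise<void>

// Hypothetical helper: registers handlers that await one final flush
// before the process is allowed to exit.
function registerDrainHandlers(
  source: ExitSource,
  flush: Flush,
  exit: (code: number) => void
): void {
  let draining: Promise<void> | null = null

  // Share a single in-flight drain so overlapping signals do not flush twice.
  const drain = (): Promise<void> => {
    if (draining === null) {
      draining = flush().catch(() => {
        // Assumption: a failed final flush is dropped rather than blocking exit.
      })
    }
    return draining
  }

  for (const signal of ['SIGTERM', 'SIGINT']) {
    source.once(signal, () => {
      // Installing a signal listener replaces Node's default termination,
      // so exit explicitly once the drain settles (status 0 is an assumption).
      void drain().then(() => exit(0))
    })
  }

  // 'beforeExit' fires when the event loop empties; the pending flush keeps
  // the loop alive until it settles, so no explicit exit call is needed.
  source.once('beforeExit', () => {
    void drain()
  })
}
```

In a Node service this would be wired up once, for example as `registerDrainHandlers(process, flushHostedKeyMetrics, (code) => process.exit(code))`, assuming `flushHostedKeyMetrics` returns a promise as the sequence diagram in this thread suggests.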
Greptile Summary
This PR replaces the OTel/Prometheus backend for hosted-key metrics with a CloudWatch push-based buffer (
Confidence Score: 5/5
Safe to merge: the change is well-scoped, call sites are untouched, and the buffering logic correctly prevents data loss under normal operation.
The architectural switch from OTel to CloudWatch is sound and the implementation handles the core concerns (batching, exit flushing, memory bounding, no-op without creds). The two flagged items are observational concerns: one around histogram precision being lost for QueueWaitDuration, and one around the ENABLED guard not covering IAM role credentials. Neither causes incorrect behavior today given the documented deployment model with static AWS creds.
apps/sim/lib/monitoring/metrics.ts, specifically the ENABLED guard and QueueWaitDuration metric type.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller as Caller
participant Enqueue as enqueue
participant Buffer as Buffer
participant Flush as flushHostedKeyMetrics
participant CW as CloudWatch
Caller->>Enqueue: recordUsed / recordFailed / etc.
Enqueue->>Buffer: push MetricDatum
Enqueue->>Enqueue: ensureBackground registers 5s timer and exit handlers
alt buffer reaches FLUSH_THRESHOLD 1000
Enqueue-->>Flush: fire-and-forget flush
end
Flush->>Buffer: swap pending buffer to empty
Flush->>CW: PutMetricDataCommand batched
alt success
CW-->>Flush: OK
else error
Flush->>Flush: log warn and drop batch
end
Note over Enqueue,Flush: SIGTERM or SIGINT or beforeExit triggers await flush
Reviews (2): Last reviewed commit: "fix(metrics): await metric flush on shut..."
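The enqueue/flush flow in the diagram above can be sketched as follows. This is an illustrative approximation, not the contents of `apps/sim/lib/monitoring/metrics.ts`: `MetricPoint`, `MetricBuffer`, and `SendBatch` are hypothetical names, the sender is injected in place of a real `PutMetricData` call, and the per-request batch size is an assumption that should be checked against current CloudWatch limits.

```typescript
// Hypothetical, simplified stand-in for a CloudWatch MetricDatum.
interface MetricPoint {
  name: string
  value: number
  dimensions: Record<string, string>
}

// Injected sender; in the PR this role is played by a PutMetricData call.
type SendBatch = (batch: MetricPoint[]) => Promise<void>

class MetricBuffer {
  private pending: MetricPoint[] = []
  private readonly send: SendBatch
  private readonly flushThreshold: number
  private readonly batchSize: number

  constructor(send: SendBatch, flushThreshold = 1000, batchSize = 1000) {
    this.send = send
    this.flushThreshold = flushThreshold // 1000-point threshold quoted in the review
    this.batchSize = batchSize // assumption: points sent per request
  }

  get size(): number {
    return this.pending.length
  }

  enqueue(point: MetricPoint): void {
    this.pending.push(point)
    if (this.pending.length >= this.flushThreshold) {
      // Fire-and-forget: flush() handles its own errors, so nothing rejects here.
      void this.flush()
    }
  }

  async flush(): Promise<void> {
    if (this.pending.length === 0) return
    // Swap the buffer before the first await so points enqueued during the
    // network call land in the next flush instead of being lost or resent.
    const batch = this.pending
    this.pending = []
    for (let i = 0; i < batch.length; i += this.batchSize) {
      try {
        await this.send(batch.slice(i, i + this.batchSize))
      } catch (err) {
        // Matches the diagram's error branch: log a warning and drop the batch.
        console.warn('metric flush failed; dropping batch', err)
      }
    }
  }
}
```

Dropping a failed batch instead of re-queuing it keeps memory bounded when CloudWatch is unreachable, at the cost of losing those points. A periodic timer (5s in the PR) would call `flush()` on the same instance.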
Addressed the Bugbot findings in
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit bbe5f42.
@greptile review

Summary
`Sim/HostedKey` namespace), same `hostedKeyMetrics` facade, so no call-site changes in `tools/index.ts` / `lib/core/telemetry.ts`.
`PutMetricData` (5s flush + flush on SIGTERM/SIGINT/beforeExit), low-cardinality dimensions (Environment, Provider, Tool, Key, Reason); no-op without AWS creds so it stays safe locally. Per-workspace/user cost stays in `usage_log`.
Type of Change
Testing
Tested manually. Biome clean on the changed file; `check:api-validation:strict` passed (boundary audit passed).
Checklist