feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting)#5127
feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting)#5127TheodoreSpeaks wants to merge 6 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Key resolution & config: Billing & telemetry: New shared helpers in Streaming: Reviewed by Cursor Bugbot for commit ed64de9. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR routes LLM provider calls through the shared hosted-key framework (BYOK-first → platform key pool via
Confidence Score: 4/5The non-streaming and simple-streaming paths are correctly wired end-to-end; tool-path streaming for most providers still lacks the finalizeTiming call that would settle the hosted-key cost metric and apply the cost multiplier on drain. The flag-gated key resolution, rate-limiter mode:'none', and centralized cost settlement in settleStreamingLlmCost are all sound. The one structural gap — tool-path streaming providers omitting finalizeTiming in their createStream callbacks — was flagged in a prior review cycle and remains unaddressed. apps/sim/providers/streaming-execution.ts and the tool-path createStream callbacks in openai/core.ts, baseten/index.ts, fireworks/index.ts, together/index.ts, mistral/index.ts, and ollama/core.ts Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant E as executeProviderRequest
participant B as getApiKeyWithBYOK
participant RL as HostedKeyRateLimiter
participant P as Provider.executeRequest
participant M as hostedKeyMetrics
E->>B: provider, model, workspaceId, apiKey, userId
B->>B: isHosted and workspaceId and flag on?
alt BYOK workspace key exists
B-->>E: apiKey isBYOK true
else platform key available
B->>RL: acquireKey mode none
RL-->>B: success key envVarName
B-->>E: apiKey isBYOK false hostedKeyEnvVar
else fallback
B-->>E: legacy key no hostedKeyEnvVar
end
E->>P: executeRequest sanitizedRequest
alt throws
E->>M: recordFailed
E-->>E: re-throw
else StreamingExecution
E->>M: recordUsed
Note over P,M: On stream drain
P->>M: recordCostCharged via settleStreamingLlmCost
else ProviderResponse
E->>M: emitHostedKeyUsage
end
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant E as executeProviderRequest
participant B as getApiKeyWithBYOK
participant RL as HostedKeyRateLimiter
participant P as Provider.executeRequest
participant M as hostedKeyMetrics
E->>B: provider, model, workspaceId, apiKey, userId
B->>B: isHosted and workspaceId and flag on?
alt BYOK workspace key exists
B-->>E: apiKey isBYOK true
else platform key available
B->>RL: acquireKey mode none
RL-->>B: success key envVarName
B-->>E: apiKey isBYOK false hostedKeyEnvVar
else fallback
B-->>E: legacy key no hostedKeyEnvVar
end
E->>P: executeRequest sanitizedRequest
alt throws
E->>M: recordFailed
E-->>E: re-throw
else StreamingExecution
E->>M: recordUsed
Note over P,M: On stream drain
P->>M: recordCostCharged via settleStreamingLlmCost
else ProviderResponse
E->>M: emitHostedKeyUsage
end
Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile |
| } else if (hostedKeyEnvVar) { | ||
| // Hosted key used: record usage now; cost is settled on stream drain | ||
| // inside createStreamingExecution (the single streaming cost seam). | ||
| hostedKeyMetrics.recordUsed({ |
There was a problem hiding this comment.
Inaccurate comment — Gemini does not use
createStreamingExecution
The comment says "cost is settled on stream drain inside createStreamingExecution (the single streaming cost seam)", but Gemini uses a bespoke createStreamingResult helper and calls settleStreamingLlmCost directly inside its own stream callback. createStreamingExecution is not involved in the Gemini path at all. The comment should note that Gemini settles cost in its own drain callback (or reference both seams).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
@greptile review |
…-agent # Conflicts: # apps/sim/lib/api-key/byok.ts # apps/sim/lib/core/config/feature-flags.ts
|
@greptile review |
|
Addressed review:
Also fixed CI: completed provider test mocks ( |
| tool: response.model, | ||
| key: hostedKeyEnvVar as string, | ||
| costTotal: response.cost.total, | ||
| }) |
There was a problem hiding this comment.
Hosted metrics omit tool costs
Medium Severity
For non-streaming hosted-key responses, emitHostedKeyUsage runs right after token calculateCost, but sumToolCosts is applied to response.cost.total only later. CloudWatch hosted-key cost therefore excludes tool charges even when the returned response total includes them.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 5f9046e. Configure here.
|
Two more Bugbot findings reviewed:
|
|
Addressed the two streaming-metric symmetry findings (
Full suite green (8870 tests), tsc 0, lint 16/16. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ed64de9. Configure here.
| usage.promptTokenCount, | ||
| usage.candidatesTokenCount | ||
| request.hostedKey, | ||
| isCachedInput(request.context) |
There was a problem hiding this comment.
Gemini streams skip failure metrics
Medium Severity
For hosted platform keys, executeProviderRequest records hostedKeyMetrics.recordUsed as soon as a StreamingExecution is returned. Other providers wrap streams in createStreamingExecution, which calls recordFailed on read errors. Gemini settles cost via settleStreamingLlmCost on success only, so mid-stream failures never emit a failure metric.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit ed64de9. Configure here.


Summary
hosted-key-llmflag (default off)mode: 'none'so LLMs acquire platform keys with no queue/rate limitingcreateStreamingExecution(sharedsettleStreamingLlmCost), applying the cost multiplier the per-provider streaming paths previously omitted; gemini uses the same helperhostingconfig (envKeyPrefix+byokProviderId); pricing stays per-model viacalculateCostclassifyHostedKeyFailureacross tools/providers (fixes auth-error misclassification)Type of Change
Testing
Tested manually;
bun run lintclean,tsc --noEmit0 errors,check:api-validation:strictpassed, unit tests passing (hosted-cost, byok, rate-limiter, streaming-execution, tools)Checklist