fix(providers): correct pricing, deprecations, and capabilities across model catalog #4990
PR Summary
Medium Risk
Overview
Catalog & capabilities: Pricing,
Provider plumbing:
Reviewed by Cursor Bugbot for commit ddadc30.
Greptile Summary
This PR performs a two-pass validation of all ~160 models across 12 providers, correcting pricing, deprecation flags, context windows, temperature ranges, and capability fields against live provider documentation. It also ships a real runtime bug fix in
Confidence Score: 5/5
Safe to merge: the only runtime change is the Bedrock geo-profile fix, which is well-tested and corrects real invalid model IDs that were being sent to AWS.
The Bedrock bug fix is correct and covered by new unit tests. All data changes (pricing, deprecations, context windows) are documented in the PR with source verification. The rest of the changes are refactors and test updates that don't affect production paths.
apps/sim/providers/models.ts: the static temperature-range list helpers don't inherit provider-level capabilities, leaving an inconsistency with the per-model query functions, though this has no current production effect.
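The inconsistency noted for the static list helpers can be illustrated with a minimal sketch. The type shapes, the sample provider data, and the helper names (`getMergedCapabilities`, `listByRangeModelOnly`, `listByRangeMerged`) are assumptions made for illustration and are not the actual definitions in models.ts; only the merge order `{ ...provider.capabilities, ...model.capabilities }` comes from the review.

```typescript
// Hypothetical, simplified shapes; the real definitions in models.ts may differ.
interface TemperatureCapability { min: number; max: number }
interface Capabilities { temperature?: TemperatureCapability }
interface ModelDef { id: string; capabilities?: Capabilities }
interface ProviderDef { id: string; capabilities?: Capabilities; models: ModelDef[] }

// Illustrative data only: one provider-level range, one model-level range.
const providers: ProviderDef[] = [
  {
    id: 'groq',
    capabilities: { temperature: { min: 0, max: 2 } },
    models: [{ id: 'groq/example-model' }],
  },
  {
    id: 'together',
    models: [{ id: 'together/example-model', capabilities: { temperature: { min: 0, max: 1 } } }],
  },
];

// Per-model query path: model-level fields override provider-level ones.
function getMergedCapabilities(modelId: string): Capabilities | undefined {
  for (const provider of providers) {
    for (const model of provider.models) {
      if (model.id === modelId) {
        return { ...provider.capabilities, ...model.capabilities };
      }
    }
  }
  return undefined;
}

// Static list builder that inspects model-level capabilities only.
function listByRangeModelOnly(min: number, max: number): string[] {
  const ids: string[] = [];
  for (const provider of providers) {
    for (const model of provider.models) {
      const t = model.capabilities?.temperature;
      if (t !== undefined && t.min === min && t.max === max) ids.push(model.id);
    }
  }
  return ids;
}

// Static list builder that reuses the merged view, so both paths agree.
function listByRangeMerged(min: number, max: number): string[] {
  const ids: string[] = [];
  for (const provider of providers) {
    for (const model of provider.models) {
      const t = getMergedCapabilities(model.id)?.temperature;
      if (t !== undefined && t.min === min && t.max === max) ids.push(model.id);
    }
  }
  return ids;
}
```

In this sketch the model-only builder misses the model whose range is declared at provider level, while the merged builder finds it. Routing the list builders through the same merge as the per-model queries is one possible way to close the gap, not necessarily the approach the codebase will take.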
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Model ID lookup] --> B{getModelCapabilities}
B --> C["Merge: { ...provider.capabilities, ...model.capabilities }"]
C --> D[supportsTemperature / getMaxTemperature]
D --> E[Agent slider condition — CORRECT for all models incl. Groq]
F[Static list builders] --> G{getModelsWithTemperatureRange / getModelsWithTemperatureSupport}
G --> H[Only model.capabilities.temperature checked]
H --> I[MODELS_TEMP_RANGE_0_2 / MODELS_WITH_TEMPERATURE_SUPPORT — MISSES Groq provider-level caps]
J[Bedrock model ID] --> K{getBedrockInferenceProfileId}
K --> L{Already prefixed?}
L -- yes --> M[Return as-is]
L -- no --> N{In GEO_PROFILE_UNSUPPORTED_MODEL_IDS?}
N -- yes --> O[Return bare model ID]
N -- no --> P[Add region prefix e.g. us. eu. ap.]
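The Bedrock branch of the flowchart can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the PR: the function name `resolveInferenceProfileId`, its signature, the placeholder set entry, and the prefix list (taken from the "us. eu. ap." example in the chart) are all hypothetical.

```typescript
// Placeholder entry only; the PR's actual set holds 11 model IDs that are not reproduced here.
const GEO_PROFILE_UNSUPPORTED_MODEL_IDS = new Set<string>(['example.no-geo-model-v1:0']);

// Assumed prefix list, based on the example prefixes named in the flowchart.
const GEO_PREFIXES = ['us.', 'eu.', 'ap.'];

// Hypothetical stand-in for getBedrockInferenceProfileId; the real signature may differ.
function resolveInferenceProfileId(modelId: string, regionPrefix: string): string {
  // Already prefixed: return as-is.
  if (GEO_PREFIXES.some((prefix) => modelId.startsWith(prefix))) {
    return modelId;
  }
  // Geo inference not supported for this model: return the bare model ID.
  if (GEO_PROFILE_UNSUPPORTED_MODEL_IDS.has(modelId)) {
    return modelId;
  }
  // Otherwise add the region prefix.
  return `${regionPrefix}.${modelId}`;
}
```

For example, `resolveInferenceProfileId('example.no-geo-model-v1:0', 'us')` returns the bare ID unchanged, whereas an ID outside the set gains the `us.` prefix.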
Reviews (5): Last reviewed commit: "fix(providers): default azure-openai to ..."
…th per-provider justification docs
Re: the two items flagged in the Greptile summary (posted as outside-diff comments, so answering here):

1. DeepSeek cachedInput $0.0028 (50x discount): verified, not a decimal error. Re-fetched https://api-docs.deepseek.com/quick_start/pricing directly: deepseek-v4-flash is $0.0028/1M on cache hit vs $0.14/1M on cache miss, $0.28/1M output (and v4-pro is $0.003625 hit vs $0.435 miss, an even steeper ~120x). DeepSeek moved away from the old 10x ratio with the V4 generation. The
2. Renamed IDs (
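The discount ratios quoted in item 1 can be re-derived from the per-1M-token prices given in the comment. This is a small arithmetic check only; the record shape and the `cacheDiscountRatio` helper are hypothetical names for illustration and do not reflect the pricing schema used in models.ts.

```typescript
// Input prices in USD per 1M tokens, copied from the comment above.
interface CachePricing { cacheHit: number; cacheMiss: number }

const deepseekInputPricing: Record<string, CachePricing> = {
  'deepseek-v4-flash': { cacheHit: 0.0028, cacheMiss: 0.14 },
  'deepseek-v4-pro': { cacheHit: 0.003625, cacheMiss: 0.435 },
};

// Ratio of the cache-miss price to the cache-hit price.
function cacheDiscountRatio(pricing: CachePricing): number {
  return pricing.cacheMiss / pricing.cacheHit;
}

for (const [modelId, pricing] of Object.entries(deepseekInputPricing)) {
  // Rounded for display because floating-point division may not be exact.
  console.log(`${modelId}: ~${Math.round(cacheDiscountRatio(pricing))}x cheaper on cache hit`);
}
```

The rounded ratios come out at 50 for deepseek-v4-flash and 120 for deepseek-v4-pro, matching the figures in the comment.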
@greptile |
@cursor review |
✅ Bugbot reviewed your changes and found no new issues!
Summary
Two-pass validation of every static model entry in `models.ts` (~160 models, 12 providers + embedding/rerank pricing) against live provider docs, with secondary pricing cross-checks. Round 2 re-verified every field with a second fleet of per-provider agents, each producing a justification log covering every field, its source URL, and what was changed vs deliberately left alone (kept as a local decision log, not committed).
Round 1: pricing/deprecation sweep
- `deprecated` (xAI May-15 batch, Gemini 2.0 shutdown, Anthropic 4.0 retirements, Mistral/Cerebras/Groq/Bedrock lifecycle)
- `minimal`/`xhigh` effort values, dead `nativeStructuredOutputs` on opus-4-1, renamed shut-down preview ids)
Round 2: full re-verification + remaining fixes
- `getBedrockInferenceProfileId` no longer prefixes the 11 models whose AWS cards say geo inference is not supported (Mistral text models, Cohere, Titan); these previously produced invalid model IDs at runtime. Unit tests added.
- `maxOutputTokens` from model cards (opus-4-1 32000 per card)
- `'none'` effort dropped from the 5.4 family (Azure docs enumerate 'none' support exhaustively); azure/gpt-4o deprecated (retires 2026-10-01)
- `verbosity` removed and effort values aligned to documented pro-tier set (medium/high/xhigh); o3/o3-mini/o1/gpt-4.1-nano deprecated per the OpenAI deprecations page (shutdowns 2026-10-23)
- `nativeStructuredOutputs: true`, documented GA; previously fell back to prompt-injected schemas
- `getModelsWithTemperatureRange`), Together 0–1 (their docs), Groq provider-level 0–2, deepseek-chat/Cerebras temperature exposed
- `maxOutputTokens` filled in for Groq (verified via Groq's live models API), Cerebras, Vertex (was silently falling back to 4096); `recommended`/`speedOptimized` flags normalized across providers; ollama-cloud no longer advertises unsupported `tool_choice`

Justification logs (one per provider, maintained locally): per-model field tables with the verifying source URL and verdict, changes applied, changes deliberately not made (with reasons), and unverifiable items. Conflicting agent findings were re-verified directly against the primary source before deciding (e.g., gemini-3.1-flash-lite `minimal` thinking support, confirmed supported and default).
Known follow-ups (documented, intentionally not in this PR)
- `reasoning_effort` for xAI/Cerebras/Magistral and `prompt_cache_key` for Mistral, then add the corresponding capability flags
Type of Change
Testing
All provider, block, and landing-catalog tests pass (151 + 83 in affected files; assertions updated only where they encoded the old incorrect values). New unit tests for the Bedrock geo-profile logic. Typecheck and lint clean. Pre-existing `providers/` test failures on staging are unrelated (verified by stash-run on clean tree).
Checklist