fix(providers): correct pricing, deprecations, and capabilities across model catalog #4990

Merged: waleedlatif1 merged 4 commits into staging from fix/model-validation-sweep on Jun 12, 2026

waleedlatif1 (Collaborator) commented Jun 12, 2026

Summary

Two-pass validation of every static model entry in models.ts (~160 models, 12 providers + embedding/rerank pricing) against live provider docs, with secondary pricing cross-checks. Round 2 re-verified every field with a second fleet of per-provider agents, each producing a justification log covering every field, its source URL, and what was changed vs deliberately left alone (kept as a local decision log, not committed).

Round 1 — pricing/deprecation sweep

  • grok-4.20 trio price cut ($2/$6 → $1.25/$2.50), DeepSeek V4 repricing ($0.14/$0.28, 1M ctx), Gemini Deep Research output 6x fix ($2 → $12), Bedrock Mistral Large 3 4x fix ($2/$6 → $0.50/$1.50)
  • ~30 retired/redirected models flagged deprecated (xAI May-15 batch, Gemini 2.0 shutdown, Anthropic 4.0 retirements, Mistral/Cerebras/Groq/Bedrock lifecycle)
  • Context-window and capability corrections (sonnet-4-5/4-0 1M → 200k, invalid minimal/xhigh effort values, dead nativeStructuredOutputs on opus-4-1, renamed shut-down preview ids)
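To make the effect of a repricing concrete, here is a minimal sketch of how a per-million-token price entry turns into a request cost. The rates are the DeepSeek V4 figures quoted in this PR ($0.14 input, $0.28 output) plus the $0.0028 cache-hit rate the author confirms later in the thread. The `Pricing` and `Usage` shapes, every field name other than `cachedInput`, and both helper functions are assumptions for illustration, not the catalog's actual types.

```typescript
// Hypothetical pricing shape: all rates are USD per 1M tokens.
interface Pricing {
  input: number
  cachedInput: number
  output: number
}

// Hypothetical usage record for one request.
interface Usage {
  cacheMissTokens: number
  cacheHitTokens: number
  outputTokens: number
}

// Rates quoted in this PR for deepseek-v4-flash.
const DEEPSEEK_V4_FLASH: Pricing = { input: 0.14, cachedInput: 0.0028, output: 0.28 }

// Convert each token count to millions, then multiply by the per-1M rate.
function estimateCostUsd(pricing: Pricing, usage: Usage): number {
  const millions = (tokens: number): number => tokens / 1_000_000
  return (
    millions(usage.cacheMissTokens) * pricing.input +
    millions(usage.cacheHitTokens) * pricing.cachedInput +
    millions(usage.outputTokens) * pricing.output
  )
}

// Ratio between the cache-miss and cache-hit input rates.
function cacheDiscountRatio(pricing: Pricing): number {
  return pricing.input / pricing.cachedInput
}

const cost = estimateCostUsd(DEEPSEEK_V4_FLASH, {
  cacheMissTokens: 1_000_000,
  cacheHitTokens: 1_000_000,
  outputTokens: 1_000_000,
})
console.log(cost.toFixed(4)) // 0.4228
console.log(cacheDiscountRatio(DEEPSEEK_V4_FLASH).toFixed(1)) // 50.0
```

A stale entry changes the displayed cost in direct proportion to the rate error, which is why the 4x and 6x corrections in this round matter for cost display.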

Round 2 — full re-verification + remaining fixes

  • Bedrock geo-profile bug fixed in code: getBedrockInferenceProfileId no longer prefixes the 11 models whose AWS cards say geo inference is not supported (Mistral text models, Cohere, Titan) — these previously produced invalid model IDs at runtime. Unit tests added.
  • Bedrock pricing verified via the AWS Pricing API (marketing page is unfetchable): Nova 2 Pro/Lite, Llama 3.x/4 family, Mistral Large 2407, Ministral 3 corrections; cache-read rates added for Claude 4.x and Nova; maxOutputTokens from model cards (opus-4-1 32000 per card)
  • azure/gpt-5.1-codex un-deprecated: the round-1 stopgap was based on a wrong premise; the azure provider defaults to the Responses API, so the model works. Azure 'none' effort dropped from the 5.4 family (the Azure docs enumerate 'none' support exhaustively, and the 5.4 family is not listed); azure/gpt-4o deprecated (retires 2026-10-01)
  • gpt-5.5-pro: undocumented verbosity removed and effort values aligned to documented pro-tier set (medium/high/xhigh); o3/o3-mini/o1/gpt-4.1-nano deprecated per the OpenAI deprecations page (shutdowns 2026-10-23)
  • claude-fable-5 nativeStructuredOutputs: true — documented GA; previously fell back to prompt-injected schemas
  • Temperature ranges corrected against API references: xAI 0–2 (was 0–1), Mistral 0–1.5 (chat-completions max per OpenAPI spec; new 1.5 slider variant in the agent block + parameterized getModelsWithTemperatureRange), Together 0–1 (their docs), Groq provider-level 0–2, deepseek-chat/Cerebras temperature exposed
  • xAI retired slugs repriced to their redirect targets' billed rates (retirement page states redirect billing; legacy values overestimated live cost up to 6x)
  • maxOutputTokens filled in for Groq (verified via Groq's live models API), Cerebras, Vertex (was silently falling back to 4096); recommended/speedOptimized flags normalized across providers; ollama-cloud no longer advertises unsupported tool_choice
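The Bedrock geo-profile fix described in the first bullet of this round can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in apps/sim/providers/bedrock/utils.ts: the function signature, the region-to-prefix mapping, the prefix list, and the three sample IDs in the set are all hypothetical (the PR says the real set holds 11 IDs).

```typescript
// Illustrative subset only; the actual set in the PR contains 11 model IDs
// (Mistral text models, Cohere, Titan).
const GEO_PROFILE_UNSUPPORTED_MODEL_IDS = new Set<string>([
  'mistral.mistral-large-2407-v1:0',
  'cohere.command-r-plus-v1:0',
  'amazon.titan-text-express-v1',
])

// Assumed geo prefixes for cross-region inference profiles.
const GEO_PREFIXES = ['us.', 'eu.', 'apac.']

// Assumed mapping from an AWS region to a geo prefix.
function geoPrefixForRegion(region: string): string {
  if (region.startsWith('eu-')) return 'eu.'
  if (region.startsWith('ap-')) return 'apac.'
  return 'us.'
}

function getBedrockInferenceProfileId(modelId: string, region: string): string {
  // Branch 1: the ID already carries a geo prefix, so return it unchanged.
  if (GEO_PREFIXES.some((prefix) => modelId.startsWith(prefix))) return modelId
  // Branch 2: the model does not support geo inference, so return the bare in-region ID.
  if (GEO_PROFILE_UNSUPPORTED_MODEL_IDS.has(modelId)) return modelId
  // Branch 3: every other model gets the region's geo prefix.
  return `${geoPrefixForRegion(region)}${modelId}`
}

console.log(getBedrockInferenceProfileId('anthropic.claude-3-5-sonnet-20241022-v2:0', 'eu-west-1'))
console.log(getBedrockInferenceProfileId('mistral.mistral-large-2407-v1:0', 'eu-west-1'))
```

Checking the unsupported set before prefixing is the whole fix: before it, models in that set fell through to the prefixing branch and produced IDs that AWS rejects.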

Justification logs (one per provider, maintained locally): per-model field tables with the verifying source URL and verdict, changes applied, changes deliberately not made (with reasons), and unverifiable items. Conflicting agent findings were re-verified directly against the primary source before deciding (e.g., gemini-3.1-flash-lite minimal thinking support — confirmed supported and default).

Known follow-ups (documented, intentionally not in this PR)

  • Wire reasoning_effort for xAI/Cerebras/Magistral and prompt_cache_key for Mistral, then add the corresponding capability flags
  • Add new SKUs: deepseek-v4-flash/pro (aliases retire 2026-07-24), grok-build-0.1, mistral-medium-3-5, newer Azure/Foundry models, Gemini deep-research 04-2026 SKUs
  • Vertex 2.5 family retires 2026-10-16 — revisit deprecation + vertex defaultModel ~Sept 2026
  • Cohere-on-Bedrock prices unverifiable via the Pricing API (match list prices; both models Legacy/deprecated)

Type of Change

  • Bug fix

Testing

All provider, block, and landing-catalog tests pass (151 + 83 in affected files; assertions updated only where they encoded the old incorrect values). New unit tests for the bedrock geo-profile logic. Typecheck and lint clean. Pre-existing providers/ test failures on staging are unrelated (verified by stash-run on clean tree).

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel (Bot) commented Jun 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Jun 12, 2026 3:11am |


cursor (Bot) commented Jun 12, 2026

PR Summary

Medium Risk
Large metadata changes affect model selection, sliders, and cost display; the Bedrock ID fix is correctness-critical for those models but low blast radius elsewhere.

Overview
This PR re-validates the static model catalog in models.ts and fixes a Bedrock runtime bug where geo inference profile prefixes were applied to models that only accept bare in-region IDs (Mistral, Cohere, Titan).

Catalog & capabilities: Pricing, updatedAt, context windows, maxOutputTokens, deprecation flags, and defaults are corrected across OpenAI, Anthropic, Azure, Gemini/Vertex, xAI, DeepSeek, Groq, Cerebras, Mistral, and Bedrock. Notable capability tweaks include Mistral temperature 0–1.5 (new agent-block slider), xAI/Groq 0–2, Together 0–1, claude-fable-5 native structured outputs, gpt-5.5-pro effort/verbosity aligned to docs, Azure 5.4 reasoning effort trimmed, and ollama-cloud no longer advertising unsupported tool choice.

Provider plumbing: getModelsWithTempRange01/02 is replaced by getModelsWithTemperatureRange(max) and MODELS_TEMP_RANGE_0_15; tests and landing-catalog assertions are updated to match.
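As a rough sketch of the parameterization described above, a single helper that takes the slider maximum can replace one function per range. The catalog shape, the sample entries, and the `MODELS_TEMP_RANGE_0_1` name are assumptions for illustration; only `getModelsWithTemperatureRange`, `MODELS_TEMP_RANGE_0_15`, and `MODELS_TEMP_RANGE_0_2` are named in this PR.

```typescript
// Hypothetical, simplified catalog entry.
interface ModelEntry {
  id: string
  capabilities: { temperature?: { min: number; max: number } }
}

// Illustrative entries; the ranges follow the corrections listed in this PR
// (Together 0-1, Mistral 0-1.5, xAI 0-2). 'together/example-model' is a made-up ID.
const MODELS: ModelEntry[] = [
  { id: 'together/example-model', capabilities: { temperature: { min: 0, max: 1 } } },
  { id: 'mistral-medium-latest', capabilities: { temperature: { min: 0, max: 1.5 } } },
  { id: 'grok-4-latest', capabilities: { temperature: { min: 0, max: 2 } } },
  { id: 'no-temperature-model', capabilities: {} },
]

// One parameterized helper instead of one function per range.
function getModelsWithTemperatureRange(max: number): string[] {
  return MODELS.filter((model) => model.capabilities.temperature?.max === max).map(
    (model) => model.id
  )
}

const MODELS_TEMP_RANGE_0_1 = getModelsWithTemperatureRange(1)
const MODELS_TEMP_RANGE_0_15 = getModelsWithTemperatureRange(1.5)
const MODELS_TEMP_RANGE_0_2 = getModelsWithTemperatureRange(2)

console.log(MODELS_TEMP_RANGE_0_15) // [ 'mistral-medium-latest' ]
```

Each exported list can then back one slider whose condition is mutually exclusive with the others, so adding a new range is one call rather than a new function.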

Reviewed by Cursor Bugbot for commit ddadc30.

greptile-apps (Bot, Contributor) commented Jun 12, 2026

Greptile Summary

This PR performs a two-pass validation of all ~160 models across 12 providers, correcting pricing, deprecation flags, context windows, temperature ranges, and capability fields against live provider documentation. It also ships a real runtime bug fix in getBedrockInferenceProfileId, which previously prefixed 11 Bedrock models (Mistral text, Cohere, Titan) with geo-inference profile IDs that those models don't support.

  • Bedrock geo-profile fix: GEO_PROFILE_UNSUPPORTED_MODEL_IDS set added; affected models now return bare in-region IDs. New unit tests cover the three branches (prefix required, already prefixed, unsupported).
  • Temperature range parameterization: getModelsWithTempRange01/02 replaced by getModelsWithTemperatureRange(max); a third slider entry (0–1.5) added in agent.ts for Mistral models; MODELS_TEMP_RANGE_0_15 exported from utils.ts.
  • Data corrections: ~30 models flagged deprecated; pricing updated for DeepSeek V4, grok-4.20 trio, Gemini Deep Research, Bedrock Mistral Large 3, Nova, and many others; context windows, maxOutputTokens, and reasoning-effort value sets corrected across Azure, Anthropic, xAI, Groq, and Vertex.

Confidence Score: 5/5

Safe to merge — the only runtime change is the Bedrock geo-profile fix, which is well-tested and corrects real invalid model IDs that were being sent to AWS.

The Bedrock bug fix is correct and covered by new unit tests. All data changes (pricing, deprecations, context windows) are documented in the PR with source verification. The rest of the changes are refactors and test updates that don't affect production paths.

apps/sim/providers/models.ts — the static temperature-range list helpers don't inherit provider-level capabilities, leaving an inconsistency with the per-model query functions, though this has no current production effect.
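The inconsistency noted above can be illustrated with a small sketch. It assumes a hypothetical two-provider catalog in which one provider declares its temperature range at the provider level (as the review says Groq does) and the other declares it per model. The `inheritProviderCaps` flag is not part of the PR; it is only there to contrast the two behaviors side by side.

```typescript
interface Capabilities {
  temperature?: { min: number; max: number }
}

interface ProviderDefinition {
  id: string
  capabilities?: Capabilities
  models: { id: string; capabilities: Capabilities }[]
}

// Hypothetical catalog with made-up model IDs.
const PROVIDERS: ProviderDefinition[] = [
  {
    id: 'groq',
    capabilities: { temperature: { min: 0, max: 2 } }, // provider-level range
    models: [{ id: 'groq/example-model', capabilities: {} }],
  },
  {
    id: 'xai',
    models: [{ id: 'grok-example', capabilities: { temperature: { min: 0, max: 2 } } }],
  },
]

function getModelsWithTemperatureRange(max: number, inheritProviderCaps: boolean): string[] {
  const ids: string[] = []
  for (const provider of PROVIDERS) {
    for (const model of provider.models) {
      // Per-model lookups merge provider defaults under model overrides;
      // the static list builders look at the model entry alone.
      const caps: Capabilities = inheritProviderCaps
        ? { ...provider.capabilities, ...model.capabilities }
        : model.capabilities
      if (caps.temperature?.max === max) ids.push(model.id)
    }
  }
  return ids
}

console.log(getModelsWithTemperatureRange(2, false)) // [ 'grok-example' ]
console.log(getModelsWithTemperatureRange(2, true)) // [ 'groq/example-model', 'grok-example' ]
```

With the merge applied, the static list would agree with the per-model lookup; without it, the provider-level Groq range is missed.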

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/providers/models.ts | Large data-correction sweep: pricing, deprecation flags, context window, temperature ranges, and capabilities updated for ~160 models across 12 providers; two functions replaced by a single parameterized getModelsWithTemperatureRange(max). |
| apps/sim/providers/bedrock/utils.ts | Bug fix: adds GEO_PROFILE_UNSUPPORTED_MODEL_IDS set to prevent getBedrockInferenceProfileId from prepending geo-inference prefixes to the 11 models that only accept bare model IDs. |
| apps/sim/blocks/blocks/agent.ts | Adds a third temperature slider entry (min 0, max 1.5) for Mistral models, matching the pre-existing pattern of two sliders with mutually exclusive model-filtered conditions. |
| apps/sim/providers/utils.ts | Switches from two named range exports to three parameterized ones adding MODELS_TEMP_RANGE_0_15; imports updated to match the renamed function. |
| apps/sim/providers/bedrock/utils.test.ts | New test file: covers geo-inference profile prefixing, already-prefixed IDs, and the 11 models that must return bare model IDs. |
| apps/sim/providers/utils.test.ts | Tests updated to reflect all temperature-range, deprecation, effort-value, and max-token changes; new MODELS_TEMP_RANGE_0_15 range assertions added. |
| apps/sim/app/(landing)/models/utils.test.ts | Single test line updated: replaces deprecated grok-4-latest with mistral-medium-latest in the best-for copy differentiation test. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Model ID lookup] --> B{getModelCapabilities}
    B --> C["Merge: { ...provider.capabilities, ...model.capabilities }"]
    C --> D[supportsTemperature / getMaxTemperature]
    D --> E[Agent slider condition — CORRECT for all models incl. Groq]

    F[Static list builders] --> G{getModelsWithTemperatureRange / getModelsWithTemperatureSupport}
    G --> H[Only model.capabilities.temperature checked]
    H --> I[MODELS_TEMP_RANGE_0_2 / MODELS_WITH_TEMPERATURE_SUPPORT — MISSES Groq provider-level caps]

    J[Bedrock model ID] --> K{getBedrockInferenceProfileId}
    K --> L{Already prefixed?}
    L -- yes --> M[Return as-is]
    L -- no --> N{In GEO_PROFILE_UNSUPPORTED_MODEL_IDS?}
    N -- yes --> O[Return bare model ID]
    N -- no --> P[Add region prefix e.g. us. eu. ap.]

Reviews (5): Last reviewed commit: "fix(providers): default azure-openai to ..."

Comment thread apps/sim/providers/models.ts
Comment thread apps/sim/providers/models.ts
waleedlatif1 (Collaborator, Author) commented

Re: the two items flagged in the Greptile summary (posted as outside-diff comments, so answering here):

1. DeepSeek cachedInput $0.0028 (50x discount) — verified, not a decimal error. Re-fetched https://api-docs.deepseek.com/quick_start/pricing directly: deepseek-v4-flash is $0.0028/1M on cache hit vs $0.14/1M on cache miss, $0.28/1M output (and v4-pro is $0.003625 hit vs $0.435 miss — an even steeper ~120x). DeepSeek moved away from the old 10x ratio with the V4 generation. The deepseek-chat/deepseek-reasoner aliases bill at v4-flash rates until their 2026-07-24 retirement.

2. Renamed IDs (gemini-3.1-flash-lite-preview → gemini-3.1-flash-lite, azure/gpt-5-chat-latest → azure/gpt-5-chat): runtime routing does not fail for persisted workflows. getProviderFromModel falls back to provider modelPatterns (/^gemini/, /^vertex\//, /^azure\//) when an id isn't in the catalog, and the Azure deployment name is derived from the stored model string (request.model.replace('azure/', '')), so a workflow that stored an old id routes and executes exactly as before; it only loses catalog pricing/capability metadata. Aliases weren't kept because each old id is dead or dying upstream: the google preview was shut down server-side 2026-05-25 (calls fail regardless of our catalog), the vertex preview alias is discontinued by Google on 2026-07-09, and gpt-5-chat-latest never matched an Azure model name. Keeping deprecated alias entries would add permanent catalog noise for at most a few weeks of metadata continuity. I judged that not worth it, but I'm happy to add them if you disagree.
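The routing fallback described in item 2 can be sketched as below. This is a minimal illustration, not the repository's implementation: the catalog map, the provider ids ('google', 'vertex', 'azure-openai'), and the `getAzureDeploymentName` helper are assumptions. Only the three regex patterns and the `replace('azure/', '')` derivation come from the comment itself.

```typescript
// Hypothetical catalog: model id -> provider id. The retired ids are deliberately absent.
const CATALOG = new Map<string, string>([
  ['gemini-3.1-flash-lite', 'google'],
  ['azure/gpt-5-chat', 'azure-openai'],
])

// Provider-level patterns used when an id is not in the catalog.
const PROVIDER_MODEL_PATTERNS: { provider: string; modelPatterns: RegExp[] }[] = [
  { provider: 'google', modelPatterns: [/^gemini/] },
  { provider: 'vertex', modelPatterns: [/^vertex\//] },
  { provider: 'azure-openai', modelPatterns: [/^azure\//] },
]

function getProviderFromModel(model: string): string | null {
  // Exact catalog match first.
  const fromCatalog = CATALOG.get(model)
  if (fromCatalog) return fromCatalog
  // Otherwise fall back to the first provider whose pattern matches.
  for (const { provider, modelPatterns } of PROVIDER_MODEL_PATTERNS) {
    if (modelPatterns.some((pattern) => pattern.test(model))) return provider
  }
  return null
}

// Hypothetical helper: the deployment name comes from the stored model string,
// not from the catalog.
function getAzureDeploymentName(model: string): string {
  return model.replace('azure/', '')
}

console.log(getProviderFromModel('gemini-3.1-flash-lite-preview')) // google
console.log(getAzureDeploymentName('azure/gpt-5-chat-latest')) // gpt-5-chat-latest
```

Under these assumptions a workflow that persisted a retired id still resolves to a provider; whether the upstream call then succeeds depends on the provider, as the comment notes for the google preview.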

Comment thread apps/sim/providers/models.ts
Comment thread apps/sim/providers/models.ts
waleedlatif1 (Collaborator, Author) commented

@greptile

waleedlatif1 (Collaborator, Author) commented

@cursor review

cursor (Bot) left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit ddadc30.

waleedlatif1 merged commit e1af2bf into staging on Jun 12, 2026
15 checks passed
waleedlatif1 deleted the fix/model-validation-sweep branch on June 12, 2026 03:46