fix(provider): respect model limit.output instead of capping at 32k#34901
Open
tobwen wants to merge 1 commit into
Open
fix(provider): respect model limit.output instead of capping at 32k#34901tobwen wants to merge 1 commit into
tobwen wants to merge 1 commit into
Conversation
Contributor
|
The following comment was made by an LLM, it may be inaccurate: Based on my search, no duplicate PRs found. The PR #34901 appears to be the only active PR addressing this specific issue of respecting The related PR #34815 (per-variant limit overrides) and #14393 (thinking block signatures and compaction headroom) address adjacent concerns but are distinct from this fix. |
d49f952 to
4b89d0e
Compare
maxOutputTokens used Math.min(model.limit.output, OUTPUT_TOKEN_MAX) which silently capped any configured output limit at the 32k default. Now limit.output is the primary value; the env var acts as an optional ceiling only when explicitly set. overflow.ts uses the same reserved buffer for both limit.input and fallback paths, preventing usable=0 on shared-window models.
4b89d0e to
a4e4b6c
Compare
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #29363
Closes #20078
Related: #2949 (closed), #1735 (closed), #32656, #22158, #16971
Prior attempts: #24384 (closed), #29513 (closed), #29679 (closed)
Conflicts with: #32844
Type of change
What does this PR do?
What is broken:
maxOutputTokens()usedMath.min(model.limit.output, OUTPUT_TOKEN_MAX)which silently capped every model at 32k, regardless of its configuredlimit.output. A user withlimit.output: 131071still gotmax_tokens: 32000in the API request.Expected:
limit.outputfrom config should be the primary value sent to the API. The env varOPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAXshould act as an optional ceiling, not a hard cap.Steps to reproduce: Configure a model with
limit.outputabove 32000. Start a session. Observemax_tokens: 32000in the API request (or agent stops withreason: "length"mid-task).Impact: Reasoning-heavy models (DeepSeek V4, Claude with extended thinking) can exhaust the 32k budget on reasoning tokens, leaving very little headroom for visible output. Three prior PRs (#24384, #29513, #29679) attempted this fix but were closed without merging.
The fix:
maxOutputTokensnow returnslimit.outputdirectly when no env var is set. When the env var IS set, it applies asMath.min(limit, envValue). This works becausepositiveInteger()inruntime-flags.tsreturnsundefinedfor unset env vars, so we can distinguish "user configured a ceiling" from "default fallback".overflow.tsalso needed a change. The fallback path (models withoutlimit.input) usedcontext - maxOutputTokens()for compaction headroom. With the fix, this would subtract 131k from a 200k context, collapsingusableto 69k. For shared-window models like Kimi K2.6 (262k/262k) it would be zero. The fix unifies both paths to usereserved(capped atCOMPACTION_BUFFER = 20000), which thelimit.inputpath already used.Updated comments in
compaction.test.tsto reflect the unifiedreservedbehavior (no logic changes).Conflict with PR #32844 (issue #32656): That PR proposes removing the
COMPACTION_BUFFERcap to reserve the fullmaxOutputTokens. This was reasonable whenmaxOutputTokenswas still capped at 32k. With this fix, it would reserve 131k for Claude and 262k for Kimi K2.6, makingusable = 0and triggering compaction on every turn. If this PR merges first, #32844 needs rework.Why I am unsure: Several pre-existing test failures in
compaction.test.tspersist on both original and patched code (verified locally via stash/unstash). I confirmedoverflow()returns correct values via direct calls, but I do not know why theSessionCompaction.Servicetests fail. TheCOMPACTION_BUFFER = 20000value may be too low for code-generation sessions (30k+ token responses). Bumping it to 32000 could be a follow-up.How did you verify your code works?
bun typecheckfrompackages/opencode- passesbun test test/provider/transform.test.ts- 295 pass, 0 fail (7 new tests formaxOutputTokens)bun test test/session/overflow.test.ts- 7 pass, 0 fail (new file, directusable()tests)bun test test/session/compaction.test.ts- pre-existing failures unchanged (verified locally, same failures on original code)isOverflow()call with test inputs returns expectedtruebun test test/plugin/cloudflare.test.ts- 4 pass, 0 failbun test test/session/llm-native.test.ts- 16 pass, 0 failbun test test/effect/runtime-flags.test.ts- 36 pass, 0 failScreenshots / recordings
Not a UI change.
Checklist