fix(provider): respect model limit.output instead of capping at 32k #34901

Open
tobwen wants to merge 1 commit into
anomalyco:dev from
tobwen:output-token-limit
@tobwen tobwen commented Jul 2, 2026

Contributor

Issue for this PR

Closes #29363
Closes #20078
Related: #2949 (closed), #1735 (closed), #32656, #22158, #16971
Prior attempts: #24384 (closed), #29513 (closed), #29679 (closed)
Conflicts with: #32844

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

What is broken: maxOutputTokens() used Math.min(model.limit.output, OUTPUT_TOKEN_MAX) which silently capped every model at 32k, regardless of its configured limit.output. A user with limit.output: 131071 still got max_tokens: 32000 in the API request.
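For readers who want the cap spelled out, here is a minimal sketch of the pre-fix behavior as described above. It is not the actual opencode source: `ModelLimit`, `maxOutputTokensBefore`, and the 200k context value are illustrative assumptions, and `OUTPUT_TOKEN_MAX = 32_000` is inferred from the `max_tokens: 32000` seen in the request.

```typescript
// Sketch only: names mirror the PR text, not the actual opencode source.
const OUTPUT_TOKEN_MAX = 32_000 // default cap, inferred from "max_tokens: 32000"

// Hypothetical shape of a model's configured limits.
interface ModelLimit {
  context: number
  output: number
}

// Pre-fix behavior as described: the default wins over any larger limit.output.
function maxOutputTokensBefore(limit: ModelLimit): number {
  return Math.min(limit.output, OUTPUT_TOKEN_MAX)
}

// Illustrative config: the 200k context is an assumed value.
const configured: ModelLimit = { context: 200_000, output: 131_071 }
console.log(maxOutputTokensBefore(configured)) // 32000
```

Limits below 32k pass through unchanged, which is probably why the cap only shows up for models configured above it.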

Expected: limit.output from config should be the primary value sent to the API. The env var OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX should act as an optional ceiling, not a hard cap.

Steps to reproduce: Configure a model with limit.output above 32000. Start a session. Observe max_tokens: 32000 in the API request (or agent stops with reason: "length" mid-task).

Impact: Reasoning-heavy models (DeepSeek V4, Claude with extended thinking) can exhaust the 32k budget on reasoning tokens, leaving very little headroom for visible output. Three prior PRs (#24384, #29513, #29679) attempted this fix but were closed without merging.

The fix:

maxOutputTokens now returns limit.output directly when no env var is set. When the env var IS set, it applies as Math.min(limit, envValue). This works because positiveInteger() in runtime-flags.ts returns undefined for unset env vars, so we can distinguish "user configured a ceiling" from "default fallback".
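A rough sketch of the new rule, under stated assumptions: `parseCeiling` is a hypothetical stand-in for `positiveInteger()` (its real signature in runtime-flags.ts is not shown here), and `maxOutputTokensAfter` takes plain numbers rather than the real model object.

```typescript
// Hypothetical stand-in for positiveInteger() from runtime-flags.ts.
// Returning undefined means "the user did not configure a ceiling".
function parseCeiling(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined
  const value = Number(raw)
  return Number.isInteger(value) && value > 0 ? value : undefined
}

// limit.output is the primary value; the ceiling only applies when present.
function maxOutputTokensAfter(limitOutput: number, ceiling?: number): number {
  return ceiling === undefined ? limitOutput : Math.min(limitOutput, ceiling)
}

console.log(maxOutputTokensAfter(131_071, parseCeiling(undefined))) // 131071
console.log(maxOutputTokensAfter(131_071, parseCeiling("64000"))) // 64000
console.log(maxOutputTokensAfter(8_192, parseCeiling("64000"))) // 8192
```

The key design point is that `undefined` carries information: a default of 32000 baked into the flag would be indistinguishable from a user who deliberately set 32000.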

overflow.ts also needed a change. The fallback path (models without limit.input) used context - maxOutputTokens() for compaction headroom. With the fix, this would subtract 131k from a 200k context, collapsing usable to 69k. For shared-window models like Kimi K2.6 (262k/262k) it would be zero. The fix unifies both paths to use reserved (capped at COMPACTION_BUFFER = 20000), which the limit.input path already used.
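The arithmetic can be sketched as follows. This is an approximation of the described behavior, not the contents of overflow.ts: `Limit`, `usableOldFallback`, and `usableUnified` are hypothetical names, the clamp to zero is an assumption, and the Kimi K2.6 window is rounded to 262,000 for illustration.

```typescript
const COMPACTION_BUFFER = 20_000 // value quoted in the PR description

// Hypothetical shape; input is absent for models without limit.input.
interface Limit {
  context: number
  output: number
  input?: number
}

// Old fallback path: subtract the full output budget from the context window.
function usableOldFallback(limit: Limit, maxOut: number): number {
  return Math.max(0, limit.context - maxOut)
}

// Unified path: reserve at most COMPACTION_BUFFER, whichever window applies.
function usableUnified(limit: Limit, maxOut: number): number {
  const reserved = Math.min(COMPACTION_BUFFER, maxOut)
  return Math.max(0, (limit.input ?? limit.context) - reserved)
}

const claudeLike: Limit = { context: 200_000, output: 131_071 }
const kimiLike: Limit = { context: 262_000, output: 262_000 } // rounded values

console.log(usableOldFallback(claudeLike, claudeLike.output)) // 68929
console.log(usableOldFallback(kimiLike, kimiLike.output)) // 0
console.log(usableUnified(claudeLike, claudeLike.output)) // 180000
console.log(usableUnified(kimiLike, kimiLike.output)) // 242000
```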

Updated comments in compaction.test.ts to reflect the unified reserved behavior (no logic changes).

Conflict with PR #32844 (issue #32656): That PR proposes removing the COMPACTION_BUFFER cap to reserve the full maxOutputTokens. This was reasonable when maxOutputTokens was still capped at 32k. With this fix, it would reserve 131k for Claude (usable drops to about 69k of a 200k context) and 262k for Kimi K2.6 (usable = 0, triggering compaction on every turn). If this PR merges first, #32844 needs rework.
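To illustrate why the two changes interact badly, here is a small sketch comparing a capped and an uncapped reservation for a shared-window model. Everything in it is hypothetical: `reservedTokens` and `compactsAt` are not functions from either PR, the overflow check (`tokensUsed >= usable`) is an assumed simplification, and the 262,000 window is a rounded value.

```typescript
// Hypothetical helpers for illustration; not code from either PR.
// With a cap, at most `cap` tokens are reserved; without one, the full budget is.
function reservedTokens(maxOut: number, cap?: number): number {
  return cap === undefined ? maxOut : Math.min(cap, maxOut)
}

// Assumed simplification of the overflow check: compact once the session
// has used at least the usable part of the context window.
function compactsAt(tokensUsed: number, context: number, reserved: number): boolean {
  const usable = Math.max(0, context - reserved)
  return tokensUsed >= usable
}

// Shared-window model with rounded, illustrative limits.
const sharedWindow = { context: 262_000, maxOut: 262_000 }

// Capped at 20000: usable is 242000, so a 5000-token session does not compact.
console.log(compactsAt(5_000, sharedWindow.context, reservedTokens(sharedWindow.maxOut, 20_000))) // false

// Uncapped: usable is 0, so any session compacts immediately.
console.log(compactsAt(5_000, sharedWindow.context, reservedTokens(sharedWindow.maxOut))) // true
```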

Why I am unsure: Several pre-existing test failures in compaction.test.ts persist on both original and patched code (verified locally via stash/unstash). I confirmed overflow() returns correct values via direct calls, but I do not know why the SessionCompaction.Service tests fail. The COMPACTION_BUFFER = 20000 value may be too low for code-generation sessions (30k+ token responses). Bumping it to 32000 could be a follow-up.

How did you verify your code works?

  • bun typecheck from packages/opencode - passes
  • bun test test/provider/transform.test.ts - 295 pass, 0 fail (7 new tests for maxOutputTokens)
  • bun test test/session/overflow.test.ts - 7 pass, 0 fail (new file, direct usable() tests)
  • bun test test/session/compaction.test.ts - pre-existing failures unchanged (verified locally, same failures on original code)
  • Direct isOverflow() call with test inputs returns expected true
  • bun test test/plugin/cloudflare.test.ts - 4 pass, 0 fail
  • bun test test/session/llm-native.test.ts - 16 pass, 0 fail
  • bun test test/effect/runtime-flags.test.ts - 36 pass, 0 fail

Screenshots / recordings

Not a UI change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

github-actions Bot commented Jul 2, 2026

Contributor

The following comment was made by an LLM, it may be inaccurate:

Based on my search, no duplicate PRs found.

The PR #34901 appears to be the only active PR addressing this specific issue of respecting model.limit.output instead of capping at 32k. The PR description itself mentions that three prior attempts (#24384, #29513, #29679) were closed without merging, and there's a known conflict with #32844 (which proposes a different approach to the compaction buffer).

The related PR #34815 (per-variant limit overrides) and #14393 (thinking block signatures and compaction headroom) address adjacent concerns but are distinct from this fix.

@tobwen tobwen force-pushed the output-token-limit branch 2 times, most recently from d49f952 to 4b89d0e Compare July 2, 2026 18:49
maxOutputTokens used Math.min(model.limit.output, OUTPUT_TOKEN_MAX) which
silently capped any configured output limit at the 32k default. Now
limit.output is the primary value; the env var acts as an optional ceiling
only when explicitly set. overflow.ts uses the same reserved buffer for both
limit.input and fallback paths, preventing usable=0 on shared-window models.