fix(provider): respect model limit.output instead of capping at 32k by tobwen · Pull Request #34901 · anomalyco/opencode

tobwen · 2026-07-02T10:09:39Z

Issue for this PR

Closes #29363
Closes #20078
Related: #2949 (closed), #1735 (closed), #32656, #22158, #16971
Prior attempts: #24384 (closed), #29513 (closed), #29679 (closed)
Conflicts with: #32844

Type of change

Bug fix
New feature
Refactor / code improvement
Documentation

What does this PR do?

What is broken: maxOutputTokens() used Math.min(model.limit.output, OUTPUT_TOKEN_MAX) which silently capped every model at 32k, regardless of its configured limit.output. A user with limit.output: 131071 still got max_tokens: 32000 in the API request.

Expected: limit.output from config should be the primary value sent to the API. The env var OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX should act as an optional ceiling, not a hard cap.

Steps to reproduce: Configure a model with limit.output above 32000. Start a session. Observe max_tokens: 32000 in the API request (or agent stops with reason: "length" mid-task).

Impact: Reasoning-heavy models (DeepSeek V4, Claude with extended thinking) can exhaust the 32k budget on reasoning tokens, leaving very little headroom for visible output. Three prior PRs (#24384, #29513, #29679) attempted this fix but were closed without merging.

The fix:

maxOutputTokens now returns limit.output directly when no env var is set. When the env var IS set, it applies as Math.min(limit, envValue). This works because positiveInteger() in runtime-flags.ts returns undefined for unset env vars, so we can distinguish "user configured a ceiling" from "default fallback".
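The before/after behavior described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `ModelLimits`, `DEFAULT_OUTPUT_TOKEN_MAX`, `maxOutputTokensBefore`, and `maxOutputTokensAfter` are hypothetical names, and the `positiveInteger` shown here is a stand-in that only mirrors the "undefined when unset" behavior the description relies on.

```typescript
// Hypothetical stand-ins; the real definitions live in the opencode source.
const DEFAULT_OUTPUT_TOKEN_MAX = 32_000

interface ModelLimits {
  context: number
  input?: number
  output: number
}

// Stand-in for positiveInteger() in runtime-flags.ts: returns undefined
// when the env var is unset or is not a positive integer.
function positiveInteger(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined
  const n = Number(raw)
  return Number.isInteger(n) && n > 0 ? n : undefined
}

// Old behavior: the 32k default always took part in Math.min,
// so it acted as a hard cap even when no env var was set.
function maxOutputTokensBefore(limit: ModelLimits, envValue?: number): number {
  return Math.min(limit.output, envValue ?? DEFAULT_OUTPUT_TOKEN_MAX)
}

// New behavior: limit.output is primary; the env value is a ceiling
// only when the user explicitly configured one.
function maxOutputTokensAfter(limit: ModelLimits, envValue?: number): number {
  if (envValue === undefined) return limit.output
  return Math.min(limit.output, envValue)
}

const model: ModelLimits = { context: 200_000, output: 131_071 }
console.log(maxOutputTokensBefore(model)) // 32000
console.log(maxOutputTokensAfter(model)) // 131071
console.log(maxOutputTokensAfter(model, positiveInteger("64000"))) // 64000
```

The key design point is that "unset" is represented as `undefined` rather than as a default number, which is what lets the function tell an explicit ceiling apart from a fallback.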

overflow.ts also needed a change. The fallback path (models without limit.input) used context - maxOutputTokens() for compaction headroom. With the fix, this would subtract 131k from a 200k context, collapsing usable to 69k. For shared-window models like Kimi K2.6 (262k/262k) it would be zero. The fix unifies both paths to use reserved (capped at COMPACTION_BUFFER = 20000), which the limit.input path already used.
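The headroom arithmetic above can be checked with a small sketch. It assumes `reserved` is `Math.min(COMPACTION_BUFFER, maxOutputTokens)` and that usable space is the input (or context) limit minus the reserve; `usableBefore`, `usableAfter`, and `ModelLimits` are hypothetical names, and the model limits are the rounded figures from the description rather than exact catalog values.

```typescript
const COMPACTION_BUFFER = 20_000

// Hypothetical shape; the real model type in opencode has more fields.
interface ModelLimits {
  context: number
  input?: number
  output: number
}

// Before: the fallback path (no limit.input) subtracted the full
// maxOutputTokens value from the context window.
function usableBefore(limit: ModelLimits, maxOutput: number): number {
  const reserved = Math.min(COMPACTION_BUFFER, maxOutput)
  return limit.input !== undefined
    ? limit.input - reserved
    : limit.context - maxOutput
}

// After: both paths subtract the same capped reserve.
function usableAfter(limit: ModelLimits, maxOutput: number): number {
  const reserved = Math.min(COMPACTION_BUFFER, maxOutput)
  return (limit.input ?? limit.context) - reserved
}

// Rounded figures taken from the PR description.
const claude: ModelLimits = { context: 200_000, output: 131_000 }
const kimi: ModelLimits = { context: 262_000, output: 262_000 }

console.log(usableBefore(claude, claude.output)) // 69000
console.log(usableAfter(claude, claude.output)) // 180000
console.log(usableBefore(kimi, kimi.output)) // 0
console.log(usableAfter(kimi, kimi.output)) // 242000
```

With the uncapped output limit, only the unified path keeps a usable window on shared-window models, which is why the overflow.ts change has to ship together with the maxOutputTokens change.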

Updated comments in compaction.test.ts to reflect the unified reserved behavior (no logic changes).

Conflict with PR #32844 (issue #32656): That PR proposes removing the COMPACTION_BUFFER cap to reserve the full maxOutputTokens. This was reasonable when maxOutputTokens was still capped at 32k. With this fix, it would reserve 131k for Claude and 262k for Kimi K2.6, leaving shared-window models such as Kimi K2.6 with usable = 0 and triggering compaction on every turn. If this PR merges first, #32844 needs rework.

Why I am unsure: Several pre-existing test failures in compaction.test.ts persist on both the original and the patched code (verified locally via stash/unstash). I confirmed that overflow() returns correct values via direct calls, but I do not know why the SessionCompaction.Service tests fail. Separately, COMPACTION_BUFFER = 20000 may be too low for code-generation sessions, where responses can exceed 30k tokens. Bumping it to 32000 could be a follow-up.

How did you verify your code works?

bun typecheck from packages/opencode - passes
bun test test/provider/transform.test.ts - 295 pass, 0 fail (7 new tests for maxOutputTokens)
bun test test/session/overflow.test.ts - 7 pass, 0 fail (new file, direct usable() tests)
bun test test/session/compaction.test.ts - pre-existing failures unchanged (verified locally, same failures on original code)
Direct isOverflow() call with test inputs returns expected true
bun test test/plugin/cloudflare.test.ts - 4 pass, 0 fail
bun test test/session/llm-native.test.ts - 16 pass, 0 fail
bun test test/effect/runtime-flags.test.ts - 36 pass, 0 fail

Screenshots / recordings

Not a UI change.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

github-actions · 2026-07-02T10:10:26Z

The following comment was made by an LLM, it may be inaccurate:

Based on my search, no duplicate PRs found.

The PR #34901 appears to be the only active PR addressing this specific issue of respecting model.limit.output instead of capping at 32k. The PR description itself mentions that three prior attempts (#24384, #29513, #29679) were closed without merging, and there's a known conflict with #32844 (which proposes a different approach to the compaction buffer).

The related PR #34815 (per-variant limit overrides) and #14393 (thinking block signatures and compaction headroom) address adjacent concerns but are distinct from this fix.

maxOutputTokens used Math.min(model.limit.output, OUTPUT_TOKEN_MAX) which silently capped any configured output limit at the 32k default. Now limit.output is the primary value; the env var acts as an optional ceiling only when explicitly set. overflow.ts uses the same reserved buffer for both limit.input and fallback paths, preventing usable=0 on shared-window models.

github-actions Bot added the contributor label Jul 2, 2026

tobwen force-pushed the output-token-limit branch 2 times, most recently from d49f952 to 4b89d0e on July 2, 2026 18:49

tobwen force-pushed the output-token-limit branch from 4b89d0e to a4e4b6c on July 3, 2026 10:49

github-actions Bot mentioned this pull request Jul 3, 2026

feat(provider): support per-model limit overrides in user config #35198

Open

4 tasks

tobwen wants to merge 1 commit into anomalyco:dev from tobwen:output-token-limit
