Fix OLMo 3 scaled RoPE handling for sliding attention #45945
Draft
nurpax wants to merge 2 commits
[For maintainers] Suggested jobs to run (before merge): run-slow: olmo3, olmo_hybrid
Summary

- Use default RoPE for `sliding_attention` layers and configured RoPE for `full_attention` layers.
- Match `Gemma2RotaryEmbedding` forward behavior, including returning `cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)`.
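For reference, a minimal sketch of the forward contract the second bullet refers to: cos/sin are built in float32 and returned in the caller's dtype. Names and shapes here are illustrative, not the actual `Gemma2RotaryEmbedding` code.

```python
import torch
from torch import nn


class RotaryEmbeddingSketch(nn.Module):
    """Illustrative only: Gemma2-style forward that casts cos/sin back to x.dtype."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x: torch.Tensor, position_ids: torch.Tensor):
        # (batch, dim/2, 1) @ (batch, 1, seq) -> (batch, dim/2, seq)
        inv_freq = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        pos = position_ids[:, None, :].float()
        freqs = (inv_freq @ pos).transpose(1, 2)  # (batch, seq, dim/2)
        emb = torch.cat((freqs, freqs), dim=-1)   # (batch, seq, dim)
        cos, sin = emb.cos(), emb.sin()
        # The behavior highlighted in the summary: return in the input's dtype.
        return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
```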
Context

`Olmo3Model` documents that RoPE scaling is not applied to sliding-window attention layers, but the current implementation computes one configured rotary embedding and passes it to every layer. OLMo 3 needs layer-type-specific RoPE: default RoPE for sliding attention and configured/scaled RoPE for full attention.
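A rough sketch of the kind of per-layer-type selection described above; the helper names and the `rope_scaling` attribute handling are assumptions for illustration, not the actual `Olmo3Model` implementation.

```python
import copy

from torch import nn


def build_rope_by_layer_type(config, rotary_cls):
    """Hypothetical helper: one rotary embedding per attention type.

    full_attention layers keep the configured (possibly scaled) RoPE, while
    sliding_attention layers get a default RoPE with scaling stripped.
    """
    default_config = copy.deepcopy(config)
    default_config.rope_scaling = None  # assumption: scaling lives on `rope_scaling`
    return nn.ModuleDict(
        {
            "full_attention": rotary_cls(config),
            "sliding_attention": rotary_cls(default_config),
        }
    )


def position_embeddings_per_layer(rope_by_layer_type, layer_types, hidden_states, position_ids):
    """Compute cos/sin once per attention type, then hand each layer the matching pair."""
    cache = {
        layer_type: rope(hidden_states, position_ids)
        for layer_type, rope in rope_by_layer_type.items()
    }
    return [cache[layer_type] for layer_type in layer_types]
```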
Test plan

- `PYTHONPATH=src python utils/check_modular_conversion.py --files src/transformers/models/olmo3/modular_olmo3.py --num_workers 1`
- `ruff check src/transformers/models/olmo3/modular_olmo3.py src/transformers/models/olmo3/modeling_olmo3.py tests/models/olmo3/test_modeling_olmo3.py`
- `git diff --check`
- `PYTHONPATH=src pytest tests/models/olmo3/test_modeling_olmo3.py -k sliding_attention_uses_default_rope_with_scaled_config -q`

AI assistance was used to prepare this PR. I reviewed the generated diff and am responsible for the change.