
Fix OLMo 3 scaled RoPE handling for sliding attention #45945

Draft

nurpax wants to merge 2 commits into huggingface:main from nurpax:fix-olmo3-per-layer-rotary-pr

Conversation

@nurpax (Contributor) commented May 13, 2026

Summary

  • Restores OLMo 3 per-layer rotary embeddings after the RoPE layer-type refactor.
  • Uses default, unscaled RoPE for sliding_attention layers and configured RoPE for full_attention layers.
  • Keeps the current Gemma2RotaryEmbedding forward behavior, including returning cos.to(dtype=x.dtype), sin.to(dtype=x.dtype); a simplified sketch of this pattern follows the list.
  • Adds a unit test that catches the regression with a mixed sliding/full attention config and YARN RoPE.
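
For reference, the preserved forward follows the standard transformers rotary-embedding pattern. Below is a simplified, illustrative sketch, not the actual Gemma2RotaryEmbedding (which adds device and autocast handling); the class name and constructor arguments here are placeholders:

```python
import torch
from torch import nn


class RotaryEmbeddingSketch(nn.Module):
    """Illustrative stand-in for the Gemma2RotaryEmbedding forward behavior."""

    def __init__(self, dim: int, base: float = 10000.0, attention_scaling: float = 1.0):
        super().__init__()
        # Standard RoPE inverse frequencies over the even dimensions.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self.attention_scaling = attention_scaling

    @torch.no_grad()
    def forward(self, x, position_ids):
        # (batch, dim/2, 1) @ (batch, 1, seq_len) -> (batch, dim/2, seq_len)
        inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        position_ids_expanded = position_ids[:, None, :].float()
        freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
        emb = torch.cat((freqs, freqs), dim=-1)
        cos = emb.cos() * self.attention_scaling
        sin = emb.sin() * self.attention_scaling
        # The dtype cast on return is the behavior this PR keeps intact.
        return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
```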

Context

Olmo3Model documents that RoPE scaling is not applied to sliding-window attention layers, but the current implementation computes one configured rotary embedding and passes it to every layer. OLMo 3 needs layer-type-specific RoPE: default RoPE for sliding attention and configured/scaled RoPE for full attention.
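
A minimal sketch of what the fix implies, assuming the usual transformers conventions: config.layer_types marks each layer as "sliding_attention" or "full_attention", and Olmo3RotaryEmbedding reads config.rope_scaling (e.g. YARN) at init. The helper name and the rotary_emb/rotary_emb_local attributes are hypothetical, mirroring the Gemma3-style two-rotary pattern rather than the exact diff:

```python
import copy

from transformers.models.olmo3.modeling_olmo3 import Olmo3RotaryEmbedding


def build_per_layer_rotary(config):
    """Hypothetical helper: configured RoPE for full_attention layers,
    default (unscaled) RoPE for sliding_attention layers."""
    # Configured RoPE, e.g. YARN-scaled, for full_attention layers.
    rotary_emb = Olmo3RotaryEmbedding(config=config)
    # Same config with rope_scaling disabled falls back to default RoPE,
    # which is what sliding_attention layers should use.
    local_config = copy.deepcopy(config)
    local_config.rope_scaling = None
    rotary_emb_local = Olmo3RotaryEmbedding(config=local_config)
    return rotary_emb, rotary_emb_local


# Inside Olmo3Model.forward, the per-layer routing could then look like:
#
#     pos_global = self.rotary_emb(hidden_states, position_ids)
#     pos_local = self.rotary_emb_local(hidden_states, position_ids)
#     for layer, layer_type in zip(self.layers, self.config.layer_types):
#         pos = pos_local if layer_type == "sliding_attention" else pos_global
#         hidden_states = layer(hidden_states, position_embeddings=pos, ...)
```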

Test plan

  • PYTHONPATH=src python utils/check_modular_conversion.py --files src/transformers/models/olmo3/modular_olmo3.py --num_workers 1
  • ruff check src/transformers/models/olmo3/modular_olmo3.py src/transformers/models/olmo3/modeling_olmo3.py tests/models/olmo3/test_modeling_olmo3.py
  • git diff --check
  • PYTHONPATH=src pytest tests/models/olmo3/test_modeling_olmo3.py -k sliding_attention_uses_default_rope_with_scaled_config -q
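
For context, a rough sketch of what the added regression test might look like; the real test lives in tests/models/olmo3/test_modeling_olmo3.py, and the config values and the rotary_emb/rotary_emb_local attribute names below are illustrative assumptions, not the merged code:

```python
import torch

from transformers import Olmo3Config, Olmo3Model


def test_sliding_attention_uses_default_rope_with_scaled_config():
    # Tiny model mixing layer types, with YARN RoPE scaling configured.
    config = Olmo3Config(
        hidden_size=32,
        intermediate_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        num_key_value_heads=4,
        layer_types=["sliding_attention", "full_attention"],
        sliding_window=8,
        rope_scaling={"rope_type": "yarn", "factor": 2.0},
    )
    model = Olmo3Model(config)
    # YARN rewrites the inverse frequencies, so the scaled (full-attention)
    # rotary embedding must differ from the default (sliding-attention) one.
    assert not torch.allclose(model.rotary_emb.inv_freq, model.rotary_emb_local.inv_freq)
```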

AI assistance was used to prepare this PR. I reviewed the generated diff and am responsible for the change.

@nurpax force-pushed the fix-olmo3-per-layer-rotary-pr branch from cee279b to b9bae6f on May 13, 2026 at 15:05
@nurpax force-pushed the fix-olmo3-per-layer-rotary-pr branch from b9bae6f to ddd085f on May 13, 2026 at 18:53
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: olmo3, olmo_hybrid
