
Bytecode parity - direct loop backedges#7578

Merged
youknowone merged 1 commit into RustPython:main from youknowone:bytecode-parity-jump
Apr 10, 2026
Conversation

Member

@youknowone youknowone commented Apr 10, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed loop return stack cleanup behavior to correctly align with CPython's iterator-slot management.
  • Chores

    • Enhanced compiler bytecode generation pipeline with improved jump threading and block reordering optimizations.
    • Updated bytecode inspection tools to use raw bytecode representation for accurate analysis.

Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough


The pull request modifies the RustPython compiler's intermediate representation and bytecode generation. Changes include: refactoring loop return stack cleanup in compile.rs, extending the CFG transformation pipeline in ir.rs with new jump threading and block reordering utilities, and updating dis_dump.py to disable instruction normalization for raw bytecode parity.
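The jump-threading transformation the walkthrough refers to can be sketched outside Rust. The toy model below (hypothetical instruction encoding, not the actual ir.rs representation) retargets any jump whose destination block begins with another unconditional jump, collapsing trampoline chains into direct backedges:

```python
# Toy CFG: each block is a list of (opcode, target) pairs; ("JUMP", t) is an
# unconditional jump to block index t. Hypothetical model, not ir.rs types.
def thread_jumps(blocks):
    for block in blocks:
        for i, (op, target) in enumerate(block):
            if op != "JUMP":
                continue
            # Follow chains of blocks whose first instruction only jumps again.
            seen = set()
            while (
                target not in seen
                and blocks[target]
                and blocks[target][0][0] == "JUMP"
            ):
                seen.add(target)  # guard against jump cycles
                target = blocks[target][0][1]
            block[i] = (op, target)
    return blocks

blocks = [
    [("LOAD", 0), ("JUMP", 1)],  # jumps to a trampoline...
    [("JUMP", 2)],               # ...which only jumps onward
    [("RETURN", None)],
]
thread_jumps(blocks)
assert blocks[0][1] == ("JUMP", 2)  # backedge now targets block 2 directly
```

The `seen` set mirrors why a real pass needs cycle protection: a jump-only block can target itself through a loop.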

Changes

  • Compiler loop and control flow (crates/codegen/src/compile.rs): Updated Compiler::unwind_fblock to emit Swap { i: 2 } followed by PopTop for ForLoop returns when preserve_tos is set, instead of PopIter, to match CPython's iterator-slot cleanup. Added two bytecode-structure tests validating jump threading around STORE_SUBSCR and loop return control-flow ordering.
  • CFG transformation and jump threading pipeline (crates/codegen/src/ir.rs): Added a set_to_nop helper for consistent NOP conversion. Extended jump threading into a jump_threading_unconditional variant and refactored the original into jump_threading_impl. Introduced a new CFG pipeline stage with reorder_conditional_exit_and_jump_blocks and reorder_conditional_jump_and_exit_blocks predicates. Added duplicate_jump_targets_without_lineno to improve line-number resolution around jump-only blocks.
  • Bytecode instruction normalization (scripts/dis_dump.py): Disabled instruction normalization and superinstruction decomposition by clearing SKIP_OPS and _OPNAME_NORMALIZE and removing the _SUPER_DECOMPOSE logic. Updated jump-target mapping to operate on raw (un-normalized, un-decomposed) bytecode indices.
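The Swap { i: 2 } + PopTop change in compile.rs mirrors what CPython does for a return inside a for loop: the live iterator still sits on the stack beneath the return value, so it must be swapped up and popped before returning. A quick way to inspect what your interpreter emits (exact opcode names vary across CPython versions, so the snippet only asserts the loop setup):

```python
import dis

def f(xs):
    for x in xs:
        return x  # the iterator on the stack must be discarded here

ops = [inst.opname for inst in dis.get_instructions(f)]
print(ops)
# On recent CPython (3.12+) the return path includes SWAP/POP_TOP to drop the
# iterator; older versions used different cleanup opcodes, hence no exact
# assertion on those names here.
assert "FOR_ITER" in ops
```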

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Bytecode parity - exception #7557: Modifies scripts/dis_dump.py instruction normalization and jump-target index mapping to use raw bytecode stream indices instead of decomposed/normalized indices.

Poem

🐰 The loop returns now reorder clean,
Jump threads race through bytecode stream,
Stack swaps dance where cleanups gleam,
Blocks reorder, CFG supreme! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Bytecode parity - direct loop backedges' directly relates to the main changes in the pull request, which focus on fixing loop backedge bytecode generation and adding jump threading for unconditional jumps to achieve bytecode parity with CPython.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



@youknowone youknowone force-pushed the bytecode-parity-jump branch from 35f5642 to 34a5afb on April 10, 2026 08:29
@youknowone youknowone force-pushed the bytecode-parity-jump branch from 34a5afb to e372959 on April 10, 2026 09:03
@youknowone youknowone marked this pull request as ready for review April 10, 2026 09:36

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 11158-11163: Rename the misspelled identifier kwonlydefaults in
the test fixture to a cspell-friendly name (e.g., kw_only_defaults) by updating
the parameter name in the function f signature and every use inside the function
body (references to kwonlydefaults and checks like "if kwonlydefaults and kwarg
in kwonlydefaults" should use the new name); ensure all other test fixtures or
assertions that reference this parameter are updated consistently (function f,
parameters kwonlyargs and arg2value, and any access/assignment to the defaults
mapping).

In `@crates/codegen/src/ir.rs`:
- Around line 2737-2763: The code inspects the target block directly
(blocks[target.idx()]) which can be an empty placeholder and miss trampoline
chains; instead, before checking the target's first non-NOP instruction, resolve
the real target by threading through empty blocks using next_nonempty_block
(i.e. call next_nonempty_block(blocks, target) and update target to that
returned BlockIdx, skipping if it becomes BlockIdx::NULL) so the subsequent
logic that computes target_jump/target_ins operates on the first non-empty
block; retain existing checks for include_conditional, is_conditional_jump, and
other conditions.
- Around line 3551-3562: When cloning the jump-only block you adjust
predecessors for the trampoline itself but forget to increment the predecessor
count for the trampoline's downstream target (old_next), causing stale fan-in;
after you set new_block.next = old_next, push the new predecessor for the cloned
block (predecessors.push(1)), also increment predecessors[old_next.idx()] by one
(with a bounds/existence check if necessary) so the successor's incoming edge
count reflects the new clone — reference symbols: new_block, old_next, new_idx,
predecessors, target, blocks, propagate_locations_in_block.

In `@scripts/dis_dump.py`:
- Around line 182-193: The current loop that builds filtered and offset_to_idx
(iterating over raw and using normalized_idx with _SUPER_DECOMPOSE and
_OPNAME_NORMALIZE) produces a logical-instruction index map, not a
raw-bytecode-slot map; update the loop that builds filtered/offset_to_idx to
account for cache/prefix slots by inspecting each instruction's cache_info (or
otherwise using a low-level bytecode API) so offset_to_idx maps actual raw
bytecode slots used by compare_bytecode.py, e.g., when inst.cache_info indicates
extra cache entries increment normalized_idx and reserve corresponding slot
indices before mapping inst.offset; modify the code near the
filtered/offset_to_idx construction (the loop over raw, SKIP_OPS handling, and
uses of _SUPER_DECOMPOSE/_OPNAME_NORMALIZE) to synthesize those cache/prefix
slots or switch to a bytecode-level API.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 273c6c67-97cb-475e-9240-bd988f06cbe8

📥 Commits

Reviewing files that changed from the base of the PR and between a49ce5b and e372959.

📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • scripts/dis_dump.py

Comment on lines +11158 to +11163
def f(kwonlyargs, kwonlydefaults, arg2value):
    missing = 0
    for kwarg in kwonlyargs:
        if kwarg not in arg2value:
            if kwonlydefaults and kwarg in kwonlydefaults:
                arg2value[kwarg] = kwonlydefaults[kwarg]

⚠️ Potential issue | 🟠 Major

Fix cspell blocker in embedded test source

CI is failing because kwonlydefaults is flagged as an unknown word in Line 11158-Line 11163. This currently blocks merge.

💡 Proposed fix (rename identifier in test fixture)
-def f(kwonlyargs, kwonlydefaults, arg2value):
+def f(kwonlyargs, kw_only_defaults, arg2value):
     missing = 0
     for kwarg in kwonlyargs:
         if kwarg not in arg2value:
-            if kwonlydefaults and kwarg in kwonlydefaults:
-                arg2value[kwarg] = kwonlydefaults[kwarg]
+            if kw_only_defaults and kwarg in kw_only_defaults:
+                arg2value[kwarg] = kw_only_defaults[kwarg]
             else:
                 missing += 1
🧰 Tools
🪛 GitHub Actions: CI

[error] 11158-11163: cspell hook failed (exit code 1): Unknown word (kwonlydefaults)


Comment on lines 2737 to +2763
             let target = ins.target;
             if target == BlockIdx::NULL {
                 continue;
             }
-            if !ins.instr.is_unconditional_jump() && !is_conditional_jump(&ins.instr) {
+            if !(ins.instr.is_unconditional_jump()
+                || include_conditional && is_conditional_jump(&ins.instr))
+            {
                 continue;
             }
+            if include_conditional && is_conditional_jump(&ins.instr) {
+                let next = next_nonempty_block(blocks, blocks[bi].next);
+                if next != BlockIdx::NULL
+                    && blocks[next.idx()]
+                        .instructions
+                        .last()
+                        .is_some_and(|instr| instr.instr.is_scope_exit())
+                {
+                    continue;
+                }
+            }
             // Check if target block's first instruction is an unconditional jump
-            let target_block = &blocks[target.idx()];
-            if let Some(target_ins) = target_block.instructions.first()
+            let target_jump = blocks[target.idx()]
+                .instructions
+                .iter()
+                .find(|ins| !matches!(ins.instr.real(), Some(Instruction::Nop)))
+                .copied();
+            if let Some(target_ins) = target_jump

⚠️ Potential issue | 🟠 Major

Thread through empty target blocks before inspecting the trampoline.

At this stage a jump can still target an empty placeholder block; several nearby CFG helpers already call next_nonempty_block() for that reason. Looking only at blocks[target] misses JUMP_* -> empty -> JUMP_* chains, so the new late pass can leave indirect backedges in place.

Suggested fix
-            let target = ins.target;
+            let target = next_nonempty_block(blocks, ins.target);
             if target == BlockIdx::NULL {
                 continue;
             }
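The threading the reviewer asks for amounts to skipping empty placeholder blocks via their fall-through links before inspecting a jump target. A minimal Python analogue (hypothetical block shape, not the real ir.rs types):

```python
NULL = -1  # stand-in for BlockIdx::NULL

def next_nonempty_block(blocks, idx):
    """Follow .next links through empty blocks; hypothetical analogue."""
    while idx != NULL and not blocks[idx]["instructions"]:
        idx = blocks[idx]["next"]
    return idx

blocks = [
    {"instructions": [], "next": 1},            # empty placeholder
    {"instructions": [], "next": 2},            # another placeholder
    {"instructions": ["JUMP 5"], "next": NULL}, # first real block
]
# A jump nominally targeting block 0 really lands on block 2, whose first
# instruction is itself a jump, which is exactly the trampoline case the
# unthreaded check would miss.
assert next_nonempty_block(blocks, 0) == 2
```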

Comment on lines +3551 to +3562
        let new_idx = BlockIdx(blocks.len() as u32);
        let mut new_block = blocks[target.idx()].clone();
        propagate_locations_in_block(&mut new_block, last.location, last.end_location);
        let old_next = blocks[current.idx()].next;
        new_block.next = old_next;
        blocks.push(new_block);
        blocks[current.idx()].next = new_idx;

        let last_mut = blocks[current.idx()].instructions.last_mut().unwrap();
        last_mut.target = new_idx;
        predecessors[target.idx()] -= 1;
        predecessors.push(1);

⚠️ Potential issue | 🟠 Major

Update predecessor counts for the cloned trampoline's downstream target.

Cloning a jump-only block adds a new incoming edge to that block's jump destination, but predecessors is only adjusted for the trampoline itself. propagate_line_numbers() then sees stale fan-in and can incorrectly propagate locations into a now-shared successor.

Suggested fix
         let new_idx = BlockIdx(blocks.len() as u32);
         let mut new_block = blocks[target.idx()].clone();
+        let downstream = new_block
+            .instructions
+            .last()
+            .filter(|ins| ins.instr.is_unconditional_jump() && ins.target != BlockIdx::NULL)
+            .map(|ins| next_nonempty_block(blocks, ins.target));
         propagate_locations_in_block(&mut new_block, last.location, last.end_location);
         let old_next = blocks[current.idx()].next;
         new_block.next = old_next;
         blocks.push(new_block);
         blocks[current.idx()].next = new_idx;

         let last_mut = blocks[current.idx()].instructions.last_mut().unwrap();
         last_mut.target = new_idx;
         predecessors[target.idx()] -= 1;
         predecessors.push(1);
+        if let Some(downstream) = downstream.filter(|idx| *idx != BlockIdx::NULL) {
+            predecessors[downstream.idx()] += 1;
+        }
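The invariant this comment defends is that predecessors[b] must equal the number of edges entering b. Cloning a jump-only trampoline adds a new edge into the clone's jump destination, so that destination's count must grow. A toy check (hypothetical structures, not the ir.rs types):

```python
# Toy model: edges as (src, dst) pairs; fan-in counts must match the edges.
def pred_counts(n_blocks, edges):
    # Recompute from scratch: counts[b] == number of incoming edges into b.
    counts = [0] * n_blocks
    for _src, dst in edges:
        counts[dst] += 1
    return counts

# Before cloning: block 0 jumps to trampoline 1, which jumps on to block 2.
edges = [(0, 1), (1, 2)]
assert pred_counts(3, edges) == [0, 1, 1]

# After cloning trampoline 1 as block 3 and retargeting block 0 at the clone,
# block 2 gains a second incoming edge (from the clone's jump). A pass that
# only decrements predecessors[1] and pushes 1 for the clone misses this.
edges = [(0, 3), (1, 2), (3, 2)]
assert pred_counts(4, edges) == [0, 0, 2, 1]
```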

Comment on lines +182 to +193
     # Build filtered list and offset-to-index mapping for the normalized stream.
     # This must use post-decomposition indices; otherwise a superinstruction that
     # expands into multiple logical ops shifts later jump targets by 1.
     filtered = []
     offset_to_idx = {}
     normalized_idx = 0
     for inst in raw:
         if inst.opname in SKIP_OPS:
             continue
-        offset_to_idx[inst.offset] = len(filtered)
+        opname = _OPNAME_NORMALIZE.get(inst.opname, inst.opname)
+        offset_to_idx[inst.offset] = normalized_idx
+        normalized_idx += len(_SUPER_DECOMPOSE.get(opname, (opname,)))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import dis
import types

src = """
def f(xs):
    for x in xs:
        if x:
            pass
"""
mod = compile(src, "<mem>", "exec")
fn = next(c for c in mod.co_consts if isinstance(c, types.CodeType))
insts = list(dis.get_instructions(fn))

print("separate CACHE items:", [i.opname for i in insts if i.opname == "CACHE"])
print("metadata fields:", [name for name in ("start_offset", "cache_offset", "end_offset", "cache_info") if hasattr(insts[0], name)])
print("any cache_info present:", any(getattr(i, "cache_info", None) for i in insts))
PY

Repository: RustPython/RustPython

Length of output: 1796


🌐 Web query:

Python 3.13 dis.get_instructions CACHE entries behavior

💡 Result:

In Python 3.13, dis.get_instructions() never yields separate CACHE instructions anymore, even if you pass show_caches=True—that argument is deprecated and has no effect. Instead, any inline-cache data that would previously have appeared as CACHE entries is exposed on the preceding logical instruction via the Instruction.cache_info field. [1], [2]

If you want to display cache entries in a disassembly listing, use dis.dis(..., show_caches=True) (that’s separate from get_instructions). [1]

Sources:
[1] Python 3.13 dis module docs (get_instructions change note; Instruction.cache_info; show_caches) (docs.python.org)
[2] “What’s New in Python 3.13” (dis.get_instructions cache handling; show_caches deprecated) (contextqmd.com)



This builds a logical-instruction map, not a raw-bytecode-slot map.

Python 3.13+ removed separate CACHE entries from dis.get_instructions() output; cache metadata is now attached to instructions via the cache_info field. Since this code indexes directly from the logical iterator output, the offset_to_idx map cannot represent actual raw bytecode slots. If this PR requires raw-bytecode parity for correct comparisons in scripts/compare_bytecode.py:151-160, you'll need to either synthesize cache/prefix slots from instruction metadata or use a lower-level bytecode API. Per the coding guidelines, consider whether this limitation should be addressed in Rust instead of further patching the Python helper.
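The logical-versus-raw distinction is concrete in CPython's wire format: since 3.6 every code unit (instruction or inline cache) is two bytes, so a raw slot index is inst.offset // 2, and on 3.11+ consecutive logical instructions leave gaps wherever inline caches sit between them. A sketch of deriving raw-slot indices from the logical stream (assumes CPython 3.6+ offsets):

```python
import dis

def g(a, b):
    return a + b

# Each instruction's raw bytecode slot, derived from its byte offset.
slots = [inst.offset // 2 for inst in dis.get_instructions(g)]
print(slots)
# On 3.11+, gaps between consecutive slots reveal inline-cache units;
# numbering instructions 0, 1, 2, ... instead would misalign jump targets
# against tools that index the raw stream.
assert slots == sorted(set(slots))  # strictly increasing raw slot indices
```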


@youknowone youknowone merged commit eac4572 into RustPython:main Apr 10, 2026
14 checks passed
@youknowone youknowone deleted the bytecode-parity-jump branch April 10, 2026 09:49
