
Bytecode parity - direct loop backedges#7578

Merged
youknowone merged 1 commit into RustPython:main from youknowone:bytecode-parity-jump
Apr 10, 2026
Conversation

Member

@youknowone youknowone commented Apr 10, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed loop return stack cleanup behavior to correctly align with CPython's iterator-slot management.
  • Chores

    • Enhanced compiler bytecode generation pipeline with improved jump threading and block reordering optimizations.
    • Updated bytecode inspection tools to use raw bytecode representation for accurate analysis.

Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough


The pull request modifies the RustPython compiler's intermediate representation and bytecode generation. Changes include: refactoring loop return stack cleanup in compile.rs, extending the CFG transformation pipeline in ir.rs with new jump threading and block reordering utilities, and updating dis_dump.py to disable instruction normalization for raw bytecode parity.
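The jump-threading transformation the walkthrough refers to can be sketched outside Rust. The toy model below (hypothetical instruction encoding, not the actual ir.rs representation) retargets any jump whose destination block begins with another unconditional jump, collapsing trampoline chains into direct backedges:

```python
# Toy CFG: each block is a list of (opcode, target) pairs; ("JUMP", t) is an
# unconditional jump to block index t. Hypothetical model, not ir.rs types.
def thread_jumps(blocks):
    for block in blocks:
        for i, (op, target) in enumerate(block):
            if op != "JUMP":
                continue
            # Follow chains of blocks whose first instruction only jumps again.
            seen = set()
            while (
                target not in seen
                and blocks[target]
                and blocks[target][0][0] == "JUMP"
            ):
                seen.add(target)  # guard against jump cycles
                target = blocks[target][0][1]
            block[i] = (op, target)
    return blocks

blocks = [
    [("LOAD", 0), ("JUMP", 1)],  # jumps to a trampoline...
    [("JUMP", 2)],               # ...which only jumps onward
    [("RETURN", None)],
]
thread_jumps(blocks)
assert blocks[0][1] == ("JUMP", 2)  # backedge now targets block 2 directly
```

The `seen` set mirrors why a real pass needs cycle protection: a jump-only block can target itself through a loop.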

Changes

  • Compiler loop and control flow (crates/codegen/src/compile.rs): Updated Compiler::unwind_fblock to emit Swap { i: 2 } followed by PopTop for ForLoop returns when preserve_tos is set, instead of PopIter, to match CPython's iterator-slot cleanup. Added two bytecode-structure tests validating jump threading around STORE_SUBSCR and loop return control-flow ordering.
  • CFG transformation and jump threading pipeline (crates/codegen/src/ir.rs): Added a set_to_nop helper for consistent NOP conversion. Extended jump threading into a jump_threading_unconditional variant and refactored the original into jump_threading_impl. Introduced a new CFG pipeline stage with reorder_conditional_exit_and_jump_blocks and reorder_conditional_jump_and_exit_blocks predicates. Added duplicate_jump_targets_without_lineno to improve line-number resolution around jump-only blocks.
  • Bytecode instruction normalization (scripts/dis_dump.py): Disabled instruction normalization and superinstruction decomposition by clearing SKIP_OPS and _OPNAME_NORMALIZE and removing the _SUPER_DECOMPOSE logic. Updated jump-target mapping to operate on raw (un-normalized, un-decomposed) bytecode indices.
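The Swap { i: 2 } + PopTop change in compile.rs mirrors what CPython does for a return inside a for loop: the live iterator still sits on the stack beneath the return value, so it must be swapped up and popped before returning. A quick way to inspect what your interpreter emits (exact opcode names vary across CPython versions, so the snippet only asserts the loop setup):

```python
import dis

def f(xs):
    for x in xs:
        return x  # the iterator on the stack must be discarded here

ops = [inst.opname for inst in dis.get_instructions(f)]
print(ops)
# On recent CPython (3.12+) the return path includes SWAP/POP_TOP to drop the
# iterator; older versions used different cleanup opcodes, hence no exact
# assertion on those names here.
assert "FOR_ITER" in ops
```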

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Bytecode parity - exception #7557: Modifies scripts/dis_dump.py instruction normalization and jump-target index mapping to use raw bytecode stream indices instead of decomposed/normalized indices.

Poem

🐰 The loop returns now reorder clean,
Jump threads race through bytecode stream,
Stack swaps dance where cleanups gleam,
Blocks reorder, CFG supreme! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Bytecode parity - direct loop backedges' directly relates to the main changes in the pull request, which focus on fixing loop backedge bytecode generation and adding jump threading for unconditional jumps to achieve bytecode parity with CPython.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



@youknowone youknowone force-pushed the bytecode-parity-jump branch from 35f5642 to 34a5afb on April 10, 2026 08:29
@youknowone youknowone force-pushed the bytecode-parity-jump branch from 34a5afb to e372959 on April 10, 2026 09:03
@youknowone youknowone marked this pull request as ready for review April 10, 2026 09:36

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 11158-11163: Rename the misspelled identifier kwonlydefaults in
the test fixture to a cspell-friendly name (e.g., kw_only_defaults) by updating
the parameter name in the function f signature and every use inside the function
body (references to kwonlydefaults and checks like "if kwonlydefaults and kwarg
in kwonlydefaults" should use the new name); ensure all other test fixtures or
assertions that reference this parameter are updated consistently (function f,
parameters kwonlyargs and arg2value, and any access/assignment to the defaults
mapping).

In `@crates/codegen/src/ir.rs`:
- Around line 2737-2763: The code inspects the target block directly
(blocks[target.idx()]) which can be an empty placeholder and miss trampoline
chains; instead, before checking the target's first non-NOP instruction, resolve
the real target by threading through empty blocks using next_nonempty_block
(i.e. call next_nonempty_block(blocks, target) and update target to that
returned BlockIdx, skipping if it becomes BlockIdx::NULL) so the subsequent
logic that computes target_jump/target_ins operates on the first non-empty
block; retain existing checks for include_conditional, is_conditional_jump, and
other conditions.
- Around line 3551-3562: When cloning the jump-only block you adjust
predecessors for the trampoline itself but forget to increment the predecessor
count for the trampoline's downstream target (old_next), causing stale fan-in;
after you set new_block.next = old_next, push the new predecessor for the cloned
block (predecessors.push(1)), also increment predecessors[old_next.idx()] by one
(with a bounds/existence check if necessary) so the successor's incoming edge
count reflects the new clone — reference symbols: new_block, old_next, new_idx,
predecessors, target, blocks, propagate_locations_in_block.

In `@scripts/dis_dump.py`:
- Around line 182-193: The current loop that builds filtered and offset_to_idx
(iterating over raw and using normalized_idx with _SUPER_DECOMPOSE and
_OPNAME_NORMALIZE) produces a logical-instruction index map, not a
raw-bytecode-slot map; update the loop that builds filtered/offset_to_idx to
account for cache/prefix slots by inspecting each instruction's cache_info (or
otherwise using a low-level bytecode API) so offset_to_idx maps actual raw
bytecode slots used by compare_bytecode.py, e.g., when inst.cache_info indicates
extra cache entries increment normalized_idx and reserve corresponding slot
indices before mapping inst.offset; modify the code near the
filtered/offset_to_idx construction (the loop over raw, SKIP_OPS handling, and
uses of _SUPER_DECOMPOSE/_OPNAME_NORMALIZE) to synthesize those cache/prefix
slots or switch to a bytecode-level API.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 273c6c67-97cb-475e-9240-bd988f06cbe8

📥 Commits

Reviewing files that changed from the base of the PR and between a49ce5b and e372959.

📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • scripts/dis_dump.py

Comment on lines +11158 to +11163
def f(kwonlyargs, kwonlydefaults, arg2value):
    missing = 0
    for kwarg in kwonlyargs:
        if kwarg not in arg2value:
            if kwonlydefaults and kwarg in kwonlydefaults:
                arg2value[kwarg] = kwonlydefaults[kwarg]

⚠️ Potential issue | 🟠 Major

Fix cspell blocker in embedded test source

CI is failing because kwonlydefaults is flagged as an unknown word in Line 11158-Line 11163. This currently blocks merge.

💡 Proposed fix (rename identifier in test fixture)
-def f(kwonlyargs, kwonlydefaults, arg2value):
+def f(kwonlyargs, kw_only_defaults, arg2value):
     missing = 0
     for kwarg in kwonlyargs:
         if kwarg not in arg2value:
-            if kwonlydefaults and kwarg in kwonlydefaults:
-                arg2value[kwarg] = kwonlydefaults[kwarg]
+            if kw_only_defaults and kwarg in kw_only_defaults:
+                arg2value[kwarg] = kw_only_defaults[kwarg]
             else:
                 missing += 1
🧰 Tools
🪛 GitHub Actions: CI

[error] 11158-11163: cspell hook failed (exit code 1): Unknown word (kwonlydefaults)


Comment on lines 2737 to +2763
             let target = ins.target;
             if target == BlockIdx::NULL {
                 continue;
             }
-            if !ins.instr.is_unconditional_jump() && !is_conditional_jump(&ins.instr) {
+            if !(ins.instr.is_unconditional_jump()
+                || include_conditional && is_conditional_jump(&ins.instr))
+            {
                 continue;
             }
+            if include_conditional && is_conditional_jump(&ins.instr) {
+                let next = next_nonempty_block(blocks, blocks[bi].next);
+                if next != BlockIdx::NULL
+                    && blocks[next.idx()]
+                        .instructions
+                        .last()
+                        .is_some_and(|instr| instr.instr.is_scope_exit())
+                {
+                    continue;
+                }
+            }
             // Check if target block's first instruction is an unconditional jump
-            let target_block = &blocks[target.idx()];
-            if let Some(target_ins) = target_block.instructions.first()
+            let target_jump = blocks[target.idx()]
+                .instructions
+                .iter()
+                .find(|ins| !matches!(ins.instr.real(), Some(Instruction::Nop)))
+                .copied();
+            if let Some(target_ins) = target_jump

⚠️ Potential issue | 🟠 Major

Thread through empty target blocks before inspecting the trampoline.

At this stage a jump can still target an empty placeholder block; several nearby CFG helpers already call next_nonempty_block() for that reason. Looking only at blocks[target] misses JUMP_* -> empty -> JUMP_* chains, so the new late pass can leave indirect backedges in place.

Suggested fix
-            let target = ins.target;
+            let target = next_nonempty_block(blocks, ins.target);
             if target == BlockIdx::NULL {
                 continue;
             }
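The threading the reviewer asks for amounts to skipping empty placeholder blocks via their fall-through links before inspecting a jump target. A minimal Python analogue (hypothetical block shape, not the real ir.rs types):

```python
NULL = -1  # stand-in for BlockIdx::NULL

def next_nonempty_block(blocks, idx):
    """Follow .next links through empty blocks; hypothetical analogue."""
    while idx != NULL and not blocks[idx]["instructions"]:
        idx = blocks[idx]["next"]
    return idx

blocks = [
    {"instructions": [], "next": 1},            # empty placeholder
    {"instructions": [], "next": 2},            # another placeholder
    {"instructions": ["JUMP 5"], "next": NULL}, # first real block
]
# A jump nominally targeting block 0 really lands on block 2, whose first
# instruction is itself a jump, which is exactly the trampoline case the
# unthreaded check would miss.
assert next_nonempty_block(blocks, 0) == 2
```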

Comment on lines +3551 to +3562
        let new_idx = BlockIdx(blocks.len() as u32);
        let mut new_block = blocks[target.idx()].clone();
        propagate_locations_in_block(&mut new_block, last.location, last.end_location);
        let old_next = blocks[current.idx()].next;
        new_block.next = old_next;
        blocks.push(new_block);
        blocks[current.idx()].next = new_idx;

        let last_mut = blocks[current.idx()].instructions.last_mut().unwrap();
        last_mut.target = new_idx;
        predecessors[target.idx()] -= 1;
        predecessors.push(1);

⚠️ Potential issue | 🟠 Major

Update predecessor counts for the cloned trampoline's downstream target.

Cloning a jump-only block adds a new incoming edge to that block's jump destination, but predecessors is only adjusted for the trampoline itself. propagate_line_numbers() then sees stale fan-in and can incorrectly propagate locations into a now-shared successor.

Suggested fix
         let new_idx = BlockIdx(blocks.len() as u32);
         let mut new_block = blocks[target.idx()].clone();
+        let downstream = new_block
+            .instructions
+            .last()
+            .filter(|ins| ins.instr.is_unconditional_jump() && ins.target != BlockIdx::NULL)
+            .map(|ins| next_nonempty_block(blocks, ins.target));
         propagate_locations_in_block(&mut new_block, last.location, last.end_location);
         let old_next = blocks[current.idx()].next;
         new_block.next = old_next;
         blocks.push(new_block);
         blocks[current.idx()].next = new_idx;

         let last_mut = blocks[current.idx()].instructions.last_mut().unwrap();
         last_mut.target = new_idx;
         predecessors[target.idx()] -= 1;
         predecessors.push(1);
+        if let Some(downstream) = downstream.filter(|idx| *idx != BlockIdx::NULL) {
+            predecessors[downstream.idx()] += 1;
+        }
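The invariant this comment defends is that predecessors[b] must equal the number of edges entering b. Cloning a jump-only trampoline adds a new edge into the clone's jump destination, so that destination's count must grow. A toy check (hypothetical structures, not the ir.rs types):

```python
# Toy model: edges as (src, dst) pairs; fan-in counts must match the edges.
def pred_counts(n_blocks, edges):
    # Recompute from scratch: counts[b] == number of incoming edges into b.
    counts = [0] * n_blocks
    for _src, dst in edges:
        counts[dst] += 1
    return counts

# Before cloning: block 0 jumps to trampoline 1, which jumps on to block 2.
edges = [(0, 1), (1, 2)]
assert pred_counts(3, edges) == [0, 1, 1]

# After cloning trampoline 1 as block 3 and retargeting block 0 at the clone,
# block 2 gains a second incoming edge (from the clone's jump). A pass that
# only decrements predecessors[1] and pushes 1 for the clone misses this.
edges = [(0, 3), (1, 2), (3, 2)]
assert pred_counts(4, edges) == [0, 0, 2, 1]
```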

Comment on lines +182 to +193
     # Build filtered list and offset-to-index mapping for the normalized stream.
     # This must use post-decomposition indices; otherwise a superinstruction that
     # expands into multiple logical ops shifts later jump targets by 1.
     filtered = []
     offset_to_idx = {}
     normalized_idx = 0
     for inst in raw:
         if inst.opname in SKIP_OPS:
             continue
-        offset_to_idx[inst.offset] = len(filtered)
+        opname = _OPNAME_NORMALIZE.get(inst.opname, inst.opname)
+        offset_to_idx[inst.offset] = normalized_idx
+        normalized_idx += len(_SUPER_DECOMPOSE.get(opname, (opname,)))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import dis
import types

src = """
def f(xs):
    for x in xs:
        if x:
            pass
"""
mod = compile(src, "<mem>", "exec")
fn = next(c for c in mod.co_consts if isinstance(c, types.CodeType))
insts = list(dis.get_instructions(fn))

print("separate CACHE items:", [i.opname for i in insts if i.opname == "CACHE"])
print("metadata fields:", [name for name in ("start_offset", "cache_offset", "end_offset", "cache_info") if hasattr(insts[0], name)])
print("any cache_info present:", any(getattr(i, "cache_info", None) for i in insts))
PY

Repository: RustPython/RustPython

Length of output: 1796


🌐 Web query:

Python 3.13 dis.get_instructions CACHE entries behavior

💡 Result:

In Python 3.13, dis.get_instructions() never yields separate CACHE instructions anymore, even if you pass show_caches=True—that argument is deprecated and has no effect. Instead, any inline-cache data that would previously have appeared as CACHE entries is exposed on the preceding logical instruction via the Instruction.cache_info field. [1], [2]

If you want to display cache entries in a disassembly listing, use dis.dis(..., show_caches=True) (that’s separate from get_instructions). [1]

Sources:
[1] Python 3.13 dis module docs (get_instructions change note; Instruction.cache_info; show_caches) (docs.python.org)
[2] “What’s New in Python 3.13” (dis.get_instructions cache handling; show_caches deprecated) (contextqmd.com)



This builds a logical-instruction map, not a raw-bytecode-slot map.

Python 3.13+ removed separate CACHE entries from dis.get_instructions() output; cache metadata is now attached to instructions via the cache_info field. Since this code indexes directly from the logical iterator output, the offset_to_idx map cannot represent actual raw bytecode slots. If this PR requires raw-bytecode parity for correct comparisons in scripts/compare_bytecode.py:151-160, you'll need to either synthesize cache/prefix slots from instruction metadata or use a lower-level bytecode API. Per the coding guidelines, consider whether this limitation should be addressed in Rust instead of further patching the Python helper.
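The logical-versus-raw distinction is concrete in CPython's wire format: since 3.6 every code unit (instruction or inline cache) is two bytes, so a raw slot index is inst.offset // 2, and on 3.11+ consecutive logical instructions leave gaps wherever inline caches sit between them. A sketch of deriving raw-slot indices from the logical stream (assumes CPython 3.6+ offsets):

```python
import dis

def g(a, b):
    return a + b

# Each instruction's raw bytecode slot, derived from its byte offset.
slots = [inst.offset // 2 for inst in dis.get_instructions(g)]
print(slots)
# On 3.11+, gaps between consecutive slots reveal inline-cache units;
# numbering instructions 0, 1, 2, ... instead would misalign jump targets
# against tools that index the raw stream.
assert slots == sorted(set(slots))  # strictly increasing raw slot indices
```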


@youknowone youknowone merged commit eac4572 into RustPython:main Apr 10, 2026
14 checks passed
@youknowone youknowone deleted the bytecode-parity-jump branch April 10, 2026 09:49
