Bytecode parity - direct loop backedges #7578
📝 Walkthrough

The pull request modifies the RustPython compiler's intermediate representation and bytecode generation. Changes include refactoring loop return stack cleanup in

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Force-pushed from 35f5642 to 34a5afb (compare)

Force-pushed from 34a5afb to e372959 (compare)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 11158-11163: Rename the misspelled identifier kwonlydefaults in
the test fixture to a cspell-friendly name (e.g., kw_only_defaults) by updating
the parameter name in the function f signature and every use inside the function
body (references to kwonlydefaults and checks like "if kwonlydefaults and kwarg
in kwonlydefaults" should use the new name); ensure all other test fixtures or
assertions that reference this parameter are updated consistently (function f,
parameters kwonlyargs and arg2value, and any access/assignment to the defaults
mapping).
In `@crates/codegen/src/ir.rs`:
- Around line 2737-2763: The code inspects the target block directly
(blocks[target.idx()]) which can be an empty placeholder and miss trampoline
chains; instead, before checking the target's first non-NOP instruction, resolve
the real target by threading through empty blocks using next_nonempty_block
(i.e. call next_nonempty_block(blocks, target) and update target to that
returned BlockIdx, skipping if it becomes BlockIdx::NULL) so the subsequent
logic that computes target_jump/target_ins operates on the first non-empty
block; retain existing checks for include_conditional, is_conditional_jump, and
other conditions.
- Around line 3551-3562: When cloning the jump-only block you adjust
predecessors for the trampoline itself but forget to increment the predecessor
count for the trampoline's downstream target (old_next), causing stale fan-in;
after you set new_block.next = old_next, push the new predecessor for the cloned
block (predecessors.push(1)), also increment predecessors[old_next.idx()] by one
(with a bounds/existence check if necessary) so the successor's incoming edge
count reflects the new clone — reference symbols: new_block, old_next, new_idx,
predecessors, target, blocks, propagate_locations_in_block.
In `@scripts/dis_dump.py`:
- Around line 182-193: The current loop that builds filtered and offset_to_idx
(iterating over raw and using normalized_idx with _SUPER_DECOMPOSE and
_OPNAME_NORMALIZE) produces a logical-instruction index map, not a
raw-bytecode-slot map; update the loop that builds filtered/offset_to_idx to
account for cache/prefix slots by inspecting each instruction's cache_info (or
otherwise using a low-level bytecode API) so offset_to_idx maps actual raw
bytecode slots used by compare_bytecode.py, e.g., when inst.cache_info indicates
extra cache entries increment normalized_idx and reserve corresponding slot
indices before mapping inst.offset; modify the code near the
filtered/offset_to_idx construction (the loop over raw, SKIP_OPS handling, and
uses of _SUPER_DECOMPOSE/_OPNAME_NORMALIZE) to synthesize those cache/prefix
slots or switch to a bytecode-level API.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 273c6c67-97cb-475e-9240-bd988f06cbe8
📒 Files selected for processing (3)
- crates/codegen/src/compile.rs
- crates/codegen/src/ir.rs
- scripts/dis_dump.py
```python
def f(kwonlyargs, kwonlydefaults, arg2value):
    missing = 0
    for kwarg in kwonlyargs:
        if kwarg not in arg2value:
            if kwonlydefaults and kwarg in kwonlydefaults:
                arg2value[kwarg] = kwonlydefaults[kwarg]
```
Fix cspell blocker in embedded test source
CI is failing because `kwonlydefaults` is flagged as an unknown word at lines 11158-11163. This currently blocks merge.
💡 Proposed fix (rename identifier in test fixture)

```diff
-def f(kwonlyargs, kwonlydefaults, arg2value):
+def f(kwonlyargs, kw_only_defaults, arg2value):
     missing = 0
     for kwarg in kwonlyargs:
         if kwarg not in arg2value:
-            if kwonlydefaults and kwarg in kwonlydefaults:
-                arg2value[kwarg] = kwonlydefaults[kwarg]
+            if kw_only_defaults and kwarg in kw_only_defaults:
+                arg2value[kwarg] = kw_only_defaults[kwarg]
             else:
                 missing += 1
```

🧰 Tools
🪛 GitHub Actions: CI
[error] 11158-11163: cspell hook failed (exit code 1): Unknown word (kwonlydefaults)
```diff
 let target = ins.target;
 if target == BlockIdx::NULL {
     continue;
 }
-if !ins.instr.is_unconditional_jump() && !is_conditional_jump(&ins.instr) {
+if !(ins.instr.is_unconditional_jump()
+    || include_conditional && is_conditional_jump(&ins.instr))
+{
     continue;
 }
+if include_conditional && is_conditional_jump(&ins.instr) {
+    let next = next_nonempty_block(blocks, blocks[bi].next);
+    if next != BlockIdx::NULL
+        && blocks[next.idx()]
+            .instructions
+            .last()
+            .is_some_and(|instr| instr.instr.is_scope_exit())
+    {
+        continue;
+    }
+}
 // Check if target block's first instruction is an unconditional jump
-let target_block = &blocks[target.idx()];
-if let Some(target_ins) = target_block.instructions.first()
+let target_jump = blocks[target.idx()]
+    .instructions
+    .iter()
+    .find(|ins| !matches!(ins.instr.real(), Some(Instruction::Nop)))
+    .copied();
+if let Some(target_ins) = target_jump
```
Thread through empty target blocks before inspecting the trampoline.
At this stage a jump can still target an empty placeholder block; several nearby CFG helpers already call next_nonempty_block() for that reason. Looking only at blocks[target] misses JUMP_* -> empty -> JUMP_* chains, so the new late pass can leave indirect backedges in place.
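The failure mode is easy to model outside the compiler. The sketch below is a hypothetical Python analogue, not the actual `ir.rs` code: integer block indices, a `NULL` sentinel, and dicts stand in for the Rust types; it only illustrates why the jump target must be resolved through empty blocks before inspection.

```python
NULL = -1  # stand-in for BlockIdx::NULL

def next_nonempty_block(blocks, idx):
    """Follow fallthrough edges past empty placeholder blocks."""
    while idx != NULL and not blocks[idx]["instructions"]:
        idx = blocks[idx]["next"]
    return idx

# JUMP -> empty placeholder -> JUMP: the chain the comment describes.
blocks = [
    {"instructions": ["JUMP"], "next": 1},     # block 0: jump targeting block 1
    {"instructions": [], "next": 2},           # block 1: empty placeholder
    {"instructions": ["JUMP"], "next": NULL},  # block 2: the real trampoline
]

# Inspecting blocks[1] directly finds no instructions and bails out;
# threading first lands on the trampoline in block 2.
resolved = next_nonempty_block(blocks, 1)
print(resolved)  # -> 2
```

With the target resolved this way, the pass sees block 2's jump and can redirect the backedge instead of leaving the indirect chain in place.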
Suggested fix

```diff
-let target = ins.target;
+let target = next_nonempty_block(blocks, ins.target);
 if target == BlockIdx::NULL {
     continue;
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let target = next_nonempty_block(blocks, ins.target);
if target == BlockIdx::NULL {
    continue;
}
if !(ins.instr.is_unconditional_jump()
    || include_conditional && is_conditional_jump(&ins.instr))
{
    continue;
}
if include_conditional && is_conditional_jump(&ins.instr) {
    let next = next_nonempty_block(blocks, blocks[bi].next);
    if next != BlockIdx::NULL
        && blocks[next.idx()]
            .instructions
            .last()
            .is_some_and(|instr| instr.instr.is_scope_exit())
    {
        continue;
    }
}
// Check if target block's first instruction is an unconditional jump
let target_jump = blocks[target.idx()]
    .instructions
    .iter()
    .find(|ins| !matches!(ins.instr.real(), Some(Instruction::Nop)))
    .copied();
if let Some(target_ins) = target_jump
```
```rust
let new_idx = BlockIdx(blocks.len() as u32);
let mut new_block = blocks[target.idx()].clone();
propagate_locations_in_block(&mut new_block, last.location, last.end_location);
let old_next = blocks[current.idx()].next;
new_block.next = old_next;
blocks.push(new_block);
blocks[current.idx()].next = new_idx;

let last_mut = blocks[current.idx()].instructions.last_mut().unwrap();
last_mut.target = new_idx;
predecessors[target.idx()] -= 1;
predecessors.push(1);
```
Update predecessor counts for the cloned trampoline's downstream target.
Cloning a jump-only block adds a new incoming edge to that block's jump destination, but predecessors is only adjusted for the trampoline itself. propagate_line_numbers() then sees stale fan-in and can incorrectly propagate locations into a now-shared successor.
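The invariant at stake (`predecessors[b]` equals the number of incoming edges of block `b`) can be sketched with a toy counter. Everything below is a hypothetical Python model, not the compiler's code: blocks are dicts whose `jump_to` field records the downstream jump destination.

```python
def clone_trampoline(blocks, predecessors, target):
    """Clone jump-only block `target`, keeping predecessor counts honest.

    The caller is assumed to rewire one former predecessor of `target`
    to jump to the clone instead.
    """
    new_idx = len(blocks)
    new_block = dict(blocks[target])
    blocks.append(new_block)
    predecessors[target] -= 1  # one edge moves off the shared trampoline
    predecessors.append(1)     # the clone has exactly one predecessor
    # The clone re-emits the trampoline's jump, which adds a brand-new
    # incoming edge to the downstream target -- the count the review
    # comment says was being dropped.
    downstream = new_block["jump_to"]
    if downstream is not None:
        predecessors[downstream] += 1
    return new_idx

blocks = [
    {"jump_to": 1},     # block 0: jump-only trampoline into block 1
    {"jump_to": None},  # block 1: downstream target
]
predecessors = [2, 1]   # two blocks currently route through the trampoline
new_idx = clone_trampoline(blocks, predecessors, target=0)
print(new_idx, predecessors)  # -> 2 [1, 2, 1]
```

Without the final increment, block 1 would still report a single incoming edge, and a later location-propagation pass could wrongly treat the now-shared successor as single-predecessor.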
Suggested fix

```diff
 let new_idx = BlockIdx(blocks.len() as u32);
 let mut new_block = blocks[target.idx()].clone();
+let downstream = new_block
+    .instructions
+    .last()
+    .filter(|ins| ins.instr.is_unconditional_jump() && ins.target != BlockIdx::NULL)
+    .map(|ins| next_nonempty_block(blocks, ins.target));
 propagate_locations_in_block(&mut new_block, last.location, last.end_location);
 let old_next = blocks[current.idx()].next;
 new_block.next = old_next;
 blocks.push(new_block);
 blocks[current.idx()].next = new_idx;
 let last_mut = blocks[current.idx()].instructions.last_mut().unwrap();
 last_mut.target = new_idx;
 predecessors[target.idx()] -= 1;
 predecessors.push(1);
+if let Some(downstream) = downstream.filter(|idx| *idx != BlockIdx::NULL) {
+    predecessors[downstream.idx()] += 1;
+}
```
```diff
 # Build filtered list and offset-to-index mapping for the normalized stream.
 # This must use post-decomposition indices; otherwise a superinstruction that
 # expands into multiple logical ops shifts later jump targets by 1.
 filtered = []
 offset_to_idx = {}
 normalized_idx = 0
 for inst in raw:
     if inst.opname in SKIP_OPS:
         continue
-    offset_to_idx[inst.offset] = len(filtered)
     opname = _OPNAME_NORMALIZE.get(inst.opname, inst.opname)
     offset_to_idx[inst.offset] = normalized_idx
     normalized_idx += len(_SUPER_DECOMPOSE.get(opname, (opname,)))
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
python - <<'PY'
import dis
import types
src = """
def f(xs):
    for x in xs:
        if x:
            pass
"""
mod = compile(src, "<mem>", "exec")
fn = next(c for c in mod.co_consts if isinstance(c, types.CodeType))
insts = list(dis.get_instructions(fn))
print("separate CACHE items:", [i.opname for i in insts if i.opname == "CACHE"])
print("metadata fields:", [name for name in ("start_offset", "cache_offset", "end_offset", "cache_info") if hasattr(insts[0], name)])
print("any cache_info present:", any(getattr(i, "cache_info", None) for i in insts))
PY
```

Repository: RustPython/RustPython

Length of output: 1796
🌐 Web query:
Python 3.13 dis.get_instructions CACHE entries behavior
💡 Result:
In Python 3.13, dis.get_instructions() never yields separate CACHE instructions anymore, even if you pass show_caches=True—that argument is deprecated and has no effect. Instead, any inline-cache data that would previously have appeared as CACHE entries is exposed on the preceding logical instruction via the Instruction.cache_info field. [1], [2]
If you want to display cache entries in a disassembly listing, use dis.dis(..., show_caches=True) (that’s separate from get_instructions). [1]
Sources:
[1] Python 3.13 dis module docs (get_instructions change note; Instruction.cache_info; show_caches) (docs.python.org)
[2] “What’s New in Python 3.13” (dis.get_instructions cache handling; show_caches deprecated) (contextqmd.com)
Citations:
- 1: https://docs.python.org/bn-in/3.13/library/dis.html
- 2: https://contextqmd.com/libraries/cpython/versions/3.15.0a7/pages/Doc/whatsnew/3.13
This builds a logical-instruction map, not a raw-bytecode-slot map.
Python 3.13+ removed separate CACHE entries from dis.get_instructions() output; cache metadata is now attached to instructions via the cache_info field. Since this code indexes directly from the logical iterator output, the offset_to_idx map cannot represent actual raw bytecode slots. If this PR requires raw-bytecode parity for correct comparisons in scripts/compare_bytecode.py:151-160, you'll need to either synthesize cache/prefix slots from instruction metadata or use a lower-level bytecode API. Per the coding guidelines, consider whether this limitation should be addressed in Rust instead of further patching the Python helper.
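One version-tolerant way to probe the slot layout without depending on `cache_info` (which is only available on 3.13+) uses the fact that every CPython code unit is 2 bytes: `offset // 2` is the raw slot index, and the gap to the next logical instruction's offset reveals how many inline-cache slots follow it on 3.11+. A minimal probe (the function `f` is an arbitrary example, not from the PR):

```python
import dis

def f(x):
    return x + 1

insts = list(dis.get_instructions(f))
for cur, nxt in zip(insts, insts[1:]):
    slot = cur.offset // 2                       # raw bytecode slot index
    caches = (nxt.offset - cur.offset) // 2 - 1  # inline-cache slots after it
    print(f"{cur.opname:<14} slot={slot:<3} caches={caches}")
```

On 3.10 and earlier every gap is one code unit (`caches == 0`); on 3.11+ specializable opcodes such as `BINARY_OP` show nonzero cache counts, which is exactly the discrepancy between logical indices and raw slots described above.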
Summary by CodeRabbit
Bug Fixes
Chores