Skip to content

Align nested code object bytecode parity#7942

Merged
youknowone merged 9 commits into
RustPython:mainfrom
youknowone:bytecode-parity
May 21, 2026
Merged

Align nested code object bytecode parity#7942
youknowone merged 9 commits into
RustPython:mainfrom
youknowone:bytecode-parity

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented May 20, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved try/except/finally, loop-break, and match/arm code generation to better match CPython control-flow, fixing miscompiled breaks, borrowing, and jump behaviors (including async comprehensions).
    • Fixed class-name mangling for underscore-only class names.
  • Tests

    • Added and updated unit tests to assert CPython-like instruction layout and borrow/load-fast behavior.
  • Tooling

    • Stabilized disassembly constant representations for cross-interpreter consistency.
  • Chores

    • Minor public API/compatibility adjustment in code-generation exports.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR adds AST-tail predicates and tuple-nonfolding, wires those predicates into finally/try/orelse/while/for/match codegen preservation and borrow decisions, introduces a per-instruction for-loop break-cleanup flag and IR guards to prevent unsafe rewrites, refines jump sizing/duplication logic, and adds tests plus stable constant normalization for disassembly.

Changes

Try/except control-flow and for-break pattern handling

Layer / File(s) Summary
Statement-shape and truthiness helpers
crates/codegen/src/compile.rs
Adds nested-with/async-comprehension detection, conditional finalbody and handlers-end-with-scope-exit predicates, store-name-then-same-name-method-call probe, and treats tuple constants as non-foldable for truthiness and unary not.
Finally/handler label preservation & try-else wiring
crates/codegen/src/compile.rs
Expands preserve_finally logic, introduces bare-handler successor preservation flags, tracks try-start barrier state, and disables load-fast borrowing for try/else when preservation predicates apply.
While/for end-label reuse and break cleanup tracking
crates/codegen/src/compile.rs
Reuse empty while end label only when orelse empty and body ends with break; mark load-fast barrier on effective-after block; add and use mark_last_for_loop_break_cleanup_jump() and initialize instruction option.
Match/case successor and jump emission
crates/codegen/src/compile.rs
Add OR-pattern/case-context flags, create explicit successor when current block ends with conditionals, emit JumpNoInterrupt for arms with async comprehensions, and adjust OR/default borrow-disabling.
Unit tests and helpers
crates/codegen/src/compile.rs
Add many CPython-parity tests and a helper to inspect load-fast/load-fast-borrow pairs.

IR-level flags and control-flow rewrite guards

Layer / File(s) Summary
Instruction flag and synthetic init
crates/codegen/src/ir.rs
Adds InstructionInfo.for_loop_break_cleanup_jump, initializes it for synthetic instructions and test helpers, and exposes is_conditional_jump as pub(crate).
Jump sizing and threading guards
crates/codegen/src/ir.rs
Add block->instr index and extended-forward-jumps tracking for sizing, widen EXTENDED_ARG logic, and detect threads_break_cleanup to block unsafe threading.
Duplication/clone and passthrough refinements
crates/codegen/src/ir.rs
Introduce block_has_break_continue_cleanup_jump, lineful_shared_jump_back_target, is_for_break_cleanup_block, and jump_targets_exception_region_entry; refine duplication/reorder decisions and end-return duplication gating.
Peephole and load-fast passthroughs
crates/codegen/src/ir.rs
Make tuple constant truthiness unknown (no fold), update folded-load escape detection, add has_protected_warm_fallthrough_predecessor and try_except_return_end passthrough, and prevent location propagation for sentinel overrides.

Symbol table and tooling

Layer / File(s) Summary
mangle_name guard and tests
crates/codegen/src/symboltable.rs
Return original name when class prefix becomes empty after stripping underscores; add unit tests.
dis_dump constant normalization
scripts/dis_dump.py
Add _normalize_const_repr() and use it for LOAD_CONST argrepr extraction to produce cross-interpreter stable constant representations (floats, tuples, frozensets).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Poem

🐰 I hopped through blocks where handlers close,
I marked the jumps where cleanup flows,
Tuples no longer fold in haste,
Labels kept in CPython's taste,
A careful rabbit left no trace.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 49.21% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Align nested code object bytecode parity' is vague and uses generic phrasing that doesn't convey specific information about the substantive changes made across four files. Consider a more specific title that highlights the main changes, such as 'Refactor compiler control-flow and bytecode parity for nested code objects' or similar.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review May 21, 2026 00:37
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/codegen/src/ir.rs`:
- Around line 8625-8630: The current guard uses chain_start directly but must
check the first non-empty successor before calling is_for_break_cleanup_block to
avoid false negatives when empty label blocks sit between the conditional and
the real chain body; update the condition so that instead of
is_for_break_cleanup_block(blocks, chain_start) you compute the first non-empty
block by walking successors from chain_start (skipping empty/label-only blocks)
and pass that block to is_for_break_cleanup_block (keep the other checks:
is_generic_false_path_reorder and jump_targets_for_iter(blocks, jump_block)
unchanged).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 03818ab4-fb70-4ef1-a405-b00f6b23538e

📥 Commits

Reviewing files that changed from the base of the PR and between 948924a and b92d314.

📒 Files selected for processing (2)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs

Comment thread crates/codegen/src/ir.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/codegen/src/ir.rs (1)

8757-8760: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the first non-empty chain block when checking the for-break cleanup guard.

Line 8759 still checks chain_start directly. If empty label blocks appear before the real PopTop; JUMP_* cleanup block, this guard is bypassed and the reorder can still fire.

Suggested fix
-        if is_generic_false_path_reorder
-            && jump_targets_for_iter(blocks, jump_block)
-            && is_for_break_cleanup_block(blocks, chain_start)
+        if is_generic_false_path_reorder
+            && jump_targets_for_iter(blocks, jump_block)
+            && is_for_break_cleanup_block(blocks, next_nonempty_block(blocks, chain_start))
         {
             current = next;
             continue;
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/codegen/src/ir.rs` around lines 8757 - 8760, The check uses
chain_start directly which can be an empty/label block, letting the
for-break-cleanup guard be bypassed; update the condition so that instead of
passing chain_start into is_for_break_cleanup_block, walk the chain from
chain_start to find the first non-empty block (skip empty/label-only blocks) and
pass that block into is_for_break_cleanup_block while keeping the existing
jump_targets_for_iter and is_generic_false_path_reorder checks intact; update
references around is_generic_false_path_reorder, jump_targets_for_iter(blocks,
jump_block), and is_for_break_cleanup_block(...) to use the located non-empty
chain block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 879-913: The visitor in statements_contain_async_comprehension
misses generator expressions: update AsyncComprehensionVisitor::visit_expr to
also match ast::Expr::Generator(ast::ExprGenerator { generators, .. }) and check
generators.iter().any(|g| g.is_async) so async generator expressions set
found=true; this ensures the predicate used by compile_match_inner (which
controls emitting PseudoInstruction::JumpNoInterrupt) detects async genexps the
same way as ListComp/SetComp/DictComp. If omission was intentional, add a
clarifying comment near statements_contain_async_comprehension explaining why
ast::Expr::Generator is excluded.

---

Duplicate comments:
In `@crates/codegen/src/ir.rs`:
- Around line 8757-8760: The check uses chain_start directly which can be an
empty/label block, letting the for-break-cleanup guard be bypassed; update the
condition so that instead of passing chain_start into
is_for_break_cleanup_block, walk the chain from chain_start to find the first
non-empty block (skip empty/label-only blocks) and pass that block into
is_for_break_cleanup_block while keeping the existing jump_targets_for_iter and
is_generic_false_path_reorder checks intact; update references around
is_generic_false_path_reorder, jump_targets_for_iter(blocks, jump_block), and
is_for_break_cleanup_block(...) to use the located non-empty chain block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: bf1f7bdc-885b-4ca7-b0f4-7f4967682d0f

📥 Commits

Reviewing files that changed from the base of the PR and between b92d314 and a91674d.

📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/codegen/src/symboltable.rs

Comment on lines +879 to +913
fn statements_contain_async_comprehension(body: &[ast::Stmt]) -> bool {
use ast::visitor::Visitor;

#[derive(Default)]
struct AsyncComprehensionVisitor {
found: bool,
}

impl ast::visitor::Visitor<'_> for AsyncComprehensionVisitor {
fn visit_expr(&mut self, expr: &ast::Expr) {
if self.found {
return;
}
match expr {
ast::Expr::ListComp(ast::ExprListComp { generators, .. })
| ast::Expr::SetComp(ast::ExprSetComp { generators, .. })
| ast::Expr::DictComp(ast::ExprDictComp { generators, .. })
if generators.iter().any(|generator| generator.is_async) =>
{
self.found = true;
}
_ => ast::visitor::walk_expr(self, expr),
}
}
}

let mut visitor = AsyncComprehensionVisitor::default();
for stmt in body {
visitor.visit_stmt(stmt);
if visitor.found {
return true;
}
}
false
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

❓ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Confirm the AST variant name for generator expressions in the ruff_python_ast
# dependency used by RustPython's codegen.
rg -nP --type=rust -C2 '\bExpr::(Generator|GeneratorExp)\b' crates/codegen
rg -nP --type=rust -C2 '\bExprGenerator(Exp)?\b' crates/codegen
# And whether `is_async` is a field of its Comprehension entries.
rg -nP --type=rust -C3 'struct (ExprGenerator|ExprGeneratorExp)\b'

Repository: RustPython/RustPython

Length of output: 4170


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the exact predicate definition
rg -n "fn statements_contain_async_comprehension" crates/codegen/src/compile.rs
rg -n "statements_contain_async_comprehension\\(" crates/codegen/src/compile.rs

# Inspect the function body around the reported lines
sed -n '840,940p' crates/codegen/src/compile.rs

# Find where JumpNoInterrupt is gated by async-comprehension detection
rg -n "JumpNoInterrupt" crates/codegen/src/compile.rs
rg -n "async_comprehension" crates/codegen/src/compile.rs

# Also search for other “contain_async_*” helpers that might already handle Generator expressions
rg -n "contain_async" crates/codegen/src/compile.rs

# Confirm the AST variant + fields for generator expressions in this repo’s rust bindings usage
rg -n "Expr::Generator\\(ast::ExprGenerator" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 6941


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '9090,9170p' crates/codegen/src/compile.rs

# Inspect the generator-related test mentioned in search results
sed -n '28360,28510p' crates/codegen/src/compile.rs

# Inspect the match async comprehension iterator test for context
sed -n '38810,38890p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 11293


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "statements_contain_async_comprehension" crates/codegen/src/compile.rs

# Inspect around the one call site (~9130-9150) more tightly
sed -n '9105,9155p' crates/codegen/src/compile.rs

# Find definition/usage context of PseudoInstruction::JumpNoInterrupt
rg -n "enum PseudoInstruction|JumpNoInterrupt" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 4733


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the existing Expr::Generator match arm usage (~11248-11252 per earlier search)
sed -n '11190,11320p' crates/codegen/src/compile.rs

# Search for any other async-comprehension / async-generator detection helpers
rg -n "statements_contain_async|async_comprehension|async generator|AsyncGen" crates/codegen/src/compile.rs

# Search for other predicates that might cover Expr::Generator with async generators
rg -n "Expr::Generator\\(ast::ExprGenerator \\{[^}]*generators" crates/codegen/src/compile.rs

# Look for any existing visitor/predicate that checks generator.is_async specifically for Expr::Generator
rg -n "Expr::Generator\\(ast::ExprGenerator\\b[\\s\\S]*is_async" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 5880


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all uses/mentions of `is_async` in the codebase (should be limited).
rg -n --hidden --no-ignore-vcs "\bis_async\b" crates/codegen/src/compile.rs crates/codegen/src -S

# Check if `statements_contain_async_comprehension` is the only place controlling JumpNoInterrupt
rg -n "statements_contain_async_comprehension|body_contains_async_comprehension|JumpNoInterrupt.*end" crates/codegen/src/compile.rs

# Inspect the generator-specific code around the compile.rs match arm (~11248)
sed -n '11230,11310p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 18681


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show more surrounding context for the match compilation block that uses body_contains_async_comprehension
sed -n '9010,9180p' crates/codegen/src/compile.rs

# Find any match-related tests mentioning genexpr or generator expressions
rg -n "test_.*match.*(genexpr|genexpr|Generator|generator expr|async genexpr|AsyncGenWrap)" crates/codegen/src/compile.rs

# Specifically search for "match" and "genexpr" in close proximity
rg -n "match[\\s\\S]{0,300}genexpr|genexpr[\\s\\S]{0,300}match" crates/codegen/src/compile.rs

# Search for expected JumpNoInterrupt behavior in tests (esp around match tests)
rg -n "JumpNoInterrupt\\s*\\{[^}]*end\\s*\\}|body_contains_async_comprehension|statements_contain_async_comprehension" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 7128


Detect async generator expressions in statements_contain_async_comprehension

statements_contain_async_comprehension only checks ast::Expr::ListComp / ast::Expr::SetComp / ast::Expr::DictComp generators for generator.is_async, but async generator expressions are represented as ast::Expr::Generator(ast::ExprGenerator { generators, .. }) and can also have generators[*].is_async: true. Since this predicate gates whether compile_match_inner emits PseudoInstruction::JumpNoInterrupt for match-arm bodies, async genexps could be missed and make match codegen diverge from CPython. Extend the visitor to include ast::Expr::Generator (or add a clarifying comment if the omission is intentional).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/codegen/src/compile.rs` around lines 879 - 913, The visitor in
statements_contain_async_comprehension misses generator expressions: update
AsyncComprehensionVisitor::visit_expr to also match
ast::Expr::Generator(ast::ExprGenerator { generators, .. }) and check
generators.iter().any(|g| g.is_async) so async generator expressions set
found=true; this ensures the predicate used by compile_match_inner (which
controls emitting PseudoInstruction::JumpNoInterrupt) detects async genexps the
same way as ListComp/SetComp/DictComp. If omission was intentional, add a
clarifying comment near statements_contain_async_comprehension explaining why
ast::Expr::Generator is excluded.

@github-actions
Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/pyclbr.py
[ ] test: cpython/Lib/test/test_pyclbr.py (TODO: 1)

dependencies:

  • pyclbr

dependent tests: (1 tests)

  • pyclbr: test_pyclbr

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
crates/codegen/src/ir.rs (1)

8796-8801: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the first non-empty chain block in this for-break guard.

chain_start is still the raw layout successor here. If there’s an empty label block before the real PopTop; JUMP_* cleanup block, this guard misses the for-break shape and the reorder still fires.

Suggested fix
         if is_generic_false_path_reorder
             && jump_targets_for_iter(blocks, jump_block)
-            && is_for_break_cleanup_block(blocks, chain_start)
+            && is_for_break_cleanup_block(
+                blocks,
+                next_nonempty_block(blocks, chain_start),
+            )
         {
             current = next;
             continue;
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/codegen/src/ir.rs` around lines 8796 - 8801, The guard uses
chain_start (the raw layout successor) which can be an empty label block and
thus misses the for-break cleanup pattern; change the check so you first walk
from chain_start to the first non-empty chain block (skip empty/label-only
blocks) and pass that resolved block into is_for_break_cleanup_block(blocks,
resolved_chain_start) (keeping the surrounding conditions using
is_generic_false_path_reorder, jump_targets_for_iter(blocks, jump_block), and
the same current/next control flow), so the for-break shape is detected even
when there's an intervening empty label block.
crates/codegen/src/compile.rs (1)

895-903: ⚠️ Potential issue | 🟡 Minor

Handle async generator expressions in this visitor too.

This still skips ast::Expr::Generator, so a match arm body containing an async genexpr can leave body_contains_async_comprehension false and take the plain Jump path instead of JumpNoInterrupt.

Proposed fix
                 match expr {
                     ast::Expr::ListComp(ast::ExprListComp { generators, .. })
                     | ast::Expr::SetComp(ast::ExprSetComp { generators, .. })
                     | ast::Expr::DictComp(ast::ExprDictComp { generators, .. })
+                    | ast::Expr::Generator(ast::ExprGenerator { generators, .. })
                         if generators.iter().any(|generator| generator.is_async) =>
                     {
                         self.found = true;
                     }
                     _ => ast::visitor::walk_expr(self, expr),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/codegen/src/compile.rs` around lines 895 - 903, The visitor currently
marks async comprehensions when matching ListComp/SetComp/DictComp but misses
ast::Expr::Generator, so add a match arm for ast::Expr::Generator (or include it
in the existing pattern) and set self.found = true when its generator(s) are
async; update the match in compile.rs that currently checks
generators.iter().any(|generator| generator.is_async) to also handle
ast::Expr::Generator so body_contains_async_comprehension becomes true and the
code chooses JumpNoInterrupt instead of Jump.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@crates/codegen/src/compile.rs`:
- Around line 895-903: The visitor currently marks async comprehensions when
matching ListComp/SetComp/DictComp but misses ast::Expr::Generator, so add a
match arm for ast::Expr::Generator (or include it in the existing pattern) and
set self.found = true when its generator(s) are async; update the match in
compile.rs that currently checks generators.iter().any(|generator|
generator.is_async) to also handle ast::Expr::Generator so
body_contains_async_comprehension becomes true and the code chooses
JumpNoInterrupt instead of Jump.

In `@crates/codegen/src/ir.rs`:
- Around line 8796-8801: The guard uses chain_start (the raw layout successor)
which can be an empty label block and thus misses the for-break cleanup pattern;
change the check so you first walk from chain_start to the first non-empty chain
block (skip empty/label-only blocks) and pass that resolved block into
is_for_break_cleanup_block(blocks, resolved_chain_start) (keeping the
surrounding conditions using is_generic_false_path_reorder,
jump_targets_for_iter(blocks, jump_block), and the same current/next control
flow), so the for-break shape is detected even when there's an intervening empty
label block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 45f2ac17-9cf9-4f7a-a1be-fa0cf60bf8ea

📥 Commits

Reviewing files that changed from the base of the PR and between a91674d and 4d19a8c.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_pyclbr.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/codegen/src/symboltable.rs
  • scripts/dis_dump.py

@youknowone youknowone merged commit 06f73f2 into RustPython:main May 21, 2026
27 checks passed
@youknowone youknowone deleted the bytecode-parity branch May 21, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant