Skip to content

Align compiler and AST behavior with CPython#8138

Draft
youknowone wants to merge 2 commits into
RustPython:mainfrom
youknowone:bytecode-parity
Draft

Align compiler and AST behavior with CPython#8138
youknowone wants to merge 2 commits into
RustPython:mainfrom
youknowone:bytecode-parity

Conversation

@youknowone

@youknowone youknowone commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

  • Align RustPython compiler/codegen behavior more closely with CPython, including bytecode layout, control-flow lowering, optimizer behavior, source locations, exception tables, and future flags.
  • Expand _ast, symtable, and compile-path handling so RustPython follows CPython semantics while using the Ruff parser AST surface.
  • Remove the local Ruff parser vendor fork and depend on upstream Ruff by pinned commit for reproducible builds.
  • Update compiler, opcode, and Python-level tests/snapshots for the new CPython-aligned behavior.

Summary by CodeRabbit

Release Notes

  • New Features
    • Extended parsing/compilation support for more __future__ flags, including improved future-annotation handling.
    • Added a warning for invalid control flow inside finally blocks during preprocessing.
    • Enhanced exec to accept closure objects.
  • Bug Fixes
    • Improved error reporting for compile/import/eval failures to raise the correct Python exceptions.
    • Strengthened structural pattern matching validation (duplicate attribute/key detection).
    • Fixed dict-comprehension un-parsing behavior and improved embedding-facing Py_CompileString handling.
  • Chores
    • Updated lint/spell configuration and refreshed tooling dependencies.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d8777e1a-93d9-4dfe-8a73-a748128bb155

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR updates compile-time plumbing, code generation, AST conversion and validation, runtime pattern handling, and builtin execution paths to align behavior with newer CPython-style semantics.

Changes

Compiler and AST parity

Layer / File(s) Summary
Tooling, C-API compile path, and compiler metadata
.cspell.json, .pre-commit-config.yaml, Cargo.toml, crates/capi/src/ceval.rs, crates/compiler-core/src/bytecode*.rs, crates/stdlib/src/_opcode.rs, crates/vm/Cargo.toml
Updates Ruff sourcing/configuration, adds thin-vec, extends C-ABI compile helpers and compiler flags, widens unpack arguments, and adjusts low-level compile/error conversion paths and opcode tests.
Codegen errors, preprocessing, and optimizer flow
crates/codegen/src/{error.rs,ir.rs,lib.rs,preprocess.rs,unparse.rs}
Adds public preprocessing APIs and AST side tables, revises error variants, changes future-feature and docstring preprocessing, updates constant folding and jump threading, and expands optimizer regression tests.
Runtime type flags and match execution
crates/vm/src/{builtins/type.rs,frame.rs,object/core.rs,stdlib/{_abc.rs,_symtable.rs}}
Caches inherited ABC collection flags on types, propagates them through registration, uses them in match operations, checks duplicate class attributes and mapping keys, and extends symbol-table scope typing.
AST core helpers and Python-side node classes
crates/vm/src/stdlib/_ast/{constant.rs,pyast.rs,python.rs,repr.rs,type_ignore.rs,type_parameters.rs,validate.rs,basic.rs,operator.rs,other.rs,parameter.rs,module.rs,node.rs,argument.rs}
Introduces public AST override storage and lookup paths, rewrites shared list and field extraction helpers, updates Python-side AST class behavior, and makes validation consume override-backed node data.
AST node conversion refactors
crates/vm/src/stdlib/_ast/{expression.rs,statement.rs,pattern.rs,string.rs,exception.rs,elif_else_clause.rs}
Refactors expression, statement, pattern, exception, f-string, and template-string conversions around range-aware constructors and public-node reconstruction paths.
Builtin compile, eval, and exec flow
crates/vm/src/stdlib/builtins.rs, crates/vm/src/{eval.rs,import.rs,exceptions.rs,sys.rs}
Reworks builtin compilation and execution to merge future flags, audit source compilation, decode byte inputs, validate closures, and adjust related runtime error conversion paths.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant builtins.rs
  participant _ast
  participant codegen
  participant VM

  Caller->>builtins.rs: compile/eval/exec input
  builtins.rs->>_ast: parse, preprocess, or validate AST/source
  _ast->>codegen: preprocess module and future flags
  codegen->>VM: compile code object with flags
  VM-->>builtins.rs: code object or exception
  builtins.rs-->>Caller: result or raised exception
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐇 I hopped through syntax, flags, and trees,
and tugged at bytecode in the breeze.
With AST burrows neat and deep,
the matchers now more closely keep.
A compile-time carrot, crisp and bright—
thump thump, the tests all passed tonight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title "Align compiler and AST behavior with CPython" accurately and clearly describes the primary objective of this comprehensive changeset, which updates RustPython's compiler pipeline and AST handling to match CPython semantics.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_cmd_line_script.py (TODO: 14)

dependencies:

dependent tests: (no tests depend on cmd_line_script)

[x] lib: cpython/Lib/typing.py
[ ] test: cpython/Lib/test/test_typing.py (TODO: 2)
[x] test: cpython/Lib/test/test_type_aliases.py
[x] test: cpython/Lib/test/test_type_annotations.py (TODO: 1)
[x] test: cpython/Lib/test/test_type_params.py (TODO: 1)
[x] test: cpython/Lib/test/test_genericalias.py

dependencies:

  • typing

dependent tests: (19 tests)

  • typing: test_annotationlib test_builtin test_copy test_enum test_fractions test_funcattrs test_functools test_genericalias test_grammar test_inspect test_isinstance test_patma test_peg_generator test_pydoc test_pyrepl test_type_aliases test_type_params test_types test_typing

[x] lib: cpython/Lib/future.py

dependencies:

  • future

dependent tests: (35 tests)

  • future: test_flufl test_future_stmt test_generator_stop test_pydoc test_zoneinfo
    • codeop: test_codeop
      • pdb: test_pdb
      • traceback: test_asyncio test_builtin test_code_module test_contextlib test_contextlib_async test_coroutines test_dictcomps test_exceptions test_http_cookiejar test_importlib test_iter test_listcomps test_pyexpat test_setcomps test_socket test_ssl test_subprocess test_sys test_threadedtempfile test_threading test_traceback test_unittest test_with test_zipimport
    • importlib.metadata: test_importlib
    • pydoc: test_enum
      • xmlrpc.server: test_docxmlrpc test_xmlrpc

[ ] test: cpython/Lib/test/test_patma.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on patma)

[x] test: cpython/Lib/test/test_named_expressions.py (TODO: 2)

dependencies:

dependent tests: (no tests depend on named_expressions)

[ ] test: cpython/Lib/test/test_syntax.py (TODO: 65)

dependencies:

dependent tests: (no tests depend on syntax)

[ ] test: cpython/Lib/test/test_pyrepl (TODO: 23)
[ ] test: cpython/Lib/test/test_repl.py

dependencies:

dependent tests: (no tests depend on pyrepl)

[ ] test: cpython/Lib/test/test_exceptions.py (TODO: 22)
[ ] test: cpython/Lib/test/test_baseexception.py
[x] test: cpython/Lib/test/test_except_star.py (TODO: 1)
[ ] test: cpython/Lib/test/test_exception_group.py (TODO: 3)
[x] test: cpython/Lib/test/test_exception_hierarchy.py (TODO: 2)
[x] test: cpython/Lib/test/test_exception_variations.py

dependencies:

dependent tests: (no tests depend on exception)

[x] test: cpython/Lib/test/test_compile.py
[x] test: cpython/Lib/test/test_compiler_assemble.py
[x] test: cpython/Lib/test/test_compiler_codegen.py
[x] test: cpython/Lib/test/test_peepholer.py (TODO: 3)

dependencies:

dependent tests: (no tests depend on compile)

[x] test: cpython/Lib/test/test_unpack.py
[ ] test: cpython/Lib/test/test_unpack_ex.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on unpack)

[x] lib: cpython/Lib/ast.py
[x] lib: cpython/Lib/_ast_unparse.py
[x] test: cpython/Lib/test/test_unparse.py
[x] test: cpython/Lib/test/test_type_comments.py

dependencies:

  • ast

dependent tests: (148 tests)

  • ast: test_ast test_builtin test_compile test_compiler_codegen test_dis test_fstring test_future_stmt test_peepholer test_peg_generator test_site test_ssl test_type_comments test_ucn test_unparse
    • annotationlib: test_annotationlib test_functools test_grammar test_inspect test_reprlib test_type_annotations test_type_params test_typing
      • dataclasses: test__colorize test_copy test_ctypes test_enum test_genericalias test_patma test_pprint test_pydoc test_regrtest test_zoneinfo
      • inspect: test_abc test_argparse test_asyncgen test_buffer test_clinic test_code test_collections test_coroutines test_decimal test_generators test_monitoring test_ntpath test_operator test_posixpath test_signal test_sqlite3 test_traceback test_turtle test_types test_unittest test_yield_from test_zipimport test_zipimport_support
    • dbm.dumb: test_dbm_dumb
    • inspect:
      • bdb: test_bdb test_pdb
      • cmd: test_cmd
      • importlib.metadata: test_importlib
      • pkgutil: test_pkgutil test_pyrepl test_runpy
      • rlcompleter: test_pyrepl test_rlcompleter
      • trace: test_trace
      • xmlrpc.server: test_docxmlrpc test_xmlrpc
    • pyclbr: test_pyclbr
    • traceback: test_asyncio test_code_module test_contextlib test_contextlib_async test_dictcomps test_exceptions test_http_cookiejar test_importlib test_iter test_listcomps test_pyexpat test_setcomps test_socket test_subprocess test_sys test_threadedtempfile test_threading test_unittest test_with
      • concurrent.futures.process: test_compileall test_concurrent_futures
      • http.cookiejar: test_urllib2
      • logging: test_asyncio test_hashlib test_logging test_support test_urllib2net
      • multiprocessing: test_asyncio test_concurrent_futures test_fcntl test_memoryview test_multiprocessing_main_handling test_re
      • py_compile: test_cmd_line_script test_importlib test_modulefinder test_py_compile
      • socketserver: test_imaplib test_socketserver test_wsgiref
      • threading: test_android test_asyncio test_bytes test_bz2 test_concurrent_futures test_context test_ctypes test_email test_fork1 test_frame test_ftplib test_gc test_httplib test_httpservers test_importlib test_io test_ioctl test_itertools test_largefile test_linecache test_opcache test_pathlib test_poll test_poplib test_pyrepl test_queue test_robotparser test_sched test_smtplib test_super test_syslog test_termios test_threading_local test_time test_urllib2_localnet test_weakref test_winreg test_zstd
      • timeit: test_timeit

[x] test: cpython/Lib/test/test_builtin.py (TODO: 15)

dependencies:

dependent tests: (no tests depend on builtin)

[x] test: cpython/Lib/test/test_str.py (TODO: 6)
[x] test: cpython/Lib/test/test_fstring.py (TODO: 14)
[x] test: cpython/Lib/test/test_string_literals.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on str)

[x] test: cpython/Lib/test/test_list.py (TODO: 4)
[ ] test: cpython/Lib/test/test_listcomps.py (TODO: 4)
[ ] test: cpython/Lib/test/test_userlist.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on list)

[x] lib: cpython/Lib/symtable.py
[x] test: cpython/Lib/test/test_symtable.py (TODO: 14)

dependencies:

  • symtable

dependent tests: (2 tests)

  • symtable: test_inspect test_symtable

[ ] test: cpython/Lib/test/test_pep646_syntax.py (TODO: 2)

dependencies:

dependent tests: (no tests depend on pep646_syntax)

[x] lib: cpython/Lib/pdb.py
[ ] test: cpython/Lib/test/test_pdb.py (TODO: 47)

dependencies:

  • pdb

dependent tests: (1 tests)

  • pdb: test_pdb

[ ] test: cpython/Lib/test/test_grammar.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on grammar)

[ ] test: cpython/Lib/test/test_generators.py (TODO: 11)
[ ] test: cpython/Lib/test/test_genexps.py (TODO: 4)
[x] test: cpython/Lib/test/test_generator_stop.py
[x] test: cpython/Lib/test/test_yield_from.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on generator)

[x] lib: cpython/Lib/codeop.py
[x] test: cpython/Lib/test/test_codeop.py (TODO: 2)

dependencies:

  • codeop

dependent tests: (103 tests)

  • codeop: test_codeop
    • code:
      • pdb: test_pdb
      • sqlite3.main: test_sqlite3
    • traceback: test_asyncio test_builtin test_code_module test_contextlib test_contextlib_async test_coroutines test_dictcomps test_exceptions test_http_cookiejar test_importlib test_iter test_listcomps test_pyexpat test_setcomps test_socket test_ssl test_subprocess test_sys test_threadedtempfile test_threading test_traceback test_unittest test_with test_zipimport
      • concurrent.futures.process: test_compileall test_concurrent_futures
      • http.cookiejar: test_urllib2
      • logging: test_asyncio test_decimal test_genericalias test_hashlib test_logging test_pkgutil test_support test_unittest test_urllib2net
      • multiprocessing: test_asyncio test_concurrent_futures test_fcntl test_memoryview test_multiprocessing_main_handling test_re
      • py_compile: test_argparse test_cmd_line_script test_importlib test_modulefinder test_py_compile test_pydoc test_runpy
      • pydoc: test_enum
      • site: test_site
      • socketserver: test_imaplib test_socketserver test_wsgiref
      • threading: test_android test_asyncio test_bytes test_bz2 test_code test_concurrent_futures test_context test_ctypes test_docxmlrpc test_email test_fork1 test_frame test_ftplib test_functools test_gc test_httplib test_httpservers test_importlib test_inspect test_io test_ioctl test_itertools test_largefile test_linecache test_opcache test_pathlib test_poll test_poplib test_pyrepl test_queue test_robotparser test_sched test_signal test_smtplib test_sqlite3 test_super test_syslog test_termios test_threading_local test_time test_urllib2_localnet test_weakref test_winreg test_xmlrpc test_zstd
      • timeit: test_timeit

[x] lib: cpython/Lib/pydoc.py
[x] lib: cpython/Lib/pydoc_data
[ ] test: cpython/Lib/test/test_pydoc (TODO: 34)

dependencies:

  • pydoc

dependent tests: (5 tests)

  • pydoc: test_enum test_pydoc
    • pdb: test_pdb
    • xmlrpc.server: test_docxmlrpc test_xmlrpc

[x] test: cpython/Lib/test/test_global.py

dependencies:

dependent tests: (no tests depend on global)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone changed the title Bytecode parity Align compiler and AST behavior with CPython Jun 22, 2026
Comment thread crates/capi/src/ceval.rs Outdated
}

#[unsafe(no_mangle)]
pub unsafe extern "C" fn Py_CompileStringFlags(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_CompileStringFlags is not part of the stable ABI.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also reverted more misoperations on ceval.rs

Comment thread crates/capi/src/ceval.rs Outdated
vm.compile(code, mode, filename)
.map_err(|err| vm.new_syntax_error(&err, Some(code)))
#[unsafe(no_mangle)]
pub unsafe extern "C" fn Py_CompileStringExFlags(

This comment was marked as resolved.

@youknowone youknowone force-pushed the bytecode-parity branch 2 times, most recently from 1ecac1d to c4bd2c9 Compare June 22, 2026 09:35
@youknowone youknowone marked this pull request as ready for review June 22, 2026 09:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/_ast/repr.rs (1)

82-86: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve the comma for one-element tuple fields.

repr_ast_tuple now renders a single-item tuple as (value), which is no longer a tuple representation. AST reprs containing tuple-valued fields should keep (value,).

🐛 Proposed fix
     } else if parts.len() > 1 {
         rendered.push_wtf8(", ".as_ref());
         rendered.push_wtf8(&parts[1]);
+    } else if items.len() == 1 {
+        rendered.push_wtf8(",".as_ref());
     }
     rendered.push_wtf8(")".as_ref());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_ast/repr.rs` around lines 82 - 86, The repr_ast_tuple
function currently does not preserve the trailing comma for single-element
tuples, causing them to render as `(value)` instead of the correct `(value,)`.
Add logic to handle the case when parts.len() equals 1, where you should push
the first element from parts[0] followed by a comma before the closing
parenthesis is added. The existing condition checking parts.len() > 1 should
remain unchanged, but you need to add an else-if branch or similar handling to
address the single-element case explicitly.
🧹 Nitpick comments (3)
crates/vm/src/stdlib/_ast/operator.rs (1)

19-30: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consolidate singleton-node dispatch into one shared matcher.

These four functions duplicate the same “scan node type candidates, return enum variant, else type error” logic. A shared helper + per-enum table will keep mappings centralized and easier to audit.

♻️ Suggested refactor shape
+fn match_singleton_node<T: Copy>(
+    vm: &VirtualMachine,
+    object: &PyObjectRef,
+    expected: &str,
+    variants: &[(&'static Py<PyType>, T)],
+) -> PyResult<T> {
+    for (node_type, value) in variants {
+        if is_node_instance(vm, object, *node_type)? {
+            return Ok(*value);
+        }
+    }
+    Err(vm.new_type_error(format!(
+        "expected some sort of {expected}, but got {}",
+        object.repr(vm)?
+    )))
+}
+
     fn ast_from_object(
         vm: &VirtualMachine,
         _source_file: &SourceFile,
         object: PyObjectRef,
     ) -> PyResult<Self> {
-        Ok(
-            if is_node_instance(vm, &object, pyast::NodeOperatorAdd::static_type())? {
-                Self::Add
-            } else if ...
-        )
+        match_singleton_node(
+            vm,
+            &object,
+            "operator",
+            &[
+                (pyast::NodeOperatorAdd::static_type(), Self::Add),
+                (pyast::NodeOperatorSub::static_type(), Self::Sub),
+                // ...
+                (pyast::NodeOperatorFloorDiv::static_type(), Self::FloorDiv),
+            ],
+        )
     }

As per coding guidelines, “When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code.”

Also applies to: 60-93, 114-129, 156-183

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_ast/operator.rs` around lines 19 - 30, The boolop
dispatch logic in this function is duplicated across four functions with only
differing node type checks and enum variant returns. Extract a shared helper
function that takes a mapping or table of node type candidates paired with their
corresponding enum variants, then performs the common dispatch pattern (check
each node type and return the matching variant, or throw a type error if none
match). Replace the duplicate dispatch code in all four functions (including the
blocks at lines 60-93, 114-129, and 156-183) with calls to this shared helper,
passing only the differing node-type-to-variant mappings.

Source: Coding guidelines

crates/vm/src/stdlib/_ast/module.rs (1)

43-67: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Extract Mod node-kind detection into a shared matcher helper.

The dispatch in Line 43-Line 67 repeats the same branch shape; extracting the detected kind first will keep conversion logic single-path and reduce drift with similar AST dispatchers.

♻️ Suggested refactor shape
+enum ModKind {
+    Module,
+    Interactive,
+    Expression,
+    FunctionType,
+}
+
+fn mod_kind_from_object(vm: &VirtualMachine, object: &PyObjectRef) -> PyResult<ModKind> {
+    if object.is_instance(pyast::NodeModModule::static_type().as_object(), vm)? {
+        Ok(ModKind::Module)
+    } else if object.is_instance(pyast::NodeModInteractive::static_type().as_object(), vm)? {
+        Ok(ModKind::Interactive)
+    } else if object.is_instance(pyast::NodeModExpression::static_type().as_object(), vm)? {
+        Ok(ModKind::Expression)
+    } else if object.is_instance(pyast::NodeModFunctionType::static_type().as_object(), vm)? {
+        Ok(ModKind::FunctionType)
+    } else {
+        Err(vm.new_type_error(format!(
+            "expected some sort of mod, but got {}",
+            object.repr(vm)?
+        )))
+    }
+}
+
     fn ast_from_object(
         vm: &VirtualMachine,
         source_file: &SourceFile,
         object: PyObjectRef,
     ) -> PyResult<Self> {
-        Ok(
-            if object.is_instance(pyast::NodeModModule::static_type().as_object(), vm)? {
-                Self::Module(ModModule::ast_from_object(vm, source_file, object)?)
-            } else if object
-                .is_instance(pyast::NodeModInteractive::static_type().as_object(), vm)?
-            {
-                Self::Interactive(ModInteractive::ast_from_object(vm, source_file, object)?)
-            } else if object.is_instance(pyast::NodeModExpression::static_type().as_object(), vm)? {
-                Self::Expression(ast::ModExpression::ast_from_object(
-                    vm,
-                    source_file,
-                    object,
-                )?)
-            } else if object
-                .is_instance(pyast::NodeModFunctionType::static_type().as_object(), vm)?
-            {
-                Self::FunctionType(ModFunctionType::ast_from_object(vm, source_file, object)?)
-            } else {
-                return Err(vm.new_type_error(format!(
-                    "expected some sort of mod, but got {}",
-                    object.repr(vm)?
-                )));
-            },
-        )
+        match mod_kind_from_object(vm, &object)? {
+            ModKind::Module => Ok(Self::Module(ModModule::ast_from_object(vm, source_file, object)?)),
+            ModKind::Interactive => Ok(Self::Interactive(ModInteractive::ast_from_object(vm, source_file, object)?)),
+            ModKind::Expression => Ok(Self::Expression(ast::ModExpression::ast_from_object(vm, source_file, object)?)),
+            ModKind::FunctionType => Ok(Self::FunctionType(ModFunctionType::ast_from_object(vm, source_file, object)?)),
+        }
     }

As per coding guidelines, “When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_ast/module.rs` around lines 43 - 67, The dispatch logic
in the conversion function for Mod contains repeated branches that all follow
the same pattern: checking is_instance for different node types and then calling
ast_from_object. Extract the detection logic by first identifying which mod kind
the object is (ModModule, ModInteractive, ModExpression, or ModFunctionType)
into a separate helper or pattern match, then perform the ast_from_object
conversion call once using the detected kind rather than repeating the
conversion call in each branch. This eliminates duplicate code where the only
difference between branches is the specific node type being checked.

Source: Coding guidelines

crates/vm/src/stdlib/_ast/string.rs (1)

730-730: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Replace the decorative section separator.

This adds a // ===== ... ===== separator, which the repository guidelines explicitly disallow. Use a plain short comment or remove it.

Suggested fix
-// ===== TString (Template String) Support =====
+// Template string support.

As per coding guidelines, “Do not add decorative section separators (e.g. // -----------, // ===, /* *** */). Use /// doc-comments or short // comments only when they add value.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_ast/string.rs` at line 730, The decorative section
separator comment using the `// ===== ... =====` pattern violates repository
coding guidelines that disallow such decorative separators. Replace the
separator comment at the TString (Template String) Support section with either a
plain short comment like `// TString (Template String) Support` if the section
heading adds value, or remove it entirely if the context is clear from the
surrounding code.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/codegen/src/ir.rs`:
- Around line 3601-3608: The pattern matching for is_copy_of_load_const only
checks for Instruction::LoadConst, but line 3595 can canonicalize a LOAD_CONST
integer to LOAD_SMALL_INT before curr is captured, causing valid COPY patterns
after LOAD_SMALL_INT to not be recognized. Update the matches! condition for
is_copy_of_load_const to also handle Instruction::LoadSmallInt in addition to
Instruction::LoadConst so that patterns like LOAD_SMALL_INT followed by COPY are
properly tracked and can be fused with subsequent operations like TO_BOOL.

In `@crates/vm/src/frame.rs`:
- Around line 3198-3200: The current `seen_keys: Vec<PyObjectRef>` approach uses
linear scanning with `vm.bool_eq` which changes observable behavior by calling
`__eq__` for keys CPython would separate by hash and allows unhashable keys to
reach custom implementations. Replace this vector-based tracking with a Python
set-backed collection that respects proper hash and equality semantics. Extract
the duplicate key checking logic into a single shared function that works across
both lookup paths instead of duplicating the check logic in the branches around
lines 3217-3230 and 3247-3260, following the guideline to extract differing
values first and call common logic once.
- Line 3056: The code uses Vec::<String>::new() to track seen attributes and
calls expect_str() on user-defined __match_args__, which panics on strings with
lone surrogates. Change seen_attrs from Vec::<String> to Vec::<PyObjectRef> to
store the actual Python objects instead of converting to strings. In both the
positional attribute tracking section (around lines 3103-3116) and keyword
attribute tracking section (around lines 3159-3172), replace direct string
equality comparisons with vm.bool_eq() calls to safely compare the PyObjectRef
values. Use .repr() on PyObjectRef objects when constructing error messages
instead of trying to extract string values. This preserves the original Python
objects and avoids panicking on surrogates.

In `@crates/vm/src/stdlib/_abc.rs`:
- Around line 244-250: The collection_flags calculation only reads from
cls_type.slots.flags but ignores the abc_tpflags field, which stores virtual
ABC-derived sequence and mapping markers. This causes transitive virtual
registrations to be lost during propagation. Modify the collection_flags
assignment to include flags from both cls_type.slots.flags and
cls_type.abc_tpflags by combining them appropriately before passing to
set_abc_collection_flags_recursive, ensuring that virtual ABC registrations are
preserved when propagating collection flags.

In `@crates/vm/src/stdlib/_ast.rs`:
- Around line 337-347: The error message in the node_object_to_ast_string
function on line 345 only mentions str as an acceptable type, but the function
actually accepts both str and bytes as shown in the type check on line 342.
Update the error message passed to vm.new_type_error to accurately reflect that
both str and bytes types are accepted, so users receive an accurate description
of what types are allowed.

In `@crates/vm/src/stdlib/_ast/constant.rs`:
- Around line 1124-1144: The first_invalid_constant_type function recursively
scans nested tuples and frozensets without checking the VM's recursion limit,
which can cause a Rust stack overflow on deeply nested structures. Guard the
recursive calls to first_invalid_constant_type_opt with vm.with_recursion to
enforce the VM's recursion limit before recursing into items in both the tuple
iteration block (where items are processed from tuple.iter()) and the frozenset
iteration block (where items are processed from set.elements()). Apply the same
recursion guard pattern to any other similar recursive scans referenced around
lines 1209-1216.

In `@crates/vm/src/stdlib/_ast/pyast.rs`:
- Around line 91-109: The base AST classes generated by this macro (such as
stmt, expr, and excepthandler) are missing initialization of _field_types and
__annotations__ attributes, which are needed for CPython-compatible
introspection metadata. In the same macro branch where _fields is being set to
ctx.empty_tuple, add initialization for _field_types and __annotations__ as
empty dictionaries using appropriate ctx methods. This should be done before or
after the _fields initialization to ensure all base classes maintain proper
metadata structure.

In `@crates/vm/src/stdlib/_ast/python.rs`:
- Around line 478-480: In the branch where
ftype.fast_isinstance(vm.ctx.types.union_type) evaluates to true (handling
optional T | None fields), instead of leaving the optional field unset with just
a comment, assign a default value of None to the field. This ensures that
optional attributes always exist on AST nodes and prevents missing field errors
in constructors, __replace__ methods, and repr/validation operations that expect
the field to be present.

In `@crates/vm/src/stdlib/_ast/statement.rs`:
- Line 104: The issue is that stmt_range_from_object is being called on an
object before validating it is actually a statement type, causing location field
errors to appear before the type-check error. Move the range extraction call
from before the match statement into each individual successful match branch
(for each specific statement type being handled), and leave the final else
clause to report the "expected some sort of stmt" error without requiring
location fields to be present on invalid node types.
- Around line 29-42: In the function definition_range_from_name, replace the
incorrect keyword_len arithmetic calculation with a search-based approach.
Instead of using checked_sub with the keyword_len parameter to compute
keyword_start, use rfind to search backwards from name_start for the actual
keyword position. The current implementation with checked_sub and keyword_len is
fragile and produces correct results only by accident through the
max(line_start) clamping. You will need to pass the keyword string as a
parameter to the function so rfind can locate its actual position in the source
code.

In `@crates/vm/src/stdlib/_ast/string.rs`:
- Around line 913-940: The interpolation_to_expr function currently loses the
original location information by setting both the ast::TString range and
ast::ExprTString range to TextRange::default(). Capture the range from the
interpolation parameter before it is moved into TemplateStrPart::Interpolation,
then use this captured range to populate the range field for both the
ast::TString struct and the ast::ExprTString struct instead of using
TextRange::default().

In `@crates/vm/src/stdlib/_ast/type_parameters.rs`:
- Around line 87-118: The function currently calls type_param_range_from_object
on the object before validating that it is actually a supported type_param node
kind, which can cause it to fail while reading location fields instead of
returning the intended error message. Restructure the logic to first classify
the object by checking if it is an instance of TypeVar, ParamSpec, or
TypeVarTuple using is_node_instance, and only after confirming it is a valid
type should type_param_range_from_object be called to extract the range. If none
of the three type checks match, immediately return the type_error before
attempting to access any fields on the invalid object.

In `@crates/vm/src/stdlib/builtins.rs`:
- Around line 374-377: The constant CODEGEN_NOT_SUPPORTED is defined only under
the feature guard #[cfg(not(feature = "rustpython-compiler"))] but is being used
under #[cfg(not(feature = "compiler"))] in the code blocks at lines 374-377 and
397-400. Since "compiler" is a feature alias that depends on
"rustpython-compiler", these conditions are not equivalent and the constant may
be unavailable, causing a compilation error. Fix this by making the feature
guard consistent for the CODEGEN_NOT_SUPPORTED constant definition and its usage
locations—either move the constant definition to be guarded by #[cfg(not(feature
= "compiler"))] or change all usage sites to reference #[cfg(not(feature =
"rustpython-compiler"))] instead, ensuring the same feature flag is used
throughout.

---

Outside diff comments:
In `@crates/vm/src/stdlib/_ast/repr.rs`:
- Around line 82-86: The repr_ast_tuple function currently does not preserve the
trailing comma for single-element tuples, causing them to render as `(value)`
instead of the correct `(value,)`. Add logic to handle the case when parts.len()
equals 1, where you should push the first element from parts[0] followed by a
comma before the closing parenthesis is added. The existing condition checking
parts.len() > 1 should remain unchanged, but you need to add an else-if branch
or similar handling to address the single-element case explicitly.

---

Nitpick comments:
In `@crates/vm/src/stdlib/_ast/module.rs`:
- Around line 43-67: The dispatch logic in the conversion function for Mod
contains repeated branches that all follow the same pattern: checking
is_instance for different node types and then calling ast_from_object. Extract
the detection logic by first identifying which mod kind the object is
(ModModule, ModInteractive, ModExpression, or ModFunctionType) into a separate
helper or pattern match, then perform the ast_from_object conversion call once
using the detected kind rather than repeating the conversion call in each
branch. This eliminates duplicate code where the only difference between
branches is the specific node type being checked.

In `@crates/vm/src/stdlib/_ast/operator.rs`:
- Around line 19-30: The boolop dispatch logic in this function is duplicated
across four functions with only differing node type checks and enum variant
returns. Extract a shared helper function that takes a mapping or table of node
type candidates paired with their corresponding enum variants, then performs the
common dispatch pattern (check each node type and return the matching variant,
or throw a type error if none match). Replace the duplicate dispatch code in all
four functions (including the blocks at lines 60-93, 114-129, and 156-183) with
calls to this shared helper, passing only the differing node-type-to-variant
mappings.

In `@crates/vm/src/stdlib/_ast/string.rs`:
- Line 730: The decorative section separator comment using the `// ===== ...
=====` pattern violates repository coding guidelines that disallow such
decorative separators. Replace the separator comment at the TString (Template
String) Support section with either a plain short comment like `// TString
(Template String) Support` if the section heading adds value, or remove it
entirely if the context is clear from the surrounding code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: e18de394-1f13-4b89-b45b-7957a283e3cb

📥 Commits

Reviewing files that changed from the base of the PR and between 55b96f9 and c4bd2c9.

⛔ Files ignored due to path filters (13)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/test/test_compile.py is excluded by !Lib/**
  • Lib/test/test_grammar.py is excluded by !Lib/**
  • Lib/test/test_patma.py is excluded by !Lib/**
  • Lib/test/test_syntax.py is excluded by !Lib/**
  • Lib/test/test_type_comments.py is excluded by !Lib/**
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__bare_function_annotations_check_attribute_and_subscript_expressions.snap is excluded by !**/*.snap
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__const_no_op.snap is excluded by !**/*.snap
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__constant_true_if_pass_keeps_line_anchor_nop.snap is excluded by !**/*.snap
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__if_ands.snap is excluded by !**/*.snap
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__if_mixed.snap is excluded by !**/*.snap
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__if_ors.snap is excluded by !**/*.snap
  • crates/stdlib/src/snapshots/rustpython_stdlib___opcode__tests__nested_bool_op.snap is excluded by !**/*.snap
📒 Files selected for processing (62)
  • .cspell.json
  • .pre-commit-config.yaml
  • Cargo.toml
  • crates/capi/src/ceval.rs
  • crates/codegen/src/compile.rs
  • crates/codegen/src/error.rs
  • crates/codegen/src/ir.rs
  • crates/codegen/src/lib.rs
  • crates/codegen/src/preprocess.rs
  • crates/codegen/src/symboltable.rs
  • crates/codegen/src/unparse.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/bytecode/instruction.rs
  • crates/compiler-core/src/bytecode/oparg.rs
  • crates/compiler/src/lib.rs
  • crates/stdlib/src/_opcode.rs
  • crates/vm/Cargo.toml
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/eval.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/import.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/stdlib/_abc.rs
  • crates/vm/src/stdlib/_ast.rs
  • crates/vm/src/stdlib/_ast/argument.rs
  • crates/vm/src/stdlib/_ast/basic.rs
  • crates/vm/src/stdlib/_ast/constant.rs
  • crates/vm/src/stdlib/_ast/elif_else_clause.rs
  • crates/vm/src/stdlib/_ast/exception.rs
  • crates/vm/src/stdlib/_ast/expression.rs
  • crates/vm/src/stdlib/_ast/module.rs
  • crates/vm/src/stdlib/_ast/node.rs
  • crates/vm/src/stdlib/_ast/operator.rs
  • crates/vm/src/stdlib/_ast/other.rs
  • crates/vm/src/stdlib/_ast/parameter.rs
  • crates/vm/src/stdlib/_ast/pattern.rs
  • crates/vm/src/stdlib/_ast/pyast.rs
  • crates/vm/src/stdlib/_ast/python.rs
  • crates/vm/src/stdlib/_ast/repr.rs
  • crates/vm/src/stdlib/_ast/statement.rs
  • crates/vm/src/stdlib/_ast/string.rs
  • crates/vm/src/stdlib/_ast/type_ignore.rs
  • crates/vm/src/stdlib/_ast/type_parameters.rs
  • crates/vm/src/stdlib/_ast/validate.rs
  • crates/vm/src/stdlib/_symtable.rs
  • crates/vm/src/stdlib/builtins.rs
  • crates/vm/src/stdlib/sys.rs
  • crates/vm/src/vm/compile.rs
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/python_run.rs
  • crates/vm/src/vm/vm_new.rs
  • crates/wasm/Cargo.toml
  • crates/wasm/src/lib.rs
  • crates/wasm/src/vm_class.rs
  • examples/hello_embed.rs
  • examples/mini_repl.rs
  • examples/parse_folder.rs
  • extra_tests/snippets/builtin_compile.py
  • src/shell.rs
  • tools/opcode_metadata/generate_rs_opcode_metadata.py

Comment thread crates/codegen/src/ir.rs
Comment thread crates/vm/src/frame.rs Outdated
Comment thread crates/vm/src/frame.rs Outdated
Comment thread crates/vm/src/stdlib/_abc.rs
Comment on lines +337 to +347
pub(super) fn node_object_to_ast_string(
vm: &VirtualMachine,
obj: PyObjectRef,
) -> PyResult<PyObjectRef> {
let cls = obj.class();
if cls.is(vm.ctx.types.str_type) || cls.is(vm.ctx.types.bytes_type) {
Ok(obj)
} else {
Err(vm.new_type_error("AST string must be of type str"))
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Error message does not reflect that bytes is also accepted.

The function accepts both str and bytes types (line 342), but the error message on line 345 only mentions str. This could confuse users who encounter the error after passing a different type.

Suggested fix
     } else {
-        Err(vm.new_type_error("AST string must be of type str"))
+        Err(vm.new_type_error("AST string must be of type str or bytes"))
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub(super) fn node_object_to_ast_string(
vm: &VirtualMachine,
obj: PyObjectRef,
) -> PyResult<PyObjectRef> {
let cls = obj.class();
if cls.is(vm.ctx.types.str_type) || cls.is(vm.ctx.types.bytes_type) {
Ok(obj)
} else {
Err(vm.new_type_error("AST string must be of type str"))
}
}
pub(super) fn node_object_to_ast_string(
vm: &VirtualMachine,
obj: PyObjectRef,
) -> PyResult<PyObjectRef> {
let cls = obj.class();
if cls.is(vm.ctx.types.str_type) || cls.is(vm.ctx.types.bytes_type) {
Ok(obj)
} else {
Err(vm.new_type_error("AST string must be of type str or bytes"))
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_ast.rs` around lines 337 - 347, The error message in
the node_object_to_ast_string function on line 345 only mentions str as an
acceptable type, but the function actually accepts both str and bytes as shown
in the type check on line 342. Update the error message passed to
vm.new_type_error to accurately reflect that both str and bytes types are
accepted, so users receive an accurate description of what types are allowed.

Comment thread crates/vm/src/stdlib/_ast/statement.rs
Comment thread crates/vm/src/stdlib/_ast/statement.rs
Comment thread crates/vm/src/stdlib/_ast/string.rs
Comment thread crates/vm/src/stdlib/_ast/type_parameters.rs Outdated
Comment thread crates/vm/src/stdlib/builtins.rs Outdated
@youknowone youknowone force-pushed the bytecode-parity branch 3 times, most recently from 754a4a4 to cfc9db9 Compare June 22, 2026 13:27
@youknowone youknowone marked this pull request as draft June 22, 2026 13:38
// Invalidate negative cache
increment_invalidation_counter();

if let Some(cls_type) = cls.downcast_ref::<PyType>()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if this have CPython reference

Comment thread Cargo.toml
rustpython-wtf8 = { path = "crates/wtf8", version = "0.5.0" }
rustpython-doc = { path = "crates/doc", version = "0.5.0" }

# Use RustPython-packaged Ruff crates from the published fork while keeping

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimize diff and keep rustpython-ruff_python_parser as commented

Comment thread examples/mini_repl.rs Outdated
@@ -65,8 +65,8 @@ def fib(n):
// this line also automatically prints the output
// (note that this is only the case when compiler::Mode::Single is passed to vm.compile)
match vm
.compile(&input, vm::compiler::Mode::Single, "<embedded>")
.map_err(|err| vm.new_syntax_error(&err, Some(&input)))
.compile(&input, vm::compiler::Mode::Single, "<embedded>".to_owned())

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why redundant .to_owned() is added?

Comment thread crates/vm/src/vm/vm_new.rs Outdated
syntax_error.as_object(), self, unwrap,
"_metadata" => metadata,
);
syntax_error

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep set_attrs! usage

}
}

impl<T: Node> Node for ThinVec<T> {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we added ThinVec support, does we still need Vec support?

impl Transformer for AstPreprocessor {
fn visit_stmt(&self, stmt: &mut ast::Stmt) {
match stmt {
ast::Stmt::FunctionDef(function) => {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to full-implement this match arms again?

@youknowone youknowone force-pushed the bytecode-parity branch 3 times, most recently from 3324654 to ba0113c Compare June 22, 2026 14:41
Update the compiler pipeline, AST module support, and related VM compile/eval paths toward CPython 3.14 behavior.

Use upstream Ruff parser crates directly and keep RustPython-specific syntax preflight handling in the compile path.

Refresh opcode metadata, snapshots, and targeted tests for the aligned bytecode and AST behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants