* Add CALL_ALLOC_AND_ENTER_INIT specialization
Optimizes user-defined class instantiation MyClass(args...)
when tp_new == object.__new__ and __init__ is a simple
PyFunction. Allocates the object directly and calls __init__
via invoke_exact_args, bypassing the generic type.__call__
dispatch path.
* Invalidate JIT cache when __code__ is reassigned
Change jitted_code from OnceCell to PyMutex<Option<CompiledCode>> so
it can be cleared on __code__ assignment. The setter now sets the
cached JIT code to None to prevent executing stale machine code.
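The invalidation described above can be sketched as follows. This is a minimal illustration, not the actual implementation: std::sync::Mutex stands in for PyMutex, and the Function struct, its fields other than jitted_code, and the jit/set_code/has_jitted_code methods are hypothetical names chosen for the example.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the real compiled-code type.
#[derive(Clone, Debug, PartialEq)]
struct CompiledCode(Vec<u8>);

// Hypothetical, heavily simplified function object.
struct Function {
    code: Mutex<String>, // stand-in for the __code__ object
    // A Mutex<Option<..>> can be reset to None; a OnceCell cannot be cleared
    // through a shared reference once it has been filled.
    jitted_code: Mutex<Option<CompiledCode>>,
}

impl Function {
    fn new(code: &str) -> Self {
        Self {
            code: Mutex::new(code.to_owned()),
            jitted_code: Mutex::new(None),
        }
    }

    // Return the cached machine code, "compiling" it on first use.
    fn jit(&self) -> CompiledCode {
        let mut slot = self.jitted_code.lock().unwrap();
        if let Some(cached) = slot.as_ref() {
            return cached.clone();
        }
        let source = self.code.lock().unwrap().clone();
        let compiled = CompiledCode(source.into_bytes());
        *slot = Some(compiled.clone());
        compiled
    }

    // Setter for __code__: swap the code, then drop the stale JIT result.
    fn set_code(&self, new_code: &str) {
        *self.code.lock().unwrap() = new_code.to_owned();
        *self.jitted_code.lock().unwrap() = None;
    }

    fn has_jitted_code(&self) -> bool {
        self.jitted_code.lock().unwrap().is_some()
    }
}
```

After set_code runs, the next jit call recompiles from the new code instead of returning machine code built from the old one.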
* Atomic operations for specialization cache
- range iterator: deduplicate fast_next/next_fast
- Replace raw pointer reads/writes in CodeUnits with atomic
operations (AtomicU8/AtomicU16) for thread safety
- Add read_op (Acquire), read_arg (Relaxed), compare_exchange_op
- Use Release ordering in replace_op to synchronize cache writes
- Dispatch loop reads opcodes atomically via read_op/read_arg
- Fix adaptive counter access: use read/write_adaptive_counter
instead of read/write_cache_u16, which read the wrong bytes
- Add pre-check guards to all specialize_* functions to prevent
concurrent specialization races
- Move modified() before attribute changes in type.__setattr__
to prevent use-after-free of cached descriptors
- Use SeqCst ordering in modified() for version invalidation
- Add Release fence after quicken() initialization
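A rough sketch of the accessor pattern listed above, assuming a simplified code unit made of two AtomicU8 fields. The CodeUnit layout and the method signatures are assumptions for illustration; the orderings on read_op, read_arg, and replace_op follow the list, while the AcqRel/Acquire pair on compare_exchange_op is a guess.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical, simplified code unit: one opcode byte and one argument byte.
struct CodeUnit {
    op: AtomicU8,
    arg: AtomicU8,
}

impl CodeUnit {
    fn new(op: u8, arg: u8) -> Self {
        Self {
            op: AtomicU8::new(op),
            arg: AtomicU8::new(arg),
        }
    }

    // Acquire: a reader that sees a specialized opcode also sees the cache
    // writes made before the matching Release store in replace_op.
    fn read_op(&self) -> u8 {
        self.op.load(Ordering::Acquire)
    }

    // Relaxed: the argument carries no synchronization duty of its own.
    fn read_arg(&self) -> u8 {
        self.arg.load(Ordering::Relaxed)
    }

    // Release: publishes earlier cache writes together with the new opcode.
    fn replace_op(&self, new_op: u8) {
        self.op.store(new_op, Ordering::Release);
    }

    // Swap the opcode only if no other thread changed it first.
    // The orderings here are an assumption, not taken from the commit.
    fn compare_exchange_op(&self, expected: u8, new_op: u8) -> bool {
        self.op
            .compare_exchange(expected, new_op, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}
```

With this shape, two threads racing to specialize the same instruction cannot both succeed: the second compare_exchange_op sees an unexpected opcode and returns false.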
* Fix slot wrapper override for inherited attributes
For __getattribute__: only use getattro_wrapper when the type
itself defines the attribute; otherwise inherit native slot from
base class via MRO.
For __setattr__/__delattr__: only store setattro_wrapper when
the type has its own __setattr__ or __delattr__; otherwise keep
the inherited base slot.
* Fix StoreAttrSlot cache overflow corrupting next instruction
write_cache_u32 at cache_base+3 writes 2 code units (positions 3 and 4),
but STORE_ATTR only has 4 cache entries (positions 0-3). This overwrites
the next instruction with the upper 16 bits of the slot offset.
Changed to write_cache_u16/read_cache_u16 since member descriptor offsets
fit within u16 (max 65535 bytes).
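The overflow can be reproduced on a toy model of the instruction stream. The sketch below assumes the cache is a slice of u16 code units and that a u32 is stored low half first; the function signatures and the demo_layout helper are hypothetical and only mirror the names used above.

```rust
// STORE_ATTR has 4 cache entries (positions 0-3), as stated above.
const STORE_ATTR_CACHE_ENTRIES: usize = 4;

// Hypothetical layout: 4 cache entries followed by the next instruction.
fn demo_layout() -> [u16; STORE_ATTR_CACHE_ENTRIES + 1] {
    [0, 0, 0, 0, 0xABCD]
}

// Writes exactly one code unit.
fn write_cache_u16(units: &mut [u16], index: usize, value: u16) {
    units[index] = value;
}

// Writes two code units: the low half at `index`, the high half at `index + 1`.
fn write_cache_u32(units: &mut [u16], index: usize, value: u32) {
    units[index] = value as u16;
    units[index + 1] = (value >> 16) as u16;
}
```

Calling write_cache_u32 at position 3 with a slot offset of 16 stores 16 in the last cache entry and 0 in position 4, which is the next instruction. write_cache_u16 at the same position leaves position 4 untouched.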
* Exclude method_descriptor from has_python_cmp check
has_python_cmp incorrectly treated method_descriptor objects as
Python-level comparison methods, causing the richcompare slot to use
wrapper dispatch instead of inheriting the native slot.
* Fix CompareOpFloat NaN handling
partial_cmp returns None for NaN comparisons. Because is_some_and maps
None to false, every NaN comparison evaluated to false, but NaN != x
should be true per IEEE 754 semantics.
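The difference is easy to see in isolation. The sketch below is not the CompareOpFloat handler itself: the CompareOp enum and the compare_buggy, compare_fixed, and matches_ordering helpers are hypothetical names used to contrast the two behaviours.

```rust
use std::cmp::Ordering;

// Hypothetical comparison-operator enum for the example.
#[derive(Clone, Copy)]
enum CompareOp {
    Lt,
    Le,
    Eq,
    Ne,
    Gt,
    Ge,
}

// Does an ordering between two comparable floats satisfy the operator?
fn matches_ordering(op: CompareOp, ord: Ordering) -> bool {
    match op {
        CompareOp::Lt => ord == Ordering::Less,
        CompareOp::Le => ord != Ordering::Greater,
        CompareOp::Eq => ord == Ordering::Equal,
        CompareOp::Ne => ord != Ordering::Equal,
        CompareOp::Gt => ord == Ordering::Greater,
        CompareOp::Ge => ord != Ordering::Less,
    }
}

// Old behaviour: None (a NaN operand) collapses to false for every operator.
fn compare_buggy(a: f64, b: f64, op: CompareOp) -> bool {
    a.partial_cmp(&b).is_some_and(|ord| matches_ordering(op, ord))
}

// Fixed behaviour: unordered operands are unequal, so only != is true.
fn compare_fixed(a: f64, b: f64, op: CompareOp) -> bool {
    match a.partial_cmp(&b) {
        Some(ord) => matches_ordering(op, ord),
        None => matches!(op, CompareOp::Ne),
    }
}
```

For ordinary operands both functions agree; they differ only when at least one operand is NaN and the operator is !=.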
* Fix invoke_exact_args borrow in CallAllocAndEnterInit
* Distinguish Python method vs not-found in slot MRO lookup
Change lookup_slot_in_mro to return a 3-state SlotLookupResult
enum (NativeSlot/PythonMethod/NotFound) instead of Option<T>.
Previously, both "found a Python-level method" and "found nothing"
returned None, causing incorrect slot inheritance. For example,
class Test(Mixin, TestCase) would inherit object.slot_init from
Mixin via inherit_from_mro instead of using init_wrapper to
dispatch TestCase.__init__.
Apply this fix consistently to all slot update sites:
update_main_slot!, update_sub_slot!, TpGetattro, TpSetattro,
TpDescrSet, TpHash, TpRichcompare, SqAssItem, MpAssSubscript.
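A minimal sketch of the three-state result, assuming a toy MRO in which each class either supplies a native slot, defines a Python-level method, or defines nothing. MroEntry, the string slot names, and this lookup_slot_in_mro signature are hypothetical; only the SlotLookupResult variants come from the description above.

```rust
#[derive(Debug, PartialEq)]
enum SlotLookupResult<T> {
    NativeSlot(T),
    PythonMethod,
    NotFound,
}

// Hypothetical per-class view of one attribute while walking the MRO.
#[derive(Clone, Copy)]
enum MroEntry {
    Native(&'static str), // class supplies a native slot function
    Python,               // class defines the method in Python
    Missing,              // class does not define the attribute
}

// The first class in the MRO that defines the attribute decides the outcome.
fn lookup_slot_in_mro(mro: &[MroEntry]) -> SlotLookupResult<&'static str> {
    for entry in mro {
        match *entry {
            MroEntry::Native(slot) => return SlotLookupResult::NativeSlot(slot),
            MroEntry::Python => return SlotLookupResult::PythonMethod,
            MroEntry::Missing => continue,
        }
    }
    SlotLookupResult::NotFound
}
```

With an Option return type, the PythonMethod and NotFound cases would both become None, and the caller could not tell whether to install a wrapper or fall back to inheritance.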
* Extract specialization helper functions to reduce boilerplate
- deoptimize() / deoptimize_at(): replace specialized op with base op
- adaptive(): decrement warmup counter or call specialize function
- commit_specialization(): replace op on success, backoff on failure
- execute_binary_op_int() / execute_binary_op_float(): typed binary ops
Removes 10 duplicate deoptimize_* functions, consolidates 13 adaptive
counter blocks, 6 binary op handlers, and 7 specialize tail patterns.
Also replaces inline deopt blocks in LoadAttr/StoreAttr handlers.
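The helper roles listed above might look roughly like this on a toy instruction. Everything here is an assumption made for illustration: the Instr struct, the opcode constants, the signatures, and in particular the doubling backoff, which the description does not specify.

```rust
// Hypothetical opcodes and instruction state.
const BASE_OP: u8 = 1;
const SPECIALIZED_OP: u8 = 2;

struct Instr {
    op: u8,
    counter: u16, // warmup counter
    backoff: u16, // delay before the next specialization attempt (assumed)
}

// Decrement the warmup counter, or attempt specialization once it hits zero.
fn adaptive(instr: &mut Instr, specialize: impl FnOnce(&mut Instr)) {
    if instr.counter > 0 {
        instr.counter -= 1;
    } else {
        specialize(instr);
    }
}

// Replace the op on success; on failure, back off before retrying.
// The doubling policy is an assumption, not taken from the commit.
fn commit_specialization(instr: &mut Instr, new_op: Option<u8>) {
    match new_op {
        Some(op) => instr.op = op,
        None => {
            instr.backoff = instr.backoff.saturating_mul(2).max(1);
            instr.counter = instr.backoff;
        }
    }
}

// Replace a specialized op with its base op.
fn deoptimize(instr: &mut Instr, base_op: u8) {
    instr.op = base_op;
}
```

Centralizing these three steps is what lets each handler shrink to a guard check plus a single helper call.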
* Improve specialization guards and fix mark_stacks
- CONTAINS_OP_SET: add frozenset support in handler and specialize
- TO_BOOL_ALWAYS_TRUE: cache type version instead of checking slots
- LOAD_GLOBAL_BUILTIN: cache builtins dict version alongside globals
- mark_stacks: deoptimize specialized opcodes for correct reachability
* Auto-format: cargo fmt --all
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>