Python 3.14 free-threaded support #2721

Open
greateggsgreg wants to merge 15 commits into pythonnet:master from greateggsgreg:freethreading-prep

greateggsgreg commented May 9, 2026

First step toward Python 3.14 free-threaded (Py_GIL_DISABLED) support, tracked under #2720 (and the umbrella #2610).

After this PR, pythonnet runs the full pytest suite on python3.14t against both .NET 8 and .NET 10 on Linux. Existing GIL builds (3.11 - 3.14) are unaffected.

What changes

Free-threaded PyObject layout + refcount

The free-threaded (FT) PyObject header is 16 bytes longer (ob_tid + ob_flags + ob_mutex + ob_gc_bits + ob_ref_local + ob_ref_shared replace ob_refcnt), and the refcount is split across two fields, so there is no single offset to read.

  • ABI.Initialize detects the build via sys._is_gil_enabled() and sets ObjectHeadOffset = 16 on FT, shifting the generated TypeOffset* values to absolute PyHeapTypeObject offsets.
  • Runtime.Refcount prefers a P/Invoke into Py_REFCNT (a real symbol on CPython 3.14+, required on FT). Older Pythons fall back to the existing direct read.
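The build check can be mirrored from the Python side. This is a minimal sketch, not the code in ABI.Initialize; the helper names `gil_disabled_at_runtime` and `built_free_threaded` are illustrative assumptions. `sys._is_gil_enabled()` exists only on CPython 3.13+, so its absence is treated as a GIL build.

```python
import sys
import sysconfig


def gil_disabled_at_runtime() -> bool:
    """Return True when the interpreter is currently running without the GIL."""
    probe = getattr(sys, "_is_gil_enabled", None)  # absent before CPython 3.13
    if probe is None:
        return False
    return not probe()


def built_free_threaded() -> bool:
    """Return True when the interpreter was compiled with Py_GIL_DISABLED."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def describe_build() -> str:
    """One-line summary that is convenient to print in CI logs."""
    return (
        f"built free-threaded: {built_free_threaded()}, "
        f"GIL disabled now: {gil_disabled_at_runtime()}"
    )
```

The two probes can disagree: a free-threaded build may re-enable the GIL at run time, in which case `sys._is_gil_enabled()` returns True while the build configuration still reports Py_GIL_DISABLED.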

Atomic type creation in ReflectedClrType.GetOrCreate / TypeManager.GetType

The classic check-then-act on ClassManager.cache and TypeManager.cache raced in two ways under FT: a duplicate cache.Add (ArgumentException), and partial-type visibility, where outside-the-lock readers observed a half-initialised type during recursive class init. The fix is a two-cache design:

  • cache — only fully-initialised types; outside-the-lock fast path is safe.
  • _inProgressCache — partial types visible only to the lock-holding builder, so self-referential definitions resolve via the reentrant lock.

Serialisation snapshots remain Dictionary<,> on the wire for binary compatibility.

Atomic GCHandle ownership in tp_clear / tp_dealloc

subtype_clear (main thread) and the .NET finalizer thread can both reach a CLR-bound Python object's GCHandle slot. Previous read-then-zero in ManagedType.TryFreeGCHandle and ClassDerivedObject.tp_dealloc (strong→weak swap) let two threads observe the same handle and double-free it. Both paths now use Interlocked.Exchange to atomically claim the slot.
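The claim-by-exchange idea can be sketched in Python. Python has no direct counterpart to Interlocked.Exchange, so the hypothetical `HandleSlot.exchange` below emulates it with a lock; `try_free` and `race` are likewise illustrative names, and appending to a list stands in for freeing the handle.

```python
import threading


class HandleSlot:
    """Holds one handle; `exchange` swaps the value as a single atomic step."""

    def __init__(self, handle: int) -> None:
        self._lock = threading.Lock()  # lock-based stand-in for an atomic swap
        self._value = handle

    def exchange(self, new_value: int) -> int:
        with self._lock:
            old, self._value = self._value, new_value
            return old


def try_free(slot: HandleSlot, freed: list) -> bool:
    handle = slot.exchange(0)  # claim ownership; later callers observe 0
    if handle == 0:
        return False           # another thread already claimed the handle
    freed.append(handle)       # stand-in for the real free call
    return True


def race(n_threads: int = 8) -> list:
    """Let several threads race for one slot and return what was freed."""
    slot, freed = HandleSlot(0xBEEF), []
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        barrier.wait()         # release all workers at roughly the same time
        try_free(slot, freed)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return freed
```

However many threads race, only the one that receives the non-zero value from `exchange` performs the free; a separate read followed by a separate write would let two threads see the same handle.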

Reflection.Emit is not thread-safe

Concurrent TypeBuilder / ModuleBuilder operations corrupt the IL stream or throw "Duplicate type name within an assembly". Two emit-and-bake sequences are now serialised:

  • ClassDerivedObject.CreateDerivedType (Python subclasses of CLR types) on _buildersLock.
  • DelegateManager.GetDispatcher (Python callables → CLR delegates) on a new _emitLock.

Concurrent collections / atomics on shared state

Plain Dictionary<,>, HashSet<>, and counters that the GIL implicitly serialised tear under FT.

  • ConcurrentDictionary<,>: ExtensionType.loadedExtensions, CLRObject.reflectedObjects, ClassManager.cache, TypeManager.cache, ModuleObject.cache / allNames, Interop.delegateTypes, ClassBase.ClearVisited, InternString._string2interns / _intern2strings.
  • Interlocked + Volatile: Finalizer._throttled (counter), Runtime.run (epoch), PyBuffer.disposedValue (dispose flag — finalizer + explicit Dispose no longer double-free _view).
  • Lock-protected (where nested mutation or list-with-equality semantics rule out lock-free containers): GenericUtil.mapping (nested Dictionary/List), PythonEngine.ShutdownHandlers, Runtime._pyRefs.
  • volatile bool: Runtime._isInitialized / _typesInitialized, PythonEngine.initialized, Finalizer.started.

Guard the .NET-finalizer / Python-shutdown interaction

The .NET finalizer thread can dispatch Py_DecRef concurrently with Py_Finalize; a stale ob_ref_local read after teardown crashes the process. Runtime._Py_IsFinalizing() guards on Finalizer.ThrottledCollect, PyObject's finalizer, and the Refcount > 0 Debug.Assert in XDecref / AddFinalizedObject (kept on GIL, skipped on FT).
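A Python analogue of the finalizing guard, as a sketch only: `NativeRef` and its `released` list are hypothetical stand-ins for a wrapper that would enqueue a decref. `sys.is_finalizing()` is the Python-level view of the same interpreter state.

```python
import sys


class NativeRef:
    """Wrapper whose finalizer releases a (simulated) native reference."""

    released = []  # handles that were actually released, for demonstration

    def __init__(self, handle: int) -> None:
        self.handle = handle

    def release(self, finalizing: bool) -> bool:
        if finalizing:
            # Interpreter teardown: drop the pointer and let process exit
            # reclaim the memory instead of touching interpreter state.
            self.handle = 0
            return False
        NativeRef.released.append(self.handle)  # stand-in for the decref
        self.handle = 0
        return True

    def __del__(self) -> None:
        if self.handle:
            self.release(sys.is_finalizing())
```

Passing the flag into `release` keeps the short-circuit testable without actually shutting the interpreter down.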

Tests + CI

New tests in tests/test_thread.py cover the refcount/ABI invariants, the ConcurrentDictionary caches, atomic GCHandle ownership, the two Reflection.Emit locks, and the ShutdownHandlers lock. Some tests are marked @freethreaded_only because the GIL-build code path triggers a separate pre-existing pythonnet crash under high-contention CLR allocation that's reproducible on master and out of scope for this PR.

.github/workflows/main.yml adds 3.14t to the Linux/macOS matrix and skips Mono on 3.14t (clr-loader's mono backend isn't yet validated against free-threaded Python).

Verification

Linux aarch64, full pytest suite (tests/, --ignore=tests/domain_tests):

Python        .NET 8                  .NET 10
3.11 - 3.14   473 passed, 9 skipped   473 passed, 9 skipped
3.14t         477 passed, 5 skipped   477 passed, 5 skipped

(9 skipped on GIL = 5 pre-existing + 4 FT-only tests; 5 skipped on FT = the 5 pre-existing skips.)

3.13t is not in the matrix because cffi <2.0 does not support free-threaded 3.13 and clr-loader requires cffi.

Two narrow fixes to remove obvious data races that already exist on the
GIL build and become hot paths under Py_GIL_DISABLED:

- InternString: replace plain Dictionary<> with ConcurrentDictionary<>
  for both _string2interns and _intern2strings. These are written from
  startup but read from every attribute-lookup hot path, so any
  concurrent shutdown/reinit could tear them.
- ClassDerived.GetModuleBuilder: add a lock around the
  check-then-create on assemblyBuilders/moduleBuilders. The previous
  ContainsKey-then-DefineDynamicAssembly pattern had a TOCTOU race that
  could produce duplicate builders. Reset() also now locks for a clean
  reinitialisation.
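The locked check-then-create can be sketched as follows. This is a Python analogue under stated assumptions, not the C# in ClassDerived: `BuilderCache`, `hammer`, and the `factory` callable are hypothetical names.

```python
import threading


class BuilderCache:
    """Get-or-create cache where lookup and creation share one lock."""

    def __init__(self, factory) -> None:
        self._factory = factory
        self._builders = {}
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:  # the check and the create form one critical section
            builder = self._builders.get(name)
            if builder is None:
                builder = self._factory(name)
                self._builders[name] = builder
            return builder

    def reset(self) -> None:
        with self._lock:  # reinitialise under the same lock
            self._builders.clear()


def hammer(cache: BuilderCache, name: str, n_threads: int = 8) -> list:
    """Request the same builder from many threads at once."""
    results = []
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        barrier.wait()
        results.append(cache.get(name))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because no thread can run between the lookup and the insert, the factory runs once per name regardless of how many threads ask for it.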

These are not sufficient for full free-threading support, but they
remove low-hanging concurrency hazards.

Free-threaded CPython (Py_GIL_DISABLED) changes the PyObject layout in
two pythonnet-relevant ways:

- The header is 16 bytes larger (ob_tid + flags + mutex + gc_bits +
  ob_ref_local + ob_ref_shared replace the single ob_refcnt).
- The refcount is no longer a single field; reads must go through the
  Py_REFCNT API.

Detect the build at runtime via sys._is_gil_enabled() in ABI.Initialize
and:

- Set ObjectHeadOffset to 16 on free-threaded builds so the generated
  TypeOffset values still resolve to absolute PyHeapTypeObject offsets.
- Skip the ob_refcnt probe (it scans for an IntPtr value of 1 which
  cannot be located reliably under the FT layout).

Add a Py_REFCNT P/Invoke (try-loaded; only exported as a function on
CPython 3.14+) and prefer it in Runtime.Refcount, falling back to the
existing direct read on older Pythons that only expose Py_REFCNT as a
macro.
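The try-load-then-fall-back shape can be illustrated with ctypes. This is a sketch under assumptions, not the P/Invoke declaration itself: `load_py_refcnt` and `refcount` are hypothetical helpers, and `sys.getrefcount` stands in for the direct read used on older Pythons.

```python
import ctypes
import sys


def load_py_refcnt():
    """Return a callable for the exported Py_REFCNT, or None if unavailable."""
    try:
        fn = ctypes.pythonapi.Py_REFCNT  # AttributeError if not exported
    except AttributeError:
        return None
    fn.argtypes = [ctypes.py_object]
    fn.restype = ctypes.c_ssize_t
    return fn


_PY_REFCNT = load_py_refcnt()


def refcount(obj) -> int:
    """Prefer the exported function; otherwise fall back to sys.getrefcount."""
    if _PY_REFCNT is not None:
        return _PY_REFCNT(obj)
    return sys.getrefcount(obj)
```

Both paths include temporary references held for the duration of the call, so the result is useful for comparisons rather than as an exact count.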

ExtensionType.loadedExtensions and CLRObject.reflectedObjects are
"borrowed reference" registries written on every alloc and read or
removed from finalizer-thread paths.  Under free-threaded Python the
plain HashSet<IntPtr> tears reliably; under the GIL the same tears
occurred less often and surfaced mostly as Debug.Assert failures
during shutdown.

Convert both to ConcurrentDictionary<IntPtr, byte> with the equivalent
TryAdd/TryRemove operations, and update the few non-mutating callers
(NullGCHandles, RuntimeData snapshot LINQ) to enumerate Keys.
…tType

Both type-creation paths had a classic check-then-act race:

  if (!cache.TryGetValue(t, out var pyType))
  {
      pyType = AllocateClass(t);
      cache.Add(t, pyType);   // throws under contention, partial type otherwise
      InitializeClass(...);
  }

Two threads racing past the TryGetValue could both call AllocateClass
and one would throw on Dictionary.Add ("duplicate key").  Worse, the
cache add happens *before* InitializeClass populates members so a
third thread's outside-the-lock fast path could observe a partially-
initialised type and fail with AttributeError on members not yet added
(reproducible under free-threaded Python with concurrent attribute
access on built-in CLR types).

Convert ClassManager.cache and TypeManager.cache to ConcurrentDictionary
and serialise the multi-step initialisation behind a lock.

ReflectedClrType.GetOrCreate uses a two-cache design:

- `cache`            - only fully-initialised types; safe to read on the
                       outside-the-lock fast path.
- `_inProgressCache` - partial types being built inside the lock; visible
                       only to the building thread, so self-referential
                       class definitions (which recurse into GetOrCreate
                       for the same type chain) still resolve.

Cross-thread access cannot reach the in-progress cache because acquiring
the lock is required, so other threads always see fully-ready types.

The serialisation snapshot copies remain Dictionary<,> on the wire for
binary compatibility.
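The two-cache design can be sketched in Python with a reentrant lock. This is an analogue under stated assumptions rather than the C# in ReflectedClrType.GetOrCreate: `TypeCache`, `build_type`, and `init_type` are hypothetical names, and dictionaries stand in for type objects.

```python
import threading


class TypeCache:
    def __init__(self, build, initialize) -> None:
        self._ready = {}                # fully initialised entries only
        self._in_progress = {}          # partial entries, touched only under the lock
        self._lock = threading.RLock()  # reentrant: initialisation may recurse
        self._build = build
        self._initialize = initialize

    def get_or_create(self, key):
        found = self._ready.get(key)    # fast path, no lock taken
        if found is not None:
            return found
        with self._lock:
            found = self._ready.get(key)
            if found is None:
                found = self._in_progress.get(key)
            if found is not None:
                return found            # ready, or our own partial entry
            obj = self._build(key)
            self._in_progress[key] = obj
            try:
                self._initialize(self, key, obj)  # may re-enter get_or_create
            finally:
                del self._in_progress[key]
            self._ready[key] = obj      # publish only after initialisation
            return obj


def build_type(key):
    return {"name": key, "members": {}}


def init_type(cache, key, obj):
    # Self-referential member: resolves to the partial entry via the lock.
    obj["members"]["self"] = cache.get_or_create(key)
```

Other threads cannot reach `_in_progress` without the lock, and the lock is held for the whole build, so the fast path only ever returns entries whose initialisation has finished.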
tests/test_thread.py:

- test_runtime_refcount_matches_sys_getrefcount and
  test_is_gil_enabled_attribute_present_on_3_13_plus assert the basic
  invariants behind ABI.DetectFreeThreaded and Runtime.Refcount.
- test_concurrent_clr_method_calls and test_concurrent_attribute_access
  exercise the CLR call site cache and the ConcurrentDictionary intern
  path under contention.  Both run on every interpreter; on GIL builds
  they degenerate to mostly-serial smoke checks.
- test_concurrent_clr_object_creation,
  test_concurrent_python_subclass_of_clr_type and
  test_freethreaded_concurrent_attribute_access_no_tear are FT-only
  because the GIL-build code path triggers a pre-existing pythonnet
  crash under high-contention CLR allocation that is reproducible on
  master and out of scope for this branch.

.github/workflows/main.yml:

- Add 3.14t to the python matrix on Linux and macOS (Windows FT
  support is not yet plumbed through pythonnet's native build chain).
- Skip the Mono runtime steps on 3.14t — clr-loader's mono backend is
  not yet validated for free-threaded Python.

Two related sets of races that the GIL hid but free-threaded Python
exposes reliably.

ClassDerivedObject.tp_dealloc and ManagedType.TryFreeGCHandle both
read the GCHandle slot, then mutated it.  Under FT, subtype_clear (on
the main thread) and the .NET finalizer thread can race for the same
slot; a non-atomic read-then-zero lets both threads observe the same
handle and double-free it.  Both paths now use Interlocked.Exchange
to atomically claim ownership of the slot - only the thread that
observes a non-zero handle frees it.  ClassDerived's strong-to-weak
swap follows the same pattern.

CreateDerivedType emits IL via Reflection.Emit, whose ModuleBuilder
and TypeBuilder operations are documented as not thread-safe.
Concurrent dynamic subclass creation under FT corrupts the IL stream
and segfaults.  Serialise the entire emit-and-bake sequence on the
existing _buildersLock (the lock that already guarded the assembly /
module builder cache).

The .NET finalizer thread can dispatch Py_DecRef calls concurrently
with Py_Finalize, and a stale read of ob_ref_local after teardown
crashes the process.  Three guards:

- Finalizer.ThrottledCollect and PyObject's finalizer short-circuit
  when Runtime._Py_IsFinalizing(); PyObject drops the raw pointer
  instead of enqueueing a decref so process exit reclaims the memory.
- Finalizer.AddFinalizedObject's Refcount > 0 Debug.Assert is kept on
  GIL builds and skipped on FT; a stale ob_ref_local read from the
  finalizer thread can crash the process even when the assertion
  would succeed under the GIL.
- Runtime.XDecref's matching Refcount > 0 Debug.Assert gets the same
  FT-only skip for the same reason.

The atomic-type-creation commit (2f08c98) covered the two
highest-contention caches.  A wider audit found more plain Dictionary /
HashSet collections on hot paths that tear under free-threaded Python
and also fire Debug.Assert on GIL builds under sufficient contention.
Convert them to ConcurrentDictionary:

- ModuleObject.cache and ModuleObject.allNames - hit on every
  Module.Attr access; the old HashSet.Add for allNames could miss
  add-once semantics under contention and the Dictionary.Remove on
  teardown could observe a torn map.
- Interop.delegateTypes - racing past TryGetValue with the old plain
  Dictionary threw on Add ("duplicate key") instead of silently
  picking one winner; this was reproducible under high concurrency.
- ClassBase.ClearVisited - re-entrancy guard for tp_clear, hit from
  both the main thread and the .NET finalizer thread; the plain
  HashSet tore reliably under FT.

Also tidy the existing ClassManager / TypeManager / ExtensionType
ConcurrentDictionary references via a using-directive instead of
fully qualifying System.Collections.Concurrent at every site.

test_python_thread_calls_to_clr never joined its worker threads; under
free-threaded Python they outlived the test and surfaced as background
threading.excepthook noise.  Collect the threads up front and join them
at the end so they cannot outlive the test.
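The collect-then-join pattern might look like the sketch below. `run_workers` is a hypothetical helper, not the code in tests/test_thread.py; it also captures worker exceptions so failures are reported by the test rather than by threading.excepthook.

```python
import threading


def run_workers(target, count: int = 4, timeout: float = 10.0) -> list:
    """Start `count` threads, join them all, and return any exceptions raised."""
    errors = []

    def wrapper(index: int) -> None:
        try:
            target(index)
        except Exception as exc:  # surface failures instead of excepthook noise
            errors.append(exc)

    # Collect every thread up front so none can be forgotten.
    threads = [threading.Thread(target=wrapper, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
        assert not t.is_alive(), "worker outlived the test"
    return errors
```

A test can then assert that the returned list is empty, which fails deterministically instead of leaving stray output after the test has finished.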

Also tighten the docstring for test_concurrent_python_subclass_of_clr_type
to spell out why it is FT-only (the GIL build hits a separate pre-
existing CLR-object lifecycle crash under high contention, also
reproducible on master).

greateggsgreg changed the title from "Initial Python 3.14 free-threaded support" to "Python 3.14 free-threaded support" on May 10, 2026

Two tests for the thread-safe-collections work in 0a9d482:

- test_module_dunder_all_added_once asserts ModuleObject.allNames keeps
  add-once semantics; a torn HashSet would surface duplicates in
  __all__ on free-threaded builds.
- test_concurrent_module_attribute_access exercises ModuleObject.cache
  with concurrent getattr on a CLR namespace.  The old plain Dictionary
  threw on Add ("duplicate key") under simultaneous misses; the
  ConcurrentDictionary version absorbs the race.

Both run on every interpreter (no @freethreaded_only) — they degenerate
to single-threaded smoke checks under the GIL while the FT build
actually exercises contention.

An audit found several more shared-state hazards beyond the registries
already covered.  The following are reachable on every interpreter
under sufficient contention; FT exposes them reliably.

src/runtime/DelegateManager.cs
  Lock cache lookup + Reflection.Emit.  TypeBuilder/ModuleBuilder are
  not thread-safe; concurrent Python->CLR delegate construction (e.g.
  PythonEngine.ShutdownHandler(lambda: None) from multiple threads)
  threw "Duplicate type name within an assembly" on 3.14t.  Same shape
  as the CreateDerivedType fix from a18e872.

src/runtime/Finalizer.cs
  - `_throttled` becomes Interlocked.Increment / Interlocked.Exchange.
    Lost increments would either grow the queue unbounded or burn CPU
    on unnecessary drains.
  - `started` is now volatile so the ThrottledCollect check on every
    PyObject ctor cannot observe a stale "false" after Initialize.

src/runtime/PythonTypes/PyBuffer.cs
  `disposedValue` is now an int gated by Interlocked.Exchange (write)
  and Volatile.Read (hot reads).  The .NET finalizer racing with an
  explicit Dispose() could otherwise both pass the `if (!disposedValue)`
  check and call PyBuffer_Release twice -> double-free of _view.obj.
  Repeated check at every public method extracted to ThrowIfDisposed().

src/runtime/Util/GenericUtil.cs
  `mapping` (nested Dictionary<string, Dictionary<string, List<string>>>)
  guarded by a lock; nested mutations cannot be expressed atomically
  with ConcurrentDictionary alone.  GenericByName snapshots candidate
  names under the lock then calls AssemblyManager.LookupTypes outside
  it, since LookupTypes can re-enter Register.

src/runtime/PythonEngine.cs
  - `ShutdownHandlers` (List<>) wrapped in a lock.  ConcurrentStack
    would change semantics (no remove-by-equality).  ExecuteShutdownHandlers
    pops under the lock and invokes unlocked so handlers can re-enter
    Add/Remove without deadlock.
  - `initialized` flag is now volatile (read from worker threads,
    written from Initialize/Shutdown).
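  The pop-under-lock, invoke-unlocked pattern can be sketched in Python. This is an analogue with hypothetical names (`ShutdownHandlers`, `add`, `remove`, `execute`); which of several equal handlers gets removed is an assumption of the sketch, not taken from PythonEngine.

```python
import threading


class ShutdownHandlers:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # deliberately non-reentrant
        self._handlers = []

    def add(self, handler) -> None:
        with self._lock:
            self._handlers.append(handler)

    def remove(self, handler) -> bool:
        with self._lock:
            # Remove-by-equality, scanning from the most recently added.
            for i in range(len(self._handlers) - 1, -1, -1):
                if self._handlers[i] == handler:
                    del self._handlers[i]
                    return True
            return False

    def execute(self) -> None:
        while True:
            with self._lock:
                if not self._handlers:
                    return
                handler = self._handlers.pop()  # take one under the lock
            handler()  # invoke with the lock released, so it may call add/remove
```

  Because the lock is released before each handler runs, a handler that registers another handler does not deadlock, and the newly added handler is picked up by the same loop.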

src/runtime/Runtime.cs
  - `run` epoch now Interlocked.Increment + Volatile.Read; lost
    increments across re-init would let stale finalizer queue entries
    slip past the RuntimeRun guard.
  - `_pyRefs` mutations wrapped in a lock; ResetPyMembers snapshots
    then disposes outside the lock so a Dispose callback cannot
    reenter and deadlock.
  - `_isInitialized`, `_typesInitialized` now volatile.

tests/test_thread.py
  - test_concurrent_delegate_creation (FT-only): reproduces the
    DelegateManager Reflection.Emit race - aborts with the lock
    removed, passes with it.
  - test_concurrent_shutdown_handler_register (FT-only): drives
    AddShutdownHandler/RemoveShutdownHandler from 8 threads on
    pre-built handlers.
  - Removed test_freethreaded_concurrent_attribute_access_no_tear;
    its workload duplicates test_concurrent_attribute_access at a
    different intensity without exercising additional code paths.

Comment cleanup
  Trimmed multi-line thread-safety comments across the branch's
  earlier commits to single lines that capture only the non-obvious
  "why" (Concurrent: / Lock: / volatile: / Atomic claim:).  Removed
  comments where the type signature already documents the choice.

Comment-only changes. Adds inline notes at lock acquisitions where the
"why" is not obvious from the field declaration alone:

- ClassDerivedObject.Reset / GetModuleBuilder: explain that both
  builder caches must update atomically and that DefineDynamicAssembly
  / DefineDynamicModule produce duplicates under contention.
- TypeManager.GetType: note that CreateType + cache write must be
  atomic.
- ReflectedClrType.GetOrCreate: cross-file lock; mention it also
  serialises ClassManager.cache and TypeManager._slotsHolders writes.
- Runtime.ResetPyMembers: explain the snapshot-then-dispose pattern
  (Dispose() callbacks would deadlock if invoked under the lock).

Expands the strong->weak GCHandle swap in ClassDerivedObject.tp_dealloc
to spell out:

1. Why the PyObject is not freed at refcount 0 (C# wrapper may still
   reference it; ToPython() resurrects via _Py_NewReference).
2. Why the handle is demoted to weak (lets the C# wrapper be GC'd; on
   collection PyFinalize enqueues the real PyObject_GC_Del).
3. Why the swap uses Interlocked.Exchange (tp_clear may race on the
   same slot under FT / finalizer thread; without atomic claim both
   threads could observe and double-free the same handle).

Switching the underlying dictionaries to ConcurrentDictionary
(0a9d482) replaced Add() with TryAdd().  Add() threw on duplicate
keys, which served as a debug-time check that SetIntern is only
called once per builtin name (the invariant Initialize relies on
via its leading `Debug.Assert(_string2interns.Count == 0)`).

TryAdd silently masks that case.  Capture its bool result and
Debug.Assert it - restores the same correctness signal, with no
release-build cost.

Two FT-only tests for code paths identified by reviewing pythonnet's
downstream consumers (QuantConnect/Lean, Rhino.Inside, Speckle) and
the historical issue tracker:

test_concurrent_clr_delegate_invocation_from_python
  Python callables wrapped as distinct CLR delegate types
  (PublicDelegate, StringDelegate, BoolDelegate) and invoked
  concurrently from worker threads.  Canonical embedder pattern
  for callbacks/event handlers; hits DelegateManager.GetDispatcher
  (the Reflection.Emit lock added in 92072bd) and the GIL
  re-acquisition path in Dispatcher.Dispatch.

test_concurrent_generic_type_binding
  36 distinct Dictionary[K,V] / List[K] type-arg pairs resolved
  concurrently from N threads.  Targets the open-issue family
  pythonnet#2269 (ClassManager hash collision crash),
  pythonnet#1407 (ClassManager perf regression with MaybeType keys), and
  pythonnet#821 (generic resolution race).  Exercises ClassManager.cache,
  TypeManager.cache, GenericUtil.mapping, and the generic-binding
  fast path simultaneously.

Both are @freethreaded_only because the cumulative pytest state
under GIL builds trips the same pre-existing CPython 3.11/3.12/3.13
GC crash that gates the other high-contention tests in this file.

greateggsgreg and others added 2 commits May 11, 2026 21:59

PythonEnvironment.FindLibPythonInHome built a single candidate name
from version.Major.Minor (e.g. libpython3.14.so) and missed the
free-threaded variant (libpython3.14t.so / python314t.dll). pyvenv.cfg's
version field doesn't distinguish the two builds, so probe both names
and let File.Exists pick the one that's actually on disk.
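The probing step can be sketched in Python. This is not the C# in FindLibPythonInHome: `candidate_names`, `find_lib`, and `demo_probe` are hypothetical helpers, and the probe order and the macOS `.dylib` pattern are assumptions of the sketch.

```python
import os
import sys
import tempfile


def candidate_names(major: int, minor: int, platform: str = sys.platform) -> list:
    """Library file names to probe: regular build first, then free-threaded."""
    if platform.startswith("win"):
        pattern = "python{0}{1}{2}.dll"
    elif platform == "darwin":
        pattern = "libpython{0}.{1}{2}.dylib"  # assumed naming on macOS
    else:
        pattern = "libpython{0}.{1}{2}.so"
    return [pattern.format(major, minor, suffix) for suffix in ("", "t")]


def find_lib(directory: str, major: int, minor: int, platform: str = sys.platform):
    """Return the first candidate that exists on disk, or None."""
    for name in candidate_names(major, minor, platform):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


def demo_probe() -> str:
    """Create only the free-threaded name in a temp dir and probe for it."""
    with tempfile.TemporaryDirectory() as home:
        open(os.path.join(home, "libpython3.14t.so"), "w").close()
        found = find_lib(home, 3, 14, platform="linux")
        return os.path.basename(found)
```

Returning None when neither file exists lets the caller report a clear error instead of silently falling back to another library.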

Unblocks the 3.14t CI jobs added in pythonnet#2721: they were failing in
PythonEngine.Initialize with "Py_IncRef: undefined symbol" because
PythonDLL resolved to null and pythonnet fell back to dlopen of the
dotnet binary itself.