Python 3.14 free-threaded support #2721
Open
greateggsgreg wants to merge 15 commits into
Two narrow fixes to remove obvious data races that already exist on the GIL build and become hot paths under Py_GIL_DISABLED:
- InternString: replace plain Dictionary<> with ConcurrentDictionary<> for both _string2interns and _intern2strings. These are written from startup but read from every attribute-lookup hot path, so any concurrent shutdown/reinit could tear them.
- ClassDerived.GetModuleBuilder: add a lock around the check-then-create on assemblyBuilders/moduleBuilders. The previous ContainsKey-then-DefineDynamicAssembly pattern had a TOCTOU race that could produce duplicate builders. Reset() also now locks for a clean reinitialisation.

These are not sufficient for full free-threading support, but they remove low-hanging concurrency hazards.
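For readers less familiar with the check-then-create hazard, here is a minimal Python sketch of the locked version. `BuilderCache`, `factory` and `demo` are hypothetical names for illustration, not pythonnet's API; the only point is that the lookup and the creation happen under the same lock, so racing callers cannot both build.

```python
import threading


class BuilderCache:
    """Hypothetical stand-in for a lock-guarded builder cache."""

    def __init__(self, factory):
        self._factory = factory          # called at most once per key
        self._builders = {}
        self._lock = threading.Lock()

    def get_or_create(self, key):
        # Check and create under one lock: no window between the two steps.
        with self._lock:
            builder = self._builders.get(key)
            if builder is None:
                builder = self._factory(key)
                self._builders[key] = builder
            return builder

    def reset(self):
        # Reset takes the same lock so it cannot interleave with a create.
        with self._lock:
            self._builders.clear()


def demo(workers=8):
    """Race `workers` threads on one key and count how many builders were made."""
    created = []
    cache = BuilderCache(lambda key: created.append(key) or object())
    barrier = threading.Barrier(workers)

    def work():
        barrier.wait()                   # release all workers together
        cache.get_or_create("assembly")

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(created)
```

With the lock in place `demo()` counts a single creation regardless of how the threads interleave; without it, two threads could both see a miss and both call the factory.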
Free-threaded CPython (Py_GIL_DISABLED) changes the PyObject layout in two pythonnet-relevant ways:
- The header is 16 bytes larger (ob_tid + flags + mutex + gc_bits + ob_ref_local + ob_ref_shared replace the single ob_refcnt).
- The refcount is no longer a single field; reads must go through the Py_REFCNT API.

Detect the build at runtime via sys._is_gil_enabled() in ABI.Initialize and:
- Set ObjectHeadOffset to 16 on free-threaded builds so the generated TypeOffset values still resolve to absolute PyHeapTypeObject offsets.
- Skip the ob_refcnt probe (it scans for an IntPtr value of 1 which cannot be located reliably under the FT layout).

Add a Py_REFCNT P/Invoke (try-loaded; only exported as a function on CPython 3.14+) and prefer it in Runtime.Refcount, falling back to the existing direct read on older Pythons that only expose Py_REFCNT as a macro.
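The detection step can be tried from plain Python. The sketch below is illustrative only: the helper names are hypothetical, and it shows two standard-library probes side by side rather than what `ABI.Initialize` actually does.

```python
import sys
import sysconfig


def is_free_threaded_build():
    """True if this interpreter was compiled with Py_GIL_DISABLED."""
    # The config var is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_enabled_now():
    """True if the GIL is active in the running process."""
    # sys._is_gil_enabled() was added in CPython 3.13;
    # older versions always run with a GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else bool(probe())


def describe_interpreter():
    """Collect both probes so they can be compared or logged."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "free_threaded_build": is_free_threaded_build(),
        "gil_enabled": gil_enabled_now(),
    }
```

The two probes answer slightly different questions. A free-threaded build can have its GIL switched back on at runtime (for example via `PYTHON_GIL=1`), in which case `gil_enabled` is true even though the object layout is still the free-threaded one, so it may be worth checking which question a given call site needs answered.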
ExtensionType.loadedExtensions and CLRObject.reflectedObjects are "borrowed reference" registries written on every alloc and read or removed from finalizer-thread paths. Under free-threaded Python the plain HashSet<IntPtr> tears reliably; under the GIL the same tears occurred more rarely and mostly surfaced as Debug.Assert failures during shutdown. Convert both to ConcurrentDictionary<IntPtr, byte> with the equivalent TryAdd/TryRemove operations, and update the few non-mutating callers (NullGCHandles, RuntimeData snapshot LINQ) to enumerate Keys.
…tType
Both type-creation paths had a classic check-then-act race:
if (!cache.TryGetValue(t, out var pyType))
{
pyType = AllocateClass(t);
cache.Add(t, pyType); // throws under contention, partial type otherwise
InitializeClass(...);
}
Two threads racing past the TryGetValue could both call AllocateClass
and one would throw on Dictionary.Add ("duplicate key"). Worse, the
cache add happens *before* InitializeClass populates members so a
third thread's outside-the-lock fast path could observe a partially-
initialised type and fail with AttributeError on members not yet added
(reproducible under free-threaded Python with concurrent attribute
access on built-in CLR types).
Convert ClassManager.cache and TypeManager.cache to ConcurrentDictionary
and serialise the multi-step initialisation behind a lock.
ReflectedClrType.GetOrCreate uses a two-cache design:
  - `cache` - only fully-initialised types; safe to read on the
    outside-the-lock fast path.
  - `_inProgressCache` - partial types being built inside the lock; visible
    only to the building thread, so self-referential class definitions
    (which recurse into GetOrCreate for the same type chain) still resolve.
Cross-thread access cannot reach the in-progress cache because acquiring
the lock is required, so other threads always see fully-ready types.
The serialisation snapshot copies remain Dictionary<,> on the wire for
binary compatibility.
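A compact Python sketch of the two-cache idea follows. All names (`TypeCache`, `build_node`, the dict-based "type") are hypothetical and only mirror the shape described above: a published cache read without the lock, an in-progress cache touched only under a reentrant lock, and publication after initialisation completes. It also assumes that a plain `dict.get` is safe to call concurrently, which holds for CPython dicts.

```python
import threading


class TypeCache:
    """Hypothetical two-cache registry: `_ready` is published, `_in_progress` is lock-only."""

    def __init__(self, build):
        self._build = build              # build(partial, resolve) fills in members
        self._ready = {}                 # fully initialised entries only
        self._in_progress = {}           # partial entries, touched only under the lock
        self._lock = threading.RLock()   # reentrant: the builder may recurse

    def get_or_create(self, key):
        value = self._ready.get(key)     # fast path, no lock
        if value is not None:
            return value
        with self._lock:
            value = self._ready.get(key)  # re-check: another thread may have finished
            if value is not None:
                return value
            partial = self._in_progress.get(key)
            if partial is not None:
                return partial           # recursive request from the building thread
            partial = {"name": key, "members": {}}
            self._in_progress[key] = partial
            try:
                self._build(partial, self.get_or_create)
                self._ready[key] = partial   # publish only after initialisation
            finally:
                del self._in_progress[key]
            return partial


def build_node(partial, resolve):
    # A self-referential definition: the type has a member of its own type.
    partial["members"]["next"] = resolve(partial["name"])
```

Calling `TypeCache(build_node).get_or_create("Node")` resolves the recursive lookup through the in-progress cache, while a second thread asking for `"Node"` during the build would block on the lock and only ever see the published entry.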
tests/test_thread.py:
- test_runtime_refcount_matches_sys_getrefcount and test_is_gil_enabled_attribute_present_on_3_13_plus assert the basic invariants behind ABI.DetectFreeThreaded and Runtime.Refcount.
- test_concurrent_clr_method_calls and test_concurrent_attribute_access exercise the CLR call site cache and the ConcurrentDictionary intern path under contention. Both run on every interpreter; on GIL builds they degenerate to mostly-serial smoke checks.
- test_concurrent_clr_object_creation, test_concurrent_python_subclass_of_clr_type and test_freethreaded_concurrent_attribute_access_no_tear are FT-only because the GIL-build code path triggers a pre-existing pythonnet crash under high-contention CLR allocation that is reproducible on master and out of scope for this branch.

.github/workflows/main.yml:
- Add 3.14t to the python matrix on Linux and macOS (Windows FT support is not yet plumbed through pythonnet's native build chain).
- Skip the Mono runtime steps on 3.14t: clr-loader's mono backend is not yet validated for free-threaded Python.
Two related sets of races that the GIL hid but free-threaded Python exposes reliably.

ClassDerivedObject.tp_dealloc and ManagedType.TryFreeGCHandle both read the GCHandle slot, then mutated it. Under FT, subtype_clear (on the main thread) and the .NET finalizer thread can race for the same slot; a non-atomic read-then-zero lets both threads observe the same handle and double-free it. Both paths now use Interlocked.Exchange to atomically claim ownership of the slot - only the thread that observes a non-zero handle frees it. ClassDerived's strong-to-weak swap follows the same pattern.

CreateDerivedType emits IL via Reflection.Emit, whose ModuleBuilder and TypeBuilder operations are documented as not thread-safe. Concurrent dynamic subclass creation under FT corrupts the IL stream and segfaults. Serialise the entire emit-and-bake sequence on the existing _buildersLock (the lock that already guarded the assembly / module builder cache).

The .NET finalizer thread can dispatch Py_DecRef calls concurrently with Py_Finalize, and a stale read of ob_ref_local after teardown crashes the process. Three guards:
- Finalizer.ThrottledCollect and PyObject's finalizer short-circuit when Runtime._Py_IsFinalizing(); PyObject drops the raw pointer instead of enqueueing a decref so process exit reclaims the memory.
- Finalizer.AddFinalizedObject's Refcount > 0 Debug.Assert is kept on GIL builds and skipped on FT; a stale ob_ref_local read from the finalizer thread can crash the process even when the assertion would succeed under the GIL.
- Runtime.XDecref's matching Refcount > 0 Debug.Assert gets the same FT-only skip for the same reason.
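The "atomically claim the slot" idea can be sketched in Python. Python has no direct equivalent of `Interlocked.Exchange`, so the hypothetical `HandleSlot.exchange` below emulates one with a small lock; `try_free` and `race` are likewise illustrative names, and appending to a list stands in for the real free.

```python
import threading


class HandleSlot:
    """Hypothetical slot holding a handle; exchange() emulates Interlocked.Exchange."""

    def __init__(self, handle):
        self._value = handle
        self._lock = threading.Lock()

    def exchange(self, new_value):
        # Swap in the new value and return the previous one as a single step.
        with self._lock:
            old, self._value = self._value, new_value
            return old


def try_free(slot, freed):
    handle = slot.exchange(0)            # claim ownership of whatever is there
    if handle != 0:                      # only the claimant sees the real handle
        freed.append(handle)             # stand-in for the actual free
        return True
    return False


def race(workers=8):
    """Let several threads try to free the same slot; return what was freed."""
    slot, freed = HandleSlot(0xABC), []
    barrier = threading.Barrier(workers)

    def work():
        barrier.wait()
        try_free(slot, freed)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return freed
```

Because every caller swaps in zero before looking at the old value, exactly one thread receives the non-zero handle. The read-then-zero version has a gap between the two steps in which a second thread can read the same handle.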
The atomic-type-creation commit (2f08c98) covered the two highest-contention caches. Wider audit found more plain Dictionary / HashSet collections on hot paths that tear under free-threaded Python and also fire Debug.Assert on GIL builds at sufficient contention. Convert them to ConcurrentDictionary:
- ModuleObject.cache and ModuleObject.allNames - hit on every Module.Attr access; the old HashSet.Add for allNames could miss add-once semantics under contention and the Dictionary.Remove on teardown could observe a torn map.
- Interop.delegateTypes - racing past TryGetValue with the old plain Dictionary threw on Add ("duplicate key") instead of silently picking one winner, which became reproducible under high concurrency.
- ClassBase.ClearVisited - re-entrancy guard for tp_clear, hit from both the main thread and the .NET finalizer thread; the plain HashSet tore reliably under FT.

Also tidy the existing ClassManager / TypeManager / ExtensionType ConcurrentDictionary references via a using-directive instead of fully qualifying System.Collections.Concurrent at every site.
test_python_thread_calls_to_clr left its workers detached and visible under free-threaded Python as background threading.excepthook noise. Collect the threads up front and join them at the end so they cannot outlive the test.

Also tighten the docstring for test_concurrent_python_subclass_of_clr_type to spell out why it is FT-only (the GIL build hits a separate pre-existing CLR-object lifecycle crash under high contention, also reproducible on master).
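The collect-then-join pattern is easy to get subtly wrong, so here is a minimal sketch of one way to write it. `run_workers` is a hypothetical helper, not a function from tests/test_thread.py; it also records worker exceptions so they surface in the test body instead of in `threading.excepthook`.

```python
import threading


def run_workers(target, count):
    """Start `count` threads, join them all, and return any exceptions raised."""
    errors = []

    def wrapped(index):
        try:
            target(index)
        except Exception as exc:         # record instead of leaking to excepthook
            errors.append(exc)

    # Build the full list first so every started thread is also joined.
    threads = [threading.Thread(target=wrapped, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # no worker outlives the caller
    return errors
```

A test can then end with `assert run_workers(body, 8) == []`, which fails in the test itself rather than printing a traceback from a background thread after the test has already passed.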
Two tests for the thread-safe-collections work in 0a9d482:
- test_module_dunder_all_added_once asserts ModuleObject.allNames keeps add-once semantics; a torn HashSet would surface duplicates in __all__ on free-threaded builds.
- test_concurrent_module_attribute_access exercises ModuleObject.cache with concurrent getattr on a CLR namespace. The old plain Dictionary threw on Add ("duplicate key") under simultaneous misses; the ConcurrentDictionary version absorbs the race.

Both run on every interpreter (no @freethreaded_only); they degenerate to single-threaded smoke checks under the GIL while the FT build actually exercises contention.
Audit found several more shared-state hazards beyond the registries already covered. The following are reachable on every interpreter under sufficient contention; FT exposes them reliably.

src/runtime/DelegateManager.cs
  Lock cache lookup + Reflection.Emit. TypeBuilder/ModuleBuilder are not thread-safe; concurrent Python->CLR delegate construction (e.g. PythonEngine.ShutdownHandler(lambda: None) from multiple threads) threw "Duplicate type name within an assembly" on 3.14t. Same shape as the CreateDerivedType fix from a18e872.

src/runtime/Finalizer.cs
  - `_throttled` becomes Interlocked.Increment / Interlocked.Exchange. Lost increments would either grow the queue unbounded or burn CPU on unnecessary drains.
  - `started` is now volatile so the ThrottledCollect check on every PyObject ctor cannot observe a stale "false" after Initialize.

src/runtime/PythonTypes/PyBuffer.cs
  `disposedValue` is now an int gated by Interlocked.Exchange (write) and Volatile.Read (hot reads). The .NET finalizer racing with an explicit Dispose() could otherwise both pass the `if (!disposedValue)` check and call PyBuffer_Release twice -> double-free of _view.obj. Repeated check at every public method extracted to ThrowIfDisposed().

src/runtime/Util/GenericUtil.cs
  `mapping` (nested Dictionary<string, Dictionary<string, List<string>>>) guarded by a lock; nested mutations cannot be expressed atomically with ConcurrentDictionary alone. GenericByName snapshots candidate names under the lock then calls AssemblyManager.LookupTypes outside it, since LookupTypes can re-enter Register.

src/runtime/PythonEngine.cs
  - `ShutdownHandlers` (List<>) wrapped in a lock. ConcurrentStack would change semantics (no remove-by-equality). ExecuteShutdownHandlers pops under the lock and invokes unlocked so handlers can re-enter Add/Remove without deadlock.
  - `initialized` flag is now volatile (read from worker threads, written from Initialize/Shutdown).

src/runtime/Runtime.cs
  - `run` epoch now Interlocked.Increment + Volatile.Read; lost increments across re-init would let stale finalizer queue entries slip past the RuntimeRun guard.
  - `_pyRefs` mutations wrapped in a lock; ResetPyMembers snapshots then disposes outside the lock so a Dispose callback cannot reenter and deadlock.
  - `_isInitialized`, `_typesInitialized` now volatile.

tests/test_thread.py
  - test_concurrent_delegate_creation (FT-only): reproduces the DelegateManager Reflection.Emit race - aborts with the lock removed, passes with it.
  - test_concurrent_shutdown_handler_register (FT-only): drives AddShutdownHandler/RemoveShutdownHandler from 8 threads on pre-built handlers.
  - Removed test_freethreaded_concurrent_attribute_access_no_tear; its workload duplicates test_concurrent_attribute_access at a different intensity without exercising additional code paths.

Comment cleanup
  Trimmed multi-line thread-safety comments across the branch's earlier commits to single lines that capture only the non-obvious "why" (Concurrent: / Lock: / volatile: / Atomic claim:). Removed comments where the type signature already documents the choice.
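The "mutate under the lock, invoke outside it" rule used for `ShutdownHandlers` and `_pyRefs` is sketched below in Python. The class and its method names are hypothetical, and running the most recently added handler first is an assumption of the sketch rather than something this commit message states.

```python
import threading


class ShutdownHandlers:
    """Hypothetical handler list: mutated under a lock, invoked outside it."""

    def __init__(self):
        self._handlers = []
        self._lock = threading.Lock()    # deliberately non-reentrant

    def add(self, handler):
        with self._lock:
            self._handlers.append(handler)

    def remove(self, handler):
        with self._lock:
            try:
                self._handlers.remove(handler)   # remove-by-equality
                return True
            except ValueError:
                return False

    def execute(self):
        while True:
            with self._lock:
                if not self._handlers:
                    return
                handler = self._handlers.pop()   # assumed order: last added runs first
            handler()                            # lock released: handler may add/remove
```

Since the lock is not reentrant, calling `handler()` inside the `with` block would deadlock as soon as a handler registered another handler. Popping first and invoking afterwards avoids that without needing a reentrant lock.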
Comment-only changes. Adds inline notes at lock acquisitions where the "why" is not obvious from the field declaration alone:
- ClassDerivedObject.Reset / GetModuleBuilder: explain that both builder caches must update atomically and that DefineDynamicAssembly / DefineDynamicModule produce duplicates under contention.
- TypeManager.GetType: note that CreateType + cache write must be atomic.
- ReflectedClrType.GetOrCreate: cross-file lock; mention it also serialises ClassManager.cache and TypeManager._slotsHolders writes.
- Runtime.ResetPyMembers: explain the snapshot-then-dispose pattern (Dispose() callbacks would deadlock if invoked under the lock).

Expands the strong->weak GCHandle swap in ClassDerivedObject.tp_dealloc to spell out:
1. Why the PyObject is not freed at refcount 0 (C# wrapper may still reference it; ToPython() resurrects via _Py_NewReference).
2. Why the handle is demoted to weak (lets the C# wrapper be GC'd; on collection PyFinalize enqueues the real PyObject_GC_Del).
3. Why the swap uses Interlocked.Exchange (tp_clear may race on the same slot under FT / finalizer thread; without atomic claim both threads could observe and double-free the same handle).
Switching the underlying dictionaries to ConcurrentDictionary (0a9d482) replaced Add() with TryAdd(). Add() threw on duplicate keys, which served as a debug-time check that SetIntern is only called once per builtin name (the invariant Initialize relies on via its leading `Debug.Assert(_string2interns.Count == 0)`). TryAdd silently masks that case. Capture its bool result and Debug.Assert it - restores the same correctness signal, with no release-build cost.
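A rough Python analogue of the restored check, with hypothetical names (`try_add`, `set_intern`) and a module-level lock standing in for the atomicity that `ConcurrentDictionary.TryAdd` provides:

```python
import threading

_table_lock = threading.Lock()


def try_add(table, key, value):
    """Insert only if `key` is absent; report whether the insert happened."""
    with _table_lock:
        if key in table:
            return False                 # existing entry is left untouched
        table[key] = value
        return True


def set_intern(table, name, handle):
    added = try_add(table, name, handle)
    # Debug-time invariant: each name is registered exactly once.
    assert added, f"intern {name!r} registered twice"
```

The parallel with `Debug.Assert` is fairly close: Python drops `assert` statements when run with `-O`, so the check costs nothing in an optimised run while still catching a double registration during development.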
Two FT-only tests for code paths identified by reviewing pythonnet's downstream consumers (QuantConnect/Lean, Rhino.Inside, Speckle) and the historical issue tracker:

test_concurrent_clr_delegate_invocation_from_python
  Python callables wrapped as distinct CLR delegate types (PublicDelegate, StringDelegate, BoolDelegate) and invoked concurrently from worker threads. Canonical embedder pattern for callbacks/event handlers; hits DelegateManager.GetDispatcher (the Reflection.Emit lock added in 92072bd) and the GIL re-acquisition path in Dispatcher.Dispatch.

test_concurrent_generic_type_binding
  36 distinct Dictionary[K,V] / List[K] type-arg pairs resolved concurrently from N threads. Targets the open-issue family pythonnet#2269 (ClassManager hash collision crash), pythonnet#1407 (ClassManager perf regression with MaybeType keys), and pythonnet#821 (generic resolution race). Exercises ClassManager.cache, TypeManager.cache, GenericUtil.mapping, and the generic-binding fast path simultaneously.

Both are @freethreaded_only because the cumulative pytest state under GIL builds trips the same pre-existing CPython 3.11/3.12/3.13 GC crash that gates the other high-contention tests in this file.
PythonEnvironment.FindLibPythonInHome built a single candidate name from version.Major.Minor (e.g. libpython3.14.so) and missed the free-threaded variant (libpython3.14t.so / python314t.dll). pyvenv.cfg's version field doesn't distinguish the two builds, so probe both names and let File.Exists pick the one that's actually on disk. Unblocks the 3.14t CI jobs added in pythonnet#2721: they were failing in PythonEngine.Initialize with "Py_IncRef: undefined symbol" because PythonDLL resolved to null and pythonnet fell back to dlopen of the dotnet binary itself.
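The probing logic amounts to "try both names, keep the first that exists". A minimal Python sketch, using only the file names quoted above; `candidate_names`, `find_lib` and the injectable `exists` callback are hypothetical, and the directory that pythonnet actually searches is not modelled here.

```python
import os


def candidate_names(major, minor, windows=False):
    """Library file names to probe: the GIL build first, then the free-threaded one."""
    if windows:
        return [f"python{major}{minor}.dll", f"python{major}{minor}t.dll"]
    return [f"libpython{major}.{minor}.so", f"libpython{major}.{minor}t.so"]


def find_lib(directory, major, minor, windows=False, exists=os.path.isfile):
    """Return the first candidate path that exists, or None if neither does."""
    for name in candidate_names(major, minor, windows):
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None
```

Returning `None` when neither file is present keeps the failure visible to the caller, so it can report a missing library instead of falling through to an unrelated binary.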
First step toward Python 3.14 free-threaded (`Py_GIL_DISABLED`) support, tracked under #2720 (and the umbrella #2610).

After this PR, pythonnet runs the full pytest suite on `python3.14t` against both .NET 8 and .NET 10 on Linux. Existing GIL builds (3.11 - 3.14) are unaffected.

What changes
Free-threaded `PyObject` layout + refcount

FT `PyObject` is 16 bytes longer (`ob_tid` + `ob_flags` + `ob_mutex` + `ob_gc_bits` + `ob_ref_local` + `ob_ref_shared` replace `ob_refcnt`) and the refcount is split, so there is no single offset to read.
- `ABI.Initialize` detects the build via `sys._is_gil_enabled()` and sets `ObjectHeadOffset = 16` on FT, shifting the generated `TypeOffset*` values to absolute `PyHeapTypeObject` offsets.
- `Runtime.Refcount` prefers a P/Invoke into `Py_REFCNT` (a real symbol on CPython 3.14+, required on FT). Older Pythons fall back to the existing direct read.

Atomic type creation in `ReflectedClrType.GetOrCreate` / `TypeManager.GetType`

The classic check-then-act on `ClassManager.cache` and `TypeManager.cache` raced two ways under FT: duplicate `cache.Add` (`ArgumentException`), and partial-type visibility from outside-the-lock readers observing a half-initialised type during recursive class init. Two-cache design:
- `cache`: only fully-initialised types; outside-the-lock fast path is safe.
- `_inProgressCache`: partial types visible only to the lock-holding builder, so self-referential definitions resolve via the reentrant lock.

Serialisation snapshots remain `Dictionary<,>` on the wire for binary compatibility.

Atomic GCHandle ownership in `tp_clear` / `tp_dealloc`

`subtype_clear` (main thread) and the .NET finalizer thread can both reach a CLR-bound Python object's GCHandle slot. Previous read-then-zero in `ManagedType.TryFreeGCHandle` and `ClassDerivedObject.tp_dealloc` (strong→weak swap) let two threads observe the same handle and double-free it. Both paths now use `Interlocked.Exchange` to atomically claim the slot.

`Reflection.Emit` is not thread-safe

Concurrent `TypeBuilder` / `ModuleBuilder` operations corrupt the IL stream or throw `"Duplicate type name within an assembly"`. Two emit-and-bake sequences are now serialised:
- `ClassDerivedObject.CreateDerivedType` (Python subclasses of CLR types) on `_buildersLock`.
- `DelegateManager.GetDispatcher` (Python callables → CLR delegates) on a new `_emitLock`.

Concurrent collections / atomics on shared state
Plain `Dictionary<,>`, `HashSet<>`, and counters that the GIL implicitly serialised tear under FT.
- `ConcurrentDictionary<,>`: `ExtensionType.loadedExtensions`, `CLRObject.reflectedObjects`, `ClassManager.cache`, `TypeManager.cache`, `ModuleObject.cache` / `allNames`, `Interop.delegateTypes`, `ClassBase.ClearVisited`, `InternString._string2interns` / `_intern2strings`.
- `Interlocked` + `Volatile`: `Finalizer._throttled` (counter), `Runtime.run` (epoch), `PyBuffer.disposedValue` (dispose flag; finalizer + explicit `Dispose` no longer double-free `_view`).
- Guarded by a lock: `GenericUtil.mapping` (nested `Dictionary` / `List`), `PythonEngine.ShutdownHandlers`, `Runtime._pyRefs`.
- `volatile bool`: `Runtime._isInitialized` / `_typesInitialized`, `PythonEngine.initialized`, `Finalizer.started`.

Guard the .NET-finalizer / Python-shutdown interaction
The .NET finalizer thread can dispatch `Py_DecRef` concurrently with `Py_Finalize`; a stale `ob_ref_local` read after teardown crashes the process. `Finalizer.ThrottledCollect` and `PyObject`'s finalizer now short-circuit when `Runtime._Py_IsFinalizing()` is true, and the `Refcount > 0` `Debug.Assert` in `XDecref` / `AddFinalizedObject` is kept on GIL builds and skipped on FT.

Tests + CI
New tests in `tests/test_thread.py` cover the refcount/ABI invariants, the `ConcurrentDictionary` caches, atomic GCHandle ownership, the two `Reflection.Emit` locks, and the `ShutdownHandlers` lock. Some tests are marked `@freethreaded_only` because the GIL-build code path triggers a separate pre-existing pythonnet crash under high-contention CLR allocation that's reproducible on master and out of scope for this PR.

`.github/workflows/main.yml` adds `3.14t` to the Linux/macOS matrix and skips Mono on 3.14t (clr-loader's mono backend isn't yet validated against free-threaded Python).

Verification
Linux aarch64, full pytest suite (`tests/`, `--ignore=tests/domain_tests`):

(`9 skipped` on GIL = 5 pre-existing + 4 FT-only tests; `5 skipped` on FT = the 5 pre-existing skips.)

3.13t is not in the matrix because `cffi <2.0` does not support free-threaded 3.13 and `clr-loader` requires `cffi`.