Fix free-threading (3.14t) crashes: heap types, unified per-module state, templated parser, new CI coverage#367

Merged
etrepum merged 36 commits into master from claude/fix-freethreading-builds-bZTdr
Apr 11, 2026

@etrepum etrepum commented Apr 8, 2026

Summary

Fixes SIGSEGV crashes in the C extension on Python 3.14 free-threaded
builds (cp314t), modernizes _speedups.c to CPython's current
conventions, deduplicates the parser state machine across the Py2
bytes / unicode paths, and adds CI coverage that actually catches
regressions in this area going forward.

Python 2.7 compatibility is preserved throughout. All Python-3-only
changes are guarded by #if PY_VERSION_HEX >= 0x030D0000 or
#if PY_MAJOR_VERSION >= 3 as appropriate.

Root-cause fix for the cp314t SIGSEGV

  • Heap types via PyType_FromModuleAndSpec on 3.13+. Scanner
    and Encoder are no longer static PyTypeObjects; proper
    Py_DECREF(tp) in dealloc and Py_VISIT(Py_TYPE(self)) in
    traverse. Static types are fundamentally incompatible with
    Py_MOD_GIL_NOT_USED, which is the direct cause of the 3.14t
    crash.
  • Py_mod_gil = Py_MOD_GIL_NOT_USED declared unconditionally on
    3.13+, matching CPython's own _json.
  • Subinterpreter support via Py_mod_multiple_interpreters = Py_MOD_PER_INTERPRETER_GIL_SUPPORTED.
  • Removed the is_gil_enabled() Python-side guard that previously
    disabled C speedups under free-threading entirely. The speedups
    are now the default on 3.14t.
  • Latent UB fix in encoder_new: -1LL << n (undefined in C for
    any negative operand) replaced with -(long long)((1ULL << n) - 1ULL) - 1LL,
    which is defined over 1..63 and produces LLONG_MIN at n == 63.
    Caught by the new test_debug_build CI job under
    -Werror=shift-negative-value.
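
The replacement expression can be checked step by step. The sketch below is a Python model of the C arithmetic, not code from this PR: it masks to 64 bits to mimic unsigned long long and asserts that every intermediate value stays inside the signed long long range. The name min_long_for_bitcount is hypothetical.

```python
LLONG_MAX = 2**63 - 1
LLONG_MIN = -(2**63)
U64_MASK = 2**64 - 1


def min_long_for_bitcount(n):
    """Model -(long long)((1ULL << n) - 1ULL) - 1LL for n in 1..63."""
    if not 1 <= n <= 63:
        raise ValueError("n must be in 1..63")
    # (1ULL << n) - 1ULL: unsigned arithmetic, never wraps for n <= 63.
    unsigned = ((1 << n) & U64_MASK) - 1
    # The cast to long long preserves the value because it is <= LLONG_MAX.
    assert 0 <= unsigned <= LLONG_MAX
    # Negating a value in 0..LLONG_MAX cannot overflow.
    negated = -unsigned
    assert LLONG_MIN < negated <= 0
    # The final "- 1LL" reaches LLONG_MIN at most, so it stays in range.
    result = negated - 1
    assert LLONG_MIN <= result <= LLONG_MAX
    return result


if __name__ == "__main__":
    for n in (1, 8, 31, 32, 62, 63):
        print(n, min_long_for_bitcount(n))
```

No step operates on a negative value with a shift, and no step leaves the representable range, which is the property the original -1LL << n lacked.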

Unified module state and multi-phase init

One _speedups_state struct, used on every Python version.

  • On 3.13+ it's allocated per-module via PEP 489 multi-phase init so
    each subinterpreter gets its own copy.
  • On older versions it's a single static instance
    (_speedups_static_state) that serves the whole process.
  • get_speedups_state(module_ref) has the same signature everywhere;
    call sites don't branch on PY_VERSION_HEX.
  • Multi-phase init (Py_mod_exec slot) is used on all Python
    3.5+, not just 3.13+, so there's one module-init code path for
    modern Python 3. Python 2.7 and 3.3/3.4 retain single-phase init.
  • Scanner/Encoder instances carry module_ref uniformly. On
    3.13+ it comes from PyType_GetModuleByDef; on older versions
    it's a borrowed pointer to the module object captured at init
    time via _speedups_module.
  • 14 interned string constants, RawJSONType, JSONDecodeError,
    JSON_itemgetter0, and 4 new cached attribute names
    (for_json, _asdict, sort, encoded_json) all live in state.
  • Both type fields (PyScannerType, PyEncoderType) live in the
    struct on every Python version; on pre-3.13 they're borrowed
    pointers to the static PyTypeObject bodies, and the type-check
    macros route through _speedups_static_state.PyScannerType
    instead of through a forward-declared file-scope symbol.

Helper signature cleanup

Every internal helper that needs state takes _speedups_state *state
as its first argument:

raise_errmsg(state, msg, s, end)
_encoded_const(state, obj)
is_raw_json(state, obj)
join_list_unicode(state, lst)
join_list_string(state, lst)
scanstring_unicode(state, pystr, end, strict, next_end_ptr)
scanstring_str(state, pystr, end, encoding, strict, next_end_ptr)
flush_accumulator(state, acc)
JSON_Accu_Accumulate(state, acc, unicode)
JSON_Accu_FinishAsList(state, acc)
_steal_accumulate(state, accu, stolen)

The convenience macros RAISE_ERRMSG, ENCODED_CONST,
IS_RAW_JSON, JOIN_LIST_UNICODE, SCANSTRING_UNICODE — previously
present to paper over two sets of function signatures — are all
deleted. Local variable naming is consistent (state, never _st).
JSON_Accu no longer carries a state pointer; state flows through
Accumulate / FinishAsList / flush_accumulator /
_steal_accumulate as an explicit parameter.

Templated parser state machine

The four near-identical scan_once_{str,unicode},
_parse_object_{str,unicode}, _parse_array_{str,unicode},
_match_number_{str,unicode} function pairs are now generated from a
single source of truth in simplejson/_speedups_scan.h, which is
#included twice from _speedups.c — once for the unicode variant
and once on Python 2 for the bytes variant. The differences between
variants (character reads, length/data accessors, substring creation,
fast paths for int/float parsing) are parameterized via a small set
of macros. Python-2-specific fast paths (PyOS_string_to_double,
PyInt_FromString) are preserved via four inline helper functions.

scanstring_str and scanstring_unicode are intentionally not
templated, because scanstring_str has Py2-specific hybrid
return-type logic (returns bytes when the input is ASCII-only,
unicode otherwise) that's structurally different from the
always-unicode flow. Left as future work in a follow-up PR.

The template file has a #define JSON_SPEEDUPS_SCAN_INCLUDING 1
gate so that accidentally including it from anywhere other than
_speedups.c is a compile-time #error.

Python side

  • simplejson/compat.py, __init__.py, encoder.py, scanner.py,
    decoder.py: removed the is_gil_enabled() runtime guard. That
    check was a temporary workaround disabling the C extension on
    free-threaded Python; now that the extension is safe under
    Py_MOD_GIL_NOT_USED, the Python-side gate is gone and cp314t
    actually exercises the C path.
  • PyUnicode_READY overridden as a no-op on Python 3.12+ (PEP 623
    made it a no-op, and it's scheduled for removal).
  • Two pre-existing -Wshadow warnings fixed (inner digit locals
    shadowing digit from longintrepr.h; inner newobj shadowing
    an outer declaration in encoder_listencode_obj).
  • SIZEOF_LONG_LONG (a Python-private pyconfig macro) replaced
    with sizeof(long long) * CHAR_BIT.
  • scanner_new gained a local LOAD_ATTR macro that consolidates
    the repeated getattr-or-bail boilerplate.
  • encoder_new's 20-char "OOOOOOOOOOOOOOOOOOOO" format string is
    now constructed from per-argument pieces so each "O" has a
    comment naming its parameter.
  • encoder_listencode_obj extracted an encoder_steal_encode()
    helper that handles the for_json / _asdict recursive-encode
    dance in one place with an as_dict flag selecting between
    encoder_listencode_obj and encoder_listencode_dict (plus
    TypeError on non-dict).
  • Cached hot-path attribute names in state
    (state->JSON_attr_{for_json,asdict,sort,encoded_json}), so the
    encoder uses PyObject_GetAttr with a pre-interned name instead
    of PyObject_GetAttrString (which has to convert the C literal to
    a Python string on every call).
  • Dead PY3_UNUSED macro removed.
  • Pre-3.13 static PyTypeObject PyScannerType; tentative forward
    declarations removed; type-check macros route through state.
  • maybe_quote_bigint inverted for an early return on the
    common int_as_string_bitcount is None fast path.
  • Two -Wdeclaration-after-statement sites fixed; the file now
    compiles cleanly under strict C89.
  • PyObject_CallNoArgs / PyObject_CallOneArg compat macros lost
    their trailing semicolons so they're usable in expression
    contexts.
  • On pre-3.13, init_speedups_state now calls
    reset_speedups_state_constants to clear any prior values before
    repopulating, so the static state can be re-initialized safely
    (defensive; extension module reload doesn't normally re-run init
    on multi-phase modules, but subinterpreter imports on 3.5-3.11
    can).

Tests

New: simplejson/tests/test_free_threading.py

TestFreeThreading (4 tests): 8 threads × 500 iters each, covering
concurrent encode, concurrent decode, mixed encode/decode, and a
single shared JSONEncoder/JSONDecoder instance exercised by
many threads. Designed to surface data races under PYTHON_GIL=0.
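
A stress test of this shape can be sketched as follows. This is an illustration, not the contents of test_free_threading.py: run_concurrently and encode_decode_once are hypothetical names, and the stdlib json module stands in when simplejson is not importable.

```python
import json
import threading

try:
    import simplejson as codec
except ImportError:
    codec = json  # assumption: stdlib json as a stand-in for illustration

DATA = {"a": [1, 2.5, "x", None, True], "b": {"nested": "text"}}
EXPECTED = codec.dumps(DATA, sort_keys=True)


def run_concurrently(worker, num_threads=8, iterations=500):
    """Run worker() iterations times on each of num_threads threads.

    Returns the list of exceptions raised in any thread.
    """
    errors = []
    barrier = threading.Barrier(num_threads)

    def target():
        try:
            barrier.wait()  # start all threads together to maximize overlap
            for _ in range(iterations):
                worker()
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=target) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def encode_decode_once():
    """One mixed encode/decode round trip with stable-output checks."""
    encoded = codec.dumps(DATA, sort_keys=True)
    if encoded != EXPECTED:
        raise AssertionError("unstable encoder output")
    if codec.loads(encoded) != DATA:
        raise AssertionError("round trip changed the data")


if __name__ == "__main__":
    print(run_concurrently(encode_decode_once))
```

With the GIL enabled this passes trivially; its value is in running the same loop under PYTHON_GIL=0 on a free-threaded build.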

New: simplejson/tests/test_subinterpreters.py

TestSubinterpreters (6 tests, 3.12+ only): import, encode, decode,
3 concurrent subinterpreters, state independence across destroy,
and heap types inside a subinterpreter.

New: simplejson/tests/test_speedups.py::TestHeapTypes (3.13+)

Verifies Py_TPFLAGS_HEAPTYPE, GC tracking, and correct
round-tripping.

New: simplejson/tests/test_speedups.py::TestRefcountLeaks (debug builds)

Guarded by @skipUnless(hasattr(sys, 'gettotalrefcount')). Covers
dumps/loads, Scanner/Encoder construction, and the error
path in scanner_new/encoder_new (BadBool triggers bail
partway through). Uses a two-phase measurement — phase 1 absorbs
CPython 3.14 specializer noise, phase 2 must be near zero — to
avoid flakiness on modern Python debug builds.
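
The two-phase idea can be sketched as below. This is an assumption about the general shape, not the PR's actual test code: two_phase_drift is a hypothetical helper, and the counter argument exists only so the logic can be exercised without a debug build (it defaults to sys.gettotalrefcount when available).

```python
import sys


def two_phase_drift(func, iterations=1000, counter=None):
    """Return the refcount drift measured over the second of two phases.

    Phase 1 runs func() without measuring, so one-time allocations
    (caches, specializer warm-up) settle. Phase 2 is measured; a real
    leak shows up as drift proportional to the iteration count.
    """
    if counter is None:
        counter = getattr(sys, "gettotalrefcount", None)
        if counter is None:
            raise RuntimeError("requires a debug build of CPython")
    for _ in range(iterations):  # phase 1: warm-up, not measured
        func()
    before = counter()
    for _ in range(iterations):  # phase 2: measured
        func()
    return counter() - before
```

A test built on this would assert that the returned drift is below a small tolerance rather than exactly zero, since the source only requires phase 2 to be "near zero".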

New: simplejson/tests/test_bitsize_int_as_string.py::test_boundary_at_max_bitcount

Regression test for the -1LL << n UB fix; exercises n = 1, 8, 31, 32, 62, 63 and both ±(2**n) boundaries.

Consolidated: simplejson/tests/_helpers.py

has_speedups() and skip_if_speedups_missing() moved out of the
three test files that used to duplicate them.

CI (.github/workflows/build-and-deploy.yml)

New job: test_free_threading

Runs on Python 3.14t:

  • Asserts sysconfig.get_config_var("Py_GIL_DISABLED") == 1 so the
    job fails loudly if setup-python ever hands us a GIL-enabled
    build under the 3.14t moniker.
  • Imports simplejson._speedups under PYTHON_GIL=0 with
    -W error::RuntimeWarning.
  • Asserts simplejson.encoder.c_make_encoder is
    _speedups.make_encoder (and likewise c_make_scanner is
    _speedups.make_scanner), so a silently-broken wheel, one that
    imports fine but leaves c_make_encoder as None, doesn't make
    every C-extension test skip quietly.
  • Runs the full test suite with and without PYTHON_GIL=0.
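
The preflight checks in this job can be approximated by a short script. The sketch below is not the workflow's actual step: the function names are hypothetical, and speedups_wired returns None when simplejson or its C extension is not importable, so the script can also run outside CI.

```python
import sys
import sysconfig


def is_free_threaded_build():
    """True if this interpreter was built with the GIL disabled."""
    return sysconfig.get_config_var("Py_GIL_DISABLED") == 1


def gil_enabled_at_runtime():
    """True if the GIL is currently active (always True before 3.13)."""
    return getattr(sys, "_is_gil_enabled", lambda: True)()


def speedups_wired():
    """Check that the C speedups are actually in use, not just importable.

    Returns None if simplejson or its C extension cannot be imported.
    """
    try:
        import simplejson.encoder
        import simplejson.scanner
        from simplejson import _speedups
    except ImportError:
        return None
    return (
        simplejson.encoder.c_make_encoder is _speedups.make_encoder
        and simplejson.scanner.c_make_scanner is _speedups.make_scanner
    )


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled at runtime:", gil_enabled_at_runtime())
    print("speedups wired:", speedups_wired())
```

In a CI step, each of these would be turned into a hard assertion so that a GIL-enabled interpreter or an unwired extension fails the job instead of skipping tests.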

New job: test_debug_build (matrixed)

Matrix over standard and free-threaded variants, both installed
via uv python install cpython-3.14.4+debug and
cpython-3.14.4+freethreaded+debug from python-build-standalone
(~10 seconds per variant, no custom build). Each variant:

  • Verifies sys.gettotalrefcount is present.
  • Compiles _speedups.c with -Wall -Wextra -Wshadow -Wstrict-prototypes -Werror (caught the shift-negative-value UB).
  • Verifies the C speedups are actually wired in.
  • Runs the full test suite; TestRefcountLeaks auto-enables on
    the debug interpreter.

The free-threaded variant additionally:

  • Asserts sysconfig.get_config_var("Py_GIL_DISABLED") == 1.
  • Re-runs the suite with PYTHON_GIL=0.

Wheel / sdist jobs

  • CIBW_ENABLE: "cpython-freethreading" removed (deprecated in
    cibuildwheel 3.4; free-threaded builds are on by default).
  • CIBW_SKIP: "pp*" removed from the v3.4.1 step (PyPy is off by
    default in 3.x; explicit skip is now an error). Kept on the
    v1.12.0 step for Python 2.7.
  • Linux wheel builds split by architecture (x86_64, aarch64,
    ppc64le) into separate matrix entries.
  • Python 2.7 wheels restricted to x86_64 (QEMU-based 2.7 builds
    on other arches are no longer viable).
  • Aggregate gate jobs (Build wheels on
    {ubuntu,windows,macos}-latest) preserve the job names required
    by existing branch protection rules. The ubuntu gate waits for
    test_pure_python, test_free_threading, and test_debug_build.

Dependency bumps

  • pypa/cibuildwheel v3.2.1 → v3.4.1
  • actions/checkout v4 → v6
  • actions/cache v4 → v5
  • docker/setup-qemu-action v3 → v4
  • actions/upload-artifact v4 → v7
  • actions/download-artifact v4 → v8
  • astral-sh/setup-uv pinned to v8.0.0 (new job dependency)

AGENTS.md

New top-level file collecting the tribal knowledge accumulated in
this PR that isn't obvious from reading the source: using uv +
python-build-standalone for alternate Python variants locally, how
to debug CI failures when only the annotations are visible, the
cibuildwheel version gotchas, why importlib.reload(_speedups)
does not actually re-run module_exec, the two-phase measurement
trick for TestRefcountLeaks on 3.14 debug, the _speedups_scan.h
usage contract, the invariant that type fields must not be
Py_CLEARed in reset_speedups_state_constants, and the
REQUIRE_SPEEDUPS=1 scope limitation (build-only).

Test plan

  • Full test suite passes on Python 3.11 release (169 tests)
  • Full test suite passes on Python 3.14.4 debug (344 tests
    including TestRefcountLeaks)
  • Full test suite passes on Python 3.14.4 free-threaded debug
    both with the GIL on and with PYTHON_GIL=0 (344 tests)
  • Extension compiles cleanly under -Wall -Wextra -Wshadow -Wstrict-prototypes -Werror on all of 3.10 / 3.11 / 3.12 /
    3.13 / 3.14.4 / 3.14.4 debug / 3.14.4 free-threaded debug
  • File is also clean under -Wdeclaration-after-statement
    (strict C89 mixed-decls)
  • PYTHON_GIL=0 python -c "import simplejson._speedups" emits
    no RuntimeWarning on 3.14t
  • _speedups.so refcount stable across 32k threaded ops and
    5k failing constructions
  • int_as_string_bitcount boundary exercised for n = 1, 8, 31,
    32, 62, 63 including the LLONG_MIN edge case at n == 63
  • sdist contains both _speedups.c and _speedups_scan.h

File-level diff

File                          Before   After    Δ
simplejson/_speedups.c        3841     ~2920    −921
simplejson/_speedups_scan.h   (new)    539      +539

Net: the C extension source is ~380 lines smaller despite gaining
heap types, per-module state, multi-phase init, four new test
suites, and a templated parser body.

Known limits / future work

  • scanstring_str / scanstring_unicode templating: the
    remaining big duplication (~200 lines). Blocked on Py2 hybrid
    return-type handling; deferred to a follow-up PR.
  • ThreadSanitizer on a TSan-instrumented CPython: would give
    true race detection on the free-threaded path. Requires a
    custom CPython build; not wired up.
  • Suppression-clean valgrind with PYTHONMALLOC=malloc: would
    give true leak detection. Requires a custom CPython build with
    --with-valgrind; not wired up. The refcount regression tests
    on the debug job catch most of what valgrind would.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95

@etrepum etrepum force-pushed the claude/fix-freethreading-builds-bZTdr branch 3 times, most recently from 02901f5 to 0bbd582 Compare April 9, 2026 05:43
@etrepum etrepum changed the title from "Claude/fix freethreading builds b z tdr" to "Enable free threading post-#362" Apr 9, 2026
@etrepum etrepum added this pull request to the merge queue Apr 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 9, 2026
@etrepum etrepum added this pull request to the merge queue Apr 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 9, 2026
Convert Scanner and Encoder from static PyTypeObject to heap types
via PyType_FromSpec on Python 3.13+, fixing SIGSEGV crashes on
Python 3.14t (free-threaded) builds across macOS and Windows.

Changes to _speedups.c (all guarded by #if PY_VERSION_HEX >= 0x030D0000):
- Add _speedups_state module state struct holding heap type objects
- Create heap types from PyType_Spec in module_exec (multi-phase init)
- Proper heap type lifecycle: save tp before tp_free, Py_DECREF(tp)
  in dealloc; Py_VISIT(Py_TYPE(self)) in traverse
- Add speedups_traverse/speedups_clear for module state GC
- Declare Py_mod_gil = Py_MOD_GIL_NOT_USED for free-threaded Python
- Guard PyScanner_Check/PyEncoder_Check macros for pre-3.13 only
- Pre-3.13 code paths remain completely unchanged

Changes to CI:
- Update cibuildwheel v3.2.1 -> v3.4.1 with CIBW_ENABLE for
  free-threading wheel builds
- Add dedicated test_free_threading CI job for Python 3.14t
- Parallelize Linux wheel builds by architecture
- Update actions/setup-python to v5.6.0

Tests:
- Add TestHeapTypes class verifying Py_TPFLAGS_HEAPTYPE, GC tracking,
  and basic encode/decode functionality on 3.13+

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
@etrepum etrepum force-pushed the claude/fix-freethreading-builds-bZTdr branch from 0bbd582 to e0018d6 Compare April 9, 2026 18:53
claude added 6 commits April 9, 2026 18:53
cibuildwheel v1.12.0 does not have cp27 manylinux images for
aarch64, so restrict the 2.7 build step to x86_64 only.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Expand the module state struct to hold ALL constants (string literals,
RawJSONType, JSONDecodeError, JSON_itemgetter0) in addition to the
heap type objects already there.

Thread state through all functions using local shadow variables that
override the file-scope globals on 3.13+, minimizing changes to
function bodies. Utility functions (raise_errmsg, join_list_unicode,
_encoded_const, is_raw_json, scanstring_unicode) get extra parameters
on 3.13+ with convenience macros at call sites.

Add module_ref (strong ref to owning module) to PyScannerObject and
PyEncoderObject for accessing per-module state from any scanner or
encoder method.

Use PyType_FromModuleAndSpec to bind heap types to their module.
Initialize all constants in module_exec instead of init_constants().

Pre-3.13 code path remains completely unchanged.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
The workflow was refactored to split builds by architecture, which
changed the job names. Add gate jobs that aggregate the new granular
jobs under the old names expected by branch protection rules:
- "Build wheels on ubuntu-latest" (also gates on free-threading + pure python tests)
- "Build wheels on windows-latest"
- "Build wheels on macos-latest"

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Now that the C extension declares Py_MOD_GIL_NOT_USED and has heap
types + per-module state, the speedups are safe to use on 3.14t. The
is_gil_enabled() guard was only a temporary workaround while the C
extension was unsafe under free-threading.

Removing the guard means _speedups is actually loaded and exercised
on cp314t in CI, which is what we need to be testing.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
@etrepum etrepum changed the title from "Enable free threading post-#362" to "Fix free-threading (3.14t) crashes: heap types + per-module state" Apr 10, 2026
claude added 13 commits April 10, 2026 20:23
Previously the C extension had two parallel code paths: pre-3.13 used
file-scope static globals for type objects and interned constants,
while 3.13+ used per-module state via PyModule_GetState. Every
function that touched constants had #if guards to pick the right path,
and several helper functions had dual signatures.

Now _speedups_state is defined for all versions:

- On Python 3.13+ it remains per-module state, so subinterpreters each
  get their own copy.
- On older versions there's a single static instance plus a borrowed
  reference to the module object so Scanner and Encoder instances can
  store module_ref uniformly.

get_speedups_state(module_ref) works the same way everywhere, so
function bodies no longer branch on PY_VERSION_HEX for state access.
The convenience macros (RAISE_ERRMSG, ENCODED_CONST, IS_RAW_JSON,
JOIN_LIST_UNICODE, SCANSTRING_UNICODE) collapse to single definitions
and the dual-signature helpers become single functions.

A new init_speedups_state() helper handles constant initialization for
both module_exec (3.13+) and moduleinit (pre-3.13), replacing the
standalone init_constants() function.

Net result: 329 lines removed, 133 added. Only the type creation path
(static PyTypeObject vs PyType_FromModuleAndSpec heap types) and
module init shape (single-phase vs multi-phase) remain version-
specific.

Verified: all 153 tests pass with the C extension loaded on Python
3.11, PYTHON_GIL=0 import produces no RuntimeWarning, round-trip
encode/decode works for all value types.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
The previous commit removed file-scope statics like JSON_NaN,
JSON_Infinity, JSON_NegInfinity, JSONDecodeError, etc. but missed
that the Python 2 scanner code paths reference some of them
directly as bare names.

Fixes for Python 2:
- Rename raise_errmsg() function to raise_errmsg_impl() so a Python
  2-only macro can re-introduce the 3-arg form, transparently pulling
  JSONDecodeError from _speedups_static_state. All existing
  raise_errmsg(msg, s, end) call sites in Python 2 scanner functions
  keep working without modification.
- Add a shadow block at the top of scan_once_str (Python 2 only) for
  JSON_NaN, JSON_Infinity, JSON_NegInfinity, mirroring the pattern
  used by scan_once_unicode.

Verified: all 153 tests still pass with C speedups on Python 3.11.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Previously PyScannerType and PyEncoderType lived in the struct only
on 3.13+. On pre-3.13 they stayed as file-scope statics referenced
directly via &PyScannerType. That left the state struct layout
slightly different across versions for no real reason.

Now the type fields are always in the struct. On pre-3.13 they
hold borrowed pointers to the (eternal) static PyTypeObject
instances. No refcounting or GC tracking is needed — static types
live forever — but the struct layout is fully uniform.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Two regressions from bumping cibuildwheel 3.2.1 -> 3.4.1:

1. cibuildwheel 3.4.1 ships a newer bundled virtualenv that fails to
   bootstrap Python 3.8 on Windows, so cp38-win32 and cp38-win_amd64
   builds crash during virtualenv creation. Python 3.8 has been EOL
   since October 2024, and skipping it on Windows only (rather than
   all platforms) minimizes the impact on existing users.

2. The previous CIBW_SKIP: 'pp*' line was accidentally dropped when
   splitting the matrix by architecture. PyPy wheels were never part
   of the release set -- PyPy is covered by the test_pure_python job.

This is purely a CI environment fix; no code changes.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
- Remove CIBW_ENABLE: 'cpython-freethreading'. cibuildwheel warns
  that this option is deprecated and should be removed; in 3.4+
  free-threaded builds are the default for cp313t/cp314t.
- Revert the cp38-win* skip; the Windows failure was a timeout,
  not a Python 3.8 virtualenv issue as I previously assumed.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Two new test files, kept separate from test_speedups.py because
they exercise Python runtime features (concurrent threads,
subinterpreters) rather than C-extension internals:

simplejson/tests/test_free_threading.py (TestFreeThreading)
- test_concurrent_encode / test_concurrent_decode: 8 threads x 500
  iterations each, verifying stable output
- test_concurrent_encode_decode: mixes encode and decode on the
  same data across threads
- test_shared_encoder_instance: a single JSONEncoder and
  JSONDecoder instance used by many threads

These pass with the GIL enabled, but exist to catch data races on
free-threaded builds (PEP 703). The test_free_threading CI job
runs them with PYTHON_GIL=0 on 3.14t.

simplejson/tests/test_subinterpreters.py (TestSubinterpreters)
- import, encode, decode in a fresh subinterpreter
- multiple subinterpreters used concurrently
- destroying one subinterpreter while another still uses the module
- heap types stay heap types inside a subinterpreter (3.13+)

Skipped on Python < 3.12 (subinterpreters require PEP 684).

CI workflow (.github/workflows/build-and-deploy.yml):
- New 'Verify Python is a free-threaded build' step in the
  test_free_threading job asserts sysconfig.Py_GIL_DISABLED == 1,
  so the job fails loudly if setup-python ever gives us a
  GIL-enabled build under the 3.14t moniker.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Helper functions that previously took individual state fields as
arguments now take a single _speedups_state pointer and dereference
what they need:

- raise_errmsg_impl(msg, s, end, state)
- _encoded_const(obj, state)
- is_raw_json(obj, state)
- join_list_unicode(lst, state)
- scanstring_unicode(..., state)

The convenience macros RAISE_ERRMSG, ENCODED_CONST, IS_RAW_JSON,
JOIN_LIST_UNICODE, SCANSTRING_UNICODE are now pointless pass-throughs
and are deleted. Call sites use the function names directly.

Caller functions no longer need per-field shadow declarations
(PyObject *JSON_s_null = _st->JSON_s_null; etc.). They only need
the _st pointer itself, which is passed through.

JSON_Accu now stores the state pointer instead of a borrowed
empty_unicode reference, keeping the struct version-independent
and letting flush_accumulator pass state to join_list_unicode
directly.

The Python 2 raise_errmsg macro shim remains but is simpler:
it now passes &_speedups_static_state instead of resolving
JSONDecodeError at the macro site.

Net: 130 lines removed, 78 added. All 163 tests still pass.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Three polish items after the state-based helper refactor:

1. Rename all local _speedups_state pointers from '_st' to 'state'
   for consistency with the parameter name used in helper signatures.

2. Rename raise_errmsg_impl back to raise_errmsg. This required
   updating every Python 2 scanner function (scanstring_str,
   _parse_object_str, _parse_array_str, _match_number_str,
   scan_once_str, py_scanstring) to thread 'state' explicitly
   instead of relying on a macro shim that resolved state from
   the static instance. scanstring_str and join_list_string grew
   an explicit state parameter. The Python 2 raise_errmsg macro
   is gone.

3. Remove the state field from JSON_Accu. JSON_Accu_Init is back
   to taking just the accu, and state flows through
   JSON_Accu_Accumulate, JSON_Accu_FinishAsList, flush_accumulator,
   and _steal_accumulate as an explicit parameter. The encoder
   call sites already had state in scope, so the extra arg is
   free.

Net diff: 129 deletions, 127 additions. All 163 tests still pass.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Context/environment arguments are conventionally placed first, not
last. Reorder every helper that takes _speedups_state so that
state is the first parameter:

- raise_errmsg(state, msg, s, end)
- _encoded_const(state, obj)
- is_raw_json(state, obj)
- join_list_unicode(state, lst)
- join_list_string(state, lst)
- scanstring_unicode(state, pystr, end, strict, next_end_ptr)
- scanstring_str(state, pystr, end, encoding, strict, next_end_ptr)
- flush_accumulator(state, acc)
- JSON_Accu_Accumulate(state, acc, unicode)
- JSON_Accu_FinishAsList(state, acc)
- _steal_accumulate(state, accu, stolen)

init_speedups_state(state, module) already had state first.

Pure reordering: 106 insertions, 106 deletions. All 163 tests pass.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
actions/setup-python@v5 automatically resolves to the latest v5.x
release (currently v5.6.0), so the exact pin is unnecessary. This
matches the convention on master.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Three warnings flagged by -Wshadow (pre-existing, not from the
free-threading refactor, but easy to fix):

- scanstring_unicode: local 'digit' shadowed the 'digit' type
  from Python's longintrepr.h. Renamed to 'hex_digit' (4 sites
  across the Python 2 and Python 3 hex-decoding loops).

- encoder_listencode_obj: inner 'PyObject *newobj' declaration
  shadowed an outer one. The outer variable is reused by the
  inner block; the earlier branches that use it (for_json,
  _asdict) always DECREF before falling through, so it's safe
  to drop the inner redeclaration.

The extension now compiles cleanly under -Wall -Wextra -Wshadow
-Wstrict-prototypes with no new warnings.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Builds CPython 3.14.0 from source with --with-pydebug, caches the
install prefix (~10 min first run, ~10 sec on cache hit), and runs
the test suite against it. This catches:

- Refcount leaks (via TestRefcountLeaks, auto-enabled when
  sys.gettotalrefcount is present)
- Py_DECREF asserts, NULL-pointer dereferences, and internal
  consistency checks that release builds skip
- Shadow/strict warnings (-Wall -Wextra -Wshadow -Werror)

Stock Ubuntu python3-dbg only offers 3.12, and neither
actions/setup-python nor the deadsnakes PPA ships -dbg packages
for 3.13/3.14, so building from source (with caching) is the
only route to a 3.14 debug interpreter.

New simplejson/tests/test_speedups.py::TestRefcountLeaks covers:
- dumps / loads round-trip
- Scanner / Encoder construction
- Error paths in scanner_new / encoder_new (module_ref release)

The class is guarded by @skipUnless(hasattr(sys, 'gettotalrefcount'))
so it's inert on release builds and skips silently.

gate_ubuntu now depends on test_debug_build so the aggregate job
used by branch protection waits for it too.

TSan and a suppression-clean valgrind run would both require
further custom CPython builds and remain future work.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
claude added 4 commits April 11, 2026 01:25
- actions/checkout: v4 -> v6
- actions/cache: v4 -> v5
- docker/setup-qemu-action: v3 -> v4
- actions/upload-artifact: v4 -> v7
- actions/download-artifact: v4 -> v8

Also drop the strict warnings + -Werror step from test_debug_build.
Python 3.14's debug headers appear to trigger warnings that our
-Wshadow / -Wstrict-prototypes / -Werror combination doesn't tolerate.
The primary value of this job is runtime assertion coverage and the
TestRefcountLeaks suite, not catching compile warnings - we get those
on the non-debug builds anyway.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…tring_bitcount

The Python 3.14 debug CI failure was -Werror=shift-negative-value on
this line in encoder_new:

    s->min_long_size = PyLong_FromLongLong(-1LL << n);

Left-shifting a negative value is undefined behavior in C. It was
latent before because no prior build used -Wextra + -Werror, but it's
an actual bug. Fixed by computing -(2^n) as:

    -(long long)((1ULL << n) - 1ULL) - 1LL

which stays defined for every n in the range 1..63 and produces
LLONG_MIN at n == 63 (the boundary case). Verified with
int_as_string_bitcount = 1, 8, 32, 62, 63 that the boundary values
stringify correctly.

CI: switch test_debug_build from building CPython 3.14.4 from
source (~8 min cold) to `uv python install cpython-3.14.4+debug`,
which pulls a prebuilt debug interpreter from python-build-standalone
in under 10 seconds. No cache dance, no apt build-deps, no sudo.

Also bring back -Werror + -Wshadow + -Wstrict-prototypes on the
debug build now that the shift-negative-value bug is fixed.

No YAML changes needed for the earlier action bumps: upload-artifact
v4->v7 and download-artifact v4->v8 didn't rename or remove any
inputs we use (name, path, merge-multiple all still work).

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
v8.0.0 is the newest setup-uv release but the floating v8 tag has
not been published yet, so @v8 fails to resolve. Pinning to the
exact tag.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
REQUIRE_SPEEDUPS only affects setup.py's build fallback behavior -
it's a no-op on test invocations. If a built _speedups.so happens
to be importable but c_make_encoder/c_make_scanner don't get wired
into simplejson.encoder/simplejson.scanner (e.g. missing export,
stale cached .so), the test suite's TestMissingSpeedups silently
skipTest()s instead of failing CI.

Add explicit wiring checks to both test_free_threading and
test_debug_build that verify:
  simplejson.encoder.c_make_encoder is _speedups.make_encoder
  simplejson.scanner.c_make_scanner is _speedups.make_scanner

If either fails the job fails loudly instead of silent-skipping
every C-extension test.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
@etrepum etrepum changed the title from "Fix free-threading (3.14t) crashes: heap types + per-module state" to "Fix free-threading (3.14t) crashes: heap types, unified per-module state, new CI coverage" Apr 11, 2026
claude added 10 commits April 11, 2026 03:49
Five related fixes, all directly addressing risks I flagged during the
post-refactor review:

1. Multi-phase init (PEP 489) for all Python 3.5+. module_exec is now
   the single code path for every Python 3 version; the only internal
   branches are static types vs heap types (<3.13 vs 3.13+), and
   PyModule_AddObject vs PyModule_AddObjectRef (<3.10 vs 3.10+). The
   old moduleinit() single-phase path is gone on Python 3; it's still
   used on Python 2.7 via init_speedups(), and there's a 3.3/3.4
   fallback in PyInit__speedups that calls module_exec directly.

2. Pre-3.13 module-reload safety: init_speedups_state now calls a
   new reset_speedups_state_constants() helper that Py_CLEARs every
   constant before re-populating. If anyone does
   'del sys.modules["simplejson._speedups"]; import ...' the prior
   references are released instead of leaking. Type fields are left
   alone -- on 3.13+ module_exec freshly creates new heap types on
   each init, and on pre-3.13 the type fields hold borrowed pointers
   to static PyTypeObjects that must not be cleared.

3. PyUnicode_READY is deprecated in 3.10 and a no-op since 3.12
   (PEP 623). Override it to a no-op on 3.12+ so we stop calling
   the deprecated function and dodge the eventual removal.

4. Debug assert in get_speedups_state: every call site passes either
   the module object or an incref'd module_ref, so module != NULL is
   invariant. Asserting it catches uninitialized scanner/encoder
   instances in one place instead of sprinkling asserts at ~20 sites.

5. Latent undefined-behavior cleanup: replace SIZEOF_LONG_LONG (a
   Python-private pyconfig.h macro) with sizeof(long long) * CHAR_BIT
   in the int_as_string_bitcount path. Added #include <limits.h>.

Also speedups_clear on 3.13+ now calls reset_speedups_state_constants
instead of open-coding 17 Py_CLEAR lines (dedup with the reload path).

Test harness: TestRefcountLeaks used a naive single-phase comparison
that measured ~50-272 refcount drift per 2000 iterations on CPython
3.14 debug due to specializer inline caches settling in. Replaced
with a two-phase measurement: phase1 absorbs all the noise, phase2
should be ~0 even on 3.14 debug, and any real per-call leak shows up
as a large phase2. Verified stable across 5 consecutive runs on
cpython-3.14.4+debug from python-build-standalone.
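
The two-phase idea can be sketched as follows. This is an illustration under stated assumptions, not the TestRefcountLeaks code: the helper name two_phase_delta and its get_count parameter are hypothetical, and sys.gettotalrefcount is only present on debug builds, so the counter is injectable.

```python
import gc
import sys


def two_phase_delta(func, iterations=2000, get_count=None):
    """Run func in two equal phases and return the phase-2 count drift."""
    if get_count is None:
        # Only debug interpreters expose sys.gettotalrefcount.
        get_count = getattr(sys, "gettotalrefcount", None)
        if get_count is None:
            raise RuntimeError("needs a debug build or an explicit get_count")

    def run_phase():
        for _ in range(iterations):
            func()
        gc.collect()

    run_phase()           # phase 1: absorbs warm-up noise, result ignored
    before = get_count()
    run_phase()           # phase 2: a real per-call leak shows up here
    return get_count() - before


# Demo with a fake counter so the sketch runs on a release build too.
leaked = []
print(two_phase_delta(lambda: None, 10, get_count=lambda: len(leaked)))
```

A caller would compare the returned delta against a small tolerance rather than against exactly zero.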

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
test_boundary_at_max_bitcount exercises each of n = 1, 8, 31, 32, 62,
63 with values at and just inside +/- 2**n. The n = 63 case is the
regression test for the -Wshift-negative-value UB I fixed earlier
in encoder_new; if anyone ever reintroduces -1LL << n the LLONG_MIN
boundary case will fail this test instead of silently misbehaving.

The test uses the full +/- 2**n range including the sign bit position
(n = 63), which is precisely where the old computation was undefined.

(I also considered a TestRefcountLeaks.test_module_reload_no_leak for
the init_speedups_state reload-safety fix, but CPython's import
machinery caches extension modules for normal reload and the only
scenario that actually re-runs module_exec on pre-3.13 is a
subinterpreter import on 3.5-3.11, which I can't drive from a
unit test portably. The fix is kept as defense-in-depth.)

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
python-build-standalone ships both cpython-3.14.4+debug and
cpython-3.14.4+freethreaded+debug, so add a matrix dimension to the
test_debug_build job and run both. The free-threaded variant stacks
three coverage modes in a single job:

- Debug interpreter (refcount asserts, TestRefcountLeaks active)
- Free-threading (Py_GIL_DISABLED=1, GIL can be disabled)
- Strict compile warnings (-Wall -Wextra -Wshadow -Werror)

For the free-threaded variant we additionally:
- Assert sysconfig.get_config_var("Py_GIL_DISABLED") == 1 so a
  regression in the python-build-standalone asset naming can't
  silently give us a GIL-enabled debug build
- Run the test suite again with PYTHON_GIL=0 to surface any data
  races that the GIL was still papering over
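
The build-time and run-time conditions above are separate questions. A minimal sketch that reports both, assuming only the standard library; the helper name gil_status is hypothetical and this is not the CI step itself.

```python
import sys
import sysconfig


def gil_status():
    """Return (built_free_threaded, gil_enabled_now) for this interpreter."""
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older interpreters always
    # run with the GIL, so treat a missing probe as "enabled".
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = True if probe is None else bool(probe())
    return built_free_threaded, gil_enabled_now


built, enabled = gil_status()
print("free-threaded build:", built, "| GIL enabled now:", enabled)
```

On a free-threaded build the second value can still be True, which is why the suite is run a second time with PYTHON_GIL=0.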

Verified locally against cpython-3.14.4+freethreaded+debug: 344 tests
pass both with the GIL on and with PYTHON_GIL=0, clean compile with
full strict warnings.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
join_list_string used a function-local static PyObject *joinfn to
cache the bound method for ''.join on first call. Move it into
_speedups_state as JSON_EmptyStr_join, populated in
init_speedups_state alongside JSON_EmptyStr and released in
reset_speedups_state_constants.

Python 2 only, so no behavioral change on modern Python. Now
consistent with every other cached object in the module, and
init_speedups_state fails eagerly if the .join lookup fails at
module load time instead of on the first call site.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Factor the four _str/_unicode function pairs out of _speedups.c and
into a new templated header, _speedups_scan.h, that's #included twice
from _speedups.c with different macro settings: once for the unicode
variant (used by all Python 3 and Py2 unicode input) and once more on
Python 2 for the bytes variant.

The two variants only ever differed in:
- Character read: PyString byte vs PyUnicode_READ codepoint
- Data/length accessors: PyString_AS_STRING / PyString_GET_SIZE vs
  PyUnicode_DATA / PyUnicode_GET_LENGTH (+ PyUnicode_KIND)
- scanstring call: the str path passes an extra `encoding` arg
- Number fast paths: _str uses PyOS_string_to_double + PyInt_FromString
  on the underlying char buffer; _unicode goes through the
  parse_int / parse_float callables (with a PyFloat_FromString
  fast path). Factored into four small inline helpers
  (_match_number_{float,int}_fast_{str,unicode}).
- numstr substring creation: PyString_FromStringAndSize vs
  PyUnicode_Substring (or PyUnicode_FromUnicode on Py2 narrow)

Everything else -- the whole decode state machine, the error paths,
the recursion guards, the memoization -- is shared. 951 lines of
duplicated code removed from _speedups.c, 514 lines added in
_speedups_scan.h, for a net diff of -368 lines and one authoritative
source of truth for the parser logic.

scanstring_str and scanstring_unicode are intentionally NOT templated:
scanstring_str has Py2-specific hybrid return-type logic (bytes when
input is ASCII-only, unicode otherwise) that's structurally different
from the unicode path. That duplication remains as future work.

Verified on release Python 3.11 (169 tests) and on Python 3.14.4
debug + free-threaded debug under strict compile warnings
(-Wall -Wextra -Wshadow -Wstrict-prototypes -Werror). Py2 is
verified by inspection only since Py2 isn't available locally;
the macro expansion is straightforwardly equivalent to the deleted
_str functions.

Build wiring:
- setup.py lists _speedups_scan.h in Extension.depends so changes
  trigger a rebuild
- MANIFEST.in includes simplejson/*.h so the header ships in sdist

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
cibuildwheel v3.x disables PyPy by default and v3.4 errors out with
'Invalid skip selector: "pp*". This selector matches a group that
wasn't enabled.' when you explicitly skip it. Remove the line from
the v3.4.1 step.

The Python 2.7 step still needs CIBW_SKIP 'pp*' because it uses
cibuildwheel v1.12.0, which builds PyPy by default.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
The scanstring_str template macro referenced a local `encoding` variable
that was only declared in _parse_object_str (via the now-removed
JSON_SCAN_MAYBE_ENCODING_DECL). scan_once_str and, by inheritance, any
other function that used the JSON_SCAN_SCANSTRING_CALL macro didn't
declare `encoding`, so the _str template expansion referenced an
undefined identifier on Py2 builds.

Original scan_once_str side-stepped this by passing
PyString_AS_STRING(s->encoding) inline at the call site. Copy that
pattern into the JSON_SCAN_SCANSTRING_CALL macro so every _str
template expansion resolves `encoding` lookups via `s->encoding`
directly. The JSON_SCAN_MAYBE_ENCODING_DECL macro and its single
use in _parse_object become unnecessary and are deleted.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…type checks, cached attribute names

Five small improvements from the post-refactor review:

1. Fix -Wdeclaration-after-statement at two sites:
   - get_speedups_state wraps its 3.13+ path in an inner block so
     `void *state = PyModule_GetState(module);` is at the top of a
     scope (the mixed declaration-and-code blocker was my earlier
     assert(module) statement preceding the declaration).
   - maybe_quote_bigint declares `int ge, le;` at the top instead of
     interleaving.
   The file is now clean under -Wdeclaration-after-statement so it
   can be built against strict C89 too.

2. Consolidate duplicated test helpers. has_speedups() and
   skip_if_speedups_missing() were defined verbatim in
   test_speedups.py, test_free_threading.py, and test_subinterpreters.py.
   Moved to simplejson/tests/_helpers.py and imported from all three.

3. Drop trailing semicolons from the PyObject_CallNoArgs and
   PyObject_CallOneArg compat macros. The stray `;` made
   `return PyObject_CallOneArg(...)` expand to a statement followed
   by a null statement, which worked but was sloppy and would break
   inside expression contexts.

4. Remove the static `PyTypeObject PyScannerType` / `PyEncoderType`
   tentative forward declarations on pre-3.13. The four type-check
   macros now route through `_speedups_static_state.PyScannerType`
   etc, which matches the 3.13+ pattern of looking up the type via
   state. module_exec still sets `state->PyScannerType = (PyObject *)
   &PyScannerType` using the full static type body defined later in
   the file; no forward decl needed because the bodies come before
   module_exec in source order.

5. Cache hot-path attribute names in module state. Every call to
   _call_json_method used to call PyObject_GetAttrString("for_json")
   or "_asdict" which interned the C string on each call. Similarly
   encoder_dict_iteritems did GetAttrString("sort") per sorted dict,
   and encoder_listencode_obj did GetAttrString("encoded_json") per
   RawJSON value. Added four new state fields (JSON_attr_for_json,
   JSON_attr_asdict, JSON_attr_sort, JSON_attr_encoded_json),
   initialized once via JSON_InternFromString, visited/cleared as
   usual. _call_json_method now takes a PyObject * instead of
   const char * and uses PyObject_GetAttr directly. The old
   FOR_JSON_METHOD_NAME and ASDICT_METHOD_NAME #defines are gone.

Verified: 169 tests on Python 3.11 release, 342 on 3.14.4 debug,
344 on 3.14.4 free-threaded debug (both GIL on and PYTHON_GIL=0),
all under -Wall -Wextra -Wshadow -Wstrict-prototypes -Werror.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Collects the things that aren't obvious from reading the source but
that come up repeatedly when working on this project:

- How to get alternate Python variants (debug, free-threaded debug)
  via uv + python-build-standalone in ~10 seconds instead of
  building CPython from source
- Why _cibw_runner must be used instead of unittest discover (it
  re-runs the suite with speedups disabled to exercise the pure-
  Python path)
- No local way to test Py2.7 and the implied feedback-loop cost
- CI raw logs require admin perms; debugging workflow when only the
  step annotation is visible
- cibuildwheel v3 vs v1.12 PyPy default-enable differences
- Extension module reload doesn't actually re-run module_exec
- Refcount leak tests on 3.14 debug need a two-phase measurement
- Python 3.14 release adds -Wunreachable-code to default CFLAGS
- _speedups_scan.h is #included twice and relies on #undef cleanup
- Don't Py_CLEAR type fields in reset_speedups_state_constants
- REQUIRE_SPEEDUPS only affects setup.py, not tests; hence the
  explicit wiring check in test_free_threading / test_debug_build
- GH Actions floating v8 tag for setup-uv doesn't exist (yet)
- Test helpers are in simplejson/tests/_helpers.py

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…cro, encoder format string, more

Eight assorted improvements from the second post-refactor review:

1. Delete the unused PY3_UNUSED macro (defined twice under
   PY_MAJOR_VERSION, never referenced anywhere).

2. Merge the `static\nPyTypeObject Foo = {` two-line header into
   one line for the pre-3.13 type bodies. Purely cosmetic.

3. Extract encoder_steal_encode() helper for the for_json / _asdict
   recursive-encode dance. Both paths in encoder_listencode_obj were
   ~12 lines of repeated "check newobj != NULL, Py_EnterRecursiveCall,
   recurse, Py_DECREF, Py_LeaveRecursiveCall" boilerplate. The
   helper takes an as_dict flag to select between
   encoder_listencode_obj (for_json) and encoder_listencode_dict +
   TypeError-on-non-dict (_asdict). The two call sites collapse to
   one line each.

4. Scanner_new grew a local LOAD_ATTR(field, name) macro that
   consolidates the "getattr or goto bail" pattern into a single
   line per attribute, replacing 7 three-line blocks with 6
   one-line loads (the encoding attr still has its own handling
   because it goes through JSON_ParseEncoding).

5. encoder_new's 20-char "OOOOOOOOOOOOOOOOOOOO:make_encoder" format
   string is now built from per-argument pieces, each on its own
   line with a comment naming the parameter. Adding/removing an
   encoder argument now touches one line instead of requiring a
   mental count of O's in a string literal.

6. JSON_Accu_Destroy gained a comment documenting that it's safe to
   call after JSON_Accu_FinishAsList (which already cleared
   small_strings and transferred large_strings).

7. maybe_quote_bigint inverted its top-level condition for an early
   return on the common "int_as_string_bitcount is None" fast path,
   reducing the nested indentation for the whole function body.
   Also dropped an intermediate `encoded = quoted;` assignment in
   favor of a direct `return quoted;`.

8. _speedups_scan.h now requires the caller to
   `#define JSON_SPEEDUPS_SCAN_INCLUDING 1` before #include and
   `#undef` it after. Accidentally including the template file from
   anywhere other than _speedups.c (or forgetting the setup
   preamble) is now a hard #error. Added an example block to the
   file header.

Verified: 169 tests on Python 3.11 release, 344 on 3.14.4 debug,
344 on 3.14.4 free-threaded debug (both GIL on and PYTHON_GIL=0),
all under -Wall -Wextra -Wshadow -Wstrict-prototypes -Werror.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
@etrepum etrepum changed the title Fix free-threading (3.14t) crashes: heap types, unified per-module state, new CI coverage Fix free-threading (3.14t) crashes: heap types, unified per-module state, templated parser, new CI coverage Apr 11, 2026
claude added 2 commits April 11, 2026 17:12
…r, const correctness

- Extract encoder_long_to_str() to replace the duplicated "normalize
  through PyLong_Type if not exact, then PyObject_Str" dance in
  encoder_stringify_key and encoder_listencode_obj.
- Remove dead PyObject *items local in encoder_listencode_dict (declared
  + XDECREF'd on bail but never assigned).
- Remove dead "TODO: DOES NOT RUN" indent blocks in encoder_listencode_dict
  and encoder_listencode_list. encoder.py only invokes the C encoder when
  self.indent is None, so s->indent != Py_None is unreachable.
- Drop the redundant ident = NULL; statement in encoder_listencode_list
  (the declaration already initializes it).
- Tighten const-correctness on string parameters: raise_errmsg,
  import_dependency, and scanstring_str now take const char * for their
  read-only string arguments.

Pure cleanup, no behavior change. 344 tests pass on cpython-3.14 release,
cpython-3.14+debug, and cpython-3.14+freethreaded+debug.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
… note

Item A: encoder_listencode_dict key_memo cache was storing non-string
key encodings under `key` (original) while looking them up under
`kstr` (stringified form), so the cache was write-only for int / bool
/ Decimal / float keys. The pre-existing comment already said
"Only cache the encoding of string keys" — the code now matches
intent: take the cache branch only when key is a str (or bytes on
Py2), otherwise encode straight through without touching key_memo.
Pure efficiency fix; output is byte-for-byte identical, and output
tests for string, int, bool, mixed, float, and Decimal keys verify
the branch split.
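
The write-only behaviour is easy to reproduce with a plain dict. The sketch below is a Python model of the two shapes, not the C code: the function names are hypothetical, and str(key) stands in for the encoder's real key stringification.

```python
import json


def encode_key_buggy(key, memo, stats):
    """Old shape: look up under kstr, store under the original key."""
    kstr = key if isinstance(key, str) else str(key)  # simplified stringify
    cached = memo.get(kstr)
    if cached is not None:
        stats["hits"] += 1
        return cached
    encoded = json.dumps(kstr)
    memo[key] = encoded  # for an int key this entry is never read back
    return encoded


def encode_key_fixed(key, memo, stats):
    """New shape: only string keys touch the cache."""
    if not isinstance(key, str):
        return json.dumps(str(key))
    cached = memo.get(key)
    if cached is not None:
        stats["hits"] += 1
        return cached
    encoded = memo[key] = json.dumps(key)
    return encoded


memo, stats = {}, {"hits": 0}
for _ in range(3):
    encode_key_buggy(7, memo, stats)
print(stats["hits"], memo)
```

Both shapes return the same encoded text; the fixed one just stops filling the memo with entries that can never be hit.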

Item B: six new TestRefcountLeaks subtests exercise error paths
that the previous cleanup batches touched:

  - test_circular_reference_no_leak: ValueError mid-encode must not
    leak the markers entry or the ident PyLong.
  - test_asdict_returning_non_dict_no_leak: the TypeError path in
    encoder_steal_encode must release the stolen newobj.
  - test_for_json_raising_no_leak: exception inside the for_json()
    body must not leak the method binding.
  - test_non_string_dict_keys_no_leak: exercises the new non-cached
    key_memo branch from item A.
  - test_bigint_as_string_no_leak: maybe_quote_bigint's quoted-return
    path must release the replaced `encoded`.

All six pass on cpython-3.14.4+debug with phase2 delta within the
100-ref tolerance.

Item I: AGENTS.md "Useful CFLAGS combinations" now documents that
-Wdeclaration-after-statement trips on cp314t's own refcount.h
(mixed decls + code around line 113) and can only be used against
the standard and standard-debug builds.

https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
@etrepum etrepum added this pull request to the merge queue Apr 11, 2026
Merged via the queue into master with commit e817370 Apr 11, 2026
36 checks passed
@etrepum etrepum deleted the claude/fix-freethreading-builds-bZTdr branch April 11, 2026 19:27
etrepum pushed a commit that referenced this pull request Apr 11, 2026
…emRef for key_memo

Two targeted cleanups to the scanner/encoder hot paths, motivated by
reviewing the code that landed in #367 and comparing against CPython's
_json.c and PR #344. No behavior change; just fewer dict lookups and
cleaner use of the modern strong-reference dict APIs available on
Python 3.13+.

Scanner (_parse_object memo intern):
  The GetItemWithError -> Py_INCREF / PyDict_SetItem dance did two
  hashtable probes for every fresh key. Collapse it to a single
  PyDict_SetDefault (or PyDict_SetDefaultRef on 3.13+), which atomically
  gets-or-sets in one pass. Factored into json_memo_intern_key so the
  _unicode and _str template instantiations share one implementation,
  and the 3.13+ fast path is isolated in one place. The `memokey`
  temporary is gone, the loop body drops 13 lines to 6, and unique-key
  JSON decoding touches the memo dict half as often.
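
At the Python level the same get-or-set collapse looks like this. It is a sketch of the idea using dict.setdefault, not the C helper json_memo_intern_key; the function names are hypothetical.

```python
def intern_key_two_step(memo, key):
    """Old shape: one lookup, then a second probe to store on a miss."""
    existing = memo.get(key)
    if existing is None:
        memo[key] = key
        existing = key
    return existing


def intern_key(memo, key):
    """New shape: a single get-or-set call."""
    return memo.setdefault(key, key)


memo = {}
first = "".join(["na", "me"])   # two equal strings built at run time
second = "".join(["na", "me"])
canonical = intern_key(memo, first)
print(intern_key(memo, second) is canonical)
```

Every later occurrence of an equal key resolves to the first object stored, so repeated object keys in one document share a single string.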

Encoder (key_memo cache lookup in encoder_listencode_dict):
  Replace the GetItemWithError + manual Py_INCREF + PyErr_Occurred check
  with a call to a new json_PyDict_GetItemRef helper. On 3.13+ this
  forwards to PyDict_GetItemRef, which atomically returns a strong
  reference and eliminates the borrowed-reference window that is
  technically racy under free threading even under the coarse self
  critical section. On older Pythons the helper falls back to the
  legacy idiom. The caller becomes a single rc-based branch, and the
  Py_CLEAR(kstr) is no longer duplicated across three arms.

Both changes compile cleanly under -Wall -Wextra -Wshadow
-Wstrict-prototypes -Wdeclaration-after-statement -Werror on CPython
3.11, and under the default CFLAGS on CPython 3.14.0rc2 free-threaded.
Full _cibw_runner suite (354 tests, C + pure-Python passes) passes on
both. 16-thread x 5000-iter stress test on a shared JSONDecoder /
JSONEncoder passes with the GIL disabled.

Explicitly not changed:
- Py_BEGIN_CRITICAL_SECTION(self) in scanner_call and encoder_call.
  The scanner needs it because PyDict_Clear(s->memo) at end-of-call
  would race with concurrent scan_once calls if we switched to a
  per-dict lock; the encoder uses it defensively but c_make_encoder
  is called fresh per JSONEncoder.iterencode() call in the normal
  API flow, so the lock is uncontended in practice. Fine-grained
  container locks (CPython-style, see PR #344 discussion) would only
  help the unusual case of an explicitly shared encoder across
  threads, and the win does not justify the refactor.

https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz
github-merge-queue bot pushed a commit that referenced this pull request Apr 12, 2026
* Modernize scanner/encoder dict ops: SetDefault for memo intern, GetItemRef for key_memo

* Deduplicate scanner/encoder hot paths: markers helpers, default extract, SKIP_WHITESPACE, field X-macros, n format

Five mechanical cleanups to the C extension, none of which change
behavior. Together they remove ~190 lines of duplication and close
several classes of recurring bug.

encoder_markers_push / encoder_markers_pop (previously duplicated in 3
places):
  The circular-reference marker pattern — PyLong_FromVoidPtr(obj),
  PyDict_Contains, PyDict_SetItem on push, and PyDict_DelItem +
  Py_DECREF on pop — appeared verbatim in encoder_listencode_obj,
  encoder_listencode_dict, and encoder_listencode_list. Three recent
  bug fixes (#358, #360, aa9182d) patched individual sites; factoring
  into two helpers collapses ~60 lines, and any future fix lives in
  one place. The NULL-sentinel convention on ident lets callers invoke
  markers_pop unconditionally on the happy path.
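
A Python model of the push/pop pair, assuming id(obj) plays the role of PyLong_FromVoidPtr(obj) and None plays the role of the NULL sentinel. The function names mirror the C helpers but this is a sketch, not a port; count_leaves is a hypothetical caller.

```python
def markers_push(markers, obj):
    """Record obj as in progress; return its ident, or None if untracked."""
    if markers is None:
        return None
    ident = id(obj)
    if ident in markers:
        raise ValueError("Circular reference detected")
    markers[ident] = obj
    return ident


def markers_pop(markers, ident):
    """Safe to call unconditionally: a None ident means nothing was pushed."""
    if ident is not None:
        del markers[ident]


def count_leaves(obj, markers):
    """Hypothetical caller: walk nested lists and dicts, counting scalars."""
    if not isinstance(obj, (list, dict)):
        return 1
    ident = markers_push(markers, obj)
    children = obj.values() if isinstance(obj, dict) else obj
    total = sum(count_leaves(child, markers) for child in children)
    markers_pop(markers, ident)
    return total


print(count_leaves([1, [2, 3], {"a": 4}], {}))
```

With the sentinel convention the caller needs no `if markers is not None` branch around the pop.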

encoder_listencode_default extraction:
  The inner else { ... } of encoder_listencode_obj (RawJSON + iterable
  fallback + markers-tracked defaultfn recursion) lived inline with
  nested `break` into an outer do { } while(0) and a stray indentation
  level from an unbraced scope. Extract it verbatim into its own
  function that returns 0/-1 directly, so the main dispatch loop is a
  clean chain of else-if arms with no `break` inside the final arm.

SKIP_WHITESPACE() macro in _speedups_scan.h:
  `while (idx <= end_idx && IS_WHITESPACE(JSON_SCAN_READ(idx))) idx++;`
  appeared 8 times across _parse_object and _parse_array. Collapse to
  a macro defined alongside JSON_SCAN_FN / JSON_SCAN_CONCAT, #undef'd
  at the bottom of the template so the multi-include pattern stays
  hygienic.

PyArg_ParseTuple "n" format code replaces _convertPyInt_AsSsize_t /
_convertPyInt_FromSsize_t:
  The custom O& converter predates broad "n" (Py_ssize_t) support in
  PyArg_ParseTuple and Py_BuildValue. Both have supported "n" since
  Python 2.5 — the simplejson floor — so we can drop the two wrappers
  and use "n" directly in py_scanstring, scanner_call, encoder_call,
  and raise_errmsg. Saves an indirect function call per parse on three
  hot entry points.

JSON_SCANNER_OBJECT_FIELDS / JSON_ENCODER_OBJECT_FIELDS X-macros:
  scanner_traverse + scanner_clear and encoder_traverse +
  encoder_clear all listed the same fields 2x — an easy place to
  forget a field when adding one (exactly the bug fixed in c23e6d9).
  Collapse to an X-macro field list adjacent to each struct
  definition, used with JSON_VISIT_FIELD / JSON_CLEAR_FIELD local
  expansions. Adding a new PyObject* field now needs one line in the
  X-macro instead of a matching line in both the traverse and clear
  functions.

Verification:
- Strict CFLAGS build on CPython 3.11: -Wall -Wextra -Wshadow
  -Wstrict-prototypes -Wdeclaration-after-statement -Werror, clean
- Default CFLAGS build on CPython 3.14.0rc2 free-threaded: clean
- Full _cibw_runner suite on both (354 tests, C + pure-Python paths):
  354/354 pass
- Targeted correctness tests on 3.14t: for_json / _asdict / default /
  iterable_as_array / RawJSON / circular detection on all three
  encoder sites (dict, list, default)
- 16-thread x 5000-iter stress on a shared JSONDecoder and
  JSONEncoder with the GIL disabled: no mismatches, no races

https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz

* Encoder cleanup: all-string-keys dict sort fast path, T_OBJECT -> Py_T_OBJECT_EX

Two independent improvements bundled together because they touch
adjacent code.

#5 — encoder_dict_iteritems fast path for all-string keys:

  Sorted dict encoding with sort_keys=True (or a custom item_sort_key)
  used to go through a double-iteration loop: PyDict_Items produced
  a list, then the code walked it with PyIter_Next, type-checked each
  key, and PyList_Append'd a rebuilt list to sort. For the
  overwhelmingly common case of string-keyed JSON objects this was
  all wasted work — every tuple was kept verbatim and the "slow"
  rebuild list was just a duplicate of the items list.

  Add a fast path: if every key in the items list is already a JSON-
  compatible string (PyUnicode on all versions, plus PyString on
  Python 2), sort `items` in place via the shared
  encoder_sort_items_inplace helper and return iter(items). No
  per-item tuple reallocation, no list alloc, no stringify branch
  in the hot loop.

  On any non-string key the pre-scan bails out and falls through to
  the existing stringify-and-rebuild path, so the slow path is
  preserved exactly as before. Factored the list.sort() call into
  encoder_sort_items_inplace and the "is this a JSON string key"
  test into is_json_string_key so the two paths share one source of
  truth.

  Measured on CPython 3.14t free-threaded, 200-entry string-keyed
  dict with 3-element list values: sort_keys=True is now 0.204 ms/op
  vs 0.197 ms/op for the unsorted path — ~4% overhead, essentially
  just the cost of sorting itself. Previously the double-walk and
  list rebuild added substantial constant-factor overhead on top.
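
  The control flow of the two paths, modelled in Python. This is a sketch under assumptions: sorted_items is a hypothetical name, sorting by the key element stands in for item_sort_key, and str stands in for the encoder's key stringification.

```python
from operator import itemgetter


def sorted_items(d, stringify=str):
    """Return an iterator over (key, value) pairs sorted by key."""
    items = list(d.items())
    # Fast path: every key is already a string, so sort the list in place.
    if all(isinstance(key, str) for key, _ in items):
        items.sort(key=itemgetter(0))
        return iter(items)
    # Slow path: stringify non-string keys into a rebuilt list, then sort.
    rebuilt = []
    for key, value in items:
        if not isinstance(key, str):
            key = stringify(key)
        rebuilt.append((key, value))
    rebuilt.sort(key=itemgetter(0))
    return iter(rebuilt)


print(list(sorted_items({"b": 1, "a": 2})))
```

  The pre-scan costs one pass over the keys, which is cheaper than allocating a second list and a new tuple per item.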

#6 — T_OBJECT -> Py_T_OBJECT_EX on all member descriptors:

  T_OBJECT is deprecated in Python 3.12+ in favor of the new public
  spelling Py_T_OBJECT_EX. The semantic difference is that
  T_OBJECT returns Py_None when the underlying slot is NULL, while
  Py_T_OBJECT_EX raises AttributeError. Keep the Python-visible
  behavior unchanged by:

  1. Defining Py_T_OBJECT_EX to T_OBJECT_EX on pre-3.12 (both
     available via <structmember.h>, identical semantics), so the
     modern spelling compiles on the full 2.5+ version range
     simplejson supports.
  2. Switching encoder_new to store Py_None rather than NULL when
     encoding=None on Python 3, so the .encoding attribute still
     returns None (as it did under T_OBJECT) rather than raising
     AttributeError under Py_T_OBJECT_EX.
  3. Updating the two bytes-handling sentinel checks
     (encoder_stringify_key and encoder_listencode_obj) from
     `s->encoding != NULL` to `s->encoding != Py_None` so the
     internal "is encoding configured" test matches the new
     representation.

  All 20 members across scanner_members and encoder_members updated
  in one pass.
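
  The NULL-versus-None distinction has a pure-Python analogue: an unset `__slots__` member raises AttributeError, which is the Py_T_OBJECT_EX behaviour, while a slot explicitly set to None reads back as None. A small sketch; the class name EncoderLike is hypothetical and is not the C Encoder type.

```python
class EncoderLike:
    """Stand-in with one slot, mimicking a member backed by a pointer."""
    __slots__ = ("encoding",)


enc = EncoderLike()
try:
    enc.encoding          # slot is unset, like a NULL pointer
    unset_raises = False
except AttributeError:
    unset_raises = True

enc.encoding = None       # like storing Py_None instead of NULL
print(unset_raises, enc.encoding)
```

  Storing None explicitly is what keeps the attribute readable after the switch in member type.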

Verification:

- Strict CFLAGS on CPython 3.11: -Wall -Wextra -Wshadow
  -Wstrict-prototypes -Wdeclaration-after-statement -Werror, clean
- Default CFLAGS on CPython 3.14.0rc2 free-threaded: clean
- Full _cibw_runner suite (354 tests, C + pure-Python) on both: OK
- Targeted tests for encoder_dict_iteritems paths: regular dict /
  OrderedDict / dict subclass / empty dict / sort_keys=True with all
  string keys (fast path) / mixed string-and-int keys (slow path) /
  int keys / float keys / skipkeys+non-string / custom item_sort_key
  / unicode keys — all pass
- encoding=None on Py3 round-trip + bytes-key rejection with
  encoding=None: behavior preserved
- 16-thread x 5000-iter stress on shared JSONEncoder with
  sort_keys=True under free threading: no mismatches

https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz

---------

Co-authored-by: Claude <noreply@anthropic.com>