02901f5 to 0bbd582
Convert Scanner and Encoder from static PyTypeObject to heap types via PyType_FromSpec on Python 3.13+, fixing SIGSEGV crashes on Python 3.14t (free-threaded) builds across macOS and Windows.
Changes to _speedups.c (all guarded by #if PY_VERSION_HEX >= 0x030D0000):
- Add _speedups_state module state struct holding heap type objects
- Create heap types from PyType_Spec in module_exec (multi-phase init)
- Proper heap type lifecycle: save tp before tp_free, Py_DECREF(tp) in dealloc; Py_VISIT(Py_TYPE(self)) in traverse
- Add speedups_traverse/speedups_clear for module state GC
- Declare Py_mod_gil = Py_MOD_GIL_NOT_USED for free-threaded Python
- Guard PyScanner_Check/PyEncoder_Check macros for pre-3.13 only
- Pre-3.13 code paths remain completely unchanged
Changes to CI:
- Update cibuildwheel v3.2.1 -> v3.4.1 with CIBW_ENABLE for free-threading wheel builds
- Add dedicated test_free_threading CI job for Python 3.14t
- Parallelize Linux wheel builds by architecture
- Update actions/setup-python to v5.6.0
Tests:
- Add TestHeapTypes class verifying Py_TPFLAGS_HEAPTYPE, GC tracking, and basic encode/decode functionality on 3.13+
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
0bbd582 to e0018d6
cibuildwheel v1.12.0 does not have cp27 manylinux images for aarch64, so restrict the 2.7 build step to x86_64 only. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Expand the module state struct to hold ALL constants (string literals, RawJSONType, JSONDecodeError, JSON_itemgetter0) in addition to the heap type objects already there. Thread state through all functions using local shadow variables that override the file-scope globals on 3.13+, minimizing changes to function bodies. Utility functions (raise_errmsg, join_list_unicode, _encoded_const, is_raw_json, scanstring_unicode) get extra parameters on 3.13+ with convenience macros at call sites. Add module_ref (strong ref to owning module) to PyScannerObject and PyEncoderObject for accessing per-module state from any scanner or encoder method. Use PyType_FromModuleAndSpec to bind heap types to their module. Initialize all constants in module_exec instead of init_constants(). Pre-3.13 code path remains completely unchanged. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
The workflow was refactored to split builds by architecture, which changed the job names. Add gate jobs that aggregate the new granular jobs under the old names expected by branch protection rules:
- "Build wheels on ubuntu-latest" (also gates on free-threading + pure python tests)
- "Build wheels on windows-latest"
- "Build wheels on macos-latest"
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Now that the C extension declares Py_MOD_GIL_NOT_USED and has heap types + per-module state, the speedups are safe to use on 3.14t. The is_gil_enabled() guard was only a temporary workaround while the C extension was unsafe under free-threading. Removing the guard means _speedups is actually loaded and exercised on cp314t in CI, which is what we need to be testing. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Previously the C extension had two parallel code paths: pre-3.13 used file-scope static globals for type objects and interned constants, while 3.13+ used per-module state via PyModule_GetState. Every function that touched constants had #if guards to pick the right path, and several helper functions had dual signatures.
Now _speedups_state is defined for all versions:
- On Python 3.13+ it remains per-module state, so subinterpreters each get their own copy.
- On older versions there's a single static instance plus a borrowed reference to the module object so Scanner and Encoder instances can store module_ref uniformly.
get_speedups_state(module_ref) works the same way everywhere, so function bodies no longer branch on PY_VERSION_HEX for state access. The convenience macros (RAISE_ERRMSG, ENCODED_CONST, IS_RAW_JSON, JOIN_LIST_UNICODE, SCANSTRING_UNICODE) collapse to single definitions and the dual-signature helpers become single functions.
A new init_speedups_state() helper handles constant initialization for both module_exec (3.13+) and moduleinit (pre-3.13), replacing the standalone init_constants() function.
Net result: 329 lines removed, 133 added. Only the type creation path (static PyTypeObject vs PyType_FromModuleAndSpec heap types) and module init shape (single-phase vs multi-phase) remain version-specific.
Verified: all 153 tests pass with the C extension loaded on Python 3.11, PYTHON_GIL=0 import produces no RuntimeWarning, round-trip encode/decode works for all value types.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
The previous commit removed file-scope statics like JSON_NaN, JSON_Infinity, JSON_NegInfinity, JSONDecodeError, etc. but missed that the Python 2 scanner code paths reference some of them directly as bare names.
Fixes for Python 2:
- Rename raise_errmsg() function to raise_errmsg_impl() so a Python 2-only macro can re-introduce the 3-arg form, transparently pulling JSONDecodeError from _speedups_static_state. All existing raise_errmsg(msg, s, end) call sites in Python 2 scanner functions keep working without modification.
- Add a shadow block at the top of scan_once_str (Python 2 only) for JSON_NaN, JSON_Infinity, JSON_NegInfinity, mirroring the pattern used by scan_once_unicode.
Verified: all 153 tests still pass with C speedups on Python 3.11.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Previously PyScannerType and PyEncoderType lived in the struct only on 3.13+. On pre-3.13 they stayed as file-scope statics referenced directly via &PyScannerType. That left the state struct layout slightly different across versions for no real reason. Now the type fields are always in the struct. On pre-3.13 they hold borrowed pointers to the (eternal) static PyTypeObject instances. No refcounting or GC tracking is needed — static types live forever — but the struct layout is fully uniform. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Two regressions from bumping cibuildwheel 3.2.1 -> 3.4.1:
1. cibuildwheel 3.4.1 ships a newer bundled virtualenv that fails to bootstrap Python 3.8 on Windows, so cp38-win32 and cp38-win_amd64 builds crash during virtualenv creation. Python 3.8 has been EOL since October 2024, and skipping it on Windows only (rather than all platforms) minimizes the impact on existing users.
2. The previous CIBW_SKIP: 'pp*' line was accidentally dropped when splitting the matrix by architecture. PyPy wheels were never part of the release set -- PyPy is covered by the test_pure_python job.
This is purely a CI environment fix; no code changes.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
- Remove CIBW_ENABLE: 'cpython-freethreading'. cibuildwheel warns that this option is deprecated and should be removed; in 3.4+ free-threaded builds are the default for cp313t/cp314t.
- Revert the cp38-win* skip; the Windows failure was a timeout, not a Python 3.8 virtualenv issue as I previously assumed.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Two new test files, kept separate from test_speedups.py because they exercise Python runtime features (concurrent threads, subinterpreters) rather than C-extension internals:
simplejson/tests/test_free_threading.py (TestFreeThreading)
- test_concurrent_encode / test_concurrent_decode: 8 threads x 500 iterations each, verifying stable output
- test_concurrent_encode_decode: mixes encode and decode on the same data across threads
- test_shared_encoder_instance: a single JSONEncoder and JSONDecoder instance used by many threads
These pass with the GIL enabled, but exist to catch data races on free-threaded builds (PEP 703). The test_free_threading CI job runs them with PYTHON_GIL=0 on 3.14t.
simplejson/tests/test_subinterpreters.py (TestSubinterpreters)
- import, encode, decode in a fresh subinterpreter
- multiple subinterpreters used concurrently
- destroying one subinterpreter while another still uses the module
- heap types stay heap types inside a subinterpreter (3.13+)
Skipped on Python < 3.12 (subinterpreters require PEP 684).
CI workflow (.github/workflows/build-and-deploy.yml):
- New 'Verify Python is a free-threaded build' step in the test_free_threading job asserts sysconfig.Py_GIL_DISABLED == 1, so the job fails loudly if setup-python ever gives us a GIL-enabled build under the 3.14t moniker.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Helper functions that previously took individual state fields as arguments now take a single _speedups_state pointer and dereference what they need:
- raise_errmsg_impl(msg, s, end, state)
- _encoded_const(obj, state)
- is_raw_json(obj, state)
- join_list_unicode(lst, state)
- scanstring_unicode(..., state)
The convenience macros RAISE_ERRMSG, ENCODED_CONST, IS_RAW_JSON, JOIN_LIST_UNICODE, SCANSTRING_UNICODE are now pointless pass-throughs and are deleted. Call sites use the function names directly.
Caller functions no longer need per-field shadow declarations (PyObject *JSON_s_null = _st->JSON_s_null; etc.). They only need the _st pointer itself, which is passed through.
JSON_Accu now stores the state pointer instead of a borrowed empty_unicode reference, keeping the struct version-independent and letting flush_accumulator pass state to join_list_unicode directly.
The Python 2 raise_errmsg macro shim remains but is simpler: it now passes &_speedups_static_state instead of resolving JSONDecodeError at the macro site.
Net: 130 lines removed, 78 added. All 163 tests still pass.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Three polish items after the state-based helper refactor:
1. Rename all local _speedups_state pointers from '_st' to 'state' for consistency with the parameter name used in helper signatures.
2. Rename raise_errmsg_impl back to raise_errmsg. This required updating every Python 2 scanner function (scanstring_str, _parse_object_str, _parse_array_str, _match_number_str, scan_once_str, py_scanstring) to thread 'state' explicitly instead of relying on a macro shim that resolved state from the static instance. scanstring_str and join_list_string grew an explicit state parameter. The Python 2 raise_errmsg macro is gone.
3. Remove the state field from JSON_Accu. JSON_Accu_Init is back to taking just the accu, and state flows through JSON_Accu_Accumulate, JSON_Accu_FinishAsList, flush_accumulator, and _steal_accumulate as an explicit parameter. The encoder call sites already had state in scope, so the extra arg is free.
Net diff: 129 deletions, 127 additions. All 163 tests still pass.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Context/environment arguments are conventionally placed first, not last. Reorder every helper that takes _speedups_state so that state is the first parameter:
- raise_errmsg(state, msg, s, end)
- _encoded_const(state, obj)
- is_raw_json(state, obj)
- join_list_unicode(state, lst)
- join_list_string(state, lst)
- scanstring_unicode(state, pystr, end, strict, next_end_ptr)
- scanstring_str(state, pystr, end, encoding, strict, next_end_ptr)
- flush_accumulator(state, acc)
- JSON_Accu_Accumulate(state, acc, unicode)
- JSON_Accu_FinishAsList(state, acc)
- _steal_accumulate(state, accu, stolen)
init_speedups_state(state, module) already had state first.
Pure reordering: 106 insertions, 106 deletions. All 163 tests pass.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
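A minimal sketch of the state-first convention, assuming a trimmed-down stand-in for _speedups_state. The names demo_state, demo_encoded_const and DEMO_STATE are hypothetical, and plain C strings replace the PyObject pointers so the snippet compiles without Python.h:

```c
/* Hypothetical, trimmed-down state struct. The real one holds PyObject
 * pointers; C strings are used here only to keep the sketch standalone. */
typedef struct {
    const char *s_null;
    const char *s_true;
    const char *s_false;
} demo_state;

/* The state (context) pointer comes first, then the value operated on. */
static const char *
demo_encoded_const(const demo_state *state, int is_none, int truth)
{
    if (is_none) {
        return state->s_null;
    }
    return truth ? state->s_true : state->s_false;
}

/* One shared instance, playing the role of the per-module state. */
static const demo_state DEMO_STATE = { "null", "true", "false" };
```

A caller that already holds the state pointer passes it through unchanged, e.g. demo_encoded_const(&DEMO_STATE, 0, 1), so no per-field shadow variables are needed.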
actions/setup-python@v5 automatically resolves to the latest v5.x release (currently v5.6.0), so the exact pin is unnecessary. This matches the convention on master. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Three warnings flagged by -Wshadow (pre-existing, not from the free-threading refactor, but easy to fix):
- scanstring_unicode: local 'digit' shadowed the 'digit' type from Python's longintrepr.h. Renamed to 'hex_digit' (4 sites across the Python 2 and Python 3 hex-decoding loops).
- encoder_listencode_obj: inner 'PyObject *newobj' declaration shadowed an outer one. The outer variable is reused by the inner block; the earlier branches that use it (for_json, _asdict) always DECREF before falling through, so it's safe to drop the inner redeclaration.
The extension now compiles cleanly under -Wall -Wextra -Wshadow -Wstrict-prototypes with no new warnings.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Builds CPython 3.14.0 from source with --with-pydebug, caches the install prefix (~10 min first run, ~10 sec on cache hit), and runs the test suite against it. This catches:
- Refcount leaks (via TestRefcountLeaks, auto-enabled when sys.gettotalrefcount is present)
- Py_DECREF asserts, NULL-pointer dereferences, and internal consistency checks that release builds skip
- Shadow/strict warnings (-Wall -Wextra -Wshadow -Werror)
Stock Ubuntu python3-dbg only offers 3.12, and neither actions/setup-python nor the deadsnakes PPA ships -dbg packages for 3.13/3.14, so building from source (with caching) is the only route to a 3.14 debug interpreter.
New simplejson/tests/test_speedups.py::TestRefcountLeaks covers:
- dumps / loads round-trip
- Scanner / Encoder construction
- Error paths in scanner_new / encoder_new (module_ref release)
The class is guarded by @skipUnless(hasattr(sys, 'gettotalrefcount')) so it's inert on release builds and skips silently.
gate_ubuntu now depends on test_debug_build so the aggregate job used by branch protection waits for it too.
TSan and a suppression-clean valgrind run would both require further custom CPython builds and remain future work.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
3.14.4 is the latest stable release. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
- actions/checkout: v4 -> v6
- actions/cache: v4 -> v5
- docker/setup-qemu-action: v3 -> v4
- actions/upload-artifact: v4 -> v7
- actions/download-artifact: v4 -> v8
Also drop the strict warnings + -Werror step from test_debug_build. Python 3.14's debug headers appear to trigger warnings that our -Wshadow / -Wstrict-prototypes / -Werror combination doesn't tolerate. The primary value of this job is runtime assertion coverage and the TestRefcountLeaks suite, not catching compile warnings - we get those on the non-debug builds anyway.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…tring_bitcount
The Python 3.14 debug CI failure was -Werror=shift-negative-value on
this line in encoder_new:
s->min_long_size = PyLong_FromLongLong(-1LL << n);
Left-shifting a negative value is undefined behavior in C. It was
latent before because no prior build used -Wextra + -Werror, but it's
an actual bug. Fixed by computing -(2^n) as:
-(long long)((1ULL << n) - 1ULL) - 1LL
which stays defined for every n in the range 1..63 and produces
LLONG_MIN at n == 63 (the boundary case). Verified with
int_as_string_bitcount = 1, 8, 32, 62, 63 that the boundary values
stringify correctly.
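The replacement expression can be checked in isolation. The sketch below wraps it in a helper named neg_pow2, which is a hypothetical name for illustration and not a function in _speedups.c:

```c
#include <limits.h>

/* Hypothetical helper: returns -(2^n) for 1 <= n <= 63 without ever
 * left-shifting a negative value. */
static long long neg_pow2(unsigned int n)
{
    /* (1ULL << n) - 1 is at most 2^63 - 1 == LLONG_MAX, so the cast to
     * long long is exact; negating and subtracting 1 then yields -(2^n),
     * which is LLONG_MIN at n == 63. */
    return -(long long)((1ULL << n) - 1ULL) - 1LL;
}
```

The shift happens entirely in unsigned arithmetic, where it is well defined for n < 64, and the only signed operations are a negation of a value no larger than LLONG_MAX and a subtraction of 1.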
CI: switch test_debug_build from building CPython 3.14.4 from
source (~8 min cold) to `uv python install cpython-3.14.4+debug`,
which pulls a prebuilt debug interpreter from python-build-standalone
in under 10 seconds. No cache dance, no apt build-deps, no sudo.
Also bring back -Werror + -Wshadow + -Wstrict-prototypes on the
debug build now that the shift-negative-value bug is fixed.
No YAML changes needed for the earlier action bumps: upload-artifact
v4->v7 and download-artifact v4->v8 didn't rename or remove any
inputs we use (name, path, merge-multiple all still work).
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
v8.0.0 is the newest setup-uv release but the floating v8 tag has not been published yet, so @v8 fails to resolve. Pinning to the exact tag. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
REQUIRE_SPEEDUPS only affects setup.py's build fallback behavior - it's a no-op on test invocations. If a built _speedups.so happens to be importable but c_make_encoder/c_make_scanner don't get wired into simplejson.encoder/simplejson.scanner (e.g. missing export, stale cached .so), the test suite's TestMissingSpeedups silently skipTest()s instead of failing CI.
Add explicit wiring checks to both test_free_threading and test_debug_build that verify:
simplejson.encoder.c_make_encoder is _speedups.make_encoder
simplejson.scanner.c_make_scanner is _speedups.make_scanner
If either fails the job fails loudly instead of silent-skipping every C-extension test.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Five related fixes, all directly addressing risks I flagged during the post-refactor review:
1. Multi-phase init (PEP 489) for all Python 3.5+. module_exec is now the single code path for every Python 3 version; the only internal branches are static types vs heap types (<3.13 vs 3.13+), and PyModule_AddObject vs PyModule_AddObjectRef (<3.10 vs 3.10+). The old moduleinit() single-phase path is gone on Python 3; it's still used on Python 2.7 via init_speedups(), and there's a 3.3/3.4 fallback in PyInit__speedups that calls module_exec directly.
2. pre-3.13 module-reload safety: init_speedups_state now calls a new reset_speedups_state_constants() helper that Py_CLEARs every constant before re-populating. If anyone does 'del sys.modules["simplejson._speedups"]; import ...' the prior references are released instead of leaking. Type fields are left alone -- on 3.13+ module_exec freshly creates new heap types on each init, and on pre-3.13 the type fields hold borrowed pointers to static PyTypeObjects that must not be cleared.
3. PyUnicode_READY is deprecated in 3.10 and a no-op since 3.12 (PEP 623). Override it to a no-op on 3.12+ so we stop calling the deprecated function and dodge the eventual removal.
4. Debug assert in get_speedups_state: every call site passes either the module object or an incref'd module_ref, so module != NULL is invariant. Asserting it catches uninitialized scanner/encoder instances in one place instead of sprinkling asserts at ~20 sites.
5. Latent undefined-behavior cleanup: replace SIZEOF_LONG_LONG (a Python-private pyconfig.h macro) with sizeof(long long) * CHAR_BIT in the int_as_string_bitcount path. Added #include <limits.h>.
Also speedups_clear on 3.13+ now calls reset_speedups_state_constants instead of open-coding 17 Py_CLEAR lines (dedup with the reload path).
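For item 5, the bit width of long long can be derived from standard C alone. This is a rough sketch: DEMO_LONG_LONG_BITS and demo_bitcount_ok are hypothetical names, and the accepted range (1 up to one less than the width) is an assumption based on the 1..63 range mentioned for the shift fix, not a copy of the real validation code:

```c
#include <limits.h>

/* Width of long long in bits, computed from standard C only, with no
 * dependency on pyconfig.h. */
#define DEMO_LONG_LONG_BITS ((long)(sizeof(long long) * CHAR_BIT))

/* Hypothetical range check: accept bit counts that leave room for the
 * sign bit, i.e. 1 .. DEMO_LONG_LONG_BITS - 1. */
static int demo_bitcount_ok(long n)
{
    return n > 0 && n < DEMO_LONG_LONG_BITS;
}
```

Because the C standard guarantees long long is at least 64 bits wide, a bit count of 63 is always accepted by this check.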
Test harness: TestRefcountLeaks used a naive single-phase comparison that measured ~50-272 refcount drift per 2000 iterations on CPython 3.14 debug due to specializer inline caches settling in. Replaced with a two-phase measurement: phase1 absorbs all the noise, phase2 should be ~0 even on 3.14 debug, and any real per-call leak shows up as a large phase2. Verified stable across 5 consecutive runs on cpython-3.14.4+debug from python-build-standalone. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
test_boundary_at_max_bitcount exercises each of n = 1, 8, 31, 32, 62, 63 with values at and just inside +/- 2**n. The n = 63 case is the regression test for the -Wshift-negative-value UB I fixed earlier in encoder_new; if anyone ever reintroduces -1LL << n the LLONG_MIN boundary case will fail this test instead of silently misbehaving. The test uses the full +/- 2**n range including the sign bit position (n = 63), which is precisely where the old computation was undefined. (I also considered a TestRefcountLeaks.test_module_reload_no_leak for the init_speedups_state reload-safety fix, but CPython's import machinery caches extension modules for normal reload and the only scenario that actually re-runs module_exec on pre-3.13 is a subinterpreter import on 3.5-3.11, which I can't drive from a unit test portably. The fix is kept as defense-in-depth.) https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
python-build-standalone ships both cpython-3.14.4+debug and cpython-3.14.4+freethreaded+debug, so add a matrix dimension to the test_debug_build job and run both. The free-threaded variant stacks three coverage modes in a single job:
- Debug interpreter (refcount asserts, TestRefcountLeaks active)
- Free-threading (Py_GIL_DISABLED=1, GIL can be disabled)
- Strict compile warnings (-Wall -Wextra -Wshadow -Werror)
For the free-threaded variant we additionally:
- Assert sysconfig.Py_GIL_DISABLED == 1 so a regression in the python-build-standalone asset naming can't silently give us a GIL-enabled debug build
- Run the test suite again with PYTHON_GIL=0 to surface any data races that the GIL was still papering over
Verified locally against cpython-3.14.4+freethreaded+debug: 344 tests pass both with the GIL on and with PYTHON_GIL=0, clean compile with full strict warnings.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
join_list_string used a function-local static PyObject *joinfn to cache the bound method for ''.join on first call. Move it into _speedups_state as JSON_EmptyStr_join, populated in init_speedups_state alongside JSON_EmptyStr and released in reset_speedups_state_constants. Python 2 only, so no behavioral change on modern Python. Now consistent with every other cached object in the module, and init_speedups_state fails eagerly if the .join lookup fails at module load time instead of on the first call site. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Factor the four _str/_unicode function pairs out of _speedups.c and
into a new templated header, _speedups_scan.h, that's #included twice
from _speedups.c with different macro settings: once for the unicode
variant (used by all Python 3 and Py2 unicode input) and once more on
Python 2 for the bytes variant.
The two variants only ever differed in:
- Character read: PyString byte vs PyUnicode_READ codepoint
- Data/length accessors: PyString_AS_STRING / PyString_GET_SIZE vs
PyUnicode_DATA / PyUnicode_GET_LENGTH (+ PyUnicode_KIND)
- scanstring call: the str path passes an extra `encoding` arg
- Number fast paths: _str uses PyOS_string_to_double + PyInt_FromString
on the underlying char buffer; _unicode goes through the
parse_int / parse_float callables (with a PyFloat_FromString
fast path). Factored into four small inline helpers
(_match_number_{float,int}_fast_{str,unicode}).
- numstr substring creation: PyString_FromStringAndSize vs
PyUnicode_Substring (or PyUnicode_FromUnicode on Py2 narrow)
Everything else -- the whole decode state machine, the error paths,
the recursion guards, the memoization -- is shared. 951 lines of
duplicated code removed from _speedups.c, 514 lines added in
_speedups_scan.h, for a net diff of -368 lines and one authoritative
source of truth for the parser logic.
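The idea of one parser body instantiated per character type can be sketched in a single file. Note the substitution: this sketch uses a function-generating macro as a stand-in for the #include-twice header described above, and every name in it (DEFINE_COUNT_DIGITS, count_digits_bytes, count_digits_ucs4) is hypothetical:

```c
#include <stddef.h>

/* "Template" body: counts leading ASCII digits in a buffer. CHAR_T is
 * the only thing that differs between the two instantiations, much as
 * the byte read vs codepoint read differs between the _str and
 * _unicode variants. */
#define DEFINE_COUNT_DIGITS(NAME, CHAR_T)                          \
    static size_t NAME(const CHAR_T *buf, size_t len)              \
    {                                                              \
        size_t i = 0;                                              \
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {        \
            i++;                                                   \
        }                                                          \
        return i;                                                  \
    }

/* Two instantiations sharing one authoritative body. */
DEFINE_COUNT_DIGITS(count_digits_bytes, unsigned char)
DEFINE_COUNT_DIGITS(count_digits_ucs4, unsigned int)
```

A separate header included twice scales better than a macro for multi-function templates, since the body stays ordinary C that editors and debuggers can step through; the macro form is used here only to keep the example in one block.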
scanstring_str and scanstring_unicode are intentionally NOT templated:
scanstring_str has Py2-specific hybrid return-type logic (bytes when
input is ASCII-only, unicode otherwise) that's structurally different
from the unicode path. That duplication remains as future work.
Verified on release Python 3.11 (169 tests) and on Python 3.14.4
debug + free-threaded debug under strict compile warnings
(-Wall -Wextra -Wshadow -Wstrict-prototypes -Werror). Py2 is
verified by inspection only since Py2 isn't available locally;
the macro expansion is straightforwardly equivalent to the deleted
_str functions.
Build wiring:
- setup.py lists _speedups_scan.h in Extension.depends so changes
trigger a rebuild
- MANIFEST.in includes simplejson/*.h so the header ships in sdist
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
cibuildwheel v3.x disables PyPy by default and v3.4 errors out with 'Invalid skip selector: "pp*". This selector matches a group that wasn't enabled.' when you explicitly skip it. Remove the line from the v3.4.1 step. The Python 2.7 step still needs CIBW_SKIP 'pp*' because it uses cibuildwheel v1.12.0, which builds PyPy by default. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
The scanstring_str template macro referenced a local `encoding` variable that was only declared in _parse_object_str (via the now-removed JSON_SCAN_MAYBE_ENCODING_DECL). scan_once_str and, by inheritance, any other function that used the JSON_SCAN_SCANSTRING_CALL macro didn't declare `encoding`, so the _str template expansion referenced an undefined identifier on Py2 builds. Original scan_once_str side-stepped this by passing PyString_AS_STRING(s->encoding) inline at the call site. Copy that pattern into the JSON_SCAN_SCANSTRING_CALL macro so every _str template expansion resolves `encoding` lookups via `s->encoding` directly. The JSON_SCAN_MAYBE_ENCODING_DECL macro and its single use in _parse_object become unnecessary and are deleted. https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…type checks, cached attribute names
Five small improvements from the post-refactor review:
1. Fix -Wdeclaration-after-statement at two sites:
- get_speedups_state wraps its 3.13+ path in an inner block so
`void *state = PyModule_GetState(module);` is at the top of a
scope (mixed-decl-and-code blocker was my earlier assert(module)
statement preceding the declaration).
- maybe_quote_bigint declares `int ge, le;` at the top instead of
interleaving.
The file is now clean under -Wdeclaration-after-statement so it
can be built against strict C89 too.
2. Consolidate duplicated test helpers. has_speedups() and
skip_if_speedups_missing() were defined verbatim in
test_speedups.py, test_free_threading.py, and test_subinterpreters.py.
Moved to simplejson/tests/_helpers.py and imported from all three.
3. Drop trailing semicolons from the PyObject_CallNoArgs and
PyObject_CallOneArg compat macros. The stray `;` made
`return PyObject_CallOneArg(...)` expand to a statement followed
by a null statement, which worked but was sloppy and would break
inside expression contexts.
4. Remove the static `PyTypeObject PyScannerType` / `PyEncoderType`
tentative forward declarations on pre-3.13. The four type-check
macros now route through `_speedups_static_state.PyScannerType`
etc, which matches the 3.13+ pattern of looking up the type via
state. module_exec still sets `state->PyScannerType = (PyObject *)
&PyScannerType` using the full static type body defined later in
the file; no forward decl needed because the bodies come before
module_exec in source order.
5. Cache hot-path attribute names in module state. Every call to
_call_json_method used to call PyObject_GetAttrString("for_json")
or "_asdict" which interned the C string on each call. Similarly
encoder_dict_iteritems did GetAttrString("sort") per sorted dict,
and encoder_listencode_obj did GetAttrString("encoded_json") per
RawJSON value. Added four new state fields (JSON_attr_for_json,
JSON_attr_asdict, JSON_attr_sort, JSON_attr_encoded_json),
initialized once via JSON_InternFromString, visited/cleared as
usual. _call_json_method now takes a PyObject * instead of
const char * and uses PyObject_GetAttr directly. The old
FOR_JSON_METHOD_NAME and ASDICT_METHOD_NAME #defines are gone.
Verified: 169 tests on Python 3.11 release, 342 on 3.14.4 debug,
344 on 3.14.4 free-threaded debug (both GIL on and PYTHON_GIL=0),
all under -Wall -Wextra -Wshadow -Wstrict-prototypes -Werror.
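Item 3 is easy to reproduce outside the extension. In this sketch DEMO_CALL_ONE_ARG and demo_call_one_arg are hypothetical stand-ins for the compat macro and the call it wraps; they are not the real definitions:

```c
/* Hypothetical stand-in for the function the compat macro forwards to. */
static int demo_call_one_arg(int (*fn)(int), int arg)
{
    return fn(arg);
}

/* Old shape, with the stray semicolon:
 *   #define DEMO_CALL_ONE_ARG(fn, arg) demo_call_one_arg((fn), (arg));
 * Used as a function argument it would expand to `f(g(...);)`, which
 * is a syntax error. The fixed shape is a plain expression: */
#define DEMO_CALL_ONE_ARG(fn, arg) demo_call_one_arg((fn), (arg))

static int demo_twice(int x) { return 2 * x; }
static int demo_add_two(int x) { return x + 2; }

static int demo_nested_call(void)
{
    /* Compiles only because the macro has no trailing semicolon. */
    return demo_add_two(DEMO_CALL_ONE_ARG(demo_twice, 20));
}
```

With the semicolon left in, a bare `return DEMO_CALL_ONE_ARG(...);` still compiles (statement plus null statement), which is why the problem stayed hidden until the macro was used inside an expression.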
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
Collects the things that aren't obvious from reading the source but that come up repeatedly when working on this project:
- How to get alternate Python variants (debug, free-threaded debug) via uv + python-build-standalone in ~10 seconds instead of building CPython from source
- Why _cibw_runner must be used instead of unittest discover (it re-runs the suite with speedups disabled to exercise the pure-Python path)
- No local way to test Py2.7 and the implied feedback-loop cost
- CI raw logs require admin perms; debugging workflow when only the step annotation is visible
- cibuildwheel v3 vs v1.12 PyPy default-enable differences
- Extension module reload doesn't actually re-run module_exec
- Refcount leak tests on 3.14 debug need a two-phase measurement
- Python 3.14 release adds -Wunreachable-code to default CFLAGS
- _speedups_scan.h is #included twice and relies on #undef cleanup
- Don't Py_CLEAR type fields in reset_speedups_state_constants
- REQUIRE_SPEEDUPS only affects setup.py, not tests; hence the explicit wiring check in test_free_threading / test_debug_build
- GH Actions floating v8 tag for setup-uv doesn't exist (yet)
- Test helpers are in simplejson/tests/_helpers.py
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…cro, encoder format string, more
Eight assorted improvements from the second post-refactor review:
1. Delete the unused PY3_UNUSED macro (defined twice under
PY_MAJOR_VERSION, never referenced anywhere).
2. Merge the `static\nPyTypeObject Foo = {` two-line header into
one line for the pre-3.13 type bodies. Purely cosmetic.
3. Extract encoder_steal_encode() helper for the for_json / _asdict
recursive-encode dance. Both paths in encoder_listencode_obj were
~12 lines of repeated "check newobj != NULL, Py_EnterRecursiveCall,
recurse, Py_DECREF, Py_LeaveRecursiveCall" boilerplate. The
helper takes an as_dict flag to select between
encoder_listencode_obj (for_json) and encoder_listencode_dict +
TypeError-on-non-dict (_asdict). The two call sites collapse to one line each.
4. Scanner_new grew a local LOAD_ATTR(field, name) macro that
consolidates the "getattr or goto bail" pattern into a single
line per attribute, replacing 7 three-line blocks with 6
one-line loads (the encoding attr still has its own handling
because it goes through JSON_ParseEncoding).
5. encoder_new's 20-char "OOOOOOOOOOOOOOOOOOOO:make_encoder" format
string is now built from per-argument pieces, each on its own
line with a comment naming the parameter. Adding/removing an
encoder argument now touches one line instead of requiring a
mental count of O's in a string literal.
6. JSON_Accu_Destroy gained a comment documenting that it's safe to
call after JSON_Accu_FinishAsList (which already cleared
small_strings and transferred large_strings).
7. maybe_quote_bigint inverted its top-level condition for an early
return on the common "int_as_string_bitcount is None" fast path,
reducing the nested indentation for the whole function body.
Also dropped an intermediate `encoded = quoted;` assignment in
favor of a direct `return quoted;`.
8. _speedups_scan.h now requires the caller to `#define
JSON_SPEEDUPS_SCAN_INCLUDING 1` before #include and `#undef` it
after. Accidentally including the template file from anywhere
other than _speedups.c (or forgetting the setup preamble) is
now a hard #error. Added an example block to the file header.
Verified: 169 tests on Python 3.11 release, 344 on 3.14.4 debug,
344 on 3.14.4 free-threaded debug (both GIL on and PYTHON_GIL=0),
all under -Wall -Wextra -Wshadow -Wstrict-prototypes -Werror.
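Item 5 relies on C's adjacent string literal concatenation. A minimal sketch with a hypothetical three-argument format follows; the parameter names in the comments and the ":make_thing" suffix are illustrative, not the real encoder_new argument list:

```c
/* Adjacent string literals are concatenated at compile time, so each
 * "O" can sit on its own line next to the parameter it parses. */
static const char demo_format[] =
    "O"            /* markers */
    "O"            /* default_fn */
    "O"            /* encoder */
    ":make_thing"; /* function name used in error messages */
```

The resulting array is identical to the single literal "OOO:make_thing", so the change costs nothing at runtime.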
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
…r, const correctness
- Extract encoder_long_to_str() to replace the duplicated "normalize through PyLong_Type if not exact, then PyObject_Str" dance in encoder_stringify_key and encoder_listencode_obj.
- Remove dead PyObject *items local in encoder_listencode_dict (declared + XDECREF'd on bail but never assigned).
- Remove dead "TODO: DOES NOT RUN" indent blocks in encoder_listencode_dict and encoder_listencode_list. encoder.py only invokes the C encoder when self.indent is None, so s->indent != Py_None is unreachable.
- Drop the redundant ident = NULL; statement in encoder_listencode_list (the declaration already initializes it).
- Tighten const-correctness on string parameters: raise_errmsg, import_dependency, and scanstring_str now take const char * for their read-only string arguments.
Pure cleanup, no behavior change. 344 tests pass on cpython-3.14 release, cpython-3.14+debug, and cpython-3.14+freethreaded+debug.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
… note
Item A: encoder_listencode_dict key_memo cache was storing non-string
key encodings under `key` (original) while looking them up under
`kstr` (stringified form), so the cache was write-only for int / bool
/ Decimal / float keys. The pre-existing comment already said
"Only cache the encoding of string keys" — the code now matches
intent: take the cache branch only when key is a str (or bytes on
Py2), otherwise encode straight through without touching key_memo.
Pure efficiency fix; output is byte-for-byte identical, and output
tests for string, int, bool, mixed, float, and Decimal keys verify
the branch split.
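The write-only cache described in Item A can be reproduced in a small Python model. This is only a sketch: the real code is C inside `encoder_listencode_dict`, and `stringify`, `encode_buggy`, and `encode_fixed` are hypothetical names invented for illustration, with a deliberately simplified key coercion.

```python
def stringify(key):
    """Hypothetical, simplified stand-in for the encoder's key coercion."""
    return key if isinstance(key, str) else str(key)


def encode_buggy(keys, memo):
    """Pre-fix shape: look up under `kstr`, but store under `key`."""
    hits = 0
    for key in keys:
        kstr = stringify(key)
        if kstr in memo:
            hits += 1
        else:
            # Stored under the original key, so a non-string key is
            # never found again by the `kstr` lookup above.
            memo[key] = '"%s"' % kstr
    return hits


def encode_fixed(keys, memo):
    """Post-fix shape: only string keys go through the cache."""
    hits = 0
    for key in keys:
        if isinstance(key, str):
            if key in memo:
                hits += 1
            else:
                memo[key] = '"%s"' % key
        else:
            # Non-string keys are encoded directly; memo is untouched.
            _ = '"%s"' % stringify(key)
    return hits
```

For string keys `key` and `kstr` are the same object, which is why the old code still produced hits for the common case and the mismatch went unnoticed.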
Item B: six new TestRefcountLeaks subtests exercise error paths
that the previous cleanup batches touched:
- test_circular_reference_no_leak: ValueError mid-encode must not
leak the markers entry or the ident PyLong.
- test_asdict_returning_non_dict_no_leak: the TypeError path in
encoder_steal_encode must release the stolen newobj.
- test_for_json_raising_no_leak: exception inside the for_json()
body must not leak the method binding.
- test_non_string_dict_keys_no_leak: exercises the new non-cached
key_memo branch from item A.
- test_bigint_as_string_no_leak: maybe_quote_bigint's quoted-return
path must release the replaced `encoded`.
All six pass on cpython-3.14.4+debug with phase2 delta within the
100-ref tolerance.
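The phase-2 delta mentioned above can be sketched roughly as follows. This is an illustrative helper, not the code in `TestRefcountLeaks`: the name `phase2_delta` and its `get_total` parameter are assumptions, and `sys.gettotalrefcount` only exists on debug builds of CPython.

```python
import gc
import sys


def phase2_delta(fn, iterations=1000, get_total=None):
    """Return how much the counter grows during a second batch of calls.

    Returns None when no counter is available (non-debug interpreters).
    """
    if get_total is None:
        get_total = getattr(sys, "gettotalrefcount", None)
    if get_total is None:
        return None
    # Phase 1: warm-up. One-time allocations and caches settle here.
    for _ in range(iterations):
        fn()
    gc.collect()
    before = get_total()
    # Phase 2: a real leak keeps growing; warm-up noise does not.
    for _ in range(iterations):
        fn()
    gc.collect()
    return get_total() - before
```

A caller would then compare the returned delta against a tolerance (the commit above mentions 100 references) rather than against exactly zero.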
Item I: AGENTS.md "Useful CFLAGS combinations" now documents that
-Wdeclaration-after-statement trips on cp314t's own refcount.h
(mixed decls + code around line 113) and can only be used against
the standard and standard-debug builds.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95
etrepum pushed a commit that referenced this pull request Apr 11, 2026
…emRef for key_memo
Two targeted cleanups to the scanner/encoder hot paths, motivated by
reviewing the code that landed in #367 and comparing against CPython's
_json.c and PR #344. No behavior change; just fewer dict lookups and
cleaner use of the modern strong-reference dict APIs available on
Python 3.13+.
Scanner (_parse_object memo intern):
The GetItemWithError -> Py_INCREF / PyDict_SetItem dance did two
hashtable probes for every fresh key. Collapse it to a single
PyDict_SetDefault (or PyDict_SetDefaultRef on 3.13+), which atomically
gets-or-sets in one pass. Factored into json_memo_intern_key so the
_unicode and _str template instantiations share one implementation,
and the 3.13+ fast path is isolated in one place. The `memokey`
temporary is gone, the loop body drops 13 lines to 6, and unique-key
JSON decoding touches the memo dict half as often.
Encoder (key_memo cache lookup in encoder_listencode_dict):
Replace the GetItemWithError + manual Py_INCREF + PyErr_Occurred check
with a call to a new json_PyDict_GetItemRef helper. On 3.13+ this
forwards to PyDict_GetItemRef, which atomically returns a strong
reference and eliminates the borrowed-reference window that is
technically racy under free threading even under the coarse self
critical section. On older Pythons the helper falls back to the legacy
idiom. The caller becomes a single rc-based branch, and the
Py_CLEAR(kstr) is no longer duplicated across three arms.
Both changes compile cleanly under -Wall -Wextra -Wshadow
-Wstrict-prototypes -Wdeclaration-after-statement -Werror on CPython
3.11, and under the default CFLAGS on CPython 3.14.0rc2 free-threaded.
Full _cibw_runner suite (354 tests, C + pure-Python passes) passes on
both. 16-thread x 5000-iter stress test on a shared JSONDecoder /
JSONEncoder passes with the GIL disabled.
Explicitly not changed:
- Py_BEGIN_CRITICAL_SECTION(self) in scanner_call and encoder_call.
  The scanner needs it because PyDict_Clear(s->memo) at end-of-call
  would race with concurrent scan_once calls if we switched to a
  per-dict lock; the encoder uses it defensively but c_make_encoder is
  called fresh per JSONEncoder.iterencode() call in the normal API
  flow, so the lock is uncontended in practice. Fine-grained container
  locks (CPython-style, see PR #344 discussion) would only help the
  unusual case of an explicitly shared encoder across threads, and the
  win does not justify the refactor.
https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz
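The memo-interning change has a direct Python-level analogue, since `dict.setdefault` performs the same get-or-insert in one call. The sketch below is only a model of the idea; `intern_key` and `intern_key_two_probe` are hypothetical names and do not correspond to the C helper `json_memo_intern_key` line for line.

```python
def intern_key(memo, key):
    """Single-call shape: return the canonical object, inserting on a miss."""
    return memo.setdefault(key, key)


def intern_key_two_probe(memo, key):
    """Older shape: one lookup, then a separate store when the key is new."""
    cached = memo.get(key)
    if cached is None:
        memo[key] = key
        cached = key
    return cached
```

Both functions return the first object seen for an equal key, so repeated JSON object keys end up sharing one string; the difference is only how many times the dict is touched for a fresh key.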
github-merge-queue bot pushed a commit that referenced this pull request Apr 12, 2026
* Modernize scanner/encoder dict ops: SetDefault for memo intern, GetItemRef for key_memo
Two targeted cleanups to the scanner/encoder hot paths, motivated by
reviewing the code that landed in #367 and comparing against CPython's
_json.c and PR #344. No behavior change; just fewer dict lookups and
cleaner use of the modern strong-reference dict APIs available on
Python 3.13+.
Scanner (_parse_object memo intern):
The GetItemWithError -> Py_INCREF / PyDict_SetItem dance did two
hashtable probes for every fresh key. Collapse it to a single
PyDict_SetDefault (or PyDict_SetDefaultRef on 3.13+), which atomically
gets-or-sets in one pass. Factored into json_memo_intern_key so the
_unicode and _str template instantiations share one implementation,
and the 3.13+ fast path is isolated in one place. The `memokey`
temporary is gone, the loop body drops 13 lines to 6, and unique-key
JSON decoding touches the memo dict half as often.
Encoder (key_memo cache lookup in encoder_listencode_dict):
Replace the GetItemWithError + manual Py_INCREF + PyErr_Occurred check
with a call to a new json_PyDict_GetItemRef helper. On 3.13+ this
forwards to PyDict_GetItemRef, which atomically returns a strong
reference and eliminates the borrowed-reference window that is
technically racy under free threading even under the coarse self
critical section. On older Pythons the helper falls back to the legacy
idiom. The caller becomes a single rc-based branch, and the
Py_CLEAR(kstr) is no longer duplicated across three arms.
Both changes compile cleanly under -Wall -Wextra -Wshadow
-Wstrict-prototypes -Wdeclaration-after-statement -Werror on CPython
3.11, and under the default CFLAGS on CPython 3.14.0rc2 free-threaded.
Full _cibw_runner suite (354 tests, C + pure-Python passes) passes on
both. 16-thread x 5000-iter stress test on a shared JSONDecoder /
JSONEncoder passes with the GIL disabled.
Explicitly not changed:
- Py_BEGIN_CRITICAL_SECTION(self) in scanner_call and encoder_call.
  The scanner needs it because PyDict_Clear(s->memo) at end-of-call
  would race with concurrent scan_once calls if we switched to a
  per-dict lock; the encoder uses it defensively but c_make_encoder is
  called fresh per JSONEncoder.iterencode() call in the normal API
  flow, so the lock is uncontended in practice. Fine-grained container
  locks (CPython-style, see PR #344 discussion) would only help the
  unusual case of an explicitly shared encoder across threads, and the
  win does not justify the refactor.
https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz
* Deduplicate scanner/encoder hot paths: markers helpers, default extract, SKIP_WHITESPACE, field X-macros, n format
Five mechanical cleanups to the C extension, none of which change
behavior. Together they remove ~190 lines of duplication and close
several classes of recurring bug.
encoder_markers_push / encoder_markers_pop (previously duplicated in 3 places):
The circular-reference marker pattern (PyLong_FromVoidPtr(obj),
PyDict_Contains, PyDict_SetItem on push, and PyDict_DelItem +
Py_DECREF on pop) appeared verbatim in encoder_listencode_obj,
encoder_listencode_dict, and encoder_listencode_list. Three recent bug
fixes (#358, #360, aa9182d) patched individual sites; factoring into
two helpers collapses ~60 lines, and any future fix lives in one
place. The NULL-sentinel convention on ident lets callers invoke
markers_pop unconditionally on the happy path.
encoder_listencode_default extraction:
The inner else { ... } of encoder_listencode_obj (RawJSON + iterable
fallback + markers-tracked defaultfn recursion) lived inline with
nested `break` into an outer do { } while(0) and a stray indentation
level from an unbraced scope. Extract it verbatim into its own
function that returns 0/-1 directly, so the main dispatch loop is a
clean chain of else-if arms with no `break` inside the final arm.
SKIP_WHITESPACE() macro in _speedups_scan.h:
`while (idx <= end_idx && IS_WHITESPACE(JSON_SCAN_READ(idx))) idx++;`
appeared 8 times across _parse_object and _parse_array. Collapse to a
macro defined alongside JSON_SCAN_FN / JSON_SCAN_CONCAT, #undef'd at
the bottom of the template so the multi-include pattern stays
hygienic.
PyArg_ParseTuple "n" format code replaces _convertPyInt_AsSsize_t / _convertPyInt_FromSsize_t:
The custom O& converter predates broad "n" (Py_ssize_t) support in
PyArg_ParseTuple and Py_BuildValue. Both have supported "n" since
Python 2.5 (the simplejson floor), so we can drop the two wrappers and
use "n" directly in py_scanstring, scanner_call, encoder_call, and
raise_errmsg. Saves an indirect function call per parse on three hot
entry points.
JSON_SCANNER_OBJECT_FIELDS / JSON_ENCODER_OBJECT_FIELDS X-macros:
scanner_traverse + scanner_clear and encoder_traverse + encoder_clear
all listed the same fields 2x, an easy place to forget a field when
adding one (exactly the bug fixed in c23e6d9). Collapse to an X-macro
field list adjacent to each struct definition, used with
JSON_VISIT_FIELD / JSON_CLEAR_FIELD local expansions. Adding a new
PyObject* field now needs one line in the X-macro, not two in each of
four different functions.
Verification:
- Strict CFLAGS build on CPython 3.11: -Wall -Wextra -Wshadow
  -Wstrict-prototypes -Wdeclaration-after-statement -Werror, clean
- Default CFLAGS build on CPython 3.14.0rc2 free-threaded: clean
- Full _cibw_runner suite on both (354 tests, C + pure-Python paths):
  354/354 pass
- Targeted correctness tests on 3.14t: for_json / _asdict / default /
  iterable_as_array / RawJSON / circular detection on all three
  encoder sites (dict, list, default)
- 16-thread x 5000-iter stress on a shared JSONDecoder and JSONEncoder
  with the GIL disabled: no mismatches, no races
https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz
* Encoder cleanup: all-string-keys dict sort fast path, T_OBJECT -> Py_T_OBJECT_EX
Two independent improvements bundled together because they touch
adjacent code.
#5 - encoder_dict_iteritems fast path for all-string keys:
Sorted dict encoding with sort_keys=True (or a custom item_sort_key)
used to go through a double-iteration loop: PyDict_Items produced a
list, then the code walked it with PyIter_Next, type-checked each key,
and PyList_Append'd a rebuilt list to sort. For the overwhelmingly
common case of string-keyed JSON objects this was all wasted work:
every tuple was kept verbatim and the "slow" rebuild list was just a
duplicate of the items list.
Add a fast path: if every key in the items list is already a
JSON-compatible string (PyUnicode on all versions, plus PyString on
Python 2), sort `items` in place via the shared
encoder_sort_items_inplace helper and return iter(items). No per-item
tuple reallocation, no list alloc, no stringify branch in the hot
loop. On any non-string key the pre-scan bails out and falls through
to the existing stringify-and-rebuild path, so the slow path is
preserved exactly as before.
Factored the list.sort() call into encoder_sort_items_inplace and the
"is this a JSON string key" test into is_json_string_key so the two
paths share one source of truth.
Measured on CPython 3.14t free-threaded, 200-entry string-keyed dict
with 3-element list values: sort_keys=True is now 0.204 ms/op vs
0.197 ms/op for the unsorted path, ~4% overhead, essentially just the
cost of sorting itself. Previously the double-walk and list rebuild
added substantial constant-factor overhead on top.
#6 - T_OBJECT -> Py_T_OBJECT_EX on all member descriptors:
T_OBJECT is deprecated in Python 3.12+ in favor of the new public
spelling Py_T_OBJECT_EX. The semantic difference is that T_OBJECT
returns Py_None when the underlying slot is NULL, while Py_T_OBJECT_EX
raises AttributeError. Keep the Python-visible behavior unchanged by:
1. Defining Py_T_OBJECT_EX to T_OBJECT_EX on pre-3.12 (both available
   via <structmember.h>, identical semantics), so the modern spelling
   compiles on the full 2.5+ version range simplejson supports.
2. Switching encoder_new to store Py_None rather than NULL when
   encoding=None on Python 3, so the .encoding attribute still returns
   None (as it did under T_OBJECT) rather than raising AttributeError
   under Py_T_OBJECT_EX.
3. Updating the two bytes-handling sentinel checks
   (encoder_stringify_key and encoder_listencode_obj) from
   `s->encoding != NULL` to `s->encoding != Py_None` so the internal
   "is encoding configured" test matches the new representation.
All 20 members across scanner_members and encoder_members updated in
one pass.
Verification:
- Strict CFLAGS on CPython 3.11: -Wall -Wextra -Wshadow
  -Wstrict-prototypes -Wdeclaration-after-statement -Werror, clean
- Default CFLAGS on CPython 3.14.0rc2 free-threaded: clean
- Full _cibw_runner suite (354 tests, C + pure-Python) on both: OK
- Targeted tests for encoder_dict_iteritems paths: regular dict /
  OrderedDict / dict subclass / empty dict / sort_keys=True with all
  string keys (fast path) / mixed string-and-int keys (slow path) /
  int keys / float keys / skipkeys+non-string / custom item_sort_key /
  unicode keys, all pass
- encoding=None on Py3 round-trip + bytes-key rejection with
  encoding=None: behavior preserved
- 16-thread x 5000-iter stress on shared JSONEncoder with
  sort_keys=True under free threading: no mismatches
https://claude.ai/code/session_011EfS4WKeHCX3xPsmHuvCnz
---------
Co-authored-by: Claude <noreply@anthropic.com>
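The all-string-keys fast path described in the last commit can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a translation of `encoder_dict_iteritems`: `stringify_key` and `sorted_dict_items` are hypothetical names, and the key coercion here is a simplified guess at the slow path's behaviour.

```python
from operator import itemgetter


def stringify_key(key):
    """Hypothetical, simplified stand-in for the encoder's key coercion."""
    if isinstance(key, str):
        return key
    if key is True:
        return "true"
    if key is False:
        return "false"
    if key is None:
        return "null"
    if isinstance(key, (int, float)):
        return repr(key)
    raise TypeError("keys must be str, int, float, bool or None")


def sorted_dict_items(d, sort_key=itemgetter(0)):
    """Return an iterator over (string_key, value) pairs in sorted order."""
    items = list(d.items())
    if all(isinstance(k, str) for k, _ in items):
        # Fast path: every tuple is kept as-is, so sort the list in place.
        items.sort(key=sort_key)
        return iter(items)
    # Slow path: rebuild the list with stringified keys, then sort it.
    rebuilt = [(stringify_key(k), v) for k, v in items]
    rebuilt.sort(key=sort_key)
    return iter(rebuilt)
```

The pre-scan costs one extra pass over the items, which is the trade the commit message describes: a cheap type check up front in exchange for skipping the per-item rebuild in the common case.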
Summary
Fixes SIGSEGV crashes in the C extension on Python 3.14 free-threaded
builds (`cp314t`), modernizes `_speedups.c` to CPython's current
conventions, deduplicates the parser state machine across the Py2
bytes / unicode paths, and adds CI coverage that actually catches
regressions in this area going forward.
Python 2.7 compatibility is preserved throughout. All Python-3-only
changes are guarded by `#if PY_VERSION_HEX >= 0x030D0000` or
`#if PY_MAJOR_VERSION >= 3` as appropriate.
Root-cause fix for the `cp314t` SIGSEGV
- `PyType_FromModuleAndSpec` on 3.13+. `Scanner` and `Encoder` are no
  longer static `PyTypeObject`s; proper `Py_DECREF(tp)` in dealloc and
  `Py_VISIT(Py_TYPE(self))` in traverse. Static types are
  fundamentally incompatible with `Py_MOD_GIL_NOT_USED`, which is the
  direct cause of the 3.14t crash.
- `Py_mod_gil = Py_MOD_GIL_NOT_USED` declared unconditionally on
  3.13+, matching CPython's own `_json`.
- `Py_mod_multiple_interpreters = Py_MOD_PER_INTERPRETER_GIL_SUPPORTED`.
- Removed the `is_gil_enabled()` Python-side guard that previously
  disabled C speedups under free-threading entirely. The speedups are
  now the default on 3.14t.
- `encoder_new`: `-1LL << n` (undefined in C for any negative operand)
  replaced with `-(long long)((1ULL << n) - 1ULL) - 1LL`, which is
  defined over 1..63 and produces `LLONG_MIN` at n == 63. Caught by
  the new `test_debug_build` CI job under
  `-Werror=shift-negative-value`.
Unified module state and multi-phase init
- One `_speedups_state` struct, used on every Python version.
- … each subinterpreter gets its own copy.
- … (`_speedups_static_state`) that serves the whole process.
- `get_speedups_state(module_ref)` has the same signature everywhere;
  call sites don't branch on `PY_VERSION_HEX`.
- … `Py_mod_exec` slot) is used on all Python 3.5+, not just 3.13+, so
  there's one module-init code path for modern Python 3. Python 2.7
  and 3.3/3.4 retain single-phase init.
- `Scanner` / `Encoder` instances carry `module_ref` uniformly. On
  3.13+ it comes from `PyType_GetModuleByDef`; on older versions it's
  a borrowed pointer to the module object captured at init time via
  `_speedups_module`.
- `RawJSONType`, `JSONDecodeError`, `JSON_itemgetter0`, and 4 new
  cached attribute names (`for_json`, `_asdict`, `sort`,
  `encoded_json`) all live in state.
- … `PyScannerType`, `PyEncoderType`) live in the struct on every
  Python version; on pre-3.13 they're borrowed pointers to the static
  `PyTypeObject` bodies, and the type-check macros route through
  `_speedups_static_state.PyScannerType` instead of through a
  forward-declared file-scope symbol.
Helper signature cleanup
Every internal helper that needs state takes `_speedups_state *state`
as its first argument.
The convenience macros `RAISE_ERRMSG`, `ENCODED_CONST`, `IS_RAW_JSON`,
`JOIN_LIST_UNICODE`, `SCANSTRING_UNICODE`, previously present to paper
over two sets of function signatures, are all deleted. Local variable
naming is consistent (`state`, never `_st`).
`JSON_Accu` no longer carries a state pointer; state flows through
`Accumulate` / `FinishAsList` / `flush_accumulator` /
`_steal_accumulate` as an explicit parameter.
Templated parser state machine
The five near-identical `scan_once_{str,unicode}`,
`_parse_object_{str,unicode}`, `_parse_array_{str,unicode}`,
`_match_number_{str,unicode}` function pairs are now generated from a
single source of truth in `simplejson/_speedups_scan.h`, which is
`#include`d twice from `_speedups.c`: once for the unicode variant and
once on Python 2 for the bytes variant. The differences between
variants (character reads, length/data accessors, substring creation,
fast paths for int/float parsing) are parameterized via a small set
of macros. Python-2-specific fast paths (`PyOS_string_to_double`,
`PyInt_FromString`) are preserved via four inline helper functions.
`scanstring_str` and `scanstring_unicode` are intentionally not
templated because `scanstring_str` has Py2-specific hybrid
return-type logic (returns `bytes` when the input is ASCII-only,
`unicode` otherwise) that's structurally different from the
always-unicode flow. Left as future work in a follow-up PR.
The template file has a `#define JSON_SPEEDUPS_SCAN_INCLUDING 1` gate
so that accidentally including it from anywhere other than
`_speedups.c` is a compile-time `#error`.
Python side
- `simplejson/compat.py`, `__init__.py`, `encoder.py`, `scanner.py`,
  `decoder.py`: removed the `is_gil_enabled()` runtime guard. That
  check was a temporary workaround disabling the C extension on
  free-threaded Python; now that the extension is safe under
  `Py_MOD_GIL_NOT_USED`, the Python-side gate is gone and `cp314t`
  actually exercises the C path.
- `PyUnicode_READY` overridden as a no-op on Python 3.12+ (PEP 623
  made it a no-op, and it's scheduled for removal).
- `-Wshadow` warnings fixed (inner `digit` locals shadowing `digit`
  from `longintrepr.h`; inner `newobj` shadowing an outer declaration
  in `encoder_listencode_obj`).
- `SIZEOF_LONG_LONG` (a Python-private pyconfig macro) replaced with
  `sizeof(long long) * CHAR_BIT`.
- `scanner_new` gained a local `LOAD_ATTR` macro that consolidates the
  repeated getattr-or-bail boilerplate.
- `encoder_new`'s 20-char `"OOOOOOOOOOOOOOOOOOOO"` format string is
  now constructed from per-argument pieces so each `"O"` has a comment
  naming its parameter.
- `encoder_listencode_obj` extracted an `encoder_steal_encode()`
  helper that handles the `for_json` / `_asdict` recursive-encode
  dance in one place with an `as_dict` flag selecting between
  `encoder_listencode_obj` and `encoder_listencode_dict` (plus
  TypeError on non-dict).
- … (`state->JSON_attr_{for_json,asdict,sort,encoded_json}`), so the
  encoder uses `PyObject_GetAttr` with a pre-interned name instead of
  `PyObject_GetAttrString` (which interns the C literal on every
  call).
- `PY3_UNUSED` macro removed.
- `static PyTypeObject PyScannerType;` tentative forward declarations
  removed; type-check macros route through state.
- `maybe_quote_bigint` inverted for an early return on the common
  `int_as_string_bitcount is None` fast path.
- `-Wdeclaration-after-statement` sites fixed; the file now compiles
  cleanly under strict C89.
- `PyObject_CallNoArgs` / `PyObject_CallOneArg` compat macros lost
  their trailing semicolons so they're usable in expression contexts.
- `init_speedups_state` now calls `reset_speedups_state_constants` to
  clear any prior values before repopulating, so the static state can
  be re-initialized safely (defensive; extension module reload doesn't
  normally re-run init on multi-phase modules, but subinterpreter
  imports on 3.5-3.11 can).
Tests
New: `simplejson/tests/test_free_threading.py`
`TestFreeThreading` (4 tests): 8 threads × 500 iters each, covering
concurrent encode, concurrent decode, mixed encode/decode, and a
single shared `JSONEncoder` / `JSONDecoder` instance exercised by
many threads. Designed to surface data races under `PYTHON_GIL=0`.
New: `simplejson/tests/test_subinterpreters.py`
`TestSubinterpreters` (6 tests, 3.12+ only): import, encode, decode,
3 concurrent subinterpreters, state independence across destroy,
and heap types inside a subinterpreter.
New: `simplejson/tests/test_speedups.py::TestHeapTypes` (3.13+)
Verifies `Py_TPFLAGS_HEAPTYPE`, GC tracking, and correct
round-tripping.
New: `simplejson/tests/test_speedups.py::TestRefcountLeaks` (debug builds)
Guarded by `@skipUnless(hasattr(sys, 'gettotalrefcount'))`. Covers
`dumps` / `loads`, `Scanner` / `Encoder` construction, and the error
path in `scanner_new` / `encoder_new` (`BadBool` triggers bail
partway through). Uses a two-phase measurement (phase 1 absorbs
CPython 3.14 specializer noise, phase 2 must be near zero) to
avoid flakiness on modern Python debug builds.
New: `simplejson/tests/test_bitsize_int_as_string.py::test_boundary_at_max_bitcount`
Regression test for the `-1LL << n` UB fix; exercises
`n = 1, 8, 31, 32, 62, 63` and both `±(2**n)` boundaries.
Consolidated: `simplejson/tests/_helpers.py`
`has_speedups()` and `skip_if_speedups_missing()` moved out of the
three test files that used to duplicate them.
CI (`.github/workflows/build-and-deploy.yml`)
New job: `test_free_threading`
Runs on Python `3.14t`:
- … `sysconfig.Py_GIL_DISABLED == 1` so the job fails loudly if
  `setup-python` ever hands us a GIL-enabled build under the `3.14t`
  moniker.
- … `simplejson._speedups` under `PYTHON_GIL=0` with
  `-W error::RuntimeWarning`.
- … `simplejson.encoder.c_make_encoder is make_encoder` (and the same
  for scanner) so a silently-broken wheel (one that imports fine but
  leaves `c_make_encoder is None`) doesn't make every C-extension test
  skip quietly.
- … `PYTHON_GIL=0`.
New job: `test_debug_build` (matrixed)
Matrix over `standard` and `free-threaded` variants, both installed
via `uv python install cpython-3.14.4+debug` and
`cpython-3.14.4+freethreaded+debug` from python-build-standalone
(~10 seconds per variant, no custom build). Each variant:
- … `sys.gettotalrefcount` is present.
- … `_speedups.c` with
  `-Wall -Wextra -Wshadow -Wstrict-prototypes -Werror` (caught the
  shift-negative-value UB).
- … `TestRefcountLeaks` auto-enables on the debug interpreter.
The free-threaded variant additionally:
- … `sysconfig.Py_GIL_DISABLED == 1`.
- … `PYTHON_GIL=0`.
Wheel / sdist jobs
- `CIBW_ENABLE: "cpython-freethreading"` removed (deprecated in
  cibuildwheel 3.4; free-threaded builds are on by default).
- `CIBW_SKIP: "pp*"` removed from the v3.4.1 step (PyPy is off by
  default in 3.x; explicit skip is now an error). Kept on the
  v1.12.0 step for Python 2.7.
- … `x86_64`, `aarch64`, `ppc64le`) into separate matrix entries.
- … `x86_64` (QEMU-based 2.7 builds on other arches are no longer
  viable).
- … Build wheels on {ubuntu,windows,macos}-latest) preserve the job
  names required by existing branch protection rules. The ubuntu gate
  waits for `test_pure_python`, `test_free_threading`, and
  `test_debug_build`.
Dependency bumps
- `pypa/cibuildwheel` v3.2.1 → v3.4.1
- `actions/checkout` v4 → v6
- `actions/cache` v4 → v5
- `docker/setup-qemu-action` v3 → v4
- `actions/upload-artifact` v4 → v7
- `actions/download-artifact` v4 → v8
- `astral-sh/setup-uv` pinned to v8.0.0 (new job dependency)
AGENTS.md
New top-level file collecting the tribal knowledge accumulated in
this PR that isn't obvious from reading the source: using uv +
python-build-standalone for alternate Python variants locally, how
to debug CI failures when only the annotations are visible, the
cibuildwheel version gotchas, why `importlib.reload(_speedups)` does
not actually re-run `module_exec`, the two-phase measurement trick for
`TestRefcountLeaks` on 3.14 debug, the `_speedups_scan.h` usage
contract, the invariant that type fields must not be `Py_CLEAR`ed in
`reset_speedups_state_constants`, and the `REQUIRE_SPEEDUPS=1` scope
limitation (build-only).
Test plan
- … including `TestRefcountLeaks`)
- … both with the GIL on and with `PYTHON_GIL=0` (344 tests)
- … `-Wall -Wextra -Wshadow -Wstrict-prototypes -Werror` on all of
  3.10 / 3.11 / 3.12 / 3.13 / 3.14.4 / 3.14.4 debug / 3.14.4
  free-threaded debug
- … `-Wdeclaration-after-statement` (strict C89 mixed-decls)
- `PYTHON_GIL=0 python -c "import simplejson._speedups"` emits no
  `RuntimeWarning` on 3.14t
- `_speedups.so` refcount stable across 32k threaded ops and 5k
  failing constructions
- `int_as_string_bitcount` boundary exercised for n = 1, 8, 31, 32,
  62, 63 including the `LLONG_MIN` edge case at n == 63
- … `_speedups.c` and `_speedups_scan.h`
File-level diff
`simplejson/_speedups.c`
`simplejson/_speedups_scan.h`
Net: the C extension source is ~380 lines smaller despite gaining
heap types, per-module state, multi-phase init, four new test
suites, and a templated parser body.
Known limits / future work
- `scanstring_str` / `scanstring_unicode` templating: the remaining
  big duplication (~200 lines). Blocked on Py2 hybrid return-type
  handling; deferred to a follow-up PR.
- … true race detection on the free-threaded path. Requires a custom
  CPython build; not wired up.
- … `PYTHONMALLOC=malloc`: would give true leak detection. Requires a
  custom CPython build with `--with-valgrind`; not wired up. The
  refcount regression tests on the debug job catch most of what
  valgrind would.
https://claude.ai/code/session_01EoWzUsmRRvrZBF2nwQhF95