MAINT: Make most DTypes (not np.dtype) HeapTypes#31364

Open
seberg wants to merge 3 commits into numpy:main from seberg:heap-dtypes

Member

@seberg seberg commented Apr 30, 2026

This makes all DTypes HeapTypes. For the builtin ones (and non-user created ones) we also mark them all as immortal as it might help a bit with refcounting on threaded execution.

Because it is IMO nonsense to limit that to free-threaded, added a helper and use it throughout.

(Claude was used with guidance to execute what I wanted.)


Summary of changes:

  • Use PyType_FromMetaclass for all dynamically created dtypes:
    • We make sure to also make them immortal, and keep them immutable for now.
  • Simplify the abstract (and semi-concrete Python dtype) creation with a loop.
    • This should init from spec for the abstract ones at least ideally, but this works and is a big first step.
  • add_newdoc is changed to work with mutable heap-types where one can set __doc__. Making all of these immutable means it doesn't matter, but thought I'd just keep it.
    • The new branch is in theory a bit of an improvement as it stores the stripped version (since we have __doc__ available in the dict), but now with immutable it isn't actually used.
      So there is a potential small improvement here.
    • I can pull this out again, I don't think it matters as it's internal, but it may not be quite the right direction.
  • I didn't understand why PyUnstable_SetImmortal was only used for free-threading. It makes sense on all Python as all Python can be threaded. So added NpyUnstable_SetImmortal to set it whenever we can.
    (There could be places where we really do it as a work-around on free-threading, but I don't think we are there yet.)
  • _PyArray_MapPyTypeToDType for the other dtypes is just moved out for now, didn't seem clearer (and the order didn't work out).
  • (StringDType isn't moved yet, but it is straight-forward. The dynamically created ones seemed more interesting.)

Overall, I think this is all very straight-forward actually. I thought there might be something tricky, but the trickiest thing was that all DTypes need to be moved if we move the abstract ones for subclassing.
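
The heap-type versus static-type distinction, and the effect of immutability on setting `__doc__`, can be observed from Python through the type flags. The following is a minimal sketch using only the standard library. The flag values are taken from CPython's `object.h`; `is_heap_type`, `is_immutable_type`, `try_set_doc` and `Example` are illustrative names, not NumPy API. On a NumPy build that includes this change one would expect `is_heap_type(type(np.dtype("float64")))` to be true, but that is an assumption and is not checked here.

```python
# Flag values as defined in CPython's Include/object.h.
Py_TPFLAGS_IMMUTABLETYPE = 1 << 8  # type attributes cannot be set or deleted
Py_TPFLAGS_HEAPTYPE = 1 << 9       # type object was allocated dynamically


def is_heap_type(tp: type) -> bool:
    """Return True if `tp` is a heap type rather than a static type."""
    return bool(tp.__flags__ & Py_TPFLAGS_HEAPTYPE)


def is_immutable_type(tp: type) -> bool:
    """Return True if `tp` carries the immutable-type flag (Python 3.10+)."""
    return bool(tp.__flags__ & Py_TPFLAGS_IMMUTABLETYPE)


class Example:
    """A class statement creates a mutable heap type."""


def try_set_doc(tp: type, doc: str) -> bool:
    """Try to assign `__doc__`; return whether the assignment succeeded."""
    try:
        tp.__doc__ = doc
    except TypeError:
        # Immutable types reject attribute assignment.
        return False
    return True


if __name__ == "__main__":
    for tp in (int, Example):
        print(tp.__name__, is_heap_type(tp), is_immutable_type(tp))
    print(try_set_doc(Example, "new docstring"))  # mutable heap type
    print(try_set_doc(int, "new docstring"))      # immutable static type
```

An immutable heap type behaves like `int` in the last call, which is why the new `add_newdoc` branch for mutable heap types is currently not exercised.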

Member Author

seberg commented Apr 30, 2026

@ngoldbaum free-threaded crash in `macOS tests / Accelerate - macos_x86_64 - 3.14t (pull_request)`. Presumably unrelated (let's see if it reproduces).

EDIT, no it's reliable. This happened before in #30375 except that there it didn't actually segfault. Here it starts to actually segfault.

Details
Fatal Python error: Segmentation fault

<Cannot show all threads while the GIL is disabled>
Stack (most recent call first):
  File "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/tests/test_dtype.py", line 886 in test_tuple_recursion
  File "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/testing/_private/utils.py", line 2865 in wrapper
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/python.py", line 1720 in runtest
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/runner.py", line 353 in from_call
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/xdist/remote.py", line 227 in run_one_test
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/xdist/remote.py", line 206 in pytest_runtestloop
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/main.py", line 372 in _main
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/main.py", line 318 in wrap_session
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/xdist/remote.py", line 427 in <module>
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/execnet/gateway_base.py", line 1291 in executetask
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/execnet/gateway_base.py", line 341 in run
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/execnet/gateway_base.py", line 389 in integrate_as_primary_thread
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/execnet/gateway_base.py", line 1273 in serve
  File "/Library/Frameworks/PythonT.framework/Versions/3.14/lib/python3.14t/site-packages/execnet/gateway_base.py", line 1806 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>

Current thread's C stack trace (most recent call first):
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_DumpStack+0x42 [0x1080430a2]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at faulthandler_dump_c_stack+0x3f [0x10805913f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at faulthandler_fatal_error+0x127 [0x108059027]
  Binary file "/usr/lib/system/libsystem_platform.dylib", at _sigtramp+0x1d [0x7ff81b97c31d]
  Binary file '<unknown>' [0x0]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at _Py_MergeZeroLocalRefcount+0x63 [0x107d9ce03]
  Binary file "/Users/runner/work/numpy/numpy/build-install/usr/lib/python3.14t/site-packages/numpy/_core/_multiarray_umath.cpython-314t-darwin.so", at arraydescr_dealloc+0x123 [0x108aeeed3]
  Binary file "/Library/Frameworks/PythonT.framework/Versions/3.14/PythonT", at subtype_dealloc+0xbaf [0x107dddf7f]
  <truncated rest of calls>

EDIT: My impression is that it's OKish to skip the test, but after it's done and we have nightlies, we either re-open the Python issue or make a new one -- I would hope this is Python.
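
Skipping the test only on free-threaded builds could look roughly like the sketch below. It uses the standard library's `unittest` so that it is self-contained; NumPy's test suite uses pytest, so the real change would use a pytest marker instead. `is_free_threaded_build` and the placeholder test body are hypothetical, and the skip condition (all free-threaded builds rather than only macOS x86_64) is an assumption.

```python
import sysconfig
import unittest


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was built with the GIL disabled."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


class TupleRecursionTest(unittest.TestCase):
    # Hypothetical stand-in for the real test; the body is a placeholder.
    @unittest.skipIf(
        is_free_threaded_build(),
        "segfaults during deallocation on free-threaded builds",
    )
    def test_tuple_recursion(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```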

@seberg seberg changed the title MAINT,ENH: Make DTypes (except np.dtype) proper HeapTypes MAINT,ENH: Make most DTypes (not np.dtype) proper HeapTypes Apr 30, 2026
@seberg seberg changed the title MAINT,ENH: Make most DTypes (not np.dtype) proper HeapTypes MAINT: Make most DTypes (not np.dtype) proper HeapTypes Apr 30, 2026
@seberg seberg changed the title MAINT: Make most DTypes (not np.dtype) proper HeapTypes MAINT: Make most DTypes (not np.dtype) HeapTypes Apr 30, 2026
Member

mattip commented May 11, 2026

Any thoughts about benchmarks?

Member

mattip commented May 11, 2026

Very cool, BTW.

Member Author

seberg commented May 12, 2026

I would think it doesn't make a real difference. The main difference I could see would be related to them being implicitly immortal previously and now they are only explicitly immortal.
And I am not sure if explicitly immortalizing them actually prevents them fully from taking part in cyclic GC handling.

I suppose there may be some other minor differences between heap and static types, but I don't think for the most part.
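
Whether an object is treated as immortal can be probed from Python with a simple heuristic: take many new references and see whether the reported reference count moves. This is only a sketch; `refcount_is_frozen` is an illustrative name, the check relies on CPython behaviour rather than a documented guarantee, and it says nothing about whether the object still takes part in cyclic GC.

```python
import sys


def refcount_is_frozen(obj: object, extra_refs: int = 1000) -> bool:
    """Heuristic: does taking extra references leave the refcount unchanged?

    Immortal objects (CPython 3.12+) ignore reference count updates, so the
    reported count stays the same. This is an observation, not an official API.
    """
    before = sys.getrefcount(obj)
    refs = [obj for _ in range(extra_refs)]  # hold extra_refs new references
    after = sys.getrefcount(obj)
    del refs
    return after == before


if __name__ == "__main__":
    print(refcount_is_frozen(object()))  # an ordinary, mortal object
    print(refcount_is_frozen(None))      # immortal on CPython 3.12+
```

The same function could be applied to a DType class before and after this change to compare the implicit and explicit immortal cases.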

I ran some benchmarks (-t small -t Small -t scalar as I thought larger array tests really shouldn't notice this). Some are surprisingly different (and consistently so), although a test that runs for 300-400us doesn't seem like it should see heap-type changes... I think those ones randomly spread both ways.

Details
| Change   | Before [c36307a4] <main>   | After [c2bf62ca] <heap-dtypes>   |   Ratio | Benchmark (Parameter)                                                                           |
|----------|----------------------------|----------------------------------|---------|-------------------------------------------------------------------------------------------------|
| +        | 317±0μs                    | 461±0μs                          |    1.45 | bench_ma.MAFunctions1v.time_functions_1v('np', 'log', 'big')                                    |
| +        | 1.92±0μs                   | 2.56±0.01μs                      |    1.34 | bench_ufunc.CustomComparison.time_less_than_scalar1(<class 'numpy.bool'>)                       |
| +        | 36.2±0μs                   | 48.2±0μs                         |    1.33 | bench_ufunc.NDArrayGetItem.time_methods_getitem([0, -1], 'big')                                 |
| +        | 1.94±0μs                   | 2.46±0μs                         |    1.27 | bench_ma.MAMethod0v.time_methods_0v('conjugate', 'big')                                         |
| +        | 2.05±0μs                   | 2.59±0.01μs                      |    1.27 | bench_ufunc.CustomComparison.time_less_than_scalar2(<class 'numpy.uint8'>)                      |
| +        | 175±0μs                    | 219±0μs                          |    1.25 | bench_ma.MAFunctions1v.time_functions_1v('np.ma', 'sqrt', 'big')                                |
| +        | 37.1±0μs                   | 45.3±0μs                         |    1.22 | bench_ma.MAMethodSetItem.time_methods_setitem((0, 0), masked, 'big')                            |
| +        | 34.9±0μs                   | 41.3±0μs                         |    1.18 | bench_ma.MAMethodGetItem.time_methods_getitem((0, 0), 'big')                                    |
| +        | 32.4±0μs                   | 37.1±0μs                         |    1.15 | bench_ma.MAMethodSetItem.time_methods_setitem((-1, 0), masked, 'big')                           |
| +        | 2.07±0ms                   | 2.35±0ms                         |    1.14 | bench_ma.Corrcoef.time_corrcoef('large')                                                        |
| +        | 150±0μs                    | 171±0μs                          |    1.14 | bench_ma.MAFunctions1v.time_functions_1v('np', 'sin', 'big')                                    |
| +        | 324±0μs                    | 370±0μs                          |    1.14 | bench_ma.MAFunctions2v.time_functions_2v('np', 'divide', 'big')                                 |
| +        | 56.0±0μs                   | 63.9±0μs                         |    1.14 | bench_ma.MAMethodGetItem.time_methods_getitem(0, 'big')                                         |
| +        | 36.4±0μs                   | 41.0±0μs                         |    1.13 | bench_ufunc.NDArraySetItem.time_methods_setitem([0, -1], 'big')                                 |
| +        | 149±0.6ns                  | 166±5ns                          |    1.12 | bench_ufunc.NDArrayGetItem.time_methods_getitem((0, 0), 'small')                                |
| +        | 77.9±0.2ns                 | 87.2±10ns                        |    1.12 | bench_ufunc.NDArraySetItem.time_methods_setitem(0, 'small')                                     |
| +        | 169±2ns                    | 188±2ns                          |    1.11 | bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar('sqrt')                                          |
| +        | 230±2ns                    | 253±7ns                          |    1.1  | bench_array_coercion.ArrayCoercionSmall.time_array_subok(1)                                     |
| +        | 150±0.4ns                  | 165±7ns                          |    1.1  | bench_ufunc.NDArrayGetItem.time_methods_getitem((-1, 0), 'small')                               |
| +        | 150±1ns                    | 166±2ns                          |    1.1  | bench_ufunc.UFuncSmall.time_ufunc_python_float('sqrt')                                          |
| +        | 208±0.5ns                  | 227±8ns                          |    1.09 | bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs(1)                          |
| +        | 25.2±0μs                   | 27.5±0μs                         |    1.09 | bench_ma.MAMethod0v.time_methods_0v('transpose', 'big')                                         |
| +        | 162±0.9ns                  | 177±6ns                          |    1.09 | bench_ufunc.NDArrayGetItem.time_methods_getitem(0, 'small')                                     |
| +        | 4.18±0.1ms                 | 4.57±0.03ms                      |    1.09 | bench_ufunc_strides.BinaryComplex.time_binary_scalar_in1(<ufunc 'multiply'>, 4, 2, 4, 'D')      |
| +        | 222±0.7ns                  | 241±5ns                          |    1.08 | bench_array_coercion.ArrayCoercionSmall.time_array_no_copy(1)                                   |
| +        | 172±1ns                    | 186±9ns                          |    1.08 | bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar('cos')                                           |
| +        | 696±2ns                    | 742±9ns                          |    1.07 | bench_ma.Indexing.time_scalar(False, 1, 10)                                                     |
| +        | 714±6ns                    | 765±6ns                          |    1.07 | bench_ma.Indexing.time_scalar(False, 2, 1000)                                                   |
| +        | 273±2ns                    | 292±2ns                          |    1.07 | bench_scalar.ScalarMath.time_multiplication('int32')                                            |
| +        | 284±3ns                    | 303±3ns                          |    1.07 | bench_scalar.ScalarMath.time_multiplication('int64')                                            |
| +        | 339±5ns                    | 358±7ns                          |    1.06 | bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs(1)                                |
| +        | 247±3ns                    | 262±10ns                         |    1.06 | bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs(array([5]))                 |
| +        | 245±0.6ns                  | 259±5ns                          |    1.06 | bench_array_coercion.ArrayCoercionSmall.time_array_subok(np.int64(5))                           |
| +        | 337±0.8ns                  | 358±6ns                          |    1.06 | bench_array_coercion.ArrayCoercionSmall.time_asanyarray_dtype_order([1])                        |
| +        | 699±2ns                    | 737±7ns                          |    1.06 | bench_ma.Indexing.time_scalar(False, 1, 1000)                                                   |
| +        | 718±2ns                    | 761±10ns                         |    1.06 | bench_ma.Indexing.time_scalar(False, 2, 100)                                                    |
| +        | 402±2ns                    | 427±10ns                         |    1.06 | bench_scalar.ScalarMath.time_addition('float16')                                                |
| +        | 30.0±0.3μs                 | 31.7±0.1μs                       |    1.06 | bench_scalar.ScalarStr.time_str_repr('complex128')                                              |
| +        | 182±0.8ns                  | 193±10ns                         |    1.06 | bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar('abs')                                           |
| +        | 219±1ns                    | 230±7ns                          |    1.05 | bench_array_coercion.ArrayCoercionSmall.time_array(array([5]))                                  |
| +        | 331±0.6ns                  | 349±10ns                         |    1.05 | bench_array_coercion.ArrayCoercionSmall.time_array_no_copy([1])                                 |
| +        | 238±2ns                    | 250±7ns                          |    1.05 | bench_array_coercion.ArrayCoercionSmall.time_ascontiguousarray(1)                               |
| +        | 699±2ns                    | 735±10ns                         |    1.05 | bench_ma.Indexing.time_scalar(False, 1, 100)                                                    |
| -        | 1.63±0μs                   | 1.55±0μs                         |    0.95 | bench_polynomial.Polynomial.time_polynomial_evaluation_scalar                                   |
| -        | 2.65±0.03μs                | 2.52±0.01μs                      |    0.95 | bench_scalar.ScalarMath.time_add_int32arr_and_other('int32')                                    |
| -        | 33.8±0.4μs                 | 32.1±0.1μs                       |    0.95 | bench_ufunc_strides.BinaryInt.time_binary_scalar_in0(<ufunc 'minimum'>, 1, 1, 1, 'l')           |
| -        | 40.9±0.5μs                 | 38.7±0.1μs                       |    0.95 | bench_ufunc_strides.BinaryInt.time_binary_scalar_in1(<ufunc 'maximum'>, 2, 2, 2, 'l')           |
| -        | 23.5±0.3μs                 | 22.3±0.8μs                       |    0.95 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'bitwise_and'>, 1, 1, 1, 'b') |
| -        | 46.2±0.7μs                 | 43.8±2μs                         |    0.95 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'multiply'>, 1, 1, 1, 'H')    |
| -        | 46.2±0.3μs                 | 43.8±1μs                         |    0.95 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'subtract'>, 1, 1, 1, 'H')    |
| -        | 45.4±0.6μs                 | 43.0±1μs                         |    0.95 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'bitwise_and'>, 1, 1, 1, 'H') |
| -        | 23.8±0.2μs                 | 22.5±1μs                         |    0.95 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'multiply'>, 1, 1, 1, 'B')    |
| -        | 4.17±0μs                   | 3.93±0μs                         |    0.94 | bench_ufunc.NDArraySetItem.time_methods_setitem((0, 0), 'big')                                  |
| -        | 333±0.9ns                  | 314±8ns                          |    0.94 | bench_ufunc.UFuncSmall.time_ufunc_small_array_inplace('sqrt')                                   |
| -        | 46.7±0.4μs                 | 43.8±2μs                         |    0.94 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'bitwise_and'>, 1, 1, 1, 'h') |
| -        | 46.6±0.3μs                 | 43.7±2μs                         |    0.94 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'bitwise_xor'>, 1, 1, 1, 'h') |
| -        | 36.3±0.6μs                 | 34.0±0.9μs                       |    0.94 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'logical_xor'>, 1, 1, 1, 'h') |
| -        | 24.0±0.07μs                | 22.4±0.8μs                       |    0.94 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'bitwise_xor'>, 1, 1, 1, 'B') |
| -        | 2.00±0.01ms                | 1.86±0.01ms                      |    0.93 | bench_alloc_cache.SmallArrayCreation.time_empty_loop(256)                                       |
| -        | 6.72±0.05μs                | 6.25±0.01μs                      |    0.93 | bench_core.NumPyChar.time_add_big_list_small_string                                             |
| -        | 24.0±0.1μs                 | 22.3±0.6μs                       |    0.93 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'bitwise_and'>, 1, 1, 1, 'B') |
| -        | 24.2±0.1μs                 | 22.4±0.9μs                       |    0.93 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'bitwise_xor'>, 1, 1, 1, 'b') |
| -        | 46.8±1μs                   | 43.3±2μs                         |    0.93 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'bitwise_xor'>, 1, 1, 1, 'h') |
| -        | 24.7±0.3μs                 | 22.9±1μs                         |    0.93 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'subtract'>, 1, 1, 1, 'B')    |
| -        | 45.7±1μs                   | 42.6±1μs                         |    0.93 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'subtract'>, 1, 1, 1, 'H')    |
| -        | 1.61±0.01ms                | 1.48±0ms                         |    0.92 | bench_alloc_cache.SmallArrayCreation.time_empty_loop(16)                                        |
| -        | 1.75±0.02ms                | 1.60±0.01ms                      |    0.91 | bench_alloc_cache.SmallArrayCreation.time_empty_loop(128)                                       |
| -        | 1.60±0.02ms                | 1.46±0.02ms                      |    0.91 | bench_alloc_cache.SmallArrayCreation.time_empty_loop(4)                                         |
| -        | 2.01±0.01ms                | 1.83±0.01ms                      |    0.91 | bench_alloc_cache.SmallArrayCreation.time_empty_loop(512)                                       |
| -        | 4.82±0.2ms                 | 4.39±0.02ms                      |    0.91 | bench_ufunc_strides.BinaryComplex.time_binary_scalar_in0(<ufunc 'add'>, 4, 4, 4, 'D')           |
| -        | 47.9±1μs                   | 43.5±2μs                         |    0.91 | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in1(<ufunc 'bitwise_and'>, 1, 1, 1, 'h') |
| -        | 1.60±0.01ms                | 1.44±0.01ms                      |    0.9  | bench_alloc_cache.SmallArrayCreation.time_empty_loop(64)                                        |
| -        | 274±0μs                    | 248±0μs                          |    0.9  | bench_ma.MAFunctions2v.time_functions_2v('np.ma', 'power', 'big')                               |
| -        | 16.8±0μs                   | 15.1±0μs                         |    0.9  | bench_ufunc.NDArraySetItem.time_methods_setitem(0, 'big')                                       |
| -        | 3.73±0.05ms                | 3.37±0.1ms                       |    0.9  | bench_ufunc_strides.BinaryComplex.time_binary_scalar_in0(<ufunc 'subtract'>, 1, 2, 4, 'D')      |
| -        | 24.9±0.4μs                 | 22.4±0.8μs                       |    0.9  | bench_ufunc_strides.BinaryIntContig.time_binary_scalar_in0(<ufunc 'bitwise_or'>, 1, 1, 1, 'b')  |
| -        | 1.62±0.02ms                | 1.44±0.01ms                      |    0.89 | bench_alloc_cache.SmallArrayCreation.time_empty_loop(127)                                       |
| -        | 228±0μs                    | 197±0μs                          |    0.86 | bench_ma.MAFunctions1v.time_functions_1v('np.ma', 'sin', 'big')                                 |
| -        | 166±0μs                    | 143±0μs                          |    0.86 | bench_ma.MAFunctions2v.time_functions_2v('np', 'multiply', 'big')                               |
| -        | 372±0μs                    | 318±0μs                          |    0.86 | bench_ma.MAFunctions2v.time_functions_2v('np.ma', 'divide', 'big')                              |
| -        | 48.2±0μs                   | 41.3±0μs                         |    0.86 | bench_ma.MAMethodSetItem.time_methods_setitem(0, 17, 'big')                                     |
| -        | 34.7±0μs                   | 29.4±0μs                         |    0.85 | bench_ma.MAMethod0v.time_methods_0v('ravel', 'big')                                             |
| -        | 48.6±0μs                   | 41.3±0μs                         |    0.85 | bench_ma.MAMethodSetItem.time_methods_setitem((0, 0), 17, 'big')                                |
| -        | 345±0μs                    | 291±0μs                          |    0.84 | bench_ma.MAFunctions1v.time_functions_1v('np', 'sqrt', 'big')                                   |
| -        | 191±0μs                    | 156±0μs                          |    0.82 | bench_ma.MAFunctions1v.time_functions_1v('np.ma', 'log', 'big')                                 |
| -        | 10.5±0μs                   | 7.80±0μs                         |    0.74 | bench_ufunc.NDArrayGetItem.time_methods_getitem((-1, 0), 'big')                                 |
| -        | 2.58±0.01μs                | 1.82±0μs                         |    0.71 | bench_ufunc.CustomComparison.time_less_than_scalar1(<class 'numpy.int8'>)                       |
| -        | 10.5±0μs                   | 7.11±0μs                         |    0.68 | bench_ufunc.NDArrayGetItem.time_methods_getitem(0, 'big')                                       |
| -        | 11.7±0μs                   | 7.68±0μs                         |    0.66 | bench_ufunc.NDArrayGetItem.time_methods_getitem((0, 0), 'big')                                  |
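
To make the "spread both ways" reading concrete, the Ratio column can be recomputed as after/before, and a geometric mean gives one rough summary of a set of ratios. The sketch below uses four rows copied from the table; the helper names are illustrative, and the choice of rows is arbitrary, so the summary value says nothing about the full benchmark run.

```python
import math


def ratio(before: float, after: float) -> float:
    """Return after/before, the value shown in the table's Ratio column."""
    return after / before


def geometric_mean(ratios: list[float]) -> float:
    """Average ratios multiplicatively so that 2.0 and 0.5 cancel to 1.0."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# (before, after) timings in microseconds, copied from four rows of the table.
samples = [
    (317.0, 461.0),  # bench_ma.MAFunctions1v ('np', 'log', 'big')
    (345.0, 291.0),  # bench_ma.MAFunctions1v ('np', 'sqrt', 'big')
    (36.2, 48.2),    # bench_ufunc.NDArrayGetItem ([0, -1], 'big')
    (11.7, 7.68),    # bench_ufunc.NDArrayGetItem ((0, 0), 'big')
]
ratios = [ratio(before, after) for before, after in samples]

print([round(r, 2) for r in ratios])  # [1.45, 0.84, 1.33, 0.66]
print(round(geometric_mean(ratios), 2))
```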

seberg added 2 commits May 12, 2026 11:52
Member

mattip commented May 12, 2026

Seems something is off with a release note

/home/circleci/repo/doc/source/release/notes-towncrier.rst:341: WARNING: py:obj reference target not found: numpy.dtypes.register_dlpack_dtype
/home/circleci/repo/doc/source/release/notes-towncrier.rst:341: WARNING: py:obj reference target not found: numpy.dtypes.register_dlpack_dtype
/home/circleci/repo/doc/source/release/notes-towncrier.rst:341: WARNING: py:obj reference target not found: numpy.dtypes.register_dlpack_dtype

> Some are surprisingly different (and consistently so),

I wonder if heap types are forcing a different memory layout and causing some cache misses. These are consistent over a few runs and recompilations?

Member Author

seberg commented May 12, 2026

> Seems something is off with a release note

Yeah, I'll open a PR. I thought it was just a missing rebase first, but I think I truly forgot to add this function to the docs probably... And it didn't register because I thought there was a general CI failure that also hit the circle CI build.

Member Author

seberg commented May 12, 2026

> I wonder if heap types are forcing a different memory layout and causing some cache misses. These are consistent over a few runs and recompilations?

I don't really think so, but of course it might change allocation patterns in the grand scheme? The allocation size didn't change (they were heap-type sized also before, we just didn't use the heaptype part of it) and in practice we also heap allocated them before as well...

The changes in those benchmarks are consistent yes, but I think we had consistent random shifts in performance before in many PRs that are completely unrelated?

One other thing I could imagine (since I can't find gc.disable in asv) is that the pattern in which the GC kicks in changed and that makes a difference. Although I doubt that should make such a big difference...
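
One way to test the GC hypothesis outside of asv would be to time the same workload with the cyclic collector enabled and disabled. This is a minimal sketch using only the standard library; `gc_paused`, `best_time` and `workload` are illustrative names, and the workload is a placeholder rather than one of the benchmarks above.

```python
import gc
import time
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state instead of enabling unconditionally.
        if was_enabled:
            gc.enable()


def best_time(func, repeat: int = 5, number: int = 10_000) -> float:
    """Return the best per-call time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            func()
        best = min(best, (time.perf_counter() - start) / number)
    return best


def workload():
    # Placeholder workload that allocates GC-tracked containers.
    return [[] for _ in range(10)]


if __name__ == "__main__":
    with_gc = best_time(workload)
    with gc_paused():
        without_gc = best_time(workload)
    print(f"GC enabled:  {with_gc * 1e9:.0f} ns per call")
    print(f"GC disabled: {without_gc * 1e9:.0f} ns per call")
```

If the two timings differ noticeably for a benchmark that moved in the table, GC scheduling would be a plausible contributor; if not, it can probably be ruled out.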
