ENH, PERF: allocate memory as part of the array object for scalars #31092
mhvk wants to merge 2 commits into numpy:main
Conversation
(force-pushed 87e035f to 0f8ba39)
```c
npy_free_cache_dim(PyArray_DIMS(arr), PyArray_NDIM(arr));
/* deallocate if not part of the array instance */
if ((PyArray_DIMS(arr) != NULL)
    && (PyArray_DIMS(arr) !=
```
Possibly, there should be another flag for whether the strides and dimensions are stored on the array, but it seemed overkill.
NPY_ARRAY_DIMSONINSTANCE seems clearer to me than the pointer math that happens here but I agree it's not a big deal
(force-pushed c5f5dfd to 4ee2623)
eendebakpt left a comment:
Some comments and small things, but overall I am +1 on this
There are a couple of cases (pickling, reshapes) where we could make more use of the inline data, dims and strides, but since these are deprecated or not performance-critical, I suggest we do not add them here.
```c
 * This flag may be tested for in PyArray_FLAGS(arr).
 * (MAYBE: allow it to be requested as well?)
 */
#define NPY_ARRAY_DATAONINSTANCE 0x0008
```
This is not strictly needed (where the data is allocated is an implementation detail, in any case for the default memory handler). But fine in having this in the interface. If we keep this, can we add a test for it?
Indeed, this should be an implementation detail, and in principle one could also tell that the data is on the instance from the fact that mem_handler=NULL -- unfortunately, in the code elsewhere that is taken to mean that someone messed around with the instance and just set data (it is free'd in the deallocator).
That said, I agree this does not need to be public; I just need to have some way to know the data are on the instance internally.
I actually wondered whether one could adjust the default memory handler to somehow keep track of this, but am not sure how... (it doesn't have access to the instance, after all).
Allocate memory for the dimensions and strides as part of the object, to avoid the overhead of an extra allocation. This will only become useless memory if the array shape is set (now deprecated) or the array is resized in-place (frowned upon).

Note that the above is done only for standard arrays, though I think in principle it could be extended to subtypes with a suitable check. Also, allocating space for data is done only for the standard allocator.

Timings:
```
%timeit np.empty(())   133 -> 127 ns
a = 1.6
%timeit np.array(a)    151 -> 137 ns
%timeit np.add(a, a)   581 -> 557 ns
b = np.array(a)
%timeit np.add(b, b)   293 -> 279 ns
```
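For reference, a sketch of how numbers like those above could be reproduced outside IPython with the stdlib `timeit` module; the absolute times depend on the machine and NumPy build, so only the before/after deltas in the commit message are meaningful.

```python
import timeit
import numpy as np

# Micro-benchmarks mirroring the %timeit lines in the commit message.
a = 1.6
b = np.array(a)
cases = {
    "np.empty(())": lambda: np.empty(()),
    "np.array(a)": lambda: np.array(a),
    "np.add(a, a)": lambda: np.add(a, a),
    "np.add(b, b)": lambda: np.add(b, b),
}
timings = {}
for label, stmt in cases.items():
    # best of 3 repeats, averaged per call, reported in nanoseconds
    timings[label] = min(timeit.repeat(stmt, number=10_000, repeat=3)) / 10_000
    print(f"{label}: {timings[label] * 1e9:.0f} ns")
```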
(force-pushed 4ee2623 to 50923bb)
```c
fail:
    NPY_traverse_info_xfree(&fill_zero_info);
    Py_XDECREF(fa->mem_handler);
```
This was a mistake in the failure path - mem_handler gets decref'd in the deallocator too.
Allocate memory for small amounts of data as part of the object, to avoid the overhead of the allocation of tracked memory. Like for the dimensions and strides, this is done only for standard arrays. Furthermore, allocating space for data is done only for the standard allocator.

Timings relative to having dimensions, strides and data not on the object:
```
%timeit np.empty(())   133 -> 68.2 ns
a = 1.6
%timeit np.array(a)    151 -> 83 ns
%timeit np.add(a, a)   581 -> 423 ns
b = np.array(a)
%timeit np.add(b, b)   293 -> 225 ns
```
(force-pushed 50923bb to 13df5f5)
OK, now split into 2 commits, one for storing the dimensions and strides on the object (which has only a modest benefit for performance, see commit message), and the second for also storing small amounts of data. I'm still not quite happy with having to keep track of whether or not the data is on the instance via a flag; suggestions for alternatives appreciated! Ideally it would somehow be possible to let the default allocator do some of the work, by registering the address of the data-on-instance as memory that does not need freeing.
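As background on the memory-handler bookkeeping being discussed, NumPy already exposes the name of the handler that allocated an array's data (NEP 49). A hedged sketch (the import path moved between NumPy 1.x and 2.x, hence the fallback):

```python
import numpy as np

# Query which memory handler allocated an array's data buffer.
try:
    from numpy._core.multiarray import get_handler_name  # NumPy 2.x
except ImportError:
    from numpy.core.multiarray import get_handler_name  # NumPy 1.x

a = np.arange(3.0)
# Returns a handler name such as "default_allocator", or None if the
# data was not allocated through a registered handler.
name = get_handler_name(a)
print(name)
```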
ngoldbaum left a comment:
I left some minor comments below. I asked Claude Code to do a review pass, and most of these are generated from the report it made. All the wording is from me, and I tried to validate everything I commented on.
```c
    return PyErr_NoMemory();
}
/* PyType_GenericAlloc zeroes the extra bytes, but we don't need to. */
fa = (PyArrayObject_fields *)PyObject_Init((PyObject *)alloc, subtype);
```
this can fail due to memory exhaustion, might as well add a NULL check just in case
This cannot fail - the allocation has already been made above. Python also does not check:
https://github.com/python/cpython/blob/9e5b8383724211d14165a32c0e7682e56e13843a/Objects/object.c#L531-L540
```c
    Py_DECREF(descr);
    return PyErr_NoMemory();
}
/* PyType_GenericAlloc zeroes the extra bytes, but we don't need to. */
```
Suggested change:
```diff
- /* PyType_GenericAlloc zeroes the extra bytes, but we don't need to. */
+ /* PyType_GenericAlloc zeroes the extra bytes, but we don't need to
+    because all fields are explicitly initialized below */
```
for future readers, in case new fields ever get added?
Agreed. I've done this in the other PR, though, since I think that ended up the better approach.
```c
 * TODO: make this a plain else; see comment in array_dealloc.
 */
if (newnbytes <= oldnbytes) {
    new_data = PyArray_DATA(self);
```
If we get here and PyArray_NDIM(self) != new_nd, then we might get into a state where the dimensions are stored outside the array but the data are still stored on-instance. Maybe there should be a PyArray_NDIM(self) == new_nd check here?
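The scenario this comment worries about can be illustrated at the Python level: an in-place resize that keeps the byte count but changes `ndim`, which is exactly where dimensions stored on the instance could be replaced while the data stays inline. A small sketch (`refcheck=False` is used only to avoid the reference-count guard in a script):

```python
import numpy as np

# In-place resize that changes ndim from 1 to 2 without changing nbytes.
a = np.arange(6.0)
assert a.ndim == 1 and a.nbytes == 48
a.resize((2, 3), refcheck=False)  # same 48 bytes, new dimensions array
print(a.shape)
```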
```c
/*
 * Create the array instance.
 *
 * For standard arrays, we allocate extra memory for the dimensions and
```
I would say "for ndarrays (but not subtypes)" instead of 'standard arrays'.
```c
    Py_DECREF(descr);
    return NULL;
if (subtype == &PyArray_Type) {
    size_t size = subtype->tp_basicsize;
```
On 64-bit architectures, this is 96 bytes, which is enough space for twelve 64-bit data elements. So this optimization applies to small arrays as well as scalars, unless I'm missing something that restricts it to just scalars.
The PR description and title talk about scalars, so maybe that behavior is unintentional.
Sorry, originally I only did scalars, but now it is for small arrays as well, indeed.
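A back-of-envelope version of the size math in this thread: taking the 96 bytes mentioned above as the inline capacity (an assumption here, not queried from NumPy), float64 arrays of up to 12 elements would fit inline, and a 13-element array would not.

```python
import numpy as np

# Hypothetical inline-data capacity from the review comment
# (roughly tp_basicsize on a 64-bit build); not read from NumPy.
INLINE_BYTES = 96
for shape in [(), (4,), (12,), (13,)]:
    nbytes = np.empty(shape, dtype=np.float64).nbytes
    fits = nbytes <= INLINE_BYTES
    print(shape, nbytes, "inline" if fits else "heap")
```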
See gh-31108 for an alternative implementation that uses a "hacked" allocator to deal with possible resize, etc., but avoids having a flag. It has the advantage that it needs far less change elsewhere in the code base.
@seberg, @eendebakpt: this is an updated (and greatly improved) version of #29878, making use of the fact that in previous PRs I removed the explicit dependence on how data were allocated.
Details
Allocate memory for the dimensions and strides, as well as small amounts of data, as part of the object, to avoid the overhead of multiple allocations (with the allocation of tracked memory for data especially large).
Note that the above is done only for standard arrays, though I think in principle it could be extended to subtypes with a suitable check.
Also, allocating space for data is done only for the standard allocator.
Timings
No AI was used.