Skip to content

ENH: Implement opaque PyObject ABI support #31091

Open
kumaraditya303 wants to merge 28 commits intonumpy:mainfrom
kumaraditya303:opaque
Open

ENH: Implement opaque PyObject ABI support #31091
kumaraditya303 wants to merge 28 commits intonumpy:mainfrom
kumaraditya303:opaque

Conversation

@kumaraditya303
Copy link
Copy Markdown
Contributor

@kumaraditya303 kumaraditya303 commented Mar 29, 2026

PR summary

This PR adds support for building opaque PyObject stable ABI extensions using NumPy.

AI Disclosure

Claude was used to generate the tests.

@kumaraditya303 kumaraditya303 changed the title Implement opaque PyObject ABI support ENH: Implement opaque PyObject ABI support Mar 29, 2026
@ngoldbaum
Copy link
Copy Markdown
Member

Sorry for adding and removing the milestone repeatedly. I know we talked about this yesterday at the meeting, but it occurs to me that maybe it's reasonable for NumPy 2.5 to support Python 3.15, e.g. by uploading wheels to PyPI? If so, it would be logical to include this as well.

NumPy 2.5 is scheduled to be released this summer, after the ABI for Python 3.15 is frozen, so it should be perfectly fine to build and upload NumPy wheels using a CPython beta.

I know that has implications for the release workflow, so ping @rgommers.

@charris
Copy link
Copy Markdown
Member

charris commented Apr 9, 2026

We don't upload wheels to PyPI for new Python releases until the rc release and cibuildwheel support for it.

EDIT: it is fine to build the wheels, just not to upload them. For instance, we build wheels for pyodide, but we don't upload them.

Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Claude to review this PR. I unfortunately don't have the mental patience to actually read this whole diff in one sitting.

It spotted a couple subtle issues in the iterator macros on the opaque pyobject build. Up to you, but I think it might be worth adding tests for the C iterator APIs under the opaque pyobject builds given the complexity of these macros.

But maybe also check for open-source uses first by searching github for items in the C API to see if it's worth doing.

Comment thread numpy/_core/include/numpy/ndarraytypes.h Outdated
Comment thread numpy/_core/include/numpy/ndarraytypes.h Outdated
Comment thread numpy/_core/code_generators/numpy_api.py Outdated
Comment thread numpy/__init__.cython-30.pxd Outdated
@rgommers
Copy link
Copy Markdown
Member

rgommers commented Apr 9, 2026

NumPy 2.5 is scheduled to be released this summer, after the ABI for Python 3.15 is frozen, so it should be perfectly fine to build and upload NumPy wheels using a CPython beta.

What Chuck said - only after 3.15.0rc1. There's nothing in this PR that requires this though, and there's no problem merging this PR and having support in the headers in 2.5.0.

Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't scan too closely for bugs, but I think Nathan got them (and honestly, I am not all that worried because all of these look like they should be compile time errors downstream and then even a minor release can fix them when they come up -- doesn't mean it isn't better to do it now :)).

Overall, the diff seems much more churn than it actually is, IMO, so this should be fast to wrap up.

Comment thread numpy/_core/code_generators/numpy_api.py Outdated
Comment thread numpy/_core/code_generators/numpy_api.py
Comment thread numpy/_core/include/numpy/ndarraytypes.h Outdated
Comment thread numpy/_core/include/numpy/ndarraytypes.h
Comment thread numpy/_core/include/numpy/numpyconfig.h Outdated
@kumaraditya303 kumaraditya303 marked this pull request as ready for review April 11, 2026 09:28
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Claude to give it another once-over and spotted a couple more issues.

I also tried building this branch against Pandas and saw compilation errors. We'll need to fix those before this can be merged.

Comment thread numpy/__init__.cython-30.pxd Outdated
# PyObject_HEAD
npy_datetime obval
PyArray_DatetimeMetaData obmeta
pass
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried compiling Pandas against this branch and I get a compiler error in Pandas related to the changes to these structs:

     cython -M --fast-fail -3 -Xfreethreading_compatible=true --include-dir /Users/goldbaum/Documents/pandas/.mesonpy-dvqz12__/pandas/_libs/tslibs '-X always_allow_keywords=true' --shared=pandas._libs._cyutility /Users/goldbaum/Documents/pandas/pandas/_libs/tslibs/np_datetime.pyx -o pandas/_libs/tslibs/np_datetime.cpython-314-darwin.so.p/pandas/_libs/tslibs/np_datetime.pyx.c

      Error compiling Cython file:
      ------------------------------------------------------------
      ...

      cdef NPY_DATETIMEUNIT get_datetime64_unit(object obj) noexcept nogil:
          """
          returns the unit part of the dtype for a numpy datetime64 object.
          """
          return <NPY_DATETIMEUNIT>(<PyDatetimeScalarObject*>obj).obmeta.base
                                    ^
      ------------------------------------------------------------
      /Users/goldbaum/Documents/pandas/pandas/_libs/tslibs/np_datetime.pyx:78:30: Cannot convert 'PyDatetimeScalarObject *' to Python object
      [12/159] Compiling Cython source /Users/goldbaum/Documents/pandas/pandas/_libs/tslibs/dtypes.pyx
      [13/159] Generating _cyutility.c with a custom command
      [14/159] Compiling Cython source /Users/goldbaum/Documents/pandas/pandas/_libs/tslibs/conversion.pyx
      FAILED: [code=1] pandas/_libs/tslibs/conversion.cpython-314-darwin.so.p/pandas/_libs/tslibs/conversion.pyx.c
      cython -M --fast-fail -3 -Xfreethreading_compatible=true --include-dir /Users/goldbaum/Documents/pandas/.mesonpy-dvqz12__/pandas/_libs/tslibs '-X always_allow_keywords=true' --shared=pandas._libs._cyutility /Users/goldbaum/Documents/pandas/pandas/_libs/tslibs/conversion.pyx -o pandas/_libs/tslibs/conversion.cpython-314-darwin.so.p/pandas/_libs/tslibs/conversion.pyx.c

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
              return convert_str_to_tsobject(ts, tz, dayfirst, yearfirst)

          if checknull_with_nat_and_na(ts):
              obj.value = NPY_NAT
          elif cnp.is_datetime64_object(ts):
              num = (<PyDatetimeScalarObject*>ts).obmeta.num
                     ^
      ------------------------------------------------------------
      /Users/goldbaum/Documents/pandas/pandas/_libs/tslibs/conversion.pyx:384:15: Cannot convert 'PyDatetimeScalarObject *' to Python object

So I don't think we can unconditionally change these definitions without providing an API to migrate to. We might need to ask David Woods for help if it's not clear how to write the Cython bindings without relying on conditional compilation or other discouraged constructs in Cython (see https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#conditional-compilation).

Maybe we need new getter functions for obval as well as obmeta?

Copy link
Copy Markdown
Contributor Author

@kumaraditya303 kumaraditya303 Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cdef NPY_DATETIMEUNIT get_datetime64_unit(object obj) noexcept nogil:
"""
returns the unit part of the dtype for a numpy datetime64 object.
"""
return <NPY_DATETIMEUNIT>(<PyDatetimeScalarObject*>obj).obmeta.base

This looks like from older version of this branch, are you trying with the latest commit?

Never mind, this is from pandas internals.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it by adding it back, so that projects like pandas which directly access the struct can continue working. This also means that anyone trying to access in limited API would see compilation errors. I don't think there is an easy way to add it for limited API, @da-woods?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason not to give the struct contents in the pxd file - it doesn't generate any code until you try to use it.

It won't actually be possible to use the fields of these structs when using the opaque limited API. I don't currently think there's much way around that.


Possibly:

I added an argument to cython.cast(..., objstruct_cast=True) which gets the objstruct of an opaque type by doing PyObject_GetTypeData. That'd require unmerged and unreleased Cython versions. It would also be a bit dodgy because the original objects aren't actually opaque, but given the object layout you might get away with it. I think Cython might stop you from doing it though because it'll know that it isn't safe.

Comment thread numpy/__init__.cython-30.pxd Outdated
Comment thread numpy/_core/code_generators/generate_numpy_api.py
Comment thread numpy/_core/include/numpy/arrayscalars.h Outdated
Comment thread numpy/_core/include/numpy/numpyconfig.h Outdated
@ngoldbaum ngoldbaum added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Apr 16, 2026
@ngoldbaum
Copy link
Copy Markdown
Member

I added needs-release-note because it's a new feature so we should at least describe it in the release notes.

We have zero narrative docs about how to use the NumPy C API with the Python limited API. I'm going to open a new issue tracking that and we can describe abi3t builds there.

@kumaraditya303 kumaraditya303 added 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) labels Apr 16, 2026
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m planning to merge this next week, let me know if anyone else wants a chance to review.

#undef _PyDataType_GET_ITEM_DATA
/*NUMPY_API*/
NPY_NO_EXPORT PyArray_Descr_fields *
_PyDataType_GET_ITEM_DATA(const PyArray_Descr *dtype)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two potential issues with this pattern:

  1. I think it's really a violation of the c and c++ strict aliasing rules and so probably undefined behaviour. It's probably not problematic undefined behaviour (since the compiler probably can't see the access through the "other" type in opaque code). But I decided against it as a solution to a similar problem in Cython for now. The problem remains unsolved though so you should probably just do it. But be aware...
  2. If you somehow manage to by-pass the macro in non-opaque code this definitely ends up with the wrong answer. You could do that by taking a function pointer for example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's really a violation of the c and c++ strict aliasing rules and so probably undefined behaviour. It's probably not problematic undefined behaviour (since the compiler probably can't see the access through the "other" type in opaque code).

The pointer is first cast to char * which avoid strict aliasing violation because char * is allowed to alias any other object pointer.

If you somehow manage to by-pass the macro in non-opaque code this definitely ends up with the wrong answer. You could do that by taking a function pointer for example.

I don't think that is an issue as that should never happen.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer is first cast to char * which avoid strict aliasing violation because char * is allowed to alias any other object pointer.

I don't going via a cast to char * avoids it - the issue is accessing the same data through seemingly unrelated structures. The idea of allowing a cast to char * to to provide a way to reading the binary representation of an object rather than to allow any cast through A -> char* -> B.

As I say, I don't think it will be a problem in practice but it makes me a bit nervous

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

01 - Enhancement 39 - free-threading PRs and issues related to support for free-threading CPython (a.k.a. no-GIL, PEP 703) 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants