Skip to content

Avoid NaN <-> UNDEFINED conversions in astropy.wcs to improve thread-safety#19862

Open
astrofrog wants to merge 17 commits into
astropy:mainfrom
astrofrog:wcs-canonical-undefined
Open

Avoid NaN <-> UNDEFINED conversions in astropy.wcs to improve thread-safety#19862
astrofrog wants to merge 17 commits into
astropy:mainfrom
astrofrog:wcs-canonical-undefined

Conversation

@astrofrog
Copy link
Copy Markdown
Member

@astrofrog astrofrog commented Jun 3, 2026

sorry I know this PR description is long but I wrote it all by hand, it is not an AI slop grenade 😆

Motivation

This PR is part of a series (hopefully one of the last such PRs) that aims to clean up some of the C-level code in astropy.wcs in order to make it thread-safe. One of the thread-unsafe things we do in astropy.wcs is to constantly call wcsprm_python2c and wcsprm_c2python to change values from being NaN (Python-side) to UNDEFINED (a special sentinel value in the C code) and back. But mutating the WCS is thread-unsafe and while one thread may be modifying the Wcsprm object to be in Python convention, the other might be doing an operation expecting it to be in C convention. Here is an example of buggy behavior with to_header():

import concurrent.futures
from astropy.wcs import WCS

wcs = WCS(naxis=2)
wcs.wcs.ctype = ["RA---TAN", "DEC--TAN"]
wcs.wcs.crval, wcs.wcs.crpix, wcs.wcs.cdelt = [10, 20], [256, 256], [-1e-3, 1e-3]
wcs.wcs.set()


def headers(_):
    return {wcs.wcs.to_header() for _ in range(500)}  # shared WCS, 500 calls


with concurrent.futures.ThreadPoolExecutor(8) as ex:
    seen = set().union(*ex.map(headers, range(8)))

print(
    f"distinct headers: {len(seen)} (want 1); "
    f"with NAN: {sum('NAN' in h.upper() for h in seen)} (want 0)"
)

print(next((h for h in seen if "NAN" in h.upper()), "").strip()[:300])

You can run this with:

PYTHON_GIL=0 uv run --python 3.14t --with numpy --with astropy mwe2.py

The output is:

distinct headers: 46 (want 1); with NAN: 45 (want 0)
WCSAXES =                    2 / Number of coordinate axes
CRPIX1  =                256.0 / Pixel coordinate of reference point
CRPIX2  =                256.0 / Pixel coordinate of reference point
CDELT1  =               -0.001 / [deg] Coordinate increment at reference point
...
RADESYS = 'ICRS'               / Equatorial coordinate system
EQUINOX =                NAN.0 / [yr] Equinox of equatorial coordinates
COMMENT  WCS header keyrecords produced by WCSLIB 8.4 

There are 46 different header outputs (as in, 46 different variants of the above header), and as you can see above, some of the values are NAN.0.

What's happening here is that one of the threads is making a header (in the C code) while the other thread had converted some of the values to NaN (for Python). I tried to fix most of these issues in other PRs with non-free-threaded Python by making sure we only release the GIL in safe places, but these issues are going to become more common with free-threaded Python since the GIL is never there (as you can see above).

There are a number of other examples of similar bugs, which fundamentally come down to the fact we keep modifying the Wcsprm in-place. The ideal solution is to simply get rid of these back and forth conversions, which is what this PR does.

Approach

The general approach of this PR is to instead always store the WCS with UNDEFINED values (where these are needed) and convert WCS parameter values from NaN to UNDEFINED (where needed) on-the-fly in the getters and setters for Wcsprm. For scalars, this is easy. For array values, it's also easy to support e.g.

wcs.wcs.obsgeo = [1, 2, 3, 4, 5, np.nan]

and have this translate to [1, 2, 3, 4, 5, UNDEFINED] in the C layer, but where it gets trickier is supporting:

wcs.wcs.obsgeo[5] = np.nan

i.e. modify single elements of the arrays. This doesn't go through the setter, it just uses the getter for the parameter then uses Numpy to modify the underlying buffer directly, so we can't detect if a user does this kind of operation easily. In order to properly support this, we therefore need a thin Numpy array subclass which can deal with the conversion on-the-fly including on individual elements as above (which this PR includes). The way this array subclass works is that when the getter for e.g. wcs.wcs.obsgeo is called, we make an instance of the subclass with a buffer which is a copy of the original one in the WCS but with undefined values changed to NaNs, and then when a value is set via __setitem__ on the array, we update the subclass's buffer but then also update the WCS's buffer, converting NaN -> UNDEFINED.

Limitations

There is one limitation I am aware of - since we make a copy of the buffer when creating the numpy subclass instance, this won't work:

>>> wcs.wcs.obsgeo = (1, 2, 3, 4, 5, 6)
>>> arr = wcs.wcs.obsgeo
>>> wcs.wcs.obsgeo[1] = np.nan
>>> arr
[1, 2, 3, 4, 5, 6]

that is, because the subclass has a copy of the buffer, that buffer won't be updated if the contents of the WCS change through some other mechanism. However, doing e.g. arr[1] = np.nan will work though. So it's just the corner case of getting the array and then modifying it in-place using the original wcs path and then expecting the array 'reference' to see an updated value that won't work.

Primary array parameters

The limitation above is not perfect, but we can limit its impact. I realized while working on all this that WCSLIB only cares about UNDEFINED values in some parameters, but not all. For example, WCSLIB does not do anything special with UNDEFINED values in wcs.wcs.crval, and in fact this can lead to weird issues - the following is on main:

In [4]: import numpy as np

In [5]: from astropy.wcs import WCS

In [6]: wcs = WCS(naxis=2)

In [8]: wcs.wcs.crval[1] = np.nan

In [9]: wcs.wcs_pix2world(1, 2, 0)
Out[9]: [array(2.), array(9.87654321e+107)]

For such fields, since WCSLIB doesn't look for the sentinel value, it would actually be better to actually have the C parameters also set to NaN, not the UNDEFINED sentinel value. So in this PR, the approach of having the wrapper numpy class is only done for array-like parameters for which WCSLIB actually cares about UNDEFINED values - and there are only six:

  • WCS.wcs.obsgeo
  • WCS.wcs.crder
  • WCS.wcs.csyer
  • WCS.wcs.cperi
  • WCS.wcs.czphs
  • WCS.wcs.mjdref

For e.g. WCS.wcs.crval, in this PR we don't use the Numpy array subclass, and we let the NaN values through. This actually then produces less weird results for the above example:

In [2]: import numpy as np

In [3]: from astropy.wcs import WCS

In [4]: wcs = WCS(naxis=2)

In [5]: wcs.wcs.crval[1] = np.nan

In [6]: wcs.wcs_pix2world(1, 2, 0)
Out[6]: [array(2.), array(nan)]

Much better!

Summary

  • This PR gets rid of all the calls to wcsprm_python2c and wcsprm_c2python
  • For scalar parameters, we convert between NaN and UNDEFINED in the getters and setters
  • For most array parameters (all those for which WCSLIB doesn't look for UNDEFINED), we don't do anything, and pass NaN values through which leads to improved behavior
  • For the six array parameters listed above, we use the Numpy subclass, called WCSParameterArray. This subclass is where most of the complexity is here, so I've put it in its own file to try and keep the diff of all the other files simpler.
  • The only downside of all this is that for the corner case of trying to get a reference to one of these six parameters and then modifying the parameter in-place using the WCS object, the reference will not also be updated - to me this is a very small corner case which I very much doubt will matter in practice
  • We could actually deprecate modifying the arrays in-place for the six parameter values which would allow the Numpy subclass approach to be a temporary solution, and in the long term we'd get rid of it and just return read-only arrays for parameters in which UNDEFINED is special, so one would need to do e.g.
wcs.wcs.obsgeo = [...]

rather than setting individual elements in-place. We could do this deprecation in a follow-up PR though if we like that idea.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@github-actions github-actions Bot added the wcs label Jun 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@astrofrog astrofrog added this to the v8.1.0 milestone Jun 3, 2026
@astrofrog astrofrog added Extra CI Run cron CI as part of PR Build all wheels Run all the wheel builds rather than just a selection labels Jun 3, 2026
@astrofrog astrofrog changed the title WIP: avoid NaN <-> UNDEFINED conversions in astropy.wcs to improve thread-safety Avoid NaN <-> UNDEFINED conversions in astropy.wcs to improve thread-safety Jun 3, 2026
@astrofrog
Copy link
Copy Markdown
Member Author

I think this is ready to review! @mhvk - I added you as a reviewer since this adds a numpy array subclass in the C code so I'm sure you will have some useful insight there!

@astrofrog astrofrog marked this pull request as ready for review June 4, 2026 08:32
@astrofrog astrofrog requested a review from mcara as a code owner June 4, 2026 08:32
Copy link
Copy Markdown
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrofrog - I like the overall idea! I've only glanced through things right now, so the two comments are rather random, but one bigger thought: in a way, what you are trying to mimic is a float-like dtype, where UNDEFINED means NaN. Since numpy 2.0, it is possible to make user dtypes in C that would allow to make such a float-like dtype with the translation implemented there. In numpy itself, something similar is done for the new StringDType (which can have different sentinels for missing data), though probably the closest example is the test "scaled float type" defined at https://github.com/numpy/numpy/blob/main/numpy/_core/src/umath/_scaled_float_dtype.c

Anyway, I don't think anything you've done here precludes moving to that later...

EDIT: more user dtypes at https://github.com/numpy/numpy-user-dtypes

Comment thread astropy/wcs/src/pyutil.c
if (x != NULL) {
wcsprm_fix_values(x, &undefined2nan);
}
(void)x;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle could use the numpy DEPRECATE macro, which will emit a warning.

Comment thread astropy/wcs/src/pyutil.c
*dest = PyFloat_AsDouble(value);
double v = PyFloat_AsDouble(value);

if (PyErr_Occurred()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made faster by doing if (v == -1.0 && PyErr_Occurred())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build all wheels Run all the wheel builds rather than just a selection Extra CI Run cron CI as part of PR multi-threading wcs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants