Avoid NaN <-> UNDEFINED conversions in astropy.wcs to improve thread-safety#19862
Avoid NaN <-> UNDEFINED conversions in astropy.wcs to improve thread-safety#19862astrofrog wants to merge 17 commits into
Conversation
…aN conversion functions no-ops
…DEFINED<->NaN translation and element write-back
…so units machinery works
…-API stubs deprecated
…ned and expose core linear params as plain views
… mutating in place
|
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
… slots and a capsule
…array, and reuse the NaN/UNDEFINED conversion helpers
… parameters and add tests for the parameter conversion behaviour
…arameter-array limitations, and add a concurrent to_header consistency test
…nted assignment does
|
I think this is ready to review! @mhvk - I added you as a reviewer since this adds a numpy array subclass in the C code so I'm sure you will have some useful insight there! |
There was a problem hiding this comment.
@astrofrog - I like the overall idea! I've only glanced through things right now, so the two comments are rather random, but one bigger thought: in a way, what you are trying to mimic is a float-like dtype, where UNDEFINED means NaN. Since numpy 2.0, it is possible to make user dtypes in C that would allow to make such a float-like dtype with the translation implemented there. In numpy itself, something similar is done for the new StringDType (which can have different sentinels for missing data), though probably the closest example is the test "scaled float type" defined at https://github.com/numpy/numpy/blob/main/numpy/_core/src/umath/_scaled_float_dtype.c
Anyway, I don't think anything you've done here precludes moving to that later...
EDIT: more user dtypes at https://github.com/numpy/numpy-user-dtypes
| if (x != NULL) { | ||
| wcsprm_fix_values(x, &undefined2nan); | ||
| } | ||
| (void)x; |
There was a problem hiding this comment.
In principle could use the numpy DEPRECATE macro, which will emit a warning.
| *dest = PyFloat_AsDouble(value); | ||
| double v = PyFloat_AsDouble(value); | ||
|
|
||
| if (PyErr_Occurred()) { |
There was a problem hiding this comment.
This can be made faster by doing if (v == -1.0 && PyErr_Occurred())
sorry I know this PR description is long but I wrote it all by hand, it is not an AI slop grenade 😆
Motivation
This PR is part of a series (hopefully one of the last such PRs) that aims to clean up some of the C-level code in astropy.wcs in order to make it thread-safe. One of the thread-unsafe things we do in astropy.wcs is to constantly call
wcsprm_python2candwcsprm_c2pythonto change values from being NaN (Python-side) to UNDEFINED (a special sentinel value in the C code) and back. But mutating the WCS is thread-unsafe and while one thread may be modifying theWcsprmobject to be in Python convention, the other might be doing an operation expecting it to be in C convention. Here is an example of buggy behavior withto_header():You can run this with:
The output is:
There are 46 different header outputs (as in, 46 different variants of the above header), and as you can see above, some of the values are
NAN.0.What's happening here is that one of the threads is making a header (in the C code) while the other thread had converted some of the values to NaN (for Python). I tried to fix most of these issues in other PRs with non-free-threaded Python by making sure we only release the GIL in safe places, but these issues are going to become more common with free-threaded Python since the GIL is never there (as you can see above).
There are a number of other examples of similar bugs, which fundamentally come down to the fact we keep modifying the
Wcsprmin-place. The ideal solution is to simply get rid of these back and forth conversions, which is what this PR does.Approach
The general approach of this PR is to instead always store the WCS with UNDEFINED values (where these are needed) and convert WCS parameter values from NaN to UNDEFINED (where needed) on-the-fly in the getters and setters for
Wcsprm. For scalars, this is easy. For array values, it's also easy to support e.g.and have this translate to
[1, 2, 3, 4, 5, UNDEFINED]in the C layer, but where it gets trickier is supporting:i.e. modify single elements of the arrays. This doesn't go through the setter, it just uses the getter for the parameter then uses Numpy to modify the underlying buffer directly, so we can't detect if a user does this kind of operation easily. In order to properly support this, we therefore need a thin Numpy array subclass which can deal with the conversion on-the-fly including on individual elements as above (which this PR includes). The way this array subclass works is that when the getter for e.g.
wcs.wcs.obsgeois called, we make an instance of the subclass with a buffer which is a copy of the original one in the WCS but with undefined values changed to NaNs, and then when a value is set via__setitem__on the array, we update the subclass's buffer but then also update the WCS's buffer, converting NaN -> UNDEFINED.Limitations
There is one limitation I am aware of - since we make a copy of the buffer when creating the numpy subclass instance, this won't work:
that is, because the subclass has a copy of the buffer, that buffer won't be updated if the contents of the WCS change through some other mechanism. However, doing e.g. arr[1] = np.nan will work though. So it's just the corner case of getting the array and then modifying it in-place using the original
wcspath and then expecting the array 'reference' to see an updated value that won't work.Primary array parameters
The limitation above is not perfect, but we can limit its impact. I realized while working on all this that WCSLIB only cares about UNDEFINED values in some parameters, but not all. For example, WCSLIB does not do anything special with UNDEFINED values in
wcs.wcs.crval, and in fact this can lead to weird issues - the following is on main:For such fields, since WCSLIB doesn't look for the sentinel value, it would actually be better to actually have the C parameters also set to NaN, not the UNDEFINED sentinel value. So in this PR, the approach of having the wrapper numpy class is only done for array-like parameters for which WCSLIB actually cares about UNDEFINED values - and there are only six:
WCS.wcs.obsgeoWCS.wcs.crderWCS.wcs.csyerWCS.wcs.cperiWCS.wcs.czphsWCS.wcs.mjdrefFor e.g.
WCS.wcs.crval, in this PR we don't use the Numpy array subclass, and we let the NaN values through. This actually then produces less weird results for the above example:Much better!
Summary
wcsprm_python2candwcsprm_c2pythonWCSParameterArray. This subclass is where most of the complexity is here, so I've put it in its own file to try and keep the diff of all the other files simpler.WCSobject, the reference will not also be updated - to me this is a very small corner case which I very much doubt will matter in practicerather than setting individual elements in-place. We could do this deprecation in a follow-up PR though if we like that idea.