Allow the use of Masked arrays in sigma clip and friends#19858
Conversation
|
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
Here, replace all use of ``np.ma.equal`` to test equality with ``assert_equal`` so that we do not rely on assumptions about private attributes that masked arrays must have.
8c882c1 to
fd5d1c1
Compare
|
Would it make sense to perhaps copy some of the tests over from my PR if you are concerned about test coverage? |
…statistics via a shared astropy.stats helper that routes them through the numpy.ma path
|
@astrofrog - good point, and that indeed uncovered bugs in both our implementations (yours only that the unit of variance was wrong, should have been squared). I had not done biweight, but that now seems OK too. |
mhvk
left a comment
There was a problem hiding this comment.
Some in-line comments to (hopefully) help review.
| else: | ||
| result = func(np.abs(data - data_median), axis=axis, overwrite_input=True) | ||
|
|
||
| if axis is None and np.ma.isMaskedArray(result): |
There was a problem hiding this comment.
I tried adjusting but then realized no test whatsoever depends on these statements, so possibly the reason for having them has disappeared?
| with np.errstate(invalid="ignore"): | ||
| out = np.ma.masked_invalid(data, copy=False) | ||
|
|
||
| filtered_data = np.ma.masked_where( |
There was a problem hiding this comment.
I don't think this was ever needed - every clipped element in filtered_data is already set to NaN.
| data = np.ma.MaskedArray(data, mask) | ||
| if mask_value is not None: | ||
| data = np.ma.masked_values(data, mask_value) | ||
| value_mask = np.ma.masked_values(data, mask_value).mask |
There was a problem hiding this comment.
Didn't quite want to reimplement masked_values so use it here only to get a new mask, independent of input class.
| @@ -0,0 +1 @@ | |||
| ``Masked`` instances can now be used in sigma clipping. | |||
There was a problem hiding this comment.
I'll need to update this to note that it can also be used in biweight calculations and with units. But maybe some review first is better... Arguably this deserves a whatsnew anyway.
| assert np.isnan(biweight_location(data1d)) | ||
| bw_loc = biweight_location(data1d_masked) | ||
| assert not isinstance(bw_loc, np.ma.MaskedArray) | ||
| if masked_array is np.ma.MaskedArray: |
There was a problem hiding this comment.
The minimally intrusive way I used means that one gets back Masked(np.nan) instead of plain np.nan. I'm not sure there is a point in explicitly removing the class; it might also make sense to just remove the test here (and elsewhere).
|
|
||
| elif isinstance(data, Masked): | ||
| if ignore_nan: | ||
| median_func = np.nanmedian |
There was a problem hiding this comment.
I cannot use nanmedian and nansum since Masked cannot override the bottleneck versions.
|
I don't think we backport "feature"? |
|
Indeed, let me remove the backport labels (could still discuss 8.0.x, but no real reason). |
|
Ok no problem, I was treating it as a bug fix originally (since the two issues it closes are bug reports) but no strong opinion |
astrofrog
left a comment
There was a problem hiding this comment.
A few small comments, and a corner case change in behavior:
before this PR:
In [1]: import numpy as np
...: from astropy.stats import median_absolute_deviation as mad
...:
...: arr = np.array([[1., 4., 3., np.nan],
...: [2., 5., np.nan, 4.]])
...:
...: r = mad(arr, func=np.ma.median, axis=1)
...:
In [2]: r
Out[2]: array([nan, nan])
In [3]: type(r)
Out[3]: numpy.ndarrayafter this PR:
In [1]: import numpy as np
...: from astropy.stats import median_absolute_deviation as mad
...:
...: arr = np.array([[1., 4., 3., np.nan],
...: [2., 5., np.nan, 4.]])
...:
...: r = mad(arr, func=np.ma.median, axis=1)
In [2]: r
Out[2]:
masked_array(data=[nan, nan],
mask=[False, False],
fill_value=1e+20)
In [3]: type(r)
Out[3]: numpy.ma.MaskedArray| if ignore_nan: | ||
| data = np.ma.masked_where(np.isnan(data), data, copy=True) | ||
| elif isinstance(data, Masked): | ||
| is_masked = True |
There was a problem hiding this comment.
Is is_masked used anymore? (here and elsewhere)
There was a problem hiding this comment.
Indeed, it isn't; I've removed it locally and will push if we think your corner case exception is OK (larger reply in a moment).
| __all__ = ["SigmaClip", "SigmaClippedStats", "sigma_clip", "sigma_clipped_stats"] | ||
|
|
||
|
|
||
| def make_masked(data, mask, **kwargs): |
There was a problem hiding this comment.
Maybe make explicitly private? (_make_masked)
There was a problem hiding this comment.
OK, done (again for now just locally)
|
|
||
|
|
||
| def test_sigmaclip_fully_masked(): | ||
| @pytest.mark.parametrize("unit", [None]) |
There was a problem hiding this comment.
Oops, that was meant to have a real unit too, indeed!
|
On the change you found, I guess the question is what is the expected output array type for your sample call: I think one can at least argue that if the function explicitly converts the input to a My preference is to stick with the change I made, if only because it makes the code simpler, but it is a very mild preference, and it is easy enough to keep the old behaviour. @larrybradley - what do you think? |
This pull request allows
Maskedarrays to be used on equal footing withnp.ma.MaskedArrayfor sigma clipping, etc. It is the same as #16876, but rebased (I cannot seem to reopen the latter since I rebased and force-pushed first).From the previous PR, my main note was that test coverage may be incomplete.
It is still an alternative to #19857, and in my opinion better in that it treats
Maskedas a first-class citizen.I set the milestone the same as #19857, though I'm not completely sure backporting to 7.2 is reasonable.EDIT: decision further down was not to backport.Fixes #13200
Fixes #14360