Skip to content

Allow the use of Masked arrays in sigma clip and friends#19858

Open
mhvk wants to merge 8 commits into
astropy:mainfrom
mhvk:sigma-clip-with-masked
Open

Allow the use of Masked arrays in sigma clip and friends#19858
mhvk wants to merge 8 commits into
astropy:mainfrom
mhvk:sigma-clip-with-masked

Conversation

@mhvk
Copy link
Copy Markdown
Contributor

@mhvk mhvk commented Jun 3, 2026

This pull request allows Masked arrays to be used on equal footing with np.ma.MaskedArray for sigma clipping, etc. It is the same as #16876, but rebased (I cannot seem to reopen the latter since I rebased and force-pushed first).

From the previous PR, my main note was that test coverage may be incomplete.

It is still an alternative to #19857, and in my opinion better in that it treats Masked as a first-class citizen. I set the milestone the same as #19857, though I'm not completely sure backporting to 7.2 is reasonable. EDIT: decision further down was not to backport.

Fixes #13200
Fixes #14360

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@mhvk mhvk added this to the v7.2.1 milestone Jun 3, 2026
@mhvk mhvk requested a review from larrybradley as a code owner June 3, 2026 07:24
@mhvk mhvk added stats Enhancement backport-v7.2.x on-merge: backport to v7.2.x backport-v8.0.x on-merge: backport to v8.0.x labels Jun 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

mhvk added 2 commits June 3, 2026 09:25
Here, replace all use of ``np.ma.equal`` to test equality
with ``assert_equal`` so that we do not rely on assumptions
about private attributes that masked arrays must have.
@mhvk mhvk force-pushed the sigma-clip-with-masked branch from 8c882c1 to fd5d1c1 Compare June 3, 2026 07:25
@mhvk mhvk requested a review from astrofrog June 3, 2026 07:26
@mhvk mhvk changed the title Sigma clip with masked Allow the use of Masked arrays in sigma clip and friends Jun 3, 2026
@astrofrog
Copy link
Copy Markdown
Member

Would it make sense to perhaps copy some of the tests over from my PR if you are concerned about test coverage?

@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Jun 3, 2026

@astrofrog - good point, and that indeed uncovered bugs in both our implementations (yours only that the unit of variance was wrong, should have been squared). I had not done biweight, but that now seems OK too.

Copy link
Copy Markdown
Contributor Author

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some in-line comments to (hopefully) help review.

Comment thread astropy/stats/funcs.py
else:
result = func(np.abs(data - data_median), axis=axis, overwrite_input=True)

if axis is None and np.ma.isMaskedArray(result):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adjusting but then realized no test whatsoever depends on these statements, so possibly the reason for having them has disappeared?

with np.errstate(invalid="ignore"):
out = np.ma.masked_invalid(data, copy=False)

filtered_data = np.ma.masked_where(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this was ever needed - every clipped element in filtered_data is already set to NaN.

data = np.ma.MaskedArray(data, mask)
if mask_value is not None:
data = np.ma.masked_values(data, mask_value)
value_mask = np.ma.masked_values(data, mask_value).mask
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't quite want to reimplement masked_values so use it here only to get a new mask, independent of input class.

@@ -0,0 +1 @@
``Masked`` instances can now be used in sigma clipping.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to update this to note that it can also be used in biweight calculations and with units. But maybe some review first is better... Arguably this deserves a whatsnew anyway.

assert np.isnan(biweight_location(data1d))
bw_loc = biweight_location(data1d_masked)
assert not isinstance(bw_loc, np.ma.MaskedArray)
if masked_array is np.ma.MaskedArray:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimally intrusive way I used means that one gets back Masked(np.nan) instead of plain np.nan. I'm not sure there is a point in explicitly removing the class; it might also make sense to just remove the test here (and elsewhere).

Comment thread astropy/stats/biweight.py

elif isinstance(data, Masked):
if ignore_nan:
median_func = np.nanmedian
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot use nanmedian and nansum since Masked cannot override the bottleneck versions.

@pllim
Copy link
Copy Markdown
Member

pllim commented Jun 3, 2026

I don't think we backport "feature"?

@mhvk mhvk modified the milestones: v7.2.1, v8.1.0 Jun 3, 2026
@mhvk mhvk removed backport-v7.2.x on-merge: backport to v7.2.x backport-v8.0.x on-merge: backport to v8.0.x labels Jun 3, 2026
@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Jun 3, 2026

Indeed, let me remove the backport labels (could still discuss 8.0.x, but no real reason).

@astrofrog
Copy link
Copy Markdown
Member

Ok no problem, I was treating it as a bug fix originally (since the two issues it closes are bug reports) but no strong opinion

Copy link
Copy Markdown
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments, and a corner case change in behavior:

before this PR:

In [1]: import numpy as np
   ...: from astropy.stats import median_absolute_deviation as mad
   ...: 
   ...: arr = np.array([[1., 4., 3., np.nan],
   ...:                 [2., 5., np.nan, 4.]])
   ...: 
   ...: r = mad(arr, func=np.ma.median, axis=1)
   ...: 

In [2]: r
Out[2]: array([nan, nan])

In [3]: type(r)
Out[3]: numpy.ndarray

after this PR:

In [1]: import numpy as np
   ...: from astropy.stats import median_absolute_deviation as mad
   ...: 
   ...: arr = np.array([[1., 4., 3., np.nan],
   ...:                 [2., 5., np.nan, 4.]])
   ...: 
   ...: r = mad(arr, func=np.ma.median, axis=1)

In [2]: r
Out[2]: 
masked_array(data=[nan, nan],
             mask=[False, False],
       fill_value=1e+20)

In [3]: type(r)
Out[3]: numpy.ma.MaskedArray

Comment thread astropy/stats/funcs.py
if ignore_nan:
data = np.ma.masked_where(np.isnan(data), data, copy=True)
elif isinstance(data, Masked):
is_masked = True
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is_masked used anymore? (here and elsewhere)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it isn't; I've removed it locally and will push if we think your corner case exception is OK (larger reply in a moment).

__all__ = ["SigmaClip", "SigmaClippedStats", "sigma_clip", "sigma_clipped_stats"]


def make_masked(data, mask, **kwargs):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make explicitly private? (_make_masked)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done (again for now just locally)



def test_sigmaclip_fully_masked():
@pytest.mark.parametrize("unit", [None])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try other units?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that was meant to have a real unit too, indeed!

@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Jun 4, 2026

On the change you found, I guess the question is what is the expected output array type for your sample call:

r = mad(arr, func=np.ma.median, axis=1)

I think one can at least argue that if the function explicitly converts the input to a MaskedArray, it seems reasonable to give that back to the user. It certainly would seem fine if mad was introduced here. But of course mad is not new, and one can just as reasonably argue that there is no reason to change the behaviour of mad even for an untested corner case.

My preference is to stick with the change I made, if only because it makes the code simpler, but it is a very mild preference, and it is easy enough to keep the old behaviour. @larrybradley - what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants