Skip to content

Stop preserving NULL once a logical column is read as a bool array (without logical_as_bytes)#19941

Draft
astrofrog wants to merge 1 commit into
astropy:mainfrom
astrofrog:logical-null-collapse-on-access
Draft

Stop preserving NULL once a logical column is read as a bool array (without logical_as_bytes)#19941
astrofrog wants to merge 1 commit into
astropy:mainfrom
astrofrog:logical-null-collapse-on-access

Conversation

@astrofrog

Copy link
Copy Markdown
Member

This is one last fix which is again also arguably a bug fix which I'd like to include in 8.0.1 to fix some of the behavior with the new logical NULL functionality introduced in 8.0.0.

Reading a FITS logical column without logical_as_bytes=True returns a bool array in which NULL (b'\x00') is coerced to False, and warns that it has done so. But on write-back the original NULL bytes are currently preserved rather than converted, so the bool array users read and edit does not round-trip to disk: explicitly assigning False to a row that was NULL is silently dropped, and the column keeps emitting the contains NULL warning on every read even after every value has been set. In other words the warning is not correctly reflecting the truth.

Here is a minimal example to demonstrate the bug:

>>> import numpy as np
>>> from astropy.io import fits

>>> # build a file whose logical column has a NULL in row 1 (raw bytes b'T', b'\x00', b'F')
>>> col = fits.Column("flag", "L", array=np.array([b"T", b"\x00", b"F"], dtype="S1"))
>>> fits.BinTableHDU.from_columns([col]).writeto("flags.fits", overwrite=True)

>>> hdul = fits.open("flags.fits")
>>> flag = hdul[1].data["flag"]          # materialize the column as a bool array
WARNING: Column 'flag' contains NULL (undefined) values which will be converted to False. To preserve NULL information, reopen the file with logical_as_bytes=True and check for bytes values of b'\x00' (NULL), b'T' (True), and b'F' (False). [astropy.io.fits.fitsrec]
>>> flag
array([ True, False, False])
>>> flag[0] = False                      # row 0 was True  -> changes to b'F'
>>> flag[1] = False                      # row 1 was NULL  -> *should* change to b'F'
>>> flag[2] = True                       # row 2 was False -> changes to b'T'
>>> hdul.writeto("out.fits", overwrite=True)
>>> hdul.close()

>>> # before this PR: rows 0 and 2 changed, but row 1 is still NULL
>>> fits.open("out.fits", logical_as_bytes=True)[1].data["flag"].tobytes()
b'F\x00T'

The issue here is that _scale_back decides which raw bytes to rewrite with raw_field == ord("T"), which is False for both a NULL byte and a b'F' byte. The cached False therefore "matches" the NULL byte and the byte is left untouched. The selective write was added with the 8.0 NULL support to preserve genuine NULLs, but it cannot distinguish an explicit False from a False produced by coercing a NULL.

The cleanest way I've found to fix it is that once a NULL-containing logical column has been materialized as a bool array (the lossy view the warning is about), write it back as b'T'/b'F'. With this PR the same edit gives:

>>> hdul = fits.open("flags.fits")
>>> hdul[1].data["flag"][1] = False
WARNING: Column 'flag' contains NULL (undefined) values which will be converted to False. ... [astropy.io.fits.fitsrec]
>>> hdul.writeto("out.fits", overwrite=True)
>>> hdul.close()
>>> fits.open("out.fits", logical_as_bytes=True)[1].data["flag"].tobytes()
b'TFF'

There are two side effects of this, which I think/hope we can live with:

  1. Conversion is lazy and per-column, so whether NULL survives a copy depends on whether you touched the column. If you read a file and never access the column, NULLs are preserved on disk even if you opened with logical_as_bytes=False - we have to do this because data access is lazy, so we can't go through all the data at open time and convert all the data over to bool preemptively.

  2. Operations that access every column -- repr/printing the table, Table.read, iterating rows -- count as reading the logical column, so they too cause its NULLs to be written as False on a later write:

>>> hdul = fits.open("flags.fits")
>>> _ = repr(hdul[1].data)              # converts all columns
WARNING: Column 'flag' contains NULL (undefined) values which will be converted to False. ... [astropy.io.fits.fitsrec]
>>> hdul.writeto("copy2.fits", overwrite=True)
>>> hdul.close()
>>> fits.open("copy2.fits", logical_as_bytes=True)[1].data["flag"].tobytes()
b'TFF'

The same rule applies under mode='update': reading a NULL logical column and letting the file flush on close rewrites those NULLs as False in place. To read or edit NULL deliberately in any of these situations, use logical_as_bytes=True.

To be clear though, the most important here is that the warning and behavior are consistent. If a warning is emitted, then the data will be changed when the file is written (which is not the case currently). If a user opens a file in update mode just to update the header, the logical data won't be touched.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@astrofrog astrofrog requested a review from saimn as a code owner June 18, 2026 11:30
@github-actions

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@astrofrog astrofrog added Bug backport-v8.0.x on-merge: backport to v8.0.x labels Jun 18, 2026
@astrofrog astrofrog changed the title Stop preserving NULL when a logical column is not read with logical_as_bytes=True Stop preserving NULL once a logical column is read as a bool array (without logical_as_bytes) Jun 18, 2026
@pllim pllim added this to the v8.0.1 milestone Jun 18, 2026
@astrofrog astrofrog force-pushed the logical-null-collapse-on-access branch 2 times, most recently from 440e43f to 9e3451b Compare June 19, 2026 13:27
@astrofrog astrofrog marked this pull request as draft June 19, 2026 13:28
@astrofrog

Copy link
Copy Markdown
Member Author

This will need rebasing once #19952 is merged in

@astrofrog astrofrog force-pushed the logical-null-collapse-on-access branch from 9e3451b to f3d8b05 Compare June 30, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-v8.0.x on-merge: backport to v8.0.x Bug io.fits

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants