Stop preserving NULL once a logical column is read as a bool array (without logical_as_bytes)#19941
Draft
astrofrog wants to merge 1 commit into
Draft
Stop preserving NULL once a logical column is read as a bool array (without logical_as_bytes)#19941astrofrog wants to merge 1 commit into
astrofrog wants to merge 1 commit into
Conversation
Contributor
|
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
logical_as_bytes=True440e43f to
9e3451b
Compare
Member
Author
|
This will need rebasing once #19952 is merged in |
…h the plain bool view
9e3451b to
f3d8b05
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is one last fix which is again also arguably a bug fix which I'd like to include in 8.0.1 to fix some of the behavior with the new logical NULL functionality introduced in 8.0.0.
Reading a FITS logical column without
logical_as_bytes=Truereturns a bool array in which NULL (b'\x00') is coerced toFalse, and warns that it has done so. But on write-back the original NULL bytes are currently preserved rather than converted, so the bool array users read and edit does not round-trip to disk: explicitly assigningFalseto a row that was NULL is silently dropped, and the column keeps emitting thecontains NULLwarning on every read even after every value has been set. In other words the warning is not correctly reflecting the truth.Here is a minimal example to demonstrate the bug:
The issue here is that
_scale_backdecides which raw bytes to rewrite withraw_field == ord("T"), which isFalsefor both a NULL byte and ab'F'byte. The cachedFalsetherefore "matches" the NULL byte and the byte is left untouched. The selective write was added with the 8.0 NULL support to preserve genuine NULLs, but it cannot distinguish an explicitFalsefrom aFalseproduced by coercing a NULL.The cleanest way I've found to fix it is that once a NULL-containing logical column has been materialized as a bool array (the lossy view the warning is about), write it back as
b'T'/b'F'. With this PR the same edit gives:There are two side effects of this, which I think/hope we can live with:
Conversion is lazy and per-column, so whether NULL survives a copy depends on whether you touched the column. If you read a file and never access the column, NULLs are preserved on disk even if you opened with
logical_as_bytes=False- we have to do this because data access is lazy, so we can't go through all the data at open time and convert all the data over to bool preemptively.Operations that access every column --
repr/printing the table,Table.read, iterating rows -- count as reading the logical column, so they too cause its NULLs to be written asFalseon a later write:The same rule applies under
mode='update': reading a NULL logical column and letting the file flush on close rewrites those NULLs asFalsein place. To read or edit NULL deliberately in any of these situations, uselogical_as_bytes=True.To be clear though, the most important here is that the warning and behavior are consistent. If a warning is emitted, then the data will be changed when the file is written (which is not the case currently). If a user opens a file in update mode just to update the header, the logical data won't be touched.