Fix incorrect table::len warnings when reading non-astropy VOParquet files #19817
Open
stvoutsin wants to merge 1 commit into
Description
This pull request fixes a bug where `read_parquet_votable` incorrectly emits `AstropyUserWarning` messages for every string column when reading VOParquet files produced by non-astropy writers (e.g. TAP services). `read_parquet_votable` delegates the raw data load to `read_table_parquet` via `Table.read(filename, format="parquet")`. That code path looks for `table::len::<name>` keys in the Parquet file metadata to determine string column widths. Those keys are an astropy-private convention, written only by `write_parquet_table`, and are not part of the IVOA VOTable-in-Parquet specification.

Conformant VOParquet files not produced by astropy therefore lack these keys, so the warnings fire spuriously for every such file.
The fix adds a `string_lengths` parameter to `read_table_parquet` that lets callers supply column widths directly, bypassing the `table::len` lookup and the associated scan and warning. `read_parquet_votable` uses this parameter, extracting the widths from the `arraysize` attributes of the `<FIELD>` elements in the embedded VOTable XML, since that information should be present in every conformant VOParquet file.

Fixes #19783
Changes
- `astropy/io/misc/parquet.py`
  - `read_table_parquet`: new `string_lengths: dict[str, int | None] | None` parameter. When a column name is present in the dict, its value is used as the string width (`None` means variable-length). The `table::len` metadata lookup and the scan/warning are reached only when the column is absent from `string_lengths`.
  - `read_parquet_votable`: parses the embedded VOTable XML, builds a complete `string_lengths` dict from the `<FIELD arraysize="...">` attributes, and passes it to the `Table.read(filename, format="parquet")` call.
- `astropy/io/misc/tests/test_parquet_votable.py`
  - `_write_voparquet`: helper that writes a minimal VOParquet file using `pyarrow` directly, simulating a TAP service (no `table::len` keys written).
  - `test_compare_parquet_votable`: asserts that VOParquet reads never emit the warning (plain Parquet reads still do).