
Fix incorrect table::len warnings when reading non-astropy VOParquet files #19817

Open
stvoutsin wants to merge 1 commit into astropy:main from stvoutsin:issue-19783-voparquet-strlen

@stvoutsin
Contributor

Description

This pull request fixes a bug where read_parquet_votable incorrectly emits AstropyUserWarning messages for every string column when reading VOParquet files produced by non-astropy writers (e.g. TAP services).

read_parquet_votable delegates the raw data load to read_table_parquet via Table.read(filename, format="parquet"). That code path looks for table::len::<name> keys in the Parquet file metadata to determine string column widths. These keys are an astropy-private convention written only by write_parquet_table; they are not part of the IVOA VOTable-in-Parquet specification.

Conformant VOParquet files produced by other writers therefore do not include these keys, so the reader incorrectly emits a warning for every string column in any such file.
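
To make the failure mode concrete, here is a minimal, hypothetical sketch (not astropy's code) of the check that decides whether a string column falls back to the scan-and-warn path. It assumes the Parquet key-value metadata has already been decoded into a str-to-str dict; the helper name, the metadata values, and the IVOA key names in the TAP example are illustrative assumptions.

```python
# Hypothetical sketch: which string columns lack a table::len::<name> key
# and would therefore fall back to scanning the data and warning?
# Assumes the Parquet key-value metadata was decoded to a str -> str dict.

LEN_KEY_PREFIX = "table::len::"  # astropy-private convention


def columns_missing_len_keys(string_columns, metadata):
    """Return the string columns that have no table::len::<name> entry."""
    return [
        name for name in string_columns
        if f"{LEN_KEY_PREFIX}{name}" not in metadata
    ]


# Illustrative metadata as written by astropy's write_parquet_table ...
astropy_meta = {"table::len::obs_id": "12", "table::len::target": "32"}
# ... versus a TAP service that writes only the IVOA keys (assumed names).
tap_meta = {
    "IVOA.VOTable-Parquet.version": "1.0",
    "IVOA.VOTable-Parquet.content": "<VOTABLE>...</VOTABLE>",
}

cols = ["obs_id", "target"]
print(columns_missing_len_keys(cols, astropy_meta))  # []
print(columns_missing_len_keys(cols, tap_meta))      # ['obs_id', 'target']
```

Every string column in the TAP-style file lands on the fallback path, which is why the warning fires once per column.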

The fix adds a string_lengths parameter to read_table_parquet that lets callers supply column widths directly, bypassing the table::len lookup and the associated column scan and warning.
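
The lookup order can be sketched as follows. This is a hypothetical, simplified stand-in for read_table_parquet, not its actual implementation: the name resolve_string_width is made up, it works on a plain list of Python strings, and it emits a plain UserWarning where astropy emits AstropyUserWarning (a UserWarning subclass).

```python
import warnings


def resolve_string_width(name, values, metadata, string_lengths=None):
    """Hypothetical sketch of the width lookup order.

    1. A column listed in ``string_lengths`` uses that value directly
       (``None`` meaning variable-length); no metadata lookup, no scan.
    2. Otherwise the astropy-private ``table::len::<name>`` key is used.
    3. Otherwise the column is scanned for its longest value and a
       warning is emitted.
    """
    if string_lengths is not None and name in string_lengths:
        return string_lengths[name]
    key = f"table::len::{name}"
    if key in metadata:
        return int(metadata[key])
    warnings.warn(f"No {key} entry; scanning column to find width", UserWarning)
    return max((len(v) for v in values), default=0)


values = ["M31", "NGC 1275"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    w1 = resolve_string_width("target", values, {}, {"target": None})
    w2 = resolve_string_width("target", values, {})
print(w1, w2, len(caught))  # None 8 1
```

Only the second call, which has neither a caller-supplied width nor a metadata key, triggers the scan and the warning.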

read_parquet_votable uses this parameter, extracting the widths from the arraysize attributes of the <FIELD> elements in the embedded VOTable XML, since that information should be present in every conformant VOParquet file.
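
A minimal sketch of building string_lengths from the embedded VOTable XML, using only the standard library. The function name and the mapping rules are assumptions for illustration: Parquet column names are taken to match FIELD name attributes, variable-length sizes ("*" and "N*") become None, and multidimensional char arrays are skipped. astropy's actual handling of these cases may differ.

```python
import xml.etree.ElementTree as ET

STRING_DATATYPES = {"char", "unicodeChar"}


def string_lengths_from_votable(xml_text):
    """Hypothetical helper: map FIELD name -> string width (None = variable)."""
    lengths = {}
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare the local tag name so any VOTable namespace version works.
        if elem.tag.rsplit("}", 1)[-1] != "FIELD":
            continue
        if elem.get("datatype") not in STRING_DATATYPES:
            continue
        arraysize = elem.get("arraysize")
        if arraysize is None:
            width = 1  # a char FIELD without arraysize holds one character
        elif "x" in arraysize:
            continue  # multidimensional char arrays: out of scope here
        elif arraysize.endswith("*"):
            width = None  # "*" or "N*": treated as variable-length
        elif arraysize.isdigit():
            width = int(arraysize)  # fixed width
        else:
            continue  # unexpected value; leave it to the fallback path
        lengths[elem.get("name")] = width
    return lengths


xml_text = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
<RESOURCE><TABLE>
<FIELD name="obs_id" datatype="char" arraysize="12"/>
<FIELD name="target" datatype="char" arraysize="*"/>
<FIELD name="flag" datatype="char"/>
<FIELD name="ra" datatype="double" unit="deg"/>
</TABLE></RESOURCE></VOTABLE>"""
print(string_lengths_from_votable(xml_text))
# {'obs_id': 12, 'target': None, 'flag': 1}
```

Non-string fields such as ra are simply absent from the result, so they never reach the string-width logic.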

Fixes #19783


Changes

astropy/io/misc/parquet.py

  • read_table_parquet: new string_lengths: dict[str, int | None] | None parameter. When a column name is present in the dict, the value is used as the string width (None means variable-length). The table::len metadata lookup and scan/warning are only reached when the column is absent from string_lengths.
  • read_parquet_votable: parses the embedded VOTable XML, builds a complete string_lengths dict from <FIELD arraysize="..."> attributes, and passes it to the Table.read(filename, format="parquet") call.

astropy/io/misc/tests/test_parquet_votable.py

  • Added _write_voparquet helper that writes a minimal VOParquet file using
    pyarrow directly, simulating a TAP service (no table::len written).
  • New tests covering various cases.
  • Updated test_compare_parquet_votable to assert that VOParquet reads never
    emit the warning (plain parquet reads still do).
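
The property the _write_voparquet test helper relies on, metadata that carries only the IVOA keys, can be sketched without pyarrow. The function voparquet_metadata is hypothetical, and the key names assume those used by astropy's VOParquet writer; the IVOA VOTable-in-Parquet specification is the authoritative source.

```python
# Hypothetical sketch of the key-value metadata a TAP-style writer attaches.
# Key names are an assumption; check the IVOA specification for the exact names.
VOPARQUET_VERSION_KEY = "IVOA.VOTable-Parquet.version"
VOPARQUET_CONTENT_KEY = "IVOA.VOTable-Parquet.content"


def voparquet_metadata(fields):
    """Return metadata for a minimal VOParquet file.

    ``fields`` is a list of (name, datatype, arraysize_or_None) tuples.
    Unlike astropy's write_parquet_table, no table::len::<name> keys are added.
    Names are not XML-escaped; this is only a sketch.
    """
    field_xml = []
    for name, datatype, arraysize in fields:
        size = f' arraysize="{arraysize}"' if arraysize is not None else ""
        field_xml.append(f'<FIELD name="{name}" datatype="{datatype}"{size}/>')
    votable = (
        '<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">'
        "<RESOURCE><TABLE>" + "".join(field_xml) + "</TABLE></RESOURCE></VOTABLE>"
    )
    return {VOPARQUET_VERSION_KEY: "1.0", VOPARQUET_CONTENT_KEY: votable}


meta = voparquet_metadata([("obs_id", "char", "12"), ("ra", "double", None)])
print(sorted(meta))
# ['IVOA.VOTable-Parquet.content', 'IVOA.VOTable-Parquet.version']
```

With pyarrow, a dict like this could be attached with pyarrow.Table.replace_schema_metadata before calling pyarrow.parquet.write_table.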

stvoutsin force-pushed the issue-19783-voparquet-strlen branch from ab9e516 to be75b06, May 27, 2026 17:36
@github-actions
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

stvoutsin force-pushed the issue-19783-voparquet-strlen branch 3 times, most recently from 3f38461 to 88288a5, May 27, 2026 18:18
stvoutsin marked this pull request as ready for review, May 27, 2026 19:09
stvoutsin force-pushed the issue-19783-voparquet-strlen branch from 88288a5 to 9ed9fad, May 29, 2026 19:32


Development

Successfully merging this pull request may close these issues.

Bug: read_parquet_votable emits table::len warnings when reading IVOA-conformant VOParquet files
