Description
When reading a VOParquet file produced by a conformant non-astropy writer using format="parquet.votable", astropy emits a warning for every string column:
WARNING: No table::len::band found in metadata. Using longest string (1 characters). [astropy.io.misc.parquet]
Root cause
read_parquet_votable (parquet.py:640) reads the data payload by calling:
data_table_with_no_metadata = Table.read(filename, format="parquet")
This routes through read_table_parquet, which looks for table::len::<name> in the parquet file metadata for every pa.string() column. If the key is not found, it scans the column data to find the longest string and emits a warning.
As far as I understand, table::len is an astropy-private convention written only by write_parquet_table to help round-trip numpy fixed-width string dtypes, and it is not part of the IVOA VOTable-in-Parquet specification.
Therefore any VOParquet producer not using astropy as a writer will never include these keys, and the warning fires unconditionally.
Suggested fix
Since read_parquet_votable knows it is reading a VOParquet file, we could probably detect the presence of IVOA.VOTable-Parquet.version in read_table_parquet and skip the table::len warning.
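A minimal sketch of that check, written as a standalone helper rather than as a patch to astropy. The function name should_warn_missing_len and the helper _as_text are hypothetical, and the sketch assumes the file-level metadata is available as a mapping whose keys may be bytes (as pyarrow exposes them) or str:

```python
# Hypothetical helper, not astropy code: decides whether the
# "No table::len::<name> found in metadata" warning is worth emitting.

VOPARQUET_VERSION_KEY = "IVOA.VOTable-Parquet.version"


def _as_text(value):
    """Decode bytes keys so bytes and str metadata are handled alike."""
    return value.decode() if isinstance(value, bytes) else value


def should_warn_missing_len(file_metadata, column_name):
    """Return True only when the length key is absent AND the file is not VOParquet."""
    keys = {_as_text(key) for key in (file_metadata or {})}
    if f"table::len::{column_name}" in keys:
        # The astropy writer recorded the length, so nothing is missing.
        return False
    # VOParquet files are not required to carry table::len::<name> keys,
    # so their absence is expected and should not trigger a warning.
    return VOPARQUET_VERSION_KEY not in keys
```

With this logic, a file carrying only IVOA.VOTable-Parquet.version would still have its string columns scanned for the longest value, but silently.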
Alternatively, we could potentially extract the string column lengths from the embedded VOTable XML (where arraysize is already present on each <FIELD>) and pass them into the parquet reader so no scan is needed, but this is more effort and probably involves changing the read_table_parquet API.
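To illustrate the arraysize idea, here is a rough sketch that pulls string lengths out of a VOTable XML document using only the standard library. The function name string_field_lengths and the example XML are made up for illustration; how the result would be handed to read_table_parquet is left open, since that would need an API change:

```python
import xml.etree.ElementTree as ET


def string_field_lengths(votable_xml):
    """Map each char/unicodeChar FIELD name to its fixed length, or None if not fixed."""
    lengths = {}
    root = ET.fromstring(votable_xml)
    for elem in root.iter():
        # Strip any XML namespace prefix, e.g. '{http://...}FIELD' -> 'FIELD'.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "FIELD" or elem.get("datatype") not in ("char", "unicodeChar"):
            continue
        # A char FIELD without arraysize holds a single character.
        arraysize = elem.get("arraysize", "1")
        # Plain integers are fixed widths; '*', 'N*' and multi-dimensional
        # sizes are reported as None so the caller can fall back to a scan.
        lengths[elem.get("name")] = int(arraysize) if arraysize.isdigit() else None
    return lengths


# Illustrative document only, not taken from a real service response.
example_xml = """<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE><TABLE>
    <FIELD name="band" datatype="char" arraysize="1"/>
    <FIELD name="target" datatype="char" arraysize="*"/>
    <FIELD name="flux" datatype="double"/>
  </TABLE></RESOURCE>
</VOTABLE>"""

print(string_field_lengths(example_xml))  # {'band': 1, 'target': None}
```

Columns that come back as None would still need the existing longest-string scan, so this only removes the scan for fixed-width fields.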
Expected behavior
No warning should be emitted when reading a file that contains IVOA.VOTable-Parquet.version in its metadata and was produced by a spec-conformant writer, since table::len is not a VOParquet requirement.
How to Reproduce
- Using current astropy main or version 7.2.0:
import io
from astropy.table import Table
# read a VOParquet written by a TAP service (not astropy)
table = Table.read(parquet_bytes_or_path, format="parquet.votable")
# WARNING: No table::len::band found in metadata. Using longest string (1 characters).
Versions
import astropy
astropy.system_info()
platform
--------
platform.platform() = 'Linux-6.8.0-111-generic-x86_64-with-glibc2.35'
platform.version() = '#111~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 14 17:13:45 UTC '
platform.python_version() = '3.11.4'
packages
--------
astropy 7.2.0
numpy 2.1.3
scipy --
matplotlib 3.7.2
pandas 2.2.3
pyerfa 2.0.1.5