Bug: read_parquet_votable emits table::len warnings when reading IVOA-conformant VOParquet files #19783

@stvoutsin

Description

When reading, with format="parquet.votable", a VOParquet file produced by a conformant non-astropy writer, astropy emits a warning for every string column:

WARNING: No table::len::band found in metadata. Using longest string (1 characters). [astropy.io.misc.parquet]

Root cause

read_parquet_votable (parquet.py:640) reads the data payload by calling:

data_table_with_no_metadata = Table.read(filename, format="parquet")

This routes through read_table_parquet, which checks for table::len::<name> in the parquet file metadata for every pa.string() column. If the key is not found, it scans the column data for the longest string and emits a warning.

As far as I understand, table::len is an astropy-private convention, written only by write_parquet_table to help round-trip numpy fixed-width string dtypes, and it is not part of the IVOA VOTable-in-Parquet specification.
Therefore, any VOParquet producer other than astropy will never include these keys, and the warning fires unconditionally.
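
To make the mismatch concrete, here is a minimal sketch that reports which string columns have no table::len::<name> entry. It works on a plain metadata mapping so it runs without any parquet file; with pyarrow, such a mapping can be obtained from pyarrow.parquet.read_schema(path).metadata. The helper name, the example metadata, and the version value are hypothetical and only for illustration.

```python
def columns_missing_len_key(metadata, string_columns):
    """Return the string columns that have no ``table::len::<name>`` entry."""
    metadata = metadata or {}
    # pyarrow exposes key-value metadata with bytes keys; normalise to str.
    keys = {k.decode() if isinstance(k, bytes) else k for k in metadata}
    return [name for name in string_columns if f"table::len::{name}" not in keys]


# Hypothetical metadata from a non-astropy VOParquet producer.
# The version value is illustrative only.
voparquet_meta = {b"IVOA.VOTable-Parquet.version": b"1.0"}

# Hypothetical metadata as astropy's own writer would annotate it.
astropy_meta = {b"table::len::band": b"1"}

print(columns_missing_len_key(voparquet_meta, ["band"]))
print(columns_missing_len_key(astropy_meta, ["band"]))
```

Every column returned by the helper corresponds to one warning in the current reader.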

Suggested fix

Since read_parquet_votable knows it is reading a VOParquet file, we could probably detect the presence of IVOA.VOTable-Parquet.version in read_table_parquet and skip the table::len warning.
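
A rough sketch of that check, assuming the decision can be made from the file's key-value metadata alone. The function name is hypothetical and is not part of the astropy API; it only illustrates the condition that read_table_parquet could test before warning.

```python
VOPARQUET_VERSION_KEY = "IVOA.VOTable-Parquet.version"


def should_warn_about_missing_len(metadata):
    """Hypothetical guard: warn only when the file is not marked as VOParquet."""
    metadata = metadata or {}
    # Accept both bytes keys (as pyarrow returns them) and str keys.
    keys = {k.decode() if isinstance(k, bytes) else k for k in metadata}
    return VOPARQUET_VERSION_KEY not in keys


# A file carrying the VOParquet marker: stay silent about table::len.
# The version value is illustrative only.
print(should_warn_about_missing_len({b"IVOA.VOTable-Parquet.version": b"1.0"}))

# A plain parquet file without the marker: keep the existing warning.
print(should_warn_about_missing_len({}))
```

This keeps the current behaviour for plain parquet files, where a missing table::len key may still be worth reporting.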

Alternatively, we could potentially extract the string column lengths from the embedded VOTable XML (where arraysize is already present on each <FIELD>) and pass them into the parquet reader so no scan is needed, but this is more effort and probably involves changing the read_table_parquet API.
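
A sketch of what the extraction step might look like, using only the standard library XML parser rather than astropy's VOTable reader. The function name and the sample XML are made up for illustration. It only handles a purely numeric arraysize; variable-length declarations such as "*" carry no fixed width, so a scan (or a default) would still be needed for those columns.

```python
import xml.etree.ElementTree as ET


def string_lengths_from_votable(xml_text):
    """Map FIELD name -> fixed string length, for char FIELDs with a numeric arraysize."""
    lengths = {}
    for elem in ET.fromstring(xml_text).iter():
        # Drop the XML namespace, if any, so the tag compares as plain "FIELD".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "FIELD" or elem.get("datatype") not in ("char", "unicodeChar"):
            continue
        arraysize = elem.get("arraysize", "")
        # Only fixed sizes such as "1" or "16" are usable; "*" and "16*" are variable.
        if arraysize.isdigit():
            lengths[elem.get("name")] = int(arraysize)
    return lengths


# Hypothetical embedded VOTable header, for illustration only.
sample_xml = """
<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.4">
  <RESOURCE><TABLE>
    <FIELD name="band" datatype="char" arraysize="1"/>
    <FIELD name="object_id" datatype="char" arraysize="*"/>
    <FIELD name="flux" datatype="double"/>
  </TABLE></RESOURCE>
</VOTABLE>
"""

print(string_lengths_from_votable(sample_xml))
```

The resulting mapping is the information that would have to be passed into the parquet reader, which is where the API change mentioned above comes in.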

Expected behavior

No warning should be emitted when reading a file that contains IVOA.VOTable-Parquet.version in its metadata and was produced by a spec-conformant writer, since table::len is not a VOParquet requirement.

How to Reproduce

  1. Using current astropy main or version 7.2.0:
import io
from astropy.table import Table

# read a VOParquet written by a TAP service (not astropy)
table = Table.read(parquet_bytes_or_path, format="parquet.votable")
# WARNING: No table::len::band found in metadata. Using longest string (1 characters).
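
Until this is fixed, a caller-side workaround might look like the sketch below. It assumes the message is raised through Python's warnings module and then rendered by astropy's logger (which the "WARNING: ... [astropy.io.misc.parquet]" format suggests), so a standard warnings filter matching the start of the message would suppress it. The context manager name is hypothetical.

```python
import contextlib
import warnings


@contextlib.contextmanager
def ignore_missing_table_len():
    """Temporarily ignore warnings whose message starts with 'No table::len::'."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings("ignore", message=r"No table::len::")
        yield


# Intended usage (requires astropy and a VOParquet file, so left as a comment):
#
# with ignore_missing_table_len():
#     table = Table.read(parquet_bytes_or_path, format="parquet.votable")
```

The filter is scoped to the with block, so unrelated warnings and later reads are unaffected.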

Versions

import astropy
astropy.system_info()
platform
--------
platform.platform() = 'Linux-6.8.0-111-generic-x86_64-with-glibc2.35'
platform.version() = '#111~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 14 17:13:45 UTC '
platform.python_version() = '3.11.4'

packages
--------
astropy              7.2.0
numpy                2.1.3
scipy                --
matplotlib           3.7.2
pandas               2.2.3
pyerfa               2.0.1.5
