Allow Arrow Capsule Interface #2680

@DisturbedOcean

Apache Iceberg version

0.10.0 (latest release)

Please describe the bug 🐞

Because iceberg-python checks argument types with isinstance against PyArrow classes in certain places, I can't use libraries such as arro3 or polars without first converting to PyArrow and including pyarrow as a dependency. Here is one such case in table/__init__.py:

def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT, branch: Optional[str] = MAIN_BRANCH) -> None:
    """
    Shorthand API for appending a PyArrow table to a table transaction.

    Args:
        df: The Arrow dataframe that will be appended to overwrite the table
        snapshot_properties: Custom properties to be added to the snapshot summary
        branch: Branch Reference to run the append operation
    """
    try:
        import pyarrow as pa
    except ModuleNotFoundError as e:
        raise ModuleNotFoundError("For writes PyArrow needs to be installed") from e

    from pyiceberg.io.pyarrow import _check_pyarrow_schema_compatible, _dataframe_to_data_files

    if not isinstance(df, pa.Table):
        raise ValueError(f"Expected PyArrow table, got: {df}")

Can this be updated to accept any object that implements the Arrow PyCapsule Interface (https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html)?
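
As a rough illustration of the request, the type check could be relaxed along the following lines. This is only a sketch: `is_arrow_stream_exportable` and `to_pyarrow_table` are hypothetical names, not existing pyiceberg functions, and it assumes a PyArrow version whose `pa.table()` accepts objects exposing `__arrow_c_stream__`. It still imports PyArrow internally, so it only removes the need for the caller to convert beforehand.

```python
from typing import Any


def is_arrow_stream_exportable(obj: Any) -> bool:
    """Return True if obj advertises the Arrow PyCapsule stream protocol."""
    # The PyCapsule Interface defines __arrow_c_stream__ for tabular/stream data.
    return callable(getattr(obj, "__arrow_c_stream__", None))


def to_pyarrow_table(df: Any) -> "pa.Table":
    """Hypothetical helper: coerce an Arrow-stream-exportable object to pa.Table."""
    try:
        import pyarrow as pa
    except ModuleNotFoundError as e:
        raise ModuleNotFoundError("For writes PyArrow needs to be installed") from e

    # Existing behaviour: PyArrow tables pass straight through.
    if isinstance(df, pa.Table):
        return df

    # Assumption: pa.table() can consume any object exposing __arrow_c_stream__
    # (for example a polars DataFrame or an arro3 Table).
    if is_arrow_stream_exportable(df):
        return pa.table(df)

    raise ValueError(f"Expected an Arrow-compatible table, got: {type(df).__name__}")
```

With something like this, `append` could call `df = to_pyarrow_table(df)` in place of the `isinstance` check, and the rest of the write path would stay unchanged.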

I can create a patch if this is something that will be accepted. Sorry for the new account; due to employer issues I can't use my "regular" one.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
