Apache Iceberg version
0.10.0 (latest release)
Please describe the bug 🐞
Because iceberg-python checks for concrete PyArrow types in certain places, I can't use libraries such as arro3 or polars without first converting to a PyArrow table and including pyarrow as a dependency. Here is one such case in table/__init__.py:

    def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT, branch: Optional[str] = MAIN_BRANCH) -> None:
        """
        Shorthand API for appending a PyArrow table to a table transaction.

        Args:
            df: The Arrow dataframe that will be appended to overwrite the table
            snapshot_properties: Custom properties to be added to the snapshot summary
            branch: Branch Reference to run the append operation
        """
        try:
            import pyarrow as pa
        except ModuleNotFoundError as e:
            raise ModuleNotFoundError("For writes PyArrow needs to be installed") from e

        from pyiceberg.io.pyarrow import _check_pyarrow_schema_compatible, _dataframe_to_data_files

        if not isinstance(df, pa.Table):
            raise ValueError(f"Expected PyArrow table, got: {df}")

Can this check be updated to accept any object that implements the Arrow PyCapsule Interface (https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html)?
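To make the request concrete, here is a rough sketch of what a relaxed check could look like. The names `ArrowStreamExportable`, `require_arrow_stream` and `to_pyarrow_table` are hypothetical and not part of pyiceberg. The sketch assumes pyarrow 14 or newer, where `pa.table()` accepts objects that expose `__arrow_c_stream__`. Since the write path shown above imports from `pyiceberg.io.pyarrow`, this only removes the caller-side conversion; it does not by itself remove the pyarrow dependency.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class ArrowStreamExportable(Protocol):
    """Hypothetical protocol: any object exposing the Arrow PyCapsule stream method."""

    def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object: ...


def require_arrow_stream(df: Any) -> ArrowStreamExportable:
    """Duck-typed replacement for `isinstance(df, pa.Table)` (illustrative only)."""
    if not isinstance(df, ArrowStreamExportable):
        raise ValueError(
            f"Expected an object implementing __arrow_c_stream__, got: {type(df).__name__}"
        )
    return df


def to_pyarrow_table(df: Any):
    """Convert any stream-exportable object to a PyArrow table for the existing write path."""
    # Imported lazily, mirroring the lazy import in `append` above.
    import pyarrow as pa

    if isinstance(df, pa.Table):
        return df
    # Assumption: pyarrow >= 14, where pa.table() consumes __arrow_c_stream__ producers.
    return pa.table(require_arrow_stream(df))
```

With something along these lines, a caller could pass an arro3 or polars table directly, assuming those objects expose `__arrow_c_stream__`, and the conversion would happen inside the library instead of in user code.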
I can create a patch if this is something that will be accepted. Sorry for the new account, due to employer issues I can't use my "regular" one.
Willingness to contribute