Apache Iceberg version
1.10.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
Problem
When creating Iceberg tables through Spark with an S3 location, the table location ends up containing a double slash (//) in the path. This happens whether or not the table is partitioned. The double slash in the path causes the OPTIMIZE operation to behave incorrectly and sometimes delete live files, leading to table corruption.
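One plausible way such a path arises (an illustrative sketch only, not a claim about where Iceberg or Spark actually builds the string): naive concatenation of a warehouse location that ends in `/` with a table directory that begins with `/`.

```python
# Hypothetical illustration of how a double slash can appear:
# the base location keeps its trailing '/' and the table directory
# contributes a leading '/', so simple concatenation duplicates it.
warehouse = "s3a://test/"          # base location ending in '/'
table_dir = "/test_8262bea6c787"   # table directory starting with '/'

location = warehouse + table_dir
print(location)  # s3a://test//test_8262bea6c787
```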
Example
```sql
CREATE TABLE test.test.test (
    test_rk integer
)
WITH (
    compression_codec = 'ZSTD',
    format = 'PARQUET',
    format_version = 2,
    location = 's3a://test//test_8262bea6c787' -- ⚠️ double slash after the bucket name
)
```
Example 2
```sql
CREATE TABLE test.test.test (
    test_rk decimal(21, 0),
    test real,
    test_l varchar,
    test_stat real,
    test_id integer
)
WITH (
    compression_codec = 'ZSTD',
    format = 'PARQUET',
    format_version = 2,
    location = 's3a://test//test', -- ⚠️ double slash after the bucket name
    partitioning = ARRAY['test_id']
)
```
| Component | Version |
| --- | --- |
| iceberg-spark-runtime | 3.5_2.12-1.10.1 |
| iceberg-aws-bundle | 1.10.1 |
| Spark | 3.5.6 |
| Filesystem | S3A |
Simple Spark query that writes to the table:

```python
df.write.format("iceberg").mode("overwrite").saveAsTable("test.test.test")
```
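As a client-side mitigation (a hypothetical helper, not part of Iceberg or Spark), the location string can be normalized before table creation so duplicate slashes never reach the catalog:

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize_s3_location(location: str) -> str:
    """Collapse runs of slashes in the path part of an S3/S3A URI,
    leaving the scheme's '://' separator intact."""
    parts = urlsplit(location)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, clean_path,
                       parts.query, parts.fragment))

print(normalize_s3_location("s3a://test//test_8262bea6c787"))
# s3a://test/test_8262bea6c787
```

This only sidesteps the symptom; the underlying path handling that lets `//` corrupt OPTIMIZE still needs a fix.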
Willingness to contribute