Commit 4cc1f9a

adapters: handle deletion vector in delta input connector
Snapshot mode works fine with deletion vectors, but in follow mode we need to parse and apply the deletion vector ourselves when reading Parquet files; the same applies to some cases in CDC mode.

Signed-off-by: Swanand Mulay <73115739+swanandx@users.noreply.github.com>
1 parent c58569a commit 4cc1f9a

9 files changed

Lines changed: 978 additions & 194 deletions

Cargo.lock

Lines changed: 1 addition & 0 deletions

crates/adapters/Cargo.toml

Lines changed: 10 additions & 1 deletion
@@ -27,7 +27,15 @@ default = [
     "with-postgres-cdc",
 ]
 with-kafka = ["rdkafka"]
-with-deltalake = ["deltalake", "deltalake-catalog-unity"]
+# `parquet/async` + `parquet/object_store` serve only the deletion-vector
+# reader (`integrated/delta_table/deletion_vector.rs`).
+with-deltalake = [
+    "deltalake",
+    "deltalake-catalog-unity",
+    "roaring",
+    "parquet/async",
+    "parquet/object_store",
+]
 with-iceberg = ["feldera-iceberg"]
 with-pubsub = ["google-cloud-pubsub", "google-cloud-gax"]
 with-avro = [
@@ -202,6 +210,7 @@ sentry = { workspace = true }
 zip = { workspace = true }
 smallvec = { workspace = true }
 delta_kernel = { workspace = true }
+roaring = { workspace = true, optional = true }
 flate2 = { workspace = true }
 etl = { workspace = true, optional = true }
 etl-config = { workspace = true, optional = true }

crates/adapters/src/integrated/delta_table.rs

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+mod deletion_vector;
 mod input;
 mod output;
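
The body of the new `deletion_vector` module is not part of this excerpt. As a rough illustration of what "applying a deletion vector" means when reading a Parquet file, here is a minimal sketch. It assumes a deletion vector is a set of 0-based row positions within one data file. The `DeletionVector` type and `apply_deletion_vector` function are hypothetical names, and a standard-library `BTreeSet` stands in for the `roaring` bitmap so the sketch has no dependencies. It is not the connector's actual code.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a parsed deletion vector: the set of row
/// positions (0-based, within one Parquet file) that are marked deleted.
/// A real implementation would more likely hold a roaring bitmap.
struct DeletionVector {
    deleted: BTreeSet<u64>,
}

impl DeletionVector {
    fn from_positions(positions: impl IntoIterator<Item = u64>) -> Self {
        Self {
            deleted: positions.into_iter().collect(),
        }
    }

    fn is_deleted(&self, row: u64) -> bool {
        self.deleted.contains(&row)
    }
}

/// Keep only the rows whose file-level position is not in the deletion vector.
/// `first_row` is the position of `batch[0]` within the file, so a file can be
/// filtered one batch at a time.
fn apply_deletion_vector<T: Clone>(batch: &[T], first_row: u64, dv: &DeletionVector) -> Vec<T> {
    batch
        .iter()
        .enumerate()
        .filter(|(i, _)| !dv.is_deleted(first_row + *i as u64))
        .map(|(_, row)| row.clone())
        .collect()
}

fn main() {
    // Rows 1 and 4 of the file are marked as deleted.
    let dv = DeletionVector::from_positions([1, 4]);

    // The file is read as two batches of three rows each.
    let first = apply_deletion_vector(&["a", "b", "c"], 0, &dv);
    let second = apply_deletion_vector(&["d", "e", "f"], 3, &dv);

    println!("{:?} {:?}", first, second);
}
```

Tracking `first_row` per batch matters because deletion-vector positions refer to the whole file, not to the individual record batches a Parquet reader yields.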
