The Delta output connector did not create periodic checkpoints.
While this is problematic in itself, it also made the connector
slow down over time because of a delta-rs bug that causes the
`update_incremental` function to scan the entire transaction log on
every commit:
delta-io/delta-kernel-rs#2103.
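The reason missing checkpoints hurt can be shown with a simplified, hypothetical cost model. This is not delta-rs code. It counts how many log files a reader loads to rebuild table state at a given version. It assumes a checkpoint exists at every nonzero version that is a multiple of the interval, and it ignores details such as multi-part checkpoints.

```rust
/// Simplified, hypothetical cost model (not delta-rs code): the number of
/// log files a reader must load to reconstruct the table state at `version`.
/// Versions start at 0; `checkpoint_interval == 0` means "never checkpoint".
fn files_to_replay(version: u64, checkpoint_interval: u64) -> u64 {
    if checkpoint_interval == 0 {
        // No checkpoints: every commit file 0..=version is replayed.
        return version + 1;
    }
    // Assumed rule: a checkpoint exists at every nonzero version that is a
    // multiple of the interval.
    let last_checkpoint = (version / checkpoint_interval) * checkpoint_interval;
    if last_checkpoint == 0 {
        // No checkpoint has been written yet.
        version + 1
    } else {
        // One checkpoint file plus the commits written after it.
        1 + (version - last_checkpoint)
    }
}

fn main() {
    for version in [5u64, 15, 1005] {
        println!(
            "version {version}: without checkpoints = {}, interval 10 = {}",
            files_to_replay(version, 0),
            files_to_replay(version, 10)
        );
    }
}
```

Without checkpoints, the work grows linearly with the number of commits. With an interval of 10, it stays bounded by the interval.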
This commit:
- Introduces the `checkpoint_interval` option, which tells
the connector to configure the checkpoint interval when creating
the table.
- Creates a `CommitBuilder` that is actually set up to create
checkpoints.
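The post-commit behavior the second change enables can be sketched with a hypothetical in-memory log. The `SimulatedLog` type and the "checkpoint when the version is a nonzero multiple of the interval" rule are assumptions for illustration only. The exact rule delta-rs's `CommitBuilder` applies may differ.

```rust
/// Hypothetical in-memory stand-in for a Delta transaction log; it only
/// records which versions received a checkpoint.
struct SimulatedLog {
    next_version: u64,
    checkpoints: Vec<u64>,
}

impl SimulatedLog {
    fn new() -> Self {
        SimulatedLog { next_version: 0, checkpoints: Vec::new() }
    }

    /// Appends one commit and, when due, also writes a checkpoint for the
    /// new version (the post-commit step this change turns on).
    fn commit(&mut self, checkpoint_interval: u64) -> u64 {
        let version = self.next_version;
        self.next_version += 1;
        // Assumed rule: checkpoint every `checkpoint_interval` versions,
        // skipping version 0; an interval of 0 disables checkpoints
        // (checked first, so the modulo never divides by zero).
        if checkpoint_interval != 0 && version != 0 && version % checkpoint_interval == 0 {
            self.checkpoints.push(version);
        }
        version
    }
}

fn main() {
    let mut log = SimulatedLog::new();
    for _ in 0..25 {
        log.commit(10);
    }
    println!("checkpoints at versions {:?}", log.checkpoints);
}
```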
Without this fix, the time to create a trivial Delta commit grows from
1.5s to 6s after ~1000 commits. With the fix, it stays roughly constant at
~2s.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
 |`mode`*| Determines how the Delta table connector handles an existing table at the target location. Options: |
-|| - `append`: New updates will be appended to the existing table at the target location. |
+|| - `append`: New updates will be appended to the existing table at the target location. If the table doesn't exist, it will be created. |
 || - `truncate`: Existing table at the specified location will be truncated. The connector achieves this by outputting delete actions for all files in the latest snapshot of the table. |
 || - `error_if_exists`: If a table exists at the specified location, the operation will fail. |
+|`checkpoint_interval`| <p>Checkpoint interval (i.e., the number of commits after which a new checkpoint should be created) for newly created Delta tables.</p><p>The option is only available when creating the Delta table (`mode = append` and there is no existing table at the target location or `mode = truncate`). It configures the `checkpointInterval` table property, which determines the number of commits after which a new checkpoint should be created.</p><p>0 means no checkpoints are created.</p><p>Default: 10.</p>|
 |`max_retries`|<p>Maximum number of retries for failed Delta Lake operations like writing Parquet files and committing transactions.</p><p>The connector performs retries on several levels: individual S3 operations, Delta Lake transaction commits, and overall operation retries. This setting controls the overall operation retries. When a write to the table fails, because of an S3 timeout or any other reason that was not resolved by lower-level retries, the connector will retry the entire operation.</p><p>When not specified, the connector performs infinite retries. When set to 0, the connector doesn't retry failed operations.</p>|
 |`threads`| Number of parallel threads used by the connector. Increasing this value can improve Delta Lake write throughput by enabling concurrent writes. Default: `1`. |
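The interaction between `mode`, an existing table, and `checkpoint_interval` described in the table can be sketched as a small decision function. The `Mode`, `Action`, and `plan` names are hypothetical. Two cases the table does not spell out are assumptions here: `error_if_exists` with no existing table creates one, and `truncate` carries the interval whether or not a table exists.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Append,
    Truncate,
    ErrorIfExists,
}

#[derive(Debug, PartialEq)]
enum Action {
    /// Create a new table; the interval becomes its checkpoint property.
    Create { checkpoint_interval: u32 },
    /// Append to the existing table; its checkpoint settings are untouched.
    AppendExisting,
    /// Emit delete actions for every file in the latest snapshot.
    Truncate { checkpoint_interval: u32 },
}

/// Hypothetical planner: which action the connector takes for a given mode.
fn plan(mode: Mode, table_exists: bool, checkpoint_interval: u32) -> Result<Action, String> {
    match (mode, table_exists) {
        (Mode::Append, true) => Ok(Action::AppendExisting),
        (Mode::Append, false) => Ok(Action::Create { checkpoint_interval }),
        (Mode::Truncate, _) => Ok(Action::Truncate { checkpoint_interval }),
        (Mode::ErrorIfExists, true) => {
            Err("a table already exists at the target location".to_string())
        }
        // Assumption: with no existing table, error_if_exists creates one.
        (Mode::ErrorIfExists, false) => Ok(Action::Create { checkpoint_interval }),
    }
}

fn main() {
    for (mode, exists) in [(Mode::Append, false), (Mode::Append, true), (Mode::ErrorIfExists, true)] {
        println!("{mode:?}, table exists = {exists}: {:?}", plan(mode, exists, 10));
    }
}
```

In the `AppendExisting` case, `checkpoint_interval` is simply ignored. This matches the table's note that the option only applies when a table is created.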
openapi.json: 7 additions, 0 deletions
@@ -7997,6 +7997,13 @@
       "uri"
     ],
     "properties": {
+      "checkpoint_interval": {
+        "type": "integer",
+        "format": "int32",
+        "description": "Checkpoint interval (i.e., the number of commits after which a new checkpoint should be created) for newly created Delta tables.\n\nThe option is only available when creating the Delta table (`mode = append` and there\nis no existing table at the target location or `mode = truncate`). It configures the `checkpointInterval`\ntable property, which determines the number of commits after which a new checkpoint should be created.\n\n0 means no checkpoints are created.\n\nDefault: 10.",
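The schema types `checkpoint_interval` as a signed 32-bit integer. The documented semantics are that an unset value means 10 and 0 disables checkpoints. The sketch below maps the raw value onto those semantics. The function name is hypothetical. Rejecting negative values is an assumption, since neither the schema nor the description says how negatives are handled.

```rust
/// Hypothetical validation of the optional `checkpoint_interval` field.
/// Returns `Ok(None)` when checkpointing is disabled.
fn effective_checkpoint_interval(raw: Option<i32>) -> Result<Option<u32>, String> {
    match raw {
        // Field omitted: use the documented default of 10 commits.
        None => Ok(Some(10)),
        // 0 means no checkpoints are created.
        Some(0) => Ok(None),
        Some(n) if n > 0 => Ok(Some(n as u32)),
        // Assumption: negative values are rejected rather than clamped.
        Some(n) => Err(format!("checkpoint_interval must be >= 0, got {n}")),
    }
}

fn main() {
    for raw in [None, Some(0), Some(25), Some(-1)] {
        println!("{raw:?} -> {:?}", effective_checkpoint_interval(raw));
    }
}
```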