
Silent bootstrapping. #6241

Draft
ryzhyk wants to merge 9 commits into transactional_bootstrapping from issue6090

ryzhyk commented May 14, 2026

Fix #6090.

Add a `silent_bootstrapping` argument to the `/start` and `/approve` calls, which disables output connectors during bootstrapping. The primary use case for this is bootstrapping the pipeline after a Feldera upgrade, where the contents of views are not expected to change.
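A minimal sketch of the gating this describes, assuming the flag is checked for each output batch before it reaches a connector. `StartOptions`, `OutputConnector`, and `push` are hypothetical names used for illustration, not Feldera's actual types or API:

```rust
/// Hypothetical options mirroring the `silent_bootstrapping` argument
/// passed to `/start` or `/approve`.
#[derive(Clone, Copy)]
struct StartOptions {
    silent_bootstrapping: bool,
}

/// Hypothetical output connector that records the batches delivered to it.
struct OutputConnector {
    delivered: Vec<String>,
}

impl OutputConnector {
    /// Delivers `batch` unless silent bootstrapping is on and the pipeline
    /// is still bootstrapping.
    fn push(&mut self, opts: StartOptions, bootstrapping: bool, batch: &str) {
        if opts.silent_bootstrapping && bootstrapping {
            // Output suppressed: view contents are assumed to be unchanged.
            return;
        }
        self.delivered.push(batch.to_string());
    }
}
```

With `silent_bootstrapping` unset, bootstrap outputs are delivered as before; the flag only suppresses delivery while bootstrapping is in progress.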


ryzhyk and others added 9 commits May 14, 2026 12:00
This fixes an issue when bootstrapping a table with a PK when there
is a downstream operator attached to it that creates an integral of the
same table.

We ended up with two integrals that can both be used to replay the
same stream, one with and one without an accumulator. We used to replay
from the last registered replay source, which meant that if the second
integral was added in the modified version of the program, it was empty
and replay failed, despite the fact that the input integral could be used
for replay.

To make things worse, we report this error as the input table not being
materialized, which is simply wrong.

This commit adds a simple workaround, which uses the first registered replay
source (by refusing to register new replay sources for the same stream), and
a regression test for this.
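The workaround amounts to a first-registration-wins policy. A minimal sketch using a `HashMap` entry lookup; `StreamId`, `ReplaySource`, and `ReplayRegistry` are hypothetical names, not the actual DBSP types:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical stand-ins for the real stream and replay-source types.
type StreamId = u32;

#[derive(Debug, Clone, PartialEq)]
struct ReplaySource {
    name: &'static str,
}

#[derive(Default)]
struct ReplayRegistry {
    sources: HashMap<StreamId, ReplaySource>,
}

impl ReplayRegistry {
    /// Registers `source` for `stream` unless a source is already registered.
    /// Returns `true` if this call registered the source.
    fn register(&mut self, stream: StreamId, source: ReplaySource) -> bool {
        match self.sources.entry(stream) {
            // A source already exists: keep the first one, ignore the newcomer.
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(source);
                true
            }
        }
    }

    /// The source that will be used to replay `stream`, if any.
    fn source_for(&self, stream: StreamId) -> Option<&ReplaySource> {
        self.sources.get(&stream)
    }
}
```

Because `register` ignores later registrations, an integral added by the modified program cannot displace a source that was registered earlier for the same stream.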

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Fixes #4736

Bootstrapping used to be performed in a sequence of transactions. Depending on
the program, this could be inefficient due to redundant recomputation. In
addition, this produced multiple small batches of potentially mutually canceling
outputs.
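To illustrate the canceling-outputs problem, here is a sketch assuming outputs are weighted updates (records with integer weights, as in DBSP Z-sets) that are consolidated within a transaction. `consolidate` is a hypothetical helper, not a DBSP function:

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: sums the weights of identical records and drops
/// records whose net weight is zero, as consolidation within a single
/// transaction would.
fn consolidate(updates: &[(&'static str, i64)]) -> BTreeMap<&'static str, i64> {
    let mut out = BTreeMap::new();
    for &(record, weight) in updates {
        *out.entry(record).or_insert(0) += weight;
    }
    // An insertion (+1) and a retraction (-1) of the same record cancel out.
    out.retain(|_, w| *w != 0);
    out
}
```

If one transaction inserts `b` (weight +1) and the next retracts it (weight -1), emitting them as two batches sends both updates downstream, whereas consolidating them in a single transaction drops `b` entirely.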

We now have all the infra needed to change this. This commit changes the Z1Trace
and AccumulateZ1Trace operators to behave as splitters, i.e., they replay their
entire contents across multiple steps within the same transaction.

We also get rid of the replay_step_size knob. Instead we use the existing
splitter_chunk_size setting, which controls the number of records produced by
splitters per step.
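A sketch of the splitter behavior, assuming a trace is a plain in-memory vector: each step emits at most `chunk_size` records (standing in for `splitter_chunk_size`), and the loop keeps stepping within one transaction until the whole trace has been replayed. `Splitter` and `replay_in_one_transaction` are hypothetical names:

```rust
/// Hypothetical model of an operator that replays its trace as a splitter.
struct Splitter<T> {
    trace: Vec<T>,
    pos: usize,
    chunk_size: usize, // stands in for `splitter_chunk_size`
}

impl<T: Clone> Splitter<T> {
    fn new(trace: Vec<T>, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "splitter_chunk_size must be positive");
        Self { trace, pos: 0, chunk_size }
    }

    /// Emits the next chunk (at most `chunk_size` records) for one step.
    fn step(&mut self) -> Vec<T> {
        let end = (self.pos + self.chunk_size).min(self.trace.len());
        let chunk = self.trace[self.pos..end].to_vec();
        self.pos = end;
        chunk
    }

    fn is_complete(&self) -> bool {
        self.pos == self.trace.len()
    }
}

/// Runs steps of a single transaction until the splitter has replayed
/// its entire trace; returns the number of steps and the replayed records.
fn replay_in_one_transaction<T: Clone>(splitter: &mut Splitter<T>) -> (usize, Vec<T>) {
    let mut steps = 0;
    let mut output = Vec::new();
    while !splitter.is_complete() {
        output.extend(splitter.step());
        steps += 1;
    }
    (steps, output)
}
```

With 10 records and a chunk size of 4, the trace is replayed in three steps (4, 4, and 2 records) of a single transaction.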

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Improve the TPC-H test in `checkpoint` mode, which can be used to torture-test bootstrapping:

- Configurable number and size of test segments. With these new options we can
  scale the test up and down using the same dataset (typically TPC-H).
- Check that views are initialized after bootstrapping.

Example:

Split views into 2 groups of 10M records each:

```
uv run test_tpch.py --s3-bucket feldera-qa-data --s3-prefix tpc-h-100 --s3-region us-west-1 --mode checkpoint --num-segments 2 --segment-size 10000000
```

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Remove redundant TransactionPhase::Idle status.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Bootstrapping is considered complete when all replay sources are complete and
the bootstrapping transaction has committed. The latter condition was missing.
This did not cause any issues because we only called this function between
transactions.
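The corrected completion check, sketched with hypothetical field names (the actual controller state is not shown in this PR):

```rust
/// Hypothetical bootstrapping state; field names are illustrative only.
struct BootstrapState {
    /// One flag per replay source: has it finished replaying?
    replay_sources_complete: Vec<bool>,
    /// Has the bootstrapping transaction committed?
    bootstrap_txn_committed: bool,
}

impl BootstrapState {
    /// Both conditions are required; the commit check was the missing one.
    fn bootstrapping_complete(&self) -> bool {
        self.replay_sources_complete.iter().all(|&done| done)
            && self.bootstrap_txn_committed
    }
}
```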

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Fix #6091.

Views that don't participate in bootstrapping don't update their snapshots
until the first post-bootstrap transaction. As a result, a client making ad hoc
queries right after bootstrapping completes could observe empty views.

We fix this by:
1. Forcing an extra transaction after bootstrapping completes
   (this was already the case).
2. Maintaining `bootstrap_in_progress` status until the extra transaction
   commits.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Fix #6090.

Add a `silent_bootstrapping` argument to the `/start` and `/approve` calls, which
disables output connectors during bootstrapping. The primary use case for this is
bootstrapping the pipeline after a Feldera upgrade, where the contents of views
are not expected to change.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
ryzhyk force-pushed the transactional_bootstrapping branch from 7148c81 to dffc5b1 on May 15, 2026 00:20