Silent bootstrapping. #6241
Draft
ryzhyk wants to merge 9 commits into
This fixes an issue when bootstrapping a table with a PK that has a downstream operator attached to it which creates an integral of the same table. We ended up with two integrals that can both be used to replay the same stream, one with and one without an accumulator. We used to replay from the last registered replay source, so if the second integral was added in the modified version of the program, it was empty and replay failed, even though the input integral could have been used for replay. To make things worse, we reported this error as the input table not being materialized, which is simply wrong. This commit adds a simple workaround that uses the first registered replay source (by refusing to register new replay sources for the same stream), along with a regression test.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
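The "first registration wins" rule can be illustrated with a minimal sketch. It assumes a hypothetical `ReplayRegistry` keyed by a hypothetical `StreamId`; none of these names come from the Feldera codebase. A later registration for a stream that already has a source is refused instead of overwriting it.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical identifier of a stream that can be replayed.
type StreamId = u64;

/// Hypothetical description of a replay source (e.g., an integral).
#[derive(Debug, Clone, PartialEq)]
struct ReplaySource {
    name: String,
}

/// Keeps at most one replay source per stream: the first one registered.
#[derive(Default)]
struct ReplayRegistry {
    sources: HashMap<StreamId, ReplaySource>,
}

impl ReplayRegistry {
    /// Registers `source` for `stream` unless a source is already registered.
    /// Returns `true` if the source was registered, `false` if it was refused.
    fn register(&mut self, stream: StreamId, source: ReplaySource) -> bool {
        match self.sources.entry(stream) {
            // A source already exists for this stream: keep it, refuse the new one.
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(source);
                true
            }
        }
    }

    /// Returns the replay source that will be used for `stream`, if any.
    fn source_for(&self, stream: StreamId) -> Option<&ReplaySource> {
        self.sources.get(&stream)
    }
}
```

In this sketch, if the input table's integral registers first, a downstream integral registered later for the same stream (which may be empty in a modified program) cannot displace it.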
Fixes #4736.
Bootstrapping used to be performed in a sequence of transactions. Depending on the program, this could be inefficient due to redundant recomputation. It also produced multiple small batches of potentially mutually canceling outputs. We now have all the infrastructure needed to change this. This commit changes the Z1Trace and AccumulateZ1Trace operators to behave as splitters, i.e., they replay their entire contents across multiple steps within the same transaction. We also get rid of the replay_step_size knob. Instead, we use the existing splitter_chunk_size setting, which controls the number of records produced by splitters per step.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
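Splitter-style replay can be sketched as follows, assuming a hypothetical `SplitterReplay` that holds a trace's contents as a plain vector. Each step emits at most `chunk_size` records (standing in for splitter_chunk_size) until the whole trace has been replayed, and all steps belong to one transaction. This illustrates the idea only; it is not the actual Z1Trace code.

```rust
/// Hypothetical replay cursor over a trace's contents.
struct SplitterReplay<T> {
    records: Vec<T>,
    position: usize,
    chunk_size: usize,
}

impl<T: Clone> SplitterReplay<T> {
    fn new(records: Vec<T>, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        Self { records, position: 0, chunk_size }
    }

    /// Output of one step: the next chunk of at most `chunk_size` records.
    fn step(&mut self) -> Vec<T> {
        let end = (self.position + self.chunk_size).min(self.records.len());
        let chunk = self.records[self.position..end].to_vec();
        self.position = end;
        chunk
    }

    /// True once every record has been replayed; the transaction can then commit.
    fn is_done(&self) -> bool {
        self.position >= self.records.len()
    }
}

/// Runs steps until replay finishes, returning the chunk produced by each step.
/// All of these steps are assumed to run inside a single transaction.
fn replay_in_one_transaction<T: Clone>(replay: &mut SplitterReplay<T>) -> Vec<Vec<T>> {
    let mut steps = Vec::new();
    while !replay.is_done() {
        steps.push(replay.step());
    }
    steps
}
```

For example, five records with a chunk size of 2 are replayed in three steps of 2, 2, and 1 records, rather than in three separate transactions.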
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Improve TPC-H test in `checkpoint` mode, which can be used to torture-test bootstrapping:
- Configurable number and size of test segments. With these new options we can scale the test up and down using the same dataset (typically TPC-H).
- Check that views are initialized after bootstrapping.
Example: split views into 2 groups of 10M records each.
```
uv run test_tpch.py --s3-bucket feldera-qa-data --s3-prefix tpc-h-100 --s3-region us-west-1 --mode checkpoint --num-segments 2 --segment-size 10000000
```
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Remove the redundant TransactionPhase::Idle status.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Bootstrapping is considered complete when all replay sources are complete and the bootstrapping transaction has committed. The latter condition was missing. This did not cause any issues because we only called this function between transactions.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
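The corrected completion check amounts to a conjunction of both conditions. A minimal sketch, assuming a hypothetical `BootstrapState` that tracks a completion flag per replay source and whether the bootstrapping transaction has committed:

```rust
/// Hypothetical snapshot of the bootstrapping state.
struct BootstrapState {
    /// Completion flag for each replay source.
    replay_sources_complete: Vec<bool>,
    /// Whether the bootstrapping transaction has committed.
    transaction_committed: bool,
}

impl BootstrapState {
    /// Bootstrapping is complete only when every replay source has finished
    /// AND the bootstrapping transaction has committed.
    fn bootstrap_complete(&self) -> bool {
        self.replay_sources_complete.iter().all(|&done| done) && self.transaction_committed
    }
}
```

Without the `transaction_committed` term, the check would report completion while the bootstrapping transaction is still open, which is harmless only if it is evaluated strictly between transactions.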
Fix #6091.
Views that don't participate in bootstrapping don't update their snapshots until the first post-bootstrap transaction. As a result, a client making ad hoc queries right after bootstrapping completes could observe empty views. We fix this by:
1. Forcing an extra transaction after bootstrapping completes (this was already the case).
2. Maintaining `bootstrap_in_progress` status until the extra transaction commits.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
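The two-step fix can be modeled as a small state machine. This is a minimal sketch with hypothetical names (`BootstrapPhase`, `Pipeline`, `on_commit`), not Feldera's actual types; it assumes the first commit observed is the bootstrapping transaction and the second is the forced extra transaction.

```rust
/// Hypothetical bootstrapping phases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootstrapPhase {
    /// Replay sources are still being replayed.
    Replaying,
    /// The bootstrapping transaction committed; the extra transaction that
    /// refreshes the snapshots of all views has not committed yet.
    AwaitingExtraTransaction,
    /// The extra transaction committed; all view snapshots are current.
    Done,
}

struct Pipeline {
    phase: BootstrapPhase,
}

impl Pipeline {
    /// Value reported to clients as `bootstrap_in_progress`.
    fn bootstrap_in_progress(&self) -> bool {
        self.phase != BootstrapPhase::Done
    }

    /// Advances the phase when a transaction commits.
    fn on_commit(&mut self) {
        self.phase = match self.phase {
            // Bootstrapping transaction committed: force one more transaction.
            BootstrapPhase::Replaying => BootstrapPhase::AwaitingExtraTransaction,
            // Extra transaction committed (or already done): bootstrapping is over.
            BootstrapPhase::AwaitingExtraTransaction | BootstrapPhase::Done => BootstrapPhase::Done,
        };
    }
}
```

Keeping `bootstrap_in_progress` true through `AwaitingExtraTransaction` is what prevents a client from treating the pipeline as ready while some view snapshots are still empty.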
Fix #6090.
Add /silent_bootstrapping argument to the /start and /approve calls, which disables output connectors during bootstrapping. The primary use case for this is bootstrapping the pipeline after a Feldera upgrade, where the contents of views are not expected to change.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
7148c81 to dffc5b1
Fix #6090.
Add /silent_bootstrapping argument to the /start and /approve calls, which disables output connectors during bootstrapping. The primary use case for this is bootstrapping the pipeline after a Feldera upgrade, where the contents of views are not expected to change.
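The effect of the flag on output can be sketched as a gate in front of the output connectors. The `Controller` and `OutputConnector` types below are hypothetical, and the sketch assumes suppressed batches are simply discarded; it does not show how the real controller handles them.

```rust
/// Hypothetical output connector that records the batches it would send.
#[derive(Default)]
struct OutputConnector {
    sent: Vec<String>,
}

/// Hypothetical controller deciding whether view updates reach output connectors.
struct Controller {
    silent_bootstrapping: bool,
    bootstrap_in_progress: bool,
}

impl Controller {
    /// Forwards `batch` to `connector` unless silent bootstrapping is active.
    /// Returns `true` if the batch was forwarded.
    fn push_output(&self, connector: &mut OutputConnector, batch: &str) -> bool {
        if self.silent_bootstrapping && self.bootstrap_in_progress {
            // Output suppressed while bootstrapping (discarded in this sketch).
            return false;
        }
        connector.sent.push(batch.to_string());
        true
    }
}
```

With the flag unset, bootstrapping output still flows to connectors as before; with it set, connectors see only updates produced after bootstrapping ends.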