Skip to content

Commit 20db2c6

Browse files
Leonid Ryzhykryzhyk
authored andcommitted
Transactions.
This commit adds transaction support and closes issues: - #3925 - #4106 - #4108 - #4107 - #4109 - #4111 - #4112 I had to squash individual commit because for some reason the github merge queue rejects this PR otherwise. [dbsp] Micro/macro-step scheduling. Closes #4106. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Consensus. Factor the synchronization primitive that allows multiple workers to agree on a Boolean value into a separate commit. Previously this was only used to determine when a recursive circuit terminates, but we will have additional applications for this coming up, such as having workers agree on the completion of a transaction. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Generalize zset iterator. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Use async functions in operator traits. We now require rust 1.87, where async trait support is stable, so we can have simpler operator trait declarations. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] The Accumulator operator. Closes #4108. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Expose microsteps via DbspHandle. Closes #4109. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Wait for worker consensus to complete step. It's possible that some of the workers have flushed their operators but others need to perform additional steps. Wait for a global consensus before declaring the current transaction completed. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Remove unused declaration. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] trace and integrate_trace with accumulator. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Join with accumulator. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Upsert and aggregate. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Distinct with accumulators. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Add accumulators to group transformers. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] chain_aggregate with accumulator. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Improved debug output. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Rolling aggregate with accumulators. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] asof-join with accumulators. No splitting yet. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] join: use spines instead of batchers. Accumulate future updates in spines instead of batchers to take advantage of persistence and background merging. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Specialize join for untimed inputs. Join in the root circuit doesn't need to sort outputs by timestamp and is easier to implement as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Implement join as a splitter. This initial implementation preserves state in the operator struct. We will switch to using the `stream!` macro in the future to simplify the implementation. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Exchange: implement flush API. The exchange operator is flushed once all producers in all parallel threads are flushed. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] gather operator: implement flush API. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Split output of join. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] StreamingBinaryOperator. A wrapper that allows splitter operators to be implemented as async streams, avoiding the stack ripping problem. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] accumulate_apply2. A version of `apply2` that evaluates its accumulated inputs once per clock cycle. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Change NestedGenerator to support microsteps. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] TypedBatch::consolidate. `consolidate` method for TypedBatch<DynTrace>. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Fix Debug for SpineSnapshot. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Split output of aggregate. Re-implement aggregate as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Strongly typed SpineSnapshot. Introduce a version of TypedBatch that wraps SpineSnapshot. Implement `concat` and `consolidate` for `TypedBatch<SpineSnapshot>`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] accumulate_output operator. A version of `output` that accumulates output batches and outputs them once per clock cycles. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Implement DistinctIncremental as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Builder::num_tuples. Add a builder method that returns the number of tuples pushed to the builder. Used by splitters to determine when to produce the next output batch. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] DistinctIncrementalTotal. Implement DistinctIncrementalTotal as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] ChildCircuit's with arbitrary timestamps. We used to require ChildCircuit to have a timestamp type that is equal to Parent::Timestamp::Nested. We are going to introduce subcircuits that run on the same logical clock as the parent. This commit adds support for child circuits with arbitrary timestamps. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Remove duplicate import_stream implementation. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Backport bug fix to AccumulateZ1Trace. Backport #4262 Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] non_incremental. Introduce an operator that allows evaluating non-incrmental operators in an incremental circuit. With the introduction of transactions it is no longer safe to combine different types of operators in the same circuit. This commit adds an operator that can run a non-incremental fragment in a special child circuit that gets evaluated once per clock tick on accumulated inputs. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] accumulate_integrate, accumulate_differentiate. Use non_incremental to implement accumulate_integrate, accumulate_differentiate. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Distinct tests. Use `non_incremental` to implement tests for the distinct operator that compare incremental and non-incremental implementations. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] GroupTransformer. Implement GroupTransformer as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Adapt replay tests to the microstep API. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] PartitionedTreeAggregate. Reimplement PartitionedTreeAggregate as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] PartitionedRollingAggregate. Reimplement PartitionedRollingAggregate as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Typo in a comment. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Macrostep Z1. The `MacrostepZ1` operator is similar to Z1; however it delays its output to the end of a macrostep, i.e., until the `flush` method is called by the scheduler. It latches the value received right after `flush` and outputs it at every microstep until the next `flush`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Reimplement window as a splitter. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Fix join test. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Typo in assertion. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Transaction API. Add API to the controller to start and commit a transaction. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] use accumulate_output in catalog.rs Controller only retrieves output at the end of a transaction. We therefore must use the accumulate_output operator. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Add type argument to WithSnapshot. Make batch type a type argument of `WithSnapshot`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Simplify the SerCollectionHandle API. Do not distinguish between handles that return individual batches and multiple batches (since all handles can not return multiple batches). The caller can use `concat` to merge outputs from multiple workers and apply `consolidate` to the resulting spine snapshot to convert it into a batch if necessary. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> DevTweaks::splitter_chunk_size_records. Make chunk size configurable. Useful for testing splitter implementations with small chunk sizes. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> TypedBatch::merge_batches A strongly typed version of `merge_batches`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Transaction tests: join operator. Test `join` with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Antijoin tests Test antijoin with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Distinct tests. Test distinct with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Aggregate tests. Test aggregate operators with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Group operator tests. Test group operators with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Rolling aggregate tests. Test rolling aggregates with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Rolling aggregate: merge output batches. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Fix macrostep_delay. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] transaction test. Test for the transaction API. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] impl Debug for Command. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Fix commit_transaction. - Wake up circuit thread when committing a transaction. - Continuously step the circuit in commit mode. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Transaction IDs. Improve transaction API to return transaction ids. Transaction id's can be used to distinguish different transaction instances to avoid committing a transaction started by a different client, but they are currently optional to use and can be ignored. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] fix lir test. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Doc fix Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [fxp] Use new API in tests. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] accumulate_differentiate for Z-sets. A version of `differentiate` that produces output once per transaction. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [sql] Compiler fixes for transactions. Switch to using the new transactional DBSP API in the compiler. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [sqllib] Fixup string interner test. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [doc] fix openapi spec Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com> [ci] add docs to pre-commit Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com> [dbsp] MacrostepGenerator. A version of Generator that passes a flag to the generator function when `flush` has been called. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Accumulator::eval_owned. Optimized Accumulator::eval implementation if the operator happens to receive an owned input. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Trace::insert_arc. Allows adding batches wrapped in an Arc to a trace without cloning them. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Batch::from_batch optimized for Arcs. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Implement join as StreamingBinaryOperator. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Remove confusing OrdXXX type aliases. We used to inconsistently alias VecXXX or FallbackXXX as OrdXXX. Only the latter is correct. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Better way to add timestamp to batches. This commit eliminates some unnecessary batch cloning and reduces memory usage, especially for large transactions. We sometimes need to convert an untimed batch into a timed batch when adding it to a spine; however when doing this in a top-level circuit, the timestamp is just `()`, so the batch can be added as is. The imlementation of this optimization was imperfect, sometimes causing the batch to be converted into a different kind of batch (fron non-indexed to indexed z-set) in the process. This commit solves this issue by having the batch itself specify its timed version. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Don't checkpoint connectors mid-transaction. The controller normally pauses all connectors once they become checkpointable when preparing to checkpoint the pipeline. This doesn't work well with transactions, because usually a transaction needs to ingest specific inputs before committing. We therefore add a condition that delays pausing connectors until the transaction is finished. It will also cause previously paused connectors to unpause if a transaction starts while the pipeline is preparing to checkpoint. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Reduce Delta output chunk size. Buffering 1M updates per connector can use up a lot of memories with a large number of connectors. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [ci] apply automatic fixes Signed-off-by: feldera-bot <feldera-bot@feldera.com> [dbsp+adapters] Transaction progress tracking. Infra to track the progress of transaction commits. For now, we periodically write progress to the log. Proper API/UI support to follow. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Report input/output batch size distribution. We reported the total number of input/output records for various operators, but now how they are distributed across input/output batches. This commit adds such metrics, which can be used for example to catch situations where very large batches are produced by operators. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Spine::insert_arc spills large batches. The initial implementation never pushed a batch to storage. This was done for performance reasons, since adding an Arc<Batch> to a spine is basically free, but blindly adding large batches can easily lead to very high memory usage, so we add logic to push large batches to storage similar to how we do it in `Spine::insert`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Fix: join produces tiny output batches. This fixes the following performance regression in the join operator with transactions: * The operator produces `>splitter_chunk_size` (unconsolidated) updates. * Consolidates updates into a batch. * The resulting batch can be very small, e.g., because join projects away most columns, e.g., 1 record int he extreme case. * The operator ends up producing one record per microstep, potentially generating an unnecessarily large number of tiny steps. The fix uses a batcher to pre-consolidate inputs before deciding to emit a batch. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] accumulate_concat operators. This addresses an interesting performance glitch in the antijoin operator, which showed up when evaluating long chains of left joins. The antijoin consiste of a linear and non-linear components whose outputs get subtracted in the end. Normally, the outputs nearly cancel out but if they are produced during different steps (as part of a transaction) this happens too late, generating a lot of work downstream. To solve this, we introduce a new concatenation operator that accumulates its inputs and evaluates them once per transaction. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] stream_antijoin. Implement non-incremental antijoin and use it to test incremental antijoin. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Bug in JoinTrace. Handle the situation when the delta input (the left input) arrives before the final version of the right input correctly by waiting for flush before processing inputs. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] circuit_builder: debugging code. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] AccumulateDelayTrace operator. An operator that returns a version of a trace delayed by one clock tick. This didn't use to require a separate operator, we just returned the output of the Z1 part of the integrator; however, that output is delayed by exactly one step, not an entire clock cycle. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Comments. TODO: add assertions for these conditions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Don't count empty buffered batches. Counting empty batches in the output buffer leads to a confusing behavior where the number of buffered batches continuous to grow but the number of buffered records remains at 0. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Doc formatting fix. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Proptest for the join operator. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Fix processed record accounting. When the output buffer is empty, report it as flushed to update processed record count on the endpoint without actually flushing it. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Comments and cleanup. Cleanup the transactions PR. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Simplify transaction state tracking in scheduler. Merge multiple flags in one enum. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [benchmarks] Remove tpc-h benchmarks. It will be replaced with a more comprehensive benchmark + test in `python/tests/integration_tests`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Simplify accumulate_differentiate. Reimplement accumultate_differentiate without using a non-iterative subcircuit. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Maintain output buffer in storage. We maintain output buffers as spines, which enables us to use background mergers to compact changes. However, since output buffers are maintained by output threads outside of the DBSP runtime, they didn't have access to the pipeline's storage and ended up completing all merges in memory. This could easily lead to OOM for very large buffers. To address this, we introduce the notion of an auxiliary thread that runs inside a DBSP runtime and has access to its storage backend, and we run output threads as auxiliary threads. One caveat with this is that the pipeline is required to release the storage backend as part of the `stop()` operation and therefore `stop` needs to wait for all output threads to terminate. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Convert output buffer to a snapshot. Once we're done accumulating changes in the output buffer and are about to send it to the output endpoint, it no longer makes sense to perform background merges on the buffer. We therefore convert it into a snapshot to avoid wasting resources. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Track idle time of a circuit. Add a metric that tracks the amount of time the circuit isn't executing a step. There are two sources of idle time: - The local circuit waiting for other workers to complete a step. - The entire multithreaded circuit waiting for the client to trigger a step. The new metric doesn't distinguish between the two; we'll need to add more instrumentation for that. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Join: iterate over the smaller collection. The implementation of join assumed that the delta input on the left is usually small and typically smaller that the full integral on the right. It therefore make sense to iterate over every key in delta and lookup matching keys in the integral. This assumption doesn't hold during backrfill when the integral can be empty, while the delta contains the entire contents of the left collection. In order to handle this case efficiently, we switch to choosing the side to iterate over dynamically based on the sizes of the inputs. Note that an alternative approach is to use seek_key on both sides to skip over keys missing on the other side; however in that case we wouldn't be able to use seek_key_exact. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [sql] accumulate_differentiate -> differentiate. We now have two forms of differentiate in DBSP: - `differentiate`: differentiates stream values across steps - `accumulate_differentiate`: accumulates changes within a transaction and differentiates accumulated changes across transactions. In incremental circuits generated by the compiler, differentiation can occur in two cases: - Differentiate the NOW stream to convert it into a stream of changes - Differentiate the output of a Generator used to inject a constant value in the circuit. In both cases, `differentiate` is the correct operator to use, not `accumulate_differentiate`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [py] Integration test for the now() function. Join now with a table with and without transactions. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [py] Constant table test. The compiler compiles `where t.company_id in <long list of constant values>` queries into a join with a constant table, created using a DBSP Generator operator to produce a stream of constant value followed by `differentiate` to convert it into a change stream. This test checks that the constant table is created correctly. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Fix completed record accounting. We adjust completed input accounting to transactions. Without transactions, every input is fully processed in the same step it is pushed to the pipeline. With transactions, an input is considered processed when the transaction during which it was ingested commits. The fix doesn't advance completed record count until the transaction commits. This commit also fixes an unrelated deadlock, which I think I observed before, but it became more reliably reproducible now. The lock that protects input endpoint state was acquired recursively in the following call chain: - input_chain (acquire read-lock to iterate over endpoint) - calls queue() on each endpoint - delta connector queue() method calls InpusConsumer::extended, which acquires the lock again. Although crossbeam::RwLock docs claim that taking the read-lock recursively panics, it actually works fine unless some other thread tries to grab the write lock in which case both threads will deadlock. To fix this, I switched to using parking_log::RwLock, which has a special read_recursive method. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [ci] apply automatic fixes Signed-off-by: feldera-bot <feldera-bot@feldera.com> [dbsp] Test for NOW as a chain_aggregate. This test shows how to implement the real-time clock as a chain_aggregate. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Stats for accumulator and output operators. Add input/output batch stats to AccumulateOutput and Accumulator operators. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Exchange wait times. This commit adds a new metric to the Exchange operator, tracking the amount of time this operator spends waiting for data from all peers. It's not yet clear how useful this metric is. We measure the wait time as the time since the sender part of exchange delivered messages to all peers until the receive part received all messages. However, this doesn't mean that the circuit was blocked during this time, since it may be evaluating other operators, so the total weight time across all exchanges can be several times higher than the total time the circuit spent blocked. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] cargo dox fix. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] Revert the empty output check. This check is no longer necessary (or correct) with the transactions API. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [fda] Enable transaction tests. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [py] Fix checkpoint(). The `Pipeline.checkpoint()` method returned checkpoint sequence number in one path and request status in the other. This commit fixes it to return sequence number consistently. Also add `is_complete()` method to check if the pipeline is complete without blocking. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [adapters] When starting from a checkpoint use new connector config. We used to pick up the checkpointed input and output connector configs. This was ok before we had backfill avoidance, since the only difference between the user-specified and checkpointed config could be in the HTTP connectors added dynamically after the pipeline was started. However, with backfill avoidance the checkpointed config may no longer apply as tables, views, and connectors could be added or deleted in the new program. We therefore ignore checkpointed configs and use the latest configs specified in the new pipeline. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [py] Add dev_tweaks to RuntimeConfig. Add the recently added dev_tweaks field to the class RuntimeConfig. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [py] Pipeline.modify. A method to modify the pipeline. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [py] TPC-H scale tests. A framework to run TPC-H on a large dataset with checkpoints and restarts and with output connectors. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] rustdoc fix. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [sql] Fixup profiling test. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [sql] Correctly GC input map with waterline. Call `integrate_trace_retain_values` on the output of `add_input_map_with_waterline` instead of `accumulate_integrate_trace_retain_values`. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [dbsp] Give AccumulateZ1Trace a distinct persistent id. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Cargo.lock Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> Dummy commit to nudge github. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com> [ci] apply automatic fixes Signed-off-by: feldera-bot <feldera-bot@feldera.com> Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
1 parent f795d0f commit 20db2c6

File tree

215 files changed

+14149
-5375
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

215 files changed

+14149
-5375
lines changed

.pre-commit-config.yaml

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,15 @@ repos:
4545
language: system
4646
files: \.rs$
4747
pass_filenames: false
48+
# The reason this runs after the openapi update is because the openapi
49+
# generation adds documentation to e.g., the rest-api crate.
50+
- id: cargo-doc
51+
name: Cargo Documentation
52+
description: Generate documentation with warnings treated as errors
53+
entry: bash -c 'RUSTDOCFLAGS="-D warnings" cargo doc --no-deps'
54+
language: rust
55+
files: \.rs$
56+
pass_filenames: false
4857
- repo: https://github.com/astral-sh/ruff-pre-commit
4958
rev: v0.9.10
5059
hooks:

Cargo.lock

Lines changed: 35 additions & 32 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -254,6 +254,7 @@ xxhash-rust = "0.8.6"
254254
zip = "0.6.2"
255255
zstd = "0.12.0"
256256
backtrace = "0.3.75"
257+
parking_lot = "0.12.4"
257258

258259
[workspace.metadata.release]
259260
release = false

benchmark/feldera-sql/benchmarks/tpc-h/.gitignore

Lines changed: 0 additions & 2 deletions
This file was deleted.

benchmark/feldera-sql/benchmarks/tpc-h/generate.bash

Lines changed: 0 additions & 12 deletions
This file was deleted.

benchmark/feldera-sql/benchmarks/tpc-h/queries/q1.sql

Lines changed: 0 additions & 33 deletions
This file was deleted.

0 commit comments

Comments
 (0)