Commit 20db2c6
Transactions.
This commit adds transaction support and closes issues:
- #3925
- #4106
- #4108
- #4107
- #4109
- #4111
- #4112
I had to squash individual commit because for some reason the github
merge queue rejects this PR otherwise.
[dbsp] Micro/macro-step scheduling.
Closes #4106.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Consensus.
Factor the synchronization primitive that allows multiple workers to
agree on a Boolean value into a separate commit. Previously this was
only used to determine when a recursive circuit terminates, but we will
have additional applications for this coming up, such as having workers
agree on the completion of a transaction.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Generalize zset iterator.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Use async functions in operator traits.
We now require rust 1.87, where async trait support is stable, so we can
have simpler operator trait declarations.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] The Accumulator operator.
Closes #4108.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Expose microsteps via DbspHandle.
Closes #4109.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Wait for worker consensus to complete step.
It's possible that some of the workers have flushed their operators but
others need to perform additional steps. Wait for a global consensus
before declaring the current transaction completed.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Remove unused declaration.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] trace and integrate_trace with accumulator.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Join with accumulator.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Upsert and aggregate.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Distinct with accumulators.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Add accumulators to group transformers.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] chain_aggregate with accumulator.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Improved debug output.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Rolling aggregate with accumulators.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] asof-join with accumulators.
No splitting yet.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] join: use spines instead of batchers.
Accumulate future updates in spines instead of batchers to take
advantage of persistence and background merging.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Specialize join for untimed inputs.
Join in the root circuit doesn't need to sort outputs by timestamp and
is easier to implement as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Implement join as a splitter.
This initial implementation preserves state in the operator struct. We
will switch to using the `stream!` macro in the future to simplify the
implementation.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Exchange: implement flush API.
The exchange operator is flushed once all producers in all parallel
threads are flushed.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] gather operator: implement flush API.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Split output of join.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] StreamingBinaryOperator.
A wrapper that allows splitter operators to be implemented as async
streams, avoiding the stack ripping problem.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] accumulate_apply2.
A version of `apply2` that evaluates its accumulated inputs once per
clock cycle.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Change NestedGenerator to support microsteps.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] TypedBatch::consolidate.
`consolidate` method for TypedBatch<DynTrace>.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Fix Debug for SpineSnapshot.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Split output of aggregate.
Re-implement aggregate as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Strongly typed SpineSnapshot.
Introduce a version of TypedBatch that wraps SpineSnapshot.
Implement `concat` and `consolidate` for `TypedBatch<SpineSnapshot>`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] accumulate_output operator.
A version of `output` that accumulates output batches and outputs them
once per clock cycles.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Implement DistinctIncremental as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Builder::num_tuples.
Add a builder method that returns the number of tuples pushed to the
builder. Used by splitters to determine when to produce the next output
batch.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] DistinctIncrementalTotal.
Implement DistinctIncrementalTotal as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] ChildCircuit's with arbitrary timestamps.
We used to require ChildCircuit to have a timestamp type that is equal
to Parent::Timestamp::Nested. We are going to introduce subcircuits that
run on the same logical clock as the parent. This commit adds support
for child circuits with arbitrary timestamps.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Remove duplicate import_stream implementation.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Backport bug fix to AccumulateZ1Trace.
Backport #4262
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] non_incremental.
Introduce an operator that allows evaluating non-incrmental operators in
an incremental circuit. With the introduction of transactions it is no
longer safe to combine different types of operators in the same circuit.
This commit adds an operator that can run a non-incremental fragment in
a special child circuit that gets evaluated once per clock tick on
accumulated inputs.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] accumulate_integrate, accumulate_differentiate.
Use non_incremental to implement accumulate_integrate,
accumulate_differentiate.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Distinct tests.
Use `non_incremental` to implement tests for the distinct operator that
compare incremental and non-incremental implementations.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] GroupTransformer.
Implement GroupTransformer as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Adapt replay tests to the microstep API.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] PartitionedTreeAggregate.
Reimplement PartitionedTreeAggregate as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] PartitionedRollingAggregate.
Reimplement PartitionedRollingAggregate as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Typo in a comment.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Macrostep Z1.
The `MacrostepZ1` operator is similar to Z1; however it delays its output
to the end of a macrostep, i.e., until the `flush` method is called by the
scheduler. It latches the value received right after `flush` and outputs it
at every microstep until the next `flush`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Reimplement window as a splitter.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Fix join test.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Typo in assertion.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Transaction API.
Add API to the controller to start and commit a transaction.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] use accumulate_output in catalog.rs
Controller only retrieves output at the end of a transaction. We
therefore must use the accumulate_output operator.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Add type argument to WithSnapshot.
Make batch type a type argument of `WithSnapshot`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Simplify the SerCollectionHandle API.
Do not distinguish between handles that return individual batches and
multiple batches (since all handles can not return multiple batches).
The caller can use `concat` to merge outputs from multiple workers and
apply `consolidate` to the resulting spine snapshot to convert it into a
batch if necessary.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
DevTweaks::splitter_chunk_size_records.
Make chunk size configurable. Useful for testing splitter
implementations with small chunk sizes.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
TypedBatch::merge_batches
A strongly typed version of `merge_batches`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Transaction tests: join operator.
Test `join` with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Antijoin tests
Test antijoin with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Distinct tests.
Test distinct with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Aggregate tests.
Test aggregate operators with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Group operator tests.
Test group operators with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Rolling aggregate tests.
Test rolling aggregates with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Rolling aggregate: merge output batches.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Fix macrostep_delay.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] transaction test.
Test for the transaction API.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] impl Debug for Command.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Fix commit_transaction.
- Wake up circuit thread when committing a transaction.
- Continuously step the circuit in commit mode.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Transaction IDs.
Improve transaction API to return transaction ids. Transaction id's can
be used to distinguish different transaction instances to avoid
committing a transaction started by a different client, but they are
currently optional to use and can be ignored.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] fix lir test.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Doc fix
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[fxp] Use new API in tests.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] accumulate_differentiate for Z-sets.
A version of `differentiate` that produces output once per transaction.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[sql] Compiler fixes for transactions.
Switch to using the new transactional DBSP API in the compiler.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[sqllib] Fixup string interner test.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[doc] fix openapi spec
Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
[ci] add docs to pre-commit
Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
[dbsp] MacrostepGenerator.
A version of Generator that passes a flag to the generator function when `flush` has been called.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Accumulator::eval_owned.
Optimized Accumulator::eval implementation if the operator happens to
receive an owned input.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Trace::insert_arc.
Allows adding batches wrapped in an Arc to a trace without cloning them.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Batch::from_batch optimized for Arcs.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Implement join as StreamingBinaryOperator.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Remove confusing OrdXXX type aliases.
We used to inconsistently alias VecXXX or FallbackXXX as OrdXXX. Only
the latter is correct.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Better way to add timestamp to batches.
This commit eliminates some unnecessary batch cloning and reduces memory
usage, especially for large transactions.
We sometimes need to convert an untimed batch into a timed batch when
adding it to a spine; however when doing this in a top-level circuit,
the timestamp is just `()`, so the batch can be added as is. The
imlementation of this optimization was imperfect, sometimes causing the
batch to be converted into a different kind of batch (fron non-indexed
to indexed z-set) in the process. This commit solves this issue by
having the batch itself specify its timed version.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Don't checkpoint connectors mid-transaction.
The controller normally pauses all connectors once they become
checkpointable when preparing to checkpoint the pipeline. This doesn't
work well with transactions, because usually a transaction needs to
ingest specific inputs before committing. We therefore add a condition
that delays pausing connectors until the transaction is finished. It
will also cause previously paused connectors to unpause if a transaction
starts while the pipeline is preparing to checkpoint.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Reduce Delta output chunk size.
Buffering 1M updates per connector can use up a lot of memories with a
large number of connectors.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[ci] apply automatic fixes
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
[dbsp+adapters] Transaction progress tracking.
Infra to track the progress of transaction commits. For now, we
periodically write progress to the log. Proper API/UI support to follow.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Report input/output batch size distribution.
We reported the total number of input/output records for various
operators, but now how they are distributed across input/output batches.
This commit adds such metrics, which can be used for example to catch
situations where very large batches are produced by operators.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Spine::insert_arc spills large batches.
The initial implementation never pushed a batch to storage. This was
done for performance reasons, since adding an Arc<Batch> to a spine is
basically free, but blindly adding large batches can easily lead to very
high memory usage, so we add logic to push large batches to storage
similar to how we do it in `Spine::insert`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Fix: join produces tiny output batches.
This fixes the following performance regression in the join operator
with transactions:
* The operator produces `>splitter_chunk_size` (unconsolidated) updates.
* Consolidates updates into a batch.
* The resulting batch can be very small, e.g., because join projects
away most columns, e.g., 1 record int he extreme case.
* The operator ends up producing one record per microstep, potentially
generating an unnecessarily large number of tiny steps.
The fix uses a batcher to pre-consolidate inputs before deciding to emit
a batch.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] accumulate_concat operators.
This addresses an interesting performance glitch in the antijoin
operator, which showed up when evaluating long chains of left joins.
The antijoin consiste of a linear and non-linear components whose
outputs get subtracted in the end. Normally, the outputs nearly cancel
out but if they are produced during different steps (as part of a
transaction) this happens too late, generating a lot of work downstream.
To solve this, we introduce a new concatenation operator that
accumulates its inputs and evaluates them once per transaction.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] stream_antijoin.
Implement non-incremental antijoin and use it to test incremental
antijoin.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Bug in JoinTrace.
Handle the situation when the delta input (the left input) arrives before the final
version of the right input correctly by waiting for flush before processing inputs.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] circuit_builder: debugging code.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] AccumulateDelayTrace operator.
An operator that returns a version of a trace delayed by one clock tick.
This didn't use to require a separate operator, we just returned the
output of the Z1 part of the integrator; however, that output is delayed
by exactly one step, not an entire clock cycle.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Comments.
TODO: add assertions for these conditions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Don't count empty buffered batches.
Counting empty batches in the output buffer leads to a confusing
behavior where the number of buffered batches continuous to grow but the
number of buffered records remains at 0.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Doc formatting fix.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Proptest for the join operator.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Fix processed record accounting.
When the output buffer is empty, report it as flushed to update processed
record count on the endpoint without actually flushing it.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Comments and cleanup.
Cleanup the transactions PR.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Simplify transaction state tracking in scheduler.
Merge multiple flags in one enum.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[benchmarks] Remove tpc-h benchmarks.
It will be replaced with a more comprehensive benchmark + test in
`python/tests/integration_tests`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Simplify accumulate_differentiate.
Reimplement accumultate_differentiate without using a non-iterative
subcircuit.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Maintain output buffer in storage.
We maintain output buffers as spines, which enables us to use background
mergers to compact changes. However, since output buffers are maintained
by output threads outside of the DBSP runtime, they didn't have access
to the pipeline's storage and ended up completing all merges in memory.
This could easily lead to OOM for very large buffers.
To address this, we introduce the notion of an auxiliary thread that
runs inside a DBSP runtime and has access to its storage backend, and we
run output threads as auxiliary threads.
One caveat with this is that the pipeline is required to release the
storage backend as part of the `stop()` operation and therefore `stop`
needs to wait for all output threads to terminate.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Convert output buffer to a snapshot.
Once we're done accumulating changes in the output buffer and are about
to send it to the output endpoint, it no longer makes sense to perform
background merges on the buffer. We therefore convert it into a snapshot
to avoid wasting resources.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Track idle time of a circuit.
Add a metric that tracks the amount of time the circuit isn't executing
a step.
There are two sources of idle time:
- The local circuit waiting for other workers to complete a step.
- The entire multithreaded circuit waiting for the client to trigger a step.
The new metric doesn't distinguish between the two; we'll need to add
more instrumentation for that.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Join: iterate over the smaller collection.
The implementation of join assumed that the delta input on the left is
usually small and typically smaller that the full integral on the right.
It therefore make sense to iterate over every key in delta and lookup
matching keys in the integral. This assumption doesn't hold during
backrfill when the integral can be empty, while the delta contains the
entire contents of the left collection. In order to handle this case
efficiently, we switch to choosing the side to iterate over dynamically
based on the sizes of the inputs.
Note that an alternative approach is to use seek_key on both sides to
skip over keys missing on the other side; however in that case we
wouldn't be able to use seek_key_exact.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[sql] accumulate_differentiate -> differentiate.
We now have two forms of differentiate in DBSP:
- `differentiate`: differentiates stream values across steps
- `accumulate_differentiate`: accumulates changes within a transaction
and differentiates accumulated changes across transactions.
In incremental circuits generated by the compiler, differentiation can occur in two
cases:
- Differentiate the NOW stream to convert it into a stream of changes
- Differentiate the output of a Generator used to inject a constant
value in the circuit.
In both cases, `differentiate` is the correct operator to use, not
`accumulate_differentiate`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[py] Integration test for the now() function.
Join now with a table with and without transactions.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[py] Constant table test.
The compiler compiles `where t.company_id in <long list of constant values>` queries
into a join with a constant table, created using a DBSP Generator operator to produce
a stream of constant value followed by `differentiate` to convert it into a change
stream.
This test checks that the constant table is created correctly.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Fix completed record accounting.
We adjust completed input accounting to transactions.
Without transactions, every input is fully processed in the same step it
is pushed to the pipeline. With transactions, an input is considered
processed when the transaction during which it was ingested commits. The
fix doesn't advance completed record count until the transaction
commits.
This commit also fixes an unrelated deadlock, which I think I observed
before, but it became more reliably reproducible now. The lock that
protects input endpoint state was acquired recursively in the following
call chain:
- input_chain (acquire read-lock to iterate over endpoint)
- calls queue() on each endpoint
- delta connector queue() method calls InpusConsumer::extended, which
acquires the lock again.
Although crossbeam::RwLock docs claim that taking the read-lock
recursively panics, it actually works fine unless some other thread
tries to grab the write lock in which case both threads will deadlock.
To fix this, I switched to using parking_log::RwLock, which has a
special read_recursive method.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[ci] apply automatic fixes
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
[dbsp] Test for NOW as a chain_aggregate.
This test shows how to implement the real-time clock as a
chain_aggregate.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Stats for accumulator and output operators.
Add input/output batch stats to AccumulateOutput and Accumulator
operators.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Exchange wait times.
This commit adds a new metric to the Exchange operator, tracking the
amount of time this operator spends waiting for data from all peers.
It's not yet clear how useful this metric is. We measure the wait time
as the time since the sender part of exchange delivered messages to all
peers until the receive part received all messages. However, this
doesn't mean that the circuit was blocked during this time, since it may
be evaluating other operators, so the total weight time across all
exchanges can be several times higher than the total time the circuit
spent blocked.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] cargo dox fix.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] Revert the empty output check.
This check is no longer necessary (or correct) with the transactions
API.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[fda] Enable transaction tests.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[py] Fix checkpoint().
The `Pipeline.checkpoint()` method returned checkpoint sequence number in
one path and request status in the other. This commit fixes it to return
sequence number consistently.
Also add `is_complete()` method to check if the pipeline is complete
without blocking.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[adapters] When starting from a checkpoint use new connector config.
We used to pick up the checkpointed input and output connector configs.
This was ok before we had backfill avoidance, since the only difference
between the user-specified and checkpointed config could be in the HTTP
connectors added dynamically after the pipeline was started. However,
with backfill avoidance the checkpointed config may no longer apply as
tables, views, and connectors could be added or deleted in the new
program.
We therefore ignore checkpointed configs and use the latest configs
specified in the new pipeline.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[py] Add dev_tweaks to RuntimeConfig.
Add the recently added dev_tweaks field to the class RuntimeConfig.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[py] Pipeline.modify.
A method to modify the pipeline.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[py] TPC-H scale tests.
A framework to run TPC-H on a large dataset with checkpoints and
restarts and with output connectors.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] rustdoc fix.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[sql] Fixup profiling test.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[sql] Correctly GC input map with waterline.
Call `integrate_trace_retain_values` on the output of
`add_input_map_with_waterline` instead of
`accumulate_integrate_trace_retain_values`.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[dbsp] Give AccumulateZ1Trace a distinct persistent id.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Cargo.lock
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Dummy commit to nudge github.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
[ci] apply automatic fixes
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>1 parent f795d0f commit 20db2c6
File tree
215 files changed
+14149
-5375
lines changed- benchmark/feldera-sql/benchmarks/tpc-h
- queries
- crates
- adapterlib/src
- errors
- adapters
- src
- controller
- integrated/delta_table
- server
- static_compile
- dbsp
- benches
- gdelt
- examples
- dist
- tutorial
- proptest-regressions/operator/dynamic
- src
- algebra/zset
- circuit
- schedule
- mono
- operator
- communication
- dynamic
- aggregate
- communication
- group
- time_series
- radix_tree
- profile
- storage/buffer_cache
- time
- trace
- cursor
- layers
- ord
- fallback
- file
- merge_batcher
- vec
- spine_async
- test
- utils
- fda
- fxp/src
- nexmark
- benches/nexmark
- src
- queries
- sqllib/src
- python
- feldera
- tests/workloads
- sql-to-dbsp-compiler
- SQL-compiler/src
- main/java/org/dbsp/sqlCompiler
- circuit/operator
- compiler
- backend
- dot
- rust
- visitors/outer/monotonicity
- test
- java/org/dbsp/sqlCompiler/compiler/sql
- streaming
- tools
- resources
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
215 files changed
+14149
-5375
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
48 | 57 | | |
49 | 58 | | |
50 | 59 | | |
| |||
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
254 | 254 | | |
255 | 255 | | |
256 | 256 | | |
| 257 | + | |
257 | 258 | | |
258 | 259 | | |
259 | 260 | | |
| |||
This file was deleted.
This file was deleted.
This file was deleted.
0 commit comments