Commit 4189b8a
Dynamic dispatch
This commit contains a rewrite of the DBSP library to use dynamic
dispatch.
Background
==========
DBSP operators manipulate batches of records that represent keys, values, and weights
in various collections. Each record is represented as a concrete Rust type, e.g., a struct
or a tuple. Operators are parameterized by the types of their input and output batches,
which are in turn parameterized by the types of keys, values, and weights in the batch.
This means that, had we used static typing (which is the default in Rust), the Rust
compiler would generate a specialized implementation of each operator for each combination
of input and output types that occur in the program. In an earlier version of this
library, this approach led to very long compilation times for even medium-sized programs.
The solution we adopt relies on dynamic dispatch instead. We bundle primitive operations
on types (comparison, cloning, etc.) as object-safe traits and expose them to operators
as trait object. This way we are able to achieve faster compilation without sacrificing
much performance.
Implementing this approach involves some machinery. First, Trait objects cannot be
efficiently combined into any kind of containers (vectors, sets, or even tuples or structs)
without wrapping each object in a `Box` or some other heap allocated type, leading to many
small allocations. To avoid this, we represent such containers as separate trait objects.
For instance, options (`Option<T>`), 2-tuples (`(T1, T2)`), vectors (`Vec<T>`), and sets
(`BTreeSet<T>`) are modeled as traits.
Second, this design introduces unsafe code that cannot be nicely encapsulated. By
replacing concrete types with trait objects, we lose the ability to distinguish between trait
objects backed by different types. Operations like comparison and cloning must downcast
their arguments to the same concrete type. This can be done safely by checking the type id's of
the arguments, but this check is expensive in Rust when performed on the critical
path. We therefore elide this check in release builds, making all such operations unsafe.
Third, operators require the ability to create new values, which in turn requires passing
a **factory** object for each type the operator may need to instantiate. Most operators
(as well as low-level data structures they use internally like batches and traces) require
multiple factory objects.
Finally, since trait objects don't have a statically known size, they cannot be returned by
value (without boxing them, which we don't want to do in most cases). We therefore pass mutable
references for functions to write return values to. This requires the caller to allocate
space for the return value using a factory object.
Likewise, trait objects cannot be passed by value; therefore functions that would normally
take an owned value instead take a mutable reference, which they are allowed to use without
cloning, (see, e.g., `ClonableTrait::move_to`, which moves the value, leaving the default
value in its place.
Trait hierarchy
===============
Traits and types used to implement dynamic dispatch are found in `/src/dynamic`.
We encapsulate common operations in a hierarchy of object-safe traits, i.e., traits for which
the compiler can build vtables and expose them as trait objects (`dyn <Trait>`).
Basic traits
============
The following traits must be implemented for all values that DBSP can compute on:
* `AsAny` - convert a reference to a type to `&dyn Any`.
* `Clonable` - an object-safe version of `Clone`.
* `Comparable` - an object-safe version of `Ord`.
* `SerializeDyn` - an object-safe version of `rkyv::Serialize`.
* `DeserializableDyn` - an object-safe version of `rkyv::Deserialize`.
* `Data` - the lowest common denominator for all types that DBSP can compute on.
Combines all of the above traits, along with `Debug`, `SizeOf`, and `Send`. This
is an object-safe version of `DBData`.
In addition, the `Erase` trait is provided to convert concrete types into trait objects.
Trait object traits
===================
The following traits are meant to be implemented by **trait objects** rather than regular
types, i.e., an instance of one of these traits is a `dyn Trait` (which is a type in Rust).
The reason we need these traits is ergonomics, since not all useful behaviors can be
captured nicely in an object-safe way. For instance `Eq`, `Ord`, and `Clone` traits
are not object-safe, so we cannot expose them directly via vtables. We instead introduce
`Comparable` and `Clonable` traits mentioned above. These traits define an unsafe
API that accepts the second argument as a raw byte pointer (`*mut u8`). These are not
meant to be used directly. Instead, we use them to implement `Eq`, `Ord`, and `Clone` for
trait objects
* `DowncastTrait` - cast a trait object reference to a concrete type.
* `ClonableTrait` - trait for trait objects whose inner type can be cloned and moved.
* `ArchiveTrait` - trait for trait objects that can be serialized and deserialized with `rkyv`.
* `DataTrait` - the lowest common denominator for all trait objects that DBSP can compute on.
Combines all of the above traits, along with `Data`. This
is the trait object version of `crate::DBData`.
Creating trait objects
======================
There are three ways to create a trait object:
* The `Factory` trait can be used to allocate an instance of a concrete type with the default
value on the heap or on the stack and return a reference to it as a trait object.
* The `Erase` trait converts a reference to a concrete type into a trait object.
* `DeserializableDyn` and `DeserializeDyn` deserialize values of the concrete type from
a byte array or from an archived representation respectively and returns them as trait
objects.
Derived traits
==============
The traits described above are applicable to all DBSP values. The following traits specify
extended functionality available for certain types and their combinations.
* `DynWeight` - types with the addition operation and a zero element. This is an object-safe
version of `DBWeight`.
* `DynOpt` - a dynamically typed version of the `Option` type.
* `DynPair` - a dynamically typed version of a two-tuple `(T1, T2)`.
* `DynVec` - a dynamically typed vector.
* `DynSet` - a dynamically typed B-tree set.
* `DynPairs` - a vector of pairs.
* `DynWeightedPairs` - a vector of key-value pairs, where the value behaves as weight,
meaning that tuples with the same key can be consolidated by adding their weights.
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>1 parent dfa7e85 commit 4189b8a
File tree
300 files changed
+42816
-33672
lines changed- crates
- adapters/src
- format
- json
- parquet
- static_compile
- test
- dbsp
- benches
- gdelt
- ldbc-graphalytics
- examples
- dist
- tutorial
- src
- algebra
- zset
- circuit
- dynamic
- operator
- aggregate
- communication
- dynamic
- aggregate
- communication
- group
- time_series
- radix_tree
- group
- time_series
- radix_tree
- storage
- backend
- monoio_impl
- bin
- buffer_cache
- file
- time
- trace
- cursor
- layers
- column_layer
- erased
- erased_key_leaf
- erased_leaf
- file
- column_layer
- ordered
- ordered
- ord
- file
- merge_batcher
- vec
- persistent
- tests
- test
- utils
- consolidation
- tests
- tuple
- feldera-storage
- proptest-regressions
- backend
- glommio_impl
- monoio_impl
- buffer_cache
- nexmark
- benches/nexmark
- src
- generator
- queries
- scripts
- sql-to-dbsp-compiler
- SQL-compiler/src
- main/java/org/dbsp/sqlCompiler
- circuit/operator
- compiler
- backend/rust
- frontend
- visitors
- inner/monotone
- outer
- ir
- type
- test/java/org/dbsp/sqlCompiler/compiler/sql
- lib
- hashing/src
- readers/src
- sqllib/src
- sqlvalue/src
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
300 files changed
+42816
-33672
lines changedSome generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
6 | 5 | | |
7 | 6 | | |
8 | 7 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
131 | 131 | | |
132 | 132 | | |
133 | 133 | | |
134 | | - | |
135 | 134 | | |
136 | 135 | | |
137 | 136 | | |
| |||
177 | 176 | | |
178 | 177 | | |
179 | 178 | | |
180 | | - | |
181 | | - | |
182 | | - | |
183 | 179 | | |
184 | | - | |
185 | 180 | | |
186 | | - | |
187 | 181 | | |
188 | 182 | | |
189 | 183 | | |
| |||
288 | 282 | | |
289 | 283 | | |
290 | 284 | | |
291 | | - | |
292 | 285 | | |
293 | 286 | | |
294 | 287 | | |
| |||
514 | 507 | | |
515 | 508 | | |
516 | 509 | | |
517 | | - | |
518 | 510 | | |
519 | 511 | | |
520 | 512 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
59 | | - | |
| 58 | + | |
| 59 | + | |
60 | 60 | | |
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
66 | | - | |
67 | | - | |
| 66 | + | |
| 67 | + | |
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
| |||
154 | 154 | | |
155 | 155 | | |
156 | 156 | | |
| 157 | + | |
| 158 | + | |
157 | 159 | | |
158 | 160 | | |
159 | 161 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
18 | | - | |
| 17 | + | |
19 | 18 | | |
20 | 19 | | |
21 | 20 | | |
| |||
376 | 375 | | |
377 | 376 | | |
378 | 377 | | |
379 | | - | |
380 | 378 | | |
381 | 379 | | |
382 | 380 | | |
383 | 381 | | |
384 | 382 | | |
385 | 383 | | |
386 | 384 | | |
387 | | - | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
388 | 388 | | |
389 | | - | |
| 389 | + | |
390 | 390 | | |
391 | 391 | | |
392 | 392 | | |
393 | 393 | | |
394 | | - | |
395 | | - | |
396 | | - | |
| 394 | + | |
397 | 395 | | |
398 | 396 | | |
399 | 397 | | |
| |||
482 | 480 | | |
483 | 481 | | |
484 | 482 | | |
485 | | - | |
| 483 | + | |
486 | 484 | | |
487 | 485 | | |
488 | 486 | | |
| |||
502 | 500 | | |
503 | 501 | | |
504 | 502 | | |
505 | | - | |
| 503 | + | |
506 | 504 | | |
507 | 505 | | |
508 | 506 | | |
| |||
518 | 516 | | |
519 | 517 | | |
520 | 518 | | |
521 | | - | |
| 519 | + | |
522 | 520 | | |
523 | 521 | | |
524 | 522 | | |
| |||
588 | 586 | | |
589 | 587 | | |
590 | 588 | | |
591 | | - | |
| 589 | + | |
592 | 590 | | |
593 | 591 | | |
594 | | - | |
| 592 | + | |
595 | 593 | | |
596 | 594 | | |
597 | 595 | | |
| |||
600 | 598 | | |
601 | 599 | | |
602 | 600 | | |
603 | | - | |
| 601 | + | |
604 | 602 | | |
605 | 603 | | |
606 | 604 | | |
| |||
611 | 609 | | |
612 | 610 | | |
613 | 611 | | |
614 | | - | |
| 612 | + | |
615 | 613 | | |
616 | 614 | | |
617 | 615 | | |
| |||
620 | 618 | | |
621 | 619 | | |
622 | 620 | | |
623 | | - | |
| 621 | + | |
624 | 622 | | |
625 | 623 | | |
626 | 624 | | |
| |||
631 | 629 | | |
632 | 630 | | |
633 | 631 | | |
634 | | - | |
| 632 | + | |
635 | 633 | | |
636 | 634 | | |
637 | 635 | | |
| |||
640 | 638 | | |
641 | 639 | | |
642 | 640 | | |
643 | | - | |
| 641 | + | |
644 | 642 | | |
645 | 643 | | |
646 | 644 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
| 13 | + | |
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
| |||
469 | 469 | | |
470 | 470 | | |
471 | 471 | | |
472 | | - | |
| 472 | + | |
473 | 473 | | |
474 | 474 | | |
475 | 475 | | |
| |||
0 commit comments