Skip to content

Add support for bitmap based filters#6007

Open
gz wants to merge 3 commits intomainfrom
roaring
Open

Add support for bitmap based filters#6007
gz wants to merge 3 commits intomainfrom
roaring

Conversation

@gz
Copy link
Copy Markdown
Contributor

@gz gz commented Apr 8, 2026

This PR adds three things:

1st commit: A bitmap filter based on the roaring crate with

  • Necessary plumbing so we can have multiple filters in front of our files and batches
  • Necessary plumbing so we can decide if we should use the roaring filter or fall back to the bloom filter
    • This needed us to consistently have the min/max for every per batch
    • A heuristic that estimates when it's better to use roaring than bloom (see 2nd commit): High-level answer is that it's better when the set of keys in the batch is dense

2nd commit: A benchmark that validates the predictor function we use to choose between roaring and bloom filters (2k lines, this code can be mostly ignored for the PR review)

3rd commit: Code that adjusts storage files by setting an incompatible feature flag

Describe Manual Test Plan

Ran many pipelines

Checklist

  • Unit tests added/updated

Breaking Changes?

Ideally no

Describe Incompatible Changes

File format has new incompatible feature, old versions will refuse to read the files with a roaring filter.
Users should not downgrade once they go beyond this version without clearing storage.

@gz gz changed the title Roaring Add support for bitmap based filter Apr 8, 2026
@gz gz changed the title Add support for bitmap based filter Add support for bitmap based filters Apr 8, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Draft: two design questions, see inline.

@gz
Copy link
Copy Markdown
Contributor Author

gz commented Apr 8, 2026

datagen

CREATE TABLE simple (
    payload [ BIGINT | INTEGER ] NOT NULL PRIMARY KEY
) WITH (
  'materialized' = 'true',
  'connectors' = '[{"name":"data","transport":{"name":"datagen","config":{"workers":9,"plan":[{"fields":{"payload":{"strategy":"[ increment | uniform ]"}}}]}}}]'
);

original is v0.281.0 and roaring branch is the current platform runtime on 92e4a95. enable_roaring=true was set on all runs.

Bench Summary

Workload Original Throughput Roaring Branch Throughput Δ Throughput Original Mem Roaring Branch Mem Δ Mem Original Storage Roaring Branch Storage Δ Storage
BIGINT + increment 4.21M/s 4.69M/s +11.3% 12.33 GiB 8.88 GiB -28.0% 22.22 GiB 25.20 GiB +13.4%
BIGINT + uniform 2.31M/s 2.29M/s -0.8% 7.49 GiB 7.70 GiB +2.9% 21.18 GiB 19.63 GiB -7.3%
INTEGER + increment 4.34M/s 4.71M/s +8.4% 11.86 GiB 8.58 GiB -27.7% 22.38 GiB 22.85 GiB +2.1%
INTEGER + uniform 2.18M/s 2.16M/s -1.2% 7.03 GiB 7.19 GiB +2.3% 11.95 GiB 12.58 GiB +5.3%
  • increment workloads improved on the roaring branch: better throughput and much lower memory.
  • uniform workloads in the noise on throughput, memory.

Filter Hit/Miss Comparison:

Workload Original Roaring Branch Read
BIGINT + increment Range: 0 hits / 4.340B misses, 100% miss rate. Bloom: 0/0. Roaring: 0/0. Range: 0 / 4.268B, 100% miss rate. Bloom: 0/0. Roaring: 0/0. Useful filtering is all in the range filter for both versions.
BIGINT + uniform Bloom: 185.6K hits / 1.845B misses, 99.9899% miss rate. Range: 1,177 misses on 1.845B passes, 0.000064% miss rate. Bloom: 152.6K / 1.506B, 99.9899% miss rate. Range: 1,121 misses on 1.507B passes, 0.000074% miss rate. Bloom is doing almost all of the useful rejection.
INTEGER + increment Range: 0 / 3.649B, 100% miss rate. Bloom: 0/0. Roaring: 0/0. Range: 0 / 3.413B, 100% miss rate. Bloom: 0/0. Roaring: 0/0. Same pattern as BIGINT + increment.
INTEGER + uniform Bloom: 15.64M hits / 785.0M misses, 98.05% miss rate. Range: 437 misses on 800.7M passes, 0.000055% miss rate. Bloom: 16.56M / 789.0M, 97.94% miss rate. Range: 414 misses on 805.5M passes, 0.000051% miss rate. Same pattern as BIGINT + uniform.

Takeaways:

  • higher miss rate is better here it means the filter is rejecting more lookups.
  • In these runs, the measurable rejection work came from range for increment and bloom for uniform; roaring hit/miss counters themselves stayed 0 in the final samples.
  • throuhgput increase most likely from building the bitmap filters faster than the bloom filters for increment workloads

postgres

original is v0.281.0 and roaring branch is the platform runtime on 92e4a95.

I loaded all four Postgres tables to 200,000,000 rows each. It's very slow.

Bench Summary

Workload Original Tput Roaring Branch Tput Δ Tput Original Mem Roaring Branch Mem Δ Mem Original Storage Roaring Branch Storage Δ Storage
INT random 1.080M/s 1.075M/s -0.49% 24.43 GiB 24.17 GiB -1.09% 3.79 GiB 3.85 GiB +1.59%
BIGINT random 1.052M/s 1.058M/s +0.53% 25.09 GiB 25.13 GiB +0.14% 3.88 GiB 3.89 GiB +0.42%
INT sequence 1.081M/s 1.087M/s +0.56% 24.57 GiB 24.68 GiB +0.45% 3.81 GiB 3.62 GiB -5.05%
BIGINT sequence 1.069M/s 1.099M/s +2.77% 25.00 GiB 24.86 GiB -0.58% 3.96 GiB 3.40 GiB -14.03%

Filter Read
Miss rate here is misses / (hits + misses), so higher is better for the filter.

Workload Original Roaring Branch
INT random Bloom 11.4K / 112.38M, miss 99.9898%, bloom size 452.8 MiB. Range 112.39M / 44, miss 0.0000%. Bloom 11.1K / 109.45M, miss 99.9898%, bloom size 452.8 MiB. Range 109.46M / 43, miss 0.0000%.
BIGINT random Bloom 11.4K / 113.54M, miss 99.9900%, bloom size 452.8 MiB. Range 113.55M / 36, miss 0.0000%. Bloom 10.6K / 106.68M, miss 99.9900%, bloom size 443.3 MiB. Range 106.69M / 55, miss 0.0001%.
INT sequence Bloom counters 0/0, bloom size 455.9 MiB. Range 0 / 94.05M, miss 100.0000%. Bloom counters 0/0, roaring counters 0/0, roaring size 186.5 MiB. Range 0 / 121.21M, miss 100.0000%.
BIGINT sequence Bloom counters 0/0, bloom size 446.3 MiB. Range 0 / 168.18M, miss 100.0000%. Bloom counters 0/0, roaring counters 0/0, roaring size 186.5 MiB. Range 0 / 96.21M, miss 100.0000%.

Takeaways:

  • The postgres input connector is not able to saturate the circuit (buffered records 0).
  • Random workloads are basically flat. One is slightly worse, one slightly better.
  • Sequence workloads show roaring roaring branch helps with filter size.
  • For sequence keys, the roaring branch switches filter representation from bloom to roaring and cuts filter size a lot, observed rejection already in the range filter. Roaring hit/miss counters stayed 0/0.
  • For random keys, bloom is doing the rejection at ~99.99% miss rate in both versions.

deltalake

original is v0.281.0 and roaring branch is the platform runtime on the current manager.

CREATE TABLE int_random_keys (
    id INT NOT NULL PRIMARY KEY
) WITH (
  'materialized' = 'true',
  'connectors' = '[{"name":"data","transport":{"name":"delta_table_input","config":{"uri":"file:///mnt/data/deltalake/$table","mode":"snapshot","transaction_mode":"none"}}}]'
);

These were 300-second fda bench runs, and all four runs had enable_roaring=true.
int_random_keys: delta-rs table filled with random u32 keys
int_sequence_keys: delta-rs table built with sequentially filling keys

Bench Summary

Workload Original Tput Roaring Branch Tput Δ Tput Original Mem Roaring Branch Mem Δ Mem Original Storage Roaring Branch Storage Δ Storage Δ State Amp
INT random 2.332M/s 2.326M/s -0.25% 8.96 GiB 8.77 GiB -2.07% 13.87 GiB 13.85 GiB -0.14% +0.10%
INT sequence 2.748M/s 3.336M/s +21.41% 9.70 GiB 7.94 GiB -18.10% 15.19 GiB 18.19 GiB +19.77% -1.30%

For fixed 300s runs, absolute storage at stop time is influenced by how much work got processed in that window.

Filter Miss Table

Miss rate is misses / (hits + misses), so higher is better.

Workload Version Bloom hits / misses Bloom Miss Rate Roaring hits / misses Roaring Miss Rate Range hits / misses Range Miss Rate
INT random original (v0.281.0) 140.2K / 1.387B 99.9899% 0 / 0 N/A 1.387B / 962 0.0001%
INT random roaring branch 150.9K / 1.495B 99.9899% 0 / 0 N/A 1.495B / 768 0.0001%
INT sequence original (v0.281.0) 158.4K / 1.566B 99.9899% 0 / 0 N/A 1.567B / 16.842M 1.0637%
INT sequence roaring branch 0 / 0 N/A 0 / 2.480B 100.0000% 2.480B / 27.664M 1.1030%

Read

  • INT random is effectively flat. Both versions stay bloom-based, and the filter behavior is almost identical.
  • INT sequence is the real win for the roaring branch: +21.4% throughput and -18.1% memory.
  • On INT sequence, original stays bloom-based, while the roaring branch switches fully to roaring and shows a 100% roaring miss rate in the final sample.
  • Absolute storage is higher on the roaring branch for INT sequence, but it also processed much more data in the same 300s window; normalized state-amplification is slightly better.

kafka

CREATE TABLE kafka_int_sequence_keys (
    id INT NOT NULL PRIMARY KEY
) WITH (
  'materialized' = 'true',
  'connectors' = '[{"name":"data","transport":{"name":"kafka_input","config":{"topic":"$topic","bootstrap.servers":"127.0.0.1:9092","start_from":"earliest","resume_earliest_if_data_expires":true,"poller_threads":10,"log_level":"info"}},"format":{"name":"avro","config":{"update_format":"raw","schema":"{\"type\":\"record\",\"name\":\"KafkaInt\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}","skip_schema_id":true}}}]'
);
  • two topics: kafka_int_sequence_keys, kafka_int_random_keys
  • This only ran at ~800k/s. Not maxing out the circuit

Bench Summary

Workload Original Tput Roaring Branch Tput Δ Tput Original Mem Roaring Branch Mem Δ Mem Original Storage Roaring Branch Storage Δ Storage Δ State Amp
INT random 0.766M/s 0.761M/s -0.67% 5.59 GiB 5.81 GiB +4.07% 3.56 GiB 3.63 GiB +1.89% +2.58%
INT sequence 0.775M/s 0.775M/s +0.12% 5.70 GiB 5.47 GiB -3.96% 4.11 GiB 3.49 GiB -14.95% -15.05%

Filter Miss Table

Workload Version Bloom hits / misses Bloom Miss Rate Roaring hits / misses Roaring Miss Rate Range hits / misses Range Miss Rate
INT random original (v0.281.0) 19.7K / 195.218M 99.9899% 0 / 0 N/A 195.238M / 38 0.000019%
INT random roaring branch 19.7K / 193.705M 99.9898% 0 / 0 N/A 193.725M / 38 0.000020%
INT sequence original (v0.281.0) 19.4K / 192.655M 99.9899% 0 / 0 N/A 192.674M / 46.957M 19.5954%
INT sequence roaring branch 0 / 0 N/A 0 / 139.045M 100.0000% 139.045M / 34.087M 19.6883%

Main read:

  • INT sequence is where roaring helps on Kafka+Avro: throughput is bottlenecked on kafka, memory drops a bit and storage/state amplification drop a lot.
  • The filter behavior matches that: random stays bloom-based; sequence flips from bloom on original to roaring on the roaring branch.

@gz gz force-pushed the roaring branch 3 times, most recently from a1e6733 to 84819c1 Compare April 9, 2026 03:09
@gz gz requested review from blp and ryzhyk April 9, 2026 03:10
@gz gz marked this pull request as ready for review April 9, 2026 03:10
}

let mut intermediate = factories.keys_factory().default_box();
let mut merged_cursor = CursorList::new(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some input from @ryzhyk that this CursorList here does the preserves the same semantics as the previous code which used SpineCursor

@gz
Copy link
Copy Markdown
Contributor Author

gz commented Apr 9, 2026

This is currently configured as an enabled by default feature with option to disable it in dev-tweaks.
Alternative: disabled for now, enable selectively for some users.

Thoughts?

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three blockers. The new dev_tweaks knob needs docs in docs.feldera.com, the roaring path should start default-off rather than enabled-by-default with a kill switch, and the [ci] apply automatic fixes commit needs to be squashed because dirty history is still a hard no.

pub fn enable_roaring(&self) -> bool {
// Roaring is enabled by default, but `enable_roaring = false` remains
// available as a kill switch while the feature is still being tuned.
self.enable_roaring.unwrap_or(true)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the feature-flag question from the PR thread: this should start default-off. An incompatible storage-format feature that is still being tuned wants a real rollout gate, not an opt-out kill switch after it is already on for everyone. enable_roaring = true should selectively enable it; the default should stay false until we have more production mileage.

@gz gz force-pushed the roaring branch 2 times, most recently from 0631ab4 to 73d0445 Compare April 9, 2026 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants