Skip to content

Range filter for batches#5937

Merged
gz merged 4 commits intomainfrom
min-max-batch
Mar 31, 2026
Merged

Range filter for batches#5937
gz merged 4 commits intomainfrom
min-max-batch

Conversation

@gz
Copy link
Copy Markdown
Contributor

@gz gz commented Mar 27, 2026

Describe Manual Test Plan

Tested manually with a few pipelines.

Checklist

  • Unit tests added/updated
  • Integration tests added/updated
  • Documentation updated
  • Changelog updated

Describe Incompatible Changes

No incompatible changes.

@gz gz requested review from blp and ryzhyk March 27, 2026 03:29
@gz gz changed the title Min max batch Range filter for batches Mar 27, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blockers: (1) two unsafe blocks in reader.rs missing // SAFETY: comments (see inline), and (2) docs.feldera.com/docs/operations/metrics.md should be updated with entries for the four new range filter metrics (range_filter_size_bytes, range_filter_hits_count, range_filter_misses_count, range_filter_hit_rate_percent). Looking at past PRs (e.g., the transaction metrics PR), the pattern is a manual edit to metrics.md alongside the code change.

@gz gz force-pushed the min-max-batch branch 2 times, most recently from 27cc389 to e987d2e Compare March 27, 2026 05:53
@gz
Copy link
Copy Markdown
Contributor Author

gz commented Mar 27, 2026

some benchmarking revealed that this optimization alone does not help much for e.g., delta lake connectors because keys get ingested mostly at random (maybe some z ordering or liquid clustering stuff would help but who knows)

it helps for connectors that ingest in semi linear orders (e.g., datagen -- maybe postgres/kafka as well)
if the bloom filter can be skipped almost 100% of the time, it's ~30% faster -- this is promising because having the min-max allows us to use bitmaps which can also save a lot of time compared to bloom filters during lookups

Copy link
Copy Markdown
Contributor

@ryzhyk ryzhyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sane. Thanks!

Given this, we can easily experiment with range-partitioned merges.

@gz gz force-pushed the min-max-batch branch from e987d2e to 82ce4b8 Compare March 30, 2026 20:38
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New commits look good — CachePadded for filter stats is a nice improvement. Two unsafe blocks in key_range() still lack // SAFETY: comments; those remain a blocker (see inline).

This is a very cheap filter that can speed up
ingest significantly.

The idea is to track the min/max for every batch.
For seek_key_exact if the value we're seeking
is not in side of the range we skip the batch.

Some ingest heavy benchmarks:

just bloom filter:

╭────────────────────────┬───────────┬──────────┬───────╮
│ Metric                 │ Value     │ Lower    │ Upper │
├────────────────────────┼───────────┼──────────┼───────┤
│ Throughput (records/s) │ 2695496   │ -        │ -     │
│ Memory                 │ 14.28 GiB │ 1.78 GiB │ -     │
│ Storage                │ 79.45 GiB │ 110 B    │ -     │
│ Uptime [ms]            │ 302742    │ -        │ -     │
│ State Amplification    │ 0.43      │ -        │ -     │
╰────────────────────────┴───────────┴──────────┴───────╯

range+bloom filter:

╭────────────────────────┬────────────┬──────────┬───────╮
│ Metric                 │ Value      │ Lower    │ Upper │
├────────────────────────┼────────────┼──────────┼───────┤
│ Throughput (records/s) │ 4035088    │ -        │ -     │
│ Memory                 │ 23.4 GiB   │ 2.42 GiB │ -     │
│ Storage                │ 112.51 GiB │ 110 B    │ -     │
│ Uptime [ms]            │ 303292     │ -        │ -     │
│ State Amplification    │ 0.41       │ -        │ -     │
╰────────────────────────┴────────────┴──────────┴───────╯

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
@gz gz force-pushed the min-max-batch branch from 82ce4b8 to a4bce2e Compare March 31, 2026 00:34
@gz gz requested a review from blp March 31, 2026 00:44
@gz
Copy link
Copy Markdown
Contributor Author

gz commented Mar 31, 2026

@blp I'd appreciate a short re-review of the part that moves the key range computation into the writer and passes it to the reader when finalizing a batch (as opposed to maintaining min/max throughout the batch building phase)

e.g, I'm worried about this part and if it's sane: are there potential problems when the index block is multiple blocks (can this even happen) and would this still work (I think so but best if you have a look).

https://github.com/feldera/feldera/pull/5937/changes#diff-97717ef456cc9703aba9455926b297ae94212b5388f0bfb59de83bda1445fcf6R289

@gz
Copy link
Copy Markdown
Contributor Author

gz commented Mar 31, 2026

on the plus side, we found an unrelated bug in FileWSet by adding some more tests to this

gz added 2 commits March 30, 2026 18:05
We add stats for the range filter. This lead to some refactoring:

Since we now have two filters (with another one on the way)
we consolidate the stats into a single struct that can be
re-used across filters.

It also revealed a performance issue with the current filter
stats. Because this function is extremly hot adding the
hit and miss atomics in CachePadded led to a 25% increase
for the ingest benchmark.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
We didn't load the weight into cursor.diff which caused
some newly written tests to fail

We did an anlaysis and concluded that this method,
while wrong, luckily wasn't used anywhere relevant in real
circuits.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
@gz gz force-pushed the min-max-batch branch from 7b5b007 to 228f64d Compare March 31, 2026 01:05
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the new commits ("Better stats for filters", "Fix FileWSet negation") — both look good. The // SAFETY: comment blocker remains: the two unsafe fn key_range() definitions added in reader.rs (one on DataBlock, one on IndexBlock) still have no // SAFETY: comments on the function definitions themselves. The call-site comments in TreeNode::key_range are present, but per Feldera policy each unsafe fn definition needs its own // SAFETY: annotation explaining the invariant. Holding REQUEST_CHANGES until those are added.

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconsidering my blocker. I was wrong to hold this up — I checked the rest of reader.rs and no existing unsafe fn definition carries a // SAFETY: doc comment. The convention here is to document at call sites, and the two new call sites in TreeNode::key_range already have the comments (// SAFETY: Unsafe because of serialization). My blocker was inconsistent with the file's established pattern. LGTM.

@gz gz force-pushed the min-max-batch branch from 8b799fa to 1ca1c6e Compare March 31, 2026 06:11
- Remove legacy maybe_contains_key
- Reader no longer "owns" bloom filter directly
- Filtering is handled by BatchFilters (which in the future can support
  multiple different filter impls)

This is supposed to make it easier to customize filters in a follow
up PR.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
@gz gz force-pushed the min-max-batch branch from 1ca1c6e to 8c1b432 Compare March 31, 2026 06:21
@gz gz added this pull request to the merge queue Mar 31, 2026
Merged via the queue into main with commit c3d05d1 Mar 31, 2026
1 check passed
@gz gz deleted the min-max-batch branch March 31, 2026 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants