Skip to content

[SQL] Optimization to pull filters above joins#6485

Open
mihaibudiu wants to merge 1 commit into
feldera:mainfrom
mihaibudiu:issue6279
Open

[SQL] Optimization to pull filters above joins#6485
mihaibudiu wants to merge 1 commit into
feldera:mainfrom
mihaibudiu:issue6279

Conversation

@mihaibudiu

@mihaibudiu mihaibudiu commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Fixes #6279

There are calcite optimizations to pull predicates above joins if they only depend on one side of a join. But Calcite optimizes each view separately. This PR introduces an optimization which operates on the graph produced by stitching all views together. It does find a few predicates that can be pulled, most of them of the shape "IS NOT NULL".

Describe Manual Test Plan

Ran all Java tests manually.

Checklist

  • Unit tests added/updated

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization. Algorithm (substitute NoExpression and search for it post-inlining) is clever and looks sound for inner joins. A few things worth tightening before merge — see inline. Main asks: more tests (negative case, key-only push-to-both, multi-view), update the stale class javadoc, and split the unrelated logger-level change out (or call it out in the message).

import java.util.List;

/** Combine a {@link DBSPJoinOperator} followed by a {@link DBSPFilterOperator} into a
* {@link DBSPJoinFilterMapOperator}. Similar expansion for {@link DBSPLeftJoinOperator}. */

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class javadoc is now stale — this visitor no longer just combines Join + Filter into JoinFilterMap; it also pulls filters above the join. Please update the description to mention the new pull-above-join transformation and clarify which join kinds it applies to (only DBSPJoinOperator here — DBSPLeftJoinOperator and DBSPStarJoinOperator still fall through to the JoinFilterMap rewrite below). The Similar expansion for DBSPLeftJoinOperator sentence is misleading given the new code only covers inner joins.

DBSPClosureExpression joinFunction = join.getClosureFunction();
projection.apply(joinFunction);
DBSPClosureExpression filterFunction = operator.getClosureFunction();
if (projection.hasIoMap()) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why gate the whole pull-above-join transformation on projection.hasIoMap()? The optimization's correctness only requires that, after substituting NoExpression for one side, the resulting predicate be NoExpression-free. That property does not require the join function to be a pure projection — a non-projection join function (e.g. a cast on a key column) could still produce a side-independent predicate. Is the projection check just a conservative correctness condition because applyAfter + reduction is only reliable for projections, or is there a deeper reason? Worth a comment, and possibly worth relaxing later.

left.field(1),
new NoExpression(joinFunction.parameters[2].getType()))
.closure(left);
final DBSPClosureExpression leftFilterPredicate = filterFunction.applyAfter(compiler, leftProjection, Maybe.YES);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe.YES here forces applyAfter to inline-and-reduce the composition (taking the shouldInline = true branch), which is what makes the NoExpression search meaningful — without reduction the NoExpression would always remain syntactically present inside a let. That's correct, but it's a subtle and load-bearing argument; please add a one-line comment saying why YES is needed (so a future cleanup doesn't switch it to MAYBE).

.append("Pulled predicate before join ")
.appendSupplier(filterFunction::toString)
.newline();
return;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test gap: the only test (issue6279) covers exactly one case (predicate on the left side of an inner join). Given how much logic this PR introduces, the unit tests should also cover:

  • predicate that depends on BOTH sides → must NOT be pulled (assert a DBSPFilterOperator does remain),
  • predicate that depends only on the key column → pushed to BOTH sides (assert two filters appear, one above each input),
  • predicate over a LEFT JOIN that touches the right (nullable) side → must NOT be pulled (regression for the LEFT-join soundness concern above),
  • NULL-handling: e.g. WHERE y IS NULL and WHERE y IS NOT NULL on the inner-join case (you mentioned in the description that IS NOT NULL is the common shape being pulled),
  • a non-projection join function (e.g. with a cast) so the hasIoMap early-out is exercised.

Mihai — your usual principle is "docs need backing tests" and the same applies here. One trivial happy path isn't enough scaffolding to prevent regressions of this optimization.

@@ -1,5 +1,7 @@
package org.dbsp.sqlCompiler.compiler.sql.simple;

import org.dbsp.sqlCompiler.circuit.DBSPCircuit;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import — DBSPCircuit is not referenced anywhere in this file.

ccs.visit(new CircuitVisitor(ccs.compiler) {
@Override
public void postorder(DBSPFilterOperator node) {
Assert.fail("Shoudl have been eliminated");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: ShoudlShould.

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-APPROVE on 320a2406d.

The rewrite addresses essentially every point I raised on 70315bd5:

  • Class javadoc now mentions both pull-above-join and the existing JoinFilterMap fusion.
  • hasIoMap() gate is gone entirely — the transformation is now driven purely by whether the substituted predicate still mentions NoExpression. Cleaner than what I asked for.
  • withInputs(..., true)withInputs(..., false).
  • Unused DBSPCircuit import and the Shoudl typo are fixed.
  • Test gap is closed: issue6279 now covers (a) left-side pull, (b) right-side pull, (c) both-sides key-only pull on inner join, (d) LEFT JOIN with key-only predicate pulled to both sides, and crucially (e) the LEFT JOIN + predicate-on-right negative case that asserts exactly one DBSPLeftJoinFilterMapOperator survives and nothing was incorrectly hoisted.

The LEFT JOIN regression in particular — the one I was most worried about silently miscompiling — is now nailed down by the post-order assertion. Maybe.YES is still load-bearing in applyAfter(...); that's the only original concern not annotated with a comment, but the new test coverage proves the invariant it depends on. Good enough.

Nothing else to flag. Approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SQL] Optimization to pull filters above joins across LOCAL views

2 participants