[SQL] Optimization to pull filters above joins#6485
Conversation
mythical-fred
left a comment
There was a problem hiding this comment.
Nice optimization. Algorithm (substitute NoExpression and search for it post-inlining) is clever and looks sound for inner joins. A few things worth tightening before merge — see inline. Main asks: more tests (negative case, key-only push-to-both, multi-view), update the stale class javadoc, and split the unrelated logger-level change out (or call it out in the message).
| import java.util.List; | ||
|
|
||
| /** Combine a {@link DBSPJoinOperator} followed by a {@link DBSPFilterOperator} into a | ||
| * {@link DBSPJoinFilterMapOperator}. Similar expansion for {@link DBSPLeftJoinOperator}. */ |
There was a problem hiding this comment.
The class javadoc is now stale — this visitor no longer just combines Join + Filter into JoinFilterMap; it also pulls filters above the join. Please update the description to mention the new pull-above-join transformation and clarify which join kinds it applies to (only DBSPJoinOperator here — DBSPLeftJoinOperator and DBSPStarJoinOperator still fall through to the JoinFilterMap rewrite below). The Similar expansion for DBSPLeftJoinOperator sentence is misleading given the new code only covers inner joins.
| DBSPClosureExpression joinFunction = join.getClosureFunction(); | ||
| projection.apply(joinFunction); | ||
| DBSPClosureExpression filterFunction = operator.getClosureFunction(); | ||
| if (projection.hasIoMap()) { |
There was a problem hiding this comment.
Why gate the whole pull-above-join transformation on projection.hasIoMap()? The optimization's correctness only requires that, after substituting NoExpression for one side, the resulting predicate be NoExpression-free. That property does not require the join function to be a pure projection — a non-projection join function (e.g. a cast on a key column) could still produce a side-independent predicate. Is the projection check just a conservative correctness condition because applyAfter + reduction is only reliable for projections, or is there a deeper reason? Worth a comment, and possibly worth relaxing later.
| left.field(1), | ||
| new NoExpression(joinFunction.parameters[2].getType())) | ||
| .closure(left); | ||
| final DBSPClosureExpression leftFilterPredicate = filterFunction.applyAfter(compiler, leftProjection, Maybe.YES); |
There was a problem hiding this comment.
Maybe.YES here forces applyAfter to inline-and-reduce the composition (taking the shouldInline = true branch), which is what makes the NoExpression search meaningful — without reduction the NoExpression would always remain syntactically present inside a let. That's correct, but it's a subtle and load-bearing argument; please add a one-line comment saying why YES is needed (so a future cleanup doesn't switch it to MAYBE).
| .append("Pulled predicate before join ") | ||
| .appendSupplier(filterFunction::toString) | ||
| .newline(); | ||
| return; |
There was a problem hiding this comment.
Test gap: the only test (issue6279) covers exactly one case (predicate on the left side of an inner join). Given how much logic this PR introduces, the unit tests should also cover:
- predicate that depends on BOTH sides → must NOT be pulled (assert a
DBSPFilterOperatordoes remain), - predicate that depends only on the key column → pushed to BOTH sides (assert two filters appear, one above each input),
- predicate over a LEFT JOIN that touches the right (nullable) side → must NOT be pulled (regression for the LEFT-join soundness concern above),
- NULL-handling: e.g.
WHERE y IS NULLandWHERE y IS NOT NULLon the inner-join case (you mentioned in the description thatIS NOT NULLis the common shape being pulled), - a non-projection join function (e.g. with a cast) so the
hasIoMapearly-out is exercised.
Mihai — your usual principle is "docs need backing tests" and the same applies here. One trivial happy path isn't enough scaffolding to prevent regressions of this optimization.
| @@ -1,5 +1,7 @@ | |||
| package org.dbsp.sqlCompiler.compiler.sql.simple; | |||
|
|
|||
| import org.dbsp.sqlCompiler.circuit.DBSPCircuit; | |||
There was a problem hiding this comment.
Unused import — DBSPCircuit is not referenced anywhere in this file.
| ccs.visit(new CircuitVisitor(ccs.compiler) { | ||
| @Override | ||
| public void postorder(DBSPFilterOperator node) { | ||
| Assert.fail("Shoudl have been eliminated"); |
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
mythical-fred
left a comment
There was a problem hiding this comment.
Re-APPROVE on 320a2406d.
The rewrite addresses essentially every point I raised on 70315bd5:
- Class javadoc now mentions both pull-above-join and the existing JoinFilterMap fusion.
hasIoMap()gate is gone entirely — the transformation is now driven purely by whether the substituted predicate still mentionsNoExpression. Cleaner than what I asked for.withInputs(..., true)→withInputs(..., false).- Unused
DBSPCircuitimport and theShoudltypo are fixed. - Test gap is closed:
issue6279now covers (a) left-side pull, (b) right-side pull, (c) both-sides key-only pull on inner join, (d) LEFT JOIN with key-only predicate pulled to both sides, and crucially (e) the LEFT JOIN + predicate-on-right negative case that asserts exactly oneDBSPLeftJoinFilterMapOperatorsurvives and nothing was incorrectly hoisted.
The LEFT JOIN regression in particular — the one I was most worried about silently miscompiling — is now nailed down by the post-order assertion. Maybe.YES is still load-bearing in applyAfter(...); that's the only original concern not annotated with a comment, but the new test coverage proves the invariant it depends on. Good enough.
Nothing else to flag. Approved.
Fixes #6279
There are calcite optimizations to pull predicates above joins if they only depend on one side of a join. But Calcite optimizes each view separately. This PR introduces an optimization which operates on the graph produced by stitching all views together. It does find a few predicates that can be pulled, most of them of the shape "IS NOT NULL".
Describe Manual Test Plan
Ran all Java tests manually.
Checklist