Skip to content

fix: NOT IN with NULL subquery returns wrong results under SortMergeJoin#22810

Draft
nathanb9 wants to merge 1 commit into
apache:mainfrom
nathanb9:fix-notin-null-aware-sortmergejoin
Draft

fix: NOT IN with NULL subquery returns wrong results under SortMergeJoin#22810
nathanb9 wants to merge 1 commit into
apache:mainfrom
nathanb9:fix-notin-null-aware-sortmergejoin

Conversation

@nathanb9
Copy link
Copy Markdown
Contributor

@nathanb9 nathanb9 commented Jun 7, 2026

Problem

NOT IN (subquery) is a null-aware anti join: when the subquery yields a NULL the predicate is never TRUE, so the query must return zero rows. With prefer_hash_join = false and multiple partitions, the planner routed the null-aware anti join to SortMergeJoinExec, which is not null-aware, so it returned wrong results. HashJoin (the default) was already correct.

Proof

set datafusion.optimizer.prefer_hash_join = false;
create table t1(x int) as values (1);
create table t2(y int) as values (NULL);
select x from t1 where x not in (select y from t2);

Expected 0 rows (the subquery contains a NULL). Before this change it returned 1. With prefer_hash_join = true it correctly returned 0 rows. EXPLAIN showed the wrong config selecting SortMergeJoinExec: join_type=LeftAnti.

Solution

The planner already requires null-aware joins to use the CollectLeft HashJoin, and the HashJoin branch guards on !null_aware. The SortMergeJoin branch was missing the same guard, so this adds && !*null_aware to it. Null-aware anti joins now fall through to the CollectLeft HashJoin regardless of prefer_hash_join. SortMergeJoinExec has no null_aware parameter and cannot honor these semantics.

Added a regression test in subquery.slt (under prefer_hash_join = false) covering both a null-containing subquery (zero rows) and a null-free subquery (normal anti join). All 61 SortMergeJoin unit tests pass.

@github-actions github-actions Bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant