refactor: remove opt_filter in GroupsAccumulator::merge_batch#22816
refactor: remove opt_filter in GroupsAccumulator::merge_batch#22816haohuaijin wants to merge 6 commits into
opt_filter in GroupsAccumulator::merge_batch#22816Conversation
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
d569c35 to
2e667ba
Compare
|
Thanks for working on it! Assuming the assumption is correct, that However, since this is a public API change, I suggest to proceed after getting a few more +1s. |
| if self.mode.input_mode() == AggregateInputMode::Raw { | ||
| acc.update_batch(args, &[0], Some(&false_filter), total_groups)?; | ||
| } else { | ||
| acc.merge_batch(args, &[0], Some(&false_filter), total_groups)?; | ||
| } |
There was a problem hiding this comment.
Note that aggregate_arguments depends on the aggregate mode. In Raw mode it has raw input arguments, which fit update_batch. In Partial input modes it has state fields, which normally fit merge_batch. If this path is ever reached in a Partial input mode, update_batch could receive the wrong shape of arrays. For example, AVG takes one raw input array in update_batch, but two state arrays in merge_batch.
I could not reproduce this through normal SQL/sqllogictest. The usual SQL plan initializes empty grouping sets in the Partial stage, so the Final stage gets a non-empty state row and does not hit this path. see the test case added in dcc9c27
this is only place that use the merge_batch with not None opt_filter(added recently).
There was a problem hiding this comment.
i check more on this part, and found the Final / FinalPartitioned build their group-by via as_final(), which hard-codes has_grouping_set: false. So init_empty_grouping_sets hits its !has_grouping_set() early-return and never call the merge_batch.
i update the docs in bccad62
Which issue does this PR close?
Rationale for this change
the
opt_filteronGroupsAccumulator::merge_batchis a dead parameter. AggregateFILTERclauses only apply to raw input rows in the update phase (update_batch).merge_batchcombines already pre-aggregated states, so there is no per-row filtering to do —opt_filteris meaningless there.The code confirms this:
row_hash.rs) always passedNone.correlation.rsassertedopt_filter.is_none(), and Sparkavgused_opt_filter.What changes are included in this PR?
opt_filterfrommerge_batchin the trait and all implementations (built-in aggregates,physical-expr-common,functions-aggregate-common, Spark, and FFI).merge_batchhas noopt_filterbecause filtering happens in the update phase.row_hash.rsto always useupdate_batchwith an all-false filter instead of branching tomerge_batch.update_batchalways takes raw argument types (whataggregate_argumentsprovides), and since every row is filtered out the data never matters — this is simpler and more correct.Are these changes tested?
Yes. Existing aggregate tests cover this and were updated to the new signature. The
first_lasttests were adjusted (with comments) to match the merge behavior without a filter, and the FFI and Spark tests were updated too.Are there any user-facing changes?
Yes — this is a breaking change to the public
GroupsAccumulatortrait:opt_filteris removed frommerge_batch. Custom implementations and direct callers must update their signatures.