
refactor(hash-aggr): Forward port the soft limit optimization to the new hash aggregation impl #22824

Open
2010YOUY01 wants to merge 1 commit into apache:main from 2010YOUY01:hash-aggr-soft-limit


2010YOUY01 commented Jun 8, 2026

Which issue does this PR close?

Rationale for this change

Part of rewriting hash aggregation into several dedicated streams.

In the first step, #22729, PartialHashAggregateStream and FinalHashAggregateStream were split out of the old GroupsHashAggregateStream, but both streams only have a basic implementation, without optimizations or extra features like spilling.
* This is an incremental migration, so the old implementation won't change; we plan to delete it once the migration is finished.

This PR forward ports the below optimization to the new implementation:

The optimizer part doesn't have to move; the ported changes are only inside the aggregate operator.

What changes are included in this PR?

Extends PartialHashAggregateStream and FinalHashAggregateStream to apply the optimization. See code comment at datafusion/physical-plan/src/aggregates/hash_aggregate.rs for the background.
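
For readers unfamiliar with the idea, here is a minimal standalone sketch. It assumes the soft limit lets a group-by-only aggregation (no aggregate functions) stop reading input once it has collected at least `limit` groups, with the check done per batch so the output may overshoot the limit. That assumption, the function name `distinct_with_soft_limit`, and the use of plain `Vec<i64>` batches are all illustrative and are not taken from this PR's code, which works on Arrow record batches inside the two streams.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a group-by-only hash aggregation.
/// Returns the distinct keys in first-seen order, plus the number of
/// input batches that were actually consumed.
fn distinct_with_soft_limit(
    batches: &[Vec<i64>],
    soft_limit: Option<usize>,
) -> (Vec<i64>, usize) {
    let mut seen = HashSet::new();
    let mut groups = Vec::new();
    let mut batches_read = 0;

    for batch in batches {
        batches_read += 1;
        for &key in batch {
            // `insert` returns true only the first time a key is seen.
            if seen.insert(key) {
                groups.push(key);
            }
        }
        // Assumed behavior: the limit is checked once per batch, so the
        // result can hold more than `limit` groups (hence "soft").
        if let Some(limit) = soft_limit {
            if groups.len() >= limit {
                break; // stop pulling further input
            }
        }
    }
    (groups, batches_read)
}
```

With three input batches and a soft limit of 4, this sketch stops after the second batch and returns five groups; a downstream limit operator would then trim the result to the exact row count.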

Are these changes tested?

Yes. The original tests from #8038 are only at the ExecutionPlan level, and they still pass after this change.
This PR also adds new test coverage: it checks the explain analyze output to ensure the implementation actually respects the soft limit at runtime.

Are there any user-facing changes?

github-actions bot added the core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) labels Jun 8, 2026