Skip to content

Refactor post-aggregate clause rebasing and validation with shared helpers#22817

Open
kosiew wants to merge 8 commits into
apache:mainfrom
kosiew:rebase-validation-02-22684
Open

Refactor post-aggregate clause rebasing and validation with shared helpers#22817
kosiew wants to merge 8 commits into
apache:mainfrom
kosiew:rebase-validation-02-22684

Conversation

@kosiew
Copy link
Copy Markdown
Contributor

@kosiew kosiew commented Jun 8, 2026

Which issue does this PR close?

Closes #22684

Rationale for this change

SqlToRel::aggregate contained several implementations of the same post-aggregate workflow across SELECT, HAVING, QUALIFY, ORDER BY, and DISTINCT ON:

  1. Rebase expressions against aggregate output expressions.
  2. Validate rebased expressions at the aggregate boundary.
  3. Apply clause-specific diagnostics via CheckColumnsMustReferenceAggregatePurpose.

Maintaining this logic in multiple places increases the risk of behavioral drift during future planner changes. This change consolidates the shared mechanics into private helper functions while preserving existing SQL semantics, alias handling, sort behavior, and clause-specific error messages.

What changes are included in this PR?

  • Added private helper functions for post-aggregate expression processing:

    • rebase_and_validate_post_aggregate_exprs
    • rebase_and_validate_optional_post_aggregate_expr
    • rebase_and_validate_post_aggregate_sort_exprs
    • order_by_select_alias_expr
  • Migrated SELECT projection post-aggregate rebasing and validation to shared helper logic.

  • Migrated HAVING and QUALIFY post-aggregate rebasing and validation to shared helper logic.

  • Migrated DISTINCT ON post-aggregate rebasing and validation to shared helper logic while continuing to validate against aggregate outputs plus valid SELECT aliases.

  • Migrated ORDER BY post-aggregate rebasing and validation to a dedicated helper that:

    • Preserves existing SELECT-alias remapping behavior.
    • Preserves sort direction and NULL ordering via SortExpr::with_expr.
    • Continues validating with ORDER BY-specific aggregate-boundary diagnostics.
  • Simplified aggregate planning code by removing duplicated rebasing and validation logic.

Are these changes tested?

Yes.

Added SQLLogicTest coverage in aggregate.slt for:

  • SELECT projection aggregate-boundary validation errors.
  • HAVING aggregate-boundary validation errors.
  • QUALIFY aggregate-boundary validation errors.
  • ORDER BY aggregate-boundary validation errors.
  • DISTINCT ON aggregate-boundary validation errors.
  • ORDER BY using a top-level SELECT aggregate alias after aggregation.
  • Preservation of ORDER BY sort direction and NULL ordering when remapping a SELECT alias.
  • DISTINCT ON using a top-level SELECT aggregate alias after aggregation.

Are there any user-facing changes?

No intended user-facing behavior changes.

This PR is a refactor that preserves existing aggregate planning semantics while adding regression coverage for clause-specific validation diagnostics and alias-handling behavior.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 8 commits June 8, 2026 14:53
- Added focused test cases for diagnostics in aggregate queries:
- SELECT invalid diagnostic
- HAVING invalid diagnostic
- QUALIFY invalid diagnostic
- ORDER BY invalid diagnostic
- DISTINCT ON invalid diagnostic
- ORDER BY aggregate alias valid case
- DISTINCT ON aggregate alias valid case
…gregate expressions

- Introduced `rebase_and_validate_post_aggregate_exprs` to streamline the SELECT projection rebase/validation process.
- Updated SELECT projection to utilize the new helper, while leaving HAVING, QUALIFY, ORDER BY, and DISTINCT ON unchanged.
…ype in select.rs

- Renamed `valid_exprs` to `valid_column_exprs` in `datafusion/sql/src/select.rs`
- Shortened collect type from `Vec<Expr>` to `Vec<_>`
- In `datafusion/sqllogictest/test_files/aggregate.slt`:
- Replaced repeated CTEs with one test table: `post_aggregate_clause_validation`
- Maintained the same diagnostics and outputs
- Added cleanup with `DROP TABLE`
…ggregate expressions

- Added rebase_and_validate_optional_post_aggregate_expr helper
- Migrated HAVING clause to use the new helper
- Migrated QUALIFY clause to use the new helper
- Kept ORDER BY / DISTINCT ON implementation unchanged
- Implemented rebase_and_validate_post_aggregate_exprs for DISTINCT ON on_exprs
- Maintained validation using all_valid_exprs
- Kept ORDER BY logic unchanged
…r and reorder ORDER BY logic

- Introduced a private helper function `rebase_and_validate_post_aggregate_sort_exprs`
- Reorganized the ORDER BY handling:
- Moved rebase logic
- Updated SELECT-alias remapping
- Rebuilt SortExpr
- Ensured validation with OrderBy
- Maintained existing logic for all_valid_exprs
- Kept DISTINCT ON/HAVING/QUALIFY handling unchanged from previous migrations
- Updated optional helper to return explicit Ok(Some(rebased_expr))
- Introduced private order_by_select_alias_expr
- Modified ORDER BY helper to utilize named alias-remap helper
- Enhanced all_valid_exprs alias extraction with match usage
- Condensed ORDER BY call-site comment for clarity
@github-actions github-actions Bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Jun 8, 2026
@kosiew kosiew marked this pull request as ready for review June 8, 2026 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Share post-aggregate clause rebasing and validation logic in SQL planner

1 participant