ResolveGroupingFunction does not unwrap Alias nodes #21411

@timsaucer

Describe the bug

The ResolveGroupingFunction analyzer rule does not recognize grouping() calls when they are wrapped in an Alias node. This means that grouping(col).alias("name") fails at physical planning with:

    This feature is not implemented: physical plan is not yet implemented for GROUPING aggregate function

The same query succeeds when the grouping() expression is not aliased.

To Reproduce

The equivalent SQL query works, because the SQL planner applies the alias at a different stage:

    SELECT a, SUM(b) AS s, GROUPING(a) AS g
    FROM t
    GROUP BY ROLLUP(a)

But when constructing the equivalent logical plan programmatically (e.g., via the DataFrame API), wrapping the grouping() expression in Expr::Alias(...) before passing it to LogicalPlanBuilder::aggregate() causes the ResolveGroupingFunction rule to skip it. The rule appears to pattern-match on Expr::AggregateFunction but does not recurse into Expr::Alias(Alias { expr: Expr::AggregateFunction(...), .. }).
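The suspected miss can be illustrated with a toy model. This is only a sketch: the `Expr` enum and the helpers `grouping`, `alias`, `is_grouping_top_level`, and `is_grouping_through_alias` below are hypothetical stand-ins, not DataFusion's real types or the rule's actual code. It shows how a check that inspects only the top-level node stops seeing the call once it is wrapped in an alias.

```rust
// Toy stand-ins for DataFusion's expression types; NOT the real API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    AggregateFunction { name: String, args: Vec<Expr> },
    Alias { expr: Box<Expr>, name: String },
}

/// Hypothetical helper: builds a toy `grouping(col)` call.
fn grouping(col: &str) -> Expr {
    Expr::AggregateFunction {
        name: "grouping".to_string(),
        args: vec![Expr::Column(col.to_string())],
    }
}

/// Hypothetical helper: wraps an expression in a toy alias node.
fn alias(expr: Expr, name: &str) -> Expr {
    Expr::Alias {
        expr: Box::new(expr),
        name: name.to_string(),
    }
}

/// Top-level-only check, mirroring the behavior this issue suspects.
fn is_grouping_top_level(expr: &Expr) -> bool {
    matches!(expr, Expr::AggregateFunction { name, .. } if name.as_str() == "grouping")
}

/// Same check, but looks through any number of alias wrappers first.
fn is_grouping_through_alias(expr: &Expr) -> bool {
    match expr {
        Expr::Alias { expr, .. } => is_grouping_through_alias(expr),
        other => is_grouping_top_level(other),
    }
}
```

In this model, `is_grouping_top_level(&alias(grouping("a"), "g"))` is false while `is_grouping_through_alias` is true for the same input, which matches the symptom described above.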

This test should pass:

    #[tokio::test]
    async fn test_grouping_function_alias() -> Result<()> {
        let ctx = SessionContext::default();
        let rb = record_batch!(("a", Int32, [1, 1, 2]), ("b", Int32, [10, 20, 30]))?;
        let df = ctx.read_batch(rb)?;

        fn check_results(results: &[RecordBatch]) {
            // Batch ordering is not guaranteed, so derive the expected
            // grouping() value from the sum in each batch: 30 is a per-group
            // row (grouping = 0), 60 is the rollup total (grouping = 1).
            for result in results {
                let s_array = result
                    .column(1)
                    .as_any()
                    .downcast_ref::<Int64Array>()
                    .unwrap();
                let expected_val = match s_array.value(0) {
                    30 => 0,
                    60 => 1,
                    other => panic!("unexpected value {other}"),
                };
                let expected = create_array!(Int32, [expected_val]) as ArrayRef;
                assert_eq!(&expected, result.column(2));
            }
        }

        // Unaliased grouping(): works today.
        let results = df
            .clone()
            .aggregate(
                vec![Expr::GroupingSet(GroupingSet::Rollup(vec![col("a")]))],
                vec![sum(col("b")).alias("s"), grouping(col("a"))],
            )?
            .collect()
            .await?;
        check_results(&results);

        // Aliased grouping(): currently fails at physical planning.
        let results = df
            .clone()
            .aggregate(
                vec![Expr::GroupingSet(GroupingSet::Rollup(vec![col("a")]))],
                vec![sum(col("b")).alias("s"), grouping(col("a")).alias("g")],
            )?
            .collect()
            .await?;
        check_results(&results);

        Ok(())
    }

Expected behavior

grouping(col).alias("name") should behave identically to an unaliased grouping(col): the analyzer should look through Alias nodes when searching for grouping() calls to rewrite.

Additional context

The relevant rule is ResolveGroupingFunction in datafusion/optimizer/src/analyzer/resolve_grouping_function.rs. The fix likely involves matching on Expr::Alias(Alias { expr, .. }) and recursing into the inner expression, then re-wrapping the rewritten result in the alias.
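The unwrap, rewrite, re-wrap shape of the proposed fix might look roughly like the sketch below. It uses a toy `Expr` enum, not DataFusion's real types; `ResolvedGrouping` and `resolve_grouping` are hypothetical names, and the placeholder variant stands in for whatever expression the real rule substitutes for grouping().

```rust
// Toy stand-ins for DataFusion's expression types; NOT the real API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    AggregateFunction { name: String, args: Vec<Expr> },
    Alias { expr: Box<Expr>, name: String },
    /// Hypothetical placeholder for the expression the real rule would
    /// produce when it rewrites a grouping() call.
    ResolvedGrouping(Vec<Expr>),
}

/// Rewrites grouping() calls, looking through alias wrappers and
/// restoring each alias around the rewritten inner expression.
fn resolve_grouping(expr: Expr) -> Expr {
    match expr {
        // Unwrap the alias, rewrite the inner expression, then re-wrap it
        // so the output column keeps the user-supplied name.
        Expr::Alias { expr, name } => Expr::Alias {
            expr: Box::new(resolve_grouping(*expr)),
            name,
        },
        // A bare grouping() call is rewritten directly.
        Expr::AggregateFunction { name, args } if name == "grouping" => {
            Expr::ResolvedGrouping(args)
        }
        // Everything else (e.g. sum(b)) passes through unchanged.
        other => other,
    }
}
```

Recursing on the `Alias` arm also handles nested aliases, and keeping the alias name on the way back out means the output schema is unchanged for callers that refer to the column as "g".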

Labels: bug
