Skip to content

fix: map() fails when keys are literals and values are column expressions#22784

Open
nathanb9 wants to merge 2 commits into
apache:mainfrom
nathanb9:fix-map-literal-keys
Open

fix: map() fails when keys are literals and values are column expressions#22784
nathanb9 wants to merge 2 commits into
apache:mainfrom
nathanb9:fix-map-literal-keys

Conversation

@nathanb9
Copy link
Copy Markdown
Contributor

@nathanb9 nathanb9 commented Jun 6, 2026

Which issue does this PR close?

Closes #22781

Rationale for this change

map(['a','b'], [col, col * 10]) fails at execution time with "map requires key and value lists to have the same length" when keys are all literals and values contain column references.

Root cause: literal keys evaluate to ColumnarValue::Scalar(FixedSizeList[N]) (length N) while column values evaluate to ColumnarValue::Array (length batch_size). The length check compares N != batch_size and errors.

What changes are included in this PR?

In make_map_batch: when one argument is scalar and the other is array, expand the scalar to match batch_size via ScalarValue::to_array_of_size(number_rows) before the length comparison.

Are these changes tested?

Yes.

  • Unit test: scalar keys + array values in map.rs
  • SLT test: SELECT map(['a','b'], [column1, column1 * 10]) FROM (VALUES (1), (2), (3))

Are there any user-facing changes?

No. Previously-failing queries now execute correctly.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jun 6, 2026
@nathanb9 nathanb9 marked this pull request as ready for review June 6, 2026 04:47
@nathanb9
Copy link
Copy Markdown
Contributor Author

nathanb9 commented Jun 6, 2026

CI failure seems like a transient docker failure

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathanb9
Thanks for the fix. This looks good to me overall. I left a couple of small suggestions that could make the behavior a bit easier to maintain and keep covered in tests.

// so that the rest of the logic handles uniform array inputs.
// This handles cases like: map(['a','b'], [col1, col2]) where keys are all
// literals (scalar FixedSizeList) but values contain column references.
let (keys_arg, values_arg) = if !can_evaluate_to_const {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scalar expansion logic is repeated for both keys and values. It might be worth pulling it into a small helper so the invariant lives in one place and future mixed scalar/array handling is less likely to drift.

For example:

fn expand_if_scalar(arg: &ColumnarValue, rows: usize) -> Result<ColumnarValue> {
    match arg {
        ColumnarValue::Scalar(s) => Ok(ColumnarValue::Array(s.to_array_of_size(rows)?)),
        ColumnarValue::Array(a) => Ok(ColumnarValue::Array(Arc::clone(a))),
    }
}

# because scalar keys (FixedSizeList) had length=N (number of keys) while
# array values had length=batch_size.
query ?
SELECT map(['a','b'], [column1, column1 * 10]) FROM (VALUES (1), (2), (3)) t;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice coverage for literal keys with column values. Since the implementation expands either side when mixed, it may also be useful to add the symmetric SQL-visible case with array keys and scalar values, such as map([column1, column1 * 10], ['a','b']) or an equivalent version with unique keys.

}

#[test]
fn test_make_map_scalar_keys_array_values() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the SLT is sufficient, dont need to duplicate here

Comment on lines +905 to +908
# Regression test: map with literal keys and column-reference values
# Previously failed with "map requires key and value lists to have the same length"
# because scalar keys (FixedSizeList) had length=N (number of keys) while
# array values had length=batch_size.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Regression test: map with literal keys and column-reference values
# Previously failed with "map requires key and value lists to have the same length"
# because scalar keys (FixedSizeList) had length=N (number of keys) while
# array values had length=batch_size.
# mixed scalar/array inputs

better to keep it short and simple

Comment on lines +71 to +74
// If we have a mix of scalar and array args, expand the scalar to an array
// so that the rest of the logic handles uniform array inputs.
// This handles cases like: map(['a','b'], [col1, col2]) where keys are all
// literals (scalar FixedSizeList) but values contain column references.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If we have a mix of scalar and array args, expand the scalar to an array
// so that the rest of the logic handles uniform array inputs.
// This handles cases like: map(['a','b'], [col1, col2]) where keys are all
// literals (scalar FixedSizeList) but values contain column references.
// if we can't evaluate to const (inputs are not both scalar) then ensure they
// are expanded to arrays which following logic expects

simplifying comment a bit

};
let values_arg = match values_arg {
ColumnarValue::Scalar(s) => {
ColumnarValue::Array(s.to_array_of_size(number_rows)?)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if we could refactor the make map logic as a whole to not have this confusing expansion to array but still wrap in columnarvalue, but i suppose that would require a larger refactor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

map() fails with 'key and value lists must have the same length' when keys are literals and values are column expressions

3 participants