fix: map() fails when keys are literals and values are column expressions#22784
fix: map() fails when keys are literals and values are column expressions#22784nathanb9 wants to merge 2 commits into
Conversation
|
CI failure seems like a transient docker failure |
| // so that the rest of the logic handles uniform array inputs. | ||
| // This handles cases like: map(['a','b'], [col1, col2]) where keys are all | ||
| // literals (scalar FixedSizeList) but values contain column references. | ||
| let (keys_arg, values_arg) = if !can_evaluate_to_const { |
There was a problem hiding this comment.
This scalar expansion logic is repeated for both keys and values. It might be worth pulling it into a small helper so the invariant lives in one place and future mixed scalar/array handling is less likely to drift.
For example:
fn expand_if_scalar(arg: &ColumnarValue, rows: usize) -> Result<ColumnarValue> {
match arg {
ColumnarValue::Scalar(s) => Ok(ColumnarValue::Array(s.to_array_of_size(rows)?)),
ColumnarValue::Array(a) => Ok(ColumnarValue::Array(Arc::clone(a))),
}
}| # because scalar keys (FixedSizeList) had length=N (number of keys) while | ||
| # array values had length=batch_size. | ||
| query ? | ||
| SELECT map(['a','b'], [column1, column1 * 10]) FROM (VALUES (1), (2), (3)) t; |
There was a problem hiding this comment.
Nice coverage for literal keys with column values. Since the implementation expands either side when mixed, it may also be useful to add the symmetric SQL-visible case with array keys and scalar values, such as map([column1, column1 * 10], ['a','b']) or an equivalent version with unique keys.
| } | ||
|
|
||
| #[test] | ||
| fn test_make_map_scalar_keys_array_values() { |
There was a problem hiding this comment.
i think the SLT is sufficient, dont need to duplicate here
| # Regression test: map with literal keys and column-reference values | ||
| # Previously failed with "map requires key and value lists to have the same length" | ||
| # because scalar keys (FixedSizeList) had length=N (number of keys) while | ||
| # array values had length=batch_size. |
There was a problem hiding this comment.
| # Regression test: map with literal keys and column-reference values | |
| # Previously failed with "map requires key and value lists to have the same length" | |
| # because scalar keys (FixedSizeList) had length=N (number of keys) while | |
| # array values had length=batch_size. | |
| # mixed scalar/array inputs |
better to keep it short and simple
| // If we have a mix of scalar and array args, expand the scalar to an array | ||
| // so that the rest of the logic handles uniform array inputs. | ||
| // This handles cases like: map(['a','b'], [col1, col2]) where keys are all | ||
| // literals (scalar FixedSizeList) but values contain column references. |
There was a problem hiding this comment.
| // If we have a mix of scalar and array args, expand the scalar to an array | |
| // so that the rest of the logic handles uniform array inputs. | |
| // This handles cases like: map(['a','b'], [col1, col2]) where keys are all | |
| // literals (scalar FixedSizeList) but values contain column references. | |
| // if we can't evaluate to const (inputs are not both scalar) then ensure they | |
| // are expanded to arrays which following logic expects |
simplifying comment a bit
| }; | ||
| let values_arg = match values_arg { | ||
| ColumnarValue::Scalar(s) => { | ||
| ColumnarValue::Array(s.to_array_of_size(number_rows)?) |
There was a problem hiding this comment.
it would be nice if we could refactor the make map logic as a whole to not have this confusing expansion to array but still wrap in columnarvalue, but i suppose that would require a larger refactor
Which issue does this PR close?
Closes #22781
Rationale for this change
map(['a','b'], [col, col * 10])fails at execution time with "map requires key and value lists to have the same length" when keys are all literals and values contain column references.Root cause: literal keys evaluate to
ColumnarValue::Scalar(FixedSizeList[N])(length N) while column values evaluate toColumnarValue::Array(length batch_size). The length check compares N != batch_size and errors.What changes are included in this PR?
In
make_map_batch: when one argument is scalar and the other is array, expand the scalar to matchbatch_sizeviaScalarValue::to_array_of_size(number_rows)before the length comparison.Are these changes tested?
Yes.
map.rsSELECT map(['a','b'], [column1, column1 * 10]) FROM (VALUES (1), (2), (3))Are there any user-facing changes?
No. Previously-failing queries now execute correctly.