fix: Handle array of strings columns in Athena materialization #6324
Merged
ntkathole merged 3 commits on Jun 3, 2026
ntkathole reviewed on Apr 25, 2026
franciscojavierarceo approved these changes on May 13, 2026
ntkathole approved these changes on Jun 3, 2026
Signed-off-by: Alan Gauthier <alan.gauthier@jobteaser.com>
franciscojavierarceo pushed a commit that referenced this pull request on Jun 13, 2026
# [0.64.0](v0.63.0...v0.64.0) (2026-06-13)

### Bug Fixes

* Add async_supported property to RedisOnlineStore ([9b088fe](9b088fe))
* Add missing feast init templates to operator CRD and enhance persistence documentation ([1941d4d](1941d4d))
* Allow to publish from reference branch ([5458ec8](5458ec8))
* API calls list ([4203eb7](4203eb7))
* **bigquery:** Enable list inference for parquet loads in offline_write_batch ([9243497](9243497)), closes [#5845](#5845)
* Bump grpcio dependencies ([07b4782](07b4782))
* **compute-engine/local:** Honor field_mapping on join keys in dedup + join nodes ([#6395](#6395)) ([bd01824](bd01824))
* **dynamodb:** Avoid tag race condition by using diff-based tag updates ([#6479](#6479)) ([bad2b7d](bad2b7d)), closes [#6418](#6418)
* **dynamodb:** Fix mypy type for _build_projection_expression return ([217b4da](217b4da))
* Fix intermittent async test failures for DynamoDB and Redis ([63c5eb1](63c5eb1))
* Fix mongodb blog title ([57d28d4](57d28d4))
* Fix shared SQL registry crash - avoid unnecessary UDF deserialization in proto cache building ([ac588d7](ac588d7))
* Fix SparkRetrievalJob.persist() failing for SparkSource ([209d7cd](209d7cd))
* Fixed formatting and image for mongo blog ([#6377](#6377)) ([f8389fb](f8389fb))
* Fixes for ray source ([7f592a4](7f592a4))
* **go:** skip registry refresh when cache_ttl_seconds <= 0 ([97ed40c](97ed40c))
* Handle array of strings columns in Athena materialization ([#6324](#6324)) ([4ed0278](4ed0278))
* make milvus VARCHAR max_length configurable, remove hardcoded 512 limit ([3b98c22](3b98c22))
* **operator:** Set appProtocol: grpc on registry gRPC Service ([#6367](#6367)) ([c9ae2b4](c9ae2b4))
* PyJWT 2.10+ added validation that rejects empty HMAC keys ([e756ffe](e756ffe))
* RemoteOnlineStore sends all features in a single HTTP request ([8f187dd](8f187dd))
* Remove registry proto dump to enforce RBAC and add permission checks to Commit/Refresh RPCs ([328431f](328431f))
* Remove selector migration job - no longer needed ([51c325e](51c325e))
* replace broken .claude skill symlink with correct relative path ([4541690](4541690))
* Replace selector label strip patch with migration Job for upgrade-safe selector uniqueness ([00dea50](00dea50))
* Scope feature view name conflict check to current project in file-based registry ([#6369](#6369)) ([a4fde83](a4fde83)), closes [#6209](#6209)
* **snowflake:** Stop double-quoting connection identifiers ([#6462](#6462)) ([e914d59](e914d59))
* **spark:** S3/GCS PyArrow filesystem resolution for staging paths ([#6442](#6442)) ([ae50414](ae50414))
* **trino:** Clean up temporary entity tables after retrieval ([#6381](#6381)) ([d86b13d](d86b13d)), closes [#6306](#6306)
* Update go-feature-server base image to Go 1.25 and fix operator Dockerfile COPY permissions ([86ef0bc](86ef0bc))

### Features

* [Backend] Data Quality Monitoring with native compute, multi-backend support, REST API, CLI ([#6202](#6202)) ([5458c37](5458c37))
* Add apache flink compute engine ([#6476](#6476)) ([9636d6a](9636d6a))
* Add demo noteboooks for users ([e362173](e362173))
* Add enabled/disabled toggle for feature views ([#6401](#6401)) ([5f1fa0d](5f1fa0d)), closes [#6395](#6395)
* Add Label View to init template ([ec272d5](ec272d5))
* Add mTLS support to remote registry gRPC client ([#6474](#6474)) ([c9602d8](c9602d8))
* Add Prometheus gauges for FeatureStore installation telemetry ([#6354](#6354)) ([1b681b7](1b681b7))
* Adds registry REST API endpoints for managing entities, data sources, and feature views ([#6413](#6413)) ([f77bd1d](f77bd1d))
* Allow CRUD on entities, data sources, and feature views from UI ([#6412](#6412)) ([2321c07](2321c07))
* Allow default openlineage configuration ([#6467](#6467)) ([276b6df](276b6df))
* **bigquery:** Support DATE-type event timestamp columns ([#6362](#6362)) ([753dee5](753dee5)), closes [#2530](#2530)
* **cli:** Add `feast projects delete` command (closes [#5095](#5095)) ([#6318](#6318)) ([1a4b96c](1a4b96c))
* Data Quality Monitoring added in feast UI ([#6422](#6422)) ([fa271be](fa271be))
* **dynamodb:** Use ProjectionExpression when requested_features is set ([0adc906](0adc906)), closes [#6058](#6058)
* Enhance DataSource and FeatureView modals with error handling and submission states ([96d7169](96d7169))
* Expose registry endpoints on feature server for MCP access ([f77981c](f77981c))
* Feast First-Class LabelView Implementation ([#6292](#6292)) ([c0e7e5d](c0e7e5d))
* Feast-MLflow Integration ([#6235](#6235)) ([7279c75](7279c75))
* Operational metrics for offline store and SOX metrics for both ([#6340](#6340)) ([65b1b80](65b1b80))
* Pre-compute feature service ([8011550](8011550))
* REST API-backed UI for RBAC compatibility and per-page lazy loading ([#6414](#6414)) ([6ae80af](6ae80af))
* Support non-string map key types ([#6382](#6382)) ([#6383](#6383)) ([728aa2e](728aa2e))
* Update FeatureStore CRD with DRA Fields ([01241e4](01241e4))

### Performance Improvements

* Cache feature view resolution in get_online_features to reduce per-request overhead ([55c2f18](55c2f18))
* Optimize feature serving latency with batched async Redis, cached checks fix ([103809a](103809a))
* Replace MessageToDict with optimized custom dict builder ([#6015](#6015)) ([9902064](9902064))
What this PR does / why we need it
Fixes two related bugs that cause `TypeError` and `ValueError` when materializing feature views with array-typed columns (e.g. `Array(String)`, `Array(Int64)`) using the Athena offline store.

Arrow/Athena deserializes array columns as `numpy.ndarray` (object dtype) instead of plain Python lists. This breaks two code paths in `type_map.py`:

* `_convert_scalar_values_to_proto`: `pd.isnull(ndarray)` returns an array of bools, and `not <array>` raises `ValueError: The truth value of an empty array is ambiguous.` → Already guarded by `_is_array_like` in newer Feast versions; no change needed here.
* `_convert_list_values_to_proto` (generic list path): `proto_type(val=ndarray)` passes the raw numpy array to the protobuf constructor, which only accepts Python lists → `TypeError: bad argument type for built-in operation`. Additionally, Arrow nullable columns can yield `None` elements inside the ndarray, which protobuf repeated fields also reject.
* `_validate_collection_item_types`: `None` elements inside an ndarray failed the `type(item) in valid_types` check before reaching the sanitization step.

Changes
`feast/type_map.py`

* Add module-level `_LIST_NONE_DEFAULTS` dict mapping each list `ValueType` to a type-appropriate zero/empty default value used to replace `None` elements:
  * `STRING_LIST`, `UUID_LIST`, `TIME_UUID_LIST`, `DECIMAL_LIST` → `""`
  * `BYTES_LIST` → `b""`
  * `INT32_LIST`, `INT64_LIST` → `0`
  * `FLOAT_LIST`, `DOUBLE_LIST` → `0.0`
  * `BOOL_LIST` → `False`
  * `UNIX_TIMESTAMP_LIST` → `NULL_TIMESTAMP_INT_VALUE`
* Add module-level `_sanitize_list_value(value, feast_value_type)` helper that:
  * Calls `.tolist()` on any `numpy.ndarray` to produce a plain Python list (empty ndarray → `None`, treated as a missing row)
  * Replaces `None` elements with the type-appropriate default from `_LIST_NONE_DEFAULTS`
  * Is a no-op for `None` and for scalar values
* Apply sanitization upfront in `_convert_list_values_to_proto`: both `values` and `sample` are normalised via `_sanitize_list_value` before any type-checking or proto conversion, removing the need for per-path ndarray handling.
* Remove the old `_to_proto_safe_list` / `_DROP_NONE` / `_LIST_TYPE_NONE_REPLACEMENT` module-level helpers, which have been superseded by the above.
* Skip `None` elements in `_validate_collection_item_types`: `None` entries are valid in nullable Arrow columns and are sanitized upstream; raising a `TypeError` on them before that point was incorrect.
Testing
Added `TestArrowArrayStringListMaterialization` in `sdk/python/tests/unit/test_type_map.py` covering:

| Test | Covers |
| --- | --- |
| `test_sanitize_list_value_ndarray` | |
| `test_sanitize_list_value_empty_ndarray` | `None` (missing row) |
| `test_sanitize_list_value_ndarray_with_none` | `None` elements in STRING_LIST replaced with `""` |
| `test_sanitize_list_value_plain_list` | |
| `test_sanitize_list_value_plain_list_with_none` | `None` in plain STRING_LIST list replaced with `""` |
| `test_sanitize_list_value_numeric_none_replaced` | `None` in numeric/bool lists replaced with zero default |
| `test_sanitize_list_value_bytes_none_replaced` | `None` in BYTES_LIST replaced with `b""` |
| `test_sanitize_list_value_scalar_passthrough` | |
| `test_string_list_from_ndarray` | `python_values_to_proto_values` |
| `test_string_list_from_empty_ndarray` | `ValueError` |
| `test_string_list_from_ndarray_with_none_elements` | `None` in ndarray no longer raises `TypeError` |
| `test_string_list_null_row_produces_empty_proto` | `None` rows produce empty `ProtoValue` |
| `test_mixed_batch_simulating_athena_chunk` | |

Which issues this PR fixes
Fixes #6325
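
The ndarray behaviour behind these failures can be partly reproduced with NumPy alone. The sketch below is illustrative and does not call Feast: the `row` value is a made-up example of an object-dtype array, and the protobuf `TypeError` mentioned in the description is not reproduced here because it needs the generated proto classes.

```python
import numpy as np

# Object-dtype array, similar in shape to what the PR says Arrow/Athena
# yields for an Array(String) column with a null element.
row = np.array(["a", None, "b"], dtype=object)

# A truthiness check that is fine for a plain list...
assert bool(["a", None, "b"]) is True

# ...raises for a multi-element ndarray.
raised = False
message = ""
try:
    if not row:
        pass
except ValueError as exc:
    raised = True
    message = str(exc)

# .tolist() gives a plain Python list; None elements are still present
# and would need replacing before they reach a protobuf repeated field.
as_list = row.tolist()

print(raised, message)
print(as_list)
```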
Does this PR introduce a user-facing change?
Yes: materialization of array-typed feature columns from Athena no longer fails with `TypeError` or `ValueError` when a batch contains empty arrays, `None` rows, or `None` elements inside arrays. `None` elements inside an array are now stored as the type-appropriate zero/empty value (e.g. `""` for strings, `0` for integers).