[kafka_actions] Fix Avro deserialization for schemas with logical types #23224
fastavro.schemaless_reader returns Python-native types for Avro logical types (decimal.Decimal, datetime, date, uuid.UUID, bytes), but json.dumps() cannot serialize these. This caused "Failed to deserialize Avro message" errors for any schema using bytes/decimal, timestamp-millis, date, or uuid logical types — including when using Schema Registry. Add a custom JSON encoder (_AvroJSONEncoder) that handles all fastavro logical type return values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
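The failure described above can be reproduced with the standard library alone. This is a minimal sketch: the `record` dict is a hypothetical stand-in for what `fastavro.schemaless_reader` might return, and `try_dumps` is an illustrative helper, not code from this PR.

```python
import datetime
import decimal
import json
import uuid

# Hypothetical record (assumption): shaped like a fastavro result for a
# schema with decimal, timestamp-millis, and uuid logical types.
record = {
    "amount": decimal.Decimal("12.34"),
    "created_at": datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
    "event_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
}


def try_dumps(value):
    """Return the TypeError message raised by json.dumps, or None on success."""
    try:
        json.dumps(value)
    except TypeError as exc:
        return str(exc)
    return None


# The default encoder stops at the first value it cannot serialize,
# which here is the Decimal under "amount".
error = try_dumps(record)
print(error)
```

The same call fails on the datetime and UUID values once the Decimal is out of the way, which is why a single encoder covering all of these types is needed.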
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6b532c959
    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            return float(obj)
Avoid lossy float conversion for Avro decimals
_AvroJSONEncoder.default now converts decimal.Decimal to float, which silently loses precision for many valid Avro decimal values (especially high-precision/large-scale amounts) and can change numeric values in emitted JSON. This commit fixes deserialization failures, but in financial or ID-like decimal fields it can now produce incorrect data rather than an explicit error; serializing decimals as strings would preserve exact values.
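To make the precision concern concrete, here is a small sketch comparing the two conversions. The sample value is an arbitrary high-precision decimal chosen for illustration, not data from this PR.

```python
import decimal

# Arbitrary high-precision value (assumption): more significant digits than a
# 64-bit float can represent exactly.
value = decimal.Decimal("12345678901234567890.123456789")

# float() rounds to the nearest representable double.
as_float = float(value)
round_trip_float = decimal.Decimal(repr(as_float))

# str() keeps every digit, so the original value can be rebuilt exactly.
as_str = str(value)
round_trip_str = decimal.Decimal(as_str)

print(round_trip_float == value)  # the float path does not round-trip
print(round_trip_str == value)    # the string path does
```

The float path emits a valid JSON number, so nothing fails downstream. The value is simply different, which is the silent corruption the review points out.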
float() silently loses precision for high-precision Avro decimal values. str() preserves exact values, and the filter's _coerce_types already handles str-to-number conversion for > / < comparisons. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
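A string-preserving encoder along these lines might look like the sketch below. It is not the PR's `_AvroJSONEncoder`: the class name `AvroJSONEncoderSketch` is hypothetical, and the choices of ISO 8601 for dates and hex for raw bytes are assumptions, since the source does not show how those types are rendered.

```python
import datetime
import decimal
import json
import uuid


class AvroJSONEncoderSketch(json.JSONEncoder):
    """Illustrative encoder; the output format for each type is an assumption."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            return str(obj)  # exact digits, no float rounding
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()  # assumption: ISO 8601 text
        if isinstance(obj, uuid.UUID):
            return str(obj)
        if isinstance(obj, bytes):
            return obj.hex()  # assumption: hex; the PR may do otherwise
        # Anything else still raises TypeError, as with the stock encoder.
        return super().default(obj)


# Hypothetical record resembling fastavro output for logical types.
record = {
    "amount": decimal.Decimal("12345678901234567890.123456789"),
    "created_at": datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
    "day": datetime.date(2024, 1, 2),
    "event_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "raw": b"\x00\x01\xff",
}

encoded = json.dumps(record, cls=AvroJSONEncoderSketch)
decoded = json.loads(encoded)
print(decoded["amount"])
```

Because the decimal arrives as a string, numeric comparisons depend on a later coercion step, which the commit message says `_coerce_types` already provides.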
Summary
- fastavro.schemaless_reader returns Python-native types for Avro logical types (decimal.Decimal, datetime.datetime, datetime.date, uuid.UUID, bytes), but json.dumps() in _deserialize_avro cannot serialize any of them
- This caused "Failed to deserialize Avro message: Object of type Decimal is not JSON serializable" errors for schemas using bytes/decimal, timestamp-millis, date, or uuid logical types, both with inline schemas and Schema Registry
- Adds an _AvroJSONEncoder custom encoder that handles all fastavro logical type return values

Test plan
- test_avro_logical_types_decimal_timestamp_uuid passes
- bytes/decimal fields (e.g. enriched_spend_events)

🤖 Generated with Claude Code