Is your feature request related to a problem? Please describe.
The current entity_key serialization (version 2) is shown below:
def serialize_entity_key(
    entity_key: EntityKeyProto, entity_key_serialization_version=1
) -> bytes:
    """
    Serialize entity key to a bytestring so it can be used as a lookup key in a hash table.
    We need this encoding to be stable; therefore we cannot just use protobuf serialization
    here since it does not guarantee that two proto messages containing the same data will
    serialize to the same byte string[1].
    [1] https://developers.google.com/protocol-buffers/docs/encoding
    """
    sorted_keys, sorted_values = zip(
        *sorted(zip(entity_key.join_keys, entity_key.entity_values))
    )
    output: List[bytes] = []
    for k in sorted_keys:
        output.append(struct.pack("<I", ValueType.STRING))
        output.append(k.encode("utf8"))
    for v in sorted_values:
        val_bytes, value_type = _serialize_val(
            v.WhichOneof("val"),
            v,
            entity_key_serialization_version=entity_key_serialization_version,
        )
        output.append(struct.pack("<I", value_type))
        output.append(struct.pack("<I", len(val_bytes)))
        output.append(val_bytes)
    return b"".join(output)
For example, with sorted_keys = ("item_id",) and sorted_values = (int64_val: 1,), the output list (before joining) is:
[b'\x02\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']
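This makes deserialization impossible: the join key is written without a length prefix, so a reader cannot tell where the key bytes end and the value's type tag begins. To fix this, we can write the byte length of each join key immediately before the key itself. For the same test key and value, the output would then be: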
[b'\x02\x00\x00\x00', b'\x07\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']
Then we can deserialize the bytes to proto.
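The proposed layout can be sketched as a standalone round trip. This is only an illustration under stated assumptions, not the Feast implementation: `serialize_with_key_length` and `deserialize_with_key_length` are hypothetical names, values are passed as already-serialized `(value_type, val_bytes)` pairs instead of protos, and the type tags 2 (STRING) and 4 (INT64) are taken from the example output above. The reader also assumes the number of join keys equals the number of values, so the first half of the records are keys.

```python
import struct
from typing import List, Tuple

STRING = 2  # type tag written before each join key in the example output
INT64 = 4   # type tag written before the int64 value in the example output


def serialize_with_key_length(
    keys: List[str], values: List[Tuple[int, bytes]]
) -> bytes:
    """Hypothetical variant: every join key gets a 4-byte length prefix."""
    pairs = sorted(zip(keys, values))  # same key ordering as the current code
    out: List[bytes] = []
    for key, _ in pairs:
        key_bytes = key.encode("utf8")
        out.append(struct.pack("<I", STRING))
        out.append(struct.pack("<I", len(key_bytes)))  # the proposed addition
        out.append(key_bytes)
    for _, (value_type, val_bytes) in pairs:
        out.append(struct.pack("<I", value_type))
        out.append(struct.pack("<I", len(val_bytes)))
        out.append(val_bytes)
    return b"".join(out)


def deserialize_with_key_length(
    data: bytes,
) -> Tuple[List[str], List[Tuple[int, bytes]]]:
    """Read (type, length, payload) records, then split them into keys and values."""
    records: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(data):
        value_type, length = struct.unpack_from("<II", data, offset)
        offset += 8
        records.append((value_type, data[offset:offset + length]))
        offset += length
    half = len(records) // 2  # assumes one value per join key
    keys = [payload.decode("utf8") for _, payload in records[:half]]
    return keys, records[half:]


encoded = serialize_with_key_length(["item_id"], [(INT64, struct.pack("<q", 1))])
decoded_keys, decoded_values = deserialize_with_key_length(encoded)
```

Because keys and values now share the same (type, length, payload) framing, a single loop can parse the whole byte string. Mapping each decoded `(value_type, val_bytes)` pair back to a proto value is left out of this sketch.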
Describe the solution you'd like
A clear and concise description of what you want to happen.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.