Is your feature request related to a problem?
There have been complaints that in-database representations are unreadable (#633). This makes debugging and auditing harder.
Many databases or storage solutions do not support arbitrary-length field or property names.
A name that works on one backend may stop working on the next.
Describe the solution you'd like
The current approach (as in Neo4j) is to sanitize the names. This involves a lot of extra prefixing or bijective encoding and decoding steps that turn a simple field or property name into a long mess. It also produces incompatible data, depending on how deeply the systems are wrapped.
Preferred
- Only allow [a-z0-9_]+ (lowercase alphanumerics and underscores).
- Limit field/property names to 32 characters.
  - AWS and some others support at most 63 characters.
  - Chroma supports at most 36 characters.
- Reserve underscore-prefixed names for the system.
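A minimal Python sketch of a validator for these rules. `is_valid_name` and its `system` flag are hypothetical names; the character set, the 32-character limit and the reserved underscore prefix come from the list above.

```python
import re

# Proposed rules: lowercase alphanumerics and underscores only,
# at most 32 characters, leading underscore reserved for system keys.
NAME_PATTERN = re.compile(r"[a-z0-9_]+")
MAX_NAME_LENGTH = 32


def is_valid_name(name: str, *, system: bool = False) -> bool:
    """Return True if `name` satisfies the proposed naming rules.

    `system` (hypothetical flag) marks names defined by the system,
    which are the only ones allowed to start with an underscore.
    """
    if not NAME_PATTERN.fullmatch(name):  # also rejects the empty string
        return False
    if len(name) > MAX_NAME_LENGTH:
        return False
    if name.startswith("_") and not system:
        return False
    return True


print(is_valid_name("custom_user_key"))         # True
print(is_valid_name("Custom-Key"))              # False: uppercase and hyphen
print(is_valid_name("_timestamp"))              # False: reserved prefix
print(is_valid_name("_timestamp", system=True)) # True
```

Because the rules are the same for every backend, the check can run once at write time instead of once per storage adapter.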
This will allow for easily readable and filterable metadata, compatible with (nearly?) all storage backends. No unbounded growth of metadata keys. At most one translation layer between database and source (e.g. replacing underscores with hyphens, or, if the DB itself reserves the underscore prefix, adding a simple prefix).
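A sketch of the two single-layer translations mentioned above, assuming names already follow the rules. The function names and the prefix `k` are hypothetical. Both mappings are reversible only because valid names never contain hyphens and the prefix is applied to every key; a one-character prefix keeps a 32-character name within the 36-character Chroma limit.

```python
# Option 1: backend that prefers hyphens over underscores.
# Reversible because valid names never contain "-".
def to_hyphenated(name: str) -> str:
    return name.replace("_", "-")


def from_hyphenated(stored: str) -> str:
    return stored.replace("-", "_")


# Option 2: backend that reserves the underscore prefix itself.
# A fixed prefix (hypothetical choice: "k") is added to EVERY key,
# so system and user keys cannot collide after translation.
PREFIX = "k"


def to_prefixed(name: str) -> str:
    return PREFIX + name


def from_prefixed(stored: str) -> str:
    if not stored.startswith(PREFIX):
        raise ValueError(f"{stored!r} was not written by this adapter")
    return stored[len(PREFIX):]


print(to_hyphenated("_custom_system_key"))  # -custom-system-key
print(to_prefixed("_timestamp"))            # k_timestamp
```

Either way the stored key stays recognizable, unlike a general-purpose escaping scheme.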
Filters
System-defined and user metadata keys (external/query-facing --> generic internal):
- timestamp --> _timestamp
- custom_system_key --> _custom_system_key
- m.custom_user_key --> custom_user_key
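A sketch of how a query-facing filter could be rewritten into internal keys following the mapping above. The set of system keys and the function names are hypothetical; rejecting `m._...` keys is an assumption added to keep user keys out of the reserved namespace.

```python
# Hypothetical set of system-defined keys exposed to queries unprefixed.
SYSTEM_KEYS = frozenset({"timestamp", "custom_system_key"})
USER_PREFIX = "m."


def to_internal_key(query_key: str) -> str:
    """Map a query-facing key to its generic internal name."""
    if query_key.startswith(USER_PREFIX):
        user_key = query_key[len(USER_PREFIX):]
        # Assumption: user keys may not enter the reserved "_" namespace.
        if user_key.startswith("_"):
            raise ValueError(f"{query_key!r}: underscore prefix is reserved")
        return user_key
    if query_key in SYSTEM_KEYS:
        return "_" + query_key
    raise KeyError(f"unknown filter key: {query_key!r}")


def to_internal_filter(filters: dict[str, object]) -> dict[str, object]:
    """Rewrite every key of a flat equality filter."""
    return {to_internal_key(k): v for k, v in filters.items()}


print(to_internal_filter({"timestamp": 1700000000, "m.custom_user_key": "x"}))
# {'_timestamp': 1700000000, 'custom_user_key': 'x'}
```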
Describe alternatives you've considered
Letting each database enforce its own name limitations:
- Makes migration more difficult.
- Makes it harder to compose different components without a shared understanding of the limitations.
Additional context
No response