feat: add existence filter for optional fields#2777
feat: add existence filter for optional fields#2777kishorenc merged 19 commits intotypesense:v31from
Conversation
|
Fixes #790 |
35f8efe to
01f45c5
Compare
| } | ||
| } else { | ||
| // _exists: all docs minus the missing list | ||
| // Get the complement of missing ids to get the existing ids. |
There was a problem hiding this comment.
@ozanarmagan We should also implement iterative logic in case of enable_lazy_filter like we evaluate integer filterr.
You can refer to this test for details. The crux of the iterative logic is to return the seq_ids in between of the actual matches of the iterator. So if the iterator matches 0, 2, 5, ... The matches for not equals will be 1, 3, 4 ...
There was a problem hiding this comment.
Ready to review again
|
Any chance we can expand this for arrays, so we can filter for empty/non-empty arrays? Or should we create a separate issue for the next iteration? |
@alangmartini Could you create another issue for this? I will address that in another PR. |
|
@kishorenc PR is ready for your review. |
|
A bit of manual + automated review but I have gone through each issue carefully and verified that proposed fixes are logical. I have attached a patch. @happy-san please review and confirm. Issues found
Fix summary
|
|
@ozanarmagan I have listed the tests that will surface each issue:
{
"fields": [
{
"name": "field",
"type": "string"
},
{
"name": "optional_field",
"type": "string",
"optional": true,
"optional_index": true
}
]
}Add a document like, {
"field": "foo",
"optional_field": "bar"
}Check that {
"field": "baz"
}
{
"fields": [
{
"name": "field_exists",
"type": "string"
},
{
"name": "field",
"type": "string"
}
]
}try passing a filter like iter_exists.reset();
ASSERT_EQ(filter_result_iterator_t::valid, iter_exists.validity);
for (uint32_t i = 0; i < validate_ids.size(); i++) {
ASSERT_EQ(filter_result_iterator_t::valid, iter_exists.validity);
ASSERT_EQ(expected[i], iter_exists.is_valid(validate_ids[i]));
if (expected[i] == 1) {
iter_exists.next();
}
ASSERT_EQ(seq_ids[i], iter_exists.seq_id);
}
ASSERT_EQ(filter_result_iterator_t::invalid, iter_exists.validity); |
|
The file I have attached to my previous comment already has the patch that contains both the code and the tests for these issues. |
…-optional-index
…ypesense into v31-optional-index
happy-san
left a comment
There was a problem hiding this comment.
@ozanarmagan I've left some comments for lazy evaluation path of missing filter. The logic is supposed to be similar to id field evaluation for _missing lazy filter and for !_missing lazy filter, the logic will be similar to != lazy numeric filter.
Let me know if I can clarify anything further.
| /// Resets the iterator state from the given id list. | ||
| void reset_from_id_list(id_list_t* source); | ||
|
|
||
| /// Computes the full result from the given id list. | ||
| void compute_result_from_id_list(id_list_t* source); | ||
|
|
||
| /// Resets the iterator state for missing filters. | ||
| void reset_missing_iterator(); | ||
|
|
||
| /// Advances the iterator state for missing filters. | ||
| void advance_missing_iterator(); | ||
|
|
||
| /// Computes the full result for missing filters. | ||
| void compute_missing_result(); |
There was a problem hiding this comment.
Let's remove these methods. The logic for missing filter iterator will be similar to that of id filter iterator.
Change Summary
Add support for filtering documents by whether optional fields are missing
using
field: _missingandfield: !_missingsyntax.This is enabled per field via the new
track_missing_valuesschema property.Usage
track_missing_valuesenabled:{ "name": "products", "fields": [ {"name": "title", "type": "string"}, {"name": "color", "type": "string", "optional": true, "track_missing_values": true}, {"name": "points", "type": "int32"} ] }2a. Filter for documents where the field is missing:
2b. Filter for documents where the field is present:
2c. Combine with other filters:
PR Checklist