
feat: add existence filter for optional fields #2777

Merged
kishorenc merged 19 commits into typesense:v31 from ozanarmagan:v31-optional-index on Apr 10, 2026


@ozanarmagan
Contributor

@ozanarmagan ozanarmagan commented Feb 14, 2026

Change Summary

Add support for filtering documents by whether optional fields are missing
using field: _missing and field: !_missing syntax.

This is enabled per field via the new track_missing_values schema property.

Usage

  1. Create a collection with an optional indexed field that has
    track_missing_values enabled:
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "color", "type": "string", "optional": true, "track_missing_values": true},
    {"name": "points", "type": "int32"}
  ]
}

2a. Filter for documents where the field is missing:

GET /collections/products/documents/search?q=*&filter_by=color: _missing

2b. Filter for documents where the field is present:

GET /collections/products/documents/search?q=*&filter_by=color: !_missing

2c. Combine with other filters:

filter_by=color: _missing && points: >10
filter_by=color: !_missing || rating: _missing

PR Checklist

@ozanarmagan
Contributor Author

Fixes #790

Comment thread src/filter_result_iterator.cpp Outdated
}
} else {
// _exists: all docs minus the missing list
// Get the complement of missing ids to get the existing ids.
Contributor

@ozanarmagan We should also implement the iterative logic for the enable_lazy_filter case, as we do when evaluating integer filters.

You can refer to this test for details. The crux of the iterative logic is to return the seq_ids that fall between the iterator's actual matches. So if the iterator matches 0, 2, 5, ..., the matches for not-equals will be 1, 3, 4, ...

Contributor Author

Ready to review again

@alangmartini
Collaborator

Any chance we can expand this for arrays, so we can filter for empty/non-empty arrays? Or should we create a separate issue for the next iteration?

@ozanarmagan ozanarmagan requested a review from happy-san March 2, 2026 02:36
@ozanarmagan
Contributor Author

> Any chance we can expand this for arrays, so we can filter for empty/non-empty arrays? Or should we create a separate issue for the next iteration?

@alangmartini Could you create another issue for this? I will address that in another PR.

Contributor

@happy-san happy-san left a comment

Rest looks good!

@happy-san
Contributor

@kishorenc PR is ready for your review.

@kishorenc
Member

This was a mix of manual and automated review, but I have gone through each issue carefully and verified that the proposed fixes are logical. I have attached a patch. @happy-san please review and confirm.

2777_review_patch.patch

Issues found

  1. optional_index metadata was updated from partial update payloads instead of the final merged document state. In mixed update batches this could incorrectly mark an unchanged optional field as missing, breaking field: _exists / field: !_exists.
  2. optional_index: true was accepted on index: false fields. That schema produced no backing existence index, so _exists / !_exists could return incorrect results.
  3. _exists parsing used substring matching, so ordinary string filters containing _exists (for example title: pre_exists_post) were misparsed as existence filters.
  4. field::field_from_json() did not preserve optional_index, so JSON-to-field reconstruction silently dropped the flag.
  5. Lazy _exists iterators could not be reset or materialized correctly. compute_iterators() and related paths treated the missing-id list as the final result set, so lazy _exists searches could return wrong hits.

Fix summary

  • Existence bookkeeping now uses each record's final document state (new_doc for updates, doc for inserts) and updates the missing-field index in one place.
  • Schema validation now rejects optional_index unless the field is both optional: true and index: true.
  • Existence parsing now only triggers on exact _exists / !_exists tokens, while normal string filters continue to work.
  • field::field_from_json() now carries optional_index through correctly.
  • Lazy _exists iteration now has dedicated reset and materialization logic, so _exists uses the complement of the missing-id list consistently in iterator and search paths.
  • Added focused regression tests for mixed-batch updates, string literals containing _exists, invalid optional_index schemas, JSON field reconstruction, lazy iterator materialization, and lazy search hits.

@happy-san
Contributor

@ozanarmagan I have listed the tests that will surface each issue:

  1. Have a schema like,
{
  "fields": [
    {
      "name": "field",
      "type": "string"
    },
    {
      "name": "optional_field",
      "type": "string",
      "optional": true,
      "optional_index": true
    }
  ]
}

Add a document like,

{
  "field": "foo",
  "optional_field": "bar"
}

Check that filter_by: optional_field: !_exists matches no document.
Update the document with:

{
  "field": "baz"
}

filter_by: optional_field: !_exists should still match no document.
2. A simple test asserting that creating a field with optional_index: true and index: false fails should work.
3. Using the schema,

{
  "fields": [
    {
      "name": "field_exists",
      "type": "string"
    },
    {
      "name": "field",
      "type": "string"
    }
  ]
}

try passing a filter like field_exists: foo or field: value_exists.
4. A test like https://github.com/typesense/typesense/blob/v31/test/collection_join_test.cpp#L6013-L6021 will surface this issue. We should add such a test whenever a new option is added to the field.
5. Add

    iter_exists.reset();
    ASSERT_EQ(filter_result_iterator_t::valid, iter_exists.validity);

    for (uint32_t i = 0; i < validate_ids.size(); i++) {
        ASSERT_EQ(filter_result_iterator_t::valid, iter_exists.validity);
        ASSERT_EQ(expected[i], iter_exists.is_valid(validate_ids[i]));

        if (expected[i] == 1) {
            iter_exists.next();
        }
        ASSERT_EQ(seq_ids[i], iter_exists.seq_id);
    }
    ASSERT_EQ(filter_result_iterator_t::invalid, iter_exists.validity);

after this line

@kishorenc
Member

The file I have attached to my previous comment already has the patch that contains both the code and the tests for these issues.

Contributor

@happy-san happy-san left a comment

@ozanarmagan I've left some comments on the lazy evaluation path of the missing filter. For the _missing lazy filter, the logic should be similar to the id field evaluation; for the !_missing lazy filter, it should be similar to the != lazy numeric filter.

Let me know if I can clarify anything further.

Comment thread include/filter_result_iterator.h Outdated
Comment on lines +335 to +348
/// Resets the iterator state from the given id list.
void reset_from_id_list(id_list_t* source);

/// Computes the full result from the given id list.
void compute_result_from_id_list(id_list_t* source);

/// Resets the iterator state for missing filters.
void reset_missing_iterator();

/// Advances the iterator state for missing filters.
void advance_missing_iterator();

/// Computes the full result for missing filters.
void compute_missing_result();
Contributor

Let's remove these methods. The logic for the missing filter iterator will be similar to that of the id filter iterator.

@ozanarmagan ozanarmagan requested a review from happy-san April 6, 2026 13:19
@kishorenc kishorenc merged commit e6fd315 into typesense:v31 Apr 10, 2026
2 checks passed


4 participants