Skip to content

feat: Add OnlineStore for Aerospike#6532

Open
vkagamlyk wants to merge 18 commits into
feast-dev:masterfrom
vkagamlyk:feat/aerospike-online-store
Open

feat: Add OnlineStore for Aerospike#6532
vkagamlyk wants to merge 18 commits into
feast-dev:masterfrom
vkagamlyk:feat/aerospike-online-store

Conversation

@vkagamlyk

Copy link
Copy Markdown

What this PR does / why we need it

Adds a first-class Aerospike online store integration at
feast.infra.online_stores.aerospike_online_store, backed by the official
aerospike>=19 Python client and targeting Aerospike server 6.0+
(developed and tested against CE 8.x).

Aerospike is a low-latency distributed key-value store widely deployed for
real-time ML feature serving. This integration gives Feast users a managed path
to Aerospike for online retrieval.

Storage layout

Each entity is stored as a single Aerospike record keyed by the serialized
entity key. Multiple feature views for the same entity share one record using
Map CDT bins:

namespace: <config.namespace>              # default: "feast"
set:       <set_name_template>             # default: "{project}_{collection_suffix}" → e.g. my_repo_latest
key:       bytearray(serialize_entity_key(entity_key))
bins:
  features   : map<fv_name, map<feature_name, native_value>>
  event_ts   : map<fv_name, epoch_ms>
  created_ts : epoch_ms                    # optional, single top-level value
record TTL: config.ttl_seconds             # record-level (-1 = never, 0 → never, unset → namespace default)

Default: one set per project ({project}_latest). Per-feature-view isolation
is opt-in via set_overrides (see below).

Writes use Aerospike batch_write with Map CDT map_put_items /
map_put ops so concurrent writers touching different feature views on the same
entity never clobber each other. Reads use batch_operate with server-side
Map-get projection when requested_features is provided.

Implementation highlights

  • Sync and async paths — implements online_write_batch, online_read,
    online_write_batch_async, online_read_async, initialize(), and
    close(). Async methods wrap the blocking client via run_in_executor.
    Native asyncio client adoption is tracked as a follow-up.
  • requested_features projection — read path builds a Map CDT
    map_get_by_key_list nested into the feature-view submap via cdt_ctx_map_key
    so only requested columns are returned from the server.
  • Per-record error surfacingbatch_write / batch_operate only raise
    when the whole request is rejected; this store inspects per-record result
    codes and raises on transient failures (RECORD_NOT_FOUND and
    OP_NOT_APPLICABLE are treated as missing features).
  • Binary key safety — user keys are passed to the Aerospike C client as
    bytearray, not bytes. The client hashes only the first byte of bytes
    keys during digest computation, causing silent collisions; bytearray uses
    the intended binary-key digest.
  • Batch-read orderingbatch_operate preserves input order; responses
    are paired back via zip rather than re-parsing br.key[2].
  • MAP_KEY_ORDERED — Map CDTs are created with ordered-map policy for
    O(log N) lookups on wide feature views and on the update() background scan.
  • KEY_DIGEST writes — writes use the client default digest policy; the
    serialized entity key is not stored redundantly on the server.
  • Composite entity keys — entities go through serialize_entity_key; no
    schema changes for multi-entity keys.
  • TTL sentinelsttl_seconds: unset → namespace default, 0 → never
    expire, >0 → explicit seconds.
  • Per-FV namespace / set overridesnamespace_overrides and
    set_overrides pin individual feature views to different namespaces or sets
    without splitting the project across stores. update() and teardown()
    honour resolved (namespace, set) pairs.
  • Prewriting hook — optional import-string-resolved callable on
    online_write_batch for PII masking, encryption, coercion, etc.
  • Timeout surfaceread_timeout_ms, write_timeout_ms,
    batch_total_timeout_ms (per-batch total deadline including retries), and
    optional socket_timeout_ms (per-attempt, propagates to all three policies).
  • Enterprise-only featuresuser / password auth and TLS
    (tls / tls_ca_file) are plumbed through the config but are config-only
    in v1
    on Aerospike CE (not exercised in GitHub Actions). Documented in the
    reference page.

Tests

  • ~57 pure-Python unit tests (no Docker required) covering:

    • Datetime helpers, TTL sentinels, timeout policy wiring
    • Raw-doc → proto conversion, result ordering
    • _build_batch_writes shape: Map bins, TTL meta, bytearray keys,
      MAP_KEY_ORDERED policy, empty-batch short-circuit
    • online_read projection, not-found / op-not-applicable / transient errors
    • Per-record write error surfacing
    • update(tables_to_delete=...) scan coalescing by (namespace, set)
    • teardown truncates all resolved pairs and closes the client
    • Namespace/set overrides on read, write, update, teardown
    • Prewriting hook resolution, caching, transform, and short-circuit
    • Async wrappers delegate to sync path; initialize / close lifecycle
    • Config-type guards reject non-Aerospike configs
  • 3 Docker-backed integration tests via testcontainers (Aerospike CE 8 —
    aerospike/aerospike-server:8.0.0.10_1):

    • test_aerospike_online_features — end-to-end write + read + projection +
      missing-key → None across int, string, and composite entity keys
    • test_aerospike_cross_fv_map_cdt_upsert — two feature views on the same
      entity coexist (Map CDT upsert semantics)
    • test_aerospike_update_strips_dropped_feature_viewupdate() background
      scan removes a dropped FV slot while the surviving FV stays intact
  • Universal online-store suite wiring — ships AerospikeOnlineStoreCreator
    plus a FULL_REPO_CONFIGS module for opt-in via
    FULL_REPO_CONFIGS_MODULE=feast.infra.online_stores.aerospike_online_store.aerospike_repo_configuration.
    Default-matrix registration is deferred to a follow-up PR (MongoDB, Couchbase,
    Cassandra, MySQL pattern).

Documentation

  • docs/reference/online-stores/aerospike.md — reference page (config, data
    model, CE vs EE matrix, overrides, prewriting hook, tuning).
  • docs/reference/online-stores/README.md + docs/SUMMARY.md — linked.
  • docs/roadmap.md — Aerospike marked [x] under Online Stores.
  • docs/how-to-guides/online-server-performance-tuning.md — Aerospike row and
    tuning subsection.
  • examples/online_store/aerospike_overrides_and_hooks/ — sample hook +
    override config.

feast-operator

Adds aerospike to supported online store types in feast-operator CRDs and
bundled manifests.

Out of scope for v1 (follow-ups)

  • Feature-view versioning (raw FV names used as Map keys — matches MongoDB).
  • Native asyncio Aerospike client (currently run_in_executor wrappers).
  • Override async_supported on AerospikeOnlineStore so the feature server
    routes to online_read_async / online_write_batch_async (methods exist;
    property still inherits the base read=False, write=False default).
  • Participation in the default universal-suite CI matrix (opt-in only today).
  • Vector search / Aerospike Vector Search integration.

Known follow-ups before upstream PR

  • Remove or retain preview hint in aerospike.md per feast-dev convention
    (reviewer note: same pattern as MongoDB preview banner).
  • Optional: add feast-benchmarks Aerospike sweep results to PR / blog once
    upstream benchmarks PR lands.

Checks

  • Tests passing locally (pytest unit + Docker integration tests).
  • Commits signed off (git commit -s).
  • PR title follows conventional commits format.

Testing strategy

  • Unit tests
  • Integration tests (testcontainers / Aerospike CE 8)
  • Testing is not required for this change

Misc

Aerospike client dependency is gated behind the aerospike optional extra
(pip install 'feast[aerospike]'), so existing installs that don't need it pay
no import cost.

vkagamlyk added 18 commits June 16, 2026 15:41
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
The Aerospike Python client mishandles bytes user keys (hashes only the first byte), collapsing all entities onto the same digest. Wrap keys in bytearray on write and read. Also pair BatchRecord responses with original input keys via zip rather than trusting br.key[2], which the client returns in a different representation on reads.

Add two integration tests: cross-FV Map CDT coexistence and update(tables_to_delete=...) background scan.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
…r-record errors

Two review blockers rolled into one commit because they share the same code path.

1. Server-side projection. online_read now builds a map_get_by_key_list op nested into the feature-view submap via cdt_ctx_map_key when requested_features is provided, instead of fetching the whole FV slot and filtering in Python. For wide feature views this ships only the requested columns over the wire. The response shape (flat [k,v,k,v] list vs. dict) is normalized through _normalize_projected_features.

2. Per-record error surfacing. Both batch_write and batch_operate only raise when the whole request is rejected; partial failures (single-partition timeout, replica quorum miss) are otherwise silent and present downstream as missing features. online_read now distinguishes RECORD_NOT_FOUND (2) and OP_NOT_APPLICABLE (26, = nested ctx miss when FV slot is absent) from transient errors, which are raised. online_write_batch inspects every per-record result code after the batch call.

Unit tests cover all four paths: projected read, not-found, op-not-applicable (nested ctx miss), and a simulated TIMEOUT that must raise. The docker-backed cross-FV and update() integration tests still pass, so server-side projection is verified end-to-end against a real Aerospike server.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
…nd add socket_timeout_ms

Review feedback: total_timeout_ms was ambiguous (users read it as a global/end-to-end timeout) and the timeout surface was missing socket_timeout, which is the per-attempt trigger that lets max_retries actually fire within the total budget.

* total_timeout_ms -> batch_total_timeout_ms. Now explicitly named after the Aerospike batch policy it maps to, matches read_timeout_ms / write_timeout_ms in framing (each targets one policy scope).

* Add socket_timeout_ms (optional). Applies uniformly to read, write and batch policies when set. Leaves the Aerospike client default in place when unset.

BREAKING CHANGE: total_timeout_ms is renamed to batch_total_timeout_ms. Config files using the old name must be updated. No default value change.

Docs updated (reference + perf-tuning guide) with a short explainer on the per-attempt vs total deadline distinction. Two new unit tests pin the policy wiring: socket_timeout_ms propagates to all three scopes, and is omitted (not injected as None) when unset.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
…oped client

Cheap-win cleanups flagged in review, all touching the same small patch of write-path and lifecycle code.

* Map CDTs are now created with MAP_KEY_ORDERED. map_get_by_key / map_remove_by_key on an ordered map are O(log N) in the map size instead of O(N); matters on reads of wide feature views and on the update() background scan (which walks every record in the project's set).

* Writes drop POLICY_KEY_SEND and rely on the client default (POLICY_KEY_DIGEST). The serialized entity key is no longer stored alongside each record, saving per-record storage the read path never consumes (batch_operate preserves request order; results are paired back by zip in online_read).

* _client moves from a class attribute to an instance attribute (set in __init__). Previously two AerospikeOnlineStore instances could share the cached client through class state until one wrote self._client. With the instance attribute the state is always per-instance from construction.

* Drop MongoDB references from class docstrings and comments (they referred to how the storage layout was derived rather than documenting current behavior). Also rewrite the _build_batch_writes docstring to describe the policies applied on the write path.

Unit test assertions for the write-path record are updated: bw.policy is now None (client default applies) and map ops carry map_policy={'map_order': MAP_KEY_ORDERED}. All three docker-backed integration tests still pass end-to-end (cross-FV upsert, update() background scan, full feature-store round-trip), so the read/write shape survives the ordering and policy changes against a real server.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Adds three configuration knobs to AerospikeOnlineStoreConfig:

- namespace_overrides: pin individual feature views to a different
  Aerospike namespace (e.g. RAM-only vs. SSD-backed) without splitting
  the project across stores.
- set_overrides: place a feature view in its own set so admin ops on
  it (truncate, scan-based deletes during `feast apply`) do not touch
  records of other views.
- prewriting_hook: import-string-resolved callable invoked once per
  online_write_batch with the rows about to be written, returning the
  rows that actually go on the wire. Resolved and cached on first use;
  returning [] short-circuits the wire call.

Read, write, update and teardown paths all honour the per-FV ns/set
resolution. update() groups dropped feature views by their resolved
(ns, set) pair and issues one background scan per group. teardown()
truncates every unique (ns, set) pair the project may have written to,
including the store-level default.

Adds 22 unit tests for the new behaviour and updates 3 existing call
sites of _build_batch_writes for the new namespace= parameter. Adds a
sample hook module under examples/online_store/aerospike_overrides_and_hooks/
and corresponding sections in docs/reference/online-stores/aerospike.md.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Adding aerospike to the feast-operator enum shifted the allowlisted
SecretRef entry in api/v1/featurestore_types.go by one line.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
@vkagamlyk vkagamlyk force-pushed the feat/aerospike-online-store branch from 8e819d7 to 2c37c0f Compare June 16, 2026 22:43
@vkagamlyk vkagamlyk marked this pull request as ready for review June 16, 2026 22:45
@vkagamlyk vkagamlyk requested a review from a team as a code owner June 16, 2026 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant