Skip to content

feat(cassandra): Add multi-DC support via per-datacenter execution profiles#6434

Open
singhhimanshu0811 wants to merge 1 commit into
feast-dev:masterfrom
singhhimanshu0811:feat/cassandra-multi-dc-execution-profiles
Open

feat(cassandra): Add multi-DC support via per-datacenter execution profiles#6434
singhhimanshu0811 wants to merge 1 commit into
feast-dev:masterfrom
singhhimanshu0811:feat/cassandra-multi-dc-execution-profiles

Conversation

@singhhimanshu0811
Copy link
Copy Markdown

@singhhimanshu0811 singhhimanshu0811 commented May 24, 2026

##Which issue(s) this PR fixes
Fixes #6433

Summary

Adds a datacenters config list to CassandraOnlineStoreConfig as an alternative to the flat hosts field, enabling proper multi-datacenter support.

Each datacenter entry gets a named Cassandra execution profile (keyed by local_dc), so pipeline code can route reads/writes to a specific DC by passing execution_profile="<dc-name>" to the driver. The default profile is pinned to load_balancing.local_dc, or the first DC when load_balancing is absent.

Changes

  • New CassandraDatacenterConfig inner model with local_dc, hosts, replication_factor (informational), replication_strategy (informational)
  • New datacenters: Optional[List[CassandraDatacenterConfig]] field on CassandraOnlineStoreConfig
  • New Branch B in _get_session() that builds per-DC execution profiles and connects with all DC hosts merged; original hosts/secure_bundle_path path (Branch A) is completely untouched
  • _dc_execution_profiles: List[str] attribute exposes registered profile names
  • Updated docs/reference/online-stores/cassandra.md with a multi-DC feature_store.yaml example

Motivation

The flat hosts + single local_dc model forces all hosts to belong to the same datacenter. Users with multi-DC clusters had to create separate feature store projects per DC. With this change:

  • A single feature store config can span multiple DCs
  • The Cassandra driver routes to the correct DC via execution profiles
  • Per-DC DCAwareRoundRobinPolicy prevents reads from spilling into unintended regions
  • Advanced pipelines can explicitly target a DC (e.g. read from the region where data has already been replicated)

Backward compatibility

Fully backward compatible — datacenters is optional and the original code path is untouched.

@singhhimanshu0811 singhhimanshu0811 requested a review from a team as a code owner May 24, 2026 18:20
…ofiles

Introduce a `datacenters` config list as an alternative to the flat
`hosts` field. Each entry specifies the contact points, local_dc, and
optional replication metadata for one datacenter.

On connect, a named Cassandra execution profile is registered per DC
(profile key = local_dc), enabling callers to route driver-level queries
to a specific datacenter. The default profile is pinned to
`load_balancing.local_dc`, or the first DC when load_balancing is absent.

All existing `hosts` / `secure_bundle_path` configs are fully backward
compatible — the original code path is untouched.

Also updates docs/reference/online-stores/cassandra.md with a multi-DC
feature_store.yaml example.

Signed-off-by: singhhimanshu0811
email : itssinghhimanshu@gmail.com
Signed-off-by: Himanshu Singh <himanshu.singh@walmart.com>
@singhhimanshu0811 singhhimanshu0811 force-pushed the feat/cassandra-multi-dc-execution-profiles branch from 1ad4040 to 20539cf Compare May 24, 2026 18:24
@ntkathole ntkathole changed the title feat(cassandra): add multi-DC support via per-datacenter execution profiles feat(cassandra): Add multi-DC support via per-datacenter execution profiles May 25, 2026
Copy link
Copy Markdown
Member

@ntkathole ntkathole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #6434 — Multi-DC Cassandra Support

Thanks for the contribution, @singhhimanshu0811! This directly addresses the gap we discussed around single-DC limitations in the Cassandra online store config. Overall the approach is solid — using per-DC execution profiles is the right driver-level mechanism. I have several items ranging from bugs to design suggestions:


1. Bug: Mutable class-level default for _dc_execution_profiles

_dc_execution_profiles: List[str] = []

This is a mutable class attribute shared across all instances. If multiple CassandraOnlineStore instances are created (e.g., in tests or multi-project setups), they'll all append to the same list. This same pattern issue exists for _prepared_statements in the original code, but introducing a new one should be avoided.

Fix: Initialize in _get_session or use a property pattern:

_dc_execution_profiles: List[str]

def _get_session(self, config: RepoConfig):
    ...
    if online_store_config.datacenters:
        self._dc_execution_profiles = []  # reset per connection
        ...

2. Bug: Missing mutual exclusivity validation

The docstring says:

Mutually exclusive with hosts and secure_bundle_path.

But there's no actual validation enforcing this. If a user sets both datacenters and hosts, Branch B runs (because it checks datacenters first) and silently ignores hosts. This will confuse users.

Fix: Add a Pydantic model_validator or an explicit check at the top of _get_session:

if online_store_config.datacenters and (
    online_store_config.hosts or online_store_config.secure_bundle_path
):
    raise CassandraInvalidConfig(
        "Cassandra config: 'datacenters' is mutually exclusive with "
        "'hosts' and 'secure_bundle_path'"
    )

3. Design: Profiles are created but never used by Feast operations

The PR creates named execution profiles and exposes them via _dc_execution_profiles, but none of the existing Feast operations (online_read, online_write_batch, update) pass execution_profile= to the driver. All queries will use EXEC_PROFILE_DEFAULT — effectively still single-DC behavior.

This means:

  • The profiles are registered but dead code from Feast's perspective
  • Users can't actually route reads/writes to a specific DC without modifying Feast internals

I understand this may be intended as a foundational PR with follow-up work to add target_dc parameters to the Feast API. If so, please mention this explicitly in the PR description and/or open a tracking issue for the follow-up. Otherwise, consider adding at minimum an optional execution_profile pass-through in _read_rows_by_entity_keys and _write_rows_concurrently.


4. Design: replication_factor and replication_strategy are purely informational

These fields are documented as "informational" and Feast doesn't use them. This raises questions:

  • If they're never consumed by code, why are they in the config? They'll confuse users who expect Feast to act on them.
  • If the intent is future keyspace creation support, consider deferring these fields to that PR to keep this one focused.

At minimum, add a log warning if these are set:

if dc.replication_factor or dc.replication_strategy:
    logger.info(
        "Cassandra multi-DC: replication_factor/replication_strategy are "
        "informational only; keyspace must be pre-created."
    )

5. Suggestion: Duplicate local_dc naming is confusing

In the config structure, local_dc appears in two places with different semantics:

  • datacenters[*].local_dc — the datacenter name (identifier)
  • load_balancing.local_dc — which DC is the default

Consider renaming the per-DC field to just name or dc_name for clarity:

datacenters:
  - name: dc1        # clearer than local_dc
    hosts: [...]
  - name: dc2
    hosts: [...]
load_balancing:
    local_dc: dc1    # this selects the default

The field is called local_dc because that's what the driver's DCAwareRoundRobinPolicy parameter is named, but at the config level it reads as "this datacenter's local_dc" which is semantically odd.


6. Suggestion: Empty datacenters list edge case

What happens if datacenters: [] (empty list)? The truthiness check if online_store_config.datacenters: will be False, so it falls through to Branch A. But Branch A will then fail with E_CASSANDRA_NOT_CONFIGURED because hosts is also None. The error message will be misleading.

Fix: Validate that datacenters contains at least one entry if set:

if online_store_config.datacenters is not None and len(online_store_config.datacenters) == 0:
    raise CassandraInvalidConfig("Cassandra 'datacenters' list must not be empty")

7. Tests needed

There are no unit tests for the new code path. Please add tests covering:

  • Config parsing with datacenters field
  • Mutual exclusivity validation (once added)
  • Default DC selection (from load_balancing.local_dc vs. first-in-list fallback)
  • Error case: load_balancing.local_dc doesn't match any DC entry
  • Integration with the existing CassandraOnlineStoreCreator test fixture (even if it's a single-DC test, it validates backward compat)

8. Minor: Redundant if not self._session: in Branch A

After Branch B was inserted, the existing code still has:

if not self._session:
    # configuration consistency checks
    ...

This if not self._session: is now always True when reached (because we already returned early at line 294 if self._session: return). It's harmless but misleading — the original code had it as the single branch, now it's dead logic. Consider removing the if wrapper for clarity (keeping the body).


Summary

Item Severity Action needed
Mutable class-level list Bug Must fix
Missing mutual exclusivity check Bug Must fix
Profiles unused by Feast ops Design gap Document plan or implement
Informational-only fields Design Consider removing or add warning
local_dc naming confusion Suggestion Consider rename
Empty list edge case Suggestion Add validation
No tests Required Add unit tests
Redundant if not self._session Minor Cleanup

The core idea is good and aligns with a real production need. Looking forward to the next iteration!

@singhhimanshu0811
Copy link
Copy Markdown
Author

Hey @ntkathole thanks for replying back. I'll revert back with updated pr in 1-2 days, meanwhile could you review these once
for point 1 , I was wondering if we could solve the prepared statement with this pr only ,
for 3, i was planning to do it later, but now that I have some time, I can resolve in this pr only. will do this.
for 4, i ve had few questions, why is there no provision to create keyspace in the code? perhaps we can have a _create_keyspace function?
The rest, are great points, will resolve them in new push.

@singhhimanshu0811
Copy link
Copy Markdown
Author

singhhimanshu0811 commented May 25, 2026

@ntkathole for solving issue 3, basically incorporating profiles in read and write, i am thinking something like this

online_store:
type: cassandra
keyspace: feast_keyspace
datacenters:
- name: dc1
hosts: [192.168.1.1, 192.168.1.2]
- name: dc2
hosts: [10.0.0.1]
routing:
read_dc: dc2 # reads go to dc2
write_dc: dc1 # writes go to dc1

basically along with datacenters, adding a routing header, so user can decide at start time, that in this feature store session, which dc they would like to read from or which dc they would like to write from. does this work?

and of course there would be validation that routing header params are legal, i.e in datacenter header params

@ntkathole
Copy link
Copy Markdown
Member

@singhhimanshu0811

  • I would recommend making _create_keyspace a separate PR.
  • For routing config - yes, I think that covers common use cases, - Make both read_dc and write_dc fields optional with a default value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

In Cassandra online store, for each feature store you can have hosts from multiple local datacenters

2 participants