feat(cassandra): Add multi-DC support via per-datacenter execution profiles#6434
Conversation
…ofiles Introduce a `datacenters` config list as an alternative to the flat `hosts` field. Each entry specifies the contact points, local_dc, and optional replication metadata for one datacenter. On connect, a named Cassandra execution profile is registered per DC (profile key = local_dc), enabling callers to route driver-level queries to a specific datacenter. The default profile is pinned to `load_balancing.local_dc`, or the first DC when load_balancing is absent. All existing `hosts` / `secure_bundle_path` configs are fully backward compatible — the original code path is untouched. Also updates docs/reference/online-stores/cassandra.md with a multi-DC feature_store.yaml example. Signed-off-by: singhhimanshu0811 email : itssinghhimanshu@gmail.com Signed-off-by: Himanshu Singh <himanshu.singh@walmart.com>
1ad4040 to
20539cf
Compare
ntkathole
left a comment
There was a problem hiding this comment.
Review of PR #6434 — Multi-DC Cassandra Support
Thanks for the contribution, @singhhimanshu0811! This directly addresses the gap we discussed around single-DC limitations in the Cassandra online store config. Overall the approach is solid — using per-DC execution profiles is the right driver-level mechanism. I have several items ranging from bugs to design suggestions:
1. Bug: Mutable class-level default for _dc_execution_profiles
_dc_execution_profiles: List[str] = []This is a mutable class attribute shared across all instances. If multiple CassandraOnlineStore instances are created (e.g., in tests or multi-project setups), they'll all append to the same list. This same pattern issue exists for _prepared_statements in the original code, but introducing a new one should be avoided.
Fix: Initialize in _get_session or use a property pattern:
_dc_execution_profiles: List[str]
def _get_session(self, config: RepoConfig):
...
if online_store_config.datacenters:
self._dc_execution_profiles = [] # reset per connection
...2. Bug: Missing mutual exclusivity validation
The docstring says:
Mutually exclusive with
hostsandsecure_bundle_path.
But there's no actual validation enforcing this. If a user sets both datacenters and hosts, Branch B runs (because it checks datacenters first) and silently ignores hosts. This will confuse users.
Fix: Add a Pydantic model_validator or an explicit check at the top of _get_session:
if online_store_config.datacenters and (
online_store_config.hosts or online_store_config.secure_bundle_path
):
raise CassandraInvalidConfig(
"Cassandra config: 'datacenters' is mutually exclusive with "
"'hosts' and 'secure_bundle_path'"
)3. Design: Profiles are created but never used by Feast operations
The PR creates named execution profiles and exposes them via _dc_execution_profiles, but none of the existing Feast operations (online_read, online_write_batch, update) pass execution_profile= to the driver. All queries will use EXEC_PROFILE_DEFAULT — effectively still single-DC behavior.
This means:
- The profiles are registered but dead code from Feast's perspective
- Users can't actually route reads/writes to a specific DC without modifying Feast internals
I understand this may be intended as a foundational PR with follow-up work to add target_dc parameters to the Feast API. If so, please mention this explicitly in the PR description and/or open a tracking issue for the follow-up. Otherwise, consider adding at minimum an optional execution_profile pass-through in _read_rows_by_entity_keys and _write_rows_concurrently.
4. Design: replication_factor and replication_strategy are purely informational
These fields are documented as "informational" and Feast doesn't use them. This raises questions:
- If they're never consumed by code, why are they in the config? They'll confuse users who expect Feast to act on them.
- If the intent is future keyspace creation support, consider deferring these fields to that PR to keep this one focused.
At minimum, add a log warning if these are set:
if dc.replication_factor or dc.replication_strategy:
logger.info(
"Cassandra multi-DC: replication_factor/replication_strategy are "
"informational only; keyspace must be pre-created."
)5. Suggestion: Duplicate local_dc naming is confusing
In the config structure, local_dc appears in two places with different semantics:
datacenters[*].local_dc— the datacenter name (identifier)load_balancing.local_dc— which DC is the default
Consider renaming the per-DC field to just name or dc_name for clarity:
datacenters:
- name: dc1 # clearer than local_dc
hosts: [...]
- name: dc2
hosts: [...]
load_balancing:
local_dc: dc1 # this selects the defaultThe field is called local_dc because that's what the driver's DCAwareRoundRobinPolicy parameter is named, but at the config level it reads as "this datacenter's local_dc" which is semantically odd.
6. Suggestion: Empty datacenters list edge case
What happens if datacenters: [] (empty list)? The truthiness check if online_store_config.datacenters: will be False, so it falls through to Branch A. But Branch A will then fail with E_CASSANDRA_NOT_CONFIGURED because hosts is also None. The error message will be misleading.
Fix: Validate that datacenters contains at least one entry if set:
if online_store_config.datacenters is not None and len(online_store_config.datacenters) == 0:
raise CassandraInvalidConfig("Cassandra 'datacenters' list must not be empty")7. Tests needed
There are no unit tests for the new code path. Please add tests covering:
- Config parsing with
datacentersfield - Mutual exclusivity validation (once added)
- Default DC selection (from
load_balancing.local_dcvs. first-in-list fallback) - Error case:
load_balancing.local_dcdoesn't match any DC entry - Integration with the existing
CassandraOnlineStoreCreatortest fixture (even if it's a single-DC test, it validates backward compat)
8. Minor: Redundant if not self._session: in Branch A
After Branch B was inserted, the existing code still has:
if not self._session:
# configuration consistency checks
...This if not self._session: is now always True when reached (because we already returned early at line 294 if self._session: return). It's harmless but misleading — the original code had it as the single branch, now it's dead logic. Consider removing the if wrapper for clarity (keeping the body).
Summary
| Item | Severity | Action needed |
|---|---|---|
| Mutable class-level list | Bug | Must fix |
| Missing mutual exclusivity check | Bug | Must fix |
| Profiles unused by Feast ops | Design gap | Document plan or implement |
| Informational-only fields | Design | Consider removing or add warning |
local_dc naming confusion |
Suggestion | Consider rename |
| Empty list edge case | Suggestion | Add validation |
| No tests | Required | Add unit tests |
Redundant if not self._session |
Minor | Cleanup |
The core idea is good and aligns with a real production need. Looking forward to the next iteration!
|
Hey @ntkathole thanks for replying back. I'll revert back with updated pr in 1-2 days, meanwhile could you review these once |
|
@ntkathole for solving issue 3, basically incorporating profiles in read and write, i am thinking something like this online_store: basically along with datacenters, adding a routing header, so user can decide at start time, that in this feature store session, which dc they would like to read from or which dc they would like to write from. does this work? and of course there would be validation that routing header params are legal, i.e in datacenter header params |
|
##Which issue(s) this PR fixes
Fixes #6433
Summary
Adds a
datacentersconfig list toCassandraOnlineStoreConfigas an alternative to the flathostsfield, enabling proper multi-datacenter support.Each datacenter entry gets a named Cassandra execution profile (keyed by
local_dc), so pipeline code can route reads/writes to a specific DC by passingexecution_profile="<dc-name>"to the driver. The default profile is pinned toload_balancing.local_dc, or the first DC whenload_balancingis absent.Changes
CassandraDatacenterConfiginner model withlocal_dc,hosts,replication_factor(informational),replication_strategy(informational)datacenters: Optional[List[CassandraDatacenterConfig]]field onCassandraOnlineStoreConfig_get_session()that builds per-DC execution profiles and connects with all DC hosts merged; originalhosts/secure_bundle_pathpath (Branch A) is completely untouched_dc_execution_profiles: List[str]attribute exposes registered profile namesdocs/reference/online-stores/cassandra.mdwith a multi-DCfeature_store.yamlexampleMotivation
The flat
hosts+ singlelocal_dcmodel forces all hosts to belong to the same datacenter. Users with multi-DC clusters had to create separate feature store projects per DC. With this change:DCAwareRoundRobinPolicyprevents reads from spilling into unintended regionsBackward compatibility
Fully backward compatible —
datacentersis optional and the original code path is untouched.