Merged
10 changes: 5 additions & 5 deletions .secrets.baseline
@@ -934,7 +934,7 @@
"filename": "infra/feast-operator/api/v1/featurestore_types.go",
"hashed_secret": "44e17306b837162269a410204daaa5ecee4ec22c",
"is_verified": false,
"line_number": 695
"line_number": 725
}
],
"infra/feast-operator/api/v1/zz_generated.deepcopy.go": [
@@ -943,21 +943,21 @@
"filename": "infra/feast-operator/api/v1/zz_generated.deepcopy.go",
"hashed_secret": "f914fc9324de1bec1ad13dec94a8ea2ddb41fc87",
"is_verified": false,
"line_number": 663
"line_number": 681
},
{
"type": "Secret Keyword",
"filename": "infra/feast-operator/api/v1/zz_generated.deepcopy.go",
"hashed_secret": "44e17306b837162269a410204daaa5ecee4ec22c",
"is_verified": false,
"line_number": 1206
"line_number": 1249
},
{
"type": "Secret Keyword",
"filename": "infra/feast-operator/api/v1/zz_generated.deepcopy.go",
"hashed_secret": "c2028031c154bbe86fd69bef740855c74b927dcf",
"is_verified": false,
"line_number": 1211
"line_number": 1254
}
],
"infra/feast-operator/api/v1alpha1/featurestore_types.go": [
@@ -1156,7 +1156,7 @@
"filename": "infra/feast-operator/internal/controller/services/services.go",
"hashed_secret": "36dc326eb15c7bdd8d91a6b87905bcea20b637d1",
"is_verified": false,
"line_number": 173
"line_number": 176
}
],
"infra/feast-operator/internal/controller/services/tls_test.go": [
6 changes: 4 additions & 2 deletions docs/how-to-guides/feast-on-kubernetes.md
@@ -65,9 +65,11 @@ spec:
> _More advanced FeatureStore CR examples can be found in the feast-operator [samples directory](../../infra/feast-operator/config/samples)._

{% hint style="success" %}
**Scaling:** The Feast Operator supports horizontal scaling via static replicas, HPA autoscaling, or external autoscalers like [KEDA](https://keda.sh). Scaling requires DB-backed persistence for all enabled services.
**Scaling & High Availability:** The Feast Operator supports horizontal scaling via static replicas, HPA autoscaling, or external autoscalers like [KEDA](https://keda.sh). Scaling requires DB-backed persistence for all enabled services.

See the [Horizontal Scaling with the Feast Operator](./scaling-feast.md#horizontal-scaling-with-the-feast-operator) guide for configuration details, or check the general recommendations on [how to scale Feast](./scaling-feast.md).
When scaling is enabled, the operator auto-injects soft pod anti-affinity and zone topology spread constraints for resilience. You can also configure a PodDisruptionBudget to protect against voluntary disruptions.

See the [Horizontal Scaling with the Feast Operator](./scaling-feast.md#horizontal-scaling-with-the-feast-operator) guide for configuration details, including [HA options](./scaling-feast.md#high-availability), or check the general recommendations on [how to scale Feast](./scaling-feast.md).
{% endhint %}

> _Sample scaling CRs are available at [`v1_featurestore_scaling_static.yaml`](../../infra/feast-operator/config/samples/v1_featurestore_scaling_static.yaml) and [`v1_featurestore_scaling_hpa.yaml`](../../infra/feast-operator/config/samples/v1_featurestore_scaling_hpa.yaml)._
70 changes: 69 additions & 1 deletion docs/how-to-guides/scaling-feast.md
@@ -86,6 +86,8 @@ spec:
target:
type: Utilization
averageUtilization: 70
podDisruptionBudgets:
maxUnavailable: 1
onlineStore:
persistence:
store:
@@ -107,7 +109,7 @@ spec:
```

{% hint style="info" %}
When autoscaling is configured, the operator automatically sets the deployment strategy to `RollingUpdate` (instead of the default `Recreate`) to ensure zero-downtime scaling. You can override this by explicitly setting `deploymentStrategy` in the CR.
When autoscaling is configured, the operator automatically sets the deployment strategy to `RollingUpdate` (instead of the default `Recreate`) to ensure zero-downtime scaling, and auto-injects soft pod anti-affinity and zone topology spread constraints. You can override any of these by explicitly setting `deploymentStrategy`, `affinity`, or `topologySpreadConstraints` in the CR.
{% endhint %}

#### Validation Rules
@@ -117,6 +119,72 @@ The operator enforces the following rules:
- Scaling with `replicas > 1` or any `autoscaling` config is **rejected** if any enabled service uses file-based persistence.
- S3 (`s3://`) and GCS (`gs://`) backed registry file persistence is allowed with scaling, since these object stores support concurrent readers.
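
The two rules above can be sketched as a small check. This is a minimal illustration only: the `service` type and the `scalingEnabled` and `validateScaling` functions are hypothetical names, not the operator's actual code, and persistence is reduced to a single optional file path.

```go
package main

import (
	"fmt"
	"strings"
)

// service is a hypothetical, simplified view of one enabled Feast service.
type service struct {
	name       string
	filePath   string // non-empty when the service uses file-based persistence
	isRegistry bool
}

// scalingEnabled mirrors the documented condition: replicas > 1 or any autoscaling config.
func scalingEnabled(replicas int32, autoscaling bool) bool {
	return replicas > 1 || autoscaling
}

// validateScaling returns an error when scaling is requested and a service uses
// file persistence, unless that service is a registry backed by S3 or GCS.
func validateScaling(replicas int32, autoscaling bool, services []service) error {
	if !scalingEnabled(replicas, autoscaling) {
		return nil
	}
	for _, s := range services {
		if s.filePath == "" {
			continue // DB-backed persistence is allowed
		}
		objectStore := strings.HasPrefix(s.filePath, "s3://") || strings.HasPrefix(s.filePath, "gs://")
		if s.isRegistry && objectStore {
			continue // object stores support concurrent readers
		}
		return fmt.Errorf("service %q uses file-based persistence, which is not supported with scaling", s.name)
	}
	return nil
}

func main() {
	services := []service{
		{name: "registry", filePath: "s3://bucket/registry.db", isRegistry: true},
		{name: "onlineStore", filePath: "/data/online.db"},
	}
	fmt.Println(validateScaling(3, false, services)) // rejected because of onlineStore
	fmt.Println(validateScaling(1, false, services)) // no scaling, so nothing to reject
}
```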

#### High Availability

When scaling is enabled (`replicas > 1` or `autoscaling`), the operator provides HA features to improve resilience:

**Pod Anti-Affinity:** The operator automatically injects a soft (`preferredDuringSchedulingIgnoredDuringExecution`) pod anti-affinity rule that prefers spreading pods across different nodes. Because the rule is a preference rather than a requirement, it makes co-locating replicas on the same node less likely without blocking scheduling when no other node is available, which improves resilience to node failures. You can override this by providing your own `affinity` configuration:

```yaml
spec:
  replicas: 3
  services:
    # Override with custom affinity (e.g. strict anti-affinity)
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                feast.dev/name: my-feast
    # ...
```

**Topology Spread Constraints:** The operator automatically injects a soft zone-spread constraint (`whenUnsatisfiable: ScheduleAnyway`) that distributes pods across availability zones. The spread is best-effort: if pods cannot be balanced across zones, they are still scheduled. You can override this with explicit constraints or disable it with an empty array:

```yaml
spec:
  replicas: 3
  services:
    # Override with custom topology spread (e.g. strict zone spreading)
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            feast.dev/name: my-feast
    # ...
```

To disable the auto-injected topology spread:

```yaml
spec:
  replicas: 3
  services:
    topologySpreadConstraints: []
    # ...
```
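
Disabling with an empty array relies on the difference between an unset field and an explicitly empty one. The sketch below shows how that distinction can surface in Go after decoding. The `constraint` and `services` types and the `effectiveConstraints` function are hypothetical stand-ins, and whether the operator implements the default this way is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// constraint is a hypothetical stand-in for corev1.TopologySpreadConstraint.
type constraint struct {
	TopologyKey       string `json:"topologyKey"`
	WhenUnsatisfiable string `json:"whenUnsatisfiable"`
}

// services is a hypothetical, trimmed-down stand-in for FeatureStoreServices.
type services struct {
	TopologySpreadConstraints []constraint `json:"topologySpreadConstraints,omitempty"`
}

// effectiveConstraints injects a soft zone-spread default only when the field
// was left unset (nil). An explicit empty array is kept, which disables the default.
func effectiveConstraints(s services) []constraint {
	if s.TopologySpreadConstraints != nil {
		return s.TopologySpreadConstraints
	}
	return []constraint{{
		TopologyKey:       "topology.kubernetes.io/zone",
		WhenUnsatisfiable: "ScheduleAnyway",
	}}
}

// parse decodes a JSON fragment into the stand-in services type.
func parse(doc string) services {
	var s services
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	// Field absent: the slice stays nil, so the default is injected.
	fmt.Println(len(effectiveConstraints(parse(`{}`)))) // 1
	// Field set to []: the slice is empty but non-nil, so nothing is injected.
	fmt.Println(len(effectiveConstraints(parse(`{"topologySpreadConstraints": []}`)))) // 0
}
```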

**PodDisruptionBudget:** You can configure a PDB to limit voluntary disruptions (e.g. during node drains or cluster upgrades). The PDB is only created when scaling is enabled. Exactly one of `minAvailable` or `maxUnavailable` must be set:

```yaml
spec:
  replicas: 3
  services:
    podDisruptionBudgets:
      maxUnavailable: 1   # at most 1 pod unavailable during disruptions
    # -- OR --
    # podDisruptionBudgets:
    #   minAvailable: "50%"   # at least 50% of pods must remain available
    # ...
```

{% hint style="info" %}
The PDB is not auto-injected — you must explicitly configure it. This is intentional because a misconfigured PDB (e.g. `minAvailable` equal to the replica count) can block node drains and cluster upgrades.
{% endhint %}
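
To see why `minAvailable` equal to the replica count is a problem, the arithmetic can be sketched as follows. The `allowedDisruptions` function is a hypothetical helper, it handles integer values only (not percentages), and it assumes every replica is currently healthy.

```go
package main

import "fmt"

// allowedDisruptions estimates how many pods a PodDisruptionBudget with an
// integer minAvailable permits to be evicted voluntarily at once, assuming
// all replicas are currently healthy.
func allowedDisruptions(replicas, minAvailable int) int {
	if allowed := replicas - minAvailable; allowed > 0 {
		return allowed
	}
	return 0
}

func main() {
	fmt.Println(allowedDisruptions(3, 2)) // 1: a drain can evict one pod at a time
	fmt.Println(allowedDisruptions(3, 3)) // 0: every eviction is refused, so the drain stalls
}
```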

#### Using KEDA (Kubernetes Event-Driven Autoscaling)

[KEDA](https://keda.sh) is also supported as an external autoscaler. KEDA should target the FeatureStore's scale sub-resource directly (since it implements the Kubernetes scale API). This is the recommended approach because the operator manages the Deployment's replica count from `spec.replicas` — targeting the Deployment directly would conflict with the operator's reconciliation.
30 changes: 30 additions & 0 deletions infra/feast-operator/api/v1/featurestore_types.go
@@ -22,6 +22,7 @@ import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

const (
@@ -314,6 +315,21 @@ type FeatureStoreServices struct {
	// Scaling configures horizontal scaling for the FeatureStore deployment (e.g. HPA autoscaling).
	// For static replicas, use spec.replicas instead.
	Scaling *ScalingConfig `json:"scaling,omitempty"`
	// PodDisruptionBudgets configures a PodDisruptionBudget for the FeatureStore deployment.
	// Only created when scaling is enabled (replicas > 1 or autoscaling).
	// +optional
	PodDisruptionBudgets *PDBConfig `json:"podDisruptionBudgets,omitempty"`
	// TopologySpreadConstraints defines how pods are spread across topology domains.
	// When scaling is enabled and this is not set, the operator auto-injects a soft
	// zone-spread constraint (whenUnsatisfiable: ScheduleAnyway).
	// Set to an empty array to disable auto-injection.
	// +optional
	TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
	// Affinity defines the pod scheduling constraints for the FeatureStore deployment.
	// When scaling is enabled and this is not set, the operator auto-injects a soft
	// pod anti-affinity rule to prefer spreading pods across nodes.
	// +optional
	Affinity *corev1.Affinity `json:"affinity,omitempty"`
}

// ScalingConfig configures horizontal scaling for the FeatureStore deployment.
@@ -342,6 +358,20 @@ type AutoscalingConfig struct {
	Behavior *autoscalingv2.HorizontalPodAutoscalerBehavior `json:"behavior,omitempty"`
}

// PDBConfig configures a PodDisruptionBudget for the FeatureStore deployment.
// Exactly one of minAvailable or maxUnavailable must be set.
// +kubebuilder:validation:XValidation:rule="[has(self.minAvailable), has(self.maxUnavailable)].exists_one(c, c)",message="Exactly one of minAvailable or maxUnavailable must be set."
type PDBConfig struct {
	// MinAvailable specifies the minimum number/percentage of pods that must remain available.
	// Mutually exclusive with maxUnavailable.
	// +optional
	MinAvailable *intstr.IntOrString `json:"minAvailable,omitempty"`
	// MaxUnavailable specifies the maximum number/percentage of pods that can be unavailable.
	// Mutually exclusive with minAvailable.
	// +optional
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}
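
The `XValidation` rule on `PDBConfig` accepts an object only when exactly one of the two fields is present. A rough Go equivalent of that `exists_one` check is sketched below. The `pdbConfig` type, `existsOne`, and `validatePDB` are hypothetical names for illustration, and plain strings stand in for `*intstr.IntOrString` so the sketch needs no Kubernetes imports.

```go
package main

import (
	"errors"
	"fmt"
)

// pdbConfig is a hypothetical stand-in for PDBConfig.
type pdbConfig struct {
	MinAvailable   *string
	MaxUnavailable *string
}

// existsOne reports whether exactly one flag is true, like CEL's exists_one.
func existsOne(flags ...bool) bool {
	count := 0
	for _, f := range flags {
		if f {
			count++
		}
	}
	return count == 1
}

// validatePDB rejects configs where neither or both fields are set.
func validatePDB(c pdbConfig) error {
	if !existsOne(c.MinAvailable != nil, c.MaxUnavailable != nil) {
		return errors.New("exactly one of minAvailable or maxUnavailable must be set")
	}
	return nil
}

func main() {
	half, one := "50%", "1"
	fmt.Println(validatePDB(pdbConfig{MinAvailable: &half}))                       // accepted
	fmt.Println(validatePDB(pdbConfig{}))                                          // rejected: neither set
	fmt.Println(validatePDB(pdbConfig{MinAvailable: &half, MaxUnavailable: &one})) // rejected: both set
}
```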

// OfflineStore configures the offline store service
type OfflineStore struct {
	// Creates a remote offline server container
43 changes: 43 additions & 0 deletions infra/feast-operator/api/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.
