update doc
Signed-off-by: HaoXuAI <sduxuhao@gmail.com>
HaoXuAI committed Jul 14, 2025
commit dde5028425d842c2b3a91b60700dfbecef6f8d85
99 changes: 98 additions & 1 deletion docs/getting-started/architecture/feature-transformation.md
@@ -18,4 +18,101 @@ when to use which transformation engine/communication pattern is extremely criti
the success of your implementation.

In general, we recommend choosing transformation engines and network calls by aligning them with what is most
appropriate for the data producer, feature/model usage, and overall product.


## API
### feature_transformation
`feature_transformation` and `udf` are the core APIs for defining feature transformations in Feast. They let you specify custom logic that is applied to the data during materialization or retrieval. For example:

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import regexp_replace

def remove_extra_spaces(df: DataFrame) -> DataFrame:
    # Collapse each run of whitespace in `name` into a single space.
    return df.withColumn("name", regexp_replace("name", r"\s+", " "))

spark_transformation = SparkTransformation(
    mode=TransformationMode.SPARK,
    udf=remove_extra_spaces,
    udf_string="remove extra spaces",
)
feature_view = FeatureView(
    feature_transformation=spark_transformation,
    ...
)
```
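Or, using the generic `Transformation` class in Spark SQL mode:
```python
# `remove_extra_spaces_sql` is assumed to be defined elsewhere.
spark_transformation = Transformation(
    mode=TransformationMode.SPARK_SQL,
    udf=remove_extra_spaces_sql,
    udf_string="remove extra spaces sql",
)
feature_view = FeatureView(
    feature_transformation=spark_transformation,
    ...
)
```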
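Or, using the `@transformation` decorator:
```python
import pandas as pd

@transformation(mode=TransformationMode.SPARK)
def remove_extra_spaces_udf(df: pd.DataFrame) -> pd.DataFrame:
    # regex=True is needed for pattern matching in pandas >= 2.0.
    return df.assign(name=df["name"].str.replace(r"\s+", " ", regex=True))

feature_view = FeatureView(
    feature_transformation=remove_extra_spaces_udf,
    ...
)
```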
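
To make the effect of this kind of transformation concrete, here is a minimal sketch in plain Python, independent of Feast and Spark. The helper name `collapse_spaces` and the sample rows are illustrative assumptions, not part of the Feast API.

```python
import re

def collapse_spaces(value: str) -> str:
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", value)

# Hypothetical sample rows standing in for a DataFrame with a `name` column.
rows = [{"name": "Ada   Lovelace"}, {"name": "Alan\t\tTuring"}]

# Apply the transformation to the `name` field of each row.
cleaned = [{**row, "name": collapse_spaces(row["name"])} for row in rows]
```

The same per-column logic is what the UDFs above express against a DataFrame; only the container type differs.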

### Aggregation
`Aggregation` is a built-in API for defining batch or streaming aggregations on data. It lets you specify how to aggregate data over a time window, such as calculating the average or sum of a feature over a specified period. For example:
```python
from datetime import timedelta

from feast import Aggregation

feature_view = FeatureView(
    aggregations=[
        Aggregation(
            column="amount",
            function="sum",
        ),
        Aggregation(
            column="amount",
            function="avg",
            time_window=timedelta(hours=1),
        ),
    ],
    ...
)
```
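
As a rough illustration of what a time-windowed `sum` and `avg` compute, the following plain-Python sketch aggregates an `amount` field over the last hour. The function `windowed_sum_avg`, the sample events, and the window boundary convention (exclusive start, inclusive end) are assumptions made for this example and may not match how a given Feast compute engine treats window edges.

```python
from datetime import datetime, timedelta

def windowed_sum_avg(events, as_of, window):
    """Sum and average `amount` for events with as_of - window < ts <= as_of."""
    amounts = [e["amount"] for e in events if as_of - window < e["ts"] <= as_of]
    total = sum(amounts)
    average = total / len(amounts) if amounts else None
    return total, average

# Hypothetical events; only the last two fall inside the one-hour window.
events = [
    {"ts": datetime(2025, 7, 14, 9, 30), "amount": 10.0},
    {"ts": datetime(2025, 7, 14, 10, 15), "amount": 20.0},
    {"ts": datetime(2025, 7, 14, 10, 45), "amount": 30.0},
]
total, average = windowed_sum_avg(
    events, as_of=datetime(2025, 7, 14, 11, 0), window=timedelta(hours=1)
)
```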

### Filter
`ttl`: The amount of time that features remain available for materialization or retrieval. Only entity rows whose timestamp is later than the current time minus the `ttl` are used; older rows are filtered out. This ensures that only recent data is used in feature calculations. For example:

```python
from datetime import timedelta

feature_view = FeatureView(
    ttl=timedelta(days=1),  # Features will be available for 1 day
    ...
)
```
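
The filtering rule described above can be sketched in plain Python as follows. The function `filter_by_ttl` and the `event_timestamp` field name are hypothetical, and the strict comparison follows the wording above rather than any confirmed Feast implementation detail.

```python
from datetime import datetime, timedelta

def filter_by_ttl(rows, now, ttl):
    """Keep rows whose timestamp is strictly later than now - ttl."""
    cutoff = now - ttl
    return [row for row in rows if row["event_timestamp"] > cutoff]

now = datetime(2025, 7, 14, 12, 0)
rows = [
    {"id": 1, "event_timestamp": datetime(2025, 7, 14, 8, 0)},   # inside the ttl
    {"id": 2, "event_timestamp": datetime(2025, 7, 13, 12, 0)},  # exactly at the cutoff
    {"id": 3, "event_timestamp": datetime(2025, 7, 12, 12, 0)},  # older than the ttl
]
fresh = filter_by_ttl(rows, now, ttl=timedelta(days=1))
```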

### Join
Feast can join multiple feature views together to create a composite feature view. This allows you to combine features from different sources or views into a single view. For example:
```python
feature_view = FeatureView(
    name="composite_feature_view",
    entities=["entity_id"],
    source=[
        FeatureView(
            name="feature_view_1",
            features=["feature_1", "feature_2"],
            ...
        ),
        FeatureView(
            name="feature_view_2",
            features=["feature_3", "feature_4"],
            ...
        ),
    ],
    ...
)
```
By default, the join is implemented as an inner join, and the join key is the entity ID.
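
To illustrate what an inner join on the entity ID means for the resulting rows, here is a small plain-Python sketch. The dictionaries `view_1` and `view_2` and their values are made-up stand-ins for two feature views keyed by `entity_id`; this is not how Feast executes the join internally.

```python
# Hypothetical feature values keyed by entity_id.
view_1 = {
    1: {"feature_1": 0.1, "feature_2": 0.2},
    2: {"feature_1": 0.3, "feature_2": 0.4},
}
view_2 = {
    2: {"feature_3": 0.5, "feature_4": 0.6},
    3: {"feature_3": 0.7, "feature_4": 0.8},
}

# Inner join: keep only entity IDs present in both views, merging their features.
joined = {
    entity_id: {**view_1[entity_id], **view_2[entity_id]}
    for entity_id in view_1.keys() & view_2.keys()
}
```

Entities 1 and 3 are dropped because each appears in only one view, which is the behavior to keep in mind when a composite view returns fewer rows than its sources.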
2 changes: 1 addition & 1 deletion docs/getting-started/concepts/batch-feature-view.md
@@ -7,7 +7,7 @@
## ✅ Key Capabilities

- **Composable DAG of FeatureViews**: Supports defining a `BatchFeatureView` on top of one or more other `FeatureView`s.
- **Transformations**: Apply PySpark-based transformation logic (`feature_transformation` or `udf`) to raw data source, can also be used to deal with multiple data sources.
- **Transformations**: Apply [transformation](../../getting-started/architecture/feature-transformation.md) logic (`feature_transformation` or `udf`) to raw data sources; this can also be used to handle multiple data sources.
- **Aggregations**: Define time-windowed aggregations (e.g. `sum`, `avg`) over event-timestamped data.
- **Feature resolution & execution**: Automatically resolves and executes DAGs of dependent views during materialization or retrieval. More details in the [Compute engine documentation](../../reference/compute-engine/README.md).
- **Materialization Sink Customization**: Specify a custom `sink_source` to define where derived feature data should be persisted.