
Commit cdd1b07

feat: Custom Docker image for Bytewax batch materialization (#3099)
Dockerfile and instructions for building a custom Bytewax image.

Signed-off-by: Dan Herrera <whoahbot@bytewax.io>
1 parent 41be511 commit cdd1b07


5 files changed with 73 additions and 3 deletions


docs/reference/batch-materialization/bytewax.md

Lines changed: 16 additions & 1 deletion
@@ -55,5 +55,20 @@ batch_engine:

 The `namespace` configuration directive specifies which Kubernetes [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) jobs, services and configuration maps will be created in.

-The `image` parameter specifies which container image to use when running the materialization job. To create a custom image based on this container, please see the [GitHub repository](https://github.com/bytewax/bytewax-feast) for this image.
+#### Building a custom Bytewax Docker image
+
+The `image` configuration directive specifies which container image to use when running the materialization job. To create a custom image based on this container, run the following command:
+
+``` shell
+DOCKER_BUILDKIT=1 docker build . -f ./sdk/python/feast/infra/materialization/contrib/bytewax/Dockerfile -t <image tag>
+```
+
+Once that image is built and pushed to a registry, it can be specified as a part of the batch engine configuration:
+
+``` shell
+batch_engine:
+  type: bytewax
+  namespace: bytewax
+  image: <image tag>
+```

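The build step documented above can also be driven from a script. The following is a minimal sketch that assumes Python's standard `subprocess` module invokes the Docker CLI. The helpers `docker_build_args` and `build_image` and the example tag are hypothetical and not part of Feast. The sketch mirrors the documented command: it sets `DOCKER_BUILDKIT=1` because the Dockerfile's `RUN --mount` instruction needs BuildKit, and it uses the repository root as the build context.

```python
import os
import subprocess

# Dockerfile path relative to the Feast repository root, as given in the docs.
DOCKERFILE = "./sdk/python/feast/infra/materialization/contrib/bytewax/Dockerfile"


def docker_build_args(image_tag: str) -> list[str]:
    """Hypothetical helper: argv equivalent to the documented `docker build` command."""
    if not image_tag:
        raise ValueError("an image tag is required")
    # "." is the build context; it must be the repository root so the COPY paths resolve.
    return ["docker", "build", ".", "-f", DOCKERFILE, "-t", image_tag]


def build_image(image_tag: str, repo_root: str = ".") -> None:
    """Run the build from repo_root with BuildKit enabled (required for `RUN --mount`)."""
    env = {**os.environ, "DOCKER_BUILDKIT": "1"}
    subprocess.run(docker_build_args(image_tag), cwd=repo_root, env=env, check=True)
```

For example, you could call `build_image("registry.example.com/bytewax-feast:custom")` from a Feast checkout, run `docker push` with the same tag, and then use that tag as the `image:` value in the `batch_engine` block.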
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+FROM python:3.9-slim-bullseye AS build
+
+RUN apt-get update && \
+        apt-get install --no-install-suggests --no-install-recommends --yes git
+
+WORKDIR /bytewax
+
+# Copy dataflow code
+COPY sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_dataflow.py /bytewax
+COPY sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py /bytewax
+
+# Copy entrypoint
+COPY sdk/python/feast/infra/materialization/contrib/bytewax/entrypoint.sh /bytewax
+
+# Copy necessary parts of the Feast codebase
+COPY sdk/python sdk/python
+COPY protos protos
+COPY go go
+COPY setup.py setup.py
+COPY pyproject.toml pyproject.toml
+COPY README.md README.md
+
+# Install Feast for AWS with Bytewax dependencies
+# We need this mount thingy because setuptools_scm needs access to the
+# git dir to infer the version of feast we're installing.
+# https://github.com/pypa/setuptools_scm#usage-from-docker
+# I think it also assumes that this dockerfile is being built from the root of the directory.
+RUN --mount=source=.git,target=.git,type=bind pip3 install --no-cache-dir -e '.[aws,gcp,bytewax]'
+
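The closing comments in the Dockerfile note two things. First, the build assumes the repository root as its context. Second, `setuptools_scm` needs the `.git` directory bind-mounted to infer the Feast version. A pre-flight check could confirm both before Docker is invoked. This is a minimal sketch that assumes the build starts from a Feast checkout. `missing_context_paths` is a hypothetical helper, and the path list is copied from the Dockerfile's `COPY` and `--mount` instructions.

```python
from pathlib import Path

# Paths the Dockerfile copies or bind-mounts, relative to the build context.
REQUIRED_PATHS = (
    "sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_dataflow.py",
    "sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py",
    "sdk/python/feast/infra/materialization/contrib/bytewax/entrypoint.sh",
    "protos",
    "go",
    "setup.py",
    "pyproject.toml",
    "README.md",
    ".git",  # mounted so setuptools_scm can infer the version
)


def missing_context_paths(context_dir: str) -> list[str]:
    """Return the required paths that do not exist under context_dir."""
    root = Path(context_dir)
    # exists() accepts both files and directories; in a git worktree, .git is a file.
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]
```

An empty result means the directory looks like a usable build context. Any names it returns indicate that the build was probably started from the wrong directory.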
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@
 )
 from feast.infra.offline_stores.offline_store import OfflineStore
 from feast.infra.online_stores.online_store import OnlineStore
-from feast.registry import BaseRegistry
+from feast.infra.registry.base_registry import BaseRegistry
 from feast.repo_config import FeastConfigBaseModel
 from feast.stream_feature_view import StreamFeatureView
 from feast.utils import _get_column_names
@@ -341,7 +341,7 @@ def _create_job_definition(self, job_id, namespace, pods, env):
                 {
                     "command": ["sh", "-c", "sh ./entrypoint.sh"],
                     "env": job_env,
-                    "image": "bytewax/bytewax-feast:latest",
+                    "image": self.batch_engine_config.image,
                     "imagePullPolicy": "Always",
                     "name": "process",
                     "ports": [
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+import yaml
+
+from feast import FeatureStore, RepoConfig
+from feast.infra.materialization.contrib.bytewax.bytewax_materialization_dataflow import (
+    BytewaxMaterializationDataflow,
+)
+
+if __name__ == "__main__":
+    with open("/var/feast/feature_store.yaml") as f:
+        feast_config = yaml.safe_load(f)
+
+    with open("/var/feast/bytewax_materialization_config.yaml") as b:
+        bytewax_config = yaml.safe_load(b)
+
+    config = RepoConfig(**feast_config)
+    store = FeatureStore(config=config)
+
+    job = BytewaxMaterializationDataflow(
+        config,
+        store.get_feature_view(bytewax_config["feature_view"]),
+        bytewax_config["paths"],
+    )
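`dataflow.py` reads two YAML files from `/var/feast`. The first is the Feast repo config. The second is a job config whose `feature_view` key names the view to materialize and whose `paths` key is passed to `BytewaxMaterializationDataflow`. If either key is missing, the script fails with a bare `KeyError`. Below is a hypothetical validator for the parsed job config. It assumes `paths` is a list of strings (for example, file paths or URIs), which the diff does not state explicitly.

```python
def read_job_settings(bytewax_config: dict) -> tuple[str, list[str]]:
    """Hypothetical check of the parsed bytewax_materialization_config.yaml.

    Returns (feature_view_name, paths), or raises ValueError with a clear message.
    """
    try:
        feature_view = bytewax_config["feature_view"]
        paths = bytewax_config["paths"]
    except KeyError as missing:
        raise ValueError(f"materialization config is missing key {missing}") from None
    if not isinstance(feature_view, str) or not feature_view:
        raise ValueError("'feature_view' must be a non-empty string")
    # Assumption: paths is a list of strings such as file paths or URIs.
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("'paths' must be a list of strings")
    return feature_view, paths
```

Calling it on `bytewax_config` before the feature view is looked up would report a malformed job config before any store access. The view name and path below are placeholders: `read_job_settings({"feature_view": "driver_hourly_stats", "paths": ["s3://bucket/part-0.parquet"]})`.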
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+cd /bytewax
+python dataflow.py
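The job container runs `sh -c "sh ./entrypoint.sh"`. That relative path resolves against the image's `WORKDIR /bytewax`, and the script then changes into `/bytewax` and runs `dataflow.py`. For local debugging, a Python equivalent might look like the sketch below. `run_entrypoint` is a hypothetical helper. Unlike the shell script, it uses the current interpreter (`sys.executable`) rather than whatever `python` resolves to on `PATH`.

```python
import subprocess
import sys


def run_entrypoint(workdir: str = "/bytewax", script: str = "dataflow.py") -> int:
    """Rough equivalent of entrypoint.sh: run `script` with `workdir` as the cwd.

    check=True raises CalledProcessError if the dataflow exits non-zero,
    so a successful call always returns 0.
    """
    completed = subprocess.run([sys.executable, script], cwd=workdir, check=True)
    return completed.returncode
```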
