Commit 290ceb3

Add a Sandbox for Feathr (#966)
* Update registry-access-control.md
* Update README.md
* add logo
* Update README.md
* Add docs for how to create bacpac file
* update dockerfile
* update
* Update local_quickstart_nyc_taxi_demo.ipynb
* Update FeathrSandbox.Dockerfile
* add SQLIte connection
* Update local_quickstart_nyc_taxi_demo.ipynb
* update local registry
* update registry
* update
* add dockerfile
* Change to ORM
* Update db_registry.py
* update registry
* delete unused files
* don't change the existing registry code
* update
* Update main.py
* update configs
* make jupyter runnable
* add readme
* Update start.sh
* Revert "Add docs for how to create bacpac file" This reverts commit 2837926.
* delete unused files
* Update local_quickstart_nyc_taxi_demo.ipynb
* Update local_quickstart_nyc_taxi_demo.ipynb
* Fix redis issues
* Update client.py
* Update _env_config_reader.py
* add docs
* Update quickstart_local_sandbox.md
* Update quickstart_local_sandbox.md
* Update quickstart_local_sandbox.md
* Update quickstart_local_sandbox.md
* merge ORM based sql registry to sql registry
* fix typo
* improve usability
* Update FeathrSandbox.Dockerfile
* Update FeathrSandbox.Dockerfile
* Update start_local.sh
* Update FeathrSandbox.Dockerfile
* update instructions
* Add code server
* Remove unused dockerfile
* disable code server
* update samples
* Update feathr_init_script.py
* update notebook
* Update FeathrSandbox.Dockerfile
* Update local_quickstart_notebook.ipynb
* Update _feathr_registry_client.py
* Update setup.py
* remove numpy
* Update quickstart_local_sandbox.md
* Update quickstart_local_sandbox.md
* Add search function in sandbox
* Update db_registry_orm.py
* Update db_registry_orm.py
* Update db_registry_orm.py
* fix search issue
* udpate
* Update FeathrSandbox.Dockerfile
* update
* Update feathr_init_script.py
* merge ORM based registry
* Merge
* Update main.py
* Delete db_registry_orm.py
* update dependencies
* Update .prettierrc
* update docs
* Update database.py
* Update database.py
* Update database.py
* Add CI docker push
* Optimize image size
* Update local_quickstart_notebook.ipynb
* Update start_local.sh
* update based on comments
1 parent ae752c5 commit 290ceb3

25 files changed

Lines changed: 2004 additions & 205 deletions

.github/workflows/docker-publish.yml

Lines changed: 28 additions & 0 deletions
@@ -44,6 +44,34 @@ jobs:
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
 
+
+  build_and_push_feathr_sandbox_image:
+    name: Push Feathr Sandbox image to Docker Hub
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out the repo
+        uses: actions/checkout@v3
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@v2
+        with:
+          username: ${{ secrets.DOCKER_USERNAME }}
+          password: ${{ secrets.DOCKER_PASSWORD }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: feathrfeaturestore/feathr-sandbox
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v3
+        with:
+          context: .
+          file: FeathrSandbox.Dockerfile
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
   # Trigger Azure Web App webhooks to pull the latest nightly image
   deploy:
     runs-on: ubuntu-latest

FeathrSandbox.Dockerfile

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# TODO: persist the SQLite file in the volumes
+
+# Stage 1: build frontend ui
+FROM node:16-alpine as ui-build
+WORKDIR /usr/src/ui
+COPY ./ui .
+
+## Use api endpoint from same host and build production static bundle
+RUN echo 'REACT_APP_API_ENDPOINT=http://localhost:8000' >> .env.production
+RUN npm install && npm run build
+
+
+FROM jupyter/pyspark-notebook
+
+USER root
+
+## Install dependencies
+RUN apt-get update -y && apt-get install -y nginx freetds-dev sqlite3 libsqlite3-dev lsb-release redis gnupg redis-server lsof
+
+# UI Section
+## Remove default nginx index page and copy ui static bundle files
+RUN rm -rf /usr/share/nginx/html/*
+COPY --from=ui-build /usr/src/ui/build /usr/share/nginx/html
+COPY ./deploy/nginx.conf /etc/nginx/nginx.conf
+
+
+# Feathr Package Installation Section
+# always install feathr from main
+WORKDIR /home/jovyan/work
+COPY --chown=1000:100 ./feathr_project ./feathr_project
+RUN python -m pip install -e ./feathr_project
+
+
+# Registry Section
+# install registry
+COPY ./registry /usr/src/registry
+WORKDIR /usr/src/registry/sql-registry
+RUN pip install -r requirements.txt
+
+
+
+## Start service and then start nginx
+WORKDIR /usr/src/registry
+COPY ./feathr-sandbox/start_local.sh /usr/src/registry/
+
+# install code server
+# RUN curl -fsSL https://code-server.dev/install.sh | sh
+
+# default dir by the jupyter image
+WORKDIR /home/jovyan/work
+USER jovyan
+# copy as the jovyan user
+# UID is like this: uid=1000(jovyan) gid=100(users) groups=100(users)
+COPY --chown=1000:100 ./docs/samples/local_quickstart_notebook.ipynb .
+COPY --chown=1000:100 ./feathr-sandbox/feathr_init_script.py .
+
+# Run the script so that maven cache can be added for better experience. Otherwise users might have to wait for some time for the maven cache to be ready.
+RUN python feathr_init_script.py
+RUN python -m pip install interpret
+
+USER root
+WORKDIR /usr/src/registry
+RUN ["chmod", "+x", "/usr/src/registry/start_local.sh"]
+
+# remove ^M chars in Linux to make sure the script can run
+RUN sed -i "s/\r//g" /usr/src/registry/start_local.sh
+
+
+# install a Kafka single node instance
+# Reference: https://www.looklinux.com/how-to-install-apache-kafka-single-node-on-ubuntu/
+RUN wget https://downloads.apache.org/kafka/3.3.1/kafka_2.12-3.3.1.tgz && tar xzf kafka_2.12-3.3.1.tgz && mv kafka_2.12-3.3.1 /usr/local/kafka && rm kafka_2.12-3.3.1.tgz
+
+# /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
+# /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
+
+WORKDIR /home/jovyan/work
+
+
+# 80: Feathr UI
+# 8000: Feathr REST API
+# 8888: Jupyter
+# 8080: VsCode
+# 7080: Interpret
+EXPOSE 80 8000 8080 8888 7080 2181
+# run the service so we can initialize
+# RUN ["/bin/bash", "/usr/src/registry/start.sh"]
+CMD ["/bin/bash", "/usr/src/registry/start_local.sh"]
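
The port comments at the end of the Dockerfile map each service to a port. As a quick sanity check once a container is running, a short script along the following lines could report which of those ports accept TCP connections. This is only a sketch: it assumes the ports are published to `localhost` under the same numbers, and `SANDBOX_PORTS`, `is_port_open`, and `check_ports` are illustrative names that are not part of Feathr.

```python
import socket

# Port-to-service mapping copied from the comments above the EXPOSE line.
SANDBOX_PORTS = {
    80: "Feathr UI",
    8000: "Feathr REST API",
    8888: "Jupyter",
    8080: "VsCode",
    7080: "Interpret",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str = "localhost", ports=None) -> dict:
    """Map each service name to whether its port is reachable on host."""
    ports = SANDBOX_PORTS if ports is None else ports
    return {name: is_port_open(host, port) for port, name in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_ports().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

Since the code-server install step is commented out in this Dockerfile, port 8080 would be expected to show as not reachable even when the other services are up.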

docker/Dockerfile

Lines changed: 0 additions & 35 deletions
This file was deleted.

docker/supervisord.conf

Lines changed: 0 additions & 39 deletions
This file was deleted.

docs/how-to-guides/feathr-configuration-and-env.md

Lines changed: 7 additions & 1 deletion
@@ -30,7 +30,7 @@ Feathr will get the configurations in the following order:
 2. If it's not set in the environment, then a value is retrieved from the feathr_config.yaml file with the same config key.
 3. If it's not available in the feathr_config.yaml file, Feathr will try to retrieve the value from a key vault service. Currently only Azure Key Vault is supported.
 
-# A list of environment variables that Feathr uses
+# A list of environment variables that Feathr uses when running Spark jobs
 
 | Environment Variable | Description | Required? |
 | ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
@@ -85,6 +85,12 @@ Feathr will get the configurations in the following order:
 | FEATURE_REGISTRY__PURVIEW__TYPE_SYSTEM_INITIALIZATION (Deprecated Soon) | Controls whether the type system (think of this as the "schema" for the registry) will be initialized or not. Usually this only needs to be set to `True` to initialize the schema, and then you can set it to `False` to shorten the initialization time. | Required if using Purview directly without registry service. Will be deprecated soon; see [here](#deprecation) for more details. |
 | MAVEN_ARTIFACT_VERSION | Version number like `0.9.0`. Used to define maven package version when main jar is not defined. | Optional |
 
+# A list of environment variables that Feathr uses when running the registry and service
+| Environment Variable | Description | Required? |
+| ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
+| FEATHR_SANDBOX | If set to any value, the registry server runs in sandbox mode and connects to a local SQLite database. | Optional |
+| FEATHR_SANDBOX_REGISTRY_URL | If set, Feathr uses the registry file at the user-specified location. This is useful for persisting the SQLite file to a volume so that it is not lost when the Docker container restarts. | Optional |
+
 # Explanation for selected configurations
 
 ## MAVEN_ARTIFACT_VERSION
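
The lookup order described in this file (environment first, then `feathr_config.yaml`, then a key vault) and the two sandbox variables can be illustrated with a small sketch. This is not Feathr's actual implementation: `resolve_setting`, `sandbox_registry_settings`, the flat dictionary standing in for the YAML file, and the default SQLite path are all hypothetical and exist only to show the intended behavior.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_setting(
    key: str,
    env: Mapping[str, str],
    yaml_config: Mapping[str, str],
    key_vault_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Look a key up in the environment, then the YAML config, then a key vault."""
    if key in env:
        return env[key]
    if key in yaml_config:
        return yaml_config[key]
    if key_vault_lookup is not None:
        return key_vault_lookup(key)
    return None


# Hypothetical default; the real registry may use a different location.
DEFAULT_SANDBOX_DB = "/tmp/feathr_registry.sqlite"


def sandbox_registry_settings(env: Mapping[str, str]) -> Optional[dict]:
    """Return SQLite settings when FEATHR_SANDBOX is set, else None."""
    if "FEATHR_SANDBOX" not in env:
        return None
    return {
        "backend": "sqlite",
        "path": env.get("FEATHR_SANDBOX_REGISTRY_URL", DEFAULT_SANDBOX_DB),
    }


if __name__ == "__main__":
    # Inspect the current process environment.
    print(sandbox_registry_settings(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the functions, keeps the sketch easy to test with plain dictionaries.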

docs/images/feathr-sandbox-ui.png

31.4 KB

docs/images/feathr-sandbox.png

149 KB

docs/quickstart_local_sandbox.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+---
+layout: default
+title: Quick Start Guide with Local Sandbox
+---
+
+# Feathr Quick Start Guide with Local Sandbox
+
+We provide a local sandbox so that users can try Feathr easily. The goals of the Feathr Sandbox are to:
+
+- make it easier for users to get started,
+- make it easy to validate feature definitions and new ideas,
+- make it easier for Feathr developers to set up an environment and develop new things,
+- provide an interactive experience, where running a job usually takes less than 1 minute.
+
+As an end user, you can try out Feathr and become productive in less than 5 minutes.
+
+The Sandbox is ideal for:
+
+- Feathr users who want to get started quickly
+- Feathr developers who want to test new features, since this Docker image should contain everything they need. It comes with the Python package installed in editable mode so developers can iterate easily.
+
+## Getting Started
+
+To get started, simply run the command below. Note that the image is around 5 GB, so it might take a while to pull it from Docker Hub.
+
+```bash
+# 80: Feathr UI 8000: Feathr API 8888: Jupyter 8080: VsCode 7080: Interpret
+docker run -it --rm -p 8888:8888 -p 8000:8000 -p 80:80 -p 8080:8080 -p 7080:7080 --env CONNECTION_STR="Server=" --env API_BASE="api/v1" --env FEATHR_SANDBOX=True -e GRANT_SUDO=yes feathrfeaturestore/feathr-sandbox
+```
+
+The container should print a Jupyter link at `http://127.0.0.1:8888/`. Double-click the notebook file to open it, and you should see the Feathr sample notebook. Click the triangle button in the Jupyter notebook and the whole notebook will run locally.
+
+The default Jupyter notebook is available at:
+```bash
+http://localhost:8888/lab/workspaces/auto-w/tree/local_quickstart_notebook.ipynb
+```
+
+![Feathr Notebook](./images/feathr-sandbox.png)
+
+
+After running the notebook, all the features will be registered, and you can view them in the Feathr UI at:
+
+```bash
+http://localhost:80
+```
+
+
+After the notebook has run, you should see a project called `local_spark` in the Feathr UI. You can also view lineage in the Feathr UI and explore all the details.
+![Feathr UI](./images/feathr-sandbox-ui.png)
+
+![Feathr UI](./images/feathr-sandbox-lineage.png)
+
+## Components
+
+The Feathr sandbox comes with:
+- Built-in Jupyter Notebook
+- Pre-installed data science packages such as `interpret` so that data science development becomes easy
+- Pre-installed Feathr package
+- A local Spark environment for dev/test purposes
+- Feathr samples that can run locally
+- A local Feathr registry backed by SQLite
+- Feathr UI
+- Feathr Registry API
+- Local Redis server
+
+
+## Build Docker Container
+
+To build the Feathr sandbox image yourself, run the command below in the Feathr root directory:
+
+```bash
+docker build -f FeathrSandbox.Dockerfile -t feathrfeaturestore/feathr-sandbox .
+```
+
+
+## For Feathr Developers
+The Feathr package is copied to the user folder and installed with the `pip install -e` option, which means you can develop the Python package interactively. For example, if you want to validate changes, instead of setting up the environment, you can simply go to the
+
+
+Note that if you are using a Jupyter notebook to run the code, make sure you restart the Jupyter notebook so that the kernel can reload the Feathr package.
+You should be able to see the
+
+![Feathr Dev Experience](./images/feathr-sandbox-dev-experience.png)
+
+In the future, a VS Code Server might be installed so that you can do interactive development in the Docker container.
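
Because the package is installed with `pip install -e`, imports should resolve to the copied source tree rather than to a separate copy in `site-packages`. One way to confirm which files a freshly restarted kernel will import is to ask `importlib` where a module lives. The helper below is a generic sketch (`module_origin` is an illustrative name, not a Feathr API); inside the sandbox one would call it with `"feathr"` and, assuming the Dockerfile layout in this commit, expect a path under `/home/jovyan/work/feathr_project`.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints None when the package is not installed in this environment.
    print(module_origin("feathr"))
```

If the printed path points somewhere other than the editable source tree, edits to the source would not be picked up, even after the kernel restarts.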
