# Create a feature repository

We believe that the best way to keep track of your feature definitions is to manage them as code. To define features, you write your feature and data source declarations in pure Python. The Feast CLI then reads the Python files containing these definitions, parses them, and helps you create and manage the infrastructure required to serve the features in production.

## What is a Feature Repository?

A Feature Repository is nothing more than a collection of Python files containing feature declarations, plus a config file with some Feast settings. Feast users typically store these files in a git repository, hence the name. Note, however, that Feast makes no hard assumptions about the structure of your source control repository and doesn't even require you to use git.

## Creating a Feature Repository

The easiest way to get started is to use the `feast init` command:

```bash
$ mkdir my_feature_repo && cd my_feature_repo
$ feast init
Generated feature_store.yaml and example features in example_repo.py
Now try runing `feast apply` to apply, or `feast materialize` to sync data to the online store
```

As you can see, all `feast init` does is create a Python file with feature definitions, some sample data, and a Feast configuration for local development:

```bash
$ tree
.
├── data
│   └── driver_stats.parquet
├── example.py
└── feature_store.yaml

1 directory, 3 files
```

## What's Inside a Feature Repository

The Feast configuration is stored in a file named `feature_store.yaml`. Python feature definition files can be named however you like, as long as the names are valid Python module names \(so no dashes\). A repository can also contain multiple definition files.
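
Whether a file name is a valid Python module name can be checked with the standard library alone. The sketch below is hypothetical and not part of Feast: the helpers `is_valid_module_name` and `find_unimportable_files` are made-up names, and the sketch assumes definition files sit directly next to `feature_store.yaml` (it does not look in subdirectories).

```python
import keyword
import tempfile
from pathlib import Path
from typing import List


def is_valid_module_name(filename: str) -> bool:
    """Return True if a file such as "driver_features.py" could be imported as a module."""
    stem = Path(filename).stem
    # A module name must be a valid identifier (letters, digits and underscores,
    # not starting with a digit) and must not be a reserved keyword like "class".
    return stem.isidentifier() and not keyword.iskeyword(stem)


def find_unimportable_files(repo_path: str) -> List[str]:
    """Hypothetical check: list top-level .py files in a repo whose names are not importable."""
    repo = Path(repo_path)
    # In this sketch, a directory counts as a feature repository only if it has a Feast config file.
    if not (repo / "feature_store.yaml").exists():
        raise FileNotFoundError(f"no feature_store.yaml in {repo}")
    return sorted(p.name for p in repo.glob("*.py") if not is_valid_module_name(p.name))


if __name__ == "__main__":
    # Build a throwaway repository with one valid and one invalid file name.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "feature_store.yaml").write_text("project: demo\n")
        (repo / "driver_features.py").write_text("")
        (repo / "driver-features.py").write_text("")
        print(find_unimportable_files(tmp))  # ['driver-features.py']
```

A name such as `driver-features.py` fails because a dash is not allowed in a Python identifier, so `import driver-features` would be a syntax error.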

If you take a look at `feature_store.yaml`, you'll see something like this:

{% code title="feature\_store.yaml" %}
```yaml
project: robust_tortoise
metadata_store: data/metadata.db
provider: local
online_store:
  local:
    path: data/online_store.db
```
{% endcode %}

Here `project` is a unique identifier for the Feature Repository, generated by `feast init`. Notice also that this configuration uses the `local` provider, which is most useful for development because all data is stored and served locally on your computer. Because we're using the local provider, both the metadata store and the online feature store are just files on your local file system.

Now, if you open `example.py`, you'll see some example Feature View and Data Source definitions. The file is too large to quote here, but its contents should look something like this:

```python
from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

...

# Data source: where the raw driver statistics are read from
driver_hourly_stats = FileSource(
    ...
)

# Entity: the key (a driver ID) that features are looked up by
driver = Entity(...)

# Feature View: a named group of features keyed by the driver_id entity
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ...
)
```

To declare Feature Views and other objects in a Feast Feature Repository, simply write Python code that instantiates the objects, sets their parameters, and assigns each one to a top-level module variable.

The Feast CLI processes all Python files in the Feature Repository as modules and finds all top-level variables. You don't need to name Python files or variables in any particular way; just make sure there is a separate variable for each Feast object.
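
The discovery step can be sketched with `importlib`. This is a hypothetical simplification, not Feast's actual implementation: `Entity` and `FeatureView` below are minimal stand-in classes (registered under a made-up module name, `toy_feast`), not the real Feast classes, and `discover_objects` is an illustrative name. The sketch imports each `.py` file as a module and keeps the top-level variables whose values are instances of the stand-in classes.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path
from typing import Dict, List, Optional


# Hypothetical stand-ins for Feast objects; these are NOT Feast's real classes.
class Entity:
    def __init__(self, name: str):
        self.name = name


class FeatureView:
    def __init__(self, name: str, entities: Optional[List[str]] = None):
        self.name = name
        self.entities = entities or []


# Register the stand-ins as an importable module named "toy_feast" so that
# definition files can import them and get these exact class objects.
toy_feast = types.ModuleType("toy_feast")
toy_feast.Entity = Entity
toy_feast.FeatureView = FeatureView
sys.modules["toy_feast"] = toy_feast


def discover_objects(repo_path: str) -> Dict[str, object]:
    """Import each .py file in repo_path and collect top-level Entity/FeatureView objects."""
    found: Dict[str, object] = {}
    for path in sorted(Path(repo_path).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        # Only values bound directly to a top-level variable are inspected.
        for var_name, value in vars(module).items():
            if isinstance(value, (Entity, FeatureView)):
                found[f"{path.stem}.{var_name}"] = value
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "driver_repo.py").write_text(
            "from toy_feast import Entity, FeatureView\n"
            "driver = Entity(name='driver_id')\n"
            "driver_stats_view = FeatureView(name='driver_hourly_stats', entities=['driver_id'])\n"
            "# Not assigned to its own top-level variable, so it is not discovered:\n"
            "views = [FeatureView(name='hidden_view')]\n"
        )
        print(sorted(discover_objects(tmp)))
        # ['driver_repo.driver', 'driver_repo.driver_stats_view']
```

In this sketch, the `FeatureView` created inside the `views` list is skipped because it is not bound to its own top-level variable, which mirrors the advice to give each Feast object a separate variable.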