feathr-ai · xiaoyongzhu · Aug 2, 2022 · Jun 28, 2022 · Jul 15, 2022 · Jul 15, 2022
diff --git a/docs/concepts/feathr-concepts-for-beginners.md b/docs/concepts/feathr-concepts-for-beginners.md
@@ -126,9 +126,19 @@ client.get_online_features(feature_table = "agg_features",
 ## Illustration
 
 An illustration of the concepts and process that we talked about is like this:
-![Feature Join Process](../images/observation_data.jpg)
+![Observation Data and Feature Query Process](../images/observation_data.jpg)
 
-## Point in time joins and aggregations
+## FAQs on the Concepts
+
+### A bit more on `Observation Data`
+
+"Observation Data" can be confusing for beginners. Think of it as an immutable dataset that can be enriched by other datasets: you usually cannot drop a column from your "observation data", but you can add columns to it.
+
+### What's the relationship between `Source` and `Anchor`?
+
+Usually an `Anchor` can have only one `Source`, but one `Source` can be consumed by multiple anchors. Between `Source` and `Anchor` there may be an intermediate step, the "preprocessing" function, which lets you customize the input.
+
+### Point in time joins and aggregations - why do we need them?
 
 Assuming users are already familiar with the "regular" joins, for example inner join or outer join, and in many of the use cases, we care about time.
 

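
The "immutable but enrichable" idea in the FAQ above can be sketched in plain Python. This is a toy model, not Feathr code: `enrich_observation` is a hypothetical helper, and the sample values are made up. It only shows that every observation column is passed through while feature columns are appended.

```python
from typing import Dict, List

Row = Dict[str, object]


def enrich_observation(observation: List[Row],
                       features: Dict[object, Row],
                       key: str) -> List[Row]:
    """Return new rows: observation columns are kept, feature columns are appended."""
    enriched = []
    for row in observation:
        new_row = dict(row)  # copy, so the observation data itself is never mutated
        for name, value in features.get(row[key], {}).items():
            if name in new_row:
                # Enrichment may add columns but must not replace existing ones.
                raise ValueError(f"feature {name!r} would overwrite an observation column")
            new_row[name] = value
        enriched.append(new_row)
    return enriched


# Made-up sample data; column names follow the docs in this PR.
observation = [
    {"user_id": 1, "event_timestamp": "2021-01-01", "label": 0},
    {"user_id": 2, "event_timestamp": "2021-01-02", "label": 1},
]
user_features = {1: {"feature_user_age": 30}, 2: {"feature_user_age": 41}}

enriched = enrich_observation(observation, user_features, key="user_id")
```

The original rows keep exactly their three columns, while each enriched row gains `feature_user_age`.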
diff --git a/docs/concepts/feature-join.md b/docs/concepts/feature-join.md
diff --git a/docs/concepts/get-offline-features.md b/docs/concepts/get-offline-features.md
@@ -0,0 +1,87 @@
+---
+layout: default
+title: Getting Offline Features using Feature Query
+parent: Feathr Concepts
+---
+
+# Getting Offline Features using Feature Query
+
+## Intuitions
+
+After the feature producers have defined the features (as described in the [Feature Definition](./feature-definition.md) part), the feature consumers may want to consume those features.
+
+For example, consider the dataset below, which has 3 tables that feature producers want to extract features from: `user_profile_mock_data`, `user_purchase_history_mock_data`, and `product_detail_mock_data`.
+
+Feature consumers usually start from a central dataset ("observation data", `user_observation_mock_data` in this case) which contains a couple of IDs (`user_id` and `product_id` in this case), timestamps, and other columns. They use this "observation data" to query different feature tables (using `Feature Query` below).
+
+![Feature Flow](https://github.com/linkedin/feathr/blob/main/docs/images/product_recommendation_advanced.jpg?raw=true)
+
+As we can see, the use case for getting offline features using Feathr is straightforward. Feature consumers want to get a few features for a particular user: what's the gift card balance? What's the total purchase in the last 90 days? They can also get features for other entities in the same `get_offline_features` call. For example, at the same time they can query product features such as product quantity and price.
+
+In this case, Feathr users simply specify the names of the features they want to query and the entity/key to query them on, as shown below. Note that feature consumers don't have to query all the features; they can query just a subset of the features that the feature producers have defined.
+
+```python
+user_feature_query = FeatureQuery(
+    feature_list=["feature_user_age",
+                  "feature_user_tax_rate",
+                  "feature_user_gift_card_balance",
+                  "feature_user_has_valid_credit_card",
+                  "feature_user_total_purchase_in_90days",
+                  "feature_user_purchasing_power"
+                  ],
+    key=user_id)
+
+product_feature_query = FeatureQuery(
+    feature_list=[
+                  "feature_product_quantity",
+                  "feature_product_price"
+                  ],
+    key=product_id)
+```
+
+And specify the location for the observation data:
+
+```python
+settings = ObservationSettings(
+    observation_path="wasbs://public@azurefeathrstorage.blob.core.windows.net/sample_data/product_recommendation_sample/user_observation_mock_data.csv",
+    event_timestamp_column="event_timestamp",
+    timestamp_format="yyyy-MM-dd")
+```
+
+Finally, pass the feature queries and trigger the computation:
+
+```python
+client.get_offline_features(observation_settings=settings,
+                            feature_query=[user_feature_query, product_feature_query],
+                            output_path=output_path)
+
+```
+
+More details for the above APIs can be read from:
+
+- [ObservationSettings API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.ObservationSettings)
+- [client.get_offline_features API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.get_offline_features)
+
+## More on `Observation data`
+
+The observation data path (`observation_path`) points to a dataset that serves as the 'spine' of the to-be-created training dataset. We call this input 'spine' dataset the 'observation' dataset. Typically, each row of the observation data contains:
+
+1. **Entity ID Column:** Column(s) representing entity id(s), which will be used as the join key to query feature value.
+
+2. **Timestamp Column:** A column representing the event time of the row. By default, Feathr will make sure the feature values queried have a timestamp earlier than the timestamp in observation data, ensuring no data leakage in the resulting training dataset. Refer to [Point in time Joins](./point-in-time-join.md) for more details.
+
+3. **Other columns:** These are simply passed through to the output training dataset and can be treated as immutable columns.
+
+## More on `Feature Query`
+
+After you have defined all the features, you probably don't want to use all of them in a particular program. In this case, instead of putting every feature in the `FeatureQuery`, you can put just a selected list of features. Note that all features in one `FeatureQuery` have to share the same key.
+
+## Difference between `materialize_features` and `get_offline_features` API
+
+"Getting offline features" in this document and "[getting materialized features](./materializing-features.md)" are sometimes confused, given that they both seem to "get features and put them somewhere". However, there are some differences, and you should know when to use which:
+
+1. For the `get_offline_features` API, feature consumers usually need a central `observation data` so they can use `Feature Query` to query different features for different entities from different tables. For the `materialize_features` API, feature consumers don't have the `observation data`, because they don't need to query from existing feature definitions. In this case, feature consumers only need to specify, for a specific entity (say `user_id`), which features they want to materialize to the offline or online store. Note that for a feature table in the materialization settings, feature consumers can only materialize features that share the same key.
+
+2. For timestamps, the `get_offline_features` API makes sure the feature values queried have a timestamp earlier than the timestamp in the observation data, ensuring no data leakage in the resulting training dataset. The `materialize_features` API always materializes the latest feature value available in the dataset.
+
+3. The two APIs are used in two different stages of the feature engineering pipeline and serve different purposes. `get_offline_features` is usually used to get data for model training and focuses on getting historical data from offline storage, while `materialize_features` is usually used to pre-compute features for model inference via the online store.
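
The timestamp difference described in point 2 can be illustrated with a small standalone sketch. This is a toy model in plain Python, not how Feathr implements its joins: `value_as_of` and `latest_value` are hypothetical helpers, the history is made-up data, and treating "earlier than" as strictly earlier is an assumption about the boundary case.

```python
from bisect import bisect_left
from datetime import date
from typing import List, Optional, Tuple

# (timestamp, value) pairs for one entity, sorted by timestamp. Made-up sample data.
History = List[Tuple[date, float]]


def value_as_of(history: History, observation_time: date) -> Optional[float]:
    """Point-in-time lookup: the latest value recorded strictly before observation_time."""
    times = [ts for ts, _ in history]
    idx = bisect_left(times, observation_time)  # count of entries strictly earlier
    return history[idx - 1][1] if idx else None


def latest_value(history: History) -> Optional[float]:
    """Materialization-style lookup: always the most recent value, whatever the query time."""
    return history[-1][1] if history else None


gift_card_balance = [
    (date(2021, 1, 1), 100.0),
    (date(2021, 2, 1), 60.0),
    (date(2021, 3, 1), 20.0),
]

# A training row observed in mid-February must not see the March value.
training_value = value_as_of(gift_card_balance, date(2021, 2, 15))
# An observation older than every feature record gets no value at all.
too_early = value_as_of(gift_card_balance, date(2020, 12, 31))
# Materialization would publish the newest value.
serving_value = latest_value(gift_card_balance)
```

Here the training row sees the February balance, while the materialized value is the March balance; using the latter for a February training row would leak future information.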
diff --git a/docs/concepts/feature-generation.md → docs/concepts/materializing-features.md b/docs/concepts/feature-generation.md → docs/concepts/materializing-features.md
@@ -1,16 +1,16 @@
 ---
 layout: default
-title: Feature Generation and Materialization
+title: Feature Materialization (also known as feature generation)
 parent: Feathr Concepts
 ---
 
-# Feature Generation and Materialization
+# Feature Materialization (also known as feature generation)
 
-Feature generation (also known as feature materialization) is the process to create features from raw source data into a certain persisted storage in either offline store (for further reuse), or online store (for online inference).
+Feature materialization (also known as feature generation) is the process of creating features for a certain entity from raw source data and persisting them in either an offline store (for further reuse) or an online store (for online inference).
 
-User can utilize feature generation to pre-compute and materialize pre-defined features to online and/or offline storage. This is desirable when the feature transformation is computation intensive or when the features can be reused (usually in offline setting). Feature generation is also useful in generating embedding features, where those embeddings distill information from large data and is usually more compact.
+Users can utilize feature generation to pre-compute and materialize pre-defined features to online and/or offline storage. This is desirable when the feature transformation is computation intensive or when the features can be reused (usually in an offline setting). Feature generation is also useful for generating embedding features, where the embeddings distill information from large data and are usually more compact. Also, please note that you can only materialize features for a specific entity/key in the same `materialize_features` call.
 
-## Generating Features to Online Store
+## Materializing Features to Online Store
 
 When the models are served in an online environment, we also need to serve the corresponding features in the same online environment as well. Feathr provides APIs to generate features to online storage for future consumption. For example:
 
@@ -119,7 +119,7 @@ client.materialize_features(settings, execution_configurations={ "spark.feathr.o
 For reading those materialized features, Feathr has a convenient helper function called `get_result_df` to help you view the data. For example, you can use the sample code below to read from the materialized result in offline store:
 
 ```python
-
+from feathr import get_result_df
 path = "abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/20/"
 res = get_result_df(client=client, format="parquet", res_url=path)
 ```

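
The sample path passed to `get_result_df` above ends in a `daily/2020/05/20/` suffix. Assuming that year/month/day layout holds for other dates (an assumption read off this one example, not a documented guarantee), a path for a given day could be built with a small helper. `daily_output_path` is a hypothetical name and is not part of Feathr.

```python
from datetime import date


def daily_output_path(base_path: str, day: date) -> str:
    """Append a daily/YYYY/MM/DD/ suffix to base_path (layout assumed from the sample path)."""
    return f"{base_path.rstrip('/')}/daily/{day:%Y/%m/%d}/"


# Base location taken from the sample in the docs.
base = ("abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/"
        "materialize_offline_test_data/df0")
path = daily_output_path(base, date(2020, 5, 20))
```

The resulting `path` matches the hard-coded string in the sample and could be passed as `res_url` in the same way.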
diff --git a/docs/dev_guide/feathr_overall_release_guide.md b/docs/dev_guide/feathr_overall_release_guide.md
@@ -4,26 +4,36 @@ title: Developer Guide for Feathr Overall Release Guide
 parent: Developer Guides
 ---
 
-# When to Release
-- For each major and minor version release, please follow these steps. 
+# Feathr Overall Release Guide
+
+This document describes the release process for the development team.
+
+## When to Release
+
+- For each major and minor version release, please follow these steps.
 - For patch versions, there should be no releases.
 
-# Writing Release Note
+## Writing Release Note
+
 Write a release note following past examples [here](https://github.com/linkedin/feathr/releases).
 Read through the [commit log](https://github.com/linkedin/feathr/commits/main) to identify the commits after last release to include in the release note. Here are the major things to include
+
 - highlights of the release
 - improvements and changes of this release
 - new contributors of this release
 
+## Release Maven
 
-# Release Maven
 See [Developer Guide for publishing to maven](publish_to_maven.md)
 
 ## Upload Feathr Jar
+
 Run the command to generate the Java jar. After the jar is generated, please upload to [Azure storage](https://ms.portal.azure.com/#view/Microsoft_Azure_Storage/ContainerMenuBlade/~/overview/storageAccountId/%2Fsubscriptions%2Fa6c2a7cc-d67e-4a1a-b765-983f08c0423a%2FresourceGroups%2Fazurefeathrintegration%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fazurefeathrstorage/path/public/etag/%220x8D9E6F64D62D599%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container) for faster access.
 
-# Release PyPi
+## Release PyPi
+
 See [Python Package Release Note](python_package_release.md)
 
-# Announcement
+## Announcement
+
 Please announce the release in our #general Slack channel.
diff --git a/docs/dev_guide/publish_to_maven.md b/docs/dev_guide/publish_to_maven.md
@@ -3,13 +3,15 @@ layout: default
 title: Developer Guide for publishing to maven
 parent: Developer Guides
 ---
+
 # Developer Guide for publishing to maven
 
 ## Manual Publishing
 
 1. Get account details to login to https://oss.sonatype.org/
 2. Install GPG, setup keys, and export to a key server
-```
+
+```bash
 $ gpg --gen-key
 ...
 Real name: Central Repo Test
@@ -32,37 +34,46 @@ $ gpg --keyserver keyserver.ubuntu.com --recv-keys CA925CD6C9E8D064FF05B4728190C
 if failing to programmatically export to key server, you can export it manually and upload to http://keyserver.ubuntu.com/ via `submit key`
 
 run the following command to generated the ASCII-armored public key needed by the key server
+
 ```
 gpg --armor --export user-id > pubkey.asc
 ```
+
 https://www.linuxbabe.com/security/a-practical-guide-to-gpg-part-1-generate-your-keypair
 
 3. Setup your credentials locally at `$HOME/.sbt/0.13/sonatype.sbt`
+
 ```
 credentials += Credentials("Sonatype Nexus Repository Manager",
         "oss.sonatype.org",
         "(Sonatype user name)",
         "(Sonatype password)")
 ```
+
 (ref, https://github.com/xerial/sbt-sonatype)
 
 4. Publish to maven via sbt
-In your feathr directory, clear your cache to prevent stale errors
+   In your feathr directory, clear your cache to prevent stale errors
+
 ```
 rm -rf target/sonatype-staging/
 ```
+
 Start sbt console by running
+
 ```
 sbt -java-home /Library/Java/JavaVirtualMachines/jdk1.8.0_282-msft.jdk/Contents/Home
 ```
+
 Execute command in sbt console to publish to maven
+
 ```
 reload
  ; publishSigned; sonatypeBundleRelease
 ```
 
 5. "Upon release, your component will be published to Central: this typically occurs within 30 minutes, though updates to search can take up to four hours."
-https://central.sonatype.org/publish/publish-guide/#releasing-to-central
+   https://central.sonatype.org/publish/publish-guide/#releasing-to-central
 
 6. After new version is released via Maven, use the released version to run a test to ensure it actually works. You can do this by running a codebase that imports Feathr scala code.
 
@@ -72,9 +83,6 @@ https://central.sonatype.org/publish/publish-guide/#releasing-to-central
 
 ### References
 
-
-
 https://central.sonatype.org/publish/publish-guide/#deployment
 
 https://www.scala-sbt.org/1.x/docs/Using-Sonatype.html
-