Commit a5976cb

tgrelszmigacz authored and committed
VAE-CF README fixes (NVIDIA#332)
1 parent 51b910f commit a5976cb

1 file changed

Lines changed: 10 additions & 13 deletions

TensorFlow/Recommendation/VAE-CF/README.md

@@ -1,4 +1,4 @@
-# Variational Autoencoder for Collaborative Filtering 19.11 for TensorFlow
+# Variational Autoencoder for Collaborative Filtering for TensorFlow
 
 This repository provides a script and recipe to train the Variational Autoencoder model for TensorFlow to achieve state-of-the-art accuracy on a Collaborative Filtering task and is tested and maintained by NVIDIA.
 
@@ -29,20 +29,19 @@ This repository provides a script and recipe to train the Variational Autoencode
 * [Inference performance benchmark](#inference-performance-benchmark)
 * [Results](#results)
 * [Training accuracy results](#training-accuracy-results)
-* [Training accuracy: NVIDIA DGX-1 (8x V100 16G)](#training-accuracy-nvidia-dgx-1-(8x-v100-16G))
-* [Training stability test](#training-stability-test)
+* [Training accuracy: NVIDIA DGX-1 (8x V100 16G)](#training-accuracy-nvidia-dgx-1-8x-v100-16g)
 * [Training performance results](#training-performance-results)
-* [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-(8x-v100-16G))
+* [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
 * [Inference performance results](#inference-performance-results)
-* [Inference performance: NVIDIA DGX-1 (1x V100 16G)](#inference-performance-nvidia-dgx-1-(1x-v100-16G))
+* [Inference performance: NVIDIA DGX-1 (1x V100 16G)](#inference-performance-nvidia-dgx-1-1x-v100-16g)
 - [Release notes](#release-notes)
 * [Changelog](#changelog)
 * [Known issues](#known-issues)
 
 
 ## Model overview
 
-The Variational Autoencoder (VAE) shown here is an optimized implementation of the architecture first described in Variational [Autoencoders for Collaborative Filtering] (https://arxiv.org/abs/1802.05814) and can be used for recommendation tasks. The main differences between this model and the original one are the performance optimizations, such as using sparse matrices, mixed precision, larger mini-batches and multiple GPUs. These changes enabled us to achieve a significantly better speed while maintaining the same accuracy. Because of our fast implementation, we’ve also been able to carry out an extensive hyperparameter search to slightly improve the accuracy metrics.
+The Variational Autoencoder (VAE) shown here is an optimized implementation of the architecture first described in [Variational Autoencoders for Collaborative Filtering](https://arxiv.org/abs/1802.05814) and can be used for recommendation tasks. The main differences between this model and the original one are the performance optimizations, such as using sparse matrices, mixed precision, larger mini-batches and multiple GPUs. These changes enabled us to achieve a significantly better speed while maintaining the same accuracy. Because of our fast implementation, we’ve also been able to carry out an extensive hyperparameter search to slightly improve the accuracy metrics.
 
 When using Variational Autoencoder for Collaborative Filtering (VAE-CF), you can quickly train a recommendation model for a collaborative filtering task. The required input data consists of pairs of user-item IDs for each interaction between a user and an item. With a trained model, you can run inference to predict what items are a new user most likely to interact with.
 
@@ -83,10 +82,12 @@ The following features are supported by this model:
 
 #### Features
 
-Horovod
+##### Horovod
+
 Horovod is a distributed training framework for TensorFlow, Keras, PyTorch and MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. For more information about how to get started with Horovod, see the [Horovod: Official repository](https://github.com/horovod/horovod).
 
-Multi-GPU training with Horovod
+##### Multi-GPU training with Horovod
+
 Our model uses Horovod to implement efficient multi-GPU training with NCCL. For details, see example sources in this repository or see the [TensorFlow tutorial](https://github.com/horovod/horovod/#usage).
 
 
@@ -134,14 +135,12 @@ For those unable to use the TensorFlow NGC container, to set up the required env
 To train your model using mixed precision with Tensor Cores or using FP32, perform the following steps using the default parameters of the VAE-CF model on the [MovieLens 20m dataset](https://grouplens.org/datasets/movielens/20m/). For the specifics concerning training and inference, see the [Advanced](#advanced) section.
 
 1. Clone the repository.
-
 ```bash
 git clone https://github.com/NVIDIA/DeepLearningExamples
 cd DeepLearningExamples/Tensorflow/Recommendation/VAE_CF
 ```
 
 2. Build the VAE TensorFlow NGC container.
-
 ```bash
 docker build . -t vae
 ```
@@ -160,11 +159,9 @@ python3 prepare_dataset.py
 ```bash
 python3 main.py --train --use_tf_amp --checkpoint_dir ./checkpoints
 ```
-6. Start validation/evaluation.
 
+6. Start validation/evaluation.
 The model is exported to the default `model_dir` and can be loaded and tested using:
-
-
 ```bash
 python3 main.py --test --use_tf_amp --checkpoint_dir ./checkpoints
 ```
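Most of this commit repairs table-of-contents links. GitHub gives each README heading an anchor derived from its text, and the old targets such as `#training-accuracy-nvidia-dgx-1-(8x-v100-16G)` kept the parentheses and an uppercase `G`, so they did not match any heading. The sketch below approximates that derivation for plain ASCII headings. The rule it encodes (lowercase, drop punctuation other than spaces, hyphens and underscores, then turn each space into a hyphen) is an assumption modeled on GitHub's observed behavior, and `slugify` is a hypothetical helper, not part of this repository.

```bash
# Hypothetical helper: approximate the anchor GitHub generates for an ASCII
# Markdown heading (assumed rule: lowercase, keep only letters, digits,
# spaces, hyphens and underscores, then replace each space with a hyphen).
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -e 's/[^a-z0-9 _-]//g' -e 's/ /-/g'
}

# The three headings whose TOC links this commit rewrites.
slugify 'Training accuracy: NVIDIA DGX-1 (8x V100 16G)'      # training-accuracy-nvidia-dgx-1-8x-v100-16g
slugify 'Training performance: NVIDIA DGX-1 (8x V100 16G)'   # training-performance-nvidia-dgx-1-8x-v100-16g
slugify 'Inference performance: NVIDIA DGX-1 (1x V100 16G)'  # inference-performance-nvidia-dgx-1-1x-v100-16g
```

The outputs match the new `+` anchors in the diff. Headings with non-ASCII characters or duplicate titles (which GitHub disambiguates with numeric suffixes) are outside what this sketch handles.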

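The Model overview hunk fixes a link that never rendered: in `[Autoencoders for Collaborative Filtering] (https://arxiv.org/abs/1802.05814)` the space between `]` and `(` stops Markdown from reading the pair as an inline link. A rough way to look for the same mistake in other READMEs is a fixed-string `grep`. `find_spaced_links` is a hypothetical helper, and it can also flag ordinary prose that happens to contain `] (`.

```bash
# Hypothetical helper: print "line:text" for each line containing "] (",
# which usually means a stray space broke an inline Markdown link.
find_spaced_links() {
  grep -nF '] (' "$1"
}

# Demo on the old and new versions of the Model overview sentence.
demo=$(mktemp)
printf '%s\n' \
  'described in Variational [Autoencoders for Collaborative Filtering] (https://arxiv.org/abs/1802.05814) and' \
  'described in [Variational Autoencoders for Collaborative Filtering](https://arxiv.org/abs/1802.05814) and' \
  > "$demo"
find_spaced_links "$demo"   # reports only line 1, the old broken form
rm -f "$demo"
```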