Commit 12c7bc2

Merge pull request NVIDIA#45 from eric-haibin-lin/patch-1
Fix a few typos, and add gluonnlp implementation link
2 parents 9712cf8 + e46eed4 commit 12c7bc2

1 file changed

Lines changed: 8 additions & 7 deletions

TensorFlow/LanguageModeling/BERT/README.md

@@ -36,14 +36,14 @@ This repository provides a script and recipe to train BERT to achieve state of t
 
 ## The model
 
-BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. This model is based on [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) paper. NVIDIA's BERT 19.03 is an optimized version of [Google's official implementation](https://github.com/google-research/bert), leveraging mixed precision arithmetic and tensor cores on V100 GPUS for faster training times while maintaining target accuracy.
+BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. This model is based on [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) paper. NVIDIA's BERT 19.03 is an optimized version of [Google's official implementation](https://github.com/google-research/bert), leveraging mixed precision arithmetic and tensor cores on V100 GPUS for faster training times while maintaining target accuracy.
 
 
 The repository also contains scripts to interactively launch data download, training, benchmarking and inference routines in a Docker container for both pretraining and fine tuning for Question Answering. The major differences between the official implementation of the paper and our version of BERT are as follows:
 - Mixed precision support with TensorFlow Automatic Mixed Precision (TF-AMP), which enables mixed precision training without any changes to the code-base by performing automatic graph rewrites and loss scaling controlled by an environmental variable.
 - Scripts to download dataset for
-- Pretraining - [Wikipedia](https://dumps.wikimedia.org/), [BookCorpus](http://yknzhu.wixsite.com/mbweb)
-- Fine Tuning - [SQuaD](https://rajpurkar.github.io/SQuAD-explorer/) (Stanford Question Answering Dataset), Pretrained Weights from Google
+- Pretraining - [Wikipedia](https://dumps.wikimedia.org/), [BooksCorpus](http://yknzhu.wixsite.com/mbweb)
+- Fine Tuning - [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) (Stanford Question Answering Dataset), Pretrained Weights from Google
 - Custom fused CUDA kernels for faster computations
 - Multi-GPU/Multi-Node support using Horovod
 
@@ -58,6 +58,7 @@ These techniques and optimizations improve model performance and reduce training
 Other publicly available implementations of BERT include:
 1. [Hugging Face](https://github.com/huggingface/pytorch-pretrained-BERT)
 2. [codertimo](https://github.com/codertimo/BERT-pytorch)
+3. [gluon-nlp](https://github.com/dmlc/gluon-nlp/tree/master/scripts/bert)
 
 
 This model trains with mixed precision tensor cores on Volta, therefore researchers can get results much faster than training without tensor cores. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
@@ -121,14 +122,14 @@ bash scripts/docker/launch.sh
 ```
 
 The `launch.sh` script assumes that the datasets are in the following locations by default after downloading data.
-- SQuaD v1.1 - `data/squad/v1.1`
+- SQuAD v1.1 - `data/squad/v1.1`
 - BERT - `data/pretrained_models_google/uncased_L-24_H-1024_A-16`
 - Wikipedia - `data/wikipedia_corpus/final_tfrecords_sharded`
-- BookCorpus - `data/bookcorpus/final_tfrecords_sharded`
+- BooksCorpus - `data/bookcorpus/final_tfrecords_sharded`
 
 
 ### 5. Start pre-training.
-BERT is designed to pre-train deep bidirectional representations for language representations. The following scripts are to replicate pretraining on Wikipedia+Book Corpus from the [paper](https://arxiv.org/pdf/1810.04805.pdf). These scripts are general and can be used for pretraining language representations on any corpus of choice.
+BERT is designed to pre-train deep bidirectional representations for language representations. The following scripts are to replicate pretraining on Wikipedia+Books Corpus from the [paper](https://arxiv.org/pdf/1810.04805.pdf). These scripts are general and can be used for pretraining language representations on any corpus of choice.
 
 From within the container, you can use the following script to run pre-training.
 ```bash
@@ -222,7 +223,7 @@ Aside from options to set hyperparameters, some relevant options to control the
 ```
 
 ### Getting the data
-For pre-training BERT, we use the concatenation of Wikipedia (2500M words) as well as Book Corpus (800M words). For Wikipedia, we extract only the text passages from [here](ftp://ftpmirror.your.org/pub/wikimedia/dumps/enwiki/20190301/enwiki-20190301-pages-articles-multistream.xml.bz2) and ignore headers list and tables. It is structured as a document level corpus rather than a shuffled sentence level corpus because it is critical to extract long contiguous sentences. The next step is to run `create_pretraining_data.py` with the document level corpus as input, which generates input data and labels for the masked language modeling and next sentence prediction tasks. Pre-training can also be performed on any corpus of your choice. The collection of data generation scripts are intended to be modular to allow modifications for additional preprocessing steps or to use additional data.
+For pre-training BERT, we use the concatenation of Wikipedia (2500M words) as well as Books Corpus (800M words). For Wikipedia, we extract only the text passages from [here](ftp://ftpmirror.your.org/pub/wikimedia/dumps/enwiki/20190301/enwiki-20190301-pages-articles-multistream.xml.bz2) and ignore headers list and tables. It is structured as a document level corpus rather than a shuffled sentence level corpus because it is critical to extract long contiguous sentences. The next step is to run `create_pretraining_data.py` with the document level corpus as input, which generates input data and labels for the masked language modeling and next sentence prediction tasks. Pre-training can also be performed on any corpus of your choice. The collection of data generation scripts are intended to be modular to allow modifications for additional preprocessing steps or to use additional data.
 
 We can use a pre-trained BERT model for other fine tuning tasks like Question Answering. We use SQuaD for this task. SQuaD v1.1 has 100,000+ question-answer pairs on 500+ articles. SQuaD v2.0 combines v1.1 with an additional 50,000 new unanswerable questions and must not only answer questions but also determine when that is not possible.
 