
Commit 5e42cc8

add doc for new bert example (deepspeedai#2224)
1 parent 7d8ad45 commit 5e42cc8

1 file changed

Lines changed: 4 additions & 0 deletions


docs/_tutorials/bert-pretraining.md

@@ -4,6 +4,10 @@ excerpt: ""
 tags: training pre-training
 ---
 
+**Note:**
+On 08/15/2022 we added another BERT pre-training/fine-tuning example at [github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/bert_with_pile](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/bert_with_pile), which includes a README.md that describes how to use it. Compared to the example described below, the new example in Megatron-DeepSpeed adds support for ZeRO and tensor-slicing model parallelism (and thus supports larger model scales), uses the public and richer [Pile dataset](https://github.com/EleutherAI/the-pile) (users can also use their own data), and makes some changes to the model architecture and training hyperparameters as described in [this paper](https://arxiv.org/abs/1909.08053). As a result, the BERT models trained by the new example are able to provide better MNLI results than the original BERT, but with a slightly different model architecture and larger computation requirements. If you want to train a larger-scale or better-quality BERT-style model, we recommend following the new example in Megatron-DeepSpeed. If your goal is to strictly reproduce the original BERT model, we recommend following the example under DeepSpeedExamples/bing_bert as described below. In either case, the tutorial below helps explain how to integrate DeepSpeed into a pre-training codebase, regardless of which BERT example you use.
+{: .notice--info}
+
 In this tutorial we will apply DeepSpeed to pre-train the BERT
 (**B**idirectional **E**ncoder **R**epresentations from **T**ransformers),
 which is widely used for many Natural Language Processing (NLP) tasks. The
