Commit 6fe463f

[BERT/PyT] Support for multi-node
1 parent b07f501 commit 6fe463f

43 files changed

Lines changed: 929 additions & 264 deletions

Lines changed: 19 additions & 7 deletions
@@ -1,8 +1,20 @@
-data/download/
-data/extracted/
-data/formatted_one_article_per_line/
-data/sharded/
-data/hdf5/
+# Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+data/download
+data/extracted
+data/formatted_one_article_per_line
+data/sharded
+data/hdf5
 vocab/
-results/
-checkpoints/*
+results/

PyTorch/LanguageModeling/BERT/.gitignore

Lines changed: 3 additions & 6 deletions
@@ -8,14 +8,11 @@ __pycache__/
 # C extensions
 *.so
 
-#Data
+#Data checkpoints and results
 data/*/*/
 data/*/*.zip
-data/*
-
-#checkpoints and results
-checkpoints/*
-results/*
+checkpoints/
+results/
 
 # Distribution / packaging
 .Python

PyTorch/LanguageModeling/BERT/Dockerfile

Lines changed: 14 additions & 16 deletions
@@ -1,24 +1,22 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.07-py3
+# Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.08-py3
 FROM ${FROM_IMAGE_NAME}
 RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract
 
 ENV BERT_PREP_WORKING_DIR /workspace/bert/data
 
-WORKDIR /opt
-RUN rm -rf /opt/pytorch/apex ; \
-git clone https://github.com/NVIDIA/apex.git pytorch/apex ; \
-cd pytorch/apex ; \
-pip uninstall --yes apex; \
-git checkout 880ab925bce9f817a93988b021e12db5f67f7787; \
-git pull; \
-pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
-
-#WORKDIR /opt
-#RUN cd pytorch/apex \
-# && git fetch origin pull/334/head:multi_tensor_lamb_optimizer \
-# && git checkout multi_tensor_lamb_optimizer \
-# && python setup.py develop --cuda_ext --cpp_ext
-
 WORKDIR /workspace
 RUN git clone https://github.com/attardi/wikiextractor.git
 RUN git clone https://github.com/soskek/bookcorpus.git

PyTorch/LanguageModeling/BERT/LICENSE

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,3 @@
-
 Apache License
 Version 2.0, January 2004
 http://www.apache.org/licenses/
@@ -176,6 +175,8 @@
 
 END OF TERMS AND CONDITIONS
 
+Copyright 2019 NVIDIA CORPORATION. All rights reserved.
+
 APPENDIX: How to apply the Apache License to your work.
 
 To apply the Apache License to your work, attach the following

PyTorch/LanguageModeling/BERT/README.md

Lines changed: 278 additions & 186 deletions
Large diffs are not rendered by default.

PyTorch/LanguageModeling/BERT/bind_pyt.py

Lines changed: 13 additions & 0 deletions
@@ -1,3 +1,16 @@
+# Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import sys
 import subprocess
 import os
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
+# Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#1 DGX1 phase1
+bert--DGX1:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX1
+  variables:
+    <<: *DGX1_VARS
+    NNODES: "1"
+    BATCHSIZE: "8192"
+    LR: "6e-3"
+    GRADIENT_STEPS: "512"
+    PHASE: "1"
+
+#4 DGX1 phase1
+bert--DGX1_4x8x16x128:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX1
+  variables:
+    <<: *DGX1_VARS
+    NNODES: "4"
+    BATCHSIZE: "2048"
+    LR: "6e-3"
+    GRADIENT_STEPS: "128"
+    PHASE: "1"
+
+#16 DGX1 phase1
+bert--DGX1_16x8x16x32:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX1
+  variables:
+    <<: *DGX1_VARS
+    NNODES: "16"
+    BATCHSIZE: "512"
+    LR: "6e-3"
+    GRADIENT_STEPS: "32"
+    PHASE: "1"
+
+#1 DGX2 phase1
+bert--DGX2:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "1"
+    BATCHSIZE: "4096"
+    LR: "6e-3"
+    GRADIENT_STEPS: "64"
+    PHASE: "1"
+
+#4 DGX2 phase1
+bert--DGX2_4x16x64x16:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "4"
+    BATCHSIZE: "1024"
+    LR: "6e-3"
+    GRADIENT_STEPS: "16"
+    PHASE: "1"
+
+#16 DGX2 phase1
+bert--DGX2_16x16x64x4:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "16"
+    BATCHSIZE: "256"
+    LR: "6e-3"
+    GRADIENT_STEPS: "4"
+    PHASE: "1"
+
+#64 DGX2 phase1
+bert--DGX2_64x16x64:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "64"
+    BATCHSIZE: "64"
+    LR: "6e-3"
+    GRADIENT_STEPS: "1"
+    PHASE: "1"
+
+#1 DGX1 phase2
+bert--DGX1_1x8x4x1024:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX1
+  variables:
+    <<: *DGX1_VARS
+    NNODES: "1"
+    BATCHSIZE: "4096"
+    LR: "4e-3"
+    GRADIENT_STEPS: "1024"
+    PHASE: "2"
+
+#4 DGX1 phase2
+bert--DGX1_4x8x4x256:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX1
+  variables:
+    <<: *DGX1_VARS
+    NNODES: "4"
+    BATCHSIZE: "1024"
+    LR: "4e-3"
+    GRADIENT_STEPS: "256"
+    PHASE: "2"
+
+#16 DGX1 phase2
+bert--DGX1_16x8x4x64:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX1
+  variables:
+    <<: *DGX1_VARS
+    NNODES: "16"
+    BATCHSIZE: "256"
+    LR: "4e-3"
+    GRADIENT_STEPS: "64"
+    PHASE: "2"
+
+#1 DGX2 phase2
+bert--DGX2_1x16x8x256:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "1"
+    BATCHSIZE: "2048"
+    LR: "4e-3"
+    GRADIENT_STEPS: "256"
+    PHASE: "2"
+
+#4 DGX2 phase2
+bert--DGX2_4x16x8x64:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "4"
+    BATCHSIZE: "512"
+    LR: "4e-3"
+    GRADIENT_STEPS: "64"
+    PHASE: "2"
+
+#16 DGX2 phase2
+bert--DGX2_16x16x8x16:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "16"
+    BATCHSIZE: "128"
+    LR: "4e-3"
+    GRADIENT_STEPS: "16"
+    PHASE: "2"
+
+#64 DGX2 phase2
+bert--DGX2_64x16x8x4:
+  <<: *BERT_ON_CLUSTER
+  <<: *DGX2
+  variables:
+    <<: *DGX2_VARS
+    NNODES: "64"
+    BATCHSIZE: "32"
+    LR: "4e-3"
+    GRADIENT_STEPS: "4"
+    PHASE: "2"
+
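
The job names in this file appear to encode nodes x GPUs per node x per-GPU micro-batch x accumulation steps (for example `DGX1_4x8x16x128`). The sketch below checks that reading against the values in the diff. It is an illustration only, not part of the commit: the `Job` class and `GPUS_PER_NODE` table are hypothetical names, and it assumes that `BATCHSIZE` is the per-GPU batch per optimizer step and that `GRADIENT_STEPS` is the number of gradient accumulation steps. The GPU counts (8 for DGX-1, 16 for DGX-2) are the standard hardware configurations.

```python
from dataclasses import dataclass

# GPUs per node for each system type (DGX-1: 8 GPUs, DGX-2: 16 GPUs).
GPUS_PER_NODE = {"DGX1": 8, "DGX2": 16}


@dataclass(frozen=True)
class Job:
    """Hypothetical mirror of one job's `variables:` block from the diff."""
    system: str          # "DGX1" or "DGX2"
    nnodes: int          # NNODES
    batchsize: int       # BATCHSIZE (assumed: per-GPU batch per optimizer step)
    gradient_steps: int  # GRADIENT_STEPS (assumed: accumulation steps)
    phase: int           # PHASE

    @property
    def micro_batch(self) -> int:
        """Sequences processed per GPU in one forward/backward pass."""
        if self.batchsize % self.gradient_steps != 0:
            raise ValueError("BATCHSIZE must be divisible by GRADIENT_STEPS")
        return self.batchsize // self.gradient_steps

    @property
    def global_batch(self) -> int:
        """Sequences contributing to one optimizer step across all GPUs."""
        return self.nnodes * GPUS_PER_NODE[self.system] * self.batchsize


# Values copied from the job definitions above.
JOBS = [
    Job("DGX1", 1, 8192, 512, 1),
    Job("DGX1", 4, 2048, 128, 1),
    Job("DGX1", 16, 512, 32, 1),
    Job("DGX2", 1, 4096, 64, 1),
    Job("DGX2", 4, 1024, 16, 1),
    Job("DGX2", 16, 256, 4, 1),
    Job("DGX2", 64, 64, 1, 1),
    Job("DGX1", 1, 4096, 1024, 2),
    Job("DGX1", 4, 1024, 256, 2),
    Job("DGX1", 16, 256, 64, 2),
    Job("DGX2", 1, 2048, 256, 2),
    Job("DGX2", 4, 512, 64, 2),
    Job("DGX2", 16, 128, 16, 2),
    Job("DGX2", 64, 32, 4, 2),
]

for job in JOBS:
    print(
        f"phase {job.phase} {job.system} x{job.nnodes}: "
        f"micro-batch {job.micro_batch}, global batch {job.global_batch}"
    )
```

Under these assumptions every phase 1 job works out to a global batch of 65536 and every phase 2 job to 32768, so scaling to more nodes lowers the per-GPU batch and the accumulation count while the effective batch per optimizer step stays fixed.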

PyTorch/LanguageModeling/BERT/create_pretraining_data.py

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 # coding=utf-8
-# Copyright 2018 The Google AI Language Team Authors.
-#
+# Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Copyright 2018 The Google AI Language Team Authors and The HugginFace Inc. team.
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
@@ -12,6 +12,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
 """Create masked LM/next sentence masked_lm TF examples for BERT."""
 from __future__ import absolute_import, division, print_function, unicode_literals
 

PyTorch/LanguageModeling/BERT/data/BooksDownloader.py

Lines changed: 12 additions & 0 deletions
@@ -1,4 +1,16 @@
 # Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import subprocess
 
 class BooksDownloader:

PyTorch/LanguageModeling/BERT/data/BookscorpusTextFormatting.py

Lines changed: 12 additions & 0 deletions
@@ -1,4 +1,16 @@
 # Copyright (c) 2019 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import glob
 import os
 
