Commit 85c49c0

DeepSpeed Data Efficiency Library finetuning examples (deepspeedai#227)
* initial commit
* update random-ltd
* add vit
* vision transformer
* update-name
* saving without randomltd
* update naming
* update for dynamic train
* update for dynamic train
* checking kernel implementation
* check kernel acc
* update json
* fix for cifar randomltd
* vit-finetuning
* refactor
* refacrtor
* refactor
* update readme
* update readme
* update readme
* update readme
* move to bash
* training log
* training log
* clean and update gpt
* output
* rename dir
* cleanup
* fix
* fix

Co-authored-by: xiaoxiawu <xiaoxiawu@microsoft.com>
Co-authored-by: xiaoxiawu <xiaoxiawu>
1 parent d5eb762 commit 85c49c0

20 files changed

Lines changed: 3166 additions & 0 deletions
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
#### Example of fine-tuning GPT using random-LTD (https://arxiv.org/abs/2211.11586)

#### Install

``pip install -r requirement.txt``

You will also need DeepSpeed >= 0.7.7, which includes the random-LTD library.

#### Key File: run_clm_no_trainer.py

The Python script is adapted from the Hugging Face example (https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py). The key addition is random-LTD support.

#### Folders (config)

* **config:** This folder provides the DeepSpeed configurations, including the sequence-length schedule and the layers to which random-LTD is applied.

#### Bash script

* **run_base_random_ltd.sh/run_medium_random_ltd.sh:** These bash scripts contain the jobs for fine-tuning with random-LTD.
* Run a job from the gpt_finetuning directory:

``DeepSpeedExamples/random_ltd/gpt_finetuning$ . ./bash_script/run_base_random_ltd.sh``

``DeepSpeedExamples/random_ltd/gpt_finetuning$ . ./bash_script/run_medium_random_ltd.sh``

See more descriptions and results in our [tutorial page](https://www.deepspeed.ai/).
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
#!/bin/bash
################## Apply random-LTD to fine-tune GPT-base (12-layer) on PTB ##################
#### See more on random-LTD: https://arxiv.org/abs/2211.11586
export CUDA_VISIBLE_DEVICES=1
mkdir -p ./output/check_base
python -m torch.distributed.launch --nproc_per_node=1 \
    --master_port 12346 \
    run_clm_no_trainer.py \
    --random_ltd \
    --dataset_name ptb_text_only \
    --dataset_config_name penn_treebank \
    --model_name_or_path gpt2 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 4 \
    --num_train_epochs 2 \
    --deepspeed_config config/ds_config_gpt_base_random_ltd.json \
    --deepspeed --seed 1234 --num_warmup_steps 100 \
    --output_dir ./output/check_base &> ./output/check_base/training.log

# python run_clm_no_trainer.py \
#     --random_ltd \
#     --dataset_name ptb_text_only \
#     --dataset_config_name penn_treebank \
#     --model_name_or_path gpt2 \
#     --per_device_train_batch_size 2 \
#     --per_device_eval_batch_size 4 \
#     --num_train_epochs 2 \
#     --deepspeed_config config/ds_config_gpt_base_random_ltd.json \
#     --deepspeed --seed 1234 \
#     --output_dir ./output/check_base
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#!/bin/bash
################## Apply random-LTD to fine-tune GPT-medium (24-layer) on PTB ##################
#### See more on random-LTD: https://arxiv.org/abs/2211.11586
export CUDA_VISIBLE_DEVICES=2
mkdir -p ./output/check_medium
python -m torch.distributed.launch --nproc_per_node=1 \
    --master_port 12345 \
    run_clm_no_trainer.py \
    --random_ltd \
    --dataset_name ptb_text_only \
    --dataset_config_name penn_treebank \
    --model_name_or_path gpt2-medium \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --num_train_epochs 2 \
    --deepspeed_config config/ds_config_gpt_medium_random_ltd.json \
    --deepspeed --seed 1234 --num_warmup_steps 100 \
    --output_dir ./output/check_medium &> ./output/check_medium/training.log
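Both launch scripts start a single process (`--nproc_per_node=1`) with `--per_device_train_batch_size 2`, while the DeepSpeed configs they reference set `"train_batch_size": 4`. DeepSpeed relates these values as train batch size = micro-batch per GPU × gradient accumulation steps × number of processes, so the implied accumulation factor can be worked out as in the sketch below. The function name `gradient_accumulation_steps` is a hypothetical helper for illustration and is not part of the scripts or of DeepSpeed.

```python
def gradient_accumulation_steps(train_batch_size, micro_batch_per_gpu, world_size):
    """Return the accumulation factor implied by the three batch settings.

    Hypothetical helper: it only applies the relation
    train_batch_size = micro_batch_per_gpu * accumulation * world_size.
    """
    samples_per_micro_step = micro_batch_per_gpu * world_size
    if train_batch_size % samples_per_micro_step != 0:
        raise ValueError(
            "train_batch_size must be divisible by "
            "micro_batch_per_gpu * world_size"
        )
    return train_batch_size // samples_per_micro_step


# Values taken from the scripts and configs in this commit:
# train_batch_size=4, micro-batch of 2 per GPU, one process.
steps = gradient_accumulation_steps(4, 2, 1)
print(steps)  # 2
```

If the scripts were changed to launch more processes, the same relation suggests that `train_batch_size` in the JSON config would need to change with it.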
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
{
  "train_batch_size": 4,
  "train_micro_batch_size_per_gpu": 2,
  "steps_per_print": 2,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0001,
      "betas": [0.8, 0.999],
      "eps": 1e-8,
      "weight_decay": 3e-7
    }
  },
  "zero_optimization": {
    "stage": 0
  },
  "fp16": {
    "enabled": false
  },
  "gradient_clipping": 1.0,
  "prescale_gradients": true,
  "wall_clock_breakdown": false,
  "data_efficiency": {
    "enabled": true,
    "data_routing": {
      "enabled": true,
      "random_ltd": {
        "enabled": true,
        "total_layer_num": 12,
        "random_ltd_layer_num": 10,
        "random_ltd_layer_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "model_mask_name": "attention_mask",
        "model_type": "decoder",
        "hidden_state_order": "batch_seq_dim",
        "random_ltd_schedule": {
          "min_value": 128,
          "max_value": 1024,
          "schedule_type": "fixed_linear",
          "schedule_config": {
            "require_steps": 400,
            "seq_per_step": 8
          }
        }
      }
    }
  }
}
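The `random_ltd_schedule` block above describes how the number of tokens kept by the random-LTD layers grows during training. The following is a minimal sketch of one plausible reading of `fixed_linear`: it assumes the kept sequence length ramps linearly from `min_value` to `max_value` over `require_steps` steps and is rounded down to a multiple of `seq_per_step`. The function `fixed_linear_seq_len` is hypothetical and is not DeepSpeed's scheduler, which may differ in details.

```python
import math


def fixed_linear_seq_len(step, min_value=128, max_value=1024,
                         require_steps=400, seq_per_step=8):
    """Kept sequence length at `step` under an ASSUMED linear ramp.

    Defaults mirror the values in the config above.
    """
    # Fraction of the ramp completed, capped at 1.0 once the ramp ends.
    fraction = min(step / require_steps, 1.0)
    value = math.floor(min_value + fraction * (max_value - min_value))
    # Round down to a multiple of seq_per_step (assumed granularity).
    value -= value % seq_per_step
    return min(value, max_value)


for step in (0, 200, 400, 1000):
    print(step, fixed_linear_seq_len(step))
# 0 128
# 200 576
# 400 1024
# 1000 1024
```

Under this reading, the random-LTD layers would process the full 1024-token sequence from step 400 onward.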
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
{
  "train_batch_size": 4,
  "train_micro_batch_size_per_gpu": 2,
  "steps_per_print": 2,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0001,
      "betas": [0.8, 0.999],
      "eps": 1e-8,
      "weight_decay": 3e-7
    }
  },
  "zero_optimization": {
    "stage": 0
  },
  "fp16": {
    "enabled": false
  },
  "gradient_clipping": 1.0,
  "prescale_gradients": true,
  "wall_clock_breakdown": false,
  "data_efficiency": {
    "enabled": true,
    "data_routing": {
      "enabled": true,
      "random_ltd": {
        "enabled": true,
        "total_layer_num": 24,
        "random_ltd_layer_num": 22,
        "random_ltd_layer_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
        "model_mask_name": "attention_mask",
        "model_type": "decoder",
        "hidden_state_order": "batch_seq_dim",
        "random_ltd_schedule": {
          "min_value": 128,
          "max_value": 1024,
          "schedule_type": "fixed_linear",
          "schedule_config": {
            "require_steps": 400,
            "seq_per_step": 8
          }
        }
      }
    }
  }
}
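The layer fields in the `random_ltd` block have to agree with each other: `random_ltd_layer_num` should match the length of `random_ltd_layer_id`, and every id should refer to an existing layer. Below is a small sanity check along those lines. It assumes the ids are zero-based layer indices, and the rules it applies are an illustration of how the fields appear to relate, not DeepSpeed's own validation. `check_random_ltd` is a hypothetical helper.

```python
import json

# Fragment copied from the 24-layer config above.
CONFIG_FRAGMENT = """
{
  "total_layer_num": 24,
  "random_ltd_layer_num": 22,
  "random_ltd_layer_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                          13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
  "random_ltd_schedule": {"min_value": 128, "max_value": 1024}
}
"""


def check_random_ltd(cfg):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    ids = cfg["random_ltd_layer_id"]
    total = cfg["total_layer_num"]
    if len(ids) != cfg["random_ltd_layer_num"]:
        problems.append("random_ltd_layer_num does not match the id list length")
    if len(set(ids)) != len(ids):
        problems.append("random_ltd_layer_id contains duplicates")
    # ASSUMPTION: ids are zero-based, so valid values are 0..total-1.
    if any(i < 0 or i >= total for i in ids):
        problems.append("a layer id is outside 0..total_layer_num-1")
    schedule = cfg["random_ltd_schedule"]
    if schedule["min_value"] > schedule["max_value"]:
        problems.append("min_value exceeds max_value")
    return problems


cfg = json.loads(CONFIG_FRAGMENT)
problems = check_random_ltd(cfg)
# Layers that are not listed, and so always see the full sequence.
untouched = sorted(set(range(cfg["total_layer_num"])) - set(cfg["random_ltd_layer_id"]))
print(problems)   # []
print(untouched)  # [0, 23]
```

Under the zero-based assumption, both configs leave only the first and last layers out of random-LTD.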
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
transformers == 4.15.0
accelerate
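The version pins above, together with the DeepSpeed >= 0.7.7 requirement stated in the README, can be checked against an environment with a few lines of Python. The sketch below is deliberately simplified: it compares plain dotted numeric versions only (no pre-release or local tags), and the `installed` mapping holds made-up example versions rather than values read from a real environment. `PINS`, `parse`, and `satisfies` are hypothetical names.

```python
import operator

OPS = {">=": operator.ge, "==": operator.eq, "!=": operator.ne}

# Pins from the requirements file, plus DeepSpeed from the README.
PINS = {
    "datasets": (">=", "1.8.0"),
    "sentencepiece": ("!=", "0.1.92"),
    "transformers": ("==", "4.15.0"),
    "deepspeed": (">=", "0.7.7"),
}


def parse(version):
    """Turn '4.15.0' into (4, 15, 0); handles numeric components only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed_version, op, bound):
    """Compare two dotted versions after padding them to equal length."""
    a, b = parse(installed_version), parse(bound)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return OPS[op](a, b)


# HYPOTHETICAL environment, for illustration only.
installed = {
    "datasets": "2.8.0",
    "sentencepiece": "0.1.97",
    "transformers": "4.15.0",
    "deepspeed": "0.8.0",
}

for name, (op, bound) in PINS.items():
    ok = satisfies(installed[name], op, bound)
    print(f"{name} {installed[name]} {op} {bound}: {'ok' if ok else 'MISMATCH'}")
```

In a real environment the installed versions could be read with `importlib.metadata.version`, and a full specifier parser would be a better fit than this simplified comparison.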
