@@ -7,6 +7,8 @@ For example,
 - comparing rows 1 and 3 answers the question: "What is the performance difference when we train only the last layer instead of the last block?";
 - and so forth.
 
+&nbsp;
+
 | | Model | Weights | Trainable token | Trainable layers | Context length | CPU/GPU | Training time | Training acc | Validation acc | Test acc |
 | ---| --------------------| ------------| -----------------| ------------------| -------------------------| ---------| ---------------| --------------| ----------------| ----------|
 | 1 | gpt2-small (124M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.39 min | 96.63% | 97.99% | 94.33% |
@@ -16,4 +18,17 @@ For example,
 | 5 | gpt2-medium (355M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.91 min | 87.50% | 51.01% | 56.67% |
 | 6 | gpt2-large (774M) | pretrained | last | last_block | longest train ex. (120) | V100 | 1.91 min | 99.52% | 98.66% | 96.67% |
 | 7 | gpt2-small (124M) | random | last | all | longest train ex. (120) | V100 | 0.93 min | 100% | 97.32% | 93.00% |
-| 8 | gpt2-small (124M) | pretrained | last | last_block | context length (1024) | V100 | 3.24 min | 83.08% | 87.92% | 78.33% |
+| 8 | gpt2-small (124M) | pretrained | last | last_block | context length (1024) | V100 | 3.24 min | 83.08% | 87.92% | 78.33% |
+
+&nbsp;
+
+### Usage:
+
+- Row 1: `python additional-experiments.py`
+- Row 2: `python additional-experiments.py --trainable_token first`
+- Row 3: `python additional-experiments.py --trainable_layers last_layer`
+- Row 4: `python additional-experiments.py --trainable_layers all`
+- Row 5: `python additional-experiments.py --model_size "gpt2-medium (355M)"`
+- Row 6: `python additional-experiments.py --model_size "gpt2-large (774M)"`
+- Row 7: `python additional-experiments.py --weights random --trainable_layers all`
+- Row 8: `python additional-experiments.py --context_length "model_context_length"`
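
The usage commands above imply a small command-line interface for the experiment script. As a minimal sketch only: the flag names are taken from the commands above, but the `choices` and defaults are assumptions (chosen so that running with no flags corresponds to Row 1), not taken from `additional-experiments.py` itself:

```python
import argparse

def build_parser():
    # Flags mirror the usage examples above; defaults are hypothetical,
    # chosen so that running with no flags reproduces the Row 1 setup.
    parser = argparse.ArgumentParser(
        description="Run one fine-tuning experiment variant."
    )
    parser.add_argument("--model_size", default="gpt2-small (124M)")
    parser.add_argument("--weights", choices=["pretrained", "random"],
                        default="pretrained")
    parser.add_argument("--trainable_token", choices=["first", "last"],
                        default="last")
    parser.add_argument("--trainable_layers",
                        choices=["last_layer", "last_block", "all"],
                        default="last_block")
    parser.add_argument("--context_length", default="longest_training_example")
    return parser

# Example: the flag combination from Row 7 (random weights, all layers trainable).
args = build_parser().parse_args(["--weights", "random", "--trainable_layers", "all"])
print(args.weights, args.trainable_layers)  # prints: random all
```

Restricting `--weights` and `--trainable_layers` with `choices` makes argparse reject typos (e.g. `--trainable_layers last`) with a usage message instead of silently running the wrong experiment.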