Commit 78ef018

Fix the inference-test.py script and update text-generation README (deepspeedai#219)
* Fix inference-test.py script and update README
* change init arg name to model to match HF
* revert to model_name
* Update README to point to proper header
1 parent e4fe07a commit 78ef018

3 files changed

Lines changed: 45 additions & 18 deletions

inference/huggingface/text-generation/README.md

Lines changed: 24 additions & 4 deletions
@@ -1,6 +1,13 @@
 
 # DeepSpeed Huggingface Text Generation Examples
 
+# Contents
+* [Setup](#setup)
+* [Usage](#usage)
+* [Single-batch Example](#single-batch-example)
+* [Multi-batch Example](#multi-batch-example)
+* [`DSPipeline` utility class](#dspipeline-utility-class)
+
 # Setup
 Python dependencies:
 <pre>
@@ -9,12 +16,13 @@ pip install -r requirements.txt
 
 # Usage
 Examples can be run as follows:
-<pre>deepspeed --num_gpus [number of GPUs] inference_test.py --name [model name/path] --batch_size [batch] --dtype [data type]
+<pre>deepspeed --num_gpus [number of GPUs] inference-test.py --name [model name/path] --batch_size [batch] --dtype [data type]
 </pre>
+
 # Single-batch Example
 Command:
 <pre>
-deepspeed --num_gpus 1 inference_test.py --name facebook/opt-125m
+deepspeed --num_gpus 1 inference-test.py --name facebook/opt-125m
 </pre>
 
 Output:
@@ -27,7 +35,7 @@ out=DeepSpeed is a machine learning framework based on TensorFlow. It was first
 # Multi-batch Example
 Command:
 <pre>
-deepspeed --num_gpus 1 inference_test.py --name bigscience/bloom-3b --batch_size 2
+deepspeed --num_gpus 1 inference-test.py --name bigscience/bloom-3b --batch_size 2
 </pre>
 
 Output:
@@ -40,4 +48,16 @@ in=He is working on
 out=He is working on the new video game 'Bloodborne's' expansion pack. Check out the trailer here: Bloodborne's expansion pack includes a complete remaster of the original game, including over 120 maps, playable characters, new quests, and the possibility
 to bring Blood
 ------------------------------------------------------------
-</pre>
+</pre>
+
+# `DSPipeline` utility class
+The text-generation examples make use of the [`DSPipeline`](utils.py) utility class, a class that helps with loading DeepSpeed meta tensors and is meant to mimic the Hugging Face transformer pipeline.
+
+The BLOOM model is quite large and the way DeepSpeed loads checkpoints for this model is a little different than other HF models. Specifically, we use meta tensors to initialize the model before loading the weights:
+
+<pre>
+with deepspeed.OnDevice(dtype=self.dtype, device="meta"):
+</pre>
+
+This reduces the total system/GPU memory needed to load the model across multiple GPUs and makes the checkpoint loading faster.
+The DSPipeline class helps to load the model and run inference on it, given these differences.
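
Note on the meta-tensor paragraph in the README: the idea of building a model without allocating weights can be illustrated with PyTorch's own `meta` device. This is a minimal sketch of the general mechanism, not the code in `utils.py`; it uses plain `torch.nn.Linear` rather than `deepspeed.OnDevice`, and the zero-filled `checkpoint` dictionary is a hypothetical stand-in for weights read from disk.

```python
import torch

# Parameters created on the "meta" device carry shape and dtype only;
# no storage is allocated for them.
meta_layer = torch.nn.Linear(8, 4, device="meta", dtype=torch.float16)
was_meta = meta_layer.weight.is_meta

# Allocate real (uninitialized) storage, then fill it from a checkpoint.
# The zero tensors below are placeholders for weights loaded from disk.
real_layer = meta_layer.to_empty(device="cpu")
checkpoint = {
    "weight": torch.zeros(4, 8, dtype=torch.float16),
    "bias": torch.zeros(4, dtype=torch.float16),
}
real_layer.load_state_dict(checkpoint)
```

Because the parameters are only materialized once, directly before the checkpoint values are copied in, the model never holds a randomly initialized copy of the weights that would immediately be overwritten.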

inference/huggingface/text-generation/inference-test.py

Lines changed: 16 additions & 12 deletions
@@ -4,7 +4,7 @@
 import math
 import os
 import torch
-from utils import Pipeline
+from utils import DSPipeline
 
 
 parser = ArgumentParser()
@@ -18,24 +18,29 @@
 parser.add_argument("--greedy", default=False, type=bool, help="greedy generation mode")
 parser.add_argument("--use_meta_tensor", default=False, type=bool, help="use the meta tensors to initialize model")
 parser.add_argument("--use_cache", default=True, type=bool, help="use cache for generation")
+parser.add_argument("--local_rank", type=int, default=0, help="local rank")
 args = parser.parse_args()
 
-local_rank = int(os.getenv('LOCAL_RANK', '0'))
 world_size = int(os.getenv('WORLD_SIZE', '1'))
 
 data_type = getattr(torch, args.dtype)
-pipe = Pipeline(model_name=args.name,
-                dtype=data_type,
-                is_meta=args.use_meta_tensor,
-                device=local_rank
-                )
+pipe = DSPipeline(model_name=args.name,
+                  dtype=data_type,
+                  is_meta=args.use_meta_tensor,
+                  device=args.local_rank)
+
+if args.use_meta_tensor:
+    ds_kwargs = dict(base_dir=pipe.repo_root, checkpoint=pipe.checkpoints_json)
+else:
+    ds_kwargs = dict()
 
 if args.ds_inference:
     pipe.model = deepspeed.init_inference(pipe.model,
                                           dtype=data_type,
                                           mp_size=world_size,
                                           replace_with_kernel_inject=True,
-                                          max_tokens=args.max_tokens
+                                          max_tokens=args.max_tokens,
+                                          **ds_kwargs
                                           )
 
 input_sentences = [
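
Note on the `ds_kwargs` change in the hunk above: the checkpoint-related arguments are collected in a dictionary and unpacked into the init call, so they are only passed when meta tensors are in use. The sketch below shows that pattern in isolation. `init_engine` and `build_ds_kwargs` are hypothetical stand-ins written for this note (they are not DeepSpeed functions), and the paths are made-up placeholders.

```python
def init_engine(model, dtype, **extra):
    """Hypothetical stand-in for an init call that accepts optional keywords."""
    return {"model": model, "dtype": dtype, **extra}


def build_ds_kwargs(use_meta_tensor, repo_root=None, checkpoints_json=None):
    """Return checkpoint keywords only when meta tensors are requested."""
    if use_meta_tensor:
        return dict(base_dir=repo_root, checkpoint=checkpoints_json)
    return dict()


# With meta tensors: the checkpoint location is forwarded to the init call.
with_meta = init_engine(
    "model", "float16",
    **build_ds_kwargs(True, "/tmp/repo", "/tmp/repo/checkpoints.json"),
)

# Without meta tensors: the call receives no checkpoint keywords at all.
without_meta = init_engine("model", "float16", **build_ds_kwargs(False))
```

Unpacking an empty dictionary passes nothing, so the non-meta path calls the function exactly as it did before the change.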
@@ -55,10 +60,9 @@
 
 inputs = input_sentences[:args.batch_size]
 
-outputs = pipe(inputs,
-               num_tokens=args.max_new_tokens,
-               do_sample=(not args.greedy),
-               use_cache=args.use_cache)
+outputs = pipe(inputs,
+               num_tokens=args.max_new_tokens,
+               do_sample=(not args.greedy))
 
 for i, o in zip(inputs, outputs):
     print(f"\nin={i}\nout={o}\n{'-'*60}")
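
Note on the argument parsing in this script: the flags declared with `type=bool` (`--greedy`, `--use_meta_tensor`, `--use_cache`) convert any non-empty string to `True`, so `--greedy False` still enables greedy mode. The sketch below demonstrates that behaviour and one possible alternative. `str2bool` is a hypothetical helper, not part of this commit, and reading the `--local_rank` default from the `LOCAL_RANK` environment variable is a suggestion that combines the old and new behaviour rather than what the commit does.

```python
import os
from argparse import ArgumentParser, ArgumentTypeError

# Behaviour of type=bool as declared in the script: bool("False") is True.
naive = ArgumentParser()
naive.add_argument("--greedy", default=False, type=bool)
naive_args = naive.parse_args(["--greedy", "False"])


def str2bool(value):
    """Hypothetical helper: parse common spellings of true/false."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ArgumentTypeError(f"expected a boolean, got {value!r}")


strict = ArgumentParser()
strict.add_argument("--greedy", default=False, type=str2bool)
# Assumption: fall back to LOCAL_RANK when the flag is not given.
strict.add_argument("--local_rank", type=int,
                    default=int(os.getenv("LOCAL_RANK", "0")))
strict_args = strict.parse_args(["--greedy", "False", "--local_rank", "3"])
```

For flags that default to `False`, `action="store_true"` is a simpler option, although it changes the command line from `--greedy True` to a bare `--greedy`.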

inference/huggingface/text-generation/utils.py

Lines changed: 5 additions & 2 deletions
@@ -11,8 +11,11 @@
 from huggingface_hub import snapshot_download
 from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
 
-class Pipeline():
-    '''Example helper class, meant to mimic HF pipelines'''
+class DSPipeline():
+    '''
+    Example helper class for comprehending DeepSpeed Meta Tensors, meant to mimic HF pipelines.
+    The DSPipeline can run with and without meta tensors.
+    '''
     def __init__(self,
                  model_name='bigscience/bloom-3b',
                  dtype=torch.float16,
