Commit 8588e98

[BERT/PyT] specify GPU for triton (NVIDIA#666)
1 parent 5cc03ca commit 8588e98

1 file changed

Lines changed: 1 addition & 1 deletion

PyTorch/LanguageModeling/BERT/triton/README.md

@@ -102,7 +102,7 @@ To make the machine wait until the server is initialized, and the model is ready
 
 ## Performance
 
-The numbers below are averages, measured on Triton, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).
+The numbers below are averages, measured on Triton on V100 32G GPU, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).
 
 | Format | GPUs | Batch size | Sequence length | Throughput - FP32(sequences/sec) | Throughput - mixed precision(sequences/sec) | Throughput speedup (mixed precision/FP32) |
 |--------|------|------------|-----------------|----------------------------------|---------------------------------------------|--------------------------------------------|
