From 74383d728d82e23b4505402be1e43388c2bc7ba3 Mon Sep 17 00:00:00 2001
From: Sharath T S
Date: Tue, 1 Sep 2020 19:11:23 -0700
Subject: [PATCH] specify GPU for triton

---
 PyTorch/LanguageModeling/BERT/triton/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PyTorch/LanguageModeling/BERT/triton/README.md b/PyTorch/LanguageModeling/BERT/triton/README.md
index 119b92592..1474fdc4f 100644
--- a/PyTorch/LanguageModeling/BERT/triton/README.md
+++ b/PyTorch/LanguageModeling/BERT/triton/README.md
@@ -102,7 +102,7 @@ To make the machine wait until the server is initialized, and the model is ready
 
 ## Performance
 
-The numbers below are averages, measured on Triton, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).
+The numbers below are averages, measured on Triton on V100 32G GPU, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).
 
 | Format | GPUs | Batch size | Sequence length | Throughput - FP32(sequences/sec) | Throughput - mixed precision(sequences/sec) | Throughput speedup (mixed precision/FP32) |
 |--------|------|------------|-----------------|----------------------------------|---------------------------------------------|--------------------------------------------|