@@ -531,16 +531,16 @@ FP16
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 | ------------| -----------------| --------------------------------| ------------------| ------------------| ------------------| ------------------|
 | 1 | 384 | 178 | 5.630 | 5.500 | 5.555 | 5.608 |
-| 256 | 384 | 857 | 1.112 | 1.111 | 1.111 | 1.112 |
-| 512 | 384 | 864 | 1.054 | 1.051 | 1.053 | 1.053 |
+| 256 | 384 | 857 | 284.67 | 284.416 | 284.416 | 284.67 |
+| 512 | 384 | 864 | 539.648 | 538.112 | 539.136 | 539.136 |
 
 TF32
 
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 | ------------| -----------------| --------------------------------| ------------------| ------------------| ------------------| ------------------|
 | 1 | 384 | 123 | 8.186 | 7.995 | 8.078 | 8.152 |
-| 256 | 384 | 344 | 2.832 | 2.822 | 2.826 | 2.830 |
-| 512 | 384 | 351 | 2.787 | 2.781 | 2.784 | 2.784 |
+| 256 | 384 | 344 | 724.992 | 722.432 | 723.456 | 724.48 |
+| 512 | 384 | 351 | 1426.944 | 1423.872 | 1425.408 | 1425.408 |
 
 
 
@@ -556,17 +556,17 @@ FP16
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 | ------------| -----------------| --------------------------------| ------------------| ------------------| ------------------| ------------------|
 | 1 | 384 | 141 | 7.100 | 7.071 | 7.081 | 7.091 |
-| 128 | 384 | 517 | 1.933 | 1.930 | 1.930 | 1.932 |
-| 256 | 384 | 524 | 1.910 | 1.907 | 1.908 | 1.909 |
+| 128 | 384 | 517 | 247.424 | 247.04 | 247.04 | 247.296 |
+| 256 | 384 | 524 | 488.96 | 488.192 | 488.448 | 488.704 |
 
 
 FP32
 
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 | ------------| -----------------| --------------------------------| ------------------| ------------------| ------------------| ------------------|
 | 1 | 384 | 84 | 11.869 | 11.814 | 11.832 | 11.850 |
-| 128 | 384 | 117 | 8.548 | 8.527 | 8.529 | 8.537 |
-| 256 | 384 | 141 | 7.100 | 7.071 | 7.081 | 7.091 |
+| 128 | 384 | 117 | 1094.144 | 1091.456 | 1091.712 | 1092.736 |
+| 256 | 384 | 141 | 1817.6 | 1810.176 | 1812.736 | 1815.552 |
 
 
 ##### Inference performance: NVIDIA DGX-2 (1x V100 32GB)
@@ -581,16 +581,16 @@ FP16
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 | ------------| -----------------| --------------------------------| ------------------| ------------------| ------------------| ------------------|
 | 1 | 384 | 144 | 6.953 | 6.888 | 6.910 | 6.932 |
-| 128 | 384 | 547 | 1.828 | 1.827 | 1.827 | 1.828 |
-| 256 | 384 | 557 | 1.795 | 1.792 | 1.793 | 1.794 |
+| 128 | 384 | 547 | 233.984 | 233.856 | 233.856 | 233.984 |
+| 256 | 384 | 557 | 459.52 | 458.752 | 459.008 | 459.264 |
 
 FP32
 
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 | ------------| -----------------| --------------------------------| ------------------| ------------------| ------------------| ------------------|
 | 1 | 384 | 86 | 11.580 | 11.515 | 11.535 | 11.558 |
-| 128 | 384 | 124 | 8.056 | 8.05 | 8.052 | 8.055 |
-| 256 | 384 | 125 | 8.006 | 8.002 | 8.004 | 8.005 |
+| 128 | 384 | 124 | 1031.168 | 1030.4 | 1030.656 | 1031.04 |
+| 256 | 384 | 125 | 2049.536 | 2048.512 | 2049.024 | 2049.29 |
 
 
 To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
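
Reviewer note: the replacement values in this change appear to be the old per-sequence latencies multiplied by the batch size (for example, 1.112 ms × 256 ≈ 284.67 ms), so the tables now report the latency of one full batch. The sketch below illustrates that conversion under that assumption; the helper name `per_batch_latency_ms` is hypothetical and is not part of the repository.

```python
def per_batch_latency_ms(per_sequence_ms: float, batch_size: int) -> float:
    """Scale a per-sequence latency (ms) to the latency of one full batch (ms)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return per_sequence_ms * batch_size


# (batch size, old per-sequence latency in ms) pairs copied from removed rows.
old_rows = [(256, 1.112), (512, 1.054), (128, 1.933)]

# Convert each removed value and round to three decimals, as in the tables.
converted = [round(per_batch_latency_ms(ms, bs), 3) for bs, ms in old_rows]
print(converted)
```

The converted figures can be compared against the corresponding added rows to spot-check the edit. Batch-size-1 rows are unchanged because the per-sequence and per-batch latencies coincide there.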