Commit 446c878

[ELECTRA/TF2] Update inference latency (NVIDIA#657)
* Update inference latency
* Fix inference perf numbers
* Fix latency computation
1 parent bbbc823 commit 446c878

2 files changed

Lines changed: 13 additions & 13 deletions

TensorFlow2/LanguageModeling/ELECTRA/README.md

Lines changed: 12 additions & 12 deletions
@@ -531,16 +531,16 @@ FP16
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 |------------|-----------------|--------------------------------|------------------|------------------|------------------|------------------|
 | 1 | 384 | 178 | 5.630 | 5.500 | 5.555 | 5.608 |
-| 256 | 384 | 857 | 1.112 | 1.111 | 1.111 | 1.112 |
-| 512 | 384 | 864 | 1.054 | 1.051 | 1.053 | 1.053 |
+| 256 | 384 | 857 | 284.67 | 284.416| 284.416 | 284.67 |
+| 512 | 384 | 864 | 539.648| 538.112 | 539.136 | 539.136 |

 TF32

 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 |------------|-----------------|--------------------------------|------------------|------------------|------------------|------------------|
 | 1 | 384 | 123 | 8.186 | 7.995 | 8.078 | 8.152 |
-| 256 | 384 | 344 | 2.832 | 2.822 | 2.826 | 2.830 |
-| 512 | 384 | 351 | 2.787 | 2.781 | 2.784 | 2.784 |
+| 256 | 384 | 344 | 724.992 | 722.432 | 723.456 | 724.48 |
+| 512 | 384 | 351 | 1426.944 | 1423.872 | 1425.408 | 1425.408 |

@@ -556,17 +556,17 @@ FP16
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 |------------|-----------------|--------------------------------|------------------|------------------|------------------|------------------|
 | 1 | 384 | 141 | 7.100 | 7.071 | 7.081 | 7.091 |
-| 128 | 384 | 517 | 1.933 | 1.930 | 1.930 | 1.932 |
-| 256 | 384 | 524 | 1.910 | 1.907 | 1.908 | 1.909 |
+| 128 | 384 | 517 | 247.424 | 247.04 | 247.04 | 247.296 |
+| 256 | 384 | 524 | 488.96 | 488.192 | 488.448 | 488.704 |


 FP32

 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 |------------|-----------------|--------------------------------|------------------|------------------|------------------|------------------|
 | 1 | 384 | 84 | 11.869 | 11.814 | 11.832 | 11.850 |
-| 128 | 384 | 117 | 8.548 | 8.527 | 8.529 | 8.537 |
-| 256 | 384 | 141 | 7.100 | 7.071 | 7.081 | 7.091 |
+| 128 | 384 | 117 | 1094.144 | 1091.456 | 1091.712 | 1092.736 |
+| 256 | 384 | 141 | 1817.6 | 1810.176 | 1812.736 | 1815.552 |


 ##### Inference performance: NVIDIA DGX-2 (1x V100 32GB)
@@ -581,16 +581,16 @@ FP16
 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 |------------|-----------------|--------------------------------|------------------|------------------|------------------|------------------|
 | 1 | 384 | 144 | 6.953 | 6.888 | 6.910 | 6.932 |
-| 128 | 384 | 547 | 1.828 | 1.827 | 1.827 | 1.828 |
-| 256 | 384 | 557 | 1.795 | 1.792 | 1.793 | 1.794 |
+| 128 | 384 | 547 | 233.984 | 233.856 | 233.856 | 233.984 |
+| 256 | 384 | 557 | 459.52 | 458.752 | 459.008 | 459.264 |

 FP32

 | Batch size | Sequence length | Throughput Avg (sequences/sec) | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
 |------------|-----------------|--------------------------------|------------------|------------------|------------------|------------------|
 | 1 | 384 | 86 | 11.580 | 11.515 | 11.535 | 11.558 |
-| 128 | 384 | 124 | 8.056 | 8.05 | 8.052 | 8.055 |
-| 256 | 384 | 125 | 8.006 | 8.002 | 8.004 | 8.005 |
+| 128 | 384 | 124 | 1031.168 | 1030.4 | 1030.656 | 1031.04 |
+| 256 | 384 | 125 | 2049.536 | 2048.512 | 2049.024 | 2049.29 |


 To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
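
The updated README rows appear to be the old per-sequence latencies multiplied by the batch size, which matches the commit's change from per-sequence to per-batch latency. The sketch below checks that relationship on a few rows copied from the hunks of this diff. The helper name `per_batch_latency_ms` is hypothetical and is not part of the repository.

```python
# Hypothetical helper for illustration only; not part of the ELECTRA repository.
def per_batch_latency_ms(per_sequence_ms: float, batch_size: int) -> float:
    """Convert a per-sequence latency (ms) into a per-batch latency (ms)."""
    return per_sequence_ms * batch_size


# (batch size, old "Latency Avg (ms)", new "Latency Avg (ms)") copied from the diff.
rows = [
    (256, 1.112, 284.67),
    (512, 1.054, 539.648),
    (128, 1.933, 247.424),
    (256, 8.006, 2049.536),
]

for batch_size, old_ms, new_ms in rows:
    rescaled = per_batch_latency_ms(old_ms, batch_size)
    print(f"batch {batch_size}: {old_ms} ms x {batch_size} = {rescaled:.3f} ms "
          f"(README: {new_ms} ms)")
```

Batch-size-1 rows are unchanged by the commit because per-sequence and per-batch latency coincide when the batch holds a single sequence.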

TensorFlow2/LanguageModeling/ELECTRA/run_tf_squad.py

Lines changed: 1 addition & 1 deletion
@@ -560,7 +560,7 @@ def main():

 infer_time = (time.time() - iter_start)
 infer_perf_avg.update_state(1. * EVAL_BATCH_SIZE / infer_time)
-latency.append(1. * infer_time / EVAL_BATCH_SIZE)
+latency.append(infer_time)

 for iter_ in range(input_ids.shape[0]):

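
As a rough illustration of the fix, the following self-contained sketch records latency per batch (the `+` line) rather than per sequence (the `-` line) and then summarizes it as an average and percentiles. Everything except the names `EVAL_BATCH_SIZE`, `infer_time`, `iter_start`, and `latency` is an assumption: `fake_inference` stands in for the real model call, `percentile` is a simple nearest-rank helper, and `time.perf_counter` is used here instead of the script's `time.time`. How the actual script aggregates its percentiles is not shown in this diff.

```python
import time

EVAL_BATCH_SIZE = 4  # hypothetical value, chosen only to keep the demo fast


def fake_inference(batch_size):
    """Stand-in for a model forward pass; sleeps instead of doing real work."""
    time.sleep(0.001 * batch_size)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


latency = []     # seconds per batch, as in the '+' line of the diff
throughput = []  # sequences per second for each batch

for _ in range(20):
    iter_start = time.perf_counter()
    fake_inference(EVAL_BATCH_SIZE)
    infer_time = time.perf_counter() - iter_start
    throughput.append(EVAL_BATCH_SIZE / infer_time)
    latency.append(infer_time)  # previously: infer_time / EVAL_BATCH_SIZE

avg_ms = 1000 * sum(latency) / len(latency)
print(f"Latency Avg (ms): {avg_ms:.3f}")
for pct in (90, 95, 99):
    print(f"Latency {pct}% (ms): {1000 * percentile(latency, pct):.3f}")
print(f"Throughput Avg (sequences/sec): {sum(throughput) / len(throughput):.1f}")
```

Reporting per-batch latency keeps the column comparable to what a caller actually waits for when submitting one batch, while throughput still captures the per-sequence efficiency.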
