@@ -413,7 +416,10 @@ To train your model using mixed or TF32 precision with Tensor Cores or using FP3
413
416
414
417
5. Start preprocessing.
415
418
416
-
For details of the required file format and certain preprocessing parameters (for example, `${NUMBER_OF_USER_FEATURES}` refer to [BYO dataset](#byo-dataset))
419
+
For details of the required file format and certain preprocessing parameters, refer to [BYO dataset](#byo-dataset).
420
+
421
+
422
+
`${NUMBER_OF_USER_FEATURES}` defines how many user-specific features are present in the dataset. If you use the default Amazon Books dataset and the `sim_preprocessing` script (as shown below), set this parameter to <b>1</b>: in this case, the only user-specific feature is <b>user_id</b>, and all other features are item-specific.
417
423
418
424
```bash
419
425
python preprocessing/sim_preprocessing.py \
@@ -452,6 +458,8 @@ To train your model using mixed or TF32 precision with Tensor Cores or using FP3
452
458
--amp
453
459
```
454
460
461
+
For an explanation of the output logs, refer to the [Log format](#log-format) section.
462
+
455
463
Now that you have your model trained and evaluated, you can choose to compare your training results with our [Training accuracy results](#training-accuracy-results). You can also choose to benchmark your performance against the [Training performance benchmark](#training-performance-results) or the [Inference performance benchmark](#inference-performance-results). Following the steps in these sections will ensure that you achieve the same accuracy and performance results as stated in the [Results](#results) section.
456
464
457
465
## Advanced
@@ -705,6 +713,51 @@ Inference can be run using `main.py` script by specifying the `--mode inference
705
713
706
714
Examples of training and inference usage are demonstrated in the [Quick Start Guide](#quick-start-guide).
707
715
716
+
### Log format
717
+
718
+
There are three types of log lines during model execution. Each of them has a `step` value; however, it is formatted differently depending on the type of log:
719
+
- <b>step log</b> - step value is in format `[epoch, step]`:
In those logs, the `data` field contains a dictionary in the form `{metric: value}`. The metrics logged differ based on log type (step, end of epoch, summary) and model mode (training, inference).
732
+
733
+
#### Training log data
734
+
- <b> step log </b>
735
+
- classification_loss - loss at the final output of the model.
736
+
- dien_aux_loss - loss at the output of auxiliary model.
737
+
- total_loss - sum of the above.
738
+
- samples/s - estimated throughput in samples per second.
739
+
- <b> end of epoch log </b>
740
+
- throughput - average throughput during epoch in samples/s.
741
+
- time - epoch time in seconds.
742
+
- train_auc - AUC during evaluation on train set.
743
+
- test_auc - AUC during evaluation on test set.
744
+
- train_loss - loss during evaluation on train set.
745
+
- test_loss - loss during evaluation on test set.
746
+
- latency_[mean, p90, p95, p99] - latencies in milliseconds.
747
+
- <b> summary log </b>
748
+
- time_to_train - total training time in seconds.
749
+
- train_auc, test_auc, train_loss, test_loss - results from the last epoch (see above).
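
As an illustration of how the training step metrics relate, the following minimal sketch checks that `total_loss` equals the sum of `classification_loss` and `dien_aux_loss` in a step log's `data` dictionary. The helper name `check_total_loss`, the tolerance, and the sample values are assumptions made for this example; they are not part of the repository or of real log output.

```python
import math


def check_total_loss(data, rel_tol=1e-3):
    """Return True if total_loss is (approximately) the sum of the two partial losses.

    `data` is assumed to be the `data` dictionary of a training step log,
    already parsed into a Python dict.
    """
    expected = data["classification_loss"] + data["dien_aux_loss"]
    return math.isclose(data["total_loss"], expected, rel_tol=rel_tol)


# Hypothetical values, for illustration only (not real log output).
step_data = {
    "classification_loss": 0.5,
    "dien_aux_loss": 0.25,
    "total_loss": 0.75,
    "samples/s": 1000.0,
}
assert check_total_loss(step_data)
```

A relative tolerance is used because logged values may be rounded before they are written.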
750
+
751
+
#### Inference log data
752
+
- <b> step log </b>
753
+
- samples/s - estimated throughput in samples per second.
754
+
- <b> end of epoch log is not present</b>
755
+
- <b> summary log </b>
756
+
- throughput - average throughput during epoch in samples/s.
757
+
- time - total execution time in seconds.
758
+
- latency_[mean, p90, p95, p99] - latencies in milliseconds.
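
The latency entries can be collected from a summary log's `data` dictionary with a short helper such as the sketch below. It assumes that the shorthand `latency_[mean, p90, p95, p99]` expands to the keys `latency_mean`, `latency_p90`, `latency_p95`, and `latency_p99`; this expansion, the helper name `latencies_in_seconds`, and the sample values are assumptions for illustration.

```python
# Assumed expansion of the `latency_[mean, p90, p95, p99]` shorthand.
LATENCY_STATS = ("mean", "p90", "p95", "p99")


def latencies_in_seconds(data):
    """Collect latency_<stat> entries (logged in milliseconds) and convert them to seconds.

    Keys that are missing from `data` are skipped.
    """
    return {
        stat: data[f"latency_{stat}"] / 1000.0
        for stat in LATENCY_STATS
        if f"latency_{stat}" in data
    }


# Hypothetical values, for illustration only (not real log output).
summary_data = {"throughput": 1.0, "latency_mean": 2000.0, "latency_p99": 4000.0}
print(latencies_in_seconds(summary_data))
```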
759
+
760
+
708
761
## Performance
709
762
710
763
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).