| System | Model type | Runtime | Device type | Precision | Video | Video length (s) - # Frames | Video FPS | Frame size | Display settings | Pose model backbone | Avg inference time ± Std (including 1st inference) | Avg inference time ± Std (excluding 1st inference) | Average FPS ± Std | Model size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linux | ONNX | ONNX | CUDA | FP32 | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 29.02 ms ± 47.59 ms | 27.8 ms ± 2.32 ms | 36 ± 3 | 92.12 MB |
| Linux | ONNX | ONNX | CPU | FP32 | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 146.12 ms ± 13.26 ms | 146.11 ms ± 13.25 ms | 7 ± 1 | 92.12 MB |
| Linux | PyTorch | PyTorch | CUDA | FP32 | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 6.04 ms ± 7.37 ms | 5.97 ms ± 6.8 ms | 271 ± 112 | 96.5 MB |
| Linux | PyTorch | PyTorch | CPU | FP32 | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 365.26 ms ± 13.88 ms | 365.17 ms ± 13.44 ms | 3 ± 0 | 96.5 MB |
| Linux | ONNX | TensorRT | CUDA | FP32 - no caching | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 55.32 ms ± 1254.16 ms | 22.93 ms ± 0.88 ms | 44 ± 2 | 92.12 MB |
| Linux | ONNX | TensorRT | CUDA | FP32 - engine caching | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 20.8 ms ± 3.4 ms | 20.72 ms ± 1.25 ms | 48 ± 3 | 92.12 MB |
| Linux | ONNX | TensorRT | CUDA | FP16 | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 34.37 ms ± 858.96 ms | 12.19 ms ± 0.87 ms | 82 ± 6 | 46.16 MB |
| Linux | ONNX | ONNX | CUDA | FP16 | Ventral gait | 10s - 1.5k | 150 | (658, 302) | None | ResNet50 (bu) | 21.74 ms ± 43.24 ms | 20.62 ms ± 2.5 ms | 49 ± 5 | 46.16 MB |
| Linux | PyTorch | PyTorch | CUDA | FP32 | Ventral gait | 10s - 1.5k | 150 | (164, 75) | Resize=0.25 | ResNet50 (bu) | 22.27 ms ± 12.5 ms | 22.16 ms ± 11.65 ms | 70 ± 68 | 96.5 MB |
| Linux | ONNX | ONNX | CUDA | FP32 | Ventral gait | 10s - 1.5k | 150 | (164, 75) | Resize=0.25 | ResNet50 (bu) | 6.18 ms ± 37.03 ms | 5.22 ms ± 0.86 ms | 195 ± 25 | |
| Linux | ONNX | ONNX | CPU | FP32 | Ventral gait | 10s - 1.5k | 150 | (164, 75) | Resize=0.25 | ResNet50 (bu) | 13.17 ms ± 1.25 ms | 13.17 ms ± 1.23 ms | 76 ± 4 | |
| Linux | ONNX | TensorRT | CUDA | FP32 | Ventral gait | 10s - 1.5k | 150 | (164, 75) | Resize=0.25 | ResNet50 (bu) | 15.12 ms ± 458.27 ms | 3.28 ms ± 0.24 ms | 306 ± 23 | |
| Linux | ONNX | ONNX | CUDA | FP16 | Ventral gait | 10s - 1.5k | 150 | (164, 75) | Resize=0.25 | ResNet50 (bu) | 5.83 ms ± 33.27 ms | 4.97 ms ± 1.5 ms | 214 ± 45 | |
| Linux | ONNX | ONNX | CUDA | FP16 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 17.08 ms ± 139.91 ms | 12.82 ms ± 1.52 ms | 79 ± 8 | 45.50 MB |
| Linux | ONNX | ONNX | CUDA | FP32 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 25.06 ms ± 129.74 ms | 21.1 ms ± 0.82 ms | 47 ± 2 | 90.79 MB |
| Linux | ONNX | TensorRT | CUDA | FP32 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 6.18 ms ± 1376.44 ms | 14.22 ms ± 0.48 ms | 70 ± 3 | |
| Linux | ONNX | TensorRT | CUDA | FP16 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 49.81 ms ± 1361.7 ms | 8.3 ms ± 0.75 ms | 121 ± 11 | |
| Linux | PyTorch | PyTorch | CUDA | FP32 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 7.7 ms ± 5.38 ms | 7.78 ms ± 6.0 ms | 185 ± 96 | |
| Linux | PyTorch | PyTorch | CPU | FP32 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 167.33 ms ± 21.0 ms | 167.32 ms ± 21.01 ms | 6 ± 1 | |
| Linux | ONNX | ONNX | CPU | FP32 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 85.64 ms ± 8.23 ms | 85.65 ms ± 8.23 ms | 12 ± 1 | |
| Linux | ONNX | ONNX | CPU | FP16 | Pigeon | 36s - ~1k | 30 | (480, 270) | Resize=0.25 | ResNet50 + SSDLite detector (td) | 161.32 ms ± 18.29 ms | 161.3 ms ± 18.29 ms | 6 ± 1 | |

** CUDA: NVIDIA GeForce RTX 3050 (6 GB)

** CPU: 13th Gen Intel Core i7-13620H × 16

** Linux: Ubuntu 24.04 LTS

^ Starting a TensorRT engine takes between 30 and 50 seconds on the first inference, which inflates both the average and the standard deviation of the timings that include the first inference. Engine caching is used to reduce that startup time.
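
To illustrate why the two timing columns can differ so much, here is a minimal sketch of how a single slow first inference shifts the mean and standard deviation. The function names `time_inference` and `summarize_latencies` and the sample latencies are hypothetical, and the use of the population standard deviation is an assumption; this is not the script that produced the table.

```python
import statistics
import time
from typing import Callable, List, Sequence, Tuple


def time_inference(run_once: Callable[[], None], n_frames: int) -> List[float]:
    """Time n_frames calls of run_once and return per-call latencies in ms."""
    latencies_ms = []
    for _ in range(n_frames):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize_latencies(
    latencies_ms: Sequence[float], skip_first: bool = False
) -> Tuple[float, float]:
    """Return (mean, population std) in ms, optionally dropping the first call."""
    samples = list(latencies_ms[1:] if skip_first else latencies_ms)
    if not samples:
        raise ValueError("no latency samples to summarize")
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    # Hypothetical latencies: one slow start-up call followed by steady calls.
    latencies = [1000.0] + [20.0] * 9

    mean_all, std_all = summarize_latencies(latencies)
    mean_warm, std_warm = summarize_latencies(latencies, skip_first=True)

    print(f"including 1st inference: {mean_all:.2f} ms ± {std_all:.2f} ms")
    # including 1st inference: 118.00 ms ± 294.00 ms
    print(f"excluding 1st inference: {mean_warm:.2f} ms ± {std_warm:.2f} ms")
    # excluding 1st inference: 20.00 ms ± 0.00 ms
```

With these made-up numbers, one outlier out of ten samples raises the mean almost sixfold and makes the standard deviation larger than the mean, which is the same pattern as the TensorRT rows without engine caching.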