Following some guidelines of the Rust Performance Book, here are some things we can try to improve performance:
- Add `codegen-units = 1` to the release build profile
- Use a faster allocator, e.g. mimalloc, which works on all operating systems
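Both items are cheap to try. `codegen-units` is a Cargo profile setting (`codegen-units = 1` under the relevant `[profile.*]` section of `Cargo.toml`), so it needs no code changes. The allocator is swapped with Rust's `#[global_allocator]` attribute. The sketch below registers the standard `System` allocator explicitly so that it runs without extra dependencies; the comment shows the mimalloc variant, assuming the `mimalloc` crate has been added as a dependency. `build_and_sum` is a hypothetical stand-in for allocation-heavy code.

```rust
use std::alloc::System;

// Dependency-free stand-in: register the system allocator explicitly.
// To switch to mimalloc (assumes `mimalloc` is listed in Cargo.toml),
// replace the two lines below with:
//     #[global_allocator]
//     static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical allocation-heavy workload: the `Vec` allocation below
/// goes through `GLOBAL` without any other code changes.
fn build_and_sum(n: usize) -> f64 {
    let v: Vec<f64> = (0..n).map(|i| i as f64).collect();
    v.iter().sum()
}

fn main() {
    println!("sum = {}", build_and_sum(1000));
}
```

Because the attribute is crate-global, it is declared once and affects every heap allocation made by the Rust code, so comparing allocators only requires rebuilding with a different `GLOBAL`.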
Not so easy:
- Properly profile to identify hot spots
- Remove clones/allocations where they are not needed
- Use profile-guided optimization (PGO), e.g. via cargo-pgo
  - Unfortunately, this currently does not work together with LTO, and the PGO build is 10-20% slower than the LTO build
  - It might become available directly in maturin in the future, see here
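For the "remove clones/allocations" item, one way to check that a change really removes allocations, before timing it, is to count them with a wrapping global allocator. This is a minimal sketch using only `std`: the hypothetical `CountingAlloc` forwards to `System` and bumps a per-thread counter, and `sum_cloned` / `sum_borrowed` are hypothetical examples of a needless clone and its borrowed replacement. The optimizer is allowed to elide unused allocations, hence the `black_box` around the clone.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::hint::black_box;

thread_local! {
    // Per-thread counter: const-initialized and without Drop, so touching it
    // from inside the allocator does not allocate itself.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Forwards to the system allocator and counts allocation requests.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // try_with: the thread-local may already be gone during thread teardown.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns how many heap allocations it made on this thread.
fn count_allocs<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.with(|c| c.get());
    f();
    ALLOCS.with(|c| c.get()) - before
}

/// Hypothetical hot function with a needless clone.
fn sum_cloned(xs: &Vec<f64>) -> f64 {
    // black_box keeps the optimizer from removing the copy in release builds.
    let copy = black_box(xs.clone());
    copy.iter().sum()
}

/// Same result, borrowing the data instead of copying it.
fn sum_borrowed(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

fn main() {
    let xs = vec![1.0, 2.0, 3.0];
    let cloned = count_allocs(|| {
        black_box(sum_cloned(&xs));
    });
    let borrowed = count_allocs(|| {
        black_box(sum_borrowed(&xs));
    });
    println!("allocations: cloned = {cloned}, borrowed = {borrowed}");
}
```

A counter like this only says *whether* allocations disappeared; whether that matters for run time still has to be confirmed by profiling.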

Quick tests with `codegen-units = 1` added to the `release-lto` profile (see here) show benchmark improvements of up to 12% (about 7% on average), while for `dual_numbers` the changes are a bit smaller (see below).
Proper benchmarks across all benchmarks, compared against the current release workflow, are still needed, but if it turns out to be faster in all cases, this could be an easy-to-get improvement.
- Benchmark: `dual_numbers`
- System: methane/CO2
- `main`: main branch + LTO
- `main_codegen`: main branch + LTO + `codegen-units = 1`
- `develop`, `develop_codegen`: same as `main` and `main_codegen`, but for the develop branch
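To make the comparison concrete: the benchmark evaluates the same model once with plain `f64` and once with each dual number type, and the tables below report the time per evaluation. A minimal sketch of that idea, assuming a hypothetical first-order `Dual` type (value plus one derivative) and a toy function `f` instead of the project's own dual number types and the methane/CO2 system. Timing uses only `std::time::Instant` and `std::hint::black_box`, so it is much cruder than a proper benchmark harness (e.g. the `criterion` crate).

```rust
use std::hint::black_box;
use std::ops::{Add, Mul};
use std::time::Instant;

/// Hypothetical minimal first-order dual number: value `re`, derivative `eps`.
/// Illustrative only; not the API of the benchmarked crate.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { re: self.re + o.re, eps: self.eps + o.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: (a + a' e)(b + b' e) = ab + (a' b + a b') e
    fn mul(self, o: Dual) -> Dual {
        Dual { re: self.re * o.re, eps: self.eps * o.re + self.re * o.eps }
    }
}

/// The same generic function, evaluated with f64 or with Dual.
fn f<T: Copy + Add<Output = T> + Mul<Output = T>>(x: T, c: T) -> T {
    x * x + c * x
}

/// Crude mean time per call in microseconds (no warm-up, no statistics).
fn time_us<T: Copy>(n: u32, x: T, g: impl Fn(T) -> T) -> f64 {
    let start = Instant::now();
    for _ in 0..n {
        black_box(g(black_box(x)));
    }
    start.elapsed().as_secs_f64() * 1e6 / n as f64
}

fn main() {
    let t_f64 = time_us(1_000_000, 2.0_f64, |x| f(x, 3.0));
    // Seeding eps = 1 on x yields df/dx alongside f; the constant has eps = 0.
    let c = Dual { re: 3.0, eps: 0.0 };
    let t_dual = time_us(1_000_000, Dual { re: 2.0, eps: 1.0 }, |x| f(x, c));
    println!("f64: {t_f64:.4} us, dual: {t_dual:.4} us, ratio: {:.3}", t_dual / t_f64);
}
```

Each dual operation does extra derivative arithmetic, and higher-order types carry more derivative components, which is why the columns get progressively slower. Running such a sketch under each profile (e.g. `cargo run --release` vs. `cargo run --profile release-lto`) is a quick way to see a profile change, but the absolute numbers will not match the tables.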

Execution times in µs:

| name | f64 | dual | dual2 | hyperdual | dual3 |
|---|---|---|---|---|---|
| main | 1.1382 | 1.2325 | 1.4539 | 1.6267 | 1.7563 |
| main_codegen | 1.0229 | 1.1741 | 1.3708 | 1.5777 | 1.6316 |
| develop | 1.0138 | 1.1989 | 1.4465 | 1.589 | 1.7549 |
| develop_codegen | 0.9761 | 1.1681 | 1.4195 | 1.5446 | 1.6304 |

Slowdown t_d / t_f64 of each number type relative to `f64`, for each branch/option:

| name | f64 | dual | dual2 | hyperdual | dual3 |
|---|---|---|---|---|---|
| main | 1 | 1.08285 | 1.27737 | 1.42919 | 1.54305 |
| main_codegen | 1 | 1.14782 | 1.34011 | 1.54238 | 1.59507 |
| develop | 1 | 1.18258 | 1.42681 | 1.56737 | 1.73101 |
| develop_codegen | 1 | 1.1967 | 1.45426 | 1.58242 | 1.67032 |

Relative difference in % w.r.t. `main` + LTO for each number type, computed as (t_d_branch - t_d_main) / t_d_main * 100 (negative means faster than `main`):

| name | f64 | dual | dual2 | hyperdual | dual3 |
|---|---|---|---|---|---|
| main_codegen | -10.13 | -4.74 | -5.72 | -3.01 | -7.10 |
| develop | -10.93 | -2.73 | -0.51 | -2.32 | -0.08 |
| develop_codegen | -14.24 | -5.23 | -2.37 | -5.05 | -7.17 |
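
Both derived tables follow directly from the execution times. As a cross-check, here is a small sketch that recomputes them from the µs values of the first table (the numbers are copied from there; `TIMES`, `slowdown`, and `rel_diff_percent` are hypothetical names chosen for this example).

```rust
/// Execution times in µs from the first table; columns:
/// f64, dual, dual2, hyperdual, dual3.
static TIMES: [(&str, [f64; 5]); 4] = [
    ("main", [1.1382, 1.2325, 1.4539, 1.6267, 1.7563]),
    ("main_codegen", [1.0229, 1.1741, 1.3708, 1.5777, 1.6316]),
    ("develop", [1.0138, 1.1989, 1.4465, 1.589, 1.7549]),
    ("develop_codegen", [0.9761, 1.1681, 1.4195, 1.5446, 1.6304]),
];

/// Slowdown of each number type w.r.t. f64 on the same branch: t_d / t_f64.
fn slowdown(row: &[f64; 5]) -> [f64; 5] {
    row.map(|t| t / row[0])
}

/// Relative difference in % w.r.t. main: (t_d_branch - t_d_main) / t_d_main * 100.
fn rel_diff_percent(row: &[f64; 5], main: &[f64; 5]) -> [f64; 5] {
    std::array::from_fn(|i| (row[i] - main[i]) / main[i] * 100.0)
}

fn main() {
    let main_row = &TIMES[0].1;
    for (name, row) in &TIMES {
        println!("{name:>16} slowdown: {:.5?}", slowdown(row));
        println!("{name:>16} diff [%]: {:.2?}", rel_diff_percent(row, main_row));
    }
}
```

The printed rows should match the two tables above up to rounding; slowdown values of 1 or more mean the number type is slower than plain `f64` on that branch.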