While working on #47 I noticed what look like performance regressions in cargo bench, most visibly in map_simd and map_scalar, but in quite a few other benchmarks as well.
test tests::map_scalar ... bench: 2,022 ns/iter (+/- 264)
test tests::map_simd ... bench: 6,898 ns/iter (+/- 392)
However, comparing #49 to the commit before the refactoring, the numbers are mostly unchanged.
I then assumed it was related to unfortunate default feature flags on my machine, but playing with avx2 and sse4.1 didn't have any effect either. I also have a first implementation of #48, and it actually looks like no fallbacks are emitted for map_simd. (I tried to cross-check that with radare2, but had problems locating the right symbol / disassembly for the benchmarks.) Lastly, the functions map_scalar and map_simd differ a bit, but even when I make them equal (e.g., sqrt vs. rsqrt) the difference remains.
- Is that a "known issue"?
- Did rustc become so good at auto-vectorization?
- Any suggestions on how to extract the disassembly of tests::map_simd and tests::map_scalar?
Running on rustc 1.29.0-nightly (9fd3d7899 2018-07-07), MBP 2015, i7-5557U.
Update: I linked the latest faster version from my SVM library and I don't see these problems in 'production':
csvm_predict_sv1024_attr1024_problems1 ... bench: 232,109 ns/iter (+/- 20,808) [faster AVX2]
csvm_predict_sv1024_attr1024_problems1 ... bench: 942,925 ns/iter (+/- 64,156) [scalar]
Update 2 Seems to be related to some intrinsics. When I dissect the benchmark, I get
test tests::map_scalar ... bench: 558 ns/iter (+/- 55) [without .abs()]
test tests::map_scalar ... bench: 556 ns/iter (+/- 33) [with .abs()]
test tests::map_simd ... bench: 144 ns/iter (+/- 17) [without .abs()]
test tests::map_simd ... bench: 883 ns/iter (+/- 64) [with .abs()]
I now think that each intrinsic should have its own benchmark, e.g. intrinsic_abs_scalar, intrinsic_abs_simd, ...
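A minimal sketch of what such single-intrinsic benchmarks could look like on stable Rust. The helper `time_per_iter` and the kernel `intrinsic_sqrt_scalar` are hypothetical names for illustration, and only the scalar side is shown; the matching `_simd` variants would sit next to them and be timed the same way.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the average wall-clock time per run.
/// (Hypothetical helper; `cargo bench` with `#[bench]` would replace it on nightly.)
fn time_per_iter<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

/// Scalar kernel that exercises only `abs`.
fn intrinsic_abs_scalar(input: &[f32], out: &mut [f32]) {
    for (o, x) in out.iter_mut().zip(input) {
        *o = x.abs();
    }
}

/// Scalar kernel that exercises only `sqrt`.
fn intrinsic_sqrt_scalar(input: &[f32], out: &mut [f32]) {
    for (o, x) in out.iter_mut().zip(input) {
        *o = x.sqrt();
    }
}

fn main() {
    let input: Vec<f32> = (0..1024).map(|i| i as f32 - 512.0).collect();
    let mut out = vec![0.0f32; input.len()];

    // `black_box` keeps the optimizer from discarding the work.
    let abs_time = time_per_iter(10_000, || {
        intrinsic_abs_scalar(black_box(&input[..]), black_box(&mut out[..]));
    });
    let sqrt_time = time_per_iter(10_000, || {
        intrinsic_sqrt_scalar(black_box(&input[..]), black_box(&mut out[..]));
    });

    println!("intrinsic_abs_scalar:  {:?}/iter", abs_time);
    println!("intrinsic_sqrt_scalar: {:?}/iter", sqrt_time);
}
```

With one kernel per intrinsic, a single slow operation such as the `.abs()` case above would show up as an outlier in its own row instead of being hidden inside map_simd.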
Update 3 ... oh boy. I think that by "arcane magic" Rust imports and prefers std::simd::f32x4 and friends over the faster types and methods.
So when you do my_f32s.abs(), it calls std::simd::f32x4::abs, not faster::arch::current::intrin::abs.
The reason I think that's the problem is you can now easily do my_f32s.sqrte(), which isn't implemented in faster, but in std::simd.
What's more annoying is that it doesn't warn about any collision, and that std::simd is actually slower than "vanilla" Rust.
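One mechanism that would produce exactly this silent behaviour is Rust's method resolution: an inherent method on a type wins over a trait method of the same name, and the compiler does not warn about it. The sketch below is a self-contained illustration, not the actual faster or std::simd code. `Vec4` and `FastAbs` are hypothetical stand-ins, and it assumes the library's `abs` is provided through a trait while the std type carries an inherent `abs`.

```rust
// Stand-in for a vector type that ships with its own inherent `abs`
// (hypothetical; plays the role of a std-provided SIMD type).
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Vec4([f32; 4]);

impl Vec4 {
    // Inherent method: found first by method-call syntax.
    fn abs(self) -> &'static str {
        "inherent"
    }
}

// Stand-in for a library trait that offers a method with the same name.
trait FastAbs {
    fn abs(self) -> &'static str;
}

impl FastAbs for Vec4 {
    fn abs(self) -> &'static str {
        "trait"
    }
}

fn main() {
    let v = Vec4([-1.0, 2.0, -3.0, 4.0]);

    // Resolves to the inherent method even though `FastAbs` is in scope.
    println!("v.abs()                  -> {}", v.abs());

    // Fully qualified syntax forces the trait implementation.
    println!("<Vec4 as FastAbs>::abs(v) -> {}", <Vec4 as FastAbs>::abs(v));
}
```

If this is what happens here, calling the library method with fully qualified syntax in the benchmarks would be one way to check which implementation is actually being measured.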
TODO:
- Investigate the import tree to find out why that happens
- Clean up imports if it turns out to be an import problem
- Have single-intrinsic benchmarks to detect bad intrinsics
- Have Rust warn somehow if similar name conflict happens again?
- Remove all usages of #![feature(stdsimd)] except in lib.rs
Update 4 Now one more thing makes sense ... I sometimes got "use of unstable library feature 'stdsimd'" errors in test cases and didn't understand why. Probably because that's where the std::simd built-ins were used.