Surprising benchmark numbers #51

@ralfbiedert

While working on #47 I noticed what look like performance regressions in cargo bench, in particular in map_simd and map_scalar, but also in quite a few other benchmarks.

test tests::map_scalar                                ... bench:       2,022 ns/iter (+/- 264)
test tests::map_simd                                  ... bench:       6,898 ns/iter (+/- 392)

However, comparing #49 to the commit before the refactoring, the numbers are mostly unchanged.

I then assumed it was related to unfortunate default feature flags on my machine, but playing with avx2 and sse4.1 had no effect either. I also have a first implementation of #48, and it looks like no fallbacks are emitted for map_simd. (I tried to cross-check that with radare2, but had problems locating the right symbol / disassembly for the benchmarks.) Lastly, the functions map_scalar and map_simd differ a bit, but even when I make them equal (e.g., sqrt vs. rsqrt) the difference remains.

  • Is that a "known issue"?
  • Did rustc become so good at auto-vectorization?
  • Any suggestions on how to extract the disassembly of tests::map_simd and tests::map_scalar?

Running on rustc 1.29.0-nightly (9fd3d7899 2018-07-07), MBP 2015, i7-5557U.

Update: I linked the latest faster version from my SVM library and I don't see these problems in 'production':

csvm_predict_sv1024_attr1024_problems1 ... bench:     232,109 ns/iter (+/- 20,808) [faster AVX2]
csvm_predict_sv1024_attr1024_problems1 ... bench:     942,925 ns/iter (+/- 64,156) [scalar]

Update 2: This seems to be related to specific intrinsics. When I dissect the benchmark, I get:

test tests::map_scalar                                ... bench:         558 ns/iter (+/- 55) [without .abs()]
test tests::map_scalar                                ... bench:         556 ns/iter (+/- 33) [with .abs()]
test tests::map_simd                                  ... bench:         144 ns/iter (+/- 17) [without .abs()]
test tests::map_simd                                  ... bench:         883 ns/iter (+/- 64) [with .abs()]

I now think that each intrinsic should have its own benchmark, e.g. intrinsic_abs_scalar, intrinsic_abs_simd, ...
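A rough sketch of what such single-intrinsic benchmarks could look like. It uses only stable std (`Instant` and `std::hint::black_box`) rather than the nightly `#[bench]` harness, and `time_per_iter` is a hypothetical helper made up for this sketch. Only the scalar kernels are shown; the `_simd` counterparts would have the same shape but go through faster's API, which is not reproduced here.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Scalar kernel that exercises only `abs`.
fn intrinsic_abs_scalar(input: &[f32], output: &mut [f32]) {
    for (o, x) in output.iter_mut().zip(input) {
        *o = x.abs();
    }
}

/// Scalar kernel that exercises only `sqrt`.
fn intrinsic_sqrt_scalar(input: &[f32], output: &mut [f32]) {
    for (o, x) in output.iter_mut().zip(input) {
        *o = x.sqrt();
    }
}

/// Hypothetical helper: average wall-clock time per call of `f`.
/// `iters` must be non-zero.
fn time_per_iter(iters: u32, mut f: impl FnMut()) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    // Non-negative inputs so that `sqrt` never produces NaN.
    let input: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    let mut output = vec![0.0f32; input.len()];

    // One measurement per intrinsic, so a slow one stands out on its own.
    let abs = time_per_iter(10_000, || {
        intrinsic_abs_scalar(black_box(input.as_slice()), &mut output);
        black_box(&output);
    });
    let sqrt = time_per_iter(10_000, || {
        intrinsic_sqrt_scalar(black_box(input.as_slice()), &mut output);
        black_box(&output);
    });

    println!("intrinsic_abs_scalar:  {:?}/iter", abs);
    println!("intrinsic_sqrt_scalar: {:?}/iter", sqrt);
}
```

With one kernel per intrinsic, a regression like the `.abs()` one above would show up as a single outlier instead of being hidden inside a combined map benchmark.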

Update 3: ... oh boy. I think that by "arcane magic" Rust imports and prefers std::simd::f32x4 and friends over the types and methods provided by faster.

So when you do my_f32s.abs(), it calls std::simd::f32x4::abs, not faster::arch::current::intrin::abs.

The reason I think that's the problem is that you can now call my_f32s.sqrte(), which is implemented in std::simd but not in faster.

What's more annoying is that rustc doesn't warn about the collision, and that std::simd is actually slower than "vanilla" Rust here.
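One mechanism that would produce exactly this kind of silent collision is Rust's method resolution: if a type has an inherent method and a trait method with the same name, `value.method()` picks the inherent one without any warning. Whether that is what happens here is an assumption; it would require std::simd's `abs` to be an inherent method and faster's `abs` to come from a trait. The names `Vec4` and `FastAbs` below are hypothetical stand-ins, not real types from either crate.

```rust
/// Hypothetical stand-in for a vector type that ships its own `abs`.
#[derive(Clone, Copy)]
struct Vec4([f32; 4]);

impl Vec4 {
    /// Inherent method, standing in for the built-in implementation.
    fn abs(self) -> (&'static str, [f32; 4]) {
        ("inherent", self.0.map(f32::abs))
    }
}

/// Hypothetical stand-in for a trait that a library layers on top.
trait FastAbs {
    fn abs(self) -> (&'static str, [f32; 4]);
}

impl FastAbs for Vec4 {
    fn abs(self) -> (&'static str, [f32; 4]) {
        ("trait", self.0.map(f32::abs))
    }
}

fn main() {
    let v = Vec4([-1.0, 2.0, -3.0, 4.0]);

    // Method-call syntax silently prefers the inherent method.
    println!("{}", v.abs().0);

    // The trait method is only reached with fully qualified syntax.
    println!("{}", FastAbs::abs(v).0);
}
```

If this is the cause, calling the intended implementation through fully qualified syntax (or under a non-colliding name) would sidestep the problem regardless of what is imported.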

TODO:

  • Investigate the import tree to find out why that happens
  • Clean up imports if it turns out to be an import problem
  • Have single-intrinsic benchmarks to detect bad intrinsics
  • Have Rust warn somehow if a similar name conflict happens again?
  • Remove all usages of #![feature(stdsimd)] except in lib.rs

Update 4: Now one more thing makes sense ... I sometimes got the error "use of unstable library feature 'stdsimd'" in test cases and didn't understand why. Probably because that's where the std::simd built-ins were used.
