Surprising benchmark numbers

While working on #47 I noticed what looks like performance regressions in the `cargo bench`, in particular functions like `map_simd` and `map_scalar`, but quite a few others.

```
test tests::map_scalar                                ... bench:       2,022 ns/iter (+/- 264)
test tests::map_simd                                  ... bench:       6,898 ns/iter (+/- 392)
```

However, comparing #49 to the commit before the refactoring, the numbers are mostly unchanged. 

I then assumed it's related to unfortunate default feature flags on my machine, but playing with `avx2` and `sse4.1` didn't have any effect either. I also have a first implementation of #48, and it actually looks like no fallbacks are emitted for `map_simd`. (Tried to cross check that with `radare2`, but have some problems locating the right symbol / disassembly for the benchmarks). Lastly, the functions `map_scalar` and `map_simd` differ a bit, but even when I make them equal (e.g., `sqrt` vs. `rsqrt`) the difference remains.

- Is that a "known issue"? 
- Did `rustc` became so good in auto-vectorization?
- Any suggestions how to extract the disassembly from `tests::map_simd` and `tests::map_scalar`?

Running on `rustc 1.29.0-nightly (9fd3d7899 2018-07-07)`, MBP 2015, i7-5557U.

**Update:** I linked the latest faster version from my SVM library and I don't see these problems in 'production':

```
csvm_predict_sv1024_attr1024_problems1 ... bench:     232,109 ns/iter (+/- 20,808) [faster AVX2]
csvm_predict_sv1024_attr1024_problems1 ... bench:     942,925 ns/iter (+/- 64,156) [scalar]
```

**Update 2** Seems to be related to some intrinsics.  When I dissect the benchmark, I get 

```
test tests::map_scalar                                ... bench:         558 ns/iter (+/- 55) [without .abs()]
test tests::map_scalar                                ... bench:         556 ns/iter (+/- 33) [with .abs()]
test tests::map_simd                                  ... bench:         144 ns/iter (+/- 17) [without .abs()]
test tests::map_simd                                  ... bench:         883 ns/iter (+/- 64) [with .abs()]
```

I now think that each intrinsic should have its own benchmark, e.g. `intrinsic_abs_scalar`, `intrinsic_abs_simd`, ...


**Update 3** ... oh boy. I think that by "arcane magic" Rust imports and prefers `std::simd::f32x4` and friends over the `faster` types and methods. 

So when you do `my_f32s.abs()`, it calls `std::simd::f32x4::abs`, not `faster::arch::current::intrin::abs`. 

The reason I think that's the problem is you can now easily do `my_f32s.sqrte()`, which isn't implemented in `faster`, but in `std::simd`. 

What's more annoying is that it doesn't warn about any collision, and that `std::simd` is actually slower than "vanilla" Rust. 

**TODO:**
- Investigate import tree why that happens
- Clean up imports if import problem
- Have single-intrinsic benchmarks to detect bad intrinsics 
- Have Rust warn somehow if similar name conflict happens again? 
- Remove all usages of `#![feature(stdsimd)]` except in `lib.rs`

**Update 4** Now one more thing makes sense ... I sometimes got `use of unstable library feature 'stdsimd'` in test cases and I didn't understand why. Probably because that's where the `std::simd` built-ins were used. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surprising benchmark numbers #51

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Surprising benchmark numbers #51

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions