PoC: Make the codelets operate entirely in registers #113
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #113 +/- ##
==========================================
- Coverage 98.80% 98.18% -0.62%
==========================================
Files 9 9
Lines 2086 1871 -215
==========================================
- Hits 2061 1837 -224
- Misses 25 34 +9
This reverts commit a2f7c2a.
This is now completely green on benchmarks on both Zen4 and M4. The f64 version still has lots of register spills, but it already gives a huge perf boost on x86 at some sizes without regressing anything. Apple M4 is also all green, though the gains are less dramatic there. This still needs cleanup and better unconditional enabling of codelets (those are the failing tests), but the core is already in place and universally beneficial.
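For readers unfamiliar with the idea, here is a minimal scalar sketch of what "operating entirely in registers" means for a codelet. It is not code from this PR: the `Complex` type and `dft4_in_registers` function are hypothetical names, and the real codelets presumably work on SIMD vectors rather than scalars. The point is only the shape: load every input once into locals, do all the arithmetic on those locals, and write the results back once at the end.

```rust
// Hypothetical illustration only; none of these names come from the PR.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

impl Complex {
    fn add(self, o: Complex) -> Complex {
        Complex { re: self.re + o.re, im: self.im + o.im }
    }

    fn sub(self, o: Complex) -> Complex {
        Complex { re: self.re - o.re, im: self.im - o.im }
    }

    // Multiply by -i: (a + bi) * (-i) = b - ai.
    fn mul_neg_i(self) -> Complex {
        Complex { re: self.im, im: -self.re }
    }
}

/// Size-4 forward DFT codelet.
/// All four inputs are loaded into locals up front, every intermediate
/// lives in a local, and memory is written exactly once at the end.
/// Whether the locals actually stay in registers is up to the compiler.
#[inline(always)]
fn dft4_in_registers(data: &mut [Complex; 4]) {
    // Single load of all inputs.
    let [x0, x1, x2, x3] = *data;

    // Two radix-2 stages computed purely on locals.
    let s02 = x0.add(x2);
    let d02 = x0.sub(x2);
    let s13 = x1.add(x3);
    let d13 = x1.sub(x3).mul_neg_i();

    // Single store of all outputs.
    *data = [s02.add(s13), d02.add(d13), s02.sub(s13), d02.sub(d13)];
}
```

With only four complex values the working set fits comfortably in registers. Larger codelets or wider element types hold more live values at once, which is where spills like the ones mentioned for f64 tend to show up.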
Results on the latest version on the M2 chip:
…rformance on avx2
This could really use linebender/fearless_simd#206 but I've polyfilled it for now.