test ivand's matrix optimization #8

Open
bruno-dasilva wants to merge 4 commits into master from matrix-sse-optimization

@bruno-dasilva

Owner

No description provided.

lhog added 4 commits April 19, 2026 21:01
- Add MatrixMatrixMultiplySSE using in-register shuffle vs scalar loads
- SSE version uses 12 _mm_load1_ps (scalar loads from memory)
- SSENew uses 4 _mm_load_ps (vector loads) + _mm_shuffle_ps (in-register)
- Result: ~30% faster (fewer memory accesses: 8 vs 16 loads)

The _mm_shuffle_ps instruction is SSE1. The optimization is
algorithmic - reduced memory bandwidth, not newer SIMD instructions.

- operator* now uses MatrixMatrixMultiplySSE by default

Benchmark (40M iterations):
- SSEOld: 1.39s
- SSE (new): 1.73s

Tests verify bitwise equivalence across 100000 random matrices
with affine assumptions (m2.m[3]=0, m2.m[7]=0).
Comparison done using operator* vs in-place MatrixMatrixMultiplySSEOld.
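
A bitwise-equivalence check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the PR's test: `MulFn`, `FillRandom` and `CountBitwiseMismatches` are hypothetical names, the value range and seed handling are arbitrary, and only the two zeroed elements quoted above (`m[3]` and `m[7]` of the second operand) are enforced.

```cpp
#include <cstddef>
#include <cstring>
#include <random>

// Hypothetical signature shared by the two implementations under comparison.
using MulFn = void (*)(const float* a, const float* b, float* out);

// Fill a 4x4 matrix (16 floats) with random values in an arbitrary range.
inline void FillRandom(float* m, std::mt19937& rng)
{
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
    for (int i = 0; i < 16; ++i)
        m[i] = dist(rng);
}

// Run both implementations on n random pairs and count the pairs whose
// results differ in any bit. memcmp is used instead of == so that, for
// example, +0.0f and -0.0f are treated as different.
inline std::size_t CountBitwiseMismatches(MulFn ref, MulFn cand,
                                          std::size_t n, unsigned seed)
{
    std::mt19937 rng(seed);
    alignas(16) float a[16];
    alignas(16) float b[16];
    alignas(16) float r0[16];
    alignas(16) float r1[16];
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        FillRandom(a, rng);
        FillRandom(b, rng);
        b[3] = 0.0f;  // affine assumption on the second operand
        b[7] = 0.0f;
        ref(a, b, r0);
        cand(a, b, r1);
        if (std::memcmp(r0, r1, sizeof(r0)) != 0)
            ++mismatches;
    }
    return mismatches;
}
```

A test would then assert that the returned count is zero for the old and new multiply routines.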

Remove MatrixMatrixMultiplySSEOld from Matrix44f.cpp (already preserved
in test file). Make MatrixMatrixMultiplySSE static inline — it has no
callers outside this translation unit. Remove the now-unnecessary
forward declaration from Matrix44f.h.

- Rename m2r to m2c and fix comments (column-major, not row-major)
- Fix SSE_Opt benchmark to use m1 = m1 * m_ for genuine data dependency
- Remove unused TestMMSSENew() and TestMMSSEOldVsSSENew() helpers
- Tabify indentation in test file to match rest of codebase
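
The data-dependency point in the benchmark fix can be illustrated with a small sketch. Everything here is a hypothetical stand-in: `Mat4`, its scalar `operator*`, `BenchResult` and `BenchDependentChain` are not taken from the PR. The idea is that writing `m1 = m1 * m_` makes each iteration consume the previous result, so the multiplications form a serial chain that the compiler cannot hoist out of the loop or discard.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the engine's matrix type (column-major).
struct Mat4 { float m[16]; };

// Plain scalar product, used only to keep the sketch self-contained.
inline Mat4 operator*(const Mat4& a, const Mat4& b)
{
    Mat4 out{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a.m[4 * k + i] * b.m[4 * j + k];
            out.m[4 * j + i] = s;
        }
    return out;
}

struct BenchResult {
    double seconds;  // wall-clock time for the whole loop
    Mat4 last;       // final matrix, returned so the work stays observable
};

inline BenchResult BenchDependentChain(Mat4 m1, const Mat4& m_,
                                       std::size_t iterations)
{
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        m1 = m1 * m_;  // each result feeds the next iteration
    const auto t1 = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(t1 - t0).count(), m1 };
}
```

Returning the final matrix alongside the elapsed time gives the caller something to print or compare, which helps keep the loop from being optimised away.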

github-actions Bot commented Apr 25, 2026


bar-benchmark — PR #8

candidate 2f50bd5 vs baseline eb1c69f

sim trimmed mean (ms) with 95% CI on the relative delta

| scenario | candidate | baseline | Δ (95% CI) | n cand | n base |
| --- | --- | --- | --- | --- | --- |
| fightertest-bots | 23.88 ms ♻️ | 23.86 ms ♻️ | -0.01% to +0.21% | 50 | 80 |
| fightertest-aircraft | 19.22 ms ♻️ | 19.17 ms ♻️ | +0.16% to +0.33% | 50 | 70 |
| fightertest-tanks | 24.92 ms ♻️ | 24.82 ms ♻️ | +0.23% to +0.59% | 50 | 70 |
| fightertest-pathfinding | 21.80 ms ♻️ | 21.77 ms ♻️ | +0.01% to +0.25% | 50 | 70 |
| lategame1 | 23.27 ms | 23.39 ms ♻️ | -1.23% to +0.28% | 40 | 110 |

💰 compute cost: $0.28 · 1 fresh leg · 9 cached at $0

last updated: 2026-04-25T15:30:12.405Z · workflow run
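
For readers unfamiliar with the statistic in the table, a trimmed mean and a relative delta can be computed roughly as below. This is a generic sketch, not the benchmark bot's implementation: the function names are hypothetical, the trim fraction is left as a parameter because the comment does not state it, and the method used for the 95% confidence interval is not shown because it is not given either.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of the samples after dropping trimFraction of them from each end.
// trimFraction is an assumption of this sketch (for example 0.25).
inline double TrimmedMean(std::vector<double> samples, double trimFraction)
{
    if (samples.empty())
        return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    std::size_t cut = static_cast<std::size_t>(trimFraction * static_cast<double>(n));
    if (2 * cut >= n)
        cut = (n - 1) / 2;  // always keep at least one sample
    const auto first = samples.begin() + static_cast<std::ptrdiff_t>(cut);
    const auto last = samples.end() - static_cast<std::ptrdiff_t>(cut);
    return std::accumulate(first, last, 0.0) / static_cast<double>(last - first);
}

// Relative change of the candidate with respect to the baseline, in percent.
// Positive means the candidate took longer.
inline double RelativeDeltaPercent(double candidate, double baseline)
{
    return 100.0 * (candidate - baseline) / baseline;
}
```

Applied to the fightertest-bots row, the point estimate from the two means is `RelativeDeltaPercent(23.88, 23.86)`, roughly +0.08%, which lies inside the reported interval.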


2 participants