try combined optimizations by bruno-dasilva · Pull Request #11 · bruno-dasilva/RecoilEngine

bruno-dasilva · 2026-04-26T08:14:49Z

No description provided.

* Optimize LocalModelPiece: move some cold data behind pointers.
* Apply an optimized loop over dirty pieces (only recalculate relative transforms where they changed).
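As a rough illustration of both ideas in this commit, here is a simplified C++ sketch, not code from the PR. Cold, rarely read fields sit behind a `std::unique_ptr`, and a single parent-before-child pass recomputes only the pieces whose own transform changed or whose parent was just recomputed. `Piece`, `PieceColdData`, `Vec3` and `UpdateDirtyPieces` are hypothetical stand-ins for `LocalModelPiece`. Transforms are reduced to translations so that composing with the parent is plain addition, and parents are assumed to precede their children in the array.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, translation-only transform: parent * child is just addition.
struct Vec3 { float x = 0.0f, y = 0.0f, z = 0.0f; };

// Rarely touched during animation, so it lives behind a pointer.
struct PieceColdData {
    std::string name;
    std::vector<int> childIndices;
};

struct Piece {
    // Hot fields, touched every frame, kept small and contiguous.
    Vec3 localOffset;    // relative to the parent piece
    Vec3 modelSpacePos;  // cached result of parent pos + localOffset
    int parent = -1;     // -1 for the root; assumed parent index < child index
    bool dirty = true;
    std::unique_ptr<PieceColdData> cold;
};

// Recomputes model-space positions only where the piece itself changed or its
// parent was recomputed this pass; returns how many pieces were recomputed.
int UpdateDirtyPieces(std::vector<Piece>& pieces) {
    std::vector<bool> recomputed(pieces.size(), false);
    int count = 0;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        Piece& p = pieces[i];
        const bool parentChanged =
            p.parent >= 0 && recomputed[static_cast<std::size_t>(p.parent)];
        if (!p.dirty && !parentChanged) continue;  // unchanged subtree: skip
        const Vec3 base = p.parent >= 0
            ? pieces[static_cast<std::size_t>(p.parent)].modelSpacePos
            : Vec3{};
        p.modelSpacePos = Vec3{base.x + p.localOffset.x,
                               base.y + p.localOffset.y,
                               base.z + p.localOffset.z};
        p.dirty = false;
        recomputed[i] = true;
        ++count;
    }
    return count;
}
```

After an animation step marks one piece dirty, only that piece and its descendants are touched, and its unchanged siblings are skipped entirely.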

Default GCC/Clang builds targeted SSE2 via a wall of `-mno-sse3`/`-mno-ssse3`/`-mno-sse4.*` flags in the generic MARCH fallback. Replace them with `-msse4.2`, which also enables SSE3, SSSE3 and SSE4.1. AVX/FMA stay banned: FMA contraction changes FP bit patterns and desyncs the deterministic simulation. The minimum x86 CPU is now Intel Nehalem (2008) / AMD Bulldozer (2011).

This requires a replay-level sync validation pass before shipping, because autovectorization output will differ from prior builds even without FMA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explicitly list sse4 flags
* revert part of the code
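Why FMA is a desync risk can be shown in a few lines. The following minimal C++ sketch, which is not code from the PR, compares a multiply-add that rounds the product before adding with `std::fma`, which rounds only once. It also includes an opt-in compile-time guard built on the standard GCC/Clang predefined macros `__SSE4_2__`, `__FMA__` and `__AVX__`. The guard's switch, `RECOIL_SYNC_ISA_GUARD`, is a hypothetical macro name.

```cpp
#include <cassert>
#include <cmath>

// Opt-in ISA check; RECOIL_SYNC_ISA_GUARD is a hypothetical macro name.
#ifdef RECOIL_SYNC_ISA_GUARD
#  if !defined(__SSE4_2__)
#    error "sync build expects -msse4.2"
#  endif
#  if defined(__FMA__) || defined(__AVX__)
#    error "AVX/FMA must stay disabled for the deterministic simulation"
#  endif
#endif

// Two roundings: the product is rounded to double before the add.
// Going through a volatile keeps the compiler from contracting a*b+c into an FMA.
double MulThenAdd(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// One rounding: std::fma computes a*b+c exactly, then rounds once.
double FusedMulAdd(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = 0.1`, `b = 10`, `c = -1`, the rounded product is exactly 1.0, so `MulThenAdd` returns 0. The exact product of the double nearest 0.1 and 10 is 1 + 2^-54, so `FusedMulAdd` returns 2^-54. If one client contracts and another does not, their simulation states diverge bit by bit.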

Change

Moves the cob engine's internal storage and tracking of threads from a `spring::unordered_map` to a slot pool backed by a `std::deque`, to eke out some extra performance. There's a hack due to ordering requirements of the existing code, but I figure a second PR can clean that up; let's keep this PR focused.

Context

Instrumenting our hash maps showed that cob was one of the worst users of `spring::unordered_map`, because of how many tombstones it leaves behind in the map (cob has constant thread churn). Performance of `spring::unordered_map<int, CCobThread> threadInstances` over 5000 late-game sim frames was:

```
[HashContainerStats] Top 20 containers by total time (ns/op):
type | total-ms | find-hit | find-miss | insert | erase | rehash
synced map<int, CCobThread> | 3031.91ms | 130ns | 1360ns | 1292ns | 0ns | 219100ns
```

Switching to a chained hash map fixed the per-op cost of a find miss (it no longer had to scan past all the tombstones) but made find hits slower, because it had to chase bucket pointers. Taking a step back: nothing really iterates over this container, and it sees a lot of churn, so an object pool is probably a better data structure. So let's try it!
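To make the slot-pool idea concrete, here is a minimal C++17 sketch. It assumes the pool hands out its own integer ids, which are slot indices. `SlotPool`, `Acquire`, `Release` and `Find` are hypothetical names, not the PR's code, and the sketch ignores the ordering hack mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical slot pool sketch; not the PR's actual CCobThread storage.
template <typename T>
class SlotPool {
public:
    // Places `value` in a free slot, reusing released slots first; returns its id.
    int Acquire(T value) {
        if (!freeIds.empty()) {
            const int id = freeIds.back();
            freeIds.pop_back();
            slots[static_cast<std::size_t>(id)] = std::move(value);
            return id;
        }
        slots.emplace_back(std::move(value));
        return static_cast<int>(slots.size()) - 1;
    }

    // O(1) release: no tombstone, the id just goes onto the free list.
    void Release(int id) {
        assert(Contains(id));
        slots[static_cast<std::size_t>(id)].reset();
        freeIds.push_back(id);
    }

    bool Contains(int id) const {
        return id >= 0 && static_cast<std::size_t>(id) < slots.size() &&
               slots[static_cast<std::size_t>(id)].has_value();
    }

    // A hit is a direct index; a miss is a bounds/engaged check, not a probe sequence.
    T* Find(int id) {
        return Contains(id) ? &*slots[static_cast<std::size_t>(id)] : nullptr;
    }

    std::size_t Capacity() const { return slots.size(); }

private:
    std::deque<std::optional<T>> slots;  // emplace_back never moves existing elements
    std::vector<int> freeIds;            // LIFO stack of released slot ids
};
```

`std::deque` fits here because appending at the end invalidates no references to existing elements, so pointers to live threads stay valid as the pool grows. Because ids are reused, a stale id can alias a newer thread. A real implementation might pair each id with a generation counter to catch that.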

github-actions · 2026-04-26T08:54:22Z

bar-benchmark — PR #11

candidate 60248b5 vs baseline eb1c69f

sim trimmed mean (ms) with 95% CI on the relative delta

| scenario | candidate (ms) | baseline (ms) | Δ (95% CI) | n (candidate) | n (baseline) |
|---|---|---|---|---|---|
| fightertest-bots | 23.57 ♻️ | 23.85 ♻️ | -1.32% to -1.03% | 50 | 70 |
| fightertest-aircraft | 19.37 ♻️ | 19.17 ♻️ | +0.96% to +1.13% | 50 | 60 |
| fightertest-tanks | 24.70 ♻️ | 24.82 ♻️ | -0.74% to -0.16% | 50 | 60 |
| fightertest-pathfinding | 21.70 ♻️ | 21.78 ♻️ | -0.49% to -0.20% | 50 | 60 |
| lategame1 | 21.83 | 23.42 ♻️ | -7.58% to -5.95% | 70 | 100 |
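The bot reports a trimmed mean per scenario and a 95% CI on the relative delta, but it does not say how the trim or the interval is computed. The following C++ sketch shows one plausible approach. The 10% symmetric trim and the percentile bootstrap are hypothetical choices, not bar-benchmark's documented method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Mean after dropping the lowest and highest `trim` fraction of samples.
// The 10% default is an assumption; the workflow's trim level isn't stated.
double TrimmedMean(std::vector<double> xs, double trim = 0.10) {
    assert(!xs.empty() && trim >= 0.0 && trim < 0.5);
    std::sort(xs.begin(), xs.end());
    const std::size_t k = static_cast<std::size_t>(trim * static_cast<double>(xs.size()));
    double sum = 0.0;
    for (std::size_t i = k; i < xs.size() - k; ++i) sum += xs[i];
    return sum / static_cast<double>(xs.size() - 2 * k);
}

// Relative delta of candidate vs baseline; negative means the candidate is faster.
double RelativeDelta(const std::vector<double>& cand, const std::vector<double>& base) {
    const double b = TrimmedMean(base);
    return (TrimmedMean(cand) - b) / b;
}

// Percentile-bootstrap 95% CI on the relative delta (hypothetical method choice).
std::pair<double, double> BootstrapCI(const std::vector<double>& cand,
                                      const std::vector<double>& base,
                                      int iters = 2000, unsigned seed = 42) {
    std::mt19937 rng(seed);  // fixed seed so the interval is reproducible
    auto resample = [&rng](const std::vector<double>& xs) {
        std::uniform_int_distribution<std::size_t> pick(0, xs.size() - 1);
        std::vector<double> out(xs.size());
        for (double& x : out) x = xs[pick(rng)];  // draw with replacement
        return out;
    };
    std::vector<double> deltas;
    deltas.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i)
        deltas.push_back(RelativeDelta(resample(cand), resample(base)));
    std::sort(deltas.begin(), deltas.end());
    const auto at = [&](double q) {
        return deltas[static_cast<std::size_t>(q * (iters - 1))];
    };
    return {at(0.025), at(0.975)};
}
```

As a sanity check against the table, the lategame1 point estimate (21.83 - 23.42) / 23.42 ≈ -6.8% falls inside the reported -7.58% to -5.95% interval.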

Per-VM distribution box plots (5)

💰 compute cost: $0.68 · 1 fresh leg · 9 cached at $0 · last updated: 2026-04-26T09:53:34.656Z · [workflow run](https://github.com/bruno-dasilva/RecoilEngine/actions/runs/24953555306)

lhog and others added 6 commits April 22, 2026 05:30

* Optimize animation execution (36013b4)
* Some fresh fixes (e8950be)
* fix the leak (d24e2d8)
* add some asserts + some cleanup (60248b5)

bruno-dasilva added the benchmark label Apr 26, 2026

bruno-dasilva force-pushed the master branch 2 times, most recently from 6ae3d38 to eff5a00, May 27, 2026 08:43

bruno-dasilva force-pushed the master branch from e326f65 to 2a39fb0, June 17, 2026 21:15

bruno-dasilva wants to merge 6 commits into master from bruno/try-all


2 participants
