perf: move nano particles out of sim frame #7969

Draft
bruno-dasilva wants to merge 3 commits into beyond-all-reason:master from bruno-dasilva:bruno/move-nano-particles-to-update-frame

@bruno-dasilva bruno-dasilva commented Jun 16, 2026

NOTE: Draft until full testing is complete + screenshots/vids are added.

Context

The nano particles GL4 gadget currently does a lot of work in the ::GameFrame callin, which runs on every sim frame. This slows down the sim, particularly during catch-up. The idea is to move that work out of ::GameFrame and into ::Update.

FYI for readers: how "frames" work inside the engine is described in the "Addendum" section at the bottom.

Work done

Two things (in two commits):

  1. Move the nano particle logic out of the GameFrame() callin and into Update(). This adds some complexity:
    • since one Update() call can run after potentially many sim frames (in catch-up), we now need to integrate over multiple sim frames too, instead of just over the existing amortization striding.
  2. Add an explicit ramp-down of particle emission as the pool saturates, to match the old (implicit) ramping.
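
To illustrate the "integrate over multiple sim frames" point, here is a minimal sketch in Python (used only for illustration; the gadget itself is Lua). The `Builder` fields, `build_power` units, and `particles_to_emit` are hypothetical names, not taken from the PR. The idea is that each builder remembers the sim frame it was last serviced at, so emission stays proportional to rate × elapsed frames whether it is visited every frame or only after a jump.

```python
from dataclasses import dataclass


@dataclass
class Builder:
    build_power: float    # hypothetical unit: particles per sim frame
    last_frame: int = 0   # sim frame at which this builder was last serviced
    carry: float = 0.0    # fractional particles left over from the last visit


def particles_to_emit(builder: Builder, n: int) -> int:
    """Return how many particles to emit now that the sim is at frame n."""
    elapsed = n - builder.last_frame   # can be > 1 after a catch-up burst
    builder.last_frame = n
    owed = builder.build_power * elapsed + builder.carry
    count = int(owed)                  # only whole particles are emitted
    builder.carry = owed - count       # remainder is kept for the next visit
    return count
```

Under these assumptions, a builder serviced on every frame from 1 to 8 and a builder serviced only at frames 3 and 8 end up emitting the same total.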

Performance Table (TODO)

| time | before | after | diff |
|---|---|---|---|
| 0-5min | | | |
| 5-10min | | | |
| 10-20min | | | |
| 20-30min | | | |
| 30-60min | | | |

Test steps

For each of the following, compare behaviour before and after this change (it should look similar, if not identical):

  • Look at a large number of nanos at a range of different game speeds
  • Look at a large number of nanos while paused
  • Run replays of a few games at max speed and see what the performance difference is

See videos:
MASTER
- EARLY GAME: https://drive.google.com/file/d/1yuE9dKRcM_90_Ya9ivxHfH8ttSQWjRLb/view?usp=drive_link
- MID GAME: https://drive.google.com/file/d/1FCWuzqZy43l0xsqxS3lNyecWvnd4ofVc/view?usp=drive_link
- LATE GAME: https://drive.google.com/file/d/13IrT1ibgL3_lrLQPxG6UA56-YxcPVPIb/view?usp=drive_link
THIS BRANCH:
- EARLY GAME: https://drive.google.com/file/d/1QjQyhdOIYiO3W40w_1moubon41VUiMpX/view?usp=drive_link
- MID GAME: https://drive.google.com/file/d/1BlqFTP5NYYWZSDTCk5iajYmQADQSW72r/view?usp=drive_link
- LATE GAME: https://drive.google.com/file/d/1fmyun2z7yYc354sZX5DPqNNY735Isz1H/view?usp=drive_link

Screenshots:

BEFORE:

Screenshot 2026-06-15 010110 Screenshot 2026-06-15 005958

AFTER:

Screenshot 2026-06-15 010609

AI / LLM usage statement:

Claude Code did the initial POC/implementation; significant cleanup and comments by me.

Addendum - how engine runs frames

The main loop is Update → Draw, repeating. Each iteration produces one draw frame and first drains any queued sim frame packets (0..N sim frames per iteration). CGame::Update (synced) dispatches SimFrame() calls as NETMSG_NEWFRAME packets arrive; CGame::Draw then does an unsynced update phase (CGame::UpdateUnsynced: timings, interpolation, camera, GUI, sound, world-drawer prep) followed by rendering (DrawGenesis → DrawScreenPost). The sim burst is capped at ~500 ms (minDrawFPS) so draw always gets to run. It's all one thread: sim and rendering are not concurrent; parallelism only happens inside a phase.

Conversely, if no sim frames are in the queue the main loop runs Draw/UpdateUnsynced as fast as possible — many draw iterations can pass between successive sim frames, with visuals interpolating smoothly in between via globalRendering->timeOffset.

main-loop iteration  (repeats as fast as possible)
├── CGame::Update            (synced)
│   └── SimFrame × 0..N      ← processes queued sim frames capped at
│                              ~500ms per iteration
└── CGame::Draw              (unsynced)
    ├── UpdateUnsynced       ← unsynced update phase
    └── DrawGenesis → DrawScreenPost  ← render world + screen

| Phase | Rate | Synced? | Responsibility |
|---|---|---|---|
| Sim frame (CGame::SimFrame) | fixed 30 Hz (GAME_SPEED) | yes | advance deterministic state: units, pathing, projectiles, line-of-sight, scripts, Lua GameFrame |
| Draw frame (CGame::Draw) | variable | no | update phase (see next row) + render world/screen |
| Update phase (CGame::UpdateUnsynced, inside draw frame) | per draw frame | no | timings, interpolation, camera, GUI, sound, world-drawer prep |
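
As a rough model of the loop described above, the following Python sketch drains queued sim frames up to a time budget and then draws once. It is only a toy: the per-frame cost `SIM_FRAME_COST_MS` and the function `run_iteration` are assumptions made for illustration, and the real engine measures elapsed wall-clock time rather than adding up a fixed cost.

```python
from collections import deque

SIM_FRAME_COST_MS = 20   # hypothetical cost of one sim frame
SIM_BUDGET_MS = 500      # the ~500 ms cap on one sim burst


def run_iteration(queue: deque, log: list) -> int:
    """One main-loop iteration: drain queued sim frames, then draw once."""
    spent = 0
    sim_frames = 0
    # Update (synced): process queued sim frames until the budget is used up.
    while queue and spent < SIM_BUDGET_MS:
        log.append(("sim", queue.popleft()))
        spent += SIM_FRAME_COST_MS
        sim_frames += 1
    # Draw (unsynced): runs exactly once per iteration, even with no sim work.
    log.append(("draw", None))
    return sim_frames
```

With 40 frames queued, this toy model runs 25 sim frames before the first draw, the remaining 15 before the second, and none before the third, which is the "0..N sim frames per iteration" behaviour that any per-Update logic has to tolerate.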

Benchmark names. fightertest reports these phases as three peer buckets, Sim / Update / Render: Sim ≈ CGame::SimFrame, Update ≈ CGame::UpdateUnsynced, Render ≈ DrawGenesis → DrawScreenPost.

…eFrame

Move the per-tick particle refresh (emit/cull/homing + VBO upload) off the
sim critical path into the unsynced gadget:Update callin, gated to run at
most once per sim frame. The main loop drains 0..N queued sim frames before
each draw, so one Update may cover several sim frames under fast-forward /
catch-up; that is handled by:

- a boundary-crossing gate (`crossed`) for the periodic polls, instead of
  exact `n % K == 0` which a frame jump could step over;
- a per-Update `tick` counter driving the amortized scan/homing/clamp
  cadences, so they stay even regardless of how far `n` jumps;
- per-builder `elapsed` integration so emission density stays proportional
  to buildpower*time across throttling and frame jumps;
- a range-sweeping `cullDead` that drains every `deathBuckets` frame in
  `(prev, n]`, not just `[n]`, so a jump can't strand VBO slots.
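
A minimal sketch of the boundary-crossing gate and the range sweep, written in Python for illustration only (the gadget is Lua, and the actual implementations in this PR may differ). The names `crossed` and `cull_dead` mirror the identifiers in the list above, but their signatures here are assumptions, as is the representation of `death_buckets` as a mapping from sim frame to a list of slots.

```python
def crossed(prev: int, n: int, period: int) -> bool:
    """True if some multiple of `period` lies in the range (prev, n]."""
    # Integer division counts how many boundaries lie at or below a frame;
    # if that count grew between prev and n, a boundary was crossed.
    return n // period > prev // period


def cull_dead(death_buckets: dict, prev: int, n: int) -> list:
    """Drain every bucket for frames in (prev, n] and return the freed slots."""
    freed = []
    for frame in range(prev + 1, n + 1):
        # pop() removes the bucket so the same slots are never freed twice.
        freed.extend(death_buckets.pop(frame, []))
    return freed
```

For example, a jump from frame 28 to frame 33 steps over frame 30: `33 % 30 == 0` is false, but `crossed(28, 33, 30)` is true, and `cull_dead(buckets, 28, 33)` still frees slots scheduled to die at frames 29 through 33.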

Removes the now-unnecessary high-gamespeed emission throttle: the work is
draw-rate-bound under Update, so that pressure is gone. Comment/doc cleanups
throughout.

Add a saturation-driven emit keep-factor (satKeep) so particle density
thins evenly as the pool fills, rather than every spray staying full until
the hard cap cuts emission dead. Derived from the continuous (un-floored)
1/(runEvery*stride) form, it reproduces the old per-gameframe saturation
thinning smoothly: 1.0 at an empty pool, ~1/6 at full. The pool then
self-stabilises below MAX_PARTICLES, demoting the hard cap to a safety net.
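
The exact satKeep formula is not shown on this page, so the following Python sketch is only one hypothetical curve with the stated endpoints (1.0 at an empty pool, 1/6 at a full one). The function name `sat_keep`, the `thin_at_full` parameter, and the 1/(1 + k·fill) shape are all assumptions chosen for illustration, not the PR's runEvery/stride derivation.

```python
def sat_keep(live: int, max_particles: int, thin_at_full: float = 6.0) -> float:
    """Fraction of requested particles to keep, given pool occupancy.

    Hypothetical curve: 1.0 when the pool is empty, 1/thin_at_full when full,
    decreasing smoothly in between.
    """
    fill = min(max(live / max_particles, 0.0), 1.0)   # clamp occupancy to [0, 1]
    return 1.0 / (1.0 + (thin_at_full - 1.0) * fill)
```

A caller would multiply each builder's requested emission by `sat_keep(live, MAX_PARTICLES)`, so every spray thins by the same factor instead of some sprays being cut off entirely at the hard cap.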

github-actions Bot commented Jun 16, 2026

Integration Test Results

16 tests  ±0   8 ✅ ±0   3s ⏱️ ±0s
 1 suites ±0   8 💤 ±0 
 1 files   ±0   0 ❌ ±0 

Results for commit 8b81049. ± Comparison against base commit 2b7b14b.

♻️ This comment has been updated with latest results.

@bruno-dasilva bruno-dasilva force-pushed the bruno/move-nano-particles-to-update-frame branch from 53af0d6 to 4e80a60 Compare June 16, 2026 01:24
@bruno-dasilva

One thing that may make this less of an improvement than expected: work was already being throttled when game speed was > 1. So this change only moves from usually-once-per-Update to strictly-once-per-Update, which limits the upside.
