perf: move nano particles out of sim frame #7969
Draft
bruno-dasilva wants to merge 3 commits into
…eFrame

Move the per-tick particle refresh (emit/cull/homing + VBO upload) off the sim critical path into the unsynced gadget:Update callin, gated to run at most once per sim frame.

The main loop drains 0..N queued sim frames before each draw, so one Update may cover several sim frames under fast-forward / catch-up; that is handled by:
- a boundary-crossing gate (`crossed`) for the periodic polls, instead of exact `n % K == 0` which a frame jump could step over;
- a per-Update `tick` counter driving the amortized scan/homing/clamp cadences, so they stay even regardless of how far `n` jumps;
- per-builder `elapsed` integration so emission density stays proportional to buildpower*time across throttling and frame jumps;
- a range-sweeping `cullDead` that drains every `deathBuckets` frame in `(prev, n]`, not just `[n]`, so a jump can't strand VBO slots.

Removes the now-unnecessary high-gamespeed emission throttle: the work is draw-rate-bound under Update, so that pressure is gone.

Comment/doc cleanups throughout.
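To make the frame-jump handling concrete, here is a minimal Python sketch of a boundary-crossing gate and a range-sweeping cull. It is not the gadget's code (the gadget itself is Lua): the `DeathBuckets` class, its fields, and the `schedule` helper are hypothetical names chosen for illustration, and only `crossed`, `cullDead`, and `deathBuckets` echo names from the commit message.

```python
from collections import defaultdict


def crossed(prev: int, n: int, period: int) -> bool:
    """True if some multiple of `period` lies in the range (prev, n]."""
    return n // period > prev // period


class DeathBuckets:
    """Slots grouped by the sim frame on which they expire (hypothetical layout)."""

    def __init__(self) -> None:
        self.buckets = defaultdict(list)
        self.prev = 0  # last sim frame that has already been swept

    def schedule(self, slot: int, death_frame: int) -> None:
        self.buckets[death_frame].append(slot)

    def cull_dead(self, n: int) -> list:
        """Drain every bucket in (prev, n], not just bucket n."""
        freed = []
        for frame in range(self.prev + 1, n + 1):
            freed.extend(self.buckets.pop(frame, []))
        self.prev = n
        return freed
```

With `prev = 29` and `n = 31`, an exact `n % 30 == 0` test never fires, while `crossed(29, 31, 30)` does. Likewise, a slot scheduled to die on frame 3 is still freed when the sweep jumps straight from frame 0 to frame 6.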
Add a saturation-driven emit keep-factor (satKeep) so particle density thins evenly as the pool fills, rather than every spray staying full until the hard cap cuts emission dead. Derived from the continuous (un-floored) 1/(runEvery*stride) form, it reproduces the old per-gameframe saturation thinning smoothly: 1.0 at an empty pool, ~1/6 at full. The pool then self-stabilises below MAX_PARTICLES, demoting the hard cap to a safety net.
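The commit message gives only the endpoints of the keep-factor (1.0 at an empty pool, about 1/6 at a full one), not the formula derived from `1/(runEvery*stride)`. The Python sketch below therefore uses an assumed hyperbolic curve that merely matches those two endpoints; the curve, the `MAX_PARTICLES` value, and the helper names `sat_keep` and `thinned_count` are all illustrative assumptions, not the gadget's implementation.

```python
import random

MAX_PARTICLES = 10_000  # hypothetical pool size; the real value is not stated here


def sat_keep(live: int, cap: int = MAX_PARTICLES) -> float:
    """Assumed keep-factor: 1.0 when the pool is empty, 1/6 when it is full."""
    fill = min(max(live / cap, 0.0), 1.0)
    return 1.0 / (1.0 + 5.0 * fill)


def thinned_count(wanted: int, live: int, rng: random.Random) -> int:
    """Keep each requested particle independently with probability sat_keep(live)."""
    keep = sat_keep(live)
    return sum(1 for _ in range(wanted) if rng.random() < keep)
```

Because the factor falls continuously with pool fill, emission thins gradually as the pool approaches the cap instead of stopping abruptly at it, which is the behaviour the commit describes.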
One thing that may make this less of an improvement than it first appears: work was already being throttled when game speed was > 1. So this change turns a usually-once-per-update refresh into a strictly-once-per-update one, which limits the upside.
NOTE: Draft until full testing is complete + screenshots/vids are added.
Context
The nano particles GL4 gadget currently runs a bunch of work in the `::GameFrame` callin, which runs during sim frames. This slows down sim, particularly during catch-up. So the idea is: let's move it out of `::GameFrame` and into `::Update`.
FYI to readers: the way "frames" work inside the engine is described at the bottom, in the "Addendum" section.
Work done
Two things (in two commits):
Performance Table (TODO)
Test steps
For all of the above, check behaviour before/after this change (it should look similar if not identical)
See videos:
MASTER
- EARLY GAME: https://drive.google.com/file/d/1yuE9dKRcM_90_Ya9ivxHfH8ttSQWjRLb/view?usp=drive_link
- MID GAME: https://drive.google.com/file/d/1FCWuzqZy43l0xsqxS3lNyecWvnd4ofVc/view?usp=drive_link
- LATE GAME: https://drive.google.com/file/d/13IrT1ibgL3_lrLQPxG6UA56-YxcPVPIb/view?usp=drive_link
THIS BRANCH:
- EARLY GAME: https://drive.google.com/file/d/1QjQyhdOIYiO3W40w_1moubon41VUiMpX/view?usp=drive_link
- MID GAME: https://drive.google.com/file/d/1BlqFTP5NYYWZSDTCk5iajYmQADQSW72r/view?usp=drive_link
- LATE GAME: https://drive.google.com/file/d/1fmyun2z7yYc354sZX5DPqNNY735Isz1H/view?usp=drive_link
Screenshots:
BEFORE:
AFTER:
AI / LLM usage statement:
Claude Code to do the initial POC/implementation, significant cleanup and comments by me.
Addendum - how engine runs frames
The main loop is `Update → Draw`, repeating. Each iteration produces one draw frame and drains any queued sim frame packets first (0..N sim frames per iteration). `CGame::Update` (synced) dispatches `SimFrame()` calls as `NETMSG_NEWFRAME` packets arrive; `CGame::Draw` then does an unsynced update phase (`CGame::UpdateUnsynced`: timings, interpolation, camera, GUI, sound, world-drawer prep) followed by rendering (`DrawGenesis` → `DrawScreenPost`). The sim burst is capped at ~500 ms (`minDrawFPS`) so draw always gets to run. It's all one thread: sim and rendering are not concurrent; parallelism only happens inside a phase.

Conversely, if no sim frames are in the queue, the main loop runs `Draw`/`UpdateUnsynced` as fast as possible: many draw iterations can pass between successive sim frames, with visuals interpolating smoothly in between via `globalRendering->timeOffset`.

[Phase table, partially lost; surviving cells: `CGame::SimFrame`, …`GAME_SPEED`), `GameFrame`, `CGame::Draw`, `CGame::UpdateUnsynced` (inside draw frame)]

Benchmark names. `fightertest` reports these phases as three peer buckets, Sim / Update / Render: Sim ≈ `CGame::SimFrame`, Update ≈ `CGame::UpdateUnsynced`, Render ≈ `DrawGenesis` → `DrawScreenPost`.
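The loop described in this addendum can be modelled in a few lines. The Python sketch below is a toy model, not engine code: the `Game` class, its `queue` of frame numbers (standing in for `NETMSG_NEWFRAME` packets), and the `log` list are hypothetical, and the ~500 ms burst cap is left out for brevity.

```python
from collections import deque


class Game:
    """Toy single-threaded main loop: drain queued sim frames, then update once."""

    def __init__(self) -> None:
        self.queue = deque()      # pending sim frame numbers
        self.sim_frame = 0        # latest simulated frame
        self.last_refreshed = 0   # sim frame seen by the previous refresh
        self.log = []             # (prev, n) pairs, one per refresh that ran

    def iterate(self) -> None:
        # Sim phase: run every queued frame (0..N of them) before drawing.
        while self.queue:
            self.sim_frame = self.queue.popleft()
        # Unsynced update phase: refresh at most once per sim frame.
        if self.sim_frame != self.last_refreshed:
            self.log.append((self.last_refreshed, self.sim_frame))
            self.last_refreshed = self.sim_frame
        # Rendering would follow here.
```

Queueing frames 1 to 3 before a single `iterate()` call yields one refresh covering `(0, 3]`, which is the frame jump the per-Update logic has to tolerate; calling `iterate()` again with an empty queue does no refresh work at all.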