Luajit spike #29 (Draft)

bruno-dasilva wants to merge 13 commits into master from luajit-spike

@bruno-dasilva (Owner)

No description provided.

bruno-dasilva and others added 10 commits June 17, 2026 22:19
Vendor the LuaJIT 2.1 rolling release (GC64, JIT enabled) under
rts/lib/luajit/ as the basis for migrating the engine's Lua off the
modified PUC Lua 5.1. Source only; build artifacts and nested .git
stripped. Builds and runs standalone on the gcc 13.3 toolchain.

First pass intentionally ignores sync determinism to measure the raw
performance delta before deciding whether the determinism work is worth
it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Repurpose rts/lib/lua so the engine links LuaJIT 2.1 instead of the
modified PUC Lua 5.1, without touching the rts/Lua binding layer:

- CMakeLists builds LuaJIT via its own Makefile (static, PIC, amalgamated)
  and wraps it in the `lua` target carrying LuaUser.cpp + a custom-symbol
  shim.
- lua.h/lualib.h/lauxlib.h/luaconf.h become forwarders to LuaJIT's headers
  (wrapped in extern "C"); lua.h re-declares the fork's custom symbols.
- LuaInclude.h: GetLuaContextData now reads the context via lua_getallocf
  (LuaJIT keeps global_State opaque); the L->errorJmp pcall check becomes a
  conservative spring_lua_in_pcall() stub; lua_lock/unlock map to the
  (no-op) LuaMutex* hooks since LuaJIT exposes no lock hooks.
- spring_luajit_shim.cpp implements lua_calchash/lua_pushhstring,
  lua_set_* (no-op io sandbox), and luaL_loadbuffer_privileged.
- SerializeLuaState.cpp stubbed: CReg Lua save/load walked PUC internals
  that do not exist in LuaJIT (disabled; irrelevant to a bot-game bench).
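For reference, the PUC Lua 5.1 string hash (the loop in `luaS_newlstr`, lstring.c) is sketched below as a standalone function. This assumes the fork's `lua_calchash` exposes that same hash so engine code can precompute it; LuaJIT interns strings with its own hash and seed, so a shim can still compute this value, but LuaJIT would not consume it, and a `lua_pushhstring` forwarder could plausibly ignore the hash and push the string normally. The name `puc51_calchash` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone copy of PUC Lua 5.1's string hash: seed with the
// length, then mix in characters walking backwards from the end with a
// stride of (len / 32) + 1, so long strings are sampled rather than fully hashed.
unsigned int puc51_calchash(const char* str, std::size_t l) {
    assert(str != nullptr || l == 0);
    unsigned int h = static_cast<unsigned int>(l);
    const std::size_t step = (l >> 5) + 1;
    for (std::size_t l1 = l; l1 >= step; l1 -= step)
        h = h ^ ((h << 5) + (h >> 2) + static_cast<unsigned char>(str[l1 - 1]));
    return h;
}
```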

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- lauxlib.h forwarder: add PUC 5.1 compat aliases (luaL_reg -> luaL_Reg,
  luaI_openlib -> luaL_openlib) needed by vendored luasocket.
- LuaInclude.h: the fork built on 32-bit LUA_NUMBER (float) / LUA_INTEGER
  (int) and lua_toboolean->bool; LuaJIT uses double / ptrdiff_t / int.
  Narrow the engine-facing accessors back at the boundary via function-like
  macros (lua_tonumber/luaL_checknumber -> float, lua_tointeger/
  luaL_checkinteger -> int) and cast the opt-wrappers, so type-exact
  templates (std::clamp/min) keep compiling. Add spring_lua_toboolean and
  luaL_checknumber_noassert adapters.
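Why the narrowing matters for type-exact templates: `std::min`, `std::max` and `std::clamp` deduce a single `T` from all arguments, so a `double` from LuaJIT's `lua_tonumber` mixed with `float` engine constants fails to compile. A minimal sketch, where `luajit_tonumber_stub` / `luajit_tointeger_stub` are hypothetical stand-ins for LuaJIT's `double` / `ptrdiff_t` accessors and `engine_tonumber` / `engine_tointeger` stand in for the narrowing adapters (the PR uses function-like macros; inline functions are shown here for clarity):

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins: lua_Number is double and lua_Integer is ptrdiff_t
// under LuaJIT. They echo a value instead of reading a Lua stack slot.
inline double luajit_tonumber_stub(double slot) { return slot; }
inline std::ptrdiff_t luajit_tointeger_stub(std::ptrdiff_t slot) { return slot; }

// Boundary adapters restoring the float / int types the PUC fork exposed.
inline float engine_tonumber(double slot) {
    return static_cast<float>(luajit_tonumber_stub(slot));
}
inline int engine_tointeger(std::ptrdiff_t slot) {
    const std::ptrdiff_t v = luajit_tointeger_stub(slot);
    assert(v >= INT_MIN && v <= INT_MAX);  // debug-only range check added in this sketch
    return static_cast<int>(v);
}

// Engine-style caller. std::min(luajit_tonumber_stub(slot), 1.0f) would not
// compile (double vs float); the narrowed value deduces T = float cleanly.
inline float clamp_unit(double slot) {
    return std::max(0.0f, std::min(engine_tonumber(slot), 1.0f));
}

static_assert(std::is_same<decltype(engine_tonumber(0.0)), float>::value, "float at the boundary");
static_assert(std::is_same<decltype(engine_tointeger(0)), int>::value, "int at the boundary");
```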

spring-headless now builds and links LuaJIT (GC64, JIT on).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

BAR content (e.g. modules/lava.lua) contains string escapes like \^ that
Lua 5.1 silently accepts (drops the backslash, keeps the char). LuaJIT
follows Lua 5.2+ and errors ("invalid escape sequence"), which aborted
LuaRules/LuaUI loading. Patch lj_lex.c's lex_string to fall through and
save the char for unknown non-digit escapes, matching 5.1. Malformed
\x/\u/\ddd escapes keep their own errors.
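The 5.1-compatible behaviour can be sketched as a small standalone unescaper (not LuaJIT's actual `lex_string`; the name `unescape_51` is hypothetical and `\x`/`\u` are left out): known escapes map to control characters, `\ddd` reads up to three decimal digits and is rejected above 255, and any other character after a backslash is kept with the backslash dropped, which is what Lua 5.1's own lexer does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of Lua 5.1-style escape handling for a string body
// (quotes already stripped). Unknown non-digit escapes such as "\^" keep the
// character, which Lua 5.1 allows and Lua 5.2+ (and stock LuaJIT) reject.
std::string unescape_51(const std::string& body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c != '\\') { out += c; continue; }
        if (++i == body.size()) throw std::runtime_error("unfinished escape");
        c = body[i];
        switch (c) {
            case 'a': out += '\a'; break;
            case 'b': out += '\b'; break;
            case 'f': out += '\f'; break;
            case 'n': out += '\n'; break;
            case 'r': out += '\r'; break;
            case 't': out += '\t'; break;
            case 'v': out += '\v'; break;
            default:
                if (std::isdigit(static_cast<unsigned char>(c))) {
                    // \ddd: up to three decimal digits; the value must fit in a byte.
                    int value = 0, digits = 0;
                    while (digits < 3 && i < body.size() &&
                           std::isdigit(static_cast<unsigned char>(body[i]))) {
                        value = value * 10 + (body[i] - '0');
                        ++i; ++digits;
                    }
                    assert(digits >= 1);
                    --i;  // the for-loop increment steps past the last digit
                    if (value > 255) throw std::runtime_error("escape sequence too large");
                    out += static_cast<char>(value);
                } else {
                    out += c;  // 5.1 fallthrough: "\^" -> "^", "\\" -> "\", "\"" -> "\""
                }
        }
    }
    return out;
}
```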

Also make the LuaJIT archive rebuild when any vendored source changes.

With this, spring-headless runs the fightertest benchmark startscript to
completion (frame 2100, clean exit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…d hash)

Make the synced VM deterministic across clients so LuaJIT can drive synced
gameplay, while unsynced keeps the full JIT:

- Fixed string-hash seed: build LuaJIT with LUAJIT_SECURITY_{PRNG,STRHASH,
  STRID}=0 so g->str.seed is fixed -> table iteration order (pairs/next) is
  identical on every client/CPU/run.
- Deterministic math: route the interpreter's transcendentals and the ^
  operator through streflop's bundled fdlibm (bit-identical cross-platform),
  via lj_sfm_* wrappers (spring_luajit_detmath.cpp) and vm_x86.dasc redirects.
  Exact ops (sqrt/floor/ceil/mod/+-*/) stay native (IEEE-correctly-rounded).
- Synced VM runs interpreter-only: CSyncedLuaHandle::Init calls
  luaJIT_setmode(ENGINE|OFF), eliminating JIT-vs-interpreter FP divergence
  (e.g. the x^2 -> x*x fold) and CPU-dependent codegen. Unsynced keeps JIT.
- math.random already routes to the engine's synced RNG; ffi is not opened
  in synced states.
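Why a fixed seed pins `pairs`/`next` order: traversal visits the hash part slot by slot, so the order in which string keys come out is a function of their hashes, and a per-process random seed changes those hashes. A toy sketch (a hypothetical FNV-style seeded hash and bucket layout, not LuaJIT's actual string hash or table layout) showing that the same seed reproduces the same traversal order on every run:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical seeded string hash (FNV-1a style), standing in for the VM's string hash.
std::uint32_t seeded_hash(const std::string& key, std::uint32_t seed) {
    std::uint32_t h = 2166136261u ^ seed;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Toy table traversal: keys land in buckets by hash and iteration visits
// buckets in index order, so the visit order depends only on (keys, seed).
std::vector<std::string> traversal_order(const std::vector<std::string>& keys,
                                         std::uint32_t seed, std::size_t nbuckets) {
    assert(nbuckets > 0);
    std::vector<std::vector<std::string>> buckets(nbuckets);
    for (const auto& k : keys) buckets[seeded_hash(k, seed) % nbuckets].push_back(k);
    std::vector<std::string> order;
    for (const auto& bucket : buckets)
        for (const auto& k : bucket) order.push_back(k);
    return order;
}

// Hypothetical fixed seed; in this PR it is the LUAJIT_SECURITY_* build flags
// that keep LuaJIT's own seed fixed across clients.
constexpr std::uint32_t kFixedSeed = 0;
```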

Builds and runs the fightertest benchmark to completion; sim perf unchanged
(8.77 ms/frame, still ~6% faster than PUC) since synced sim is dominated by
C++ callout bodies, not Lua interpretation. NOTE: cross-platform determinism
still needs validation via the sync-fuzz harness (x64 Linux/Windows, ARM64).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y uses

The previous Tier 1 commit redirected math through streflop in vm_x86.dasc,
but TARGET_LJARCH=x64 so the build consumes vm_x64.dasc — the redirects never
compiled in (the benchmarked binary still called libc libm). Fixes:

- vm_x64.dasc: route math.log/sin/cos/.../pow and the `^` operator through
  lj_sfm_* (streflop), matching what the PUC fork's lmathlib did. This is the
  dasc DynASM actually processes on x86_64.

- CMakeLists: build only the `libluajit.a` target (LJCORE_O=ljamalg.o), not
  the full `amalg`/`all`. `all` also links the standalone luajit exe, which
  fails: lj_sfm_* are defined in liblua's detmath (C++/streflop), not LuaJIT's
  C world. Also drop *.h from the dependency GLOB (a `make clean` deletes
  generated headers, leaving phantom DEPENDS with no rule).

- Split detmath into its own archive linked AFTER libluajit.a: the dependency
  is one-way (libluajit.a -> detmath -> streflop), so detmath must follow on
  the link line or the linker discards lj_sfm_* before the reference appears.

- detmath: forward the global ::fastiroot(double) to streflop_libm::fastiroot.
  mpsqrt.cpp declares fastiroot at global scope but defines it in-namespace; the
  engine never pulled mpsqrt.o before, but streflop::pow does. Shimmed here to
  keep the spike self-contained rather than patching the streflop submodule.
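The wrapper layer can be sketched as C-linkage functions defined in C++, so the DynASM-generated VM and the C fast functions can call them by unmangled name. This is a minimal sketch under assumptions: the `lj_sfm_*` signatures are assumed to be `double(double)` / `double(double, double)`, and `<cmath>` stands in for the streflop fdlibm routines the real bodies call.

```cpp
#include <cassert>
#include <cmath>

// Assumed signatures; the real lj_sfm_* bodies call streflop, not <cmath>.
extern "C" {

double lj_sfm_sin(double x) { return std::sin(x); }
double lj_sfm_log(double x) { return std::log(x); }
double lj_sfm_pow(double x, double y) { return std::pow(x, y); }

}  // extern "C": no C++ name mangling, so assembly and C callers can link to these
```

Because these definitions live outside libluajit.a, which only references them, the archive order matters: the linker must see libluajit.a's undefined `lj_sfm_*` references before it scans the archive that defines them.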

Verified: all 15 lj_sfm_* land in the binary, lj_ff_math_pow calls lj_sfm_pow,
fightertest runs to f=2100 exit 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The Double-typed streflop fdlibm routines (__ieee754_pow et al.) return garbage
in the engine's synced FPU context (float precision, streflop_init<Simple> per
LuaUser.cpp) -- e.g. pow(2,3) came back as 92581. This silently broke every BAR
Lua path that relies on the `^` operator for exact integer powers; notably
base64Decode (which does 2^shift), so scenario/benchmark JSON failed to decode
and the fightertest battle never spawned any units. The earlier "6% sim / 24%
draw" numbers were measured against that empty simulation and are invalid.

Match the PUC fork exactly: its lmathlib routed math through streflop's Simple
(32-bit float) functions, which agree with the ambient float-precision FPU mode
and are the precision BAR's synced code was written against. Wrapping args in
streflop::Simple fixes the decode; fightertest now spawns ~8160 units and runs
to f=2100, matching PUC's ~8260 within float-RNG drift.
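The shape of the fix can be sketched with `float` standing in for `streflop::Simple` and the `float` overload of `std::pow` standing in for streflop's Simple-precision `pow` (both substitutions are assumptions for illustration; `sfm_pow_simple` is a hypothetical name). Arguments are narrowed to single precision before the call and the result is widened back to the `double` LuaJIT's VM expects, so exact small powers such as `2^3` stay exact while other results carry float rounding, as in the PUC fork.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical: narrow to single precision, compute, widen back to lua_Number (double).
double sfm_pow_simple(double x, double y) {
    const float xf = static_cast<float>(x);
    const float yf = static_cast<float>(y);
    return static_cast<double>(std::pow(xf, yf));  // float overload of std::pow
}
```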

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tree)

LuaJIT's Makefile builds strictly in-tree, writing intermediate objects
(host/minilua.o) and generated headers (luajit.h, lj_vm.S, lj_*def.h)
into src/. The official docker build mounts the source read-only, so the
in-tree make failed with 'can't create host/minilua.o: Read-only file
system'.

Copy the vendored luajit tree into the CMake binary dir and build there
(make clean first, since copy_directory doesn't prune stale objects), and
repoint consumers' include path at the copy so they find the generated
luajit.h. Local writable-source builds behave the same and no longer
write artifacts back into the source tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

LuaJIT's Makefile detects the target arch at parse time via $(CC) -E
lj_arch.h, so even the 'clean' invocation needs a CC that exists. The
docker image only has gcc-13, not a bare gcc, so clean aborted with
'Unsupported target architecture'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The vendored LuaJIT build needs a host compiler for its build-time tools
(minilua/buildvm run on the build machine and emit the target VM), plus
GNU make. The amd64-windows image had only the mingw cross toolchain and
ninja, so the LuaJIT step failed (cmake's 'make' exec'd to nothing, then
once make was present, host minilua couldn't find a native cc/libc).

- Image: add gcc-13 + libc6-dev (native host toolchain for minilua/buildvm)
  and make to the amd64-windows Dockerfile.
- CMake: when CMAKE_SYSTEM_NAME is Windows, drive LuaJIT's documented cross
  recipe (HOST_CC=gcc-13, CROSS=x86_64-w64-mingw32-, CC=gcc-posix,
  TARGET_SYS=Windows) instead of the native 'CC=<compiler> -fPIC' form,
  which would wrongly build the host tools for Windows.

Produces a PE32+ spring.exe with the det-math (lj_sfm_*) symbols linked in.
NOTE: the published amd64-windows image sha in images_versions.sh must be
rebuilt/re-pinned to include the new gcc-13/make/libc6-dev packages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bruno-dasilva and others added 3 commits June 17, 2026 22:30
The vendored LuaJIT cross-build needs gcc-13 + make + libc6-dev in the
builder image (host tools for minilua/buildvm). Those packages were added
to the amd64-windows image and pushed to this fork's ghcr namespace, so:

- pin the new amd64-windows image digest in images_versions.sh
- pull the amd64-windows image from ghcr.io/bruno-dasilva in engine-build.yml
  (linux images still come from beyond-all-reason)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI failed with 'No rule to make target clean' in the LuaJIT step: a
stale/partial luajit tree left in the build dir by an earlier run had a
truncated Makefile (no clean target), and copy_directory MERGES into an
existing dir without pruning, so the leftover survived.

Remove the build-dir copy before copying fresh, guaranteeing a pristine
from-scratch tree, and drop the fragile 'make clean' step (its only job
was pruning copy_directory leftovers). No image rebuild needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The repo-root .gitignore has a bare 'Makefile' rule that ignores every
file named Makefile repo-wide, so the vendored rts/lib/luajit/Makefile and
rts/lib/luajit/src/Makefile were never committed. Local builds worked off
leftover files in the working tree, but a fresh CI checkout had no LuaJIT
Makefile -> 'make: No rule to make target libluajit.a'.

Force-add both so the cross-build has its Makefile in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 18, 2026

bar-benchmark — PR #29

candidate 290d546 vs baseline eb1c69f

sim trimmed mean (ms) with 95% CI on the relative delta

| scenario | candidate | baseline | Δ (95% CI) | n cand | n base |
|---|---|---|---|---|---|
| fightertest-bots | 23.86 ms | 23.88 ms | ♻️ -0.23% to +0.09% | 60 | 130 |
| fightertest-aircraft | 18.95 ms | 19.18 ms | ♻️ -1.23% to -1.08% | 60 | 120 |
| fightertest-tanks | 24.74 ms | 24.85 ms | ♻️ -0.70% to -0.19% | 60 | 120 |
| fightertest-pathfinding | 21.73 ms | 21.80 ms | ♻️ -0.44% to -0.18% | 60 | 165 |
| lategame1 | 22.74 ms | 23.43 ms | ♻️ -3.59% to -2.30% | 30 | 150 |
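As a sanity check on the table, the relative delta of the point estimates, (candidate - baseline) / baseline, should fall inside each reported interval; e.g. lategame1 gives (22.74 - 23.43) / 23.43 ≈ -2.94%, inside -3.59% to -2.30%. A small sketch of that arithmetic plus a symmetric trimmed mean; the 10% trim fraction and the helper names are assumptions, since the bot states neither its trim level nor how the CI is computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Relative change of candidate vs baseline, in percent.
double relative_delta_pct(double candidate, double baseline) {
    assert(baseline != 0.0);
    return (candidate - baseline) / baseline * 100.0;
}

// Symmetric trimmed mean: sort, drop floor(n * trim) samples from each end,
// average the rest. trim = 0.10 is an assumed default, not the bot's setting.
double trimmed_mean(std::vector<double> samples, double trim = 0.10) {
    assert(!samples.empty() && trim >= 0.0 && trim < 0.5);
    std::sort(samples.begin(), samples.end());
    const std::size_t k = static_cast<std::size_t>(samples.size() * trim);
    double sum = 0.0;
    for (std::size_t i = k; i < samples.size() - k; ++i) sum += samples[i];
    return sum / static_cast<double>(samples.size() - 2 * k);
}
```

Trimming drops the fastest and slowest frames before averaging, so a few outlier frames (e.g. GC or loading hitches) cannot dominate the comparison.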
Per-VM distribution box plots (5): fightertest-bots, fightertest-aircraft, fightertest-tanks, fightertest-pathfinding, lategame1

💰 compute cost: $1.37 · 5 fresh legs · 5 cached at $0 · last updated: 2026-06-19T07:07:16.261Z · [workflow run](https://github.com/bruno-dasilva/RecoilEngine/actions/runs/27810080694)
