macOS Apple Silicon support via Mesa Zink → KosmicKrisp → Metal #2991
iamaperson000 wants to merge 28 commits
I see changes in the libs dir. I would prefer that third-party libraries not receive custom changes on our side: that creates complications if and when we choose to pull later versions. If there are issues with them, then I'd prefer to see them fixed at source unless there's a compelling reason to do otherwise. I'll have to dive in deeper, but that's something to bear in mind. If the changes in libs can be reduced to the absolute minimum (ideally none), then it will be easier to assess the long-term impact of such changes.
ProcessorMasks GetProcessorMasks() {
	ProcessorMasks masks;

	unsigned int numCores = std::thread::hardware_concurrency();
Won't this report every core, including the efficiency cores? This will be disastrous for performance.
Could you use something like sysctlbyname("hw.perflevel0.physicalcpu", ...) to count the number of performance cores?
Also, during thread pinning, rather than doing nothing (because you can't pin threads on Mac), you could mark the threads as QOS_CLASS_USER_INTERACTIVE, without having too many performance threads.
Given that modern Intel CPUs have efficiency cores, wouldn't it make sense to split this out into its own change rather than it being Arm/Mac specific?
Thanks for catching that, and you were right! Should be good now, please let me know if there is anything else. Oh and for the libs dir, I mean I guess we could technically commit upstream but I feel that it would take months, and am not sure if it is worth it. Happy to do something else though. What are your thoughts?
Adds the foundational macOS/Apple Silicon platform support:
Platform-specific code (new files):
- rts/System/Platform/Mac/CpuTopology.cpp: stubs the CPU
topology API on top of std::thread::hardware_concurrency()
and sysctl for cache sizes (macOS exposes no portable
per-core P/E topology).
- rts/System/Platform/Mac/ThreadSupport.cpp: native pthread
wrappers (Suspend/Resume are no-ops on macOS).
Build system (CMake):
- if(APPLE)/NOT APPLE branches in rts/CMakeLists.txt,
rts/builds/{legacy,dedicated}/CMakeLists.txt,
rts/lib/glad/CMakeLists.txt, rts/System/CMakeLists.txt,
test/CMakeLists.txt, and tools/unitsync/CMakeLists.txt so
the Mac branch picks up the new Platform/Mac sources,
pulls libunwind only on non-Apple UNIX, and links
Foundation / objc / EGL (the latter via find_library with
Homebrew/MESA_PREFIX hints).
Mac-gated source changes:
- Rendering/GlobalRendering.cpp: EGL-on-CAMetalLayer path
(Kopper/Zink via Mesa) for context creation and
SwapBuffers, all behind #if defined(__APPLE__).
- Game/LoadScreen.cpp, Rendering/GL/{myGL.cpp,glxHandler.*}:
skip GLX (Mac has no X server) and skip the Lua intro
screen (EGL/Metal incompatibility), all behind #ifdef.
- System/Platform/{ThreadAffinityGuard.cpp,.h}: stub the
affinity API on macOS, which exposes no portable
equivalent of sched_setaffinity.
- System/MemPoolTypes.h, Sim/Units/Unit.cpp: Apple-only
fallbacks where pthread_t is an opaque pointer and where
std::views::enumerate is unavailable in older Apple Clang
libc++.
- test/other/testMutex.cpp: use os_unfair_lock instead of
linux/futex.h on macOS.
Mac-driven but platform-neutral fixes:
- System/Platform/Threading.cpp: replace std::ranges::find_if
with std::find_if (still C++17, compiles everywhere).
- System/SafeUtil.h: add missing <type_traits> include
needed by libc++.
- lib/smmalloc/smmalloc.h: relax POD static_assert to
is_trivially_copyable_v (avoids deprecated is_trivial).
- lib/smmalloc/smmalloc_generic.cpp, lib/assimp/include/
assimp/{matrix3x3,matrix4x4,quaternion,vector2,vector3}.inl:
add missing <cstdlib>/<cmath> includes (libstdc++
transitively included them; libc++ does not).
- AI/Wrappers/CUtils/Util.c: const-correct the macOS branch
of util_fileSelector to match Apple's scandir signature.
Linux and Windows code paths are either unchanged
(#ifdef-guarded) or pick up trivially-compatible standard
library calls; this commit is additive from their
perspective.
- FindSDL2.cmake: Add parent directory to include path so #include <SDL2/SDL.h> works with macOS SDL2 config (which sets include to /include/SDL2 directly)
- FindLibunwind.cmake: Use INTERFACE IMPORTED library target on macOS (fixes -framework treated as file path by Unix Makefiles generator)
- glad/CMakeLists.txt: Exclude glad_glx.c on Apple (no GLX on macOS)
- legacy/headless CMakeLists.txt: Suppress -no_warn_duplicate_libraries on macOS (benign transitive dependency duplicates)
- include/AL/: Vendored OpenAL EFX headers (macOS OpenAL.framework has no EFX support; needed for sound system compilation)
- rts/CMakeLists.txt: DevIL CMake module sets IL_INCLUDE_DIR to the directory containing il.h (e.g. /include/IL), but code uses #include <IL/il.h>. Add parent directory to include path.
- gladstub.cpp: Add missing GL function stubs needed for headless build (glMultiDrawArraysIndirect, glTexStorage1D, glDebugMessage*, GLAD_GL_ATI_meminfo, GLAD_GL_NVX_gpu_memory_info)
- FastMath.h: break circular dependency between math::sqrtf and streflop_cond.h's std::hypot workaround on __APPLE__ by providing a temporary declaration before the include
- Add Cpp23Compat.hpp: polyfill for std::views::enumerate (Apple libc++ doesn't support this C++23 feature yet), following the pattern of existing Cpp17Compat.hpp
- headless CMakeLists: re-link engineCommonLibraries after GameHeadless to fix macOS single-pass linker symbol resolution
- gflags: set GFLAGS_NAMESPACE to "google;gflags" since subdirectory builds default to "gflags" only but engine code uses "google"
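A polyfill of the kind described for Cpp23Compat.hpp could look roughly like the sketch below. The namespace `compat` and the class name `EnumerateView` are assumptions made for illustration; the actual header in the PR may be shaped differently. It yields (index, element reference) pairs so it can be used with structured bindings in a range-for loop.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

namespace compat {  // hypothetical namespace, not taken from the PR

// Minimal stand-in for std::views::enumerate over an lvalue range.
template <typename Range>
class EnumerateView {
    using Inner = decltype(std::begin(std::declval<Range&>()));
    using Ref = decltype(*std::declval<Inner&>());

public:
    explicit EnumerateView(Range& range) : range_(range) {}

    class Iterator {
    public:
        Iterator(std::size_t index, Inner it) : index_(index), it_(it) {}
        // Yields the running index together with a reference to the element.
        std::pair<std::size_t, Ref> operator*() const { return {index_, *it_}; }
        Iterator& operator++() { ++index_; ++it_; return *this; }
        // Only the wrapped iterator decides when the loop ends.
        bool operator!=(const Iterator& other) const { return it_ != other.it_; }

    private:
        std::size_t index_;
        Inner it_;
    };

    Iterator begin() { return Iterator(0, std::begin(range_)); }
    Iterator end() { return Iterator(0, std::end(range_)); }

private:
    Range& range_;
};

template <typename Range>
EnumerateView<Range> enumerate(Range& range) { return EnumerateView<Range>(range); }

}  // namespace compat

// Example use: add each element's index to it, in place.
inline void AddIndexToEach(std::vector<int>& values) {
    for (auto [i, v] : compat::enumerate(values)) v += static_cast<int>(i);
}
```

Unlike the standard view, this sketch only accepts lvalue ranges and offers no range-adaptor pipe syntax, which is usually enough for simple indexed loops.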
smmalloc.h defines `#define INLINE inline` which leaks into GLTFParser.cpp and conflicts with simdjson's layout_mode::INLINE enum member, causing compilation failures whenever both headers are included in the same translation unit (surfaces first on macOS where the system simdjson header triggers this code path).
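The source does not say how the conflict was resolved. One common remedy, shown here purely as an assumption, is to hide the macro while the conflicting declaration is parsed and restore it afterwards. The `demo::layout_mode` enum is a hypothetical stand-in for the simdjson type, and `OTHER` is an invented second member.

```cpp
// Stand-in for the macro that smmalloc.h defines.
#define INLINE inline

// Hide the macro while a header that uses INLINE as an identifier is parsed.
#pragma push_macro("INLINE")
#undef INLINE

namespace demo {
// Hypothetical stand-in for an enum with a member named INLINE.
enum class layout_mode { INLINE, OTHER };

// Captured while the macro is hidden, so the enumerator name is usable here.
constexpr int kInlineValue = static_cast<int>(layout_mode::INLINE);
}  // namespace demo

// Restore the macro for code that still relies on it.
#pragma pop_macro("INLINE")

// Expands to `inline constexpr int AddOne(int x)`.
INLINE constexpr int AddOne(int x) { return x + 1; }
```

`push_macro` / `pop_macro` are supported by GCC, Clang, and MSVC. Renaming the macro or undefining it at the end of the defining header would be alternative approaches.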
- assimp: Resolve ambiguous math function calls (fabsf, fabs, sqrt, etc.) that fail with Clang's stricter overload resolution. Use explicit std:: qualified calls and static_cast where needed.
- smmalloc: Add missing <type_traits> include and switch to <cstdlib> for C++ header consistency.
- float3.h: Clang template instantiation compatibility
- SafeUtil.h: Clang constexpr handling; use is_trivially_default_constructible for portability
- MemPoolTypes.h: factor pthread_t/Win32/Linux thread-id logging into a helper so libc++ on Apple (where pthread_t is a pointer) formats cleanly
- Util.c: drop dead __APPLE__/non-__APPLE__ branch for util_fileSelector (both branches had identical const struct dirent* signatures)
- SolLua bind/*.cpp: sol::nil -> sol::lua_nil (sol3 compatibility with libc++ where sol::nil is unavailable on some configurations)
The projectile / explosion-FX texture atlas Finalize() can fail (atlasAllocator Allocate() returns false), leaving atlasTex null. CProjectileDrawer::Init then calls GetTexID() -> GL::TextureBase::GetId() on null atlasTex, faulting at offset 0x8. Guard GetTexID() / DisOwnTexture() to no-op on a null atlasTex so a failed atlas degrades gracefully instead of crashing. This is a defensive fix that helps any platform whose atlas allocation can fail.
Also: LuaTextures::Create now logs size/format/glError when glTexImage fails (was a silent return-nil), aiding diagnosis.
LoadScreen exposes a runtime toggle for CLuaIntro under macOS via the SPRING_MAC_ENABLE_LUAINTRO env var; the existing #if defined(__APPLE__) block is preserved and the env-var check is inside it.
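The null guard described above can be sketched as follows. `Texture` and `Atlas` are simplified, hypothetical stand-ins for the engine's types, and the `Finalize(bool)` parameter exists only to simulate an allocation failure.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-in for GL::TextureBase.
struct Texture {
    std::uint32_t id = 0;
    std::uint32_t GetId() const { return id; }
};

// Hypothetical stand-in for the texture atlas.
class Atlas {
public:
    // Simulates Finalize(): on allocation failure atlasTex stays null.
    bool Finalize(bool allocationSucceeds) {
        if (!allocationSucceeds) return false;
        atlasTex = std::make_unique<Texture>();
        atlasTex->id = 7;  // arbitrary non-zero id, for illustration only
        return true;
    }

    // 0 is never a texture name returned by glGenTextures, so it can stand
    // for "no texture" when the atlas failed to build.
    std::uint32_t GetTexID() const { return atlasTex ? atlasTex->GetId() : 0; }

private:
    std::unique_ptr<Texture> atlasTex;
};
```

Callers that bind texture 0 simply draw without the atlas, which matches the "degrade gracefully" intent rather than dereferencing a null pointer.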
SaveWindowPosAndSize was storing backing pixels (e.g. 2560x1440 on a 1280x720 logical Retina window), so the next launch restored a 2x-too-big window that was then clamped to the screen, producing a portrait sliver. Store the logical size from SDL_GetWindowSize instead of the backing size from SDL_GL_GetDrawableSize. Affects HiDPI Linux setups symmetrically.
The engine checks for ARB_multitexture, ARB_texture_env_combine, ARB_texture_compression, ARB_texture_float, ARB_texture_non_power_of_two, and ARB_framebuffer_object extensions by name. Per the GL spec, these were folded into core GL 1.3-3.0; core-profile contexts no longer advertise them by name, but their functionality is guaranteed. The name-only check is a false-negative on any core-profile context. Skip it when the active context is core profile so the engine doesn't spuriously reject otherwise-valid configurations.
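The decision logic can be sketched without a GL context. `LegacyExtensionsOk` is a hypothetical helper, not the engine's CheckGLExtensions; `advertised` stands in for the extension strings the context reports, and `isCoreProfile` for the result of testing GL_CONTEXT_PROFILE_MASK against GL_CONTEXT_CORE_PROFILE_BIT.

```cpp
#include <set>
#include <string>

// Returns true when the legacy extensions are either advertised by name or
// implied by a core-profile context.
bool LegacyExtensionsOk(const std::set<std::string>& advertised, bool isCoreProfile) {
    // Extensions named in the commit message, with the usual GL_ prefix.
    static const char* const kFoldedIntoCore[] = {
        "GL_ARB_multitexture",
        "GL_ARB_texture_env_combine",
        "GL_ARB_texture_compression",
        "GL_ARB_texture_float",
        "GL_ARB_texture_non_power_of_two",
        "GL_ARB_framebuffer_object",
    };

    // Core-profile contexts guarantee this functionality without listing the
    // names, so a name lookup would produce a false negative.
    if (isCoreProfile) return true;

    for (const char* name : kFoldedIntoCore) {
        if (advertised.count(name) == 0) return false;
    }
    return true;
}
```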
Replaces a hardcoded absolute libEGL.dylib path from an earlier bring-up checkpoint with a SPRING_MAC_LIBEGL CMake cache variable (default empty -> no Mesa link). Configure with:
  cmake -DSPRING_MAC_LIBEGL=/opt/homebrew/opt/mesa/lib/libEGL.dylib ...
When set, the engine also skips linking Apple's OpenGL.framework and uses Mesa's libGL.dylib (looked up next to libEGL.dylib) instead. Loading OpenGL.framework on macOS 26 registers an NSWindow notification observer that bus-errors in +[NSOpenGLContext currentContext] during window setup.
Each EGL bring-up step now prints the result + last error to stderr. Quickly identifies whether failure is in eglGetDisplay, eglInitialize, eglBindAPI, eglChooseConfig, eglCreateContext, or eglMakeCurrent. On Homebrew's stock Mesa on macOS 26 the failure is at eglInitialize because that Mesa was built only for the X11 platform.
Engine-level changes to bring up the GL context on Apple Silicon (macOS 26 / M-series) through a Mesa libEGL built for the surfaceless platform, driving Zink (OpenGL-on-Vulkan) against the KosmicKrisp Vulkan driver (Vulkan-on-Metal):
1. CMake: when SPRING_MAC_LIBEGL is set, link Mesa libGL only if a libGL.dylib sits next to it. libGL is not strictly required at link time: all GL entry points get resolved through eglGetProcAddress at runtime, so libEGL alone is enough.
2. EGL: switch eglChooseConfig from EGL_WINDOW_BIT to EGL_PBUFFER_BIT. Mesa's surfaceless EGL platform doesn't expose window-bit configs; presentation happens via CAMetalLayer + KosmicKrisp WSI (Vulkan -> Metal).
3. EGL: walk OpenGL versions 4.6 -> 3.2 calling eglCreateContext with CORE profile, take the first that succeeds. Mesa/Zink rejects both empty attribs (returns default GL 2.1 which the engine then rejects) and (3.0 + CORE) since profile attrs only apply to 3.2+.
4. EGL: dump renderer/version/vendor/GLSL strings right after eglMakeCurrent to make Zink-on-KosmicKrisp issues visible.
The matching change to skip the legacy ARB extension-name check in CheckGLExtensions on a CORE-profile context was landed in an earlier commit on this branch.
Required runtime env:
  EGL_PLATFORM=surfaceless
  MESA_LOADER_DRIVER_OVERRIDE=zink
  GALLIUM_DRIVER=zink
  MESA_GL_VERSION_OVERRIDE=4.6
  MESA_GLSL_VERSION_OVERRIDE=460
  VK_ICD_FILENAMES=<mesa-install>/share/vulkan/icd.d/kosmickrisp_mesa_icd.aarch64.json
  DYLD_LIBRARY_PATH=<mesa-install>/lib
Result: GL 4.6 (Core Profile) Mesa 26.2.0-devel, renderer 'zink Vulkan 1.3(Apple M4 (MESA_KOSMICKRISP))', GLSL 4.60. GL4 mode enabled. Clean engine startup + graceful shutdown.
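The version walk in step 3 can be sketched independently of EGL. Here `tryCreate` is a hypothetical callback that would wrap eglCreateContext with the given major/minor version and the core-profile attribute; the exact candidate list between 4.6 and 3.2 is an assumption, since the commit message only gives the endpoints.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

using GlVersion = std::pair<int, int>;  // (major, minor)

// Tries versions from newest to oldest and returns the first one accepted,
// or std::nullopt when every attempt is refused.
std::optional<GlVersion> PickHighestContext(
    const std::function<bool(int, int)>& tryCreate) {
    assert(tryCreate && "a context-creation callback is required");

    // Assumed candidate list; it stops at 3.2 because profile attributes
    // only apply to 3.2 and later.
    static constexpr GlVersion kCandidates[] = {
        {4, 6}, {4, 5}, {4, 4}, {4, 3}, {4, 2}, {4, 1}, {4, 0}, {3, 3}, {3, 2},
    };

    for (const GlVersion& v : kCandidates) {
        if (tryCreate(v.first, v.second)) return v;
    }
    return std::nullopt;
}
```

Keeping the walk behind a callback makes the ordering testable with a fake driver that only accepts some versions.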
The surfaceless Mesa EGL can't make a window surface, so the
engine renders into an off-screen pbuffer that eglSwapBuffers
never presents -> white window. Add a manual present: read the
rendered framebuffer back (glReadPixels, BGRA8) and blit it onto
the window's CAMetalLayer drawable via Metal, then present.
- rts/System/Platform/Mac/MetalPresent.{h,mm}: MRC-safe Metal
helper. MacMetalPresent_Init(layer) sets up MTLDevice/queue and
configures the CAMetalLayer (BGRA8, framebufferOnly=NO).
MacMetalPresent_PresentBGRA() uploads a CPU BGRA buffer to a
staging texture, blits it into nextDrawable, presents, commits.
Optional vertical flip for GL bottom-up readback.
- System/CMakeLists.txt: build MetalPresent.mm in the Mac
platform sources.
- builds/legacy/CMakeLists.txt: link Metal + QuartzCore
frameworks on Apple.
- GlobalRendering.cpp: stash the CAMetalLayer (g_metalLayer);
SPRING_MAC_PRESENT_TEST now drives the flash through this path.
Confirmed: the window shows the rendered clear color (red)
instead of staying white -- the present mechanism works end to
end (GL/Zink -> KosmicKrisp -> glReadPixels -> Metal -> window).
Next: wire MacMetalPresent_PresentBGRA into
CGlobalRendering::SwapBuffers (with flipY) so real frames
present, and successively fix the load-time crashes
(CProjectileDrawer atlas, etc.) to reach the draw loop. The
glReadPixels roundtrip is a stopgap; IOSurface GL/Metal interop
is the perf follow-up.
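The optional vertical flip mentioned above exists because glReadPixels returns rows bottom-up while the drawable expects top-down. A minimal CPU-side sketch, assuming a tightly described buffer with a known `rowBytes` (the function name is hypothetical and not taken from MetalPresent.mm):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverses the row order of an image in place. `rowBytes` is the byte
// distance between the starts of consecutive rows.
void FlipRowsInPlace(std::uint8_t* pixels, std::size_t rowBytes, std::size_t height) {
    assert(pixels != nullptr || height == 0);
    for (std::size_t y = 0; y < height / 2; ++y) {
        std::uint8_t* top = pixels + y * rowBytes;
        std::uint8_t* bottom = pixels + (height - 1 - y) * rowBytes;
        // Swap the two rows byte by byte; the middle row of an odd-height
        // image stays where it is.
        std::swap_ranges(top, top + rowBytes, bottom);
    }
}
```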
- SwapBuffers: on the macOS/EGL path, read the rendered default framebuffer back (glReadPixels BGRA, flipY) and blit it to the CAMetalLayer via MacMetalPresent each frame, then SDL_PumpEvents() so CoreAnimation composites (we replaced SDL_GL_SwapWindow which used to service the run loop).
- InitEGLContext: size the pbuffer to the window's *backing* pixels (SDL_GetWindowSize points * backingScaleFactor) instead of logical points, so full Retina-resolution rendering isn't clipped. Init MacMetalPresent here.
- MetalPresent.mm: log nil drawables.
- Debug: SPRING_MAC_DUMP_FRAME=<prefix> dumps rendered frames to raw files (header w,h + BGRA) for offline inspection without screen capture.
Confirmed the present path is correct: a dumped load-time frame is solid black because an earlier prototype disables the Lua loading-screen renderer on macOS (CLoadScreen::Draw only draws when luaIntro != nullptr; it's skipped here), so nothing is drawn during load. Real imagery requires reaching CGame's draw loop past the CProjectileDrawer atlas crash.
Replace the SwapBuffers path's full-frame CPU pixel copy + Y-flip memcpy + MTLTexture replaceRegion upload with an IOSurface-backed MTLTexture. The engine writes glReadPixels output directly into the IOSurface's CPU view (honoring its rowBytes via GL_PACK_ROW_LENGTH), and a one-triangle Metal render pass samples the same surface and Y-flips into the drawable.
Adds:
- MacMetalPresent_AcquireIOSurfaceBuffer(w, h, &rowBytes) returns a locked CPU pointer also bound as an MTLTexture; recreates the backing only when dimensions change.
- MacMetalPresent_PresentIOSurface(flipY) unlocks the surface, encodes a cached render pipeline state (flip / non-flip), and presents the drawable.
- IOSurface.framework added to the legacy build's link list.
The original MacMetalPresent_PresentBGRA is kept for the early-splash callsite, and as a runtime fallback selectable via SPRING_MAC_LEGACY_PRESENT=1.
Notes:
- Apple Silicon's IOSurface picks 64-pixel row alignment, so at width 2940 rowBytes is 11776 (= 2944 pixels per row, 4 pixels of padding). Honor it via GL_PACK_ROW_LENGTH or the readback tears.
- Logs '[MetalPresent] IOSurface zero-copy path active (WxH, rowBytes=N)' once on first acquire, and a corresponding 'LEGACY CPU-staging path active' line if the fallback is taken.
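The row-padding arithmetic in the note above can be reproduced with a small sketch. The helper names are hypothetical, and real code should take rowBytes from the surface itself (for example via IOSurfaceGetBytesPerRow) rather than assume the 64-pixel alignment; this only checks the numbers quoted.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a row width up to the next multiple of `alignPixels`.
constexpr std::size_t AlignedRowPixels(std::size_t width, std::size_t alignPixels = 64) {
    return (width + alignPixels - 1) / alignPixels * alignPixels;
}

// Byte stride of one padded row; BGRA8 uses 4 bytes per pixel.
constexpr std::size_t AlignedRowBytes(std::size_t width,
                                      std::size_t bytesPerPixel = 4,
                                      std::size_t alignPixels = 64) {
    return AlignedRowPixels(width, alignPixels) * bytesPerPixel;
}

// GL_PACK_ROW_LENGTH is expressed in pixels, not bytes.
constexpr std::size_t PackRowLength(std::size_t rowBytes, std::size_t bytesPerPixel = 4) {
    return rowBytes / bytesPerPixel;
}

static_assert(AlignedRowPixels(2940) == 2944, "2940 px pads to 2944 px");
static_assert(AlignedRowBytes(2940) == 11776, "2944 px * 4 bytes");
static_assert(PackRowLength(11776) == 2944, "row length passed to glPixelStorei");
```

With these values the readback would set GL_PACK_ROW_LENGTH to 2944 before calling glReadPixels for a 2940-pixel-wide frame.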
ReadWindowPosAndSize bound winSize / viewSize to GetMetalDrawableSize() on the macOS surfaceless path, but the engine renders into a backing-resolution pbuffer FBO. They were equal by accident at full Retina (drawable == backing == pbuffer size); with non-1x render scales they diverge, so glViewport(0,0,drawableW,drawableH) on a smaller FBO meant only one quadrant of geometry landed. Bind to the FBO size instead. This was a latent bug: HiDPI Linux setups that ever render into a smaller FBO than the drawable would hit the same issue. No behavior change in the default same-size case.
Mesa 26.2 Zink grants a 4.6 compatibility-profile context on Apple Silicon via KosmicKrisp (verified). The EGL init now prefers compatibility (version walk 4.6 -> 3.2) and falls back to core only if compat is refused. Set SPRING_MAC_GL_CORE=1 to force core. Compatibility profile is a strict superset of core: every modern GL4 feature is available AND legacy paths (immediate mode, display lists, the fixed-function matrix stack, '#version ... compatibility' GLSL) keep working. Several Lua-built shaders in BAR rely on those legacy paths, so the compat profile is the easier integration point on the macOS path. Geometry shaders remain unavailable regardless of profile because Vulkan reports geometryShader=false on Metal; that is a separate problem and not affected by this change.
Adds a glReadPixels-based frame dump hook in SwapBuffers, gated by the SPRING_FRAME_CAPTURE env var. When set, the engine writes the default FBO contents to <prefix>.<frame>.raw before each present (use raw2png.py to convert). Useful for verifying headless rendering output without needing a window-system. The hook is no-op when the env var is unset, so the default behavior on every platform is unchanged.
Three env-gated features useful on any GL backend that uses glReadPixels for present:
- SPRING_NO_PBO=1: disable double-buffered async PBO readback (default is ON; PBO hides the glReadPixels GPU pipeline drain behind 1 frame of present latency)
- SPRING_DOWNSAMPLE_READBACK=N: blit-downsample by N before readback
- SPRING_TIME_PRESENT=1: per-60-frame stage timing breakdown
Also adds SPRING_MAC_NO_RETINA=1 to render the pbuffer at logical (1x) resolution instead of full backing (Retina) size; this is an Apple-Silicon-specific perf knob with no effect on other platforms.
PBO async readback default-on gave ~3x FPS in busy scenes on the macOS Zink+KosmicKrisp path (sync 41 ms busy -> PBO 13 ms steady). Behavior is unchanged on platforms that don't engage the readback present path.
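The one-frame latency of double-buffered readback comes from the slot rotation, which can be modeled without any GL calls. In this sketch (class and method names are hypothetical) integers stand in for frame contents: in real code `Submit` would issue glReadPixels into the pixel-pack buffer of the write slot and map the buffer of the other slot.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Models two readback slots used in ping-pong fashion.
class PingPongReadback {
public:
    // Queues `frame` for readback and returns the frame queued by the
    // previous call, if there was one. The caller presents the returned
    // frame, so presentation lags rendering by exactly one frame.
    std::optional<int> Submit(int frame) {
        const std::size_t writeSlot = count_ % 2;
        const std::size_t readSlot = (count_ + 1) % 2;

        // The other slot holds what was queued one call earlier; by now the
        // GPU has had a full frame to finish that transfer.
        const std::optional<int> ready = slots_[readSlot];

        slots_[writeSlot] = frame;
        ++count_;
        return ready;
    }

private:
    std::array<std::optional<int>, 2> slots_{};
    std::size_t count_ = 0;
};
```

The first call returns nothing, which corresponds to a single frame with no image to present at startup.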
CLuaIntro was previously disabled on macOS as a workaround for the core-profile shader path. With the compat-profile context now the default (see earlier commit on this branch), the loading-screen text / splash / progress works correctly via the engine font renderer. Flip the macOS default to ENABLED, and switch the env-var escape hatch to opt-OUT: set SPRING_MAC_DISABLE_LUAINTRO=1 to skip CLuaIntro and fall back to the simple black load screen. Non-macOS platforms are unchanged (CLuaIntro has always been on by default there).
Metal (via Mesa Zink / KosmicKrisp on Apple Silicon) has no geometry-shader stage; Vulkan reports geometryShader = false on that path. But Mesa advertises GL_MAX_GEOMETRY_OUTPUT_VERTICES > 0 regardless, so the engine cannot detect the missing capability via GL introspection. Strip GS from Lua-loaded shader programs on macOS so the program at least links.
Non-macOS platforms continue to honor the shader author's intent -- Linux / Windows GL drivers support geometry shaders, and any custom Lua shader using GS would have been silently broken on those platforms by the previous unconditional strip.
Widgets that need GS-style point expansion have a Lua-layer NoGS fallback in the BAR widget tree (separate PR), so the engine-level strip is a fail-safe rather than the primary mechanism.
SDL emits mouse events in logical (point) coordinates. On the macOS surfaceless+pbuffer path the engine viewport is in backing-pixel coordinates (winSize / viewSize are tied to the pbuffer FBO; see GlobalRendering::ReadWindowPosAndSize). Without rescaling, windowed-mode clicks land at half the cursor position on Retina displays. Add two static helpers (ScaleMouseCoords, ScaleMouseDelta) gated by #if defined(__APPLE__), and route MOUSEMOTION / MOUSEBUTTONDOWN / MOUSEBUTTONUP through them. Non-macOS platforms are untouched - the #else branch matches the prior behavior exactly.
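The rescaling amounts to multiplying by the ratio of backing pixels to logical points. The sketch below borrows the helper name ScaleMouseCoords from the commit message, but its signature and the `Int2` type are assumptions made for illustration.

```cpp
#include <cassert>

struct Int2 {
    int x;
    int y;
};

// Converts a position in logical window points into backing-pixel
// coordinates, given the window size in points and the viewport size in
// pixels. On a 2x Retina display the ratio is 2 on both axes.
Int2 ScaleMouseCoords(Int2 logical, Int2 windowSizePoints, Int2 viewSizePixels) {
    assert(windowSizePoints.x > 0 && windowSizePoints.y > 0);
    return Int2{
        logical.x * viewSizePixels.x / windowSizePoints.x,
        logical.y * viewSizePixels.y / windowSizePoints.y,
    };
}
```

For example, a click at (640, 360) in a 1280x720-point window backed by a 2560x1440 viewport maps to (1280, 720), the center of the viewport.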
LuaParser.cpp:127 and LuaHandleSynced.cpp:435 call
LuaLibs::OpenSynced, which is only defined in Lua/LuaLibs.cpp.
A previous cherry-pick dropped that file from both the
dedicated server and unitsync source lists, leaving the bare
declaration in LuaLibs.h to satisfy compilation while breaking
the link on every platform.
Re-add ${ENGINE_SRC_ROOT_DIR}/Lua/LuaLibs.cpp to both targets
so OpenSynced is actually linked in.
The glGetIntegerv(GL_MAX_GEOMETRY_OUTPUT_VERTICES) probe and the first L_WARNING log sat OUTSIDE the macOS-only #if, so any Linux/Windows Lua shader carrying a geometry stage would emit a spurious warning every compile and pay for an unnecessary GL query. Move both inside the __APPLE__ block alongside the existing "GS unconditionally stripped on macOS" log and geomSrcs.clear() call. Non-Apple builds now ignore non-empty geomSrcs as before.
- FindSDL2.cmake: the SDL2::SDL2 INTERFACE_INCLUDE_DIRECTORIES rewrite was added for Homebrew SDL2 (which sets the include to .../include/SDL2 only). On Linux distros sdl2-config already produces a usable include layout, and rewriting it would leak /usr/include into every SDL2-consuming target. Gate the elseif branch on APPLE.
- builds/legacy/CMakeLists.txt: replace a U+2014 em-dash in a comment with ASCII -- so the source stays plain-ASCII.
- lib/CMakeLists.txt: keep GFLAGS_NAMESPACE="google;gflags" but fix the rationale comment. Engine code uses gflags::, however Homebrew's /opt/homebrew/include/gflags/gflags.h is picked up first (DevIL etc. add -I/opt/homebrew/include) and that header hard-codes GFLAGS_NAMESPACE=google, so the DEFINE_* macros in main.cpp emit google::FlagRegisterer references. Building the vendored gflags with both namespaces resolves either spelling.
macOS's OpenAL.framework lacks the EFX extension headers, so we
vendor efx.h and alext.h. Previously these lived in include/AL/,
which is added to every target's include path via
include_directories(${CMAKE_SOURCE_DIR}/include/AL). On Linux
that would shadow the system OpenAL-Soft devel headers exposed
through the OpenAL::OpenAL CMake target.
Move them to include/Mac/AL/ and add that path only inside the
Sound CMakeLists' if(APPLE) branch. Linux builds keep using the
system AL headers; macOS still finds <efx.h> and <alext.h>.
Addresses beyond-all-reason#2991 review feedback (thanks @lostsquirrel1).
Apple Silicon cores are heterogeneous: a small high-performance (P) cluster and a larger efficiency (E) cluster. Treating every visible core as a P-core (the previous behavior) caused the engine to over-provision sim worker threads, some of which then landed on E-cores at ~1/3 the throughput of P-cores.
Two changes:
1. CpuTopology::GetProcessorMasks now reads the per-perflevel sysctl keys (hw.perflevel0.physicalcpu, hw.perflevel1.physicalcpu) to count P-cores and E-cores separately, and reports them in the appropriate masks. Intel Macs and older kernels do not expose perflevel keys, so behavior on those targets is unchanged (all cores treated as P). On an Apple M4 (4 P + 6 E) the new masks read:
   Performance Core Mask: 0x0000000f
   Efficiency Core Mask: 0x000003f0
   and Optimal thread count drops from 9 to 4, matching the P-cluster.
2. ThreadSupport::SetupCurrentThreadControls now calls pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0) so the kernel preferentially schedules these threads on the P-cluster. The call is gated to threads that pass through ThreadStart with a ThreadControls handle (the sim workers), not every pthread in the process, so background I/O / helper threads remain free to land on the E-cluster.
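The mask construction in the first change can be sketched as below. `CoreMasks`, `BuildMasks`, `ReadCoreCount`, and `DetectMasks` are hypothetical names, not the engine's CpuTopology API, and the sketch assumes P-cores occupy the lowest bit positions, as the masks quoted above suggest. The sysctl read is compiled only on Apple platforms; elsewhere the fallback treats every core as a P-core.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

struct CoreMasks {
    std::uint32_t performance = 0;
    std::uint32_t efficiency = 0;
};

// Builds bit masks with P-cores in the low bits and E-cores directly above.
CoreMasks BuildMasks(unsigned pCores, unsigned eCores) {
    assert(pCores + eCores <= 32 && "sketch uses 32-bit masks");
    auto lowBits = [](unsigned n) -> std::uint32_t {
        return n >= 32 ? 0xffffffffu : ((1u << n) - 1u);
    };
    CoreMasks masks;
    masks.performance = lowBits(pCores);
    masks.efficiency = lowBits(pCores + eCores) & ~masks.performance;
    return masks;
}

// Returns the core count stored under `key`, or 0 when it is unavailable
// (non-Apple platform, Intel Mac, or a kernel without perflevel keys).
unsigned ReadCoreCount(const char* key) {
#if defined(__APPLE__)
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(key, &value, &size, nullptr, 0) == 0 && value > 0)
        return static_cast<unsigned>(value);
#else
    (void)key;
#endif
    return 0;
}

CoreMasks DetectMasks() {
    unsigned p = ReadCoreCount("hw.perflevel0.physicalcpu");
    unsigned e = ReadCoreCount("hw.perflevel1.physicalcpu");
    if (p == 0) {
        // No perflevel information: treat all cores as performance cores.
        p = std::max(1u, std::thread::hardware_concurrency());
        e = 0;
    }
    p = std::min(p, 32u);       // keep within the 32-bit mask
    e = std::min(e, 32u - p);
    return BuildMasks(p, e);
}
```

For 4 P-cores and 6 E-cores, `BuildMasks(4, 6)` gives 0x0000000f and 0x000003f0, matching the values reported for the M4.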
Summary
Adds an Apple Silicon macOS build path via Mesa 26.2 (Zink driver) → KosmicKrisp (Vulkan-on-Metal) → Metal. Renders BAR end-to-end on macOS 26 / M-series hardware.
Companion PRs
Cross-platform safety
All behavior changes are gated by one of:
Universal bug fixes (NOT macOS-specific) are called out per commit:
Env vars introduced
Tested
Known limitations
Notes for reviewers