Linux: gate cblas dgemv on USE_OPENBLAS, autodetect via pkg-config#4
Open
oaustegard wants to merge 1 commit intoPercepta-Core:mainfrom
Open
Linux: gate cblas dgemv on USE_OPENBLAS, autodetect via pkg-config#4oaustegard wants to merge 1 commit intoPercepta-Core:mainfrom
oaustegard wants to merge 1 commit intoPercepta-Core:mainfrom
Conversation
The matvec dispatch in transformer.cpp guards cblas_dgemv on __APPLE__, so on Linux the dense projection path falls through to a hand-rolled scalar nested loop regardless of whether libopenblas is installed. runner.py's Linux compile invocation also doesn't link any BLAS, so there's no way to opt in short of editing both files. This adds an explicit USE_OPENBLAS macro for the matvec dispatch and extends the Linux branch of _build_cpp_engine to detect openblas via pkg-config, defining USE_OPENBLAS and adding the cflags/libs when found. Falls back silently to the scalar loop when pkg-config or openblas-dev are unavailable, so existing builds without libopenblas behave identically to before. macOS Accelerate path is unchanged.
|
Ryvn Preview Creating preview This comment will be automatically updated with preview details. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
matvecdispatch intransformer.cppguardscblas_dgemvon__APPLE__, so on Linux the dense projection path falls through to a hand-rolled scalar nested loop even whenlibopenblas-devis installed.runner.py's Linux compile invocation also doesn't link any BLAS, so there's no opt-in short of editing both files.This PR adds an explicit
USE_OPENBLASmacro for the matvec dispatch and extends the Linux branch of_build_cpp_engineto detect openblas viapkg-config, definingUSE_OPENBLASand appending thecflags/libswhen found. Silent fallback to the scalar loop when eitherpkg-configoropenblas-devare unavailable, so existing Linux builds without libopenblas behave identically.Why
I forked the repo to build a comparison artifact for an unrelated experiment, and noticed the dense BLAS path was never exercised on my Linux sandbox even with
libopenblas-devinstalled. Without it, "dense BLAS vs sparse" comparisons on Linux are misleading — the "BLAS" path is actually scalar.Scope
Two files, 13 insertions, 1 deletion. macOS Accelerate path unchanged.
Test plan
libopenblas-dev:pkg-configreturns non-zero, no-DUSE_OPENBLASadded, scalar fallback used (verified in sandbox).libopenblas-dev:pkg-config --cflags --libs openblasreturns flags,-DUSE_OPENBLASdefined,cblas_dgemvlinked.__APPLE__branch in the preprocessor and the Darwin branch inrunner.pyare untouched).hello,addition,collatz,fibonacci,min_cost_matching(no FP-rounding divergence at these scales).Notes
This is intentionally minimal and orthogonal to the larger perf work in #1 and #2. Happy to fold into either if you'd prefer; filed standalone since it's a small correctness fix that's useful regardless of how those land.