
Linux: gate cblas dgemv on USE_OPENBLAS, autodetect via pkg-config#4

Open
oaustegard wants to merge 1 commit into Percepta-Core:main from oaustegard:add-linux-openblas

Conversation

@oaustegard

Summary

The matvec dispatch in transformer.cpp guards cblas_dgemv on __APPLE__, so on Linux the dense projection path falls through to a hand-rolled scalar nested loop even when libopenblas-dev is installed. runner.py's Linux compile invocation also doesn't link any BLAS, so there's no way to opt in short of editing both files.

This PR adds an explicit USE_OPENBLAS macro to the matvec dispatch and extends the Linux branch of _build_cpp_engine to detect openblas via pkg-config, defining USE_OPENBLAS and appending the cflags/libs when found. When either pkg-config or libopenblas-dev is unavailable, the build falls back silently to the scalar loop, so existing Linux builds without libopenblas behave identically.
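The detection step can be sketched as a standalone helper. `detect_openblas` and its return shape are hypothetical illustrations of the approach described above (the real change lives inside `_build_cpp_engine`), but the pkg-config invocations are the ones named in the test plan:

```python
import shutil
import subprocess

def detect_openblas():
    """Return (extra_cflags, extra_ldflags) for OpenBLAS, or ([], [])
    when pkg-config or the openblas .pc file is unavailable -- the
    silent scalar-fallback case."""
    if shutil.which("pkg-config") is None:
        return [], []  # no pkg-config at all: scalar fallback
    try:
        cflags = subprocess.run(
            ["pkg-config", "--cflags", "openblas"],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        libs = subprocess.run(
            ["pkg-config", "--libs", "openblas"],
            capture_output=True, text=True, check=True,
        ).stdout.split()
    except subprocess.CalledProcessError:
        return [], []  # pkg-config returned non-zero: scalar fallback
    # Found: gate the BLAS path on and pass the flags through to the compiler.
    return ["-DUSE_OPENBLAS"] + cflags, libs
```

On a machine without libopenblas-dev this returns two empty lists, so the compile command is unchanged from before the PR.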

Why

I forked the repo to build a comparison artifact for an unrelated experiment, and noticed the dense BLAS path was never exercised on my Linux sandbox even with libopenblas-dev installed. Without it, "dense BLAS vs sparse" comparisons on Linux are misleading — the "BLAS" path is actually scalar.

Scope

Two files, 13 insertions, 1 deletion. macOS Accelerate path unchanged.

 transformer_vm/model/transformer.cpp |  4 +++-
 transformer_vm/runner.py             | 10 ++++++++++

Test plan

  • Linux without libopenblas-dev: pkg-config returns non-zero, no -DUSE_OPENBLAS added, scalar fallback used (verified in sandbox).
  • Linux with libopenblas-dev: pkg-config --cflags --libs openblas returns flags, -DUSE_OPENBLAS defined, cblas_dgemv linked.
  • macOS path is byte-identical (the __APPLE__ branch in the preprocessor and the Darwin branch in runner.py are untouched).
  • Token output is bit-identical between the scalar and openblas paths on hello, addition, collatz, fibonacci, min_cost_matching (no FP-rounding divergence at these scales).
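The bit-identity in the last bullet is an empirical observation at these scales, not a guarantee: a BLAS dgemv may accumulate partial sums in a different order than the scalar loop (blocking, SIMD), and double addition is not associative. A minimal Python illustration of order-dependent rounding, unrelated to the repo's code:

```python
# Summing the same four doubles in two different orders.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = 0.0
for v in vals:
    left_to_right += v      # 1e16 + 1.0 rounds back to 1e16, losing the 1.0

reordered = (1e16 + -1e16) + (1.0 + 1.0)  # cancellation first, then the small terms

print(left_to_right, reordered)  # 1.0 2.0
```

At the scales of the listed programs the accumulations evidently stay within exactly-representable territory, which is why the outputs match bit-for-bit.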

Notes

This is intentionally minimal and orthogonal to the larger perf work in #1 and #2. Happy to fold into either if you'd prefer; filed standalone since it's a small correctness fix that's useful regardless of how those land.

@ryvn-technologies

Ryvn Preview

Creating preview prerelease-Percepta-Core-transformer-vm for this pull request.


This comment will be automatically updated with preview details.

