Enable sccache wrapping of nvcc for full-arch CUDA builds by bernardladenthin · Pull Request #254 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-06-21T09:23:34Z

Summary

Enable sccache to wrap nvcc (CMAKE_CUDA_COMPILER_LAUNCHER=sccache) in CUDA builds, so the per-architecture .cu device passes are cached alongside the gcc C/C++ TUs. The nvcc passes were previously not cached because of sccache's limited nvcc support.
Remove single-arch validation shortcut from CI: Drop CUDA_FAST_BUILD and CUDA_ARCH from the crosscompile-linux-x86_64-cuda job. CI now always builds the full CMAKE_CUDA_ARCHITECTURES set on every run (PR/push/dispatch/publish), relying on the warm sccache cache for speed instead of a reduced arch set.
Keep CUDA_FAST_BUILD as a local-dev knob: The env var remains in build_cuda_linux.sh for developers who want single-arch builds locally, but CI no longer uses it.
Update build.sh to detect and wrap nvcc: Add logic to detect GGML_CUDA in cmake args and conditionally add -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache. Include a "Compiler not supported" error signature in the mid-build retry fallback to handle cases where sccache cannot wrap nvcc.
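
The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the actual build.sh code: the function name `cuda_launcher_flag`, the `SCCACHE_OK` variable (standing in for the result of the existing sccache probe), and the exact `-DGGML_CUDA=ON` / `-DGGML_CUDA=1` spellings it matches are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the nvcc launcher flag only when the sccache
# probe succeeded (SCCACHE_OK=1, an assumed variable) AND the cmake
# arguments enable CUDA. Prints nothing otherwise.
cuda_launcher_flag() {
    # Probe failed or was never run: do not wrap anything.
    [ "${SCCACHE_OK:-0}" = "1" ] || return 0

    for arg in "$@"; do
        case "$arg" in
            # Assumed spellings of "CUDA enabled" in the cmake args.
            -DGGML_CUDA=ON|-DGGML_CUDA=1)
                printf '%s\n' "-DCMAKE_CUDA_COMPILER_LAUNCHER=sccache"
                return 0
                ;;
        esac
    done
    return 0
}
```

A caller could append the result to its cmake invocation, for example `cmake -B build "$@" $(cuda_launcher_flag "$@")`, so non-CUDA builds are left untouched.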

Rationale

The ~70 min CUDA job's dominant cost is nvcc recompiling each .cu kernel once per architecture. Previously, only gcc C/C++ TUs were cached; nvcc passes were not. With sccache now wrapping nvcc, the per-arch device passes are cached over Depot, making warm runs much faster. This allows CI to always build the full arch set (release-safe everywhere) without the complexity of conditional single-arch builds for validation runs. The first (cold-cache) run still pays the full nvcc cost; subsequent warm runs benefit from the cache.

Test plan

CI is green on this branch (warm cache hits on CUDA job to be verified on second run)
Existing sccache probe and mid-build retry logic guards the change; if nvcc wrapping fails, the retry rebuilds without the launcher
sccache --show-stats should show CUDA hits on the second build to confirm the speedup
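
The retry guard mentioned in the test plan might look something like the sketch below. It is an illustration under stated assumptions rather than the real build.sh logic: `build_with_fallback` and the `USE_LAUNCHER` variable are hypothetical names, and only the "Compiler not supported" signature comes from this PR.

```shell
#!/bin/sh
# Hypothetical sketch: run a build command with the launcher enabled. If it
# fails and its log contains the sccache failure signature, rerun it once
# with the launcher disabled. USE_LAUNCHER is an assumed variable that the
# wrapped build command is expected to honor.
build_with_fallback() {
    log="$(mktemp)"
    status=0

    USE_LAUNCHER=1
    export USE_LAUNCHER
    "$@" >"$log" 2>&1 || status=$?
    cat "$log"

    # Retry only for the known signature; other failures stay failures.
    if [ "$status" -ne 0 ] && grep -q "Compiler not supported" "$log"; then
        echo "sccache could not wrap the compiler; retrying uncached" >&2
        status=0
        USE_LAUNCHER=0
        "$@" || status=$?
    fi

    rm -f "$log"
    return "$status"
}
```

Restricting the retry to a specific signature keeps genuine compile errors red while letting an sccache that cannot handle nvcc degrade to an uncached build. After a second, warm run, `sccache --show-stats` is the place to confirm that the CUDA passes actually hit the cache.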

Related issues / PRs

Closes the CUDA caching gap identified in the sccache rollout.

Checklist

I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
My commits follow Conventional Commits
No security-sensitive changes

https://claude.ai/code/session_01PJGUpbfRCjbRcovCTq5v4u

Wrap nvcc with sccache (CMAKE_CUDA_COMPILER_LAUNCHER) for CUDA builds so that the per-arch .cu device passes, the dominant cost of the ~70 min CUDA job, are cached over Depot alongside the gcc host TUs, instead of only the C/C++ TUs being cached.

With the kernels cached, drop the single-arch validation shortcut: CI no longer sets CUDA_FAST_BUILD/CUDA_ARCH, so every run builds the full CMAKE_CUDA_ARCHITECTURES set (release-safe on PR/push as well as publish) and relies on the warm cache for speed.

- build.sh: add -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache, scoped to CUDA builds (GGML_CUDA in the cmake args), behind the existing probe. Broaden the mid-build retry trigger with "Compiler not supported" so an nvcc-hostile sccache falls back to an uncached green build instead of turning the job red.
- publish.yml: remove CUDA_FAST_BUILD/CUDA_ARCH from the CUDA job and their DOCKCROSS_ARGS passthroughs; full arch every run.
- CLAUDE.md: document nvcc caching + full-arch CI policy; CUDA_FAST_BUILD stays a local-dev-only knob.

Warm-run verification of nvcc cache hits still pending.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PJGUpbfRCjbRcovCTq5v4u

withCacheReuse(int) and withSlotId(int) threw IllegalArgumentException with a static message string, which SpotBugs flags as WEM_WEAK_EXCEPTION_MESSAGING and fails the Build-and-analyze job. Include the offending value in each message so the exception carries dynamic context.

Pre-existing on the base branch; surfaced on this PR's CI. The SessionTest assertion uses containsString and still matches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PJGUpbfRCjbRcovCTq5v4u

sonarqubecloud · 2026-06-21T09:40:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

bernardladenthin temporarily deployed to startgate June 21, 2026 09:23 — with GitHub Actions Inactive

bernardladenthin temporarily deployed to startgate June 21, 2026 09:38 — with GitHub Actions Inactive

bernardladenthin merged commit c85ff78 into main Jun 21, 2026
45 of 46 checks passed

bernardladenthin deleted the claude/determined-brahmagupta-si4qyu branch June 21, 2026 10:52

bernardladenthin mentioned this pull request Jun 21, 2026

CUDA CI: drop sccache debug diagnostics now that nvcc caching is proven #255

Merged

bernardladenthin merged 2 commits into main from claude/determined-brahmagupta-si4qyu
