
Enable sccache wrapping of nvcc for full-arch CUDA builds #254

Merged
bernardladenthin merged 2 commits into main from claude/determined-brahmagupta-si4qyu
Jun 21, 2026
Conversation

@bernardladenthin

Owner

Summary

  • Enable sccache to wrap nvcc (CMAKE_CUDA_COMPILER_LAUNCHER=sccache) in CUDA builds, so per-architecture .cu device passes are cached alongside the gcc C/C++ TUs. These passes were previously not cached because of sccache's limited nvcc support.
  • Remove single-arch validation shortcut from CI: Drop CUDA_FAST_BUILD and CUDA_ARCH from the crosscompile-linux-x86_64-cuda job. CI now always builds the full CMAKE_CUDA_ARCHITECTURES set on every run (PR/push/dispatch/publish), relying on the warm sccache cache for speed instead of a reduced arch set.
  • Keep CUDA_FAST_BUILD as a local-dev knob: The env var remains in build_cuda_linux.sh for developers who want single-arch builds locally, but CI no longer uses it.
  • Update build.sh to detect and wrap nvcc: Add logic to detect GGML_CUDA in cmake args and conditionally add -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache. Include a "Compiler not supported" error signature in the mid-build retry fallback to handle cases where sccache cannot wrap nvcc.
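
The detection described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the actual build.sh code: the function names `is_cuda_build` and `launcher_flags`, the `ok` probe value, and the accepted spellings of the GGML_CUDA flag are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: helper names and the "ok" probe convention are hypothetical.

# Returns 0 when any cmake argument enables GGML_CUDA.
# Assumption: CUDA builds pass -DGGML_CUDA=ON (or =on / =1).
is_cuda_build() {
    for arg in "$@"; do
        case "$arg" in
            -DGGML_CUDA=ON|-DGGML_CUDA=on|-DGGML_CUDA=1) return 0 ;;
        esac
    done
    return 1
}

# Prints the extra cmake flag for nvcc wrapping, or nothing.
# $1 is the result of the sccache probe ("ok" means sccache is usable);
# the remaining arguments are the cmake args for this build.
launcher_flags() {
    probe_result=$1
    shift
    # Probe failed: add no launcher at all.
    [ "$probe_result" = "ok" ] || return 0
    # Only CUDA builds get the nvcc launcher.
    if is_cuda_build "$@"; then
        printf '%s\n' "-DCMAKE_CUDA_COMPILER_LAUNCHER=sccache"
    fi
}

# Example: prints -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache
launcher_flags ok -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
```

Keeping the probe result as an explicit argument means the flag is never added when sccache is unavailable, and non-CUDA builds are left untouched.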

Rationale

The dominant cost of the ~70 min CUDA job is nvcc recompiling each .cu kernel once per architecture. Previously, only the gcc C/C++ TUs were cached; the nvcc passes were not. With sccache now wrapping nvcc, the per-arch device passes are cached over Depot, so warm runs should be much faster (the actual cache hits are still to be confirmed, as noted in the test plan). This allows CI to always build the full arch set, which keeps every run release-safe, without the complexity of conditional single-arch builds for validation runs. The first (cold-cache) run still pays the full nvcc cost; subsequent warm runs benefit from the cache.

Test plan

  • CI is green on this branch (warm cache hits on CUDA job to be verified on second run)
  • Existing sccache probe and mid-build retry logic guards the change; if nvcc wrapping fails, the retry rebuilds without the launcher
  • sccache --show-stats should show CUDA hits on the second build to confirm the speedup
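
The retry guard mentioned in the test plan can be pictured with the sketch below. It is an assumption-laden illustration rather than the real build.sh logic: `log_has_launcher_failure` and `build_with_fallback` are hypothetical names, the build is reduced to a single command, and the only error signature checked is the "Compiler not supported" string named in this PR.

```shell
#!/bin/sh
# Sketch only: function names and the single-command build are hypothetical.

# Returns 0 when the build log contains the launcher failure signature.
log_has_launcher_failure() {
    grep -q -e "Compiler not supported" "$1"
}

# Runs the given build command with the nvcc launcher flag appended.
# If that fails AND the log shows the signature, reruns without the flag.
# Any other failure is reported as a real build error.
build_with_fallback() {
    log=$(mktemp)
    if "$@" -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache >"$log" 2>&1; then
        cat "$log"
        rm -f "$log"
        return 0
    fi
    if log_has_launcher_failure "$log"; then
        rm -f "$log"
        # Uncached retry: same command, no launcher flag.
        "$@"
        return $?
    fi
    cat "$log" >&2
    rm -f "$log"
    return 1
}
```

A caller would pass its usual build command, for example `build_with_fallback cmake -S . -B build -DGGML_CUDA=ON`. The point of matching on the signature is that genuine compile errors still fail the job instead of being masked by a silent retry.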

Related issues / PRs

Closes the CUDA caching gap identified in the sccache rollout.

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_01PJGUpbfRCjbRcovCTq5v4u

Wrap nvcc with sccache (CMAKE_CUDA_COMPILER_LAUNCHER) for CUDA builds so the
per-arch .cu device passes (the dominant cost of the ~70 min CUDA job) are cached
over Depot alongside the gcc C/C++ host TUs, which were previously the only
cached units. With the kernels
cached, drop the single-arch validation shortcut: CI no longer sets
CUDA_FAST_BUILD/CUDA_ARCH, so every run builds the full CMAKE_CUDA_ARCHITECTURES
set (release-safe on PR/push as well as publish) and relies on the warm cache
for speed.

- build.sh: add -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache, scoped to CUDA builds
  (GGML_CUDA in the cmake args), behind the existing probe. Broaden the
  mid-build retry trigger with "Compiler not supported" so an nvcc-hostile
  sccache falls back to an uncached green build instead of turning the job red.
- publish.yml: remove CUDA_FAST_BUILD/CUDA_ARCH from the CUDA job and their
  DOCKCROSS_ARGS passthroughs; full arch every run.
- CLAUDE.md: document nvcc caching + full-arch CI policy; CUDA_FAST_BUILD stays
  a local-dev-only knob. Warm-run verification of nvcc cache hits still pending.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PJGUpbfRCjbRcovCTq5v4u

withCacheReuse(int) and withSlotId(int) threw IllegalArgumentException with a
static message string, which SpotBugs flags as WEM_WEAK_EXCEPTION_MESSAGING and
fails the Build-and-analyze job. Include the offending value in each message so
the exception carries dynamic context. Pre-existing on the base branch; surfaced
on this PR's CI. The SessionTest assertion uses containsString and still matches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PJGUpbfRCjbRcovCTq5v4u

@bernardladenthin bernardladenthin merged commit c85ff78 into main Jun 21, 2026
45 of 46 checks passed
@bernardladenthin bernardladenthin deleted the claude/determined-brahmagupta-si4qyu branch June 21, 2026 10:52


2 participants