Commit 2cc53b3

Merge pull request #252 from bernardladenthin/claude/admiring-lamport-bbzntq
Fix fork PR sccache failures with token-gated install and build retry
2 parents 3f5c140 + c4901d4 commit 2cc53b3

3 files changed

Lines changed: 53 additions & 7 deletions

.github/build.sh

Lines changed: 29 additions & 2 deletions
@@ -104,10 +104,37 @@ else
 fi
 
 cmake -Bbuild $LAUNCH $@ || exit 1
-cmake --build build --config Release -j"${JOBS}" || exit 1
+
+# Build. The pre-build probe only proves the cache was reachable at one instant; it cannot
+# foresee a cache outage that strikes *during* the build. When sccache is the launcher and its
+# backend fails mid-build — e.g. an intermittent Depot 403 on the server-startup .sccache_check,
+# or the tokenless 403 a fork PR hits because secrets are withheld — sccache makes every TU fatal
+# and reds the whole build. sccache exposes no "ignore backend errors" switch for that startup
+# check, so recover by retrying the build once WITHOUT the launcher: a from-scratch uncached -O3
+# build is content-identical and release-safe, so the cache can never red the build. The retry is
+# gated on the failure output actually showing an sccache cache error, so a genuine compile error
+# still fails fast (and is reported) instead of triggering a wasteful uncached rebuild.
+build_log="$(mktemp 2>/dev/null || echo "/tmp/jllama-build.$$.log")"
+cmake --build build --config Release -j"${JOBS}" 2>&1 | tee "$build_log"
+build_rc=${PIPESTATUS[0]}
+if [ "$build_rc" -ne 0 ]; then
+    if [ -n "$LAUNCH" ] && grep -qiE 'sccache: error|Server startup failed|cache storage failed' "$build_log"; then
+        echo "build.sh: build failed via an sccache cache error — retrying WITHOUT cache (clean reconfigure)."
+        rm -f "$build_log"
+        rm -rf build && mkdir -p build
+        cmake -Bbuild $@ || exit 1
+        cmake --build build --config Release -j"${JOBS}" || exit 1
+        LAUNCH="" # cache disabled for this run; skip the stats query below
+    else
+        rm -f "$build_log"
+        exit 1
+    fi
+fi
+rm -f "$build_log"
 
 # Only query stats when sccache was actually used as the launcher; if the probe rejected a
-# crashing sccache, re-invoking it here would just repeat the crash output (harmless but noisy).
+# crashing sccache (or the mid-build retry disabled it), re-invoking it here would just repeat
+# the crash output (harmless but noisy).
 if [ -n "$LAUNCH" ] && command -v sccache >/dev/null 2>&1; then
     sccache --show-stats || true
 fi
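
The retry gate in this hunk can be isolated into a small decision function. The sketch below is a simplified restatement, not the code in `build.sh`: the helper names `is_cache_error` and `decide_after_failure` are hypothetical, and only the `grep` pattern is copied from the diff. It assumes the build output has already been captured to a log file.

```shell
# Sketch only: helper names are hypothetical; the grep pattern is copied from the diff.

# Succeeds (exit 0) when the build log shows one of the sccache cache-error signatures.
is_cache_error() {
    grep -qiE 'sccache: error|Server startup failed|cache storage failed' "$1"
}

# Prints "retry" when a failed build should be rerun without the launcher,
# and "fail" when the failure should be reported as-is.
#   $1: launcher arguments (empty when sccache is not in use)
#   $2: path to the captured build log
decide_after_failure() {
    if [ -n "$1" ] && is_cache_error "$2"; then
        echo retry
    else
        echo fail
    fi
}
```

Under these assumptions, a log containing `sccache: error` together with a non-empty launcher argument yields `retry`, while an ordinary compiler error, or any failure with an empty launcher, yields `fail`. That is what keeps a genuine compile error from triggering a second, uncached build.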

.github/workflows/publish.yml

Lines changed: 5 additions & 5 deletions
@@ -415,7 +415,7 @@ jobs:
           echo "=== Processor Details ==="
           system_profiler SPHardwareDataType
       - name: Install sccache (shared compiler cache)
-        if: env.USE_CACHE == 'true'
+        if: env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN != ''
         continue-on-error: true
         run: brew install sccache
       - name: Build libraries
@@ -460,7 +460,7 @@ jobs:
           echo "=== Processor Details ==="
           system_profiler SPHardwareDataType
       - name: Install sccache (shared compiler cache)
-        if: env.USE_CACHE == 'true'
+        if: env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN != ''
         continue-on-error: true
         run: brew install sccache
       - name: Build libraries
@@ -578,7 +578,7 @@ jobs:
         with:
           arch: x64
       - name: Install sccache (shared compiler cache)
-        if: env.USE_CACHE == 'true'
+        if: env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN != ''
         continue-on-error: true
         shell: pwsh
         run: |
@@ -632,7 +632,7 @@ jobs:
         with:
           arch: x86
       - name: Install sccache (shared compiler cache)
-        if: env.USE_CACHE == 'true'
+        if: env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN != ''
         continue-on-error: true
         shell: pwsh
         run: |
@@ -723,7 +723,7 @@ jobs:
           echo "=== Processor Details ==="
           system_profiler SPHardwareDataType
       - name: Install sccache (shared compiler cache)
-        if: env.USE_CACHE == 'true'
+        if: env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN != ''
         continue-on-error: true
         run: brew install sccache
       - name: Build libraries
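
GitHub Actions does not pass repository secrets to workflows triggered from forks, so a value derived from a secret is an empty string there, and the new `if:` condition relies on that. A shell rendering of the same two-part gate may make the logic easier to follow. This is a sketch: the function name `should_install_sccache` is hypothetical, and its two arguments stand in for `env.USE_CACHE` and `env.SCCACHE_WEBDAV_TOKEN`.

```shell
# Sketch only: a shell equivalent of the workflow condition
#   env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN != ''
# The function name is hypothetical.
#   $1: value of USE_CACHE
#   $2: value of SCCACHE_WEBDAV_TOKEN (empty on a fork PR)
should_install_sccache() {
    [ "$1" = "true" ] && [ -n "$2" ]
}

# A fork PR: caching is requested, but the token is empty.
if should_install_sccache "true" ""; then
    echo "install sccache"
else
    echo "skip sccache: caching disabled or no token"
fi
```

With an empty token the sketch takes the `else` branch, which corresponds to the install step being skipped so that sccache never lands on `PATH` for a tokenless run.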

CLAUDE.md

Lines changed: 19 additions & 0 deletions
@@ -290,6 +290,25 @@ sccache as the launcher: it compiles a trivial TU *through* sccache, and only se
 when a job sets one) for diagnosis but never reds the build. This closes the gap the original
 absent-only guard left.
 
+**The fork-PR `.sccache_check` 403 (mac-only symptom) and its two guards.** A fork PR (e.g.
+`vaiju1981/java-llama.cpp` → upstream) runs with secrets withheld, so `SCCACHE_WEBDAV_TOKEN`
+(`= secrets.DEPOT_TOKEN`) is **empty**. Depot rejects the unauthenticated server-startup
+`.sccache_check` with **403 Forbidden** (`PermissionDenied (temporary) … Forbidden`), and
+because sccache treats a failed startup check as fatal, *every* TU dies. The symptom looked
+**mac-only** purely because of an asymmetry in how sccache reaches `PATH`: the macOS jobs ran
+`brew install sccache` **unconditionally** (`if: USE_CACHE == 'true'`), whereas the
+Linux/dockcross/aarch64 jobs only **fetch** sccache when a token is present (the `[ -n
+"$SCCACHE_WEBDAV_TOKEN…" ]` guard in `build.sh`'s fetch block) — so on a tokenless fork PR
+mac was the only platform with sccache on `PATH` to misfire. Two independent guards now prevent
+it: **(1)** every `Install sccache` step is gated `if: env.USE_CACHE == 'true' && env.SCCACHE_WEBDAV_TOKEN
+!= ''`, so a tokenless fork PR never even installs sccache (mac now matches Linux); and **(2)**
+`build.sh`'s build step **retries once without the launcher** when the build fails *and* the
+output shows an sccache cache error (`sccache: error` / `Server startup failed` / `cache storage
+failed`) — a clean uncached `-O3` rebuild that is content-identical and release-safe. The retry
+is gated on that error signature so a genuine compile error still fails fast and is reported
+(no wasteful uncached rebuild). Guard (2) also covers an *intermittent* 403 that strikes a
+valid-token job mid-build, which the one-shot probe cannot foresee.
+
 **Rollout.** **Phase 1 — DONE & proven: the 3 macOS build jobs** (slowest + OOM-prone) —
 `brew install sccache` + the env above + `BUILD_JOBS: 2`. macOS build dropped **~40 min → ~6 min**
 with a warm cache. **Phase 2 — DONE: all 5 dockcross cross-compile jobs** now have the same
