Upgrade llama.cpp from b9739 to b9789 and refresh Windows patch #271

Merged
bernardladenthin merged 4 commits into main from claude/keen-babbage-yjmgwh
Jun 25, 2026
@bernardladenthin

Summary

  • Upgrade llama.cpp pinned version from b9739 to b9789 across CMake, documentation, and CI configuration
  • Refresh patches/0001-win32-arg-parse-embed-guard.patch to apply cleanly against b9789's new count-guard form in common_params_parse. Upstream now ships exactly the variant that breaks this project's Windows server-integration tests
  • Update patch documentation in CLAUDE.md and docs/history/llama-cpp-breaking-changes.md to reflect the rationale and to record that no project source changes are required for the version bump

Details

llama.cpp b9739 → b9789 changes absorbed:

All breaking changes in this range are consumed inside upstream-compiled translation units (chat.cpp, server-*.cpp, common/arg.cpp, etc.). Verified via grep that the project does not directly reference any of the changed symbols:

  • Partial-JSON parser deletion and PEG parser refactor (json-partial.h removed)
  • Message-span type restructuring in common/chat.h (common_chat_msg_span, common_chat_msg_delimiter, common_chat_split_by_role() removed)
  • Context-checkpointing refactor (task_params::n_before_user removed, replaced by message_spans)
  • New llama_model_n_layer_nextn() API (not called by project)
  • common_params_handle_models() signature change (not called directly by project)
  • Multi-model router refactor in server-models.cpp (project links but does not drive)
  • Backend-internal work (Hexagon, Vulkan, SYCL, OpenCL, WebGPU shaders)

Windows patch refresh:

Upstream's common_params_parse argv handling evolved from the original unconditional override (llama.cpp #24779 regression in b9739) to a count-guard form in b9789:

if (static_cast<int>(utf8.buf.size()) == argc) {
    argv = utf8.ptrs.data();
}

This count-guard is exactly the variant this project identified as breaking its Windows server-integration tests: when the embedded caller's argv length happens to coincide with the argument count of java.exe's command line, the guard passes and the caller's argv is replaced. The refreshed patch drops the new form and keeps (void) utf8;, so the caller's already-UTF-8 argv is always used. The patch applies and reverse-applies cleanly against b9789 (idempotency verified).
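
To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the two selection policies. It is not the upstream code: recovered_args, pick_count_guard, pick_caller_only and the sample argument values are hypothetical, and plain string vectors stand in for the real argc/argv and utf8 objects.

```cpp
#include <cassert>
#include <string>
#include <vector>

using arg_list = std::vector<std::string>;

// Hypothetical stand-in for the UTF-8 arguments recovered from the process
// command line (an assumption for illustration, not the upstream type).
struct recovered_args {
    arg_list buf;
};

// Count-guard policy: override the caller's arguments whenever the counts match.
arg_list pick_count_guard(const arg_list & caller, const recovered_args & utf8) {
    if (utf8.buf.size() == caller.size()) {
        return utf8.buf;
    }
    return caller;
}

// Patched policy: the recovered arguments are ignored, mirroring "(void) utf8;".
arg_list pick_caller_only(const arg_list & caller, const recovered_args & utf8) {
    (void) utf8;
    return caller;
}

// Self-check: an embedded caller whose argument count happens to equal the
// host process's argument count (all values here are made up).
void check_embedded_case() {
    const arg_list caller = {"jllama", "--model", "m.gguf"};
    recovered_args host;
    host.buf = {"java.exe", "-jar", "app.jar"};

    assert(pick_count_guard(caller, host) == host.buf);  // caller's argv is lost
    assert(pick_caller_only(caller, host) == caller);    // caller's argv survives
}
```

The guard compares only counts, so it cannot tell the host process's arguments apart from an embedded caller's arguments of the same length. That is why the patched policy ignores the recovered list entirely.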

Documentation updates:

  • CLAUDE.md: Updated pinned version reference and expanded patch rationale
  • docs/history/llama-cpp-breaking-changes.md: Added comprehensive b9739–b9789 changelog with verification notes
  • README.md, TODO.md, .github/workflows/publish.yml: Updated version badge and references

Test plan

  • Patch applies cleanly to b9789 common/arg.cpp and reverse-applies cleanly (idempotency verified)
  • Local build with GIT_TAG b9789 verified clean on Linux x86_64 (GCC 13.3): cmake -B build -DBUILD_TESTING=ON && cmake --build build --config Release -j$(nproc) links libjllama.so + jllama_test with zero warnings
  • ctest --test-dir build --output-on-failure reports 454/454 tests passing
  • OuteTTS build-time extraction and Windows patch both pass their fail-loud anchor checks against b9789
  • CI is green on this branch

Related issues / PRs

Refs upstream llama.cpp #24779 (Windows argv regression), llama.cpp #24780 (count-guard variant)

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg

Bump the pinned llama.cpp tag and refresh the Windows argv patch for the
upgraded source. Every upstream breaking change in this range is absorbed
inside upstream-compiled translation units; no project C++ source edits
were required.

- CMakeLists.txt: GIT_TAG + LLAMA_TAG b9739 -> b9789.
- README.md / CLAUDE.md / publish.yml / TODO.md: version badge, pinned-
  version notes, WebUI clone example, aarch64 GCC rationale.
- patches/0001-win32-arg-parse-embed-guard.patch: refreshed for b9789.
  Upstream replaced the original #24779 argv override with the count-guard
  form (if utf8.buf.size() == argc), which is exactly the variant that
  breaks the Windows server-integration tests, so the patch still drops it
  entirely and keeps "(void) utf8;". Re-verified to apply and reverse-apply
  cleanly (idempotent) against b9789 common/arg.cpp.
- docs/history/llama-cpp-breaking-changes.md: new b9739-b9789 rows
  (json-partial.{h,cpp} removed -> peg-parser; chat.h message-span
  restructure; server-task n_before_user -> message_spans;
  new llama_model_n_layer_nextn; mtmd/clip progress_callback;
  server-models child-process download refactor).

Verified locally on Linux x86_64 (GCC 13.3): cmake configure passes the
fail-loud OuteTTS extraction and refreshed-patch anchor checks against
b9789, the full Release build links libjllama.so + jllama_test with zero
warnings on any project translation unit, and ctest reports 454/454
passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
…ssion

PR #271 CI surfaced two model-backed Java-test failures from the
b9739 -> b9789 bump (every "Java Tests" job failed; all build/C++ jobs
were already green):

1. LlamaModelTest.testJsonSchemaToGrammar -- upstream json-schema-to-grammar
   changed where it emits the `space` rule: a closing object is now
   "... )? space }" (was "... )? } space") and a root-level string rule no
   longer appends a trailing space. Functionally equivalent, byte-different;
   updated the pinned expectation to the b9789 output. Verified locally
   against the built b9789 libjllama (jsonSchemaToGrammar is a pure JNI call,
   no model needed).

2. LoadProgressCallbackTest -- server_context::load_model now unconditionally
   installs the server's own load-progress reporter on
   params_base.load_progress_callback right before common_init_from_params,
   clobbering libjllama's LoadProgressCallback JNI trampoline (set on
   common_params.load_progress_callback before load_model). The callback
   stopped firing (zero updates) and returning false no longer aborted the
   load. New patches/0002-server-preserve-caller-load-progress-callback.patch
   guards the install behind `if (params_base.load_progress_callback ==
   nullptr)`, so a caller-supplied callback survives; standalone llama-server
   (null field) is unaffected. Same JNI-vs-standalone class as patch 0001.
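
A minimal C++ sketch of the guard described in item 2, under stated assumptions: params_sketch, progress_cb_sketch, install_server_reporter and the two callbacks are hypothetical names, and the callback type is assumed to have the same shape as llama.cpp's llama_progress_callback. It illustrates the idea only and is not the contents of patch 0002.

```cpp
#include <cassert>

// Assumed to mirror the shape of llama.cpp's llama_progress_callback typedef.
using progress_cb_sketch = bool (*)(float progress, void * user_data);

// Hypothetical, reduced params struct holding only the field discussed above.
struct params_sketch {
    progress_cb_sketch load_progress_callback = nullptr;
};

// Guarded install: the server's reporter is set only when no caller-supplied
// callback is present.
void install_server_reporter(params_sketch & params, progress_cb_sketch server_cb) {
    if (params.load_progress_callback == nullptr) {
        params.load_progress_callback = server_cb;
    }
}

// Hypothetical callbacks: the caller's one returns false to request an abort.
static bool caller_cb_sketch(float, void *) { return false; }
static bool server_cb_sketch(float, void *) { return true; }

void check_guarded_install() {
    // Embedded case: the caller set a callback before load, and it survives.
    params_sketch embedded;
    embedded.load_progress_callback = caller_cb_sketch;
    install_server_reporter(embedded, server_cb_sketch);
    assert(embedded.load_progress_callback == caller_cb_sketch);

    // Standalone case: the field is null, so the server's reporter is installed.
    params_sketch standalone;
    install_server_reporter(standalone, server_cb_sketch);
    assert(standalone.load_progress_callback == server_cb_sketch);
}
```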

Patch 0002 applies + reverse-applies cleanly against b9789 and compiles clean
(ctest 454/454). The model-backed LoadProgressCallbackTest cannot run in the
restricted sandbox (no HuggingFace model); CI will confirm.

Docs: CLAUDE.md patches table + docs/history breaking-changes rows updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg

Rework patches/0001-win32-arg-parse-embed-guard.patch from "drop the Windows
argv override entirely" to the opt-in design intended for upstreaming:

- common/arg.cpp: common_params_parse() now parses exactly the argv it is
  given (no GetCommandLineW override). A new common_params_parse_main()
  wrapper carries the process-command-line UTF-8 recovery (llama.cpp #24779)
  for the standalone tools' main().
- common/arg.h: declare common_params_parse_main().

The embedded JNI caller (jllama.cpp) already calls common_params_parse()
directly, so its argv is respected by default and never overridden --
behaviorally identical to the previous deterministic patch for our build. The ~20
standalone main() call-site flips (common_params_parse -> _main) are left to
the upstream PR, not this local patch: we don't ship those tools and a
20-file patch would be fragile across llama.cpp bumps.
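
The opt-in split can be sketched in C++ as follows. This is a simplified illustration, not the patched arg.cpp: parse_sketch and parse_main_sketch are hypothetical stand-ins for common_params_parse() and common_params_parse_main(), the recovered command line is passed in as a parameter instead of being read from the OS, and the "use the recovered list when it is non-empty" rule is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

using arg_list = std::vector<std::string>;

// Hypothetical core parser: consumes exactly the arguments it is given.
arg_list parse_sketch(const arg_list & args) {
    return args;
}

// Hypothetical wrapper for a standalone main(): prefers the arguments
// recovered from the process command line when recovery produced any,
// otherwise falls back to the arguments it was given.
arg_list parse_main_sketch(const arg_list & args, const arg_list & recovered) {
    return parse_sketch(recovered.empty() ? args : recovered);
}

void check_opt_in_split() {
    // Embedded path: the core parser never consults the host command line.
    const arg_list embedded = {"jllama", "--model", "m.gguf"};
    assert(parse_sketch(embedded) == embedded);

    // Standalone path: the wrapper opts in to the recovered command line.
    const arg_list narrow    = {"tool", "-p", "h?llo"};
    const arg_list recovered = {"tool", "-p", "hello-utf8"};
    assert(parse_main_sketch(narrow, recovered) == recovered);

    // No recovery available: the wrapper behaves like the core parser.
    assert(parse_main_sketch(narrow, {}) == narrow);
}
```

Keeping the recovery in a separate entry point means an embedded caller gets the safe behavior by default, and only programs that own the process command line opt in.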

Verified: applies forward + reverse (idempotent) against b9789, compiles
clean (no warnings on arg.cpp/arg.h), ctest 454/454. The Windows-specific
behavior validates on Windows CI as before.

Docs updated to the new patch shape: CLAUDE.md patches table,
docs/history/llama-cpp-breaking-changes.md, TODO.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg

Expand patches/0001-win32-arg-parse-embed-guard.patch from the arg.cpp/arg.h
core into the full, submittable upstream change so it can be sent to
llama.cpp verbatim and then dropped here:

- common/arg.cpp + common/arg.h: common_params_parse() parses exactly the
  argv it is given; new common_params_parse_main() wrapper carries the
  Windows GetCommandLineW UTF-8 recovery (#24779) for the standalone tools.
- ~34 standalone main() call sites across tools/*, examples/* and the
  tests/* programs flip common_params_parse(argc, argv, ...) ->
  common_params_parse_main(argc, argv, ...).
- tests/test-arg-parser.cpp: regression case asserting common_params_parse
  honors a caller-supplied argv (the embedded/JNI contract).

Our build compiles llama.cpp as a subproject (LLAMA_BUILD_TOOLS/TESTS OFF),
so only the arg.{cpp,h} core is compiled here -- the flips + test are
applied but not built in normal CI, and our embedded path (jllama.cpp ->
common_params_parse) is behaviorally identical to before. Validated the
flips + test via a one-off -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON
build: the new test compiles and its asserts pass, and a flipped program
(test-thread-safety) builds; test-arg-parser's only failure is its live
ggml.ai download assertion (sandbox network, not the patch). Full patch
applies + reverse-applies cleanly against b9789 (37 files).

Docs updated (CLAUDE.md patches table, breaking-changes row, TODO.md).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg

@bernardladenthin bernardladenthin merged commit 212634e into main Jun 25, 2026
35 of 37 checks passed
@bernardladenthin bernardladenthin deleted the claude/keen-babbage-yjmgwh branch June 25, 2026 17:29