Upgrade llama.cpp from b9739 to b9789 and refresh Windows patch #271
Merged
Bump the pinned llama.cpp tag and refresh the Windows argv patch for the
upgraded source. Every upstream breaking change in this range is absorbed
inside upstream-compiled translation units; no project C++ source edits
were required.
- CMakeLists.txt: GIT_TAG + LLAMA_TAG b9739 -> b9789.
- README.md / CLAUDE.md / publish.yml / TODO.md: version badge, pinned-
version notes, WebUI clone example, aarch64 GCC rationale.
- patches/0001-win32-arg-parse-embed-guard.patch: refreshed for b9789.
Upstream replaced the original #24779 argv override with the count-guard
form (if utf8.buf.size() == argc), which is exactly the variant that
breaks the Windows server-integration tests, so the patch still drops it
entirely and keeps "(void) utf8;". Re-verified to apply and reverse-apply
cleanly (idempotent) against b9789 common/arg.cpp.
- docs/history/llama-cpp-breaking-changes.md: new b9739-b9789 rows
(json-partial.{h,cpp} removed -> peg-parser; chat.h message-span
restructure; server-task n_before_user -> message_spans;
new llama_model_n_layer_nextn; mtmd/clip progress_callback;
server-models child-process download refactor).
Verified locally on Linux x86_64 (GCC 13.3): cmake configure passes the
fail-loud OuteTTS extraction and refreshed-patch anchor checks against
b9789, the full Release build links libjllama.so + jllama_test with zero
warnings on any project translation unit, and ctest reports 454/454
passing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
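
To illustrate why the count-guard form still breaks an embedded caller, here is a minimal sketch. It is not the upstream code: `Args`, `pick_with_count_guard`, and `pick_caller_only` are hypothetical names, and the process command line is passed in as a plain vector rather than read from the operating system. It only models the selection rule described above, where the process command line wins whenever its argument count equals `argc`.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

// Hypothetical model of the count-guard: the process command line replaces
// the caller's argv whenever both have the same number of entries.
static Args pick_with_count_guard(const Args & caller_argv, const Args & process_cmdline) {
    if (process_cmdline.size() == caller_argv.size()) {
        return process_cmdline;  // embedded caller's arguments are silently lost
    }
    return caller_argv;
}

// Hypothetical model of the patched behaviour: the caller's argv is always used.
static Args pick_caller_only(const Args & caller_argv, const Args & /*process_cmdline*/) {
    return caller_argv;
}
```

With a four-entry `java.exe` command line and a four-entry argv built by the JNI layer, the guarded variant returns the `java.exe` arguments, while the caller-only variant returns what the JNI layer actually asked for.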
…ssion

PR #271 CI surfaced two model-backed Java-test failures from the
b9739 -> b9789 bump (every "Java Tests" job failed; all build/C++ jobs
were already green):

1. LlamaModelTest.testJsonSchemaToGrammar -- upstream
   json-schema-to-grammar changed where it emits the `space` rule: a
   closing object is now "... )? space }" (was "... )? } space") and a
   root-level string rule no longer appends a trailing space.
   Functionally equivalent, byte-different; updated the pinned
   expectation to the b9789 output. Verified locally against the built
   b9789 libjllama (jsonSchemaToGrammar is a pure JNI call, no model
   needed).

2. LoadProgressCallbackTest -- server_context::load_model now
   unconditionally installs the server's own load-progress reporter on
   params_base.load_progress_callback right before
   common_init_from_params, clobbering libjllama's LoadProgressCallback
   JNI trampoline (set on common_params.load_progress_callback before
   load_model). The callback stopped firing (zero updates) and returning
   false no longer aborted the load. New
   patches/0002-server-preserve-caller-load-progress-callback.patch
   guards the install behind
   `if (params_base.load_progress_callback == nullptr)`, so a
   caller-supplied callback survives; standalone llama-server (null
   field) is unaffected. Same JNI-vs-standalone class as patch 0001.

Patch 0002 applies + reverse-applies cleanly against b9789 and compiles
clean (ctest 454/454). The model-backed LoadProgressCallbackTest cannot
run in the restricted sandbox (no HuggingFace model); CI will confirm.

Docs: CLAUDE.md patches table + docs/history breaking-changes rows
updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
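
The null-guard behind patch 0002 can be sketched roughly as follows. This is a simplified model, not the patch itself: `params_sketch`, `server_default_reporter`, `install_unconditional`, `install_if_unset`, `caller_abort_cb`, and `simulate_load` are all hypothetical stand-ins, and the callback type is assumed to be a function pointer that returns `false` to abort the load.

```cpp
#include <cassert>

// Assumed callback shape: return false to abort the load.
using progress_cb = bool (*)(float progress, void * user_data);

// Hypothetical stand-in for the two callback fields on the params struct.
struct params_sketch {
    progress_cb load_progress_callback           = nullptr;
    void *      load_progress_callback_user_data = nullptr;
};

// Hypothetical server-side reporter: never aborts.
static bool server_default_reporter(float, void *) { return true; }

// Models the regression: the server's reporter always overwrites the field.
static void install_unconditional(params_sketch & p) {
    p.load_progress_callback           = server_default_reporter;
    p.load_progress_callback_user_data = nullptr;
}

// Models the guarded install: a caller-supplied callback is left alone.
static void install_if_unset(params_sketch & p) {
    if (p.load_progress_callback == nullptr) {
        p.load_progress_callback           = server_default_reporter;
        p.load_progress_callback_user_data = nullptr;
    }
}

// Hypothetical caller callback: counts updates and asks for an abort.
static bool caller_abort_cb(float, void * user_data) {
    ++*static_cast<int *>(user_data);
    return false;
}

// Toy load loop: returns true if the load ran to completion.
static bool simulate_load(const params_sketch & p) {
    for (int step = 0; step <= 4; ++step) {
        if (p.load_progress_callback &&
            !p.load_progress_callback(step / 4.0f, p.load_progress_callback_user_data)) {
            return false;  // callback requested an abort
        }
    }
    return true;
}
```

In this model the unconditional install reproduces both observed symptoms (zero updates reach the caller, and returning `false` no longer aborts), while the guarded install restores them and still installs the server reporter when the field is null.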
Rework patches/0001-win32-arg-parse-embed-guard.patch from "drop the
Windows argv override entirely" to the opt-in design intended for
upstreaming:
- common/arg.cpp: common_params_parse() now parses exactly the argv it
is given (no GetCommandLineW override). A new common_params_parse_main()
wrapper carries the process-command-line UTF-8 recovery (llama.cpp
#24779) for the standalone tools' main().
- common/arg.h: declare common_params_parse_main().
The embedded JNI caller (jllama.cpp) already calls common_params_parse()
directly, so it is respected by default and never overridden --
behaviorally identical to the previous deterministic patch for our
build. The ~20 standalone main() call-site flips (common_params_parse ->
_main) are left to the upstream PR, not this local patch: we don't ship
those tools and a 20-file patch would be fragile across llama.cpp bumps.
Verified: applies forward + reverse (idempotent) against b9789, compiles
clean (no warnings on arg.cpp/arg.h), ctest 454/454. The
Windows-specific behavior validates on Windows CI as before.
Docs updated to the new patch shape: CLAUDE.md patches table,
docs/history/llama-cpp-breaking-changes.md, TODO.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
Expand patches/0001-win32-arg-parse-embed-guard.patch from the arg.cpp/arg.h
core into the full, submittable upstream change so it can be sent to
llama.cpp verbatim and then dropped here:
- common/arg.cpp + common/arg.h: common_params_parse() parses exactly the
argv it is given; new common_params_parse_main() wrapper carries the
Windows GetCommandLineW UTF-8 recovery (#24779) for the standalone tools.
- ~34 standalone main() call sites across tools/*, examples/* and the
tests/* programs flip common_params_parse(argc, argv, ...) ->
common_params_parse_main(argc, argv, ...).
- tests/test-arg-parser.cpp: regression case asserting common_params_parse
honors a caller-supplied argv (the embedded/JNI contract).
Our build compiles llama.cpp as a subproject (LLAMA_BUILD_TOOLS/TESTS OFF),
so only the arg.{cpp,h} core is compiled here -- the flips + test are
applied but not built in normal CI, and our embedded path (jllama.cpp ->
common_params_parse) is behaviorally identical to before. Validated the
flips + test via a one-off -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_TESTS=ON
build: the new test compiles and its asserts pass, and a flipped program
(test-thread-safety) builds; test-arg-parser's only failure is its live
ggml.ai download assertion (sandbox network, not the patch). Full patch
applies + reverse-applies cleanly against b9789 (37 files).
Docs updated (CLAUDE.md patches table, breaking-changes row, TODO.md).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg
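
The opt-in split and the contract the regression case checks can be modelled roughly like this. The sketch makes several assumptions: `parse_sketch` and `parse_main_sketch` are hypothetical stand-ins for `common_params_parse()` and `common_params_parse_main()`, the real functions take `argc`/`argv` plus further parameters not modelled here, and `fake_process_cmdline` replaces the actual `GetCommandLineW` recovery with a fixed value.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

// Hypothetical source of the process command line; on Windows the real
// wrapper would derive this from GetCommandLineW and convert it to UTF-8.
using CmdlineSource = Args (*)();

// Stand-in for common_params_parse(): consumes exactly the argv it is given.
static Args parse_sketch(const Args & argv) {
    return argv;
}

// Stand-in for common_params_parse_main(): only a standalone main() opts in
// to command-line recovery; it falls back to argv if recovery yields nothing.
static Args parse_main_sketch(const Args & argv, CmdlineSource recover) {
    const Args recovered = recover ? recover() : Args{};
    return parse_sketch(recovered.empty() ? argv : recovered);
}

// Hypothetical recovery result used for illustration only.
static Args fake_process_cmdline() {
    return {"llama-server.exe", "--model", "recovered.gguf"};
}
```

Under these assumptions an embedded caller that uses the plain parse function always gets its own argv back, which is the property the `tests/test-arg-parser.cpp` case is described as asserting, while a tool's `main()` gets the recovered command line by calling the wrapper.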
Summary
- Refresh `patches/0001-win32-arg-parse-embed-guard.patch` to apply cleanly against b9789's new count-guard form in `common_params_parse`, which upstream now ships as the exact variant that breaks this project's Windows server-integration tests
- Update `CLAUDE.md` and `docs/history/llama-cpp-breaking-changes.md` to reflect the rationale and verify no project source changes are required for the version bump

Details
llama.cpp b9739 → b9789 changes absorbed:
All breaking changes in this range are consumed inside upstream-compiled translation units (`chat.cpp`, `server-*.cpp`, `common/arg.cpp`, etc.). Verified via grep that the project does not directly reference any of the changed symbols:
- `json-partial.h` removed
- `common/chat.h` (`common_chat_msg_span`, `common_chat_msg_delimiter`, `common_chat_split_by_role()` removed)
- `task_params::n_before_user` removed, replaced by `message_spans`
- New `llama_model_n_layer_nextn()` API (not called by project)
- `common_params_handle_models()` signature change (not called directly by project)
- `server-models.cpp` (project links but does not drive)

Windows patch refresh:
Upstream's `common_params_parse` argv handling evolved from the original unconditional override (llama.cpp #24779 regression in b9739) to a count-guard form in b9789, `if utf8.buf.size() == argc`. This count-guard is exactly the variant this project identified as breaking its Windows server-integration tests (argv length coincides with `java.exe`'s command line). The patch was refreshed to drop the new form and keep `(void) utf8;`, ensuring the caller's already-UTF-8 argv is always used. The patch applies cleanly and reverse-applies cleanly (idempotency verified) against b9789.

Documentation updates:
- `CLAUDE.md`: Updated pinned version reference and expanded patch rationale
- `docs/history/llama-cpp-breaking-changes.md`: Added comprehensive b9739–b9789 changelog with verification notes
- `README.md`, `TODO.md`, `.github/workflows/publish.yml`: Updated version badge and references

Test plan
- Refreshed patch applies to b9789 `common/arg.cpp` and reverse-applies cleanly (idempotency verified)
- Build against `GIT_TAG b9789` verified clean on Linux x86_64 (GCC 13.3):
  - `cmake -B build -DBUILD_TESTING=ON && cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings
  - `ctest --test-dir build --output-on-failure` reports 454/454 tests passing

Related issues / PRs
Refs upstream llama.cpp #24779 (Windows argv regression), llama.cpp #24780 (count-guard variant)
Checklist
- `CONTRIBUTING.md` and `CODE_OF_CONDUCT.md`

https://claude.ai/code/session_01SLQk4Fk7vk7R4f2za1KxYg