Releases · leehack/llamadart

Fixed multimodal chat-template rendering for templates that force-open reasoning (for example, Qwen3.5 VLM prompts ending with <think>): enable_thinking is now preserved, and generated reasoning streams through delta.thinking instead of delta.content.
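The routing behind this fix can be illustrated with a small sketch, written in Python for brevity. It is not llamadart's implementation. Delta, route_deltas, and plain-string matching of the <think>/</think> tags are assumptions. The key point is that when the rendered prompt already ends with an open <think> tag, the first generated tokens are reasoning, so the router has to start in thinking mode instead of content mode.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

OPEN_TAG = "<think>"    # assumed reasoning delimiters (Qwen-style)
CLOSE_TAG = "</think>"


@dataclass
class Delta:
    """Hypothetical streamed chunk: exactly one of the fields is non-empty."""
    thinking: str = ""
    content: str = ""


def _delta(text: str, in_thinking: bool) -> Delta:
    return Delta(thinking=text) if in_thinking else Delta(content=text)


def _partial_len(text: str, tag: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `tag`."""
    for n in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0


def route_deltas(prompt: str, chunks: Iterable[str]) -> Iterator[Delta]:
    # A template that force-opens reasoning leaves "<think>" at the end of the
    # prompt, so generation starts inside the reasoning block.
    in_thinking = prompt.rstrip().endswith(OPEN_TAG)
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Consume every complete tag in the buffer, flipping the mode each time.
        while True:
            tag = CLOSE_TAG if in_thinking else OPEN_TAG
            idx = pending.find(tag)
            if idx < 0:
                break
            if idx:
                yield _delta(pending[:idx], in_thinking)
            pending = pending[idx + len(tag):]
            in_thinking = not in_thinking
        # Hold back a trailing fragment that might be the start of the next tag.
        keep = _partial_len(pending, CLOSE_TAG if in_thinking else OPEN_TAG)
        ready = pending[: len(pending) - keep]
        if ready:
            yield _delta(ready, in_thinking)
        pending = pending[len(pending) - keep:]
    if pending:
        yield _delta(pending, in_thinking)
```

Take a prompt ending in "<think>\n" and the chunks "Let me", " look.</th", "ink>\n\nA cat.". The text "Let me look." arrives only through thinking deltas, and "\n\nA cat." arrives only through content deltas, even though the closing tag is split across two chunks.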

Validation highlights:

- dart pub publish --dry-run
- GitHub CI on PR #240
- Local real-model smoke tests: a Qwen3.5 multimodal macOS repro and a Gemma 4 GGUF chat-features smoke test
- Tag-triggered Publish to pub.dev workflow
- Tag-triggered Docs Version Cut workflow
- GitHub Pages docs deploy
- ./tool/docs/validate_links.sh

v0.8.6

24 Jun 09:55 · leehack · tag v0.8.6 · commit ed6283d

Patch release for the llama.cpp native runtime sync to leehack/llamadart-native@b9776.

- Updated the default llama.cpp native runtime pin to leehack/llamadart-native@b9776.
- Regenerated the matching Dart FFI bindings.
- Refreshed the llamadart_llama_cpp_flutter Apple SwiftPM checksum.
- Updated the README and website native-override docs for the new native pin and the companion package install snippets.

Validation highlights:

- dart pub publish --dry-run
- GitHub CI on PR #237 and post-merge main
- LiteRT-LM Smoke on post-merge main
- ./tool/docs/validate_links.sh
- Tag-triggered Publish to pub.dev workflow
- Tag-triggered Docs Version Cut workflow

v0.8.5

23 Jun 00:37 · leehack · tag v0.8.5 · commit 39adb93

Patch release for the mtmd split-library fallback ABI used by multimodal llama.cpp loads.

- Fixed the split-library mtmd fallback ABI for image and byte-buffer
  multimodal inputs so Windows mtmd.dll and other split mtmd native bundles use
  the same bitmap-helper signature as the generated native binding path. This
  avoids corrupting the first mtmd bitmap-helper call for Gemma 4/MMProj-style
  multimodal loads.
- Added native symbol regression coverage for the fallback ABI.

Validation highlights:

- dart pub publish --dry-run
- GitHub CI on PR #235
- LiteRT-LM Smoke on PR #235
- ./tool/docs/validate_links.sh

v0.8.3

19 Jun 13:03 · leehack · tag v0.8.3 · commit ad48b13

Patch release for Windows CUDA backend discovery in native-assets builds.

Fixed Windows CUDA backend discovery when the native asset bundle directory is
not on the app PATH.
llama.cpp backend modules are now loaded from their resolved bundle path in a
way that lets colocated CUDA redistributables such as cudart64_12.dll,
cublas64_12.dll, and cublasLt64_12.dll resolve correctly.

Validation highlights:

- dart pub publish --dry-run
- GitHub CI on PR #228 and post-merge main
- Publish to pub.dev
- Docs Version Cut
- Docs Pages

v0.8.2

18 Jun 13:08 · leehack · tag v0.8.2 · commit 2507608

- Updated the default llama.cpp native runtime pin to
  leehack/llamadart-native@b9694, regenerated the matching Dart FFI bindings,
  refreshed the llamadart_llama_cpp_flutter Apple SwiftPM checksum, and
  updated the default WebGPU bridge asset pin to
  leehack/llama-web-bridge-assets@v0.1.17 (llama.cpp b9699). When no batch
  size is set, the WebGPU backend now caps browser batches for large models so
  Gemma 4 mem64 loads do not fall back to context-sized compute buffers.
- Added BackendGpuEnumeration.listGpuDevices({probeBackends}) (exposed via
  LlamaEngine.listGpuDevices) to enumerate GPU-class devices for offload
  selection. Each device reports its backend, per-backend mainGpu index, name,
  description, device id, type, and free/total memory. With an empty
  probeBackends, only already-registered backends are inspected, so an
  unsupported GPU runtime cannot crash the process during enumeration; pass
  specific backends to opt into loading just those modules first. Web/WebGPU
  return an empty list.
- Added Cohere2 MoE / North Code chat-template detection and parsing so
  <|START_TEXT|> responses and <|START_ACTION|> tool-call arrays are
  handled separately from older Command-R templates.
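Given the per-device fields that GPU enumeration reports, a caller might choose an offload target by free memory. The sketch below is hypothetical and written in Python for brevity. GpuDevice and pick_offload_device are illustrative stand-ins, not the return type of listGpuDevices. "Most free memory wins" is just one possible policy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GpuDevice:
    """Hypothetical mirror of the per-device fields listed above."""
    backend: str
    main_gpu: int          # per-backend index, i.e. the value a mainGpu setting would take
    name: str
    description: str
    device_id: str
    type: str
    free_bytes: int
    total_bytes: int


def pick_offload_device(
    devices: Sequence[GpuDevice], min_free_bytes: int = 0
) -> Optional[GpuDevice]:
    """Pick the device with the most free memory, or None to keep defaults.

    An empty list (as Web/WebGPU returns) and a list where no device has
    enough free memory both yield None.
    """
    candidates = [d for d in devices if d.free_bytes >= min_free_bytes]
    if not candidates:
        return None
    # Ties on free memory are broken by total memory.
    return max(candidates, key=lambda d: (d.free_bytes, d.total_bytes))
```

The selected device's backend and main_gpu together identify the offload target. Because indices are per backend, main_gpu alone is not unique when several backends are registered.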

v0.8.1

12 Jun 14:29 · leehack · tag v0.8.1 · commit e972106

- Fixed docs references that still pointed at llamadart_litert_lm_flutter 0.0.1
  and the pre-native.1 LiteRT-LM release after the 0.8.0 native pin sync moved
  the LiteRT-LM Apple/runtime artifacts to v0.13.1-native.1.
- Routed native .litertlm image/audio chat parts through LiteRT-LM
  Conversation message JSON so bundles with native media processors can accept
  LlamaImageContent / LlamaAudioContent path and encoded-byte inputs without
  a separate mmproj projector.

v0.8.0

10 Jun 22:40 · leehack · tag v0.8.0 · commit 7d41702

Flutter Apple runtime packaging:
- Split SwiftPM-linked Apple runtime packaging out of the core package into
  llamadart_llama_cpp_flutter for GGUF/llama.cpp and
  llamadart_litert_lm_flutter for .litertlm/LiteRT-LM. These companion
  packages live under packages/ in this repository and publish as separate
  pub.dev packages.
- Removed Flutter plugin metadata from llamadart so pure Dart/native-assets
  consumers can keep using the core package without taking a Flutter SDK
  constraint.
- Started the companion packages at 0.0.1; a native pin sync bumps only the
  patch version of the affected companion package. Companion package publishing
  uses package-specific tags after the first manual pub.dev publish and skips
  companion versions that already exist on pub.dev.
- An unset or empty llamadart_native_runtimes now includes all available
  runtime families. For Flutter iOS/macOS app builds, the installed companion
  packages decide the Apple SPM runtimes; for every other build,
  llamadart_native_runtimes remains the selector.
- Updated the default llama.cpp native runtime pin to
  leehack/llamadart-native@b9587, regenerated matching Dart FFI bindings,
  and refreshed the llamadart_llama_cpp_flutter Apple SwiftPM checksum.
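The runtime selection rule described above can be summarized as a small decision function. This is a hypothetical sketch, written in Python for brevity. The runtime family identifiers "llama_cpp" and "litert_lm" and the function itself are illustrative. They are not the actual values accepted by llamadart_native_runtimes or the code llamadart's build hooks run.

```python
from typing import AbstractSet, List, Optional, Sequence

# Hypothetical mapping from companion package to the runtime family it links.
COMPANION_RUNTIMES = {
    "llamadart_llama_cpp_flutter": "llama_cpp",
    "llamadart_litert_lm_flutter": "litert_lm",
}


def effective_runtimes(
    configured: Optional[Sequence[str]],
    available: Sequence[str],
    *,
    flutter_apple_build: bool,
    installed_companions: AbstractSet[str],
) -> List[str]:
    if flutter_apple_build:
        # Flutter iOS/macOS app builds: installed companion packages decide the
        # SwiftPM runtimes; the llamadart_native_runtimes setting is not consulted.
        return sorted(
            COMPANION_RUNTIMES[p]
            for p in installed_companions
            if p in COMPANION_RUNTIMES
        )
    # Every other build: the setting selects, and unset/empty means "all".
    return list(configured) if configured else list(available)
```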
MTP benchmarking diagnostics:
- Added llama.cpp speculative decoding perf diagnostics for decode timing,
  draft/accepted token counts, draft verification timing, and acceptance
  rate so MTP benchmarks can separate backend decode cost from drafting
  overhead.
- Extended local macOS and chat app benchmark outputs with the new
  diagnostics and added focused llama.cpp MTP smoke/benchmark tools for
  baseline-vs-MTP comparisons.
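As a worked example of how these diagnostics combine, the sketch below computes three quantities. Acceptance rate is accepted tokens divided by drafted tokens. Draft share is the fraction of time spent drafting. Speedup is the ratio of MTP throughput to baseline throughput. The field names and exact formulas are assumptions for illustration, written in Python for brevity. The benchmark tools may define or report these values differently.

```python
from dataclasses import dataclass


@dataclass
class MtpRun:
    """Hypothetical per-run counters; field names are illustrative."""
    draft_tokens: int       # tokens proposed by the draft model
    accepted_tokens: int    # proposed tokens the target model accepted
    decode_ms: float        # backend (target-model) decode time
    draft_ms: float         # time spent producing drafts


def acceptance_rate(run: MtpRun) -> float:
    return run.accepted_tokens / run.draft_tokens if run.draft_tokens else 0.0


def draft_share(run: MtpRun) -> float:
    """Fraction of (decode + draft) time spent drafting."""
    total = run.decode_ms + run.draft_ms
    return run.draft_ms / total if total else 0.0


def speedup(baseline_tokens: int, baseline_ms: float,
            mtp_tokens: int, mtp_ms: float) -> float:
    """Ratio of MTP to baseline generation throughput (tokens per second)."""
    baseline_tps = baseline_tokens / (baseline_ms / 1000)
    mtp_tps = mtp_tokens / (mtp_ms / 1000)
    return mtp_tps / baseline_tps
```

For example, 75 of 100 drafted tokens accepted gives an acceptance rate of 0.75. Generating 256 tokens in 5 s with MTP versus 8 s without it is a 1.6x speedup (51.2 vs 32 tokens/s). A high draft share with a low acceptance rate points at drafting overhead rather than backend decode cost.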
llama.cpp MTP runtime support:
- Added SpeculativeDecodingConfig.mtp(draftModelPath: ...) for llama.cpp
  external draft-model MTP sessions, with draft model caching and cleanup
  tied to the target model lifetime.
- Removed the Android Vulkan MTP allow-list dart define and the
  model-name-based Android Vulkan acceleration shortcut; Vulkan MTP now runs
  only when callers explicitly request both Vulkan and MTP in runtime
  parameters.
Structured output:
- Added responseFormat routing to LlamaEngine.create(...) for
  grammar-capable backends, deprecated the legacy chatTemplate(...)
  jsonSchema shortcut, and made strict response-format requests fail early
  on LiteRT-LM instead of silently degrading to unconstrained generation.
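The fail-early behavior can be pictured as a check that runs before any tokens are generated. This is a hypothetical Python sketch. plan_generation, GRAMMAR_BACKENDS, and the backend names are illustrative, and the non-strict fallback branch is an assumption, since the notes only describe strict requests.

```python
from typing import Optional

GRAMMAR_BACKENDS = frozenset({"llama_cpp"})  # assumption: grammar-capable backends


class UnsupportedResponseFormatError(ValueError):
    """Hypothetical error raised before any tokens are generated."""


def plan_generation(backend: str, response_format: Optional[dict], strict: bool) -> str:
    if response_format is None:
        return "unconstrained"
    if backend in GRAMMAR_BACKENDS:
        return "grammar"  # constrain decoding with a grammar derived from the format
    if strict:
        # Fail early instead of silently returning unconstrained output.
        raise UnsupportedResponseFormatError(
            f"{backend} cannot enforce the requested response format"
        )
    return "unconstrained"  # assumption: non-strict requests fall back
```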
LiteRT-LM chat parity:
- Routed eligible native .litertlm text chat through LiteRT-LM Conversation
  APIs so structured history, system messages, tool declarations, and
  template extra context reach the runtime without a Dart-rendered prompt.
  Unsupported cases still fall back to the existing Dart chat-template path.
LiteRT-LM runtime tuning controls:
- Added opt-in native .litertlm ModelParams for
  liteRtLmActivationDataType, liteRtLmPrefillChunkSize,
  liteRtLmParallelFileSectionLoading, and liteRtLmDispatchLibDir,
  forwarding the pinned LiteRT-LM v0.13.1 engine-settings C APIs while
  keeping defaults unchanged.
- Extended the LiteRT-LM engine smoke tool with matching environment
  variables so real-model runs can validate load time, prefill throughput,
  decode throughput, and selected runtime settings.
- Documented support decisions for each candidate native knob and kept
  LiteRT-LM web explicitly rejecting these native-only settings.
