Improve DMR support #2351

Merged
dgageot merged 1 commit into docker:main from krissetto:improve-dmr-support
Apr 20, 2026
Conversation

@krissetto
Contributor

@krissetto krissetto commented Apr 8, 2026

Improve DMR support

  • provider_opts.context_size sets the engine's context window; max_tokens is now strictly the per-request output limit instead of pulling double duty (which was confusing).
  • Structured _configure request mirroring model-runner's BackendConfiguration (context-size, runtime-flags, speculative, llamacpp.reasoning-budget, vllm.{hf-overrides,gpu-memory-utilization}).
  • thinking_budget is routed per backend: reasoning-budget for llama.cpp, per-request thinking_token_budget for vLLM, ignored on MLX/SGLang for now.
  • Fix session-title generation on reasoning models: DMR now honors NoThinking() by sending chat_template_kwargs.enable_thinking=false.
  • Clarified in the docs that sampling params belong on the regular model config, not in provider_opts.runtime_flags.
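
For readers skimming the description, here is a minimal Go sketch of the thinking-budget routing and the NoThinking() behaviour listed above. It is not the code from this PR: the backend constants, `budgetPlacement`, `placeThinkingBudget`, and `buildRequestBody` are hypothetical names, and dropping the budget when thinking is disabled is an assumption of the sketch. Only the key names (`reasoning-budget`, `thinking_token_budget`, `chat_template_kwargs.enable_thinking`, `max_tokens`) come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Backend names are illustrative assumptions, not identifiers from the PR.
const (
	backendLlamaCpp = "llama.cpp"
	backendVLLM     = "vllm"
)

// budgetPlacement records where a thinking budget goes for one backend.
// An empty string means "not sent there".
type budgetPlacement struct {
	ConfigureKey string // key in the _configure payload
	RequestKey   string // key in the per-request body
}

// placeThinkingBudget follows the routing in the description: llama.cpp takes
// the budget at configure time, vLLM takes it per request, and every other
// backend (MLX, SGLang) ignores it.
func placeThinkingBudget(backend string) budgetPlacement {
	switch backend {
	case backendLlamaCpp:
		return budgetPlacement{ConfigureKey: "reasoning-budget"}
	case backendVLLM:
		return budgetPlacement{RequestKey: "thinking_token_budget"}
	default:
		return budgetPlacement{}
	}
}

// buildRequestBody assembles a hypothetical per-request body. max_tokens only
// limits output. When noThinking is set, the sketch sends
// chat_template_kwargs.enable_thinking=false and omits any budget (assumption).
func buildRequestBody(backend string, maxTokens, thinkingBudget int, noThinking bool) map[string]any {
	body := map[string]any{"max_tokens": maxTokens}
	if noThinking {
		body["chat_template_kwargs"] = map[string]any{"enable_thinking": false}
		return body
	}
	if key := placeThinkingBudget(backend).RequestKey; key != "" && thinkingBudget > 0 {
		body[key] = thinkingBudget
	}
	return body
}

func main() {
	out, err := json.Marshal(buildRequestBody(backendVLLM, 256, 1024, false))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the placement in one small function makes the "ignored on MLX/SGLang for now" case an explicit default rather than a missing branch.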

@krissetto krissetto force-pushed the improve-dmr-support branch 3 times, most recently from a6a876c to d2eac5f Compare April 17, 2026 17:11
@krissetto krissetto marked this pull request as ready for review April 17, 2026 17:30
@krissetto krissetto requested a review from a team as a code owner April 17, 2026 17:30
@krissetto
Contributor Author

/review

@docker-agent docker-agent bot left a comment

Assessment: 🟡 NEEDS ATTENTION

Two findings in the new code — one medium-severity validation gap and one low-severity clarity issue. All tests pass.

// the same rules as model-runner's inference.ParseKeepAlive:
// - Go duration strings: "5m", "1h", "30s"
// - "0" to unload immediately
// - Any negative value ("-1", "-1m") to keep loaded forever

No description provided.

Contributor Author

what?

Comment thread pkg/model/provider/dmr/client.go
- fixes session title generation
- adds 'context_size' provider_opt for DMR usage instead of giving 'max_tokens' double responsibility to avoid confusion
- improved thinking budget support and fix for NoThinking()
- improves how flags are sent to the DMR model/runtime configuration endpoint
- clarify docs on sampling/runtime params

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
Assisted-By: docker-agent
@krissetto krissetto force-pushed the improve-dmr-support branch from a8837ee to d289608 Compare April 20, 2026 09:24
@dgageot dgageot merged commit aa8d245 into docker:main Apr 20, 2026
9 checks passed

2 participants