Dynamic per-stage `think` control for Ollama reasoning models (`?think=true|false`) #88

## Summary

The current codebase lets any Ollama reasoning model (Qwen 3.x, gpt-oss, DeepSeek-R1, etc.) generate `<think>...</think>` blocks, then strips them in post-processing via `RemoveThinkTagFromAssistantMessages` in `Wrapper.py`. That is correct for cleanup but expensive at generation time: the framework pays the latency cost of thinking even when a call's purpose doesn't benefit from it.

A small dynamic affordance, a per-stage `?think=true|false` parameter in the model URL, would let users (and the framework) enable thinking only on the stages that benefit from it, with no breaking change.

## Background

Reasoning models think by default unless the chat API call explicitly passes `think: false`. The framework's URL parameter system (`ollama://model?param=value`) should be a natural fit for this, except for one constraint: the existing URL parameter parser hard-casts every value through `float()`:

https://github.com/datacrystals/AIStoryWriter/blob/161b712400cd825f2d0a933cb0d9c362a48ea30b/Writer/Interface/Wrapper.py#L506-L508

```python
# Flatten QueryParams
for key in QueryParams:
    QueryParams[key] = float(QueryParams[key][0])
```

So `ollama://qwen3.5:122b-a10b?think=false` raises `ValueError: could not convert string to float: 'false'` before the value ever reaches the chat call. Booleans (and strings) need a small parser upgrade before this feature is possible.
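A minimal reproduction of the failure, assuming the URL is split with `urllib.parse` (`parse_qs` returns a list per key, which matches the `QueryParams[key][0]` indexing above). `flatten_float_only` is a hypothetical name for the current loop, not a function in the codebase:

```python
from urllib.parse import urlparse, parse_qs


def flatten_float_only(url: str) -> dict:
    """Hypothetical copy of the current Wrapper.py loop: every value goes through float()."""
    query_params = parse_qs(urlparse(url).query)
    for key in query_params:
        query_params[key] = float(query_params[key][0])
    return query_params


print(flatten_float_only("ollama://qwen3.5:122b-a10b?temperature=0.8"))  # numeric values are fine

try:
    flatten_float_only("ollama://qwen3.5:122b-a10b?think=false")
except ValueError as err:
    # could not convert string to float: 'false'
    print(f"ValueError: {err}")
```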

## Motivation — why dynamic rather than a single Config flag

Different pipeline stages benefit from different `think` settings:

| Stage | Want thinking? | Why |
|---|---|---|
| Outline generation, story elements, per-chapter outlines | **off** | Wastes tokens on internal reasoning the framework cannot see; output is the artifact. |
| Scene-by-scene chapter writing (Stage 1) | **off** | Prose generation. Tokens should produce prose, not silent thinking. |
| Character / dialogue passes (Stage 2/3) | **off** | Same — refinement, prose-quality. |
| `LLMSummaryCheck` (chapter ↔ outline alignment verdict) | **on** | Structured judgment, true/false outcome. Thinking improves calibration. |
| `GetFeedbackOnChapter`, `GetChapterRating` (Stage 5 critique loop) | **on** | Critique is the canonical reasoning task; thinking lifts quality. |
| Eval / Info / Scrub structured-output passes | **on** | Same — better verdicts when the model is allowed to reason internally. |

A single global flag misses this. A per-stage flag in `Config.py` (one per `*_MODEL`) bloats the config surface. A per-URL `?think=true|false` rides on the existing URL-as-config mechanism, defaults to model-native behavior when omitted, and lets users mix-and-match across the 13 model knobs already in `Config.py`.

This also dovetails with a related architecture choice many users land on for v1 of dual-model setups: a fast generation model with thinking off (Qwen 3.x 122B, Mistral Medium 3.5, etc.) for the writing knobs, and a thinking-capable critic model (gpt-oss 120b, DeepSeek-R1 distills) for `EVAL_MODEL` / `REVISION_MODEL` / `CHECKER_MODEL`. The framework already supports per-stage model selection; per-stage `?think` completes the picture.
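For illustration, a dual-model setup along those lines might look like the fragment below. It assumes `Config.py` holds plain string assignments for the model knobs; `EVAL_MODEL`, `REVISION_MODEL`, and `CHECKER_MODEL` are the knobs named above, `EXAMPLE_WRITER_MODEL` is a hypothetical placeholder for any of the writing knobs, and the model tags are examples only:

```python
# Hypothetical Config.py fragment; model tags are examples only.

# Writing knob (placeholder name): fast generation model, thinking suppressed,
# combined with an ordinary numeric option in the same query string.
EXAMPLE_WRITER_MODEL = "ollama://qwen3.5:122b-a10b?think=false&temperature=0.8"

# Critic knobs: thinking-capable model with thinking forced on.
EVAL_MODEL = "ollama://gpt-oss:120b?think=true"
REVISION_MODEL = "ollama://gpt-oss:120b?think=true"
CHECKER_MODEL = "ollama://gpt-oss:120b?think=true"
```

Any knob left without `?think=` keeps the model's native default.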

## Proposed implementation

Two small edits in `Writer/Interface/Wrapper.py`:

### 1. Smarter URL query parameter parser

Replace the `float`-only cast with bool → float → raw-string coercion so `?think=true`, `?think=false`, `?temperature=0.8`, and (future) string parameters all work:

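```python
# Coerce each query value: bool literals first, then numbers, else keep the raw string
for key in QueryParams:
    raw = QueryParams[key][0]  # values arrive as lists; take the first one
    if raw.lower() == 'true':
        QueryParams[key] = True
    elif raw.lower() == 'false':
        QueryParams[key] = False
    else:
        try:
            QueryParams[key] = float(raw)
        except ValueError:
            QueryParams[key] = raw  # non-numeric strings pass through unchanged
```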
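As a self-contained sketch, the same coercion can be wrapped in a helper and exercised against full model URLs. `coerce_param` and `parse_model_url_params` are hypothetical names, and the use of `urllib.parse` assumes the framework splits URLs the same way (the `[0]` indexing above suggests `parse_qs`-style list values):

```python
from urllib.parse import urlparse, parse_qs


def coerce_param(raw: str):
    """Coerce one query value: bool literal, then float, else the raw string."""
    if raw.lower() == "true":
        return True
    if raw.lower() == "false":
        return False
    try:
        return float(raw)
    except ValueError:
        return raw


def parse_model_url_params(url: str) -> dict:
    """Hypothetical helper: parse ollama://model?k=v&... into coerced values."""
    query_params = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; keep the first value, as Wrapper.py does
    return {key: coerce_param(values[0]) for key, values in query_params.items()}


params = parse_model_url_params("ollama://qwen3.5:122b-a10b?think=false&temperature=0.8")
print(params)  # {'think': False, 'temperature': 0.8}
```

The bool check is case-insensitive, so `?think=True` and `?think=TRUE` also work. Strings that `float()` already accepted, such as `nan` or `inf`, still become floats exactly as before.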

### 2. Pop `think` from `ModelOptions` and pass it to the Ollama chat call

`think` is a chat-level kwarg in the `ollama` Python client, not a model option, so it has to live outside the `options=` dict and the `ValidParameters` allowlist:

```python
# Extract chat-level kwargs (separate from Ollama model options)
ChatExtras = {}
if "think" in ModelOptions:
    ChatExtras["think"] = bool(ModelOptions.pop("think"))

# ... existing ValidParameters check / num_ctx default / JSON-mode handling ...

Stream = self.Clients[_Model].chat(
    model=ProviderModel,
    messages=_Messages,
    stream=True,
    options=ModelOptions,
    **ChatExtras,
)
```
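The split itself can be checked without a running Ollama server. The sketch below uses a hypothetical `split_chat_extras` helper to separate the chat-level `think` kwarg from model options, mirroring the snippet above; the commented-out call only shows where the pieces would go, assuming a client version that accepts `think`:

```python
def split_chat_extras(model_options: dict) -> tuple[dict, dict]:
    """Return (options, chat_extras) without mutating the caller's dict."""
    options = dict(model_options)  # copy so the original ModelOptions is untouched
    chat_extras = {}
    if "think" in options:
        # think is a chat() kwarg, not a model option, so it must leave options=
        chat_extras["think"] = bool(options.pop("think"))
    return options, chat_extras


options, extras = split_chat_extras({"temperature": 0.8, "think": False})
print(options)  # {'temperature': 0.8}
print(extras)   # {'think': False}

# Assumed usage with a thinking-capable client:
# client.chat(model=..., messages=..., stream=True, options=options, **extras)
```

This version copies the dict instead of popping in place; that is a choice of the sketch and only matters if `ModelOptions` is reused across calls.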

### Behavioral contract

- **No `?think=` in URL**: behavior unchanged. Default Ollama / model behavior preserved. No breaking change for anyone.
- **`?think=false`**: thinking suppressed; `<think>` blocks not generated; post-hoc strip becomes a no-op. Faster generation, no quality loss for non-reasoning tasks.
- **`?think=true`**: thinking forced on (useful when overriding a model that defaults off, e.g., gpt-oss when used as a critic).
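The claim that the post-hoc strip becomes a no-op under `?think=false` can be illustrated with a regex-based stand-in. `strip_think` below is hypothetical, not the framework's `RemoveThinkTagFromAssistantMessages` (whose implementation may differ); it only shows that stripping leaves output without `<think>` blocks unchanged:

```python
import re

# Hypothetical stand-in for RemoveThinkTagFromAssistantMessages; the real code may differ.
# DOTALL lets a think block span several lines; the lazy .*? stops at the first </think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove every <think>...</think> block from an assistant reply."""
    return THINK_BLOCK.sub("", text)


with_thinking = "<think>Plan the scene first.</think>\nThe rain had not stopped."
without_thinking = "The rain had not stopped."  # what ?think=false should yield

print(strip_think(with_thinking))                         # The rain had not stopped.
print(strip_think(without_thinking) == without_thinking)  # True
```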

## Compatibility

- Requires an `ollama` Python client that accepts the `think` kwarg (added in 0.5.0) and an Ollama server with thinking support. `requirements.txt` currently lists `ollama` unpinned, and the latest stable release already supports this.
- No change to any non-Ollama provider path. The smarter parameter parser is provider-agnostic (it runs before the provider branch), but it is strictly more permissive than the current parser: `true`/`false` were never valid `float()` input, so any URL that worked before still parses to the same values.
- No change to `Config.py` defaults. Users opt in by adding `?think=true|false` to whichever model URLs they want.
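The backward-compatibility claim can be spot-checked by running the current float-only flattening and the proposed coercion side by side. Both functions below are hypothetical stand-ins built on `urllib.parse`, and the sample URLs are illustrative:

```python
from urllib.parse import urlparse, parse_qs


def old_flatten(url: str) -> dict:
    """Current behavior: every value through float()."""
    return {k: float(v[0]) for k, v in parse_qs(urlparse(url).query).items()}


def new_flatten(url: str) -> dict:
    """Proposed behavior: bool literals, then float, else the raw string."""
    result = {}
    for k, v in parse_qs(urlparse(url).query).items():
        raw = v[0]
        if raw.lower() in ("true", "false"):
            result[k] = raw.lower() == "true"
        else:
            try:
                result[k] = float(raw)
            except ValueError:
                result[k] = raw
    return result


# Every URL the old parser accepts should parse to the same values under the new one.
for url in [
    "ollama://qwen3.5:122b-a10b",
    "ollama://qwen3.5:122b-a10b?temperature=0.8",
    "ollama://qwen3.5:122b-a10b?temperature=0.8&num_ctx=8192",
]:
    assert old_flatten(url) == new_flatten(url), url
print("compatible")
```

Only inputs that `float()` rejects take a new path, and those URLs raised `ValueError` before, so nothing that previously parsed changes meaning.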

PR to follow.
