Commit 6975522
state: R5 TQ_NO_Q4 quality vs speed — inconsistent, keep opt-in
Cross-model A/B: TQ_NO_Q4=1 costs 7-26% decode speed across Qwen3/Phi-3/Llama.
Quality win is prompt-dependent — clear improvement on one long prompt
("faraway land" → coherent village story) but no difference on short prompts.
Not flipping default. Notable side-finding: Llama-3.2-1B Q8_0 default path
emits 'café' UTF-8 artifact; NO_Q4 path produces clean text. Tracked as
separate follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6727a74 commit 6975522
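The speed cost quoted in the commit message is a relative decode-speed loss between a default run and a `TQ_NO_Q4=1` run. A minimal sketch of that arithmetic follows, assuming each run reports decode throughput in tokens per second. The helper names `ab_env` and `decode_slowdown_pct` are hypothetical and not part of this repository, and the numbers are illustrative, not measurements from this commit.

```python
import os


def ab_env(no_q4, base=None):
    """Return a copy of the environment with TQ_NO_Q4 set to "1" or removed.

    Hypothetical helper: the result would be passed as env= to whatever
    command runs the model, so the two A/B arms differ only in this flag.
    """
    env = dict(os.environ if base is None else base)
    if no_q4:
        env["TQ_NO_Q4"] = "1"
    else:
        env.pop("TQ_NO_Q4", None)
    return env


def decode_slowdown_pct(default_tok_s, no_q4_tok_s):
    """Percent decode-speed loss of the NO_Q4 run relative to the default run."""
    if default_tok_s <= 0:
        raise ValueError("default throughput must be positive")
    return 100.0 * (default_tok_s - no_q4_tok_s) / default_tok_s


# Illustrative throughputs only (tokens/s), not data from the A/B above.
print(decode_slowdown_pct(100.0, 80.0))  # 20.0
```

Clearing the variable in the default arm, rather than leaving it untouched, guards against a value inherited from the calling shell.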
1 file changed
Lines changed: 25 additions & 0 deletions
Diff hunk: 25 lines added at new-file lines 6-30, after original line 5 (added content not preserved).