Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,10 @@ checklist.

## [Unreleased]

## [4.8.58] - 2026-06-10

- **fix: auto-retry effort-capability 400s — older models meet pinned efforts gracefully.** The v4.8.57 autodetected catalog exposes models that predate the newer effort tiers (smoke catch, same day: `claude-opus-4-5-20251101` + the prod `DARIO_EFFORT=max` pin → 400 `"This model does not support effort level 'max'. Supported levels: high, low, medium."`). Same pattern as the context-1m and anthropic-beta rejection machinery: parse the supported set out of the 400 (`parseEffortRejection`), retry once with `output_config.effort` clamped to the strongest supported level (`bestSupportedEffort` — degrade as little as possible: xhigh > max > high > medium > low), and cache the supported set **per wire model** (effort support is a model property, not an account property) so the body-build path clamps up front on every later request. Value-only in-place mutation — JSON field order (a fingerprint surface) is untouched. When the 400 matches but the body has no `output_config.effort` to clamp, the original upstream error is forwarded unchanged. fable's effort intolerance is unaffected — it soft-refuses (200 + `stop_reason:"refusal"`), invisible to 400-based machinery, and stays handled by its measured `resolveEffort` clamp. New `test/effort-capability.mjs` (16 assertions); full suite 86/86. Lockfile re-synced in-PR this time.
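The "value-only in-place mutation" point in the entry above can be illustrated with a small sketch. It assumes only standard JavaScript object semantics (string keys serialize in insertion order); the function names and the `example_field` key are hypothetical and are not part of dario.

```typescript
// Illustrative only: `example_field` is a hypothetical sibling key, not a real API field.
const SAMPLE_BODY =
  '{"model":"claude-opus-4-5-20251101","output_config":{"effort":"max","example_field":1},"max_tokens":1024}';

/** Clamp by overwriting the existing value; the key keeps its position. */
function clampInPlace(json: string, effort: string): string {
  const parsed = JSON.parse(json) as { output_config?: { effort?: unknown } };
  const oc = parsed.output_config;
  if (oc && typeof oc.effort === "string") oc.effort = effort;
  return JSON.stringify(parsed);
}

/** Counter-example: delete + re-add moves `effort` to the end of its object. */
function clampByReinsert(json: string, effort: string): string {
  const parsed = JSON.parse(json) as { output_config?: Record<string, unknown> };
  const oc = parsed.output_config;
  if (oc && typeof oc.effort === "string") {
    delete oc.effort;
    oc.effort = effort;
  }
  return JSON.stringify(parsed);
}

console.log(clampInPlace(SAMPLE_BODY, "high"));
// {"model":"claude-opus-4-5-20251101","output_config":{"effort":"high","example_field":1},"max_tokens":1024}
console.log(clampByReinsert(SAMPLE_BODY, "high"));
// {"model":"claude-opus-4-5-20251101","output_config":{"example_field":1,"effort":"high"},"max_tokens":1024}
```

Only the first variant leaves the serialized key order identical to the inbound body, which is the property the entry calls a fingerprint surface.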

## [4.8.57] - 2026-06-10

- **feat: upstream model autodetection — `/v1/models` reflects what Anthropic actually serves.** New `src/model-catalog.ts` asks `api.anthropic.com/v1/models` for the live model set (authenticated the same way the request path is: OAuth bearer, or `x-api-key` in `ANTHROPIC_UPSTREAM_API_KEY` mode), normalizes it (claude-only, legacy claude-3-x generations dropped, CC-style short ids preferred over dated duplicates, deterministic family-rank/version-desc order, unknown future families kept and ranked last — a brand-new family appears on the next refresh without a dario release), and serves it from a stale-while-revalidate TTL cache (1h, `DARIO_MODEL_CATALOG_TTL_MS`; failed fetches back off 5min). Every failure path — cold start offline, auth broken, upstream 4xx/5xx, empty/garbage listing — falls back to the baked list, so the route always answers and behaves exactly as before when upstream is unreachable. Catalog is prewarmed at proxy startup so the first client call is served from cache.
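The caching behaviour described in the entry above (serve immediately, refresh in the background, back off after a failure, fall back to the baked list) can be sketched roughly as follows. This is not the contents of `src/model-catalog.ts`; the class name, constructor parameters, and model ids are hypothetical, and only the 1h TTL and 5min backoff defaults come from the changelog text.

```typescript
type Fetcher = () => Promise<string[]>;

// Hypothetical stale-while-revalidate cache; not dario's actual implementation.
class ModelCatalogCache {
  private models: string[] | null = null;
  private fetchedAt = 0;
  private retryAfter = 0;
  private refreshing = false;
  private readonly fetcher: Fetcher;
  private readonly baked: readonly string[];
  private readonly ttlMs: number;
  private readonly backoffMs: number;
  private readonly now: () => number;

  constructor(
    fetcher: Fetcher,
    baked: readonly string[],
    ttlMs: number = 60 * 60 * 1000, // 1h, per the changelog entry
    backoffMs: number = 5 * 60 * 1000, // 5min backoff after a failed fetch
    now: () => number = () => Date.now(),
  ) {
    this.fetcher = fetcher;
    this.baked = baked;
    this.ttlMs = ttlMs;
    this.backoffMs = backoffMs;
    this.now = now;
  }

  /** Always answers immediately: the cached list if any, else the baked fallback. */
  get(): readonly string[] {
    const t = this.now();
    const stale = this.models === null || t - this.fetchedAt >= this.ttlMs;
    if (stale && !this.refreshing && t >= this.retryAfter) void this.refresh();
    return this.models ?? this.baked;
  }

  /** One fetch attempt; any failure keeps the previous value and starts the backoff. */
  async refresh(): Promise<void> {
    this.refreshing = true;
    try {
      const list = await this.fetcher();
      if (!Array.isArray(list) || list.length === 0) throw new Error("empty or garbage listing");
      this.models = list;
      this.fetchedAt = this.now();
    } catch {
      this.retryAfter = this.now() + this.backoffMs;
    } finally {
      this.refreshing = false;
    }
  }
}

// Usage with placeholder ids: a cold cache answers from the baked list
// while the first refresh runs in the background.
const demoCache = new ModelCatalogCache(async () => ["example-live-model"], ["example-baked-model"]);
console.log(demoCache.get());
```

Calling `refresh()` once at startup and awaiting it would correspond to the prewarm step the entry mentions, so the first client call is served from cache.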
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "4.8.57",
+  "version": "4.8.58",
   "description": "Use your Claude Pro/Max subscription in any tool — Cursor, Cline, Aider, the Agent SDK, your scripts — at subscription pricing, not per-token API bills. One local Anthropic + OpenAI-compatible endpoint.",
   "type": "module",
   "bin": {
107 changes: 107 additions & 0 deletions src/proxy.ts
@@ -299,6 +299,39 @@ export function stripContext1mTag(model: string): string {
  return model.replace(/\[1m\]$/i, '');
}

/**
 * Parse upstream's effort-capability rejection:
 *
 *   400 {"type":"invalid_request_error","message":"This model does not
 *   support effort level 'max'. Supported levels: high, low, medium."}
 *
 * Observed live 2026-06-10 on `claude-opus-4-5-20251101` — the autodetected
 * catalog exposes models that predate the newer effort tiers, and a pinned
 * DARIO_EFFORT (the box pins `max`) hard-400s on them. Returns the rejected
 * level plus the model's supported set, or null when the body is some other
 * 400. NOTE: fable's effort intolerance is different in kind — a SOFT
 * refusal (200 + stop_reason:"refusal"), invisible to this machinery — and
 * stays handled by its measured clamp in resolveEffort.
 * Exported for tests.
 */
export function parseEffortRejection(body: string): { rejected: string; supported: string[] } | null {
  const m = body.match(/does not support effort level '([^']+)'\.?\s*Supported levels:\s*([a-z,\s]+)/i);
  if (!m) return null;
  const supported = m[2]!.split(',').map((s) => s.trim().toLowerCase()).filter((s) => s.length > 0);
  return supported.length > 0 ? { rejected: m[1]!, supported } : null;
}

/**
 * Pick the strongest effort level a model says it supports. Preference is
 * descending capability — the caller asked for more than the model can do,
 * so degrade as little as possible. Exported for tests.
 */
export const EFFORT_PREFERENCE: readonly string[] = ['xhigh', 'max', 'high', 'medium', 'low'];
export function bestSupportedEffort(supported: readonly string[]): string {
  for (const e of EFFORT_PREFERENCE) if (supported.includes(e)) return e;
  return supported[0] ?? 'high';
}

/**
* Resolve an inbound API path to its upstream target + forwarding mode.
* Allowlist semantics — anything unlisted is 403'd (prevents SSRF through
@@ -1154,6 +1187,12 @@ export async function startProxy(opts: ProxyOptions = {}): Promise<void> {
// re-pay the 400 round-trip. Keyed by account alias (pool) or `__default__`.
const unavailableBetas = new Map<string, Set<string>>();
const ACCOUNT_KEY_SINGLE = '__default__';
// Per-model effort capability cache — same pay-the-round-trip-once pattern
// as context1mUnavailable, but keyed by WIRE MODEL id: effort support is a
// model property, not an account property. Populated from upstream's
// "does not support effort level" 400 (see parseEffortRejection); consulted
// up front at body-build time so capped models never re-pay the rejection.
const effortSupportByModel = new Map<string, string[]>();

// Beta flag set — sourced from the live template when the capture recorded
// one (schema v2+), else falls back to the v2.1.104 bundled default. Same
@@ -1927,6 +1966,18 @@ export async function startProxy(opts: ProxyOptions = {}): Promise<void> {
// does on /v1/messages.
r.model = stripContext1mTag(r.model);
}
// Effort capability clamp — when a prior request taught us this
// model's supported effort set (autodetected catalogs expose
// models that predate newer tiers), rewrite output_config.effort
// up front instead of re-paying the 400 round-trip. In-place value
// mutation: field order (a fingerprint surface) is untouched.
if (typeof r.model === 'string') {
  const supportedEfforts = effortSupportByModel.get(r.model);
  const oc = r.output_config as { effort?: unknown } | undefined;
  if (supportedEfforts && oc && typeof oc.effort === 'string' && !supportedEfforts.includes(oc.effort)) {
    oc.effort = bestSupportedEffort(supportedEfforts);
  }
}
finalBody = Buffer.from(JSON.stringify(r));
} catch { /* not JSON, send as-is */ }
}
@@ -2197,6 +2248,62 @@ export async function startProxy(opts: ProxyOptions = {}): Promise<void> {
pool.updateRateLimits(poolAccount.alias, retrySnapshot);
}
}
} else if (upstream.status === 400 && parseEffortRejection(peekedBody) && finalBody) {
  // Effort-capability rejection — the model predates the requested
  // effort tier (e.g. opus-4-5 + a DARIO_EFFORT=max pin; surfaced by
  // the autodetected catalog). Clamp output_config.effort to the
  // strongest level the error says the model supports, retry once,
  // and cache the supported set per model so the up-front clamp
  // handles every later request without the round-trip.
  const rejection = parseEffortRejection(peekedBody)!;
  const clamped = bestSupportedEffort(rejection.supported);
  let retried = false;
  try {
    const rb = JSON.parse(finalBody.toString('utf8')) as Record<string, unknown>;
    const wireModel = typeof rb.model === 'string' ? rb.model : '';
    const oc = rb.output_config as { effort?: unknown } | undefined;
    if (wireModel && oc && typeof oc.effort === 'string') {
      const firstRejection = !effortSupportByModel.has(wireModel);
      effortSupportByModel.set(wireModel, rejection.supported);
      if (verbose && firstRejection) console.log(`[dario] #${requestCount} effort '${rejection.rejected}' rejected by ${wireModel} — retrying with '${clamped}' (supported set cached per model)`);
      oc.effort = clamped; // in-place value mutation — field order untouched
      finalBody = Buffer.from(JSON.stringify(rb));
      const retry = await fetch(targetBase, {
        method: req.method ?? 'POST',
        headers: passthrough ? headers : orderHeadersForOutbound(headers),
        body: new Uint8Array(finalBody),
        signal: upstreamAbort.signal,
      });
      upstream = retry;
      peekedBody = null;
      retried = true;
      if (pool && poolAccount) {
        const retrySnapshot = parseRateLimits(upstream.headers);
        if (upstream.status === 429) {
          pool.markRejected(poolAccount.alias, retrySnapshot);
        } else {
          pool.updateRateLimits(poolAccount.alias, retrySnapshot);
        }
      }
    }
  } catch { /* body not JSON — forward the original 400 below */ }
  if (!retried) {
    // Couldn't rebuild the body (no output_config.effort / not JSON)
    // — the upstream body is already consumed, so forward it here;
    // the chain's terminal 400 branch won't run for us.
    const responseHeaders: Record<string, string> = {
      'Content-Type': upstream.headers.get('content-type') ?? 'application/json',
      'Access-Control-Allow-Origin': corsOrigin,
      ...SECURITY_HEADERS,
    };
    for (const [key, value] of upstream.headers.entries()) {
      if (key === 'request-id') responseHeaders[key] = value;
    }
    requestCount++;
    res.writeHead(400, responseHeaders);
    res.end(peekedBody);
    return;
  }
} else if (isLongContextError) {
// Cache the rejection so future requests on this account skip
// context-1m up front instead of re-paying the 400/429 round-trip.
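To make the control flow of the effort-rejection branch easier to follow outside the proxy, here is a simplified, synchronous sketch. The two helpers are restated from this diff so the block runs on its own; `WireRequest`, `WireResponse`, `Send`, `sendWithEffortClamp`, and `fakeUpstream` are hypothetical stand-ins, and the real code uses `fetch`, Buffers, and pool bookkeeping that are omitted here.

```typescript
// Restated from the diff so this sketch is self-contained.
function parseEffortRejection(body: string): { rejected: string; supported: string[] } | null {
  const m = body.match(/does not support effort level '([^']+)'\.?\s*Supported levels:\s*([a-z,\s]+)/i);
  if (!m) return null;
  const supported = m[2]!.split(',').map((s) => s.trim().toLowerCase()).filter((s) => s.length > 0);
  return supported.length > 0 ? { rejected: m[1]!, supported } : null;
}
const EFFORT_PREFERENCE: readonly string[] = ['xhigh', 'max', 'high', 'medium', 'low'];
function bestSupportedEffort(supported: readonly string[]): string {
  for (const e of EFFORT_PREFERENCE) if (supported.includes(e)) return e;
  return supported[0] ?? 'high';
}

// Hypothetical stand-ins for the real request, response, and upstream call.
interface WireRequest { model: string; output_config?: { effort?: string } }
interface WireResponse { status: number; body: string }
type Send = (req: WireRequest) => WireResponse;

const effortSupportByModel = new Map<string, string[]>();

function sendWithEffortClamp(req: WireRequest, send: Send): WireResponse {
  const oc = req.output_config;
  // Up-front clamp: a previously cached rejection avoids the 400 round-trip.
  const known = effortSupportByModel.get(req.model);
  if (known && oc && typeof oc.effort === 'string' && !known.includes(oc.effort)) {
    oc.effort = bestSupportedEffort(known);
  }
  const first = send(req);
  if (first.status !== 400) return first;
  const rejection = parseEffortRejection(first.body);
  // Some other 400, or nothing to clamp: forward the original error unchanged.
  if (!rejection || !oc || typeof oc.effort !== 'string') return first;
  effortSupportByModel.set(req.model, rejection.supported); // cached per wire model
  oc.effort = bestSupportedEffort(rejection.supported);     // in-place value mutation
  return send(req);                                         // exactly one retry
}

// Fake upstream that only accepts high/low/medium and logs each effort it sees.
function fakeUpstream(log: string[]): Send {
  return (req) => {
    const effort = req.output_config?.effort ?? 'none';
    log.push(effort);
    if (['high', 'low', 'medium', 'none'].includes(effort)) return { status: 200, body: '{}' };
    const message = `This model does not support effort level '${effort}'. Supported levels: high, low, medium.`;
    return { status: 400, body: JSON.stringify({ type: 'error', error: { type: 'invalid_request_error', message } }) };
  };
}
```

With this fake upstream, a first request pinned to `max` is sent twice (`max`, then `high`), while a second request to the same model is clamped before sending and costs a single call.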
54 changes: 54 additions & 0 deletions test/effort-capability.mjs
@@ -0,0 +1,54 @@
#!/usr/bin/env node
// Effort-capability rejection parsing + clamp choice.
//
// The autodetected model catalog (v4.8.57) exposes models that predate the
// newer effort tiers; with a pinned DARIO_EFFORT (the prod box pins `max`)
// they hard-400: "This model does not support effort level 'max'.
// Supported levels: high, low, medium." (observed live 2026-06-10 on
// claude-opus-4-5-20251101). dario now parses that rejection, retries with
// the strongest supported level, and caches the supported set per model.
// NOTE: fable's effort intolerance is a SOFT refusal (200 + refusal stop)
// and stays handled by its measured resolveEffort clamp — different layer.

import assert from 'node:assert';
import { parseEffortRejection, bestSupportedEffort, EFFORT_PREFERENCE } from '../dist/proxy.js';

let passed = 0;
function check(name, cond) {
  assert.ok(cond, name);
  passed++;
}

// --- parseEffortRejection — the live-observed wire shape ---
const live = JSON.stringify({
  type: 'error',
  error: {
    type: 'invalid_request_error',
    message: "This model does not support effort level 'max'. Supported levels: high, low, medium.",
  },
  request_id: 'req_011CbvJwPBuypezxSTiFphUU',
});
const r = parseEffortRejection(live);
check('live shape parses', r !== null);
check('rejected level extracted', r.rejected === 'max');
check('supported set extracted', JSON.stringify(r.supported) === JSON.stringify(['high', 'low', 'medium']));

const xhigh = parseEffortRejection("does not support effort level 'xhigh'. Supported levels: high");
check('single supported level parses', xhigh.rejected === 'xhigh' && xhigh.supported.length === 1 && xhigh.supported[0] === 'high');

check('case-insensitive match', parseEffortRejection("DOES NOT SUPPORT EFFORT LEVEL 'MAX'. SUPPORTED LEVELS: HIGH, MEDIUM") !== null);
check('unrelated 400 → null', parseEffortRejection('{"error":{"message":"long context beta is not yet available"}}') === null);
check('empty body → null', parseEffortRejection('') === null);
check('beta rejection → null', parseEffortRejection('Unexpected value(s) `afk-mode-2026-01-31` for the `anthropic-beta` header') === null);

// --- bestSupportedEffort — degrade as little as possible ---
check('max rejected, high/low/medium supported → high', bestSupportedEffort(['high', 'low', 'medium']) === 'high');
check('xhigh preferred when present', bestSupportedEffort(['medium', 'xhigh', 'low']) === 'xhigh');
check('max preferred over high', bestSupportedEffort(['high', 'max']) === 'max');
check('single option', bestSupportedEffort(['low']) === 'low');
check('unknown-only set falls back to first entry', bestSupportedEffort(['turbo']) === 'turbo');
check('empty set falls back to high', bestSupportedEffort([]) === 'high');
check('preference order is descending capability',
  JSON.stringify(EFFORT_PREFERENCE) === JSON.stringify(['xhigh', 'max', 'high', 'medium', 'low']));

console.log(`✅ effort-capability: ${passed} assertions passed`);