fix(llm-trace): stash litellm tool-call usage so token cost survives arg-parse failures by r266-tech · Pull Request #2444 · vectorize-io/hindsight

r266-tech · 2026-06-28T08:54:23Z

Completes the tool-calling path for #2396 (closes #2387 — keep the provider-billed token cost on error traces).

#2396 made LiteLLMLLM.call() stash the response usage right after the billed completion so that a local failure (e.g. a length check or JSON parse) doesn't drop the tokens the provider already charged for. call_with_tools() was left without that stash:

1. The response is awaited and billed at the top of the try.
2. Tool-call arguments are then parsed with an unguarded arguments = json.loads(arguments) (litellm_llm.py:423).
3. A malformed arguments string raises json.JSONDecodeError after billing. The broad except Exception classifies it as non-retryable and re-raises, and the wrapper error path reads current_response_usage(), gets None, and records input_tokens and output_tokens as 0.

That is exactly the accounting loss #2387 set out to eliminate, just on the tool-calling route — and LiteLLMRouterLLM inherits it (it overrides _acompletion/_build_common_kwargs/_stage_label/_resolve_completion_model, not call_with_tools).
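The loss can be reproduced in a few lines. This is a minimal sketch rather than the project's code: traced, call_with_tools_unstashed, and the dict-shaped response are hypothetical stand-ins, and current_response_usage here is a simplified reimplementation that only borrows the name used above.

```python
import contextvars
import json

# Hypothetical stand-in for the trace layer's per-call usage slot.
_usage = contextvars.ContextVar("usage", default=None)


def current_response_usage():
    """Return whatever usage was stashed for this call, or None."""
    return _usage.get()


def call_with_tools_unstashed(response):
    # `response` has already been billed by the provider at this point,
    # but nothing records its usage before the parse below.
    arguments = response["tool_calls"][0]["arguments"]
    return json.loads(arguments)  # raises on malformed JSON


def traced(fn, response):
    """Toy wrapper: on error, build the trace from the stashed usage."""
    try:
        return {"ok": fn(response)}
    except Exception as exc:
        usage = current_response_usage() or {}
        return {
            "error": type(exc).__name__,
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
        }


billed = {
    "usage": {"input_tokens": 120, "output_tokens": 35},
    "tool_calls": [{"arguments": '{"query": "unterminated'}],
}
trace = traced(call_with_tools_unstashed, billed)
print(trace)  # the billed 120/35 tokens are reported as 0/0
```

The provider reported 120 input and 35 output tokens, but the error trace shows zeros because the parse failed before any usage was recorded.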

This adds the same stash_response_usage(_usage_from_litellm_response(response)) immediately after the billed response, mirroring call() and the anthropic/gemini call_with_tools paths. On the success path the usage is recomputed downstream as before; on the error path the provider cost now survives so the total-vs-effective-vs-failed token split #2387 asked for stays accurate.
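The ordering of the fix can be sketched as follows. This is an illustration under stated assumptions, not the merged diff: the helper bodies are simplified reimplementations that reuse the names from the description, usage_from_response is a hypothetical stand-in for _usage_from_litellm_response, and the OpenAI-style prompt_tokens/completion_tokens fields on a plain dict are assumed for the example.

```python
import contextvars
import json

# Hypothetical per-call usage slot; the real storage may differ.
_usage = contextvars.ContextVar("usage", default=None)


def stash_response_usage(usage):
    _usage.set(usage)


def current_response_usage():
    return _usage.get()


def usage_from_response(response):
    # Simplified stand-in for _usage_from_litellm_response.
    raw = response["usage"]
    return {
        "input_tokens": raw["prompt_tokens"],
        "output_tokens": raw["completion_tokens"],
    }


def call_with_tools(response):
    # Stash first: from here on, any local failure still leaves the
    # provider-billed usage available to the error path.
    stash_response_usage(usage_from_response(response))
    return [
        {"name": tc["name"], "arguments": json.loads(tc["arguments"])}
        for tc in response["tool_calls"]
    ]


billed = {
    "usage": {"prompt_tokens": 120, "completion_tokens": 35},
    "tool_calls": [{"name": "search", "arguments": '{"query": "unterminated'}],
}

surviving = None
try:
    call_with_tools(billed)
except json.JSONDecodeError:
    surviving = current_response_usage()

print(surviving)  # {'input_tokens': 120, 'output_tokens': 35}
```

Stashing before the parse costs one extra assignment on the success path, where the usage is recomputed anyway, and it is the only point at which the error path can still learn what was billed.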

…arg-parse failures (completes vectorize-io#2396)

Add a real-provider regression test for the fix in this PR: the existing wrapper-level tools test uses a provider that already stashes, so it does not guard LiteLLMLLM.call_with_tools. This drives the real provider with a billed response whose tool arguments are malformed JSON and asserts the error trace keeps the provider-reported tokens (input/output/cached). The LiteLLMRouterLLM subclass inherits call_with_tools, so it is covered too. Verified it fails (input_tokens=None) when the stash line is removed.
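The shape of such a regression test might look like the sketch below. It does not reproduce the test in this PR: ToyProvider, run_traced, and the canned response are hypothetical, and the stash flag exists only to show that the assertion fails when the stash step is removed.

```python
import asyncio
import contextvars
import json

# Hypothetical per-call usage slot read by the error path.
_usage = contextvars.ContextVar("usage", default=None)


class ToyProvider:
    """Hypothetical stand-in for the provider under test."""

    def __init__(self, canned_response, stash=True):
        self._canned = canned_response
        self._stash = stash

    async def _acompletion(self):
        # Pretend this is the billed provider round trip.
        return self._canned

    async def call_with_tools(self):
        response = await self._acompletion()
        if self._stash:
            _usage.set(dict(response["usage"]))
        return [json.loads(tc["arguments"]) for tc in response["tool_calls"]]


async def run_traced(provider):
    """Return the usage an error trace would see after a parse failure."""
    _usage.set(None)
    try:
        await provider.call_with_tools()
    except json.JSONDecodeError:
        return _usage.get()
    raise AssertionError("expected malformed arguments to fail")


def test_error_trace_keeps_provider_tokens():
    response = {
        "usage": {"input_tokens": 120, "output_tokens": 35, "cached_tokens": 64},
        "tool_calls": [{"arguments": "{not json"}],
    }
    # With the stash, input/output/cached tokens survive the failure.
    assert asyncio.run(run_traced(ToyProvider(response))) == response["usage"]
    # Without it, the error path has nothing to report.
    assert asyncio.run(run_traced(ToyProvider(response, stash=False))) is None


test_error_trace_keeps_provider_tokens()
```

Running the same scenario with and without the stash mirrors the manual check described above, where removing the stash line makes the test fail.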

r266-tech and others added 2 commits June 28, 2026 16:54

fix(llm-trace): stash litellm tool-call usage so token cost survives …

87fcf3c

…arg-parse failures (completes vectorize-io#2396)

nicoloboschi merged commit b7080a1 into vectorize-io:main Jun 30, 2026
177 of 188 checks passed

nicoloboschi merged 2 commits into vectorize-io:main from r266-tech:fix/litellm-tool-call-usage-stash

2 participants
