fix(llm-trace): stash litellm tool-call usage so token cost survives arg-parse failures #2444
Merged
nicoloboschi merged 2 commits on Jun 30, 2026
…arg-parse failures (completes vectorize-io#2396)
Add a real-provider regression test for the fix in this PR: the existing wrapper-level tools test uses a provider that already stashes, so it does not guard LiteLLMLLM.call_with_tools. This drives the real provider with a billed response whose tool arguments are malformed JSON and asserts the error trace keeps the provider-reported tokens (input/output/cached). The LiteLLMRouterLLM subclass inherits call_with_tools, so it is covered too. Verified it fails (input_tokens=None) when the stash line is removed.
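The shape of that regression test can be sketched without the real provider. This is a minimal illustration, not the test from the PR: `FakeLLM`, `FakeRouterLLM`, `error_trace`, the `_usage` context variable, and the token numbers are all hypothetical stand-ins, and the real stash helpers may be implemented differently.

```python
import asyncio
import contextvars
import json

# Hypothetical stand-in for the per-call usage stash.
_usage = contextvars.ContextVar("usage", default=None)


class FakeLLM:
    """Hypothetical stand-in for LiteLLMLLM."""

    async def _acompletion(self):
        # A "billed" response whose tool arguments are malformed JSON.
        return {
            "usage": {"input": 100, "output": 20, "cached": 40},
            "arguments": "{not json",
        }

    async def call_with_tools(self):
        response = await self._acompletion()
        _usage.set(response["usage"])  # the stash line under test
        return json.loads(response["arguments"])  # raises JSONDecodeError


class FakeRouterLLM(FakeLLM):
    """Overrides only _acompletion, so call_with_tools is inherited."""

    async def _acompletion(self):
        return {
            "usage": {"input": 7, "output": 3, "cached": 0},
            "arguments": "{not json",
        }


async def error_trace(llm):
    """Mimic the wrapper error path: read the stash after a failure."""
    _usage.set(None)
    try:
        await llm.call_with_tools()
    except json.JSONDecodeError:
        return _usage.get()
    return None


def run_regression():
    base = asyncio.run(error_trace(FakeLLM()))
    router = asyncio.run(error_trace(FakeRouterLLM()))
    return base, router
```

Deleting the `_usage.set(response["usage"])` line makes `error_trace` return `None` in both cases, which mirrors the failure mode the commit message describes.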
Completes the tool-calling path for #2396 (closes #2387 — keep the provider-billed token cost on error traces).
#2396 made LiteLLMLLM.call() stash the response usage right after the billed completion so that a local failure (e.g. a length check or JSON parse) doesn't drop the tokens the provider already charged for. call_with_tools() was left without that stash: inside the try, at arguments = json.loads(arguments) (litellm_llm.py:423), a malformed arguments string raises json.JSONDecodeError after billing; the broad except Exception classifies it non-retryable and re-raises, and the wrapper error path reads current_response_usage() → None → records input/output_tokens = 0/0.
That is exactly the accounting loss #2387 set out to eliminate, just on the tool-calling route, and LiteLLMRouterLLM inherits it (it overrides _acompletion / _build_common_kwargs / _stage_label / _resolve_completion_model, not call_with_tools).
This adds the same stash_response_usage(_usage_from_litellm_response(response)) immediately after the billed response, mirroring call() and the anthropic / gemini call_with_tools paths. On the success path the usage is recomputed downstream as before; on the error path the provider cost now survives, so the total-vs-effective-vs-failed token split #2387 asked for stays accurate.
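The ordering matters more than the helper itself: the stash has to happen before any local parsing that can raise. Below is a minimal sketch of that pattern, assuming a context-variable-backed stash. Only the names stash_response_usage and current_response_usage come from this PR; their bodies here, the `Usage` dataclass, the `stash` toggle, and `traced_call` are hypothetical simplifications, and the success branch reads the stash directly where the real code recomputes usage downstream.

```python
import contextvars
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


# Hypothetical backing store; the real helpers may work differently.
_usage_var: contextvars.ContextVar[Optional[Usage]] = contextvars.ContextVar(
    "response_usage", default=None
)


def stash_response_usage(usage: Optional[Usage]) -> None:
    _usage_var.set(usage)


def current_response_usage() -> Optional[Usage]:
    return _usage_var.get()


def call_with_tools(response: dict, stash: bool = True) -> dict:
    """`response` stands in for an already-billed provider response."""
    if stash:
        # Stash first, so the cost survives any failure below.
        stash_response_usage(Usage(**response["usage"]))
    return json.loads(response["arguments"])  # may raise JSONDecodeError


def traced_call(response: dict, stash: bool = True) -> dict:
    """Hypothetical wrapper: records tokens on success and on error."""
    stash_response_usage(None)
    try:
        call_with_tools(response, stash)
        status = "ok"
    except Exception:
        status = "error"
    usage = current_response_usage()
    return {
        "status": status,
        "input_tokens": usage.input_tokens if usage else 0,
        "output_tokens": usage.output_tokens if usage else 0,
    }


billed = {"usage": {"input_tokens": 120, "output_tokens": 8}, "arguments": '{"city": '}
with_stash = traced_call(billed)
without_stash = traced_call(billed, stash=False)
```

With the stash in place the error trace keeps the 120/8 tokens from the billed response; with `stash=False` the same failure is recorded as 0/0.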