@Templight41

Problem:
The ADK dev UI lacks visibility into token usage and associated costs when interacting with Gemini models. Users cannot track how many tokens are being consumed or estimate the cost of their API calls, making it difficult to monitor usage and optimize prompts.

Solution:
Implemented a comprehensive token usage and cost tracking feature that:

  • Fetches live pricing from Google Cloud's Vertex AI generative AI pricing page on first use, then caches it for the rest of the session
  • Adds a cost_usd field to LlmResponse and Event models that gets populated during streaming
  • Displays token usage (input/output tokens) and estimated USD cost in the dev UI
  • Provides a clean, theme-matched button UI that shows cost and token count
  • Includes a detailed popover breakdown showing input tokens, output tokens, and cost
  • Intercepts SSE streams to track cumulative usage across the session
  • Supports all Gemini models with automatic model detection and pricing lookup
  • Falls back to hardcoded defaults (accurate as of December 2025) only if the pricing fetch fails
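For illustration only: the session-level tracking described above was implemented in JavaScript (later moved to adk-web), and that code is not shown here. The Python sketch below shows the same idea under our own assumptions: each SSE event arrives as a single "data:" line of JSON shaped like the example payload in the test cases, and totals are summed per event. UsageTotals and accumulate_sse are hypothetical names, not part of ADK.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class UsageTotals:
    """Hypothetical running totals for one dev UI session."""
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    event_count: int = 0


def accumulate_sse(lines: Iterable[str],
                   totals: Optional[UsageTotals] = None) -> UsageTotals:
    """Fold SSE data lines into running totals.

    Assumes one JSON event per "data:" line. Events without
    usageMetadata or costUsd are counted but add nothing.
    """
    if totals is None:
        totals = UsageTotals()
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments (": ..."), blank lines, other fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        usage = event.get("usageMetadata") or {}
        totals.input_tokens += usage.get("promptTokenCount", 0)
        totals.output_tokens += usage.get("candidatesTokenCount", 0)
        totals.cost_usd += event.get("costUsd") or 0.0
        totals.event_count += 1
    return totals


# Usage: one event shaped like the payload in the test cases.
t = accumulate_sse([
    'data: {"usageMetadata": {"promptTokenCount": 35, '
    '"candidatesTokenCount": 9}, "costUsd": 0.000033}',
])
print(t.input_tokens, t.output_tokens, t.event_count)  # 35 9 1
```

The sketch sums every event that carries usage. If a stream repeated usage on both partial and final events, a real tracker would need to deduplicate, which this sketch does not attempt.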

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Summary of pytest results:

tests/unittests/utils/test_gemini_pricing.py::TestModelPricing::test_calculate_cost_low_tier PASSED
tests/unittests/utils/test_gemini_pricing.py::TestModelPricing::test_calculate_cost_high_tier PASSED
tests/unittests/utils/test_gemini_pricing.py::TestModelPricing::test_calculate_cost_with_cache PASSED
tests/unittests/utils/test_gemini_pricing.py::TestModelPricing::test_calculate_cost_flash_model PASSED
tests/unittests/utils/test_gemini_pricing.py::TestGeminiPricingService::test_get_pricing_exact_match PASSED
tests/unittests/utils/test_gemini_pricing.py::TestGeminiPricingService::test_get_pricing_fuzzy_match PASSED
tests/unittests/utils/test_gemini_pricing.py::TestGeminiPricingService::test_get_pricing_with_prefix PASSED
tests/unittests/utils/test_gemini_pricing.py::TestGeminiPricingService::test_get_pricing_unknown_model PASSED
tests/unittests/utils/test_gemini_pricing.py::TestCalculateTokenCost::test_calculate_token_cost_gemini_25_pro PASSED
tests/unittests/utils/test_gemini_pricing.py::TestCalculateTokenCost::test_calculate_token_cost_gemini_25_flash PASSED
tests/unittests/utils/test_gemini_pricing.py::TestCalculateTokenCost::test_calculate_token_cost_with_cache PASSED
tests/unittests/utils/test_gemini_pricing.py::TestCalculateTokenCost::test_calculate_token_cost_unknown_model PASSED

12 passed in 1.39s

Manual End-to-End (E2E) Tests:

Setup:

  1. Activated development venv: source .venv/bin/activate
  2. Started dev UI: adk web test
  3. Opened browser at http://localhost:8000

Test Cases:

  1. Live pricing fetch on first use:

    • Server logs on startup show:
    INFO - Fetching latest Gemini pricing from https://cloud.google.com/vertex-ai/generative-ai/pricing
    INFO - Successfully fetched pricing for 5 models from API
    
    • Subsequent requests use cached pricing (no additional fetches)
  2. Token cost calculation in SSE events:

    • Sent message to agent: "Hello! How can I help you today?"
    • Verified costUsd field appears in SSE events
    • Example event payload:
    {
      "content": {...},
      "usageMetadata": {
        "candidatesTokenCount": 9,
        "promptTokenCount": 35,
        "totalTokenCount": 44
      },
      "costUsd": 0.000033,
      "author": "root_agent"
    }
  3. UI Display:

    • Token usage button appears with format: $0.00 | 0 tokens
    • Button updates in real-time as tokens are consumed
    • Clicking button opens popover showing:
      • Input tokens
      • Output tokens
      • Total cost
      • Event count
    • Reset button clears all accumulated usage
  4. Server-side logging:

    DEBUG - Calculating token cost: model=gemini-2.5-flash, prompt=35, output=9, cached=0
    DEBUG - Token cost calculated: $0.000033
    
  5. Fallback behavior:

    • Tested with network disabled: falls back to hardcoded defaults
    • No errors or crashes when pricing fetch fails
    • Warning logged: "Failed to fetch Gemini pricing: ..., using hardcoded defaults"
  6. Multiple model support:

    • Tested with gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash
    • Verified pricing lookup works for all Gemini model variants
    • Fuzzy matching works (e.g., "gemini-2.5-flash-001" matches "gemini-2.5-flash")
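As a sanity check on the example payload: assuming list prices of $0.30 per 1M input tokens and $2.50 per 1M output tokens for gemini-2.5-flash (our assumption; the PR's hardcoded table is not shown here), 35 prompt tokens and 9 output tokens cost 35 × 0.30 / 10^6 + 9 × 2.50 / 10^6 = $0.0000105 + $0.0000225 = $0.000033, which matches costUsd. The sketch below generalizes that arithmetic to the tiered and cached-token pricing described under Pricing Architecture and adds a prefix-based model lookup similar to the fuzzy matching above. ModelPrice, calculate_cost, lookup, and the exact tier-boundary handling are hypothetical and are not the actual gemini_pricing.py API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TIER_THRESHOLD = 200_000  # prompt tokens; above this the high tier applies


@dataclass(frozen=True)
class ModelPrice:
    """USD per 1M tokens; *_high fields are the high-tier rates, if any."""
    input: float
    output: float
    cached_input: float
    input_high: Optional[float] = None
    output_high: Optional[float] = None
    cached_input_high: Optional[float] = None


def calculate_cost(price: ModelPrice, prompt_tokens: int,
                   output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one response.

    Cached tokens are treated as a subset of prompt_tokens and billed at
    the cached rate instead of the normal input rate.
    """
    high = prompt_tokens > TIER_THRESHOLD

    def rate(low: float, high_rate: Optional[float]) -> float:
        # Use the high-tier rate only when the tier applies and one exists.
        return high_rate if high and high_rate is not None else low

    uncached = max(prompt_tokens - cached_tokens, 0)
    total = (uncached * rate(price.input, price.input_high)
             + cached_tokens * rate(price.cached_input, price.cached_input_high)
             + output_tokens * rate(price.output, price.output_high))
    return total / 1_000_000


def lookup(prices: Dict[str, ModelPrice], model: str) -> Optional[ModelPrice]:
    """Exact match first, then the longest known name that prefixes model."""
    name = model.removeprefix("models/")
    if name in prices:
        return prices[name]
    matches = [known for known in prices if name.startswith(known)]
    return prices[max(matches, key=len)] if matches else None


PRICES = {
    # Assumed list prices; cached_input is a placeholder, not a verified
    # figure. Check the official pricing page before relying on these.
    "gemini-2.5-flash": ModelPrice(input=0.30, output=2.50, cached_input=0.03),
}

price = lookup(PRICES, "gemini-2.5-flash-001")
print(f"{calculate_cost(price, 35, 9):.6f}")  # 0.000033
```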

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Implementation Details:

  1. Backend (Python):

    • Created src/google/adk/utils/gemini_pricing.py with live pricing fetch and caching
    • Modified base_llm_flow.py to calculate costs in _finalize_model_response_event()
    • Added cost_usd field to LlmResponse model with Pydantic alias costUsd for JSON serialization
    • Costs are calculated based on actual token counts from usage_metadata
    • Pricing is fetched from the Google Cloud pricing page on the first request, then cached for the rest of the session
  2. Frontend (JavaScript):

    • Created src/google/adk/cli/browser/token-usage-display.js injected into dev UI
    • Intercepts SSE /run_sse responses to extract cost data
    • Displays cumulative session usage with button + popover UI
    • Matches ADK dev UI theme with consistent styling
  3. Pricing Architecture:

    • First request: Fetches live pricing from https://cloud.google.com/vertex-ai/generative-ai/pricing
    • Subsequent requests: Uses cached pricing (no repeated API calls)
    • Fallback: Uses hardcoded defaults (December 2025) only if fetch fails
    • Supports tiered pricing (low tier: prompts up to 200K tokens; high tier: prompts over 200K tokens)
    • Handles cached token pricing separately
    • Validates parsed pricing to reject invalid data
  4. Testing Strategy:

    • Added enable_fetch parameter to GeminiPricingService for test control
    • Tests use enable_fetch=False to avoid network calls and use hardcoded defaults
    • All tests pass consistently without external dependencies
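The fetch-once-then-cache flow, with a hardcoded fallback and a switch that keeps tests offline, can be sketched as follows. Only the enable_fetch parameter name comes from the PR; PricingCache, fetch_fn, Prices, and DEFAULT_PRICES are hypothetical, and the real GeminiPricingService may be structured differently. Injecting the fetch function keeps the sketch testable without network access, the same goal the PR achieves with enable_fetch=False.

```python
import logging
import threading
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger(__name__)

Prices = Dict[str, Tuple[float, float]]  # model -> (input, output) USD per 1M

# Hypothetical known-good defaults; the PR's real table is not shown here.
DEFAULT_PRICES: Prices = {
    "gemini-2.5-flash": (0.30, 2.50),
}


class PricingCache:
    """Resolve pricing once per process: fetched if possible, else defaults."""

    def __init__(self, fetch_fn: Optional[Callable[[], Prices]] = None,
                 enable_fetch: bool = True) -> None:
        self._fetch_fn = fetch_fn
        self._enable_fetch = enable_fetch  # False in tests: never touch the network
        self._prices: Optional[Prices] = None
        self._lock = threading.Lock()

    def prices(self) -> Prices:
        if self._prices is None:
            with self._lock:
                if self._prices is None:  # re-check after acquiring the lock
                    self._prices = self._load()
        return self._prices

    def _load(self) -> Prices:
        if not self._enable_fetch or self._fetch_fn is None:
            return dict(DEFAULT_PRICES)
        try:
            fetched = self._fetch_fn()
            if not fetched:
                raise ValueError("no models parsed")
            logger.info("Fetched pricing for %d models", len(fetched))
            return fetched
        except Exception as exc:  # any failure degrades to defaults
            logger.warning(
                "Failed to fetch Gemini pricing: %s, using hardcoded defaults", exc)
            return dict(DEFAULT_PRICES)
```

In this sketch a failed fetch is not retried: the defaults are cached exactly like fetched prices, consistent with the one-time-fetch design. Whether the PR behaves the same way after a failure is not stated.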

Files Changed:

  • src/google/adk/utils/gemini_pricing.py (NEW - 400+ lines)
  • src/google/adk/cli/browser/token-usage-display.js (NEW - 465 lines)
  • tests/unittests/utils/test_gemini_pricing.py (NEW - 12 comprehensive tests)
  • src/google/adk/models/llm_response.py (added cost_usd field)
  • src/google/adk/flows/llm_flows/base_llm_flow.py (added cost calculation logic)
  • src/google/adk/cli/adk_web_server.py (added script injection endpoint)

Design Decisions:

  1. Why one-time fetch vs. periodic refresh?

    • Google rarely changes pricing (historically ~1-2 times per year)
    • Reduces network overhead and latency
    • Simpler implementation with fewer edge cases
    • Still provides live pricing on first use
  2. Why HTML parsing instead of official API?

    • No official Gemini pricing API exists
    • HTML parsing is best-effort, with a robust fallback
    • Validates parsed data to prevent garbage values
    • Always falls back to known-good defaults
  3. Why inject JavaScript instead of modifying Angular app?

    • The dev UI frontend ships here as pre-compiled Angular, with no Angular source in this repository
    • JavaScript injection is non-invasive and doesn't require rebuilding UI
    • Easy to maintain and update independently
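The PR does not spell out its validation rules. One plausible shape for "validate parsed data, otherwise fall back" is sketched below. The specific checks (model names starting with "gemini-", finite positive rates under an arbitrary ceiling, output priced at or above input as a guard against swapped columns) are our own assumptions, and validate_prices, MIN_RATE, MAX_RATE, and MIN_MODELS are hypothetical names. Raising when nothing survives lets the caller fall back to the hardcoded defaults.

```python
import math
from typing import Dict, Mapping, Tuple

# Hypothetical sanity bounds in USD per 1M tokens.
MIN_RATE = 0.0
MAX_RATE = 1_000.0
MIN_MODELS = 1


def validate_prices(
        parsed: Mapping[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Keep only plausible (input, output) entries; raise if none remain."""
    valid: Dict[str, Tuple[float, float]] = {}
    for model, rates in parsed.items():
        if not model.startswith("gemini-") or len(rates) != 2:
            continue  # not a Gemini row, or the wrong number of columns
        if not all(isinstance(r, (int, float)) and math.isfinite(r)
                   for r in rates):
            continue  # NaN, inf, or a non-numeric value slipped through
        inp, out = rates
        if not (MIN_RATE < inp <= MAX_RATE and MIN_RATE < out <= MAX_RATE):
            continue  # zero, negative, or implausibly large
        if out < inp:
            continue  # assumed symptom of swapped input/output columns
        valid[model] = (float(inp), float(out))
    if len(valid) < MIN_MODELS:
        raise ValueError("parsed pricing failed validation")
    return valid
```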

@gemini-code-assist
Contributor

Summary of Changes

Hello @Templight41, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature to the ADK development UI that provides real-time visibility into Gemini model token usage and associated costs. Previously, users lacked the ability to monitor token consumption and estimate API call expenses, hindering usage optimization. The solution integrates both backend logic for calculating costs based on live or cached pricing data and a frontend display that shows cumulative token counts and estimated USD costs directly within the UI, enhancing user awareness and control over their LLM interactions.

Highlights

  • Token Usage and Cost Tracking: Implemented a comprehensive feature to track and display token usage (input/output) and estimated USD cost for Gemini models directly within the ADK dev UI.
  • Live Pricing & Caching: The system fetches live Gemini model pricing from Google Cloud's pricing page on first use and caches it for the rest of the session. It falls back to hardcoded defaults if the fetch fails.
  • Dev UI Integration: A new theme-matched button and detailed popover breakdown are integrated into the ADK dev UI, showing cumulative token counts and estimated costs, enhancing user visibility.
  • Backend Cost Calculation: A cost_usd field has been added to LlmResponse and Event models, which is populated during streaming by calculating costs based on actual token counts and the dynamically fetched or cached pricing information.
  • SSE Stream Interception: Frontend JavaScript intercepts Server-Sent Events (SSE) to track cumulative usage across the session, providing real-time updates without requiring modifications to the core Angular application.
  • Comprehensive Testing: New unit tests for the pricing logic and detailed manual end-to-end tests ensure the feature's reliability, covering live pricing fetch, cost calculation, UI display, fallback behavior, and multiple model support.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@adk-bot adk-bot added the web [Component] This issue will be transferred to adk-web label Dec 17, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for displaying token usage and cost in the developer UI. The implementation is well-structured, with a clear separation between the backend pricing logic and the frontend display script. The use of live pricing with a fallback to hardcoded defaults is a robust design choice. My review includes suggestions to improve the maintainability of the JavaScript code, enhance the robustness of the HTML script injection, and fix a potential bug in the pricing page parser. Overall, this is a great addition to the project.

@seanzhou1023
Collaborator

Thanks for the PR. Could you please open a PR against https://github.com/google/adk-web for the UI-related changes?

@Templight41
Author

Sure, I'll work on it.

Templight41 and others added 2 commits December 17, 2025 12:18
Removed frontend-specific code as per maintainer feedback:
- Deleted src/google/adk/cli/browser/token-usage-display.js
- Removed JavaScript injection endpoint from adk_web_server.py

Backend API remains intact:
- Token cost calculation in base_llm_flow.py
- cost_usd field in LlmResponse model
- Gemini pricing service with live API fetching
- All unit tests passing (12/12)

Frontend implementation will be done in the separate adk-web repository.
@Templight41
Author

@seanzhou1023 I've updated this PR to remove the frontend code, which I will continue in adk-web. Please take a look.

Templight41 and others added 2 commits December 17, 2025 12:28
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Templight41
Author

@seanzhou1023 I created google/adk-web#323 for the frontend change.

@Templight41 Templight41 changed the title Feat/token usage display Feature: token usage display Dec 17, 2025
@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Dec 17, 2025
@ryanaiagent ryanaiagent self-assigned this Dec 17, 2025
@ryanaiagent ryanaiagent added the request clarification [Status] The maintainer need clarification or more information from the author label Dec 17, 2025
@ryanaiagent
Collaborator

Hi @Templight41, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Could you clarify a couple of questions?
I'm concerned about the long-term maintenance of scraping the pricing page directly. What happens when the Google Cloud team updates the page layout or CSS classes?
How does this implementation handle localization or different currency formats?

@Templight41
Author

The scraper does not rely on CSS classes, so class renames would not affect it. I agree, though, that a change to the page layout could break it, since the parser looks for the table that lists the models; in that case it falls back to the hardcoded defaults. I was unable to find any API for fetching model pricing, so web scraping was the last resort.

For currency, USD seems appropriate, since Google publishes its pricing in USD only. Showing local currencies would require converting at live exchange rates.
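To make the layout dependency concrete, here is a minimal sketch, not the PR's parser, of a scraper built on Python's standard html.parser that uses only table structure and cell text, so CSS class renames do not affect it. It assumes the model name sits in the first cell of a row and that the first two dollar amounts in that row are the input and output prices per 1M tokens. If the page reorders columns or splits tiers into separate rows, the result silently changes, which is exactly the maintenance risk raised in the review. TableRows, parse_pricing, and PRICE_RE are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# "$0.30", "$ 10.00" -> captures the numeric part.
PRICE_RE = re.compile(r"\$\s*([0-9]+(?:\.[0-9]+)?)")


class TableRows(HTMLParser):
    """Collect cell text grouped by <tr>, ignoring classes and ids."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _flush_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self) -> None:
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._flush_row()  # in case the document ended mid-row


def parse_pricing(html: str) -> Dict[str, Tuple[float, float]]:
    """Map normalized model name -> (input, output) USD per 1M tokens."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    prices: Dict[str, Tuple[float, float]] = {}
    for row in parser.rows:
        if not row or not row[0].lower().startswith("gemini"):
            continue
        amounts = [float(m) for cell in row[1:] for m in PRICE_RE.findall(cell)]
        if len(amounts) >= 2:
            name = "-".join(row[0].lower().split())  # "Gemini 2.5 Flash" -> "gemini-2.5-flash"
            prices[name] = (amounts[0], amounts[1])
    return prices
```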

Labels

  • core: [Component] This issue is related to the core interface and implementation
  • request clarification: [Status] The maintainer need clarification or more information from the author
  • web: [Component] This issue will be transferred to adk-web

4 participants