Releases: Metabuilder-Labs/tokenjam

v0.4.1 — concurrency + cycle-aware run-rate + Reuse HTTP fallback

20 Jun 00:38
d77000d

A quality + follow-up release on top of v0.4.0's marquee. Six issues closed, each as a focused PR that landed cleanly. Pre-release pass: 10/10 PASS, 0 FAIL, 0 UNCLEAR.

🔧 Daemon DB concurrency — the Overview no longer crashes under fan-out

DuckDBBackend.conn is now a per-thread DuckDB cursor (threading.local) over one shared database. Cursors from connect().cursor() are independent connections safe for concurrent use across threads, which is the documented DuckDB pattern.
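A minimal sketch of the per-thread cursor pattern described above. `PerThreadCursor` and `make_cursor` are hypothetical names for illustration, not TokenJam's actual code, and the database path in the comment is a placeholder. With DuckDB, the factory would be the `cursor` method of one shared connection.

```python
import threading
from typing import Any, Callable


class PerThreadCursor:
    """Lazily creates one cursor per thread from a shared factory (hypothetical helper)."""

    def __init__(self, make_cursor: Callable[[], Any]) -> None:
        self._make_cursor = make_cursor
        self._local = threading.local()

    @property
    def conn(self) -> Any:
        # threading.local gives each thread its own attribute namespace,
        # so the first access in a thread creates that thread's cursor.
        cursor = getattr(self._local, "cursor", None)
        if cursor is None:
            cursor = self._make_cursor()
            self._local.cursor = cursor
        return cursor


# With DuckDB this would look roughly like (assumes the duckdb package is installed
# and uses a placeholder file name):
#   shared = duckdb.connect("example.duckdb")
#   backend = PerThreadCursor(shared.cursor)
#   backend.conn.execute("SELECT 1").fetchall()
```

Each threadpool worker that touches `backend.conn` gets its own cursor, so no two threads ever share one.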

Pre-fix symptom: with tj serve running, the Overview's parallel endpoint fan-out + Starlette's sync-route threadpool would race on a single shared connection and SIGABRT the daemon. We worked around it by fetching the Overview's panels sequentially. That workaround is now gone — the Overview fetches all six panels via a single Promise.all with deliberately asymmetric error handling (/cost is load-bearing and rejects on failure; the other five panels each carry a .catch fallback so one failing panel renders empty rather than blanking the whole screen).
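The Lens UI implements this fan-out in JavaScript with `Promise.all`. As an analogy only, the same asymmetric shape can be sketched in Python with `asyncio.gather`: one required fetch whose failure propagates, plus optional fetches that degrade to an empty value. The function names here are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Fetch = Callable[[], Awaitable[Any]]


async def with_fallback(fetch: Fetch, fallback: Any = None) -> Any:
    """Optional panel: swallow the failure and return a placeholder instead."""
    try:
        return await fetch()
    except Exception:
        return fallback


async def load_overview(required: Fetch, optional: Dict[str, Fetch]) -> Dict[str, Any]:
    """Fetch everything concurrently; only the required fetch may raise."""
    names = list(optional)
    results = await asyncio.gather(
        required(),  # load-bearing: an exception here fails the whole call
        *(with_fallback(optional[name]) for name in names),
    )
    return {"cost": results[0], **dict(zip(names, results[1:]))}
```

A failing optional panel shows up as `None` in the result, while a failing required fetch raises out of `load_overview`.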

Empirically verified: 90 concurrent reads against the Overview endpoint set complete with zero errors. Pre-fix this would crash the process. Issue #124, PR #161.

🗓 Cycle-aware run-rate

tj cost and the Lens spend chart's run-rate projection now honor [budget.<provider>] cycle_start_day when configured. Calendar-month users (the default) still see "by end of June"; users with a billing cycle that starts mid-month see "by Jul 15" — a dated form, because "by end of July" would mislead when the cycle ends mid-month.

core/cycle.py is shared between the existing budget_projection analyzer and the cost-API caption, so both surfaces use one piece of cycle math. The API exposes a cycle block on /api/v1/cost; the UI consumes it. Issue #138, PR #158.
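A rough sketch of the cycle math, not the contents of `core/cycle.py`. It assumes the dated caption names the next cycle boundary, and that a `cycle_start_day` beyond the end of a short month clamps to that month's last day. Both function names are hypothetical.

```python
import calendar
from datetime import date


def next_cycle_start(today: date, cycle_start_day: int) -> date:
    """First date strictly after `today` on which a new billing cycle begins."""

    def clamp(year: int, month: int) -> date:
        # Assumption: day 31 in a 30-day month falls back to the last day.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(cycle_start_day, last_day))

    candidate = clamp(today.year, today.month)
    if candidate > today:
        return candidate
    if today.month == 12:
        return clamp(today.year + 1, 1)
    return clamp(today.year, today.month + 1)


def run_rate_caption(today: date, cycle_start_day: int = 1) -> str:
    """Calendar-month cycles get the month name; other cycles get a dated form."""
    if cycle_start_day == 1:
        return f"by end of {calendar.month_name[today.month]}"
    boundary = next_cycle_start(today, cycle_start_day)
    return f"by {calendar.month_abbr[boundary.month]} {boundary.day}"
```

For a run on 20 June, the default cycle yields "by end of June" and a cycle starting on the 15th yields "by Jul 15", matching the two captions quoted above.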

📈 tj cost chart no longer silently empties on long windows

buildCostSeries previously had if (xs.length > 5000) return null as a guard against pathological grids — fine for a 90-day cap today, but if anyone ever requests hourly bucketing over a multi-year window the chart would just go blank with no explanation.

It now coarsens up an hour → day → week ladder until the grid fits, computed closed-form so an oversized array is never allocated. When coarsening fires, a footnote on the chart explains it: "Showing day buckets — this range is too long for hourly detail." Genuinely empty windows still render the existing "No spend in this window" state. Issue #139, PR #163.
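The ladder can be sketched as follows. The real `buildCostSeries` lives in the Lens JavaScript, so this Python version is illustrative only. The 5000-point limit is carried over from the old guard as an assumption, and `pick_bucket` is a hypothetical name.

```python
import math
from typing import Tuple

# Bucket sizes in seconds, finest first.
LADDER = [("hour", 3_600), ("day", 86_400), ("week", 604_800)]
MAX_POINTS = 5_000  # assumed to match the old `xs.length > 5000` guard


def pick_bucket(window_seconds: int, requested: str = "hour") -> Tuple[str, bool]:
    """Return (bucket, coarsened) without ever allocating the grid itself."""
    names = [name for name, _ in LADDER]
    for name, size in LADDER[names.index(requested):]:
        # Closed-form point count: no array is built to find out it is too big.
        if math.ceil(window_seconds / size) <= MAX_POINTS:
            return name, name != requested
    # Nothing fits: fall back to the coarsest rung.
    return LADDER[-1][0], requested != LADDER[-1][0]
```

The boolean in the result is what would drive the footnote: it is true only when the bucket actually used differs from the one requested.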

📊 Status: active (compute) time alongside wall-clock

Pre-fix, a resumed Claude Code session that ran across days showed Duration: 3087m — easy to misread as a runaway. v0.4.1 distinguishes:

  • Active — Σ span durations. The work actually done.
  • Elapsed — wall-clock from session start to last activity. Resumed Claude Code sessions span days; this can be far larger than Active.

Visible in tj status (Duration: active 12m 3s · elapsed 2d 3h), in tj status --json (both fields), and in the Lens Status tile (two distinct rows with tooltips). A new fmtDurLong formatter renders multi-day spans as 2d 3h instead of 3087m. Issue #147, PR #164.
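To make the distinction concrete, here is a small sketch that computes both figures from span start/end timestamps and formats them. It is not the shipped `fmtDurLong`. The two-largest-units rule is inferred from the examples above, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

UNITS = (("d", 86_400), ("h", 3_600), ("m", 60), ("s", 1))


def active_and_elapsed(spans: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """spans are (start, end) pairs in seconds; returns (active, elapsed)."""
    spans = list(spans)
    active = sum(end - start for start, end in spans)
    elapsed = max(end for _, end in spans) - min(start for start, _ in spans)
    return active, elapsed


def fmt_dur_long(seconds: int) -> str:
    """Render the two most significant units, e.g. '2d 3h' or '12m 3s'."""
    remaining = int(seconds)
    parts = []
    for suffix, size in UNITS:
        value, remaining = divmod(remaining, size)
        # Skip leading zero units, but always emit seconds as a last resort.
        if value or parts or suffix == "s":
            parts.append(f"{value}{suffix}")
        if len(parts) == 2:
            break
    return " ".join(parts)
```

With this rule, `fmt_dur_long(3087 * 60)` gives `2d 3h` and `fmt_dur_long(723)` gives `12m 3s`.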

🔁 tj report --reuse works while the daemon is running

The Reuse report needed direct DB access to fetch each cluster's planning-call completion text — so when tj serve held the write lock, the command errored out and pointed the user at tj stop. v0.4.1 adds a dedicated GET /api/v1/reuse/clusters endpoint that returns the Reuse finding plus the skeleton-rendering extras (planning_texts + pricing_mode).

tj report --reuse now dispatches like tj optimize: direct DB connection when available, ApiBackend.fetch_reuse_clusters when the daemon owns the lock. Renderer accepts pre-fetched planning texts as an alternative to a connection. The endpoint is dedicated rather than bolted onto /optimize because per-cluster planning text can be many KB and the Overview polls /optimize every 30s — we don't make every poll pay for report-only data.
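The dispatch shape can be sketched generically. `DatabaseLocked` and `direct_or_api` are hypothetical names; how TokenJam actually detects a held lock is not described here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DatabaseLocked(RuntimeError):
    """Hypothetical error raised when another process holds the write lock."""


def direct_or_api(direct: Callable[[], T], via_api: Callable[[], T]) -> T:
    """Prefer the direct DB path; fall back to HTTP only when the DB is locked."""
    try:
        return direct()
    except DatabaseLocked:
        # The daemon owns the lock, so ask it for the same data over HTTP.
        return via_api()
```

Only the lock error triggers the fallback; any other failure on the direct path still surfaces to the caller.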

tj report --trim retains the same direct-DB limitation, now explicitly documented in CLAUDE.md. Issue #154, PR #165.

🎨 Recoverable Waste tile consistency

Three small but visible inconsistencies on the Lens Overview's tile band:

  • reuse rendered lowercase while the other analyzer names were title-cased. Added an explicit ANALYZER_META entry plus a centralized capitalize() helper used in the fallback path — so the next analyzer that ships will auto-capitalize instead of falling through to its raw lowercase registry key.
  • Trim's — not ready had a leading em-dash while the other tile states didn't. Dropped to plain Not ready so the three states share a prefix-free scheme.
  • Cache title bold investigated and locked with a regression guard. Couldn't reproduce in current source, but the guard test now asserts no state-specific bold-title rule can be silently added.

Issue #162, PR #166.

Upgrade

pipx upgrade tokenjam
tj stop && tj serve &
tj --version   # expect 0.4.1

Existing installs keep their config, data, and daemon setup. No breaking changes.

Pre-release verification

A 10-step focused pre-release pass (tests/agent-pre-release-v0.4.1.md) was executed by a sub-agent against the live daemon. Result: 10/10 PASS, 0 FAIL, 0 UNCLEAR.

Coverage included the marquee #124 concurrency reproduction (90 concurrent reads, no crash), /api/v1/cost.cycle block, tj report --reuse with daemon running, the new active/elapsed status fields, the 90d cost chart render, and the tile consistency fixes — plus regression checks on TokenMaxx, all five analyzers, and the API framing block.

Full log committed to tests/results/agent-pre-release-v0.4.1-20260620T002546Z.md as a release-time record.

Honesty discipline

The release continues the v0.4.0 framing rules:

  • Run-rate captions remain "linear, not a forecast" — no smoothing, no seasonality, no anomaly bands; just a date-cycle projection
  • Coarsened chart windows are explicit about the coarsening, never silent
  • The Reuse HTTP fallback is documented as paying-for-what-you-use (dedicated endpoint instead of bolting it onto /optimize); no hidden cost on Overview polls
  • Status Active vs Elapsed labels distinguish work-done from wall-clock; the misleading bare "Duration" label is gone

Full changelog

v0.4.0...v0.4.1

v0.4.0 — TokenJam Lens + Reuse

19 Jun 21:21
5f1d8a4

First minor bump since v0.3.0. Significant new product surface — substantially more than the 0.3.x patch cadence — so we bumped the minor.

🔭 TokenJam Lens — the local UI is a product

The local dashboard you get from tj serve has been rebranded and rebuilt as TokenJam Lens. It's the same offline-first single-file SPA, but with a real triage front door instead of a list of tables.

  • Overview screen — the new default landing route. Three bands: spend hero (with a real chart and a "to end of month" run-rate projection that's explicitly not a forecast), recoverable-waste tiles (one per analyzer, registry-driven so future analyzers auto-appear), and health-at-a-glance (alerts, drift, budgets, recent activity).
  • Optimize detail tab — every analyzer's findings rendered in one place, with ?finding=<name> deep-links from the Overview tiles.
  • Real charts — uPlot, vendored offline (zero CDN loads, the dashboard works air-gapped). Hover tooltips, theme-aware (light/dark/system), tick granularity scales with the window (hours/days/weeks).
  • URL state as the source of truth — every filter (window, group-by, agent, finding) lives in the hash, so back/forward/reload work cleanly and cross-screen drill-through carries context.
  • Cost transparency — the tj cost table and the Cost screen now show CACHE R and CACHE W columns alongside input/output tokens, so a cache-heavy $1.44 span no longer looks like it came out of nowhere. (The hidden ~91% cost driver on Claude Code-style workloads finally has a name.)

🔁 Reuse — the 5th analyzer

Agents re-plan the same work constantly. Reuse detects clusters of sessions that share a planning skeleton (the first LLM call before any tool call) and surfaces what that repeated planning costs.

  • tj optimize reuse — clusters sessions by structural similarity (tool-sequence signature, plus prompt-prefix hashing when [capture] prompts = true); produces two honest numbers per cluster: cache-reuse savings (what you'd recover by reusing the skeleton) and script-replacement savings (the upper bound if you converted the planning to a deterministic template).
  • tj report --reuse — renders the clusters as an HTML report plus per-cluster Markdown sidecars with variable slots highlighted ({{slot_N}}), idempotent on cluster_id so re-runs overwrite cleanly. The Markdown is copy-paste-usable as a Claude Code slash command or saved prompt.
  • Honesty by construction: "skeleton match," "recoverable," "review before reusing" — never "saves you."
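As a simplified illustration of the tool-sequence signature, the sketch below groups sessions whose tool calls match exactly. The real analyzer clusters by structural similarity and may also hash prompt prefixes, so exact matching is a deliberate simplification. All names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Sequence


def tool_signature(tools: Sequence[str]) -> str:
    """Stable short hash of an ordered tool-call sequence."""
    # A separator that cannot appear in tool names keeps ["ab"] distinct from ["a", "b"].
    joined = "\x1f".join(tools)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def cluster_sessions(
    sessions: Dict[str, Sequence[str]], min_size: int = 2
) -> Dict[str, List[str]]:
    """Map signature -> session ids, keeping only groups that actually repeat."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for session_id, tools in sessions.items():
        groups[tool_signature(tools)].append(session_id)
    return {sig: sorted(ids) for sig, ids in groups.items() if len(ids) >= min_size}
```

Because the signature depends only on the tool sequence, re-running over the same sessions yields the same cluster keys.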

Known limitation: tj report --reuse needs direct DB access today, so tj stop first if the daemon is running. HTTP fallback is tracked for v0.4.1 (#154).

💰 Cost transparency

  • cache_write_tokens is now surfaced everywhere — in tj cost, on the web Cost screen, in the trace-detail view, and via /api/v1/cost. Previously it was billed but invisible above the DB layer.
  • Plan-tier-aware rendering: core/framing.py is the single source of truth for whether to show dollars (api), token-share (subscription), tokens only (local), or a "may overstate" qualifier (unknown). The CLI and the REST API consume it identically.
  • Analyzer recoverable contract — every savings analyzer now emits estimated_recoverable_usd + estimated_recoverable_tokens + estimate_basis + estimate_confidence on a single time basis (window), so Overview tiles are directly comparable.
  • Honest run-rate — the Lens chart and tj cost projection use a window-average run-rate × days remaining in cycle, captioned "linear run-rate, not a forecast." No EWMA, no seasonality, no anomaly bands.
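The run-rate arithmetic in the last bullet is simple enough to write out. This sketch returns only the projected spend for the rest of the cycle. Whether the shipped caption adds cycle-to-date spend on top is not stated here, so that step is left out. The function name is hypothetical.

```python
def projected_remaining_spend(
    spend_in_window: float, window_days: int, days_remaining: int
) -> float:
    """Window-average daily spend multiplied by days left in the cycle."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    daily_run_rate = spend_in_window / window_days
    # Purely linear: no smoothing, no seasonality, no anomaly bands.
    return daily_run_rate * max(days_remaining, 0)
```

For example, $300 over a 30-day window with 10 days left projects a further $100.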

🔒 Security

  • A committed .tj/config.toml with a live ingest_secret (in repo since v0.2.0) is now untracked. Limited blast radius — local network ingest token only — but a CI test now guards against re-staging it. See PR #145 for the full advisory.

🧹 Quality

  • 17 v0.3.5 post-release findings closed (#141) from an external contributor; 5 rounds of Lens UI bug fixes; 9 individual fix PRs across the release. The full pre-release smoke pass ran twice and is committed under tests/results/ as a record.

Honesty discipline

This release is the most public-facing one yet. The framing language is deliberate everywhere:

  • Recoverable amounts are estimated, never saved.
  • Cache hits at 100% efficacy show "✓ Already optimized," not "no findings."
  • Subscription users see token-share framing, not dollar figures.
  • Forecasting is bounded to a single linear projection captioned "not a forecast."

Upgrade

pipx upgrade tokenjam
tj stop && tj serve &
tj --version   # expect 0.4.0

Existing installs keep their config, data, and daemon setup. Open the dashboard:

open http://127.0.0.1:7391/

…and you'll land on the new Overview.

Full changelog

v0.3.5...v0.4.0

v0.3.5

16 Jun 00:54
fbafb6f

First-run polish + bug fixes surfaced during the v0.3.5 pre-release playbook. No breaking changes — pipx upgrade tokenjam, then tj stop && tj serve & to reload the daemon.

Bug fixes

  • #101: tj mcp works out of the box on a fresh install. fastmcp moved from the [mcp] extra into base dependencies, so pipx install tokenjam is enough to wire TokenJam into Claude Code without remembering an extra. The [mcp] extra is kept as a no-op for back-compat. The MCP server now also raises a clean, actionable ImportError pointing at the fix if fastmcp is somehow missing.
  • #98: "No pricing data" warning no longer spams during backfill. Warns once per (provider, model) per process. Verified against a 20,000-span Claude Code backfill: zero warnings where pre-0.3.5 emitted hundreds. Deprecated Anthropic base models (claude-sonnet-4, claude-opus-4, claude-opus-4-1, claude-haiku-3-5) added to pricing/models.toml so dated variants like claude-sonnet-4-20250514 resolve correctly via the YYYYMMDD-stripping fallback instead of falling through to defaults.
  • #106: UI footer no longer shows a 9-release-old version. tokenjam/__init__.py now reads from importlib.metadata.version("tokenjam") (single source of truth = pyproject.toml). New GET /api/v1/version endpoint; the UI footer fetches it on init. The request is same-origin, so the offline-UI guarantee is preserved.
  • #106: GET /health endpoint added as a conventional uptime probe. Returns {"status":"ok","version":"..."}. /api/v1/status continues to be the agent overview.
  • #106: tj tokenmaxx plan-multiplier renders from project subdirs. When run from a directory whose .tj/config.toml has no [budget] section, _config_declared_plan now falls back to reading ~/.config/tj/config.toml directly. Previously it dropped silently to api-pricing framing even when the user had plan = "max_5x" configured globally via tj onboard.
  • #105: tj report --trim not-ready hint renders [capture] literally. Rich's print parser was silently swallowing the [capture] substring as an invalid style tag, hiding the section name the user needs to enable.
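The #98 fix combines two small techniques: stripping a trailing date from the model name before lookup, and warning only once per (provider, model). A minimal sketch under those assumptions follows. It returns `None` on a miss rather than applying defaults, and `resolve_rates` is a hypothetical name.

```python
import re
import warnings
from typing import Dict, Optional, Set, Tuple

_DATE_SUFFIX = re.compile(r"-\d{8}$")  # e.g. "-20250514"
_warned: Set[Tuple[str, str]] = set()  # per-process memory of misses

Rates = Dict[str, float]


def resolve_rates(
    table: Dict[str, Dict[str, Rates]], provider: str, model: str
) -> Optional[Rates]:
    """Look up rates by exact name, then by name with the YYYYMMDD suffix removed."""
    models = table.get(provider, {})
    if model in models:
        return models[model]
    base = _DATE_SUFFIX.sub("", model)
    if base in models:
        return models[base]
    key = (provider, model)
    if key not in _warned:
        # Warn once so a large backfill does not repeat the same message.
        _warned.add(key)
        warnings.warn(f"No pricing data for {provider}/{model}")
    return None
```

A dated name such as `claude-sonnet-4-20250514` resolves to the `claude-sonnet-4` entry when only the base name is in the table.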

DX

  • tj --help epilog now shows the canonical upgrade incantation (pipx upgrade tokenjam → tj stop && tj serve & → tj --version).
  • docs/installation.md documents that [mcp] is no longer needed.
  • Pre-release and post-release manual test playbooks updated to cover the new surface (six-tier tokenmaxx ladder, offline-UI DevTools check, cache cost-correctness verification).

Upgrade

pipx upgrade tokenjam
tj stop && tj serve &
tj --version   # expect 0.3.5

Full changelog

v0.3.4...v0.3.5

v0.3.4 — Six-tier ladder + cache cost-accuracy fixes + offline UI

15 Jun 18:07
78584e6

A mix of a user-visible product change (the tokenmaxx tier rename + expansion) and three credibility-grade infrastructure fixes — two for cost accuracy on the cache path, one for the local-first promise.

TokenMaxx ladder expanded to 6 tiers

The top tier was previously TokenGigaChad at 20×+, but almost every Claude Code power user lands there, so the headline lost its bite. Three changes:

  • Top tier raised to 50×+ so it reflects genuinely extreme usage, not everyday heavy use
  • New tier in the middle for the 20–50× range
  • Top two tiers renamed to drop the Chad branding

| Multiplier (subscription users) | Absolute /mo (API users) | Tier |
|---|---|---|
| < 1× | < $100 | 💧 TokenSipper |
| 1× – 4× | $100 – $400 | 🥱 TokenModerator |
| 4× – 10× | $400 – $1,000 | 💸 TokenMaxxer |
| 10× – 20× | $1,000 – $2,000 | 🔥 TokenSuperMaxxer (was TokenChad) |
| 20× – 50× | $2,000 – $5,000 | 🔥🔥 TokenMegaMaxxer (new) |
| 50×+ | $5,000+ | 🔥🔥🔥 TokenGigaMaxxer (was TokenGigaChad) |

Breaking note: the JSON output's `tier` field carries the new label string verbatim. Any consumer scripting against `TokenChad` / `TokenGigaChad` in `tj tokenmaxx --json` must update.
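For consumers updating their scripts, the six-tier ladder can be expressed as a threshold lookup. This is a sketch, not the shipped classifier. It assumes each lower bound is inclusive (exactly 4× counts as TokenMaxxer) and that API-user dollar thresholds are the multipliers times $100, as the ladder's two columns suggest.

```python
from bisect import bisect_right

# Lower bounds of tiers 2..6, as plan multipliers.
THRESHOLDS = [1, 4, 10, 20, 50]
TIERS = [
    "TokenSipper",
    "TokenModerator",
    "TokenMaxxer",
    "TokenSuperMaxxer",
    "TokenMegaMaxxer",
    "TokenGigaMaxxer",
]


def tier_for_multiplier(multiplier: float) -> str:
    """Subscription users: spend divided by the plan's flat fee."""
    return TIERS[bisect_right(THRESHOLDS, multiplier)]


def tier_for_monthly_usd(usd: float) -> str:
    """API users: absolute thresholds, assumed to be multiplier x $100."""
    return tier_for_multiplier(usd / 100)
```

Under this ladder a 40.6× user lands in TokenMegaMaxxer, where the previous five-tier ladder placed them in the top tier.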

Cost-accuracy fixes (cache path)

Two related cache-billing fixes from community contributor @sjhddh, plus a follow-up to persist the raw counts.

Cache-only spans no longer costed at $0

A prompt-cache hit (`input_tokens=0`, `output_tokens=0`, but `cache_read_tokens > 0`) bills the cache-read rate, but `calculate_cost()` and `CostEngine.process_span()` were short-circuiting on input/output token counts alone — dropping the span as a no-op. The better your caching, the more cost went missing. The early-return guards now check all four token counts. (PR #90, kudos @sjhddh.)
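The bug is easy to reproduce in miniature. The sketch below is not the project's `calculate_cost()`; it is a hypothetical `span_cost` showing a guard that considers all four token counts, so a cache-only span is still costed. The rate keys other than `cache_write_per_mtok` are assumed names.

```python
from typing import Dict

# token-count field -> per-million-token rate key (rate key names partly assumed)
FIELDS = {
    "input_tokens": "input_per_mtok",
    "output_tokens": "output_per_mtok",
    "cache_read_tokens": "cache_read_per_mtok",
    "cache_write_tokens": "cache_write_per_mtok",
}


def span_cost(tokens: Dict[str, int], rates: Dict[str, float]) -> float:
    """Cost in USD; returns 0.0 only when all four counts are zero."""
    # The buggy guard looked at input/output alone and dropped cache-only spans.
    if not any(tokens.get(field, 0) for field in FIELDS):
        return 0.0
    total = sum(
        tokens.get(field, 0) * rates.get(rate_key, 0.0)
        for field, rate_key in FIELDS.items()
    )
    return total / 1_000_000
```

A span with zero input and output tokens but two million cache-read tokens at a $0.50 per MTok rate now costs $1.00 instead of $0.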

Cache-creation tokens now costed on the live OTLP ingest path

The SDK integrations emit `gen_ai.usage.cache_creation_tokens` and the pricing table carried a `cache_write_per_mtok` rate, but the live parsers (`parse_otlp_span` + `convert_otel_span`) only read cache-read tokens. So every cache-write emitted via SDK was silently dropped, and the higher-rate cost never charged. `NormalizedSpan` now carries `cache_write_tokens`; both parsers populate it; `process_span` charges it. (PR #92, also @sjhddh.)

`cache_write_tokens` now persisted and threaded everywhere

  • New `cache_write_tokens` column on the `spans` table (migration 5)
  • The 3 remaining ingest paths (Langfuse, Helicone, Claude Code log adapter) now thread cache-creation tokens through `NormalizedSpan` for consistency with the live OTLP path
  • Codex's `cached_token_count` deliberately not mapped to `cache_write` — OpenAI's automatic prompt caching only bills cache-reads at a discount and has no separate cache-creation billing

The `cache` analyzer can now compute creation vs read breakdowns from real data. (PR #95, closes #93 + #94.)

Web UI works fully offline

The `tj serve` dashboard at `http://127.0.0.1:7391/` was loading three things from external CDNs at render time, which broke the "local-first, no data egress" promise for any user running TokenJam in an air-gapped environment to verify exactly that claim:

  • Favicon SVG from `tokenjam.dev` → now inlined as a `data:` URL
  • Geist + Geist Mono fonts from `fonts.googleapis.com` → removed; system-font fallbacks already in CSS
  • Preact + hooks + htm from `esm.sh` → vendored under `tokenjam/ui/vendor/`, served via FastAPI `StaticFiles`, wired up via `<script type="importmap">` (JS source unchanged)

New regression tests catch any future external-URL slip. (PR #88, closes #87.)

`pipx install tokenjam` is the recommended install path

`pip install tokenjam` failed on macOS with Homebrew Python (PEP 668) and Debian 12+ / Ubuntu 24+. The unhelpful error broke the 3-command quickstart flow promised on `tokenjam.dev/tokenmaxxing`. All install snippets across the README, `docs/`, blog tutorials, and `examples/` now lead with `pipx install tokenjam`. Cross-platform pipx-install fallback table includes macOS / Debian/Ubuntu / Windows / generic. (PR #88 + PR #89, closes #86.)

Install

```
pipx install tokenjam==0.3.4
```

TypeScript SDK in lockstep as `@tokenjam/sdk@0.3.4`.


Thanks to @sjhddh for two excellent cost-accuracy fixes (PRs #90 + #92) and to everyone who flagged install + offline-UI issues during the 0.3.x rollout.

v0.3.3 — TokenMaxx report polish + Opus 4.5 pricing

09 Jun 18:34
d7145ee

A launch-readiness polish release for the tj tokenmaxx social moment, plus one more pricing-accuracy fix from a community contributor.

TokenMaxx Report — visual + structural polish

The tj tokenmaxx output is now a bordered report panel designed to be a clean screenshot artifact:

╭─ TokenJam TokenMaxxing Report ──────────────────────────────────────────────╮
│                                                                              │
│  🔥🔥 You're a TokenGigaChad.                                                │
│                                                                              │
│  Touch grass. Then run tj optimize.                                          │
│                                                                              │
│  $4056.82 in last 30d across 33 sessions.                                    │
│  That's 40.6× your Max 5x plan cost ($100/mo flat).                          │
│                                                                              │
│  💡 No obvious savings flagged yet — run tj optimize for the full report     │
│  once you have more data.                                                    │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
  Share your tier: screenshot the above and tag @tokenjamdev

  • Spend now renders at 2 decimals ($4056.82, was $4056.8200)
  • Plan fee strips decimals when whole-dollar ($100, was $100.0000)
  • `tj optimize` rendered bold-green wherever it appears
  • Share line in teal, points at `@tokenjamdev`

TokenMaxx — plan-relative tier ladder

Tiers are now based on the multiplier vs your plan cost, so the tier name means the same thing across Pro / Max-5x / Max-20x users:

| Multiplier (subscription) | Absolute /mo (API) | Tier |
|---|---|---|
| < 1× | < $100 | 💧 TokenSipper |
| 1× – 4× | $100 – $400 | 🥱 TokenModerator |
| 4× – 10× | $400 – $1000 | 💸 TokenMaxxer |
| 10× – 20× | $1000 – $2000 | 🔥 TokenChad |
| 20×+ | $2000+ | 🔥🔥 TokenGigaChad |

API users (no plan to multiply against) fall back to absolute USD thresholds, calibrated against Max-5x = $100/mo so the tier name carries the same meaning in either world. A Pro user at 15× their plan and a Max-5x user at 15× their plan are both TokenChads — the tier reflects "how hard you're maxxing," not raw spend.

Pricing fix: Claude Opus 4.5

tokenjam/pricing/models.toml had Opus 4.5 at the old $15 / $75 tier. Anthropic moved 4.5 to $5 / $25 (same tier as 4.6 / 4.7 / 4.8). Users on 4.5 were seeing ~3× inflated cost figures; fixed in this release.

The repo-root pricing/models.toml (orphaned since v0.1.x) was also removed — the runtime only reads tokenjam/pricing/models.toml, and the duplicate file was confusing contributors. CLAUDE.md, CONTRIBUTING.md, and the fallback warning string all now point at the real path.

Thanks to @kelter-antunes for the catch and the dedupe.

Install

```
pip install tokenjam==0.3.3
```

TypeScript SDK in lockstep as `@tokenjam/sdk@0.3.3`.

v0.3.2 — TokenMaxx + Opus pricing accuracy

09 Jun 02:49
ee0e0a1

A user-facing feature, a cost-accuracy fix from a community contributor, and an upgrade-safe pricing escape hatch.

New: tj tokenmaxx

A shareable spend-tier command. Reads your last 30 days of spend, classifies it into an ironic tier (TokenSipper / TokenModerator / TokenMaxxer / TokenChad / TokenGigaChad), and surfaces the downsize savings figure inline so the score is always paired with an action.

🔥🔥 You're a TokenGigaChad.
   "Touch grass. Then run `tj optimize`."

$3502.72 in last 30d across 33 sessions.
That's 35.0× your Max 5x plan cost ($100.00/mo flat).

💡 $340/mo of that looks recoverable. Run `tj optimize` to see candidates.

The tier ladder, monthly USD spend:

| Spend | Tier | One-liner |
|---|---|---|
| < $50 | 💧 TokenSipper | "Are you even using AI?" |
| $50–$200 | 🥱 TokenModerator | "Mostly reasonable. Try harder." |
| $200–$500 | 💸 TokenMaxxer | "You're paying Anthropic's rent." |
| $500–$1500 | 🔥 TokenChad | "You're paying their interns' rent too." |
| $1500+ | 🔥🔥 TokenGigaChad | "Touch grass. Then run tj optimize." |

Plan-aware: when [budget.<provider>] plan = "max_5x" (or pro / plus / max_20x) is declared, the output renders the multiplier vs the plan's flat fee — the figure that actually travels socially. API users see absolute spend; team / enterprise see plan label only (contract-priced fees).

--json flag for machine-readable output.

Pricing fix: Claude Opus 4.5 / 4.6 / 4.7 / 4.8

The packaged pricing table had Claude Opus models at the old $15 / $75 per MTok tier. Anthropic dropped Opus 4.5 onward to $5 / $25 per MTok (with $0.50 cache read, $6.25 5-minute cache write). The packaged rates now match Anthropic's published pricing exactly. Users running Opus 4.5–4.8 were seeing ~3× inflated cost figures; this fixes that.

Verified against platform.claude.com/docs/en/docs/about-claude/pricing. New regression tests guard against future drift.

Thanks to @kelter-antunes for catching and fixing this.

New: User pricing override file

You can now override packaged rates without editing site-packages (which pip install --upgrade clobbers). Resolution order:

  1. TJ_PRICING_FILE env var (if set)
  2. ~/.config/tj/pricing.toml (if it exists)

Override entries are merged per provider/model over the packaged table — same TOML schema. Missing or malformed override files log a warning and fall back to packaged rates; never breaks cost calculation.

# ~/.config/tj/pricing.toml
[anthropic.some-future-model]
input_per_mtok = 4.00
output_per_mtok = 20.00
cache_read_per_mtok = 0.40
cache_write_per_mtok = 5.00
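The resolution order and the merge can be sketched as below. This is an illustration, not the shipped loader. It assumes an override entry replaces the packaged entry for the same provider/model wholesale, and the function names are hypothetical. Parsing the TOML file itself is omitted.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

Pricing = Dict[str, Dict[str, Dict[str, float]]]


def override_path(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> Optional[Path]:
    """TJ_PRICING_FILE wins; otherwise ~/.config/tj/pricing.toml if it exists."""
    env = os.environ if env is None else env
    explicit = env.get("TJ_PRICING_FILE")
    if explicit:
        return Path(explicit)
    default = (home or Path.home()) / ".config" / "tj" / "pricing.toml"
    return default if default.exists() else None


def merge_pricing(packaged: Pricing, override: Pricing) -> Pricing:
    """Overlay override entries per provider/model without mutating either input."""
    merged = {provider: dict(models) for provider, models in packaged.items()}
    for provider, models in override.items():
        merged.setdefault(provider, {}).update(models)
    return merged
```

Models that the override does not mention keep their packaged rates, and new models (like the `some-future-model` entry above) are simply added.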

Thanks to @kelter-antunes for this too.

Install

pip install tokenjam==0.3.2

TypeScript SDK in lockstep as @tokenjam/sdk@0.3.2.

v0.3.1 — Optimize CLI ergonomics

29 May 23:51
af4a46f

Fast-follow on v0.3.0 to make the optimize CLI match how users think and read about the products.

CLI changes

Positional analyzer args (was: --finding NAME)

tj optimize                       # run all (unchanged)
tj optimize downsize              # run one
tj optimize downsize cache trim   # run several

The old --finding NAME flag is removed. There are no aliases.

Analyzer names match the product names

| Old | New |
|---|---|
| model-downgrade | downsize |
| cache-efficacy | cache |
| workflow-restructure | script |
| prompt-bloat | trim |

cache-recommend (sub-finding of Cache) and budget-projection (infra concept) keep their names.

tj report --bloat → tj report --trim

Same motivation — "bloat" wasn't used anywhere outside the CLI.

What's unchanged

Honesty discipline: MODEL_DOWNGRADE_CAVEAT and all "structural match — review before applying" framing stay in place. This is a CLI rename only — no behavior or rendering changes.

Install

pip install tokenjam==0.3.1

TypeScript SDK published in lockstep as @tokenjam/sdk@0.3.1.

Closes #74.

v0.3.0 — Layer-9 cost optimization product

29 May 15:56
46439a7

TokenJam pivots from "observability for AI agents" to a focused cost-optimization product. The OTel-native ingest pipeline and local-first architecture stay; four named analyzers ship on top.

New: cost-optimization analyzers

  • Downsize (tj optimize --finding model-downgrade) — structural candidate detection for cheaper-model routing. Honesty-disciplined: every flagged session is labeled "structural match — review before switching," never "safe to downgrade."
  • Cache (cache-efficacy + cache-recommend) — current caching ratio per (provider, model) and Anthropic-only breakpoint suggestions for stable prefixes.
  • Script (workflow-restructure) — clusters of deterministic (tool_name, arg_shape) sessions that look replaceable with a script.
  • Trim (prompt-bloat) — LLMLingua-2 token-significance classifier behind the optional tokenjam[bloat] extra (~2GB torch + transformers).

All analyzers self-register via @register("name") in tokenjam/core/optimize/analyzers/. Run all with tj optimize; scope to one with --finding <name> (repeatable).

New: backfill adapters

tj backfill langfuse|helicone|otlp — ingest from external observability platforms via live API or JSON dump. Idempotent re-runs via deterministic span IDs. Joins existing tj backfill claude-code.
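Deterministic span IDs are what make a re-run a no-op. As a sketch of the idea, assuming an ID derived by hashing the source name, session ID, and position (the actual inputs TokenJam hashes are not stated here):

```python
import hashlib
from typing import Dict


def deterministic_span_id(source: str, session_id: str, index: int) -> str:
    """Same inputs always give the same 16-hex-char (8-byte) span ID."""
    key = f"{source}\x1f{session_id}\x1f{index}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def ingest(store: Dict[str, dict], source: str, session_id: str, events: list) -> None:
    """Upsert by span ID, so importing the same dump twice adds nothing new."""
    for index, event in enumerate(events):
        store[deterministic_span_id(source, session_id, index)] = event
```

Because the store is keyed by the derived ID, a second pass over the same export overwrites each span with itself instead of duplicating it.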

New: HTTP API for analyzers

/api/v1/optimize + /api/v1/cost/compare so tj optimize works alongside a running tj serve (previously crashed on the DuckDB write-lock).

New: read-only policy preview

tj policy list consolidates [alerts], [capture], [budget.<provider>], per-agent budget / drift / sensitive_actions / output_schema config into one table. The unified add | edit | apply surface lands next sprint.

New: honest-output rendering

Plan-tier-aware optimize output:

  • API users — dollar-denominated savings projections (unchanged)
  • Subscription users — implied API value + token-share framing; never dollar "spend"
  • Local users — token-only framing for capacity planning
  • Unknown-plan users — dollar figures suppressed with a tj onboard --reconfigure hint
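The four framings above amount to a switch on plan tier. A hedged sketch: the function name, output wording, and tier strings are illustrative, not the project's renderer.

```python
def render_recoverable(plan_tier: str, usd: float, tokens: int, total_tokens: int) -> str:
    """Pick the framing for an estimated-recoverable figure by plan tier."""
    if plan_tier == "api":
        return f"~${usd:,.2f} recoverable (estimated)"
    if plan_tier == "subscription":
        # Token share, never a dollar "spend" figure.
        share = tokens / total_tokens if total_tokens else 0.0
        return f"~{share:.0%} of tokens recoverable (estimated)"
    if plan_tier == "local":
        return f"~{tokens:,} tokens recoverable (estimated)"
    # Unknown plan: suppress dollars and point at reconfiguration.
    return "Plan unknown: see tj onboard --reconfigure to enable dollar figures"
```

Keeping the branch in one function is what lets the CLI and any other surface show the same framing for the same user.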

tj optimize --export-config claude-code writes a JSONC routing snippet (with the structural-heuristic caveat baked in as comments) to ~/.config/tokenjam/exports/. Never touches ~/.claude/settings.json.

Codex CLI integration

tj onboard --codex writes [otel] + [mcp_servers.tj] to ~/.codex/config.toml. The new /v1/logs endpoint normalizes Codex event logs (sse_event, user_prompt, tool_decision, etc.) into spans for cost / drift / alerting.

Notable fixes

  • SDK: fail-loud ERROR-level logging on 401 span exports with the configured-secret fingerprint, so silent data-loss is impossible. Was previously a single low-volume warning per batch.
  • API: /metrics aggregates by agent_id before emitting Prometheus rows. Previously emitted duplicate label sets that broke strict scrapers.
  • Doctor: tj doctor no longer reports "DuckDB not writable" when the daemon legitimately holds the write lock.
  • Onboard: explicit error on bare --reconfigure (was a silent early-return); secret-divergence warning when project-local and global configs disagree.
  • Drift / budget / backfill: every CLI command now works under both direct-DB and API-shim modes consistently.
  • Optimize: --compare last-7d and last-30d now override --since so the analysis window matches the comparison period (was 30d-vs-30d when --since defaulted to 30d).
  • Export: routing snippet now written as .jsonc with properly-indented comments — parseable by strict JSONC tooling.
  • Policy list: --json accepted as both root flag and command-level flag; [capture] row always shown (even when all toggles off — an explicit "off" is still a policy choice).

Install

pip install tokenjam==0.3.0
# Optional Trim analyzer (large download — pulls torch + transformers):
pip install 'tokenjam[bloat]==0.3.0'

TypeScript SDK published in lockstep as @tokenjam/sdk@0.3.0.

Acknowledgements

This release ran the full v0.3.x manual pre-release playbook before tagging — 14 sections covering analyzers, backfill, period comparison, config export, policy preview, server + HTTP fallback, web UI, and cleanup. The 8 polish findings surfaced during the playbook are all addressed in this release.

v0.2.3 — tj optimize

18 May 00:30
6ea3a82

Features

tj optimize — cost-saving recommendations from existing data

Two analyzers run over your captured spans:

  • Model-downgrade candidates — flags sessions whose structural shape (short input, short output, few tool calls) matches a class of work where a cheaper model in the same provider family is worth reviewing. Surfaces example traces; never claims quality equivalence (the caveat line is in the dataclass default so it can't be removed by accident).
  • Budget projection — per-provider monthly projection against any [budget.<provider>] ceiling. Scopes spend by provider, shows exhaustion date, projected overage, and what the run rate would drop to if you acted on the downgrade candidates.

tj optimize                                # both analyzers, last 30d
tj optimize --only budget
tj optimize --budget anthropic --budget-usd 50
tj optimize --json

Runs alongside a live tj serve via a read-only DuckDB fallback. Also exposed as the new get_optimize_report MCP tool — your coding agent can ask "where could I save money?" mid-session.

tj backfill claude-code

Reads ~/.claude/projects/*.jsonl and ingests historical sessions into the local DB. Idempotent (deterministic span IDs). Auto-invoked at the end of tj onboard --claude-code so first-time users have history immediately and tj optimize returns real numbers on first run.

[budget.<provider>] config

New TOML section for periodic monthly budgets. Distinct from [defaults.budget] / [agents.X.budget] (per-agent alert thresholds). tj onboard --claude-code writes a sensible default [budget.anthropic] usd = 200.

MCP server — 14 tools (up from 13)

Added get_optimize_report.

Fixes

  • Pricing lookup tolerates dated claude-<family>-<ver>-YYYYMMDD model-name suffixes Anthropic ships (e.g. claude-haiku-4-5-20251001).
  • Pricing table: added claude-opus-4-7 and claude-opus-4-5.

Docs

  • README now leads with the tj optimize UX (verbatim output) and embeds five Web UI screenshots.
  • CLAUDE.md gains a new rule codifying the honesty constraint on optimize output.
  • Manual test runbooks updated with steps for the new commands.

Install / upgrade

pip install --upgrade tokenjam
npm install @tokenjam/sdk@0.2.3

Full changelog: v0.2.2...v0.2.3

v0.2.2

12 May 23:31
432dd8f

Highlights

New SDK surface

  • tokenjam.sdk.TokenJamClient — public HTTP client that POSTs a single LLM call as an OTLP JSON span to a running tj serve, without depending on the in-process OTel TracerProvider. Designed to be embedded in foreign codebases — most notably the upstream BerriAI/litellm named-callback machinery, which will let any LiteLLM user enable TokenJam with litellm.success_callback = ["tokenjam"]. The single public method, emit_litellm_span(kwargs, response_obj, start_time, end_time, success), translates LiteLLM's callback payload into provider / model / input / output / cache token attributes, attaches a precomputed cost from kwargs["response_cost"] (or response._hidden_params), and tags the span with tj_agent_id / tj_session_id if supplied via kwargs["metadata"]. Non-blocking by design — every error is logged at debug and the event is dropped, so the client can never propagate an exception into the caller's request path. For in-process tokenjam users, patch_litellm() remains the preferred path. (#61)

Upgrading

pip install --upgrade tokenjam==0.2.2. No breaking changes; the TypeScript SDK is unchanged but bumped in lockstep at @tokenjam/sdk@0.2.2.