Releases: Metabuilder-Labs/tokenjam
v0.4.1 — concurrency + cycle-aware run-rate + Reuse HTTP fallback
A quality + follow-up release on top of v0.4.0's marquee. Six issues closed, each as a focused PR that landed cleanly. Pre-release pass: 10/10 PASS, 0 FAIL, 0 UNCLEAR.
🔧 Daemon DB concurrency — the Overview no longer crashes under fan-out
DuckDBBackend.conn is now a per-thread DuckDB cursor (threading.local) over one shared database. Cursors from connect().cursor() are independent connections safe for concurrent use across threads, which is the documented DuckDB pattern.
Pre-fix symptom: with tj serve running, the Overview's parallel endpoint fan-out + Starlette's sync-route threadpool would race on a single shared connection and SIGABRT the daemon. We worked around it by fetching the Overview's panels sequentially. That workaround is now gone — the Overview fetches all six panels via a single Promise.all with deliberately asymmetric error handling (/cost is load-bearing and rejects on failure; the other five panels each carry a .catch fallback so one failing panel renders empty rather than blanking the whole screen).
Empirically verified: 90 concurrent reads against the Overview endpoint set complete with zero errors. Pre-fix this would crash the process. Issue #124, PR #161.
🗓 Cycle-aware run-rate
tj cost and the Lens spend chart's run-rate projection now honor [budget.<provider>] cycle_start_day when configured. Calendar-month users (the default) still see "by end of June"; users with a billing cycle that starts mid-month see "by Jul 15" — a dated form, because "by end of July" would mislead when the cycle ends mid-month.
core/cycle.py is shared between the existing budget_projection analyzer and the cost-API caption, so both surfaces use one piece of cycle math. The API exposes a cycle block on /api/v1/cost; the UI consumes it. Issue #138, PR #158.
📈 tj cost chart no longer silently empties on long windows
buildCostSeries previously had if (xs.length > 5000) return null as a guard against pathological grids — fine for a 90-day cap today, but if anyone ever requests hourly bucketing over a multi-year window the chart would just go blank with no explanation.
It now coarsens up an hour → day → week ladder until the grid fits, computed closed-form so an oversized array is never allocated. When coarsening fires, a footnote on the chart explains it: "Showing day buckets — this range is too long for hourly detail." Genuinely empty windows still render the existing "No spend in this window" state. Issue #139, PR #163.
📊 Status: active (compute) time alongside wall-clock
Pre-fix, a resumed Claude Code session that ran across days showed Duration: 3087m — easy to misread as a runaway. v0.4.1 distinguishes:
- Active — Σ span durations. The work actually done.
- Elapsed — wall-clock from session start to last activity. Resumed Claude Code sessions span days; this can be far larger than Active.
Visible in tj status (Duration: active 12m 3s · elapsed 2d 3h), in tj status --json (both fields), and in the Lens Status tile (two distinct rows with tooltips). A new fmtDurLong formatter renders multi-day spans as 2d 3h instead of 3087m. Issue #147, PR #164.
🔁 tj report --reuse works while the daemon is running
The Reuse report needed direct DB access to fetch each cluster's planning-call completion text — so when tj serve held the write lock, the command errored out and pointed the user at tj stop. v0.4.1 adds a dedicated GET /api/v1/reuse/clusters endpoint that returns the Reuse finding plus the skeleton-rendering extras (planning_texts + pricing_mode).
tj report --reuse now dispatches like tj optimize: direct DB connection when available, ApiBackend.fetch_reuse_clusters when the daemon owns the lock. Renderer accepts pre-fetched planning texts as an alternative to a connection. The endpoint is dedicated rather than bolted onto /optimize because per-cluster planning text can be many KB and the Overview polls /optimize every 30s — we don't make every poll pay for report-only data.
tj report --trim retains the same direct-DB limitation, now explicitly documented in CLAUDE.md. Issue #154, PR #165.
🎨 Recoverable Waste tile consistency
Three small but visible inconsistencies on the Lens Overview's tile band:
reuserendered lowercase while the other analyzer names were title-cased. Added an explicitANALYZER_METAentry plus a centralizedcapitalize()helper used in the fallback path — so the next analyzer that ships will auto-capitalize instead of falling through to its raw lowercase registry key.- Trim's
— not readyhad a leading em-dash while the other tile states didn't. Dropped to plainNot readyso the three states share a prefix-free scheme. - Cache title bold investigated and locked with a regression guard. Couldn't reproduce in current source, but the guard test now asserts no state-specific bold-title rule can be silently added.
Upgrade
pipx upgrade tokenjam
tj stop && tj serve &
tj --version # expect 0.4.1Existing installs keep their config, data, and daemon setup. No breaking changes.
Pre-release verification
A 10-step focused pre-release pass (tests/agent-pre-release-v0.4.1.md) was executed by a sub-agent against the live daemon. Result: 10/10 PASS, 0 FAIL, 0 UNCLEAR.
Coverage included the marquee #124 concurrency reproduction (90 concurrent reads, no crash), /api/v1/cost.cycle block, tj report --reuse with daemon running, the new active/elapsed status fields, the 90d cost chart render, and the tile consistency fixes — plus regression checks on TokenMaxx, all five analyzers, and the API framing block.
Full log committed to tests/results/agent-pre-release-v0.4.1-20260620T002546Z.md as a release-time record.
Honesty discipline
The release continues the v0.4.0 framing rules:
- Run-rate captions remain "linear, not a forecast" — no smoothing, no seasonality, no anomaly bands; just a date-cycle projection
- Coarsened chart windows are explicit about the coarsening, never silent
- The Reuse HTTP fallback is documented as paying-for-what-you-use (dedicated endpoint instead of bolting it onto
/optimize); no hidden cost on Overview polls - Status
ActivevsElapsedlabels distinguish work-done from wall-clock; the misleading bare "Duration" label is gone
Full changelog
v0.4.0 — TokenJam Lens + Reuse
First minor bump since v0.3.0. Significant new product surface — substantially more than the 0.3.x patch cadence — so we bumped the minor.
🔭 TokenJam Lens — the local UI is a product
The local dashboard you get from tj serve has been rebranded and rebuilt as TokenJam Lens. It's the same offline-first single-file SPA, but with a real triage front door instead of a list of tables.
- Overview screen — the new default landing route. Three bands: spend hero (with a real chart and a "to end of month" run-rate projection that's explicitly not a forecast), recoverable-waste tiles (one per analyzer, registry-driven so future analyzers auto-appear), and health-at-a-glance (alerts, drift, budgets, recent activity).
- Optimize detail tab — every analyzer's findings rendered in one place, with
?finding=<name>deep-links from the Overview tiles. - Real charts — uPlot, vendored offline (zero CDN loads, the dashboard works air-gapped). Hover tooltips, theme-aware (light/dark/system), tick granularity scales with the window (hours/days/weeks).
- URL state as the source of truth — every filter (window, group-by, agent, finding) lives in the hash, so back/forward/reload work cleanly and cross-screen drill-through carries context.
- Cost transparency — the
tj costtable and the Cost screen now show CACHE R and CACHE W columns alongside input/output tokens, so a cache-heavy$1.44span no longer looks like it came out of nowhere. (The hidden ~91% cost driver on Claude Code-style workloads finally has a name.)
🔁 Reuse — the 5th analyzer
Agents re-plan the same work constantly. Reuse detects clusters of sessions that share a planning skeleton (the first LLM call before any tool call) and surfaces what that repeated planning costs.
tj optimize reuse— clusters sessions by structural similarity (tool-sequence signature, plus prompt-prefix hashing when[capture] prompts = true); produces two honest numbers per cluster: cache-reuse savings (what you'd recover by reusing the skeleton) and script-replacement savings (the upper bound if you converted the planning to a deterministic template).tj report --reuse— renders the clusters as an HTML report plus per-cluster Markdown sidecars with variable slots highlighted ({{slot_N}}), idempotent oncluster_idso re-runs overwrite cleanly. The Markdown is copy-paste-usable as a Claude Code slash command or saved prompt.- Honesty by construction: "skeleton match," "recoverable," "review before reusing" — never "saves you."
Known limitation: tj report --reuse needs direct DB access today, so tj stop first if the daemon is running. HTTP fallback is tracked for v0.4.1 (#154).
💰 Cost transparency
cache_write_tokensis now surfaced everywhere — intj cost, on the web Cost screen, in the trace-detail view, and via/api/v1/cost. Previously it was billed but invisible above the DB layer.- Plan-tier-aware rendering —
core/framing.pyis the single source of truth for whether to show dollars (api), token-share (subscription), tokens only (local), or a "may overstate" qualifier (unknown). The CLI and the REST API consume it identically. - Analyzer recoverable contract — every savings analyzer now emits
estimated_recoverable_usd+estimated_recoverable_tokens+estimate_basis+estimate_confidenceon a single time basis (window), so Overview tiles are directly comparable. - Honest run-rate — the Lens chart and
tj costprojection use a window-average run-rate × days remaining in cycle, captioned "linear run-rate, not a forecast." No EWMA, no seasonality, no anomaly bands.
🔒 Security
- A committed
.tj/config.tomlwith a liveingest_secret(in repo since v0.2.0) is now untracked. Limited blast radius — local network ingest token only — but a CI test now guards against re-staging it. See PR #145 for the full advisory.
🧹 Quality
- 17 v0.3.5 post-release findings closed (#141) from an external contributor; 5 rounds of Lens UI bug fixes; 9 individual fix PRs across the release. The full pre-release smoke pass ran twice and is committed under
tests/results/as a record.
Honesty discipline
This release is the most public-facing one yet. The framing language is deliberate everywhere:
- Recoverable amounts are estimated, never saved.
- Cache hits at 100% efficacy show "✓ Already optimized," not "no findings."
- Subscription users see token-share framing, not dollar figures.
- Forecasting is bounded to a single linear projection captioned "not a forecast."
Upgrade
pipx upgrade tokenjam
tj stop && tj serve &
tj --version # expect 0.4.0Existing installs keep their config, data, and daemon setup. Open the dashboard:
open http://127.0.0.1:7391/…and you'll land on the new Overview.
Full changelog
v0.3.5
First-run polish + bug fixes surfaced during the v0.3.5 pre-release playbook. No breaking changes — pipx upgrade tokenjam, then tj stop && tj serve & to reload the daemon.
Bug fixes
- #101 —
tj mcpworks out of the box on a fresh install.fastmcpmoved from the[mcp]extra into base dependencies, sopipx install tokenjamis enough to wire TokenJam into Claude Code without remembering an extra. The[mcp]extra is kept as a no-op for back-compat. The MCP server now also raises a clean, actionableImportErrorpointing at the fix iffastmcpis somehow missing. - #98 — "No pricing data" warning no longer spams during backfill. Warns once per
(provider, model)per process. Verified against a 20,000-span Claude Code backfill: zero warnings where pre-0.3.5 emitted hundreds. Deprecated Anthropic base models (claude-sonnet-4,claude-opus-4,claude-opus-4-1,claude-haiku-3-5) added topricing/models.tomlso dated variants likeclaude-sonnet-4-20250514resolve correctly via the YYYYMMDD-stripping fallback instead of falling through to defaults. - #106 — UI footer no longer shows a 9-release-old version.
tokenjam/__init__.pynow reads fromimportlib.metadata.version("tokenjam")(single source of truth =pyproject.toml). NewGET /api/v1/versionendpoint; the UI footer fetches it on init. Same-origin — offline-UI guarantee preserved. - #106 —
GET /healthendpoint added as a conventional uptime probe. Returns{"status":"ok","version":"..."}./api/v1/statuscontinues to be the agent overview. - #106 —
tj tokenmaxxplan-multiplier renders from project subdirs. When run from a directory whose.tj/config.tomlhas no[budget]section,_config_declared_plannow falls back to reading~/.config/tj/config.tomldirectly. Previously dropped silently to api-pricing framing even when the user hadplan = "max_5x"configured globally viatj onboard. - #105 —
tj report --trimnot-ready hint renders[capture]literally. Rich's print parser was silently swallowing the[capture]substring as an invalid style tag, hiding the section name the user needs to enable.
DX
tj --helpepilog now shows the canonical upgrade incantation (pipx upgrade tokenjam→tj stop && tj serve &→tj --version).docs/installation.mddocuments that[mcp]is no longer needed.- Pre-release and post-release manual test playbooks updated to cover the new surface (six-tier tokenmaxx ladder, offline-UI DevTools check, cache cost-correctness verification).
Upgrade
pipx upgrade tokenjam
tj stop && tj serve &
tj --version # expect 0.3.5Full changelog
v0.3.4 — Six-tier ladder + cache cost-accuracy fixes + offline UI
A mix of a user-visible product change (the tokenmaxx tier rename + expansion) and three credibility-grade infrastructure fixes — two for cost accuracy on the cache path, one for the local-first promise.
TokenMaxx ladder expanded to 6 tiers
The top tier was previously TokenGigaChad at 20×+ — but almost every Claude Code power user lands there, so the headline lost its bite. Two changes:
- Top tier raised to 50×+ so it reflects genuinely extreme usage, not everyday heavy use
- New tier in the middle for the 20–50× range
- Top two tiers renamed to drop the Chad branding
| Multiplier (subscription users) | Absolute /mo (API users) | Tier |
|---|---|---|
| < 1× | < $100 | 💧 TokenSipper |
| 1× – 4× | $100 – $400 | 🥱 TokenModerator |
| 4× – 10× | $400 – $1,000 | 💸 TokenMaxxer |
| 10× – 20× | $1,000 – $2,000 | 🔥 TokenSuperMaxxer (was TokenChad) |
| 20× – 50× | $2,000 – $5,000 | 🔥🔥 TokenMegaMaxxer (new) |
| 50×+ | $5,000+ | 🔥🔥🔥 TokenGigaMaxxer (was TokenGigaChad) |
Breaking note: the JSON output's `tier` field carries the new label string verbatim. Any consumer scripting against `TokenChad` / `TokenGigaChad` in `tj tokenmaxx --json` must update.
Cost-accuracy fixes (cache path)
Two related cache-billing fixes from community contributor @sjhddh, plus a follow-up to persist the raw counts.
Cache-only spans no longer costed at $0
A prompt-cache hit (`input_tokens=0`, `output_tokens=0`, but `cache_read_tokens > 0`) bills the cache-read rate, but `calculate_cost()` and `CostEngine.process_span()` were short-circuiting on input/output token counts alone — dropping the span as a no-op. The better your caching, the more cost went missing. The early-return guards now check all four token counts. (PR #90, kudos @sjhddh.)
Cache-creation tokens now costed on the live OTLP ingest path
The SDK integrations emit `gen_ai.usage.cache_creation_tokens` and the pricing table carried a `cache_write_per_mtok` rate, but the live parsers (`parse_otlp_span` + `convert_otel_span`) only read cache-read tokens. So every cache-write emitted via SDK was silently dropped, and the higher-rate cost never charged. `NormalizedSpan` now carries `cache_write_tokens`; both parsers populate it; `process_span` charges it. (PR #92, also @sjhddh.)
`cache_write_tokens` now persisted and threaded everywhere
- New `cache_write_tokens` column on the `spans` table (migration 5)
- The 3 remaining ingest paths (Langfuse, Helicone, Claude Code log adapter) now thread cache-creation tokens through `NormalizedSpan` for consistency with the live OTLP path
- Codex's `cached_token_count` deliberately not mapped to `cache_write` — OpenAI's automatic prompt caching only bills cache-reads at a discount and has no separate cache-creation billing
The `cache` analyzer can now compute creation vs read breakdowns from real data. (PR #95, closes #93 + #94.)
Web UI works fully offline
The `tj serve` dashboard at `http://127.0.0.1:7391/\` was loading three things from external CDNs at render time, which broke the "local-first, no data egress" promise for any user running TokenJam in an air-gapped environment to verify exactly that claim:
- Favicon SVG from `tokenjam.dev` → now inlined as a `data:` URL
- Geist + Geist Mono fonts from `fonts.googleapis.com` → removed; system-font fallbacks already in CSS
- Preact + hooks + htm from `esm.sh` → vendored under `tokenjam/ui/vendor/`, served via FastAPI `StaticFiles`, wired up via `<script type="importmap">` (JS source unchanged)
New regression tests catch any future external-URL slip. (PR #88, closes #87.)
`pipx install tokenjam` is the recommended install path
`pip install tokenjam` failed on macOS with Homebrew Python (PEP 668) and Debian 12+ / Ubuntu 24+. The unhelpful error broke the 3-command quickstart flow promised on `tokenjam.dev/tokenmaxxing`. All install snippets across the README, `docs/`, blog tutorials, and `examples/` now lead with `pipx install tokenjam`. Cross-platform pipx-install fallback table includes macOS / Debian/Ubuntu / Windows / generic. (PR #88 + PR #89, closes #86.)
Install
```
pipx install tokenjam==0.3.4
```
TypeScript SDK in lockstep as `@tokenjam/sdk@0.3.4`.
Thanks to @sjhddh for two excellent cost-accuracy fixes (PRs #90 + #92) and to everyone who flagged install + offline-UI issues during the 0.3.x rollout.
v0.3.3 — TokenMaxx report polish + Opus 4.5 pricing
A launch-readiness polish release for the tj tokenmaxx social moment, plus one more pricing-accuracy fix from a community contributor.
TokenMaxx Report — visual + structural polish
The tj tokenmaxx output is now a bordered report panel designed to be a clean screenshot artifact:
╭─ TokenJam TokenMaxxing Report ──────────────────────────────────────────────╮
│ │
│ 🔥🔥 You're a TokenGigaChad. │
│ │
│ Touch grass. Then run tj optimize. │
│ │
│ \$4056.82 in last 30d across 33 sessions. │
│ That's 40.6× your Max 5x plan cost (\$100/mo flat). │
│ │
│ 💡 No obvious savings flagged yet — run tj optimize for the full report │
│ once you have more data. │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Share your tier: screenshot the above and tag @tokenjamdev
- Spend now renders at 2 decimals (
\$4056.82, was\$4056.8200) - Plan fee strips decimals when whole-dollar (
\$100, was\$100.0000) - `tj optimize` rendered bold-green wherever it appears
- Share line in teal, points at `@tokenjamdev`
TokenMaxx — plan-relative tier ladder
Tiers are now based on the multiplier vs your plan cost, so the tier name means the same thing across Pro / Max-5x / Max-20x users:
| Multiplier (subscription) | Absolute /mo (API) | Tier |
|---|---|---|
| < 1× | < $100 | 💧 TokenSipper |
| 1× – 4× | $100 – $400 | 🥱 TokenModerator |
| 4× – 10× | $400 – $1000 | 💸 TokenMaxxer |
| 10× – 20× | $1000 – $2000 | 🔥 TokenChad |
| 20×+ | $2000+ | 🔥🔥 TokenGigaChad |
API users (no plan to multiply against) fall back to absolute USD thresholds, calibrated against Max-5x = $100/mo so the tier name carries the same meaning in either world. A Pro user at 15× their plan and a Max-5x user at 15× their plan are both TokenChads — the tier reflects "how hard you're maxxing," not raw spend.
Pricing fix: Claude Opus 4.5
tokenjam/pricing/models.toml had Opus 4.5 at the old \$15 / \$75 tier. Anthropic moved 4.5 to \$5 / \$25 (same tier as 4.6 / 4.7 / 4.8). Users on 4.5 were seeing ~3× inflated cost figures; fixed in this release.
The repo-root pricing/models.toml (orphaned since v0.1.x) was also removed — the runtime only reads tokenjam/pricing/models.toml, and the duplicate file was confusing contributors. CLAUDE.md, CONTRIBUTING.md, and the fallback warning string all now point at the real path.
Thanks to @kelter-antunes for the catch and the dedupe.
Install
```
pip install tokenjam==0.3.3
```
TypeScript SDK in lockstep as `@tokenjam/sdk@0.3.3`.
v0.3.2 — TokenMaxx + Opus pricing accuracy
A user-facing feature, a cost-accuracy fix from a community contributor, and an upgrade-safe pricing escape hatch.
New: tj tokenmaxx
A shareable spend-tier command. Reads your last 30 days of spend, classifies it into an ironic tier (TokenSipper / TokenModerator / TokenMaxxer / TokenChad / TokenGigaChad), and surfaces the downsize savings figure inline so the score is always paired with an action.
🔥🔥 You're a TokenGigaChad.
"Touch grass. Then run `tj optimize`."
$3502.72 in last 30d across 33 sessions.
That's 35.0× your Max 5x plan cost ($100.00/mo flat).
💡 $340/mo of that looks recoverable. Run `tj optimize` to see candidates.
The tier ladder, monthly USD spend:
| Spend | Tier | One-liner |
|---|---|---|
| < $50 | 💧 TokenSipper | "Are you even using AI?" |
| $50–$200 | 🥱 TokenModerator | "Mostly reasonable. Try harder." |
| $200–$500 | 💸 TokenMaxxer | "You're paying Anthropic's rent." |
| $500–$1500 | 🔥 TokenChad | "You're paying their interns' rent too." |
| $1500+ | 🔥🔥 TokenGigaChad | "Touch grass. Then run tj optimize." |
Plan-aware: when [budget.<provider>] plan = "max_5x" (or pro / plus / max_20x) is declared, the output renders the multiplier vs the plan's flat fee — the figure that actually travels socially. API users see absolute spend; team / enterprise see plan label only (contract-priced fees).
--json flag for machine-readable output.
Pricing fix: Claude Opus 4.5 / 4.6 / 4.7 / 4.8
The packaged pricing table had Claude Opus models at the old $15 / $75 per MTok tier. Anthropic dropped Opus 4.5 onward to $5 / $25 per MTok (with $0.50 cache read, $6.25 5-minute cache write). The packaged rates now match Anthropic's published pricing exactly. Users running Opus 4.5–4.8 were seeing ~3× inflated cost figures; this fixes that.
Verified against platform.claude.com/docs/en/docs/about-claude/pricing. New regression tests guard against future drift.
Thanks to @kelter-antunes for catching and fixing this.
New: User pricing override file
You can now override packaged rates without editing site-packages (which pip install --upgrade clobbers). Resolution order:
TJ_PRICING_FILEenv var (if set)~/.config/tj/pricing.toml(if it exists)
Override entries are merged per provider/model over the packaged table — same TOML schema. Missing or malformed override files log a warning and fall back to packaged rates; never breaks cost calculation.
# ~/.config/tj/pricing.toml
[anthropic.some-future-model]
input_per_mtok = 4.00
output_per_mtok = 20.00
cache_read_per_mtok = 0.40
cache_write_per_mtok = 5.00Thanks to @kelter-antunes for this too.
Install
pip install tokenjam==0.3.2
TypeScript SDK in lockstep as @tokenjam/sdk@0.3.2.
v0.3.1 — Optimize CLI ergonomics
Fast-follow on v0.3.0 to make the optimize CLI match how users think and read about the products.
CLI changes
Positional analyzer args (was: --finding NAME)
tj optimize # run all (unchanged)
tj optimize downsize # run one
tj optimize downsize cache trim # run several
The old --finding NAME flag is removed. There are no aliases.
Analyzer names match the product names
| Old | New |
|---|---|
model-downgrade |
downsize |
cache-efficacy |
cache |
workflow-restructure |
script |
prompt-bloat |
trim |
cache-recommend (sub-finding of Cache) and budget-projection (infra concept) keep their names.
tj report --bloat → tj report --trim
Same motivation — "bloat" wasn't used anywhere outside the CLI.
What's unchanged
Honesty discipline: MODEL_DOWNGRADE_CAVEAT and all "structural match — review before applying" framing stay in place. This is a CLI rename only — no behavior or rendering changes.
Install
pip install tokenjam==0.3.1
TypeScript SDK published in lockstep as @tokenjam/sdk@0.3.1.
Closes #74.
v0.3.0 — Layer-9 cost optimization product
TokenJam pivots from "observability for AI agents" to a focused cost-optimization product. The OTel-native ingest pipeline and local-first architecture stay; four named analyzers ship on top.
New: cost-optimization analyzers
- Downsize (
tj optimize --finding model-downgrade) — structural candidate detection for cheaper-model routing. Honesty-disciplined: every flagged session is labeled "structural match — review before switching," never "safe to downgrade." - Cache (
cache-efficacy+cache-recommend) — current caching ratio per (provider, model) and Anthropic-only breakpoint suggestions for stable prefixes. - Script (
workflow-restructure) — clusters of deterministic (tool_name, arg_shape) sessions that look replaceable with a script. - Trim (
prompt-bloat) — LLMLingua-2 token-significance classifier behind the optionaltokenjam[bloat]extra (~2GB torch + transformers).
All analyzers self-register via @register("name") in tokenjam/core/optimize/analyzers/. Run all with tj optimize; scope to one with --finding <name> (repeatable).
New: backfill adapters
tj backfill langfuse|helicone|otlp — ingest from external observability platforms via live API or JSON dump. Idempotent re-runs via deterministic span IDs. Joins existing tj backfill claude-code.
New: HTTP API for analyzers
/api/v1/optimize + /api/v1/cost/compare so tj optimize works alongside a running tj serve (previously crashed on the DuckDB write-lock).
New: read-only policy preview
tj policy list consolidates [alerts], [capture], [budget.<provider>], per-agent budget / drift / sensitive_actions / output_schema config into one table. The unified add | edit | apply surface lands next sprint.
New: honest-output rendering
Plan-tier-aware optimize output:
- API users — dollar-denominated savings projections (unchanged)
- Subscription users — implied API value + token-share framing; never dollar "spend"
- Local users — token-only framing for capacity planning
- Unknown-plan users — dollar figures suppressed with a
tj onboard --reconfigurehint
tj optimize --export-config claude-code writes a JSONC routing snippet (with the structural-heuristic caveat baked in as comments) to ~/.config/tokenjam/exports/. Never touches ~/.claude/settings.json.
Codex CLI integration
tj onboard --codex writes [otel] + [mcp_servers.tj] to ~/.codex/config.toml. The new /v1/logs endpoint normalizes Codex event logs (sse_event, user_prompt, tool_decision, etc.) into spans for cost / drift / alerting.
Notable fixes
- SDK: fail-loud ERROR-level logging on 401 span exports with the configured-secret fingerprint, so silent data-loss is impossible. Was previously a single low-volume warning per batch.
- API:
/metricsaggregates byagent_idbefore emitting Prometheus rows. Previously emitted duplicate label sets that broke strict scrapers. - Doctor:
tj doctorno longer reports "DuckDB not writable" when the daemon legitimately holds the write lock. - Onboard: explicit error on bare
--reconfigure(was a silent early-return); secret-divergence warning when project-local and global configs disagree. - Drift / budget / backfill: every CLI command now works under both direct-DB and API-shim modes consistently.
- Optimize:
--compare last-7dandlast-30dnow override--sinceso the analysis window matches the comparison period (was 30d-vs-30d when--sincedefaulted to 30d). - Export: routing snippet now written as
.jsoncwith properly-indented comments — parseable by strict JSONC tooling. - Policy list:
--jsonaccepted as both root flag and command-level flag;[capture]row always shown (even when all toggles off — an explicit "off" is still a policy choice).
Install
pip install tokenjam==0.3.0
# Optional Trim analyzer (large download — pulls torch + transformers):
pip install 'tokenjam[bloat]==0.3.0'
TypeScript SDK published in lockstep as @tokenjam/sdk@0.3.0.
Acknowledgements
This release ran the full v0.3.x manual pre-release playbook before tagging — 14 sections covering analyzers, backfill, period comparison, config export, policy preview, server + HTTP fallback, web UI, and cleanup. The 8 polish findings surfaced during the playbook are all addressed in this release.
v0.2.3 — tj optimize
Features
tj optimize — cost-saving recommendations from existing data
Two analyzers run over your captured spans:
- Model-downgrade candidates — flags sessions whose structural shape (short input, short output, few tool calls) matches a class of work where a cheaper model in the same provider family is worth reviewing. Surfaces example traces; never claims quality equivalence (the caveat line is in the dataclass default so it can't be removed by accident).
- Budget projection — per-provider monthly projection against any
[budget.<provider>]ceiling. Scopes spend by provider, shows exhaustion date, projected overage, and what the run rate would drop to if you acted on the downgrade candidates.
tj optimize # both analyzers, last 30d
tj optimize --only budget
tj optimize --budget anthropic --budget-usd 50
tj optimize --json
Runs alongside a live tj serve via a read-only DuckDB fallback. Also exposed as the new get_optimize_report MCP tool — your coding agent can ask "where could I save money?" mid-session.
tj backfill claude-code
Reads ~/.claude/projects/*.jsonl and ingests historical sessions into the local DB. Idempotent (deterministic span IDs). Auto-invoked at the end of tj onboard --claude-code so first-time users have history immediately and tj optimize returns real numbers on first run.
[budget.<provider>] config
New TOML section for periodic monthly budgets. Distinct from [defaults.budget] / [agents.X.budget] (per-agent alert thresholds). tj onboard --claude-code writes a sensible default [budget.anthropic] usd = 200.
MCP server — 14 tools (up from 13)
Added get_optimize_report.
Fixes
- Pricing lookup tolerates dated
claude-<family>-<ver>-YYYYMMDDmodel-name suffixes Anthropic ships (e.g.claude-haiku-4-5-20251001). - Pricing table: added
claude-opus-4-7andclaude-opus-4-5.
Docs
- README now leads with the
tj optimizeUX (verbatim output) and embeds five Web UI screenshots. - CLAUDE.md gains a new rule codifying the honesty constraint on optimize output.
- Manual test runbooks updated with steps for the new commands.
Install / upgrade
pip install --upgrade tokenjam
npm install @tokenjam/sdk@0.2.3
Full changelog: v0.2.2...v0.2.3
v0.2.2
Highlights
New SDK surface
tokenjam.sdk.TokenJamClient— public HTTP client that POSTs a single LLM call as an OTLP JSON span to a runningtj serve, without depending on the in-process OTel TracerProvider. Designed to be embedded in foreign codebases — most notably the upstream BerriAI/litellm named-callback machinery, which will let any LiteLLM user enable TokenJam withlitellm.success_callback = ["tokenjam"]. The single public method,emit_litellm_span(kwargs, response_obj, start_time, end_time, success), translates LiteLLM's callback payload into provider / model / input / output / cache token attributes, attaches a precomputed cost fromkwargs["response_cost"](orresponse._hidden_params), and tags the span withtj_agent_id/tj_session_idif supplied viakwargs["metadata"]. Non-blocking by design — every error is logged atdebugand the event is dropped, so the client can never propagate an exception into the caller's request path. For in-process tokenjam users,patch_litellm()remains the preferred path. (#61)
Upgrading
pip install --upgrade tokenjam==0.2.2. No breaking changes; the TypeScript SDK is unchanged but bumped in lockstep at @tokenjam/sdk@0.2.2.