Skip to content

bb1nfosec/skim

skim

skim

Stop paying for tokens you never meant to send.

The runtime layer that sits between your AI tools and the LLM API — stripping waste, injecting caching, and showing you exactly where every token goes.


PyPI Downloads Python License Zero deps


⚡ Quickstart  ·  🔍 How it works  ·  📊 Dashboard  ·  🏢 Enterprise  ·  ⌨️ CLI  ·  📚 Docs  ·  ▶️ Live Demo


Note

One env var. Zero code changes. Claude Code reads a package-lock.json — 122k tokens, $0.37 — just to answer a question about a 200-line file. History compounds. Your context window fills silently and quality degrades while you fly blind. skim fixes this in the API call path, in real time.


flowchart LR
    A["🤖 Claude Code<br/>Cursor · your app"] -->|ANTHROPIC_BASE_URL| B1

    subgraph SKIM ["⚡ skim proxy"]
        direction TB
        B1["✂️ strip lock files<br/>& build artifacts"]
        B2["◈ inject prompt caching<br/>50–90% cheaper"]
        B3["🛡️ enforce budgets<br/>hard 429 block"]
        B4["📊 live dashboard<br/>+ local SQLite"]
        B1 --> B2 --> B3 --> B4
    end

    B4 --> C["☁️ Anthropic<br/>OpenAI · Gemini"]

    style A fill:#161920,stroke:#6c63ff,color:#e4e6f0
    style SKIM fill:#0d0f14,stroke:#6c63ff,color:#6c63ff
    style C fill:#161920,stroke:#00d4aa,color:#e4e6f0
    style B1 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B2 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B3 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B4 fill:#161920,stroke:#252a3a,color:#e4e6f0
Loading

⚡ Quickstart

1. Install

pip install skim-llm

2. Start the proxy

skim proxy

Browser opens automatically to your live dashboard.

3. Point your tool at it

export ANTHROPIC_API_KEY=sk-ant-...   # required for Claude Code
export ANTHROPIC_BASE_URL=http://localhost:7474

That's it. Every call now flows through skim.

┌────────────────────────────────────┐
│  skim v0.5.0  — runtime token proxy │
├────────────────────────────────────┤
│  listening  localhost:7474          │
│  dashboard  localhost:7474/dashboard│
│  filtering  ✓ on                    │
│  caching    ✓ on                    │
├────────────────────────────────────┤
│  ⠋ LIVE  waiting for calls...       │
└────────────────────────────────────┘

Tip

skim auto-detects your planx-api-key for API users, Authorization: Bearer for OAuth clients — and routes each accordingly, with full waste filtering and tracking either way.

Warning

Claude Code on a Pro/Max subscription cannot use a local proxy. Subscription traffic ignores ANTHROPIC_BASE_URL and routes straight to Anthropic — the proxy will sit on "waiting for calls". To intercept Claude Code, use API-key auth (export ANTHROPIC_API_KEY=sk-ant-… alongside ANTHROPIC_BASE_URL, in the same shell before launching claude). skim also works as-is with Cursor, the SDK, and any OpenAI-compatible tool.


🔍 How it works

✂️

Waste filtering

Detects lock files, build artifacts & generated code inside tool_result blocks and strips them before they hit your context.

package-lock.json → a 12-token note instead of 122k tokens.

Caching injection

Wraps your system prompt + large context with cache_control automatically.

First call caches it. Every call after is free. CLAUDE.md loads at zero cost on calls 2+.

📊

Live dashboard

Opens in your browser on start. No login, no setup. Persists to ~/.skim/events.db.

Real-time SSE updates — watch tokens & cost as they happen.

Auto-detected waste signatures
File Detected by
package-lock.json "lockfileVersion" + "resolved": "https://"
yarn.lock # yarn lockfile v1 + resolved
pnpm-lock.yaml lockfileVersion: + resolution:
Cargo.lock @generated + [[package]]
poetry.lock @generated + [[package]]
composer.lock "content-hash": + "packages":

Plus anything in your project's .llmignore. Stripped blocks are replaced with a one-line note showing what was removed and how to disable it.

How plan detection works

One method, _auth_type(), owns all routing logic:

_auth_type() → ("apikey", key)    # API plan      → filtering + caching + tracking
             → ("oauth",  token)  # Pro/Max plan  → filtering + tracking (no cache injection)
             → ("", "")           # no auth       → 401

Adding a new plan type (enterprise SSO, team tokens) is a single elif. Caching injection is skipped for Pro/OAuth because the Pro plan manages its own cache layer.


📊 Dashboard

Five fully-built pages. Dark theme, live charts, real-time SSE updates — no refresh button needed.

🟣 Overview ⚡ Sessions 📈 Usage 🤖 Models 💰 Savings
tokens, cost,
savings, cache
full call log,
searchable
hourly +
daily charts
cost/1k,
cache %, waste %
cumulative
savings & ROI
skim proxy              # local dashboard, zero setup, opens in browser

The local dashboard works for everyone — solo devs, Pro users, anyone. Data never leaves your machine unless you explicitly connect a team server.


🏢 Enterprise

Important

Everything below is open-source and self-hosted — same pip package, no paywall, no telemetry.

🛡️ Budget enforcement

Hard-block calls that exceed token/cost limits. Proxy returns 429 before forwarding.

skim admin budget set --owner-type team \
  --owner-id engineering --usd 500 --period monthly

🔔 Webhook alerts

Slack (& Teams) or any HTTP endpoint on budget events.

skim admin webhooks add --channel slack \
  --url https://hooks.slack.com/...

✉️ User invites

Self-registration via single-use links. No manual accounts.

skim admin users invite --email new@corp.com \
  --role user --team platform

🔑 Scoped API keys

ingest · read · admin — with expiry dates and revocation.

👥 RBAC

admin · team_admin · user — enforced data isolation per role.

📋 Audit log

Every sensitive action logged immutably. Queryable by action + date.

skim admin audit --days 30 --action auth.login

📤 Data export

CSV event logs + JSON summaries for accounting & BI.

skim admin export --days 30 --out report.csv
Team deployment in 3 commands
# 1. Run the server (auto-creates admin, uses gunicorn if installed)
pip install 'skim-llm[web]'
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475

# 2. Each developer connects their proxy
export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-...     # generate in Settings

# 3. Manage from anywhere
skim admin users list

Auth: local password · LDAP/AD (SKIM_LDAP_*) · Google/GitHub/Azure/Okta (SKIM_OIDC_*)

Full guide → docs/enterprise.md · docs/deployment.md


⌨️ CLI Reference

🔬 Static analysis  no API key

skim scan       # token cost per file
skim analyze    # detect waste patterns
skim fix        # auto-write .llmignore
skim check      # CI budget gate
skim generate   # .llmignore + CLAUDE.md
skim secrets    # leaked credential scan

⚙️ Runtime & ops

skim proxy      # the interceptor
skim server     # team dashboard + API
skim admin      # manage users/budgets/keys
skim audit      # local operation log
skim hooks      # git pre-commit gate
skim baseline   # token regression checks
Example — skim fix auto-cleanup
  skim fix  —  ./my-project
  ──────────────────────────────────────────────────────
  Before  : 166.8k tokens  (83.4% ctx)  $0.50/session

  Pattern              Severity    Tokens saved  Rules
  ────────────────────────────────────────────────────
  Lock files           HIGH           160.3k     +7
  Test snapshots       MEDIUM           4.1k     +2

  ✓ Written to .llmignore

  After   : 6.5k tokens  (3.2% ctx)  $0.02/session
  Saved   : 160.3k tokens  (96.1% reduction)  $0.48/session
  Now     : 51 sessions / $1

🐍 Python API

from adapters import ClaudeAdapter

claude = ClaudeAdapter(
    model="claude-sonnet-4-6",
    system_prompt="You are a terse coding assistant.",
    enable_caching=True,          # prompt caching, automatic
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037

Adapters: ClaudeAdapter · OpenAIAdapter · GeminiAdapter · OllamaAdapter


📦 Install

pip install skim-llm                    # core — zero hard deps
pip install 'skim-llm[tiktoken]'        # accurate token counting
pip install 'skim-llm[web]'             # dashboard server
pip install 'skim-llm[web,sso,ldap]'    # enterprise auth
pip install 'skim-llm[all]'             # everything

📚 Documentation

Guide What it covers
Quickstart Zero to running in 2 minutes
Proxy Deep-dive — every feature, every flag
Dashboard Local & team dashboards
Enterprise Budgets, webhooks, invites, RBAC, audit
Admin CLI skim admin complete reference
REST API All 31 endpoints with schemas
Configuration Every env var & .skimrc option
Deployment Docker, systemd, nginx, scaling
MCP Setup Claude Desktop integration

🔌 MCP Server

{ "mcpServers": { "skim": { "command": "skim-mcp" } } }

Tools: scan_tokens · analyze_context · check_budget · fix_context · generate_llmignore



GitHub · PyPI · Issues · Changelog · Live Demo

Built for developers who'd rather not pay for noise. · MIT License

⭐ Star the repo if skim saved you some tokens.

About

Runtime token proxy + intelligence dashboard for LLM tools. Intercepts API calls, strips waste, tracks costs — for individual developers and Fortune 500 teams.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors