The runtime layer that sits between your AI tools and the LLM API — stripping waste, injecting caching, and showing you exactly where every token goes.
⚡ Quickstart · 🔍 How it works · 📊 Dashboard · 🏢 Enterprise · ⌨️ CLI · 📚 Docs
Note
One env var. Zero code changes. Claude Code reads a package-lock.json — 122k tokens, $0.37 — just to answer a question about a 200-line file. History compounds. Your context window fills silently and quality degrades while you fly blind. skim fixes this in the API call path, in real time.
flowchart LR
A["🤖 Claude Code<br/>Cursor · your app"] -->|ANTHROPIC_BASE_URL| B1
subgraph SKIM ["⚡ skim proxy"]
direction TB
B1["✂️ strip lock files<br/>& build artifacts"]
B2["◈ inject prompt caching<br/>50–90% cheaper"]
B3["🛡️ enforce budgets<br/>hard 429 block"]
B4["📊 live dashboard<br/>+ local SQLite"]
B1 --> B2 --> B3 --> B4
end
B4 --> C["☁️ Anthropic<br/>OpenAI · Gemini"]
style A fill:#161920,stroke:#6c63ff,color:#e4e6f0
style SKIM fill:#0d0f14,stroke:#6c63ff,color:#6c63ff
style C fill:#161920,stroke:#00d4aa,color:#e4e6f0
style B1 fill:#161920,stroke:#252a3a,color:#e4e6f0
style B2 fill:#161920,stroke:#252a3a,color:#e4e6f0
style B3 fill:#161920,stroke:#252a3a,color:#e4e6f0
style B4 fill:#161920,stroke:#252a3a,color:#e4e6f0
1. Install

    pip install skim-llm

2. Start the proxy

    skim proxy

    Browser opens automatically to your live dashboard.

3. Point your tool at it

    export ANTHROPIC_API_KEY=sk-ant-... # required for Claude Code
    export ANTHROPIC_BASE_URL=http://localhost:7474

That's it. Every call now flows through skim.
Tip
skim auto-detects your plan — x-api-key for API users, Authorization: Bearer for OAuth clients — and routes each accordingly, with full waste filtering and tracking either way.
Warning
Claude Code on a Pro/Max subscription cannot use a local proxy. Subscription traffic ignores ANTHROPIC_BASE_URL and routes straight to Anthropic — the proxy will sit on "waiting for calls". To intercept Claude Code, use API-key auth (export ANTHROPIC_API_KEY=sk-ant-… alongside ANTHROPIC_BASE_URL, in the same shell before launching claude). skim also works as-is with Cursor, the SDK, and any OpenAI-compatible tool.
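The setup described in the warning can be sanity-checked before launching a tool. This is a minimal sketch that assumes only what this README states (the two variable names and an HTTP base URL); the helper `proxy_env_problems` is hypothetical and not part of skim.

```python
from urllib.parse import urlparse


def proxy_env_problems(env):
    """Return a list of human-readable problems with the proxy-related env vars."""
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set (API-key auth is required)")
    base_url = env.get("ANTHROPIC_BASE_URL", "")
    if not base_url:
        problems.append("ANTHROPIC_BASE_URL is not set (calls will bypass the proxy)")
    else:
        parsed = urlparse(base_url)
        # A usable base URL needs an http(s) scheme and a host
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"ANTHROPIC_BASE_URL is not a valid URL: {base_url!r}")
    return problems
```

Calling `proxy_env_problems(os.environ)` in the shell that will launch `claude` returns an empty list when both variables are present and well-formed.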
| Feature | What it does |
|---|---|
| Waste filtering | Detects lock files, build artifacts & generated code inside |
| Caching injection | Marks your system prompt + large context for prompt caching. The first call writes the cache; later calls read it at a fraction of the normal input price, so CLAUDE.md costs far less on calls 2+. |
| Live dashboard | Opens in your browser on start. No login, no setup. Persists to local SQLite. Real-time SSE updates: watch tokens & cost as they happen. |
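Caching injection can be pictured as a small rewrite of the request body before it is forwarded. This is a minimal sketch, assuming the proxy relies on Anthropic's documented `cache_control` field on system blocks; the function name `inject_cache_control` is hypothetical, and skim's actual implementation may differ.

```python
import copy


def inject_cache_control(body):
    """Return a copy of a Messages API request body with a cache
    breakpoint on the last system block."""
    patched = copy.deepcopy(body)  # never mutate the caller's request
    system = patched.get("system")
    if not system:
        return patched  # no system prompt, nothing to cache
    if isinstance(system, str):
        # Promote the plain-string form to the block-list form
        system = [{"type": "text", "text": system}]
    system[-1]["cache_control"] = {"type": "ephemeral"}
    patched["system"] = system
    return patched
```

Prompts shorter than the API's minimum cacheable length are processed without caching, so the marker is harmless on small requests.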
Auto-detected waste signatures
| File | Detected by |
|---|---|
| package-lock.json | `"lockfileVersion"` + `"resolved": "https://"` |
| yarn.lock | `# yarn lockfile v1` + `resolved` |
| pnpm-lock.yaml | `lockfileVersion:` + `resolution:` |
| Cargo.lock | `@generated` + `[[package]]` |
| poetry.lock | `@generated` + `[[package]]` |
| composer.lock | `"content-hash":` + `"packages":` |
Plus anything in your project's .llmignore. Stripped blocks are replaced with a one-line note showing what was removed and how to disable it.
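The signature table can be read as "a block is waste when both markers appear in it". Here is a rough sketch of that rule, not skim's actual detector: the function names and the wording of the replacement note are invented for illustration, and the closing quote on the `https://` marker is dropped on the assumption that it is meant as a URL prefix.

```python
# Marker pairs taken from the table above; a block matches only if both appear.
SIGNATURES = {
    "package-lock.json": ('"lockfileVersion"', '"resolved": "https://'),
    "yarn.lock": ("# yarn lockfile v1", "resolved"),
    "pnpm-lock.yaml": ("lockfileVersion:", "resolution:"),
    # Cargo.lock and poetry.lock share one signature, so they share a label
    "Cargo.lock / poetry.lock": ("@generated", "[[package]]"),
    "composer.lock": ('"content-hash":', '"packages":'),
}


def detect_waste(text):
    """Return the label of the first signature whose markers all occur in text, else None."""
    for label, markers in SIGNATURES.items():
        if all(marker in text for marker in markers):
            return label
    return None


def strip_waste(text):
    """Replace a detected block with a one-line note; pass other text through."""
    label = detect_waste(text)
    if label is None:
        return text
    # Hypothetical note format; the real note also says how to disable stripping
    return f"[skim: removed {label} content, {len(text)} chars]"
```

Substring checks keep the sketch cheap enough to run on every request; a production detector would likely also bound how much of each block it scans.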
How plan detection works
One method, _auth_type(), owns all routing logic:
    _auth_type() → ("apikey", key)    # API plan     → filtering + caching + tracking
                 → ("oauth", token)   # Pro/Max plan → filtering + tracking (no cache injection)
                 → ("", "")           # no auth      → 401

Adding a new plan type (enterprise SSO, team tokens) is a single elif. Caching injection is skipped for Pro/OAuth because the Pro plan manages its own cache layer.
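The routing rule above can be sketched as a pure function over request headers. This is an illustration only: `auth_type` and `features_for` are hypothetical stand-ins for the real `_auth_type()`, and giving `x-api-key` precedence over `Authorization` is an assumption.

```python
def auth_type(headers):
    """Classify a request by its auth header: ("apikey", key), ("oauth", token) or ("", "")."""
    # HTTP header names are case-insensitive, so normalise before lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    api_key = lowered.get("x-api-key", "")
    if api_key:
        return ("apikey", api_key)
    authorization = lowered.get("authorization", "")
    if authorization.lower().startswith("bearer "):
        return ("oauth", authorization[len("bearer "):].strip())
    # A new plan type would be one more branch before this fallback
    return ("", "")


def features_for(kind):
    """Map an auth kind to its feature set; an empty set means reject with 401."""
    return {
        "apikey": {"filtering", "caching", "tracking"},
        "oauth": {"filtering", "tracking"},
    }.get(kind, set())
```

Keeping classification separate from the feature map means a new plan touches one branch and one dictionary entry.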
Five fully-built pages. Dark theme, live charts, real-time SSE updates — no refresh button needed.
| 🟣 Overview | ⚡ Sessions | 📈 Usage | 🤖 Models | 💰 Savings |
|---|---|---|---|---|
| tokens, cost, savings, cache | full call log, searchable | hourly + daily charts | cost/1k, cache %, waste % | cumulative savings & ROI |
skim proxy # local dashboard, zero setup, opens in browser

The local dashboard works for everyone: solo devs, Pro users, anyone. Data never leaves your machine unless you explicitly connect a team server.
Important
Everything below is open-source and self-hosted — same pip package, no paywall, no telemetry.
Hard-block calls that exceed token/cost limits. Proxy returns a hard 429.

    skim admin budget set --owner-type team \
      --owner-id engineering --usd 500 --period monthly

Notify Slack (& Teams) or any HTTP endpoint on budget events.

    skim admin webhooks add --channel slack \
      --url https://hooks.slack.com/...

Self-registration via single-use links. No manual accounts.

    skim admin users invite --email new@corp.com \
      --role user --team platform

Every sensitive action logged immutably. Queryable by action + date.

    skim admin audit --days 30 --action auth.login

CSV event logs + JSON summaries for accounting & BI.

    skim admin export --days 30 --out report.csv
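Budget enforcement, the "hard 429 block" in the diagram, comes down to comparing projected spend against a limit before the call is forwarded. The sketch below assumes a simple running total per owner; `Budget`, `check_budget`, and `record_spend` are hypothetical names, not skim's API.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    owner_id: str
    usd_limit: float        # e.g. 500 for the monthly team budget above
    usd_spent: float = 0.0  # running total for the current period


def check_budget(budget, estimated_cost_usd):
    """Return (status, message): 200 if the call fits the budget, 429 if it would exceed it."""
    projected = budget.usd_spent + estimated_cost_usd
    if projected > budget.usd_limit:
        return 429, (
            f"budget exceeded for {budget.owner_id}: "
            f"${projected:.2f} > ${budget.usd_limit:.2f}"
        )
    return 200, "ok"


def record_spend(budget, actual_cost_usd):
    """Add the real cost of a completed call to the running total."""
    budget.usd_spent += actual_cost_usd
```

Checking before forwarding and recording afterwards keeps the block hard: a request that would cross the limit never reaches the upstream API.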
Team deployment in 3 commands
# 1. Run the server (auto-creates admin, uses gunicorn if installed)
pip install 'skim-llm[web]'
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475
# 2. Each developer connects their proxy
export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-... # generate in Settings
# 3. Manage from anywhere
skim admin users list

Auth: local password · LDAP/AD (SKIM_LDAP_*) · Google/GitHub/Azure/Okta (SKIM_OIDC_*)
Full guide → docs/enterprise.md · docs/deployment.md
🔬 Static analysis (no API key)

    skim scan      # token cost per file
    skim analyze   # detect waste patterns
    skim fix       # auto-write .llmignore
    skim check     # CI budget gate
    skim generate  # .llmignore + CLAUDE.md
    skim secrets   # leaked credential scan

⚙️ Runtime & ops

    skim proxy     # the interceptor
    skim server    # team dashboard + API
    skim admin     # manage users/budgets/keys
    skim audit     # local operation log
    skim hooks     # git pre-commit gate
    skim baseline  # token regression checks
Example — skim fix auto-cleanup
skim fix — ./my-project
──────────────────────────────────────────────────────
Before : 166.8k tokens (83.4% ctx) $0.50/session
Pattern Severity Tokens saved Rules
────────────────────────────────────────────────────
Lock files HIGH 160.3k +7
Test snapshots MEDIUM 4.1k +2
✓ Written to .llmignore
After : 6.5k tokens (3.2% ctx) $0.02/session
Saved : 160.3k tokens (96.1% reduction) $0.48/session
Now : 51 sessions / $1
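The dollar figures in this report follow from the token counts once a per-token price is fixed. The report does not state a price; the sketch assumes $3.00 per million input tokens because that value reproduces the numbers shown here (and the 122k tokens / $0.37 example in the intro).

```python
PRICE_PER_MTOK_USD = 3.00  # assumed input price; not stated in the report


def session_cost(tokens):
    """Cost in USD of sending `tokens` input tokens once."""
    return tokens * PRICE_PER_MTOK_USD / 1_000_000


before_tokens, after_tokens = 166_800, 6_500
saved_tokens = before_tokens - after_tokens
saved_usd = session_cost(before_tokens) - session_cost(after_tokens)

print(f"Before : ${session_cost(before_tokens):.2f}/session")
print(f"After  : ${session_cost(after_tokens):.2f}/session")
print(f"Saved  : {saved_tokens / before_tokens:.1%} reduction  ${saved_usd:.2f}/session")
print(f"Now    : {int(1 / session_cost(after_tokens))} sessions / $1")
```

Under that assumed price, the script prints $0.50 and $0.02 per session, a 96.1% reduction worth $0.48, and 51 sessions per dollar, matching the report.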
from adapters import ClaudeAdapter
claude = ClaudeAdapter(
model="claude-sonnet-4-6",
system_prompt="You are a terse coding assistant.",
enable_caching=True, # prompt caching, automatic
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037

Adapters: ClaudeAdapter · OpenAIAdapter · GeminiAdapter · OllamaAdapter
pip install skim-llm # core — zero hard deps
pip install 'skim-llm[tiktoken]' # accurate token counting
pip install 'skim-llm[web]' # dashboard server
pip install 'skim-llm[web,sso,ldap]' # enterprise auth
pip install 'skim-llm[all]' # everything
| Guide | What it covers |
|---|---|
| Quickstart | Zero to running in 2 minutes |
| Proxy | Deep-dive — every feature, every flag |
| Dashboard | Local & team dashboards |
| Enterprise | Budgets, webhooks, invites, RBAC, audit |
| Admin CLI | skim admin complete reference |
| REST API | All 31 endpoints with schemas |
| Configuration | Every env var & .skimrc option |
| Deployment | Docker, systemd, nginx, scaling |
| MCP Setup | Claude Desktop integration |
{ "mcpServers": { "skim": { "command": "skim-mcp" } } }

Tools: scan_tokens · analyze_context · check_budget · fix_context · generate_llmignore