Open-source Prompt Firewall — deflect up to 95% of redundant LLM traffic before it leaves your infrastructure.
Pure Rust · Single Binary · Zero Hidden Telemetry · Air-Gappable
```sh
# Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh

# Configure your L3 provider (example: Groq)
isartor set-key -p groq

# Verify the provider and run the post-install showcase
isartor check
isartor demo
```
```sh
# Connect your AI tool (pick one)
# (or start the gateway directly if you're ready)
isartor up
isartor connect copilot          # GitHub Copilot CLI
isartor connect claude           # Claude Code
isartor connect claude-desktop   # Claude Desktop
isartor connect cursor           # Cursor IDE
isartor connect openclaw         # OpenClaw
isartor connect codex            # OpenAI Codex CLI
isartor connect gemini           # Gemini CLI
isartor connect claude-copilot   # Claude Code + GitHub Copilot
```

The best first-run path is: install → set key → check → demo → connect tool. `isartor demo` still works without an API key, but with a configured provider it now also shows a live upstream round-trip before the cache replay.
Terminal walkthrough: install Isartor, start the gateway, then run the demo showcase.
More install options (Docker · Windows · Build from source)
**Docker**

```sh
docker run -p 8080:8080 \
  -e HF_HOME=/tmp/huggingface \
  -v isartor-hf:/tmp/huggingface \
  ghcr.io/isartor-ai/isartor:latest
```

~120 MB compressed. Includes the `all-MiniLM-L6-v2` embedding model and a statically linked Rust binary.

**Windows (PowerShell)**

```sh
irm https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.ps1 | iex
```

**Build from source**

```sh
git clone https://github.com/isartor-ai/Isartor.git
cd Isartor && cargo build --release
./target/release/isartor up
```

If you already know your provider credentials, the day-one path is:
```sh
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh
isartor set-key -p groq
isartor check
isartor demo
isartor up --detach
isartor connect copilot
```

AI coding agents and personal assistants repeat themselves — a lot. Copilot, Claude Code, Cursor, and OpenClaw send the same system instructions, the same context preambles, and often the same user prompts across every turn of a conversation. Standard API gateways forward all of it to cloud LLMs regardless.
Isartor sits between your tools and the cloud. It intercepts every prompt and runs a cascade of local algorithms — from sub-millisecond hashing to in-process neural inference — to resolve requests before they reach the network. Only the genuinely hard prompts make it through.
The result: lower costs, lower latency, and less data leaving your perimeter.
| Scenario | Without Isartor | With Isartor |
|---|---|---|
| Repeated prompts | Full cloud round-trip every time | Answered locally in < 1 ms |
| Similar prompts ("Price?" / "Cost?") | Full cloud round-trip every time | Matched semantically, answered locally in 1–5 ms |
| System instructions (CLAUDE.md, copilot-instructions) | Sent in full on every request | Deduplicated and compressed per session |
| Simple FAQ / data extraction | Routed to GPT-4 / Claude | Resolved by embedded SLM in 50–200 ms |
| Complex reasoning | Routed to cloud | Routed to cloud ✓ |
Every request passes through five layers. Only prompts that survive the full stack reach the cloud.
```text
Request ──► L1a Exact Cache ──► L1b Semantic Cache ──► L2 SLM Router ──► L2.5 Context Optimiser ──► L3 Cloud
                │ hit               │ hit                │ simple            │ compressed              │
                ▼                   ▼                    ▼                   ▼                         ▼
             Instant             Instant            Local Answer        Smaller Prompt           Cloud Answer
```
| Layer | What It Does | How | Latency |
|---|---|---|---|
| L1a Exact Cache | Traps duplicate prompts and agent loops | `ahash` deterministic hashing | < 1 ms |
| L1b Semantic Cache | Catches paraphrases ("Price?" ≈ "Cost?") | Cosine similarity via pure-Rust `candle` embeddings | 1–5 ms |
| L2 SLM Router | Resolves simple queries locally | Embedded Small Language Model (Qwen-1.5B via `candle` GGUF) | 50–200 ms |
| L2.5 Context Optimiser | Compresses repeated instructions per session | Dedup + minify (CLAUDE.md, copilot-instructions) | < 1 ms |
| L3 Cloud Logic | Routes complex prompts to OpenAI / Anthropic / Azure | Load balancing with retry and fallback | Network-bound |
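The cascade above can be sketched in a few lines of Python. This is an illustrative toy, not Isartor's implementation: the class and callback names are invented, and the real layers use `ahash`, `candle` embeddings, and a GGUF SLM rather than these stand-ins.

```python
import hashlib
import json

def exact_key(messages):
    """L1a: deterministic key over the canonical request (stand-in for ahash)."""
    return hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

class DeflectionStack:
    """Toy cascade: L1a exact cache -> L1b semantic cache -> L2 SLM -> L3 cloud."""

    def __init__(self, embed, slm, cloud, threshold=0.9):
        self.exact = {}        # L1a: request hash -> answer
        self.semantic = []     # L1b: (embedding, answer) pairs
        self.embed, self.slm, self.cloud = embed, slm, cloud
        self.threshold = threshold

    def handle(self, messages):
        key = exact_key(messages)
        if key in self.exact:                           # L1a hit: instant
            return self.exact[key], "L1a"
        vec = self.embed(messages[-1]["content"])
        for known_vec, answer in self.semantic:         # L1b hit: paraphrase match
            if cosine(vec, known_vec) >= self.threshold:
                return answer, "L1b"
        answer, layer = self.slm(messages), "L2"        # L2: local SLM tries first
        if answer is None:                              # too hard locally; route up
            answer, layer = self.cloud(messages), "L3"  # (L2.5 would compress context here)
        self.exact[key] = answer                        # populate caches on the way back
        self.semantic.append((vec, answer))
        return answer, layer
```

With stub callbacks (a constant embedding, an SLM that always defers, a canned cloud reply), the first request routes to L3, an identical repeat hits L1a, and a paraphrase hits L1b.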
| Workload | Deflection Rate | Detail |
|---|---|---|
| Warm agent session (Claude Code, 20 prompts) | 95% | L1a 80% · L1b 10% · L2 5% · L3 5% |
| Repetitive FAQ loop (1,000 prompts) | 60% | L1a 41% · L1b 19% · L3 40% |
| Diverse code-generation tasks (78 prompts) | 38% | Exact-match duplicates only; all unique tasks route to L3 |
P50 latency for a cache hit: 0.3 ms. Full benchmark methodology →
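As a back-of-envelope sanity check, the warm-session mix implies a blended latency far below a bare cloud round-trip. The sketch below uses the layer shares from the table above, mid-range layer latencies, and an assumed 800 ms cloud round-trip (that last figure is ours, not from the benchmarks):

```python
# (share of requests, assumed typical latency in ms) per resolving layer,
# using the warm agent session mix: L1a 80% · L1b 10% · L2 5% · L3 5%
layers = {
    "L1a": (0.80, 0.5),    # exact cache, < 1 ms
    "L1b": (0.10, 3.0),    # semantic cache, 1-5 ms
    "L2":  (0.05, 125.0),  # embedded SLM, 50-200 ms
    "L3":  (0.05, 800.0),  # cloud round-trip (assumed, network-bound)
}

deflection = sum(share for name, (share, _) in layers.items() if name != "L3")
expected_ms = sum(share * ms for share, ms in layers.values())
print(f"deflection {deflection:.0%}, blended latency ~ {expected_ms:.0f} ms")
```

Under these assumptions the blended latency works out to roughly 47 ms per request, dominated almost entirely by the 5% of traffic that still reaches the cloud.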
One command connects your favourite tool. No proxy, no MITM, no CA certificates.
| Tool | Command | Mechanism |
|---|---|---|
| GitHub Copilot CLI | `isartor connect copilot` | MCP server (stdio or HTTP/SSE at `/mcp/`) |
| GitHub Copilot in VS Code | `isartor connect copilot-vscode` | Managed `settings.json` debug overrides |
| OpenClaw | `isartor connect openclaw` | Managed OpenClaw provider config (`openclaw.json`) |
| Claude Code | `isartor connect claude` | `ANTHROPIC_BASE_URL` override |
| Claude Desktop | `isartor connect claude-desktop` | Managed local MCP registration (`isartor mcp`) |
| Claude Code + Copilot | `isartor connect claude-copilot` | Claude base URL + Copilot-backed L3 |
| Cursor IDE | `isartor connect cursor` | Base URL + MCP registration at `/mcp/` |
| OpenAI Codex CLI | `isartor connect codex` | `OPENAI_BASE_URL` override |
| Gemini CLI | `isartor connect gemini` | `GEMINI_API_BASE_URL` override |
| OpenCode | `isartor connect opencode` | Global provider + auth config |
| Any OpenAI-compatible tool | `isartor connect generic` | Configurable env var override |
OpenClaw note: use Isartor's OpenAI-compatible `/v1` base path, not the root `:8080` URL. If you change Isartor's gateway API key later, rerun `isartor connect openclaw` so OpenClaw's per-agent model registry refreshes too.
This is the honest version: Isartor is not trying to be every kind of AI platform. It is optimized for local-first prompt deflection in front of coding tools and OpenAI-compatible clients.
| Product | Public positioning | Best fit | Where Isartor differs |
|---|---|---|---|
| Isartor | Open-source prompt firewall and local deflection gateway | Teams that want redundant prompt traffic resolved locally before it hits the cloud | Single Rust binary, client connectors, exact+semantic cache, context compression, coding-agent-first workflow |
| LiteLLM | Open-source multi-provider LLM gateway with routing, fallbacks, and spend tracking | Teams that want one OpenAI-style API across many providers and models | LiteLLM is gateway/routing-first; Isartor is deflection-first and focuses on reducing traffic before cloud routing |
| Portkey | AI gateway, observability, guardrails, governance, and prompt management platform | Teams that want a broader managed production control plane for GenAI apps | Portkey emphasizes platform governance and observability; Isartor emphasizes local cache/SLM deflection in a self-hosted binary |
| Bifrost | Enterprise AI gateway with governance, guardrails, and MCP gateway positioning | Teams that want enterprise control, security, and production gateway features | Bifrost is enterprise-gateway oriented; Isartor is optimized for prompt firewall behavior and lightweight local deployment |
| Helicone | Routing, debugging, and observability for AI apps | Teams that primarily want analytics, traces, and request inspection | Helicone is observability-first; Isartor is designed to stop repeat traffic from leaving your perimeter in the first place |
The short version:
- choose Isartor when your problem is repeated coding-agent traffic, prompt firewalling, and local-first savings
- choose LiteLLM when your main problem is multi-provider routing and unified model access
- choose Portkey / Bifrost / Helicone when your center of gravity is broader gateway control, observability, or enterprise governance
Isartor is fully OpenAI-compatible and Anthropic-compatible. Point any existing SDK at it by changing one URL:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-isartor-api-key",
)

# First call → routed to cloud (L3), cached on return
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain the builder pattern in Rust"}],
)

# Second identical call → answered from the L1a cache in < 1 ms
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain the builder pattern in Rust"}],
)
```

Works with the official Python/Node SDKs, LangChain, LlamaIndex, AutoGen, CrewAI, OpenClaw, or any OpenAI-compatible client.
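Since the gateway is also Anthropic-compatible, the official Anthropic SDK can be pointed at Isartor the same way. Treat this as a configuration sketch rather than verified usage: the base path Isartor expects for Anthropic-style traffic may differ from the root URL shown here, and the model id is a placeholder.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080",   # assumed gateway root; check your deployment
    api_key="your-isartor-api-key",
)

response = client.messages.create(
    model="claude-sonnet-4-5",          # placeholder model id
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain the builder pattern in Rust"}],
)
```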
The same binary adapts from a developer laptop to a multi-replica Kubernetes deployment. Switch modes entirely through environment variables — no code changes, no recompilation.
| Component | Laptop (Single Binary) | Enterprise (K8s) |
|---|---|---|
| L1a Cache | In-memory LRU | Redis cluster (shared across replicas) |
| L1b Embeddings | In-process `candle` `BertModel` | External TEI sidecar |
| L2 SLM | Embedded `candle` GGUF inference | Remote vLLM / TGI (GPU pool) |
| L2.5 Optimiser | In-process | In-process |
| L3 Cloud | Direct to provider | Direct to provider |
```sh
# Flip to enterprise mode — just env vars, same binary
export ISARTOR__CACHE_BACKEND=redis
export ISARTOR__REDIS_URL=redis://redis-cluster.svc:6379
export ISARTOR__ROUTER_BACKEND=vllm
export ISARTOR__VLLM_URL=http://vllm.svc:8000
```

Built-in OpenTelemetry traces and Prometheus metrics — no extra instrumentation.
- Distributed traces — root span `gateway_request` with child spans per layer (`l1a_exact_cache`, `l1b_semantic_cache`, `l2_classify_intent`, `context_optimise`, `l3_cloud_llm`).
- Prometheus metrics — `isartor_request_duration_seconds`, `isartor_layer_duration_seconds`, `isartor_requests_total`.
- ROI tracking — `isartor_tokens_saved_total` counts tokens that never left your infrastructure. Pipe it into Grafana to prove savings.
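Turning the tokens-saved counter into a dollar figure is a one-line conversion. The price below is hypothetical; substitute your provider's actual input-token rate:

```python
def tokens_saved_to_usd(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Rough savings estimate from the isartor_tokens_saved_total counter."""
    return tokens_saved / 1_000_000 * usd_per_million_tokens

# e.g. 250M input tokens deflected at a hypothetical $3 per 1M tokens
savings = tokens_saved_to_usd(250_000_000, 3.00)  # → 750.0 USD
```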
```sh
export ISARTOR__ENABLE_MONITORING=true
export ISARTOR__OTEL_EXPORTER_ENDPOINT=http://otel-collector:4317
```

Today, Isartor is dogfooded by the Isartor AI engineering team for:
- connector development across Copilot, Claude, Cursor, and OpenClaw flows
- benchmark and release validation runs
- local-first prompt deflection testing during coding-agent workflows
We are keeping this section intentionally conservative until external teams explicitly opt in to being listed.
```text
isartor up                  Start the API gateway
isartor up --detach         Start in background
isartor logs --follow       Follow detached Isartor logs
isartor up copilot          Start gateway + Copilot CONNECT proxy
isartor stop                Stop a running instance
isartor demo                Run the post-install showcase (cache-only or live + cache)
isartor init                Generate a commented config scaffold
isartor set-key -p openai   Configure your LLM provider API key
isartor stats               Prompt totals, layer hits, routing history
isartor stats --by-tool     Per-tool cache hits, latency, errors
isartor update              Self-update to the latest release
isartor connect <tool>      Connect an AI tool (see integrations above)
```
📚 isartor-ai.github.io/Isartor
| Section | Contents |
|---|---|
| Getting Started | Installation, first request, config basics |
| Architecture | Deflection Stack deep dive, trait provider pattern |
| Integrations | Copilot, Cursor, Claude, Codex, Gemini, generic |
| Deployment | Minimal → Sidecar → Enterprise (K8s) → Air-Gapped |
| Configuration | Every environment variable and config key |
| Observability | Spans, metrics, Grafana dashboards |
| Performance Tuning | Deflection measurement, SLO/SLA templates |
| Troubleshooting | Common issues, diagnostics, FAQ |
| Contributing | Dev setup, PR guidelines |
| Governance | Independence, license stability, decision-making |
Contributions welcome! See CONTRIBUTING.md for dev setup and PR guidelines.
```sh
cargo build && cargo test --all-features
cargo clippy --all-targets --all-features -- -D warnings
```

Apache License, Version 2.0 — see LICENSE.
Isartor is and will remain open source. No bait-and-switch relicensing. See GOVERNANCE.md for the full commitment.
If Isartor saves you tokens, consider giving it a ⭐