Isartor

Open-source Prompt Firewall — deflect up to 95% of redundant LLM traffic before it leaves your infrastructure.

Pure Rust · Single Binary · Zero Hidden Telemetry · Air-Gappable



Quick Start

# Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh

# Configure your L3 provider (example: Groq)
isartor set-key -p groq

# Verify the provider and run the post-install showcase
isartor check
isartor demo

# Connect your AI tool (pick one)
# (or start the gateway directly if you're ready)
isartor up
isartor connect copilot          # GitHub Copilot CLI
isartor connect claude           # Claude Code
isartor connect claude-desktop   # Claude Desktop
isartor connect cursor           # Cursor IDE
isartor connect openclaw         # OpenClaw
isartor connect codex            # OpenAI Codex CLI
isartor connect gemini           # Gemini CLI
isartor connect claude-copilot   # Claude Code + GitHub Copilot

The best first-run path is: install → set key → check → demo → connect tool. isartor demo still works without an API key; with a configured provider it also shows a live upstream round-trip before the cache replay.

See Isartor in the Terminal

Terminal walkthrough: install Isartor, start the gateway, then run the demo showcase.

More install options (Docker · Windows · Build from source)

Docker

docker run -p 8080:8080 \
  -e HF_HOME=/tmp/huggingface \
  -v isartor-hf:/tmp/huggingface \
  ghcr.io/isartor-ai/isartor:latest

~120 MB compressed. Includes the all-MiniLM-L6-v2 embedding model and a statically linked Rust binary.

Windows (PowerShell)

irm https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.ps1 | iex

Build from source

git clone https://github.com/isartor-ai/Isartor.git
cd Isartor && cargo build --release
./target/release/isartor up

How It Works

If you already know your provider credentials, the day-one path is:

curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh
isartor set-key -p groq
isartor check
isartor demo
isartor up --detach
isartor connect copilot

Why Isartor?

AI coding agents and personal assistants repeat themselves — a lot. Copilot, Claude Code, Cursor, and OpenClaw send the same system instructions, the same context preambles, and often the same user prompts across every turn of a conversation. Standard API gateways forward all of it to cloud LLMs regardless.

Isartor sits between your tools and the cloud. It intercepts every prompt and runs a cascade of local algorithms — from sub-millisecond hashing to in-process neural inference — to resolve requests before they reach the network. Only the genuinely hard prompts make it through.

The result: lower costs, lower latency, and less data leaving your perimeter.

| Scenario | Without Isartor | With Isartor |
|---|---|---|
| Repeated prompts | Full cloud round-trip every time | Answered locally in < 1 ms |
| Similar prompts ("Price?" / "Cost?") | Full cloud round-trip every time | Matched semantically, answered locally in 1–5 ms |
| System instructions (CLAUDE.md, copilot-instructions) | Sent in full on every request | Deduplicated and compressed per session |
| Simple FAQ / data extraction | Routed to GPT-4 / Claude | Resolved by embedded SLM in 50–200 ms |
| Complex reasoning | Routed to cloud | Routed to cloud ✓ |
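
The per-session instruction dedup mentioned above can be sketched as follows. This is an illustration of the idea only, not Isartor's Rust implementation, which also minifies the instructions it does forward:

```python
class ContextOptimiser:
    """Per-session dedup of repeated system instructions (L2.5 sketch)."""

    def __init__(self):
        self.seen = set()  # system blocks already forwarded this session

    def optimise(self, messages):
        out = []
        for msg in messages:
            if msg["role"] == "system":
                if msg["content"] in self.seen:
                    continue          # already sent upstream this session
                self.seen.add(msg["content"])
            out.append(msg)
        return out

opt = ContextOptimiser()
turn1 = opt.optimise([{"role": "system", "content": "You are a Rust expert."},
                      {"role": "user", "content": "Explain lifetimes."}])
turn2 = opt.optimise([{"role": "system", "content": "You are a Rust expert."},
                      {"role": "user", "content": "Explain borrowing."}])
print(len(turn1), len(turn2))  # 2 1
```

On the second turn the repeated system block is dropped entirely, so only the new user message travels upstream.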

The Deflection Stack

Every request passes through five layers. Only prompts that survive the full stack reach the cloud.

Request ──► L1a Exact Cache ──► L1b Semantic Cache ──► L2 SLM Router ──► L2.5 Context Optimiser ──► L3 Cloud
                 │ hit                │ hit                 │ simple             │ compressed               │
                 ▼                    ▼                     ▼                    ▼                          ▼
              Instant             Instant             Local Answer      Smaller Prompt             Cloud Answer
| Layer | What It Does | How | Latency |
|---|---|---|---|
| L1a Exact Cache | Traps duplicate prompts and agent loops | ahash deterministic hashing | < 1 ms |
| L1b Semantic Cache | Catches paraphrases ("Price?" ≈ "Cost?") | Cosine similarity via pure-Rust candle embeddings | 1–5 ms |
| L2 SLM Router | Resolves simple queries locally | Embedded Small Language Model (Qwen-1.5B via candle GGUF) | 50–200 ms |
| L2.5 Context Optimiser | Compresses repeated instructions per session | Dedup + minify (CLAUDE.md, copilot-instructions) | < 1 ms |
| L3 Cloud Logic | Routes complex prompts to OpenAI / Anthropic / Azure | Load balancing with retry and fallback | Network-bound |
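
A minimal Python sketch of how the first two layers decide a hit. The bigram embedding and 0.8 similarity threshold are illustrative stand-ins; Isartor's real implementation is Rust, with ahash for L1a and all-MiniLM-L6-v2 embeddings via candle for L1b:

```python
import hashlib
import math

def embed(text):
    """Toy character-bigram embedding, standing in for a real model."""
    vec = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].lower()
        vec[bigram] = vec.get(bigram, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(count * b.get(bigram, 0) for bigram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class DeflectionStack:
    def __init__(self, threshold=0.8):
        self.exact = {}       # L1a: prompt hash -> cached answer
        self.semantic = []    # L1b: (embedding, cached answer)
        self.threshold = threshold

    def lookup(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()  # Isartor uses ahash
        if key in self.exact:                              # L1a: exact hit
            return "L1a", self.exact[key]
        emb = embed(prompt)
        for cached_emb, answer in self.semantic:           # L1b: paraphrase hit
            if cosine(emb, cached_emb) >= self.threshold:
                return "L1b", answer
        return "L3", None                                  # miss: route onward

    def store(self, prompt, answer):
        self.exact[hashlib.sha256(prompt.encode()).hexdigest()] = answer
        self.semantic.append((embed(prompt), answer))

stack = DeflectionStack()
stack.store("What is the price?", "$10/month")
print(stack.lookup("What is the price?"))    # ('L1a', '$10/month')
print(stack.lookup("What is the pricing?"))  # ('L1b', '$10/month')
```

An exact repeat never touches the embedder; a close paraphrase falls through to the similarity scan; anything genuinely new continues down the stack.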

Benchmark results

| Workload | Deflection Rate | Detail |
|---|---|---|
| Warm agent session (Claude Code, 20 prompts) | 95% | L1a 80% · L1b 10% · L2 5% · L3 5% |
| Repetitive FAQ loop (1,000 prompts) | 60% | L1a 41% · L1b 19% · L3 40% |
| Diverse code-generation tasks (78 prompts) | 38% | Exact-match duplicates only; all unique tasks route to L3 |

P50 latency for a cache hit: 0.3 ms. Full benchmark methodology →
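
The deflection rate in each row is simply the share of prompts resolved before L3. Using illustrative per-layer counts that match the warm-session row:

```python
# Illustrative per-layer hit counts matching the warm-session row above.
layer_hits = {"L1a": 16, "L1b": 2, "L2": 1, "L3": 1}   # 20 prompts total

total = sum(layer_hits.values())
deflected = total - layer_hits["L3"]    # everything resolved before the cloud
print(f"deflection rate: {deflected / total:.0%}")  # deflection rate: 95%
```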


AI Tool Integrations

One command connects your favourite tool. No proxy, no MITM, no CA certificates.

| Tool | Command | Mechanism |
|---|---|---|
| GitHub Copilot CLI | `isartor connect copilot` | MCP server (stdio or HTTP/SSE at `/mcp/`) |
| GitHub Copilot in VS Code | `isartor connect copilot-vscode` | Managed `settings.json` debug overrides |
| OpenClaw | `isartor connect openclaw` | Managed OpenClaw provider config (`openclaw.json`) |
| Claude Code | `isartor connect claude` | `ANTHROPIC_BASE_URL` override |
| Claude Desktop | `isartor connect claude-desktop` | Managed local MCP registration (`isartor mcp`) |
| Claude Code + Copilot | `isartor connect claude-copilot` | Claude base URL + Copilot-backed L3 |
| Cursor IDE | `isartor connect cursor` | Base URL + MCP registration at `/mcp/` |
| OpenAI Codex CLI | `isartor connect codex` | `OPENAI_BASE_URL` override |
| Gemini CLI | `isartor connect gemini` | `GEMINI_API_BASE_URL` override |
| OpenCode | `isartor connect opencode` | Global provider + auth config |
| Any OpenAI-compatible tool | `isartor connect generic` | Configurable env var override |

Full integration guides →

OpenClaw note: use Isartor's OpenAI-compatible /v1 base path, not the root :8080 URL. If you change Isartor's gateway API key later, rerun isartor connect openclaw so OpenClaw's per-agent model registry refreshes too.
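
For the generic connector, the mechanism is just an environment-variable override. A sketch of what the tool-side resolution looks like; `OPENAI_BASE_URL` and the `/v1` path follow the common OpenAI-SDK convention and are shown here as illustrative defaults, not Isartor's exact config keys:

```python
import os

# What `isartor connect generic` arranges in effect: the client tool reads
# its API base URL from an environment variable that Isartor points at
# itself. Variable name and default below are the usual OpenAI convention.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8080/v1"

base_url = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
print(base_url)
```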


How Isartor Compares

This is the honest version: Isartor is not trying to be every kind of AI platform. It is optimized for local-first prompt deflection in front of coding tools and OpenAI-compatible clients.

| Product | Public positioning | Best fit | Where Isartor differs |
|---|---|---|---|
| Isartor | Open-source prompt firewall and local deflection gateway | Teams that want redundant prompt traffic resolved locally before it hits the cloud | Single Rust binary, client connectors, exact+semantic cache, context compression, coding-agent-first workflow |
| LiteLLM | Open-source multi-provider LLM gateway with routing, fallbacks, and spend tracking | Teams that want one OpenAI-style API across many providers and models | LiteLLM is gateway/routing-first; Isartor is deflection-first and focuses on reducing traffic before cloud routing |
| Portkey | AI gateway, observability, guardrails, governance, and prompt management platform | Teams that want a broader managed production control plane for GenAI apps | Portkey emphasizes platform governance and observability; Isartor emphasizes local cache/SLM deflection in a self-hosted binary |
| Bifrost | Enterprise AI gateway with governance, guardrails, and MCP gateway positioning | Teams that want enterprise control, security, and production gateway features | Bifrost is enterprise-gateway oriented; Isartor is optimized for prompt firewall behavior and lightweight local deployment |
| Helicone | Routing, debugging, and observability for AI apps | Teams that primarily want analytics, traces, and request inspection | Helicone is observability-first; Isartor is designed to stop repeat traffic from leaving your perimeter in the first place |

The short version:

  • choose Isartor when your problem is repeated coding-agent traffic, prompt firewalling, and local-first savings
  • choose LiteLLM when your main problem is multi-provider routing and unified model access
  • choose Portkey / Bifrost / Helicone when your center of gravity is broader gateway control, observability, or enterprise governance

Drop-In for Any OpenAI SDK

Isartor is fully OpenAI-compatible and Anthropic-compatible. Point any existing SDK at it by changing one URL:

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-isartor-api-key",
)

# First call → routed to cloud (L3), cached on return
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain the builder pattern in Rust"}],
)

# Second identical call → answered from L1a cache in < 1 ms
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain the builder pattern in Rust"}],
)

Works with the official Python/Node SDKs, LangChain, LlamaIndex, AutoGen, CrewAI, OpenClaw, or any OpenAI-compatible client.


Scales from Laptop to Cluster

The same binary adapts from a developer laptop to a multi-replica Kubernetes deployment. Switch modes entirely through environment variables — no code changes, no recompilation.

| Component | Laptop (Single Binary) | Enterprise (K8s) |
|---|---|---|
| L1a Cache | In-memory LRU | Redis cluster (shared across replicas) |
| L1b Embeddings | In-process candle BertModel | External TEI sidecar |
| L2 SLM | Embedded candle GGUF inference | Remote vLLM / TGI (GPU pool) |
| L2.5 Optimiser | In-process | In-process |
| L3 Cloud | Direct to provider | Direct to provider |

# Flip to enterprise mode — just env vars, same binary
export ISARTOR__CACHE_BACKEND=redis
export ISARTOR__REDIS_URL=redis://redis-cluster.svc:6379
export ISARTOR__ROUTER_BACKEND=vllm
export ISARTOR__VLLM_URL=http://vllm.svc:8000
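
The `ISARTOR__` variables above follow a double-underscore prefix convention. A sketch of how such variables could map to config keys; the mapping is illustrative only, and the authoritative key list lives in the Configuration docs:

```python
def load_config(environ):
    """Map ISARTOR__FOO=bar style variables to {'foo': 'bar'} config keys."""
    prefix = "ISARTOR__"
    return {key[len(prefix):].lower(): value
            for key, value in environ.items() if key.startswith(prefix)}

cfg = load_config({
    "ISARTOR__CACHE_BACKEND": "redis",
    "ISARTOR__REDIS_URL": "redis://redis-cluster.svc:6379",
    "PATH": "/usr/bin",   # unrelated variables are ignored
})
print(cfg["cache_backend"], cfg["redis_url"])
```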

Observability

Built-in OpenTelemetry traces and Prometheus metrics — no extra instrumentation.

  • Distributed traces — root span gateway_request with child spans per layer (l1a_exact_cache, l1b_semantic_cache, l2_classify_intent, context_optimise, l3_cloud_llm).
  • Prometheus metrics — isartor_request_duration_seconds, isartor_layer_duration_seconds, isartor_requests_total.
  • ROI tracking — isartor_tokens_saved_total counts tokens that never left your infrastructure. Pipe it into Grafana to prove savings.

export ISARTOR__ENABLE_MONITORING=true
export ISARTOR__OTEL_EXPORTER_ENDPOINT=http://otel-collector:4317
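
Turning the tokens-saved counter into money is one multiplication. A back-of-the-envelope sketch with placeholder numbers; the $3.00 per million tokens below is an illustrative rate, not any provider's actual pricing:

```python
# Back-of-the-envelope ROI from isartor_tokens_saved_total: multiply the
# deflected tokens by your provider's input-token price.
tokens_saved = 12_500_000        # read from the Prometheus counter
price_per_million_usd = 3.00     # placeholder rate
savings = tokens_saved / 1_000_000 * price_per_million_usd
print(f"${savings:.2f} saved")   # $37.50 saved
```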

Observability guide →


Trusted By

Today, Isartor is dogfooded by the Isartor AI engineering team for:

  • connector development across Copilot, Claude, Cursor, and OpenClaw flows
  • benchmark and release validation runs
  • local-first prompt deflection testing during coding-agent workflows

We are keeping this section intentionally conservative until external teams explicitly opt in to being listed.


CLI Reference

isartor up                     Start the API gateway
isartor up --detach            Start in background
isartor logs --follow          Follow detached Isartor logs
isartor up copilot             Start gateway + Copilot CONNECT proxy
isartor stop                   Stop a running instance
isartor demo                   Run the post-install showcase (cache-only or live + cache)
isartor init                   Generate a commented config scaffold
isartor set-key -p openai      Configure your LLM provider API key
isartor stats                  Prompt totals, layer hits, routing history
isartor stats --by-tool        Per-tool cache hits, latency, errors
isartor update                 Self-update to the latest release
isartor connect <tool>         Connect an AI tool (see integrations above)

Documentation

📚 isartor-ai.github.io/Isartor

Getting Started Installation, first request, config basics
Architecture Deflection Stack deep dive, trait provider pattern
Integrations Copilot, Cursor, Claude, Codex, Gemini, generic
Deployment Minimal → Sidecar → Enterprise (K8s) → Air-Gapped
Configuration Every environment variable and config key
Observability Spans, metrics, Grafana dashboards
Performance Tuning Deflection measurement, SLO/SLA templates
Troubleshooting Common issues, diagnostics, FAQ
Contributing Dev setup, PR guidelines
Governance Independence, license stability, decision-making

Contributing

Contributions welcome! See CONTRIBUTING.md for dev setup and PR guidelines.

cargo build && cargo test --all-features
cargo clippy --all-targets --all-features -- -D warnings

License

Apache License, Version 2.0 — see LICENSE.

Isartor is and will remain open source. No bait-and-switch relicensing. See GOVERNANCE.md for the full commitment.


If Isartor saves you tokens, consider giving it a ⭐
