# LLM-Simulator

Enterprise-grade offline LLM API simulator for testing and development.

## Overview

LLM-Simulator provides a drop-in replacement for production LLM APIs, enabling cost-effective, deterministic, and comprehensive testing of LLM-powered applications. It simulates the OpenAI, Anthropic, and Google Gemini APIs with realistic latency, streaming support, and chaos engineering capabilities.
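In practice, "drop-in" means an existing client only needs its base URL changed. The sketch below assumes the simulator exposes the OpenAI-compatible `/v1/chat/completions` route on the default port 8080 used elsewhere in this README (the exact route is an assumption here), and that `reqwest` (with the `json` feature), `tokio`, `serde_json`, and `anyhow` are available as dependencies:

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let body = json!({
        "model": "gpt-4",
        "messages": [{ "role": "user", "content": "ping" }]
    });

    // Point the request at the simulator instead of api.openai.com.
    let response = reqwest::Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .bearer_auth("sk-test-key") // only needed when --require-auth is set
        .json(&body)
        .send()
        .await?;

    println!("status: {}", response.status());
    println!("body: {}", response.text().await?);
    Ok(())
}
```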
## Features

- **Structured Logging** - JSON log format with trace correlation
- **Health Endpoints** - Liveness (`/health`) and readiness (`/ready`) probes (see the probe sketch after this list)

### High Performance

- **10,000+ RPS** - Optimized async architecture
- **<5ms Overhead** - Minimal latency impact
- **Graceful Shutdown** - Connection draining support
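Both probes are plain HTTP GETs, so any orchestrator or script can consume them directly. A minimal sketch, assuming each endpoint returns HTTP 200 once the server is live and ready (the `llm-simulator health` command in Quick Start wraps the same checks):

```rust
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = reqwest::Client::new();

    // Probe liveness and readiness in turn; both are assumed to return
    // HTTP 200 once the server is up and able to accept traffic.
    for endpoint in ["/health", "/ready"] {
        let status = client
            .get(format!("http://localhost:8080{endpoint}"))
            .send()
            .await?
            .status();
        println!("{endpoint}: {status}");
    }
    Ok(())
}
```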
## Installation

### From Source

```bash
# Clone the repository
git clone https://github.com/llm-devops/llm-simulator.git
cd llm-simulator

# Build the release binary
cargo build --release

# The binary will be at ./target/release/llm-simulator
```
### Requirements

- Rust 1.75 or later
- Linux, macOS, or Windows
## Quick Start

### Start the Server

```bash
# Start with default settings
llm-simulator serve

# Start with a custom port and chaos enabled
llm-simulator serve --port 9090 --chaos --chaos-probability 0.1

# Start with authentication
llm-simulator serve --require-auth --api-key "sk-test-key"

# Start with deterministic responses
llm-simulator serve --seed 42
```
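The `--seed` flag is what makes regression tests reproducible: with a fixed seed, identical requests should produce identical completions. A quick sanity check of that property, sketched with the SDK builder API described in the SDK section below (whether `chat()` can be called repeatedly on one client is an assumption here):

```rust
use llm_simulator::sdk::Client;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::new("http://localhost:8080")?;
    let mut answers = Vec::new();

    // Ask the same question twice against a server started with `--seed 42`.
    for _ in 0..2 {
        let response = client
            .chat()
            .model("gpt-4")
            .message("What is the capital of France?")
            .send()
            .await?;
        answers.push(response.content().to_string());
    }

    assert_eq!(answers[0], answers[1], "seeded runs should repeat exactly");
    Ok(())
}
```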
### Health Checks

```bash
# Single health check
llm-simulator health --url http://localhost:8080

# Watch mode with a 5-second interval
llm-simulator health --url http://localhost:8080 --watch --interval 5

# Check readiness
llm-simulator health --url http://localhost:8080 --ready
```
## SDK Usage

The project includes a Rust SDK for programmatic access:

```rust
use llm_simulator::sdk::{Client, Provider};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create a client
    let client = Client::builder()
        .base_url("http://localhost:8080")
        .api_key("sk-test-key")
        .default_model("gpt-4")
        .timeout(std::time::Duration::from_secs(30))
        .max_retries(3)
        .build()?;

    // Send a chat completion request
    let response = client
        .chat()
        .model("gpt-4")
        .system("You are a helpful assistant.")
        .message("What is the capital of France?")
        .temperature(0.7)
        .max_tokens(100)
        .send()
        .await?;

    println!("Response: {}", response.content());
    println!("Tokens used: {}", response.total_tokens());

    Ok(())
}
```
### Streaming

```rust
use futures::StreamExt;
use llm_simulator::sdk::Client;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::new("http://localhost:8080")?;

    let mut stream = client
        .stream()
        .model("gpt-4")
        .message("Tell me a story")
        .start()
        .await?;

    while let Some(chunk) = stream.next().await {
        if let Ok(c) = chunk {
            print!("{}", c.content);
        }
    }

    Ok(())
}
```
### Embeddings

```rust
use llm_simulator::sdk::Client;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::new("http://localhost:8080")?;

    let result = client
        .embeddings()
        .model("text-embedding-3-small")
        .input("Hello, world!")
        .dimensions(1536)
        .send()
        .await?;

    println!("Embedding dimensions: {}", result.dimensions());
    println!("Tokens used: {}", result.total_tokens());

    Ok(())
}
```
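Downstream code typically compares embeddings by cosine similarity. The README only shows the `dimensions()` and `total_tokens()` accessors, not how to extract the raw vector from `result`, so this sketch operates on plain `f32` slices:

```rust
/// Cosine similarity between two embedding vectors.
/// How the raw Vec<f32> is extracted from the SDK's result type is not
/// documented here, so this operates on plain slices instead.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 3-dimensional vectors standing in for 1536-dim embeddings.
    let a = [0.1_f32, 0.2, 0.3];
    let b = [0.1_f32, 0.2, 0.25];
    println!("similarity: {:.4}", cosine_similarity(&a, &b));
}
```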
## Testing

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run a specific test suite
cargo test --test integration_tests
cargo test --test property_tests

# Run benchmarks
cargo bench
```
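The simulator also makes integration tests for your own application code cheap to run in CI. A sketch of such a test, assuming a simulator instance is already listening on port 8080 (for example, started by a CI step before `cargo test`) and that `content()` returns a string type as the SDK examples above suggest:

```rust
use llm_simulator::sdk::Client;

// Assumes `llm-simulator serve` is already running on port 8080,
// e.g. launched by a test harness or CI step before `cargo test`.
#[tokio::test]
async fn chat_completion_returns_content() -> anyhow::Result<()> {
    let client = Client::new("http://localhost:8080")?;

    let response = client
        .chat()
        .model("gpt-4")
        .message("Say hello")
        .send()
        .await?;

    // The simulator should always produce a non-empty completion and
    // report token usage, so these assertions need no network access.
    assert!(!response.content().is_empty());
    assert!(response.total_tokens() > 0);
    Ok(())
}
```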
## Performance

Benchmark results on a typical development machine:

| Metric       | Value       |
|--------------|-------------|
| Throughput   | 15,000+ RPS |
| P50 Latency  | 0.8 ms      |
| P99 Latency  | 3.2 ms      |
| Memory Usage | ~50 MB base |
## License

This project is licensed under the LLM DevOps Permanent Source-Available License. See LICENSE for details.
## Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.