LiteLLM Proxy Server

A configured instance of the LiteLLM library running as an OpenAI-compatible API proxy. The proxy provides a unified interface to multiple LLM providers (OpenAI, Anthropic, DeepSeek, Groq, Mistral, Gemini, etc.) with features like load balancing, cost tracking, and failover.

Quick Start

1. Start the Server

cd /home/clara/Documents/AI_AND_API/LiteLLM/litellm
./start.sh

The server runs on port 8000 with usage-based routing by default.

2. Stop the Server

pkill -f "litellm --config"

3. Check Server Status

# Check if server is running
ps aux | grep "litellm --config"

# View logs
tail -f litellm.log

# Health check (if implemented)
curl http://localhost:8000/health

Configuration

Master Key & Authentication

Master Key: supersecretkey123

All requests to the proxy must include this bearer token in the Authorization header:

Authorization: Bearer supersecretkey123
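
For example, a minimal Python check that the key is accepted (a sketch using the requests package, which is assumed to be installed):

import requests

# List the proxy's models, passing the master key as a bearer token
resp = requests.get(
    "http://localhost:8000/v1/models",
    headers={"Authorization": "Bearer supersecretkey123"},
    timeout=10,
)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])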

API Endpoints

Endpoint                                    Method   Description
http://localhost:8000/v1/chat/completions  POST     Chat completions (OpenAI-compatible)
http://localhost:8000/v1/models            GET      List available models
http://localhost:8000/health               GET      Health check (inferred from logs)

Environment Variables

API keys for all providers are stored in .env. Important: This file contains live API keys and should be secured.

Key variables:

  • LITELLM_MASTER_KEY=supersecretkey123 (proxy authentication)
  • OPENROUTER_API_KEY and OPENROUTER_API_KEY_2 through OPENROUTER_API_KEY_6 (6 OpenRouter keys for load balancing)
  • DEEPSEEK_API_KEY, GEMINI_API_KEY, GROQ_API_KEY, MISTRAL_API_KEY, etc.
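
A quick way to confirm the keys are actually being picked up is to load the file and spot-check a few names from the list above (a minimal sketch; assumes the python-dotenv package is available and the script is run from the repository root):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv(".env")

# Spot-check a few of the variables listed above; extend as needed
required = ["LITELLM_MASTER_KEY", "OPENROUTER_API_KEY", "DEEPSEEK_API_KEY", "GROQ_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
print("missing keys:", missing or "none")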

Configuration Files

  • config.yaml: Primary configuration with usage-based routing
  • config_cost_based.yaml: Alternative configuration with cost-based routing (selects cheapest model)
  • start.sh: Startup script

Usage with Claude Code

Claude Code can be configured to use this LiteLLM proxy as its AI model gateway. The proxy provides OpenAI-compatible endpoints, so Claude Code needs to be configured to use the OpenAI SDK with the proxy base URL.

Option 1: Configure Claude Code Settings

Add the following to your Claude Code configuration (~/.claude.json or equivalent):

{
  "model": "openrouter-claude",
  "api_base": "http://localhost:8000/v1",
  "api_key": "supersecretkey123",
  "api_type": "openai"
}

Here "model" is set to the proxy's model name for Claude, and "api_type": "openai" ensures the OpenAI-compatible API is used.

Option 2: Use Environment Variables

# For OpenAI SDK compatibility
export OPENAI_API_KEY="supersecretkey123"
export OPENAI_API_BASE="http://localhost:8000/v1"

# For Claude Code specific variables (if supported)
export ANTHROPIC_API_BASE="http://localhost:8000/v1"
export ANTHROPIC_API_KEY="supersecretkey123"

Option 3: Direct API Integration

When Claude Code makes API calls, you can intercept and route them through the proxy by setting the appropriate base URL and authentication header. The proxy supports the following Claude models via OpenRouter:

  • openrouter-claude (Claude-3-Haiku via OpenRouter with load balancing across 6 API keys)
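
Any OpenAI-SDK client can be pointed at the proxy the same way. A minimal Python sketch (assumes the openai package, v1.x, is installed):

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="supersecretkey123",
)

response = client.chat.completions.create(
    model="openrouter-claude",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)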

Testing Claude Code Integration

  1. Start the LiteLLM proxy: ./start.sh
  2. Test with a simple curl request using the openrouter-claude model:
    curl http://localhost:8000/v1/chat/completions \
      -H "Authorization: Bearer supersecretkey123" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openrouter-claude",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
  3. Configure Claude Code with the above settings and verify it routes through the proxy.

Note: Claude Code may require additional configuration to use OpenAI-compatible endpoints. Check Claude Code documentation for custom API base URL support.

Example API Calls

Chat Completion

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer supersecretkey123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
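
Streaming works through the same endpoint; a minimal Python sketch (assumes the openai package, and that the configured providers support streaming):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="supersecretkey123")

# Stream tokens as they arrive instead of waiting for the full completion
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()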

List Available Models

curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer supersecretkey123"

Test Script

Use the provided test script:

python3 ../test_litellm.py

Available Models

The proxy provides access to multiple models through different providers:

Individual Models (Direct Access)

  • gpt-3.5-turbo (OpenAI)
  • deepseek-chat (DeepSeek)
  • gemini-pro (Gemini)
  • groq-llama3 (Groq)
  • cerebras-llama3 (Cerebras)
  • mistral-large (Mistral)
  • codestral (Codestral)
  • voyage-embed (Voyage, embeddings; see the sketch after this list)
  • morph (Morph)
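
The embedding model in this list is reached through the OpenAI-compatible /v1/embeddings endpoint rather than chat completions; a minimal Python sketch (assumes the openai package, and that voyage-embed is wired up as listed):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="supersecretkey123")

# Request an embedding vector from the proxy's embedding model
result = client.embeddings.create(
    model="voyage-embed",
    input="A short piece of text to embed",
)
print(len(result.data[0].embedding))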

OpenRouter Pool

  • openrouter-claude (Claude-3-Haiku via OpenRouter, 6 API keys for load balancing)

Free-Tier Aggregated Pool

  • free-tier (Smart logic pool with failover: includes DeepSeek, Mistral, Groq models, Gemini, Codestral, KiloCode, and more)

Groq Models (Extensive List)

  • allam-2-7b, llama-3.1-8b, llama-3.3-70b, llama-4-maverick-17b, llama-4-scout
  • whisper-large-v3, whisper-large-v3-turbo (audio transcription; see the sketch after this list)
  • groq-compound, groq-compound-mini
  • llama-guard-4-12b, llama-prompt-guard-2-22m, llama-prompt-guard-2-86m
  • kimi-k2-instruct, kimi-k2-instruct-0905
  • gpt-oss-120b, gpt-oss-20b, gpt-oss-safeguard-20b
  • qwen3-32b
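
The whisper entries above are transcription models; through an OpenAI-compatible proxy they would normally be called via the /v1/audio/transcriptions endpoint. A hedged Python sketch (assumes the openai package; whether this particular config exposes the audio route is an assumption, and sample.wav is a placeholder file name):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="supersecretkey123")

# Transcribe a local audio file via the OpenAI-compatible transcription endpoint
with open("sample.wav", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )
print(transcript.text)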

Routing Strategies

1. Usage-Based Routing (Default - config.yaml)

Distributes load across available models based on usage patterns.

2. Cost-Based Routing (config_cost_based.yaml)

Selects the cheapest model based on token costs.

Both strategies include:

  • 2 retries per failed request
  • 3 allowed fails before banning a model
  • 24-hour cooldown for banned models
  • 30-second timeout per request
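
These settings mirror the retry/cooldown knobs of litellm's Python Router. A rough sketch of the mapping (parameter names come from the litellm library, not from this repo's config.yaml, so treat it as an approximation):

import os
from litellm import Router

# Approximate mapping of the settings above onto litellm's Router
router = Router(
    model_list=[
        {
            "model_name": "deepseek-chat",
            "litellm_params": {
                "model": "deepseek/deepseek-chat",
                "api_key": os.environ["DEEPSEEK_API_KEY"],
            },
        },
    ],
    routing_strategy="usage-based-routing",  # config.yaml default
    num_retries=2,        # 2 retries per failed request
    allowed_fails=3,      # 3 allowed fails before banning a model
    cooldown_time=86400,  # 24-hour cooldown, in seconds
    timeout=30,           # 30-second timeout per request
)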

Security Considerations

⚠️ CRITICAL SECURITY NOTES:

  1. API keys exposed: .env file contains live API keys for 17+ providers
  2. Weak master key: supersecretkey123 is hardcoded and should be changed
  3. Network exposure: Proxy runs on 0.0.0.0:8000 (accessible from network)
  4. No global rate limiting: per-model RPM/TPM limits exist, but there is no proxy-wide rate limit
  5. No request logging/auditing: Only basic server logs in litellm.log

Recommended security improvements:

  1. Rotate all API keys in .env
  2. Generate strong random master key
  3. Implement IP-based rate limiting
  4. Add request logging and auditing
  5. Consider running behind reverse proxy with authentication
  6. Add .env to .gitignore if not already
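
A strong replacement for the master key can be generated with the Python standard library (the sk- prefix just follows LiteLLM's usual key convention):

import secrets

# Print a random URL-safe token suitable for LITELLM_MASTER_KEY
print("sk-" + secrets.token_urlsafe(32))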

Integration with Claude-Flow

The .claude-flow/ directory contains integration metrics and configuration for the Claude-Flow multi-agent system. The LiteLLM proxy serves as the AI model gateway for Claude-Flow agents.

Troubleshooting

Server Won't Start

# Check port conflict
lsof -i :8000

# Check Python environment
.venv/bin/python --version

# Run with debug
.venv/bin/litellm --config config.yaml --port 8000 --debug
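
If it still fails, confirm the YAML itself parses before blaming the environment (a minimal sketch; assumes PyYAML is available in the .venv):

import yaml  # assumes PyYAML is installed

# Fail loudly if config.yaml has a syntax error
with open("config.yaml") as f:
    config = yaml.safe_load(f)
print("parsed OK; top-level keys:", list(config))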

Models Failing

  1. Check .env for valid API keys
  2. Review rate limits in config
  3. Check litellm.log for specific errors
  4. Test individual model with curl

High Latency

  • Consider switching to config_cost_based.yaml
  • Review free-tier model failures
  • Check network connectivity to providers

Backup & Recovery

Critical files to backup:

  1. config.yaml - Primary configuration
  2. config_cost_based.yaml - Alternative configuration
  3. .env - API keys (store securely)
  4. start.sh - Startup script

Recovery procedure:

  1. Restore configuration files
  2. Update API keys in .env if necessary
  3. Start server: ./start.sh
  4. Test: python3 ../test_litellm.py

Dependencies

  • Python 3.13 (from virtual environment)
  • litellm==1.80.10
  • litellm_enterprise==0.1.25
  • litellm_proxy_extras==0.4.14

Support

For issues, check the logs in litellm.log or consult the LiteLLM documentation.


Note: This proxy is configured for Rakel's AI development stack and integrates with Claude-Flow for multi-agent AI workflows.
