LiteLLM Proxy Server

A configured instance of the LiteLLM library running as an OpenAI-compatible API proxy. The proxy provides a unified interface to multiple LLM providers (OpenAI, Anthropic, DeepSeek, Groq, Mistral, Gemini, etc.) with features like load balancing, cost tracking, and failover.

Quick Start

1. Start the Server

cd /home/clara/Documents/AI_AND_API/LiteLLM/litellm
./start.sh

The server runs on port 8000 with usage-based routing by default.

2. Stop the Server

pkill -f "litellm --config"

3. Check Server Status

# Check if server is running
ps aux | grep "litellm --config"

# View logs
tail -f litellm.log

# Health check (if implemented)
curl http://localhost:8000/health

Configuration

Master Key & Authentication

Master Key: supersecretkey123

All requests to the proxy must include this bearer token in the Authorization header:

Authorization: Bearer supersecretkey123
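
For example, a minimal Python check that the key is accepted (a sketch using the requests package, which is assumed to be installed):

import requests

# List the proxy's models, passing the master key as a bearer token
resp = requests.get(
    "http://localhost:8000/v1/models",
    headers={"Authorization": "Bearer supersecretkey123"},
    timeout=10,
)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])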

API Endpoints

Endpoint                                    Method   Description
http://localhost:8000/v1/chat/completions  POST     Chat completions (OpenAI-compatible)
http://localhost:8000/v1/models            GET      List available models
http://localhost:8000/health               GET      Health check (inferred from logs)

Environment Variables

API keys for all providers are stored in .env. Important: This file contains live API keys and should be secured.

Key variables:

  • LITELLM_MASTER_KEY=supersecretkey123 (proxy authentication)
  • OPENROUTER_API_KEY and OPENROUTER_API_KEY_2 through OPENROUTER_API_KEY_6 (6 OpenRouter keys for load balancing)
  • DEEPSEEK_API_KEY, GEMINI_API_KEY, GROQ_API_KEY, MISTRAL_API_KEY, etc.
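
A quick way to confirm the keys are actually being picked up is to load the file and spot-check a few names from the list above (a minimal sketch; assumes the python-dotenv package is available and the script is run from the repository root):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv(".env")

# Spot-check a few of the variables listed above; extend as needed
required = ["LITELLM_MASTER_KEY", "OPENROUTER_API_KEY", "DEEPSEEK_API_KEY", "GROQ_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
print("missing keys:", missing or "none")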

Configuration Files

  • config.yaml: Primary configuration with usage-based routing
  • config_cost_based.yaml: Alternative configuration with cost-based routing (selects cheapest model)
  • start.sh: Startup script

Usage with Claude Code

Claude Code can be configured to use this LiteLLM proxy as its AI model gateway. The proxy provides OpenAI-compatible endpoints, so Claude Code needs to be configured to use the OpenAI SDK with the proxy base URL.

Option 1: Configure Claude Code Settings

Add the following to your Claude Code configuration (~/.claude.json or equivalent):

{
  "model": "openrouter-claude",
  "api_base": "http://localhost:8000/v1",
  "api_key": "supersecretkey123",
  "api_type": "openai"
}

Here "model" is set to the proxy's model name for Claude, and "api_type": "openai" ensures the OpenAI-compatible API is used.

Option 2: Use Environment Variables

# For OpenAI SDK compatibility
export OPENAI_API_KEY="supersecretkey123"
export OPENAI_API_BASE="http://localhost:8000/v1"

# For Claude Code specific variables (if supported)
export ANTHROPIC_API_BASE="http://localhost:8000/v1"
export ANTHROPIC_API_KEY="supersecretkey123"

Option 3: Direct API Integration

When Claude Code makes API calls, you can intercept and route them through the proxy by setting the appropriate base URL and authentication header. The proxy supports the following Claude models via OpenRouter:

  • openrouter-claude (Claude-3-Haiku via OpenRouter with load balancing across 6 API keys)
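
Any OpenAI-SDK client can be pointed at the proxy the same way. A minimal Python sketch (assumes the openai package, v1.x, is installed):

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="supersecretkey123",
)

response = client.chat.completions.create(
    model="openrouter-claude",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)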

Testing Claude Code Integration

  1. Start the LiteLLM proxy: ./start.sh
  2. Test with a simple curl request using the openrouter-claude model:
    curl http://localhost:8000/v1/chat/completions \
      -H "Authorization: Bearer supersecretkey123" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openrouter-claude",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
  3. Configure Claude Code with the above settings and verify it routes through the proxy.

Note: Claude Code may require additional configuration to use OpenAI-compatible endpoints. Check Claude Code documentation for custom API base URL support.

Example API Calls

Chat Completion

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer supersecretkey123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
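
Streaming works through the same endpoint; a minimal Python sketch (assumes the openai package, and that the configured providers support streaming):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="supersecretkey123")

# Stream tokens as they arrive instead of waiting for the full completion
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()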

List Available Models

curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer supersecretkey123"

Test Script

Use the provided test script:

python3 ../test_litellm.py

Available Models

The proxy provides access to multiple models through different providers:

Individual Models (Direct Access)

  • gpt-3.5-turbo (OpenAI)
  • deepseek-chat (DeepSeek)
  • gemini-pro (Gemini)
  • groq-llama3 (Groq)
  • cerebras-llama3 (Cerebras)
  • mistral-large (Mistral)
  • codestral (Codestral)
  • voyage-embed (Voyage, embeddings; see the sketch after this list)
  • morph (Morph)
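
The embedding model in this list is reached through the OpenAI-compatible /v1/embeddings endpoint rather than chat completions; a minimal Python sketch (assumes the openai package, and that voyage-embed is wired up as listed):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="supersecretkey123")

# Request an embedding vector from the proxy's embedding model
result = client.embeddings.create(
    model="voyage-embed",
    input="A short piece of text to embed",
)
print(len(result.data[0].embedding))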

OpenRouter Pool

  • openrouter-claude (Claude-3-Haiku via OpenRouter, 6 API keys for load balancing)

Free-Tier Aggregated Pool

  • free-tier (Smart logic pool with failover: includes DeepSeek, Mistral, Groq models, Gemini, Codestral, KiloCode, and more)

Groq Models (Extensive List)

  • allam-2-7b, llama-3.1-8b, llama-3.3-70b, llama-4-maverick-17b, llama-4-scout
  • whisper-large-v3, whisper-large-v3-turbo (audio transcription; see the sketch after this list)
  • groq-compound, groq-compound-mini
  • llama-guard-4-12b, llama-prompt-guard-2-22m, llama-prompt-guard-2-86m
  • kimi-k2-instruct, kimi-k2-instruct-0905
  • gpt-oss-120b, gpt-oss-20b, gpt-oss-safeguard-20b
  • qwen3-32b
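
The whisper entries above are transcription models; through an OpenAI-compatible proxy they would normally be called via the /v1/audio/transcriptions endpoint. A hedged Python sketch (assumes the openai package; whether this particular config exposes the audio route is an assumption, and sample.wav is a placeholder file name):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="supersecretkey123")

# Transcribe a local audio file via the OpenAI-compatible transcription endpoint
with open("sample.wav", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )
print(transcript.text)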

Routing Strategies

1. Usage-Based Routing (Default - config.yaml)

Distributes load across available models based on usage patterns.

2. Cost-Based Routing (config_cost_based.yaml)

Selects the cheapest model based on token costs.

Both strategies include:

  • 2 retries per failed request
  • 3 allowed fails before banning a model
  • 24-hour cooldown for banned models
  • 30-second timeout per request
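
These settings mirror the retry/cooldown knobs of litellm's Python Router. A rough sketch of the mapping (parameter names come from the litellm library, not from this repo's config.yaml, so treat it as an approximation):

import os
from litellm import Router

# Approximate mapping of the settings above onto litellm's Router
router = Router(
    model_list=[
        {
            "model_name": "deepseek-chat",
            "litellm_params": {
                "model": "deepseek/deepseek-chat",
                "api_key": os.environ["DEEPSEEK_API_KEY"],
            },
        },
    ],
    routing_strategy="usage-based-routing",  # config.yaml default
    num_retries=2,        # 2 retries per failed request
    allowed_fails=3,      # 3 allowed fails before banning a model
    cooldown_time=86400,  # 24-hour cooldown, in seconds
    timeout=30,           # 30-second timeout per request
)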

Security Considerations

⚠️ CRITICAL SECURITY NOTES:

  1. API keys exposed: .env file contains live API keys for 17+ providers
  2. Weak master key: supersecretkey123 is hardcoded and should be changed
  3. Network exposure: Proxy runs on 0.0.0.0:8000 (accessible from network)
  4. No global rate limiting: per-model RPM/TPM limits exist, but there is no proxy-wide rate limit
  5. No request logging/auditing: Only basic server logs in litellm.log

Recommended security improvements:

  1. Rotate all API keys in .env
  2. Generate strong random master key
  3. Implement IP-based rate limiting
  4. Add request logging and auditing
  5. Consider running behind reverse proxy with authentication
  6. Add .env to .gitignore if not already
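
A strong replacement for the master key can be generated with the Python standard library (the sk- prefix just follows LiteLLM's usual key convention):

import secrets

# Print a random URL-safe token suitable for LITELLM_MASTER_KEY
print("sk-" + secrets.token_urlsafe(32))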

Integration with Claude-Flow

The .claude-flow/ directory contains integration metrics and configuration for the Claude-Flow multi-agent system. The LiteLLM proxy serves as the AI model gateway for Claude-Flow agents.

Troubleshooting

Server Won't Start

# Check port conflict
lsof -i :8000

# Check Python environment
.venv/bin/python --version

# Run with debug
.venv/bin/litellm --config config.yaml --port 8000 --debug
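
If it still fails, confirm the YAML itself parses before blaming the environment (a minimal sketch; assumes PyYAML is available in the .venv):

import yaml  # assumes PyYAML is installed

# Fail loudly if config.yaml has a syntax error
with open("config.yaml") as f:
    config = yaml.safe_load(f)
print("parsed OK; top-level keys:", list(config))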

Models Failing

  1. Check .env for valid API keys
  2. Review rate limits in config
  3. Check litellm.log for specific errors
  4. Test individual model with curl

High Latency

  • Consider switching to config_cost_based.yaml
  • Review free-tier model failures
  • Check network connectivity to providers

Backup & Recovery

Critical files to backup:

  1. config.yaml - Primary configuration
  2. config_cost_based.yaml - Alternative configuration
  3. .env - API keys (store securely)
  4. start.sh - Startup script

Recovery procedure:

  1. Restore configuration files
  2. Update API keys in .env if necessary
  3. Start server: ./start.sh
  4. Test: python3 ../test_litellm.py

Dependencies

  • Python 3.13 (from virtual environment)
  • litellm==1.80.10
  • litellm_enterprise==0.1.25
  • litellm_proxy_extras==0.4.14

Support

For issues, check the logs in litellm.log or consult the LiteLLM documentation.


Note: This proxy is configured for Rakel's AI development stack and integrates with Claude-Flow for multi-agent AI workflows.
