
feat(ai): Add LiteLLM-like router infrastructure (#286)#439

Open
matiasmagni wants to merge 1 commit into arakoodev:ts from matiasmagni:feature/litellm-router-286-v2

Conversation

@matiasmagni matiasmagni commented Mar 18, 2026

Summary

  • Implement load balancing between multiple LLM deployments (OpenAI, Google Palm/Gemini, Cohere)
  • Routing strategies: least-tokens (default), simple-shuffle, latency-based
  • Timeout/retry with exponential backoff via axios interceptors
  • Streaming support for all providers
  • Token usage tracking with cost calculation
  • Sentry and Posthog logging callbacks
  • JSONNet configuration support
  • Mock servers for testing
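The timeout/retry behavior described above can be sketched as a backoff schedule. This is a minimal illustration only, assuming a classic doubling delay; `backoffDelayMs`, the 500 ms base, and the 30 s cap are hypothetical names and values, not necessarily what the PR's axios interceptors use.

```typescript
// Hypothetical sketch of exponential backoff, as described in the summary.
// Base delay and cap are illustrative, not the PR's actual values.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
    // Delay doubles with each attempt: 500ms, 1000ms, 2000ms, ...
    return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Delays for each retry, given the router's numRetries setting.
function backoffSchedule(numRetries: number): number[] {
    return Array.from({ length: numRetries }, (_, i) => backoffDelayMs(i));
}
```

With `numRetries: 3` as in the usage example below, this would wait roughly 500 ms, 1 s, then 2 s between attempts.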

Demo Video

https://youtu.be/Zij1XabtJnk

Features Implemented

  1. Load Balancing: Picks the deployment that is below its rate limit and has the fewest tokens used
  2. Reliability: Timeouts, retries, exponential backoff
  3. Streaming: Full streaming support
  4. Token Usage: Tracks prompt/completion/total tokens and cost
  5. Logging: Sentry + Posthog callbacks
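The least-tokens strategy from item 1 can be modeled in a few lines. This is a sketch under stated assumptions: the `Deployment` shape, `tokensUsed`/`requestsUsed` counters, and `pickLeastTokens` are hypothetical illustrations of the idea, not the PR's actual internals.

```typescript
// Hypothetical model of least-tokens routing: among deployments still under
// their TPM/RPM limits for the current window, pick the one with the fewest
// tokens consumed so far.
interface Deployment {
    apiKey: string;
    tpm: number; // tokens-per-minute limit
    rpm: number; // requests-per-minute limit
    tokensUsed: number; // tokens consumed in the current window
    requestsUsed: number; // requests made in the current window
}

function pickLeastTokens(deployments: Deployment[]): Deployment | undefined {
    return deployments
        .filter((d) => d.tokensUsed < d.tpm && d.requestsUsed < d.rpm)
        .sort((a, b) => a.tokensUsed - b.tokensUsed)[0];
}
```

If every deployment is at its limit, no candidate remains and the call would have to wait or fail.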

Tests

  • 8 passing E2E tests covering all features
  • Mock servers for OpenAI, Gemini, Cohere

Usage Example

import { Router } from "@arakoodev/edgechains.js/ai";

const router = new Router({
  modelList: [
    // Two deployments of the same model behind different API keys,
    // each with its own requests-per-minute and tokens-per-minute limits.
    { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-xxx", rpm: 3000, tpm: 90000 },
    { modelName: "gpt-3.5-turbo", provider: "openai", apiKey: "sk-yyy", rpm: 3000, tpm: 90000 },
  ],
  routingStrategy: "least-tokens", // default strategy
  numRetries: 3,
  timeout: 30000, // ms
});

const response = await router.completion({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Hello!" }],
});
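The token-usage and cost tracking mentioned above amounts to multiplying token counts by per-token prices. The following is a minimal sketch; `calcCostUsd`, the `Usage` shape, and the per-1K prices shown are illustrative assumptions, not the PR's actual pricing table or API.

```typescript
// Hypothetical cost calculation from prompt/completion token counts.
interface Usage {
    promptTokens: number;
    completionTokens: number;
}

// Prices are expressed per 1K tokens, as most providers quote them.
function calcCostUsd(
    usage: Usage,
    promptPricePer1k: number,
    completionPricePer1k: number
): number {
    return (
        (usage.promptTokens / 1000) * promptPricePer1k +
        (usage.completionTokens / 1000) * completionPricePer1k
    );
}
```

For example, 1000 prompt tokens and 500 completion tokens at $0.0015/$0.002 per 1K would cost $0.0025.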

/claim #286

@matiasmagni
Author

I have read the Arakoo CLA Document and I hereby sign the CLA.
