Multi-API-Key Load Balancing for LLM Providers
Is your feature request related to a problem?
Yes. For individual users on free or low-tier plans of the OpenAI, Azure, or Google AI APIs, or enterprise users running self-hosted model service platforms, a single API key is severely constrained by strict RPM (Requests Per Minute) and TPM (Tokens Per Minute) rate limits - typically single or double digits for RPM.
In our team's practical scenarios, running DeepWiki on large-scale repositories and handling subsequent RAG indexing/querying demands can easily consume over 100,000 tokens per minute. When DeepWiki is deployed as a service for development teams with 20+ concurrent users, at least 70 LLM requests per minute are required. Supporting only one API key per provider prevents cost-sensitive individual and enterprise users from deploying DeepWiki effectively.
Describe the solution you'd like
Implement a multi-API-key configuration system with an intelligent load-balancing algorithm that distributes model requests and token consumption across multiple API keys, working around per-key rate limits without requiring users to increase their spending.
Key Features:
- Multiple Key Support: Allow users to configure multiple API keys per provider (e.g., `OPENAI_API_KEYS=key1,key2,key3`)
- Load Balancing Strategy (see the sketch after this list):
  - Primary criterion: least-used key (by request count)
  - Tiebreaker: least recently used timestamp
  - Per-provider independent tracking
- Configuration Flexibility:
  - Environment variables with comma-separated values
  - JSON configuration file support
  - Dynamic key rotation without service restart
- Monitoring & Observability:
  - Real-time key usage statistics
  - Balance ratio metrics
  - Per-key performance tracking
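For illustration, here is a minimal sketch of the selection strategy described above (least-used key, ties broken by least-recently-used, tracked independently per provider). The names `KeyBalancer` and `acquire` are hypothetical and not taken from the actual implementation:

```python
import time
from dataclasses import dataclass


@dataclass
class KeyStats:
    requests: int = 0
    last_used: float = 0.0


class KeyBalancer:
    """Per-provider tracker that picks the least-used key, LRU as tiebreaker."""

    def __init__(self, keys: list[str]):
        self.stats = {key: KeyStats() for key in keys}

    def acquire(self) -> str:
        # Choose the key with the fewest requests; break ties by oldest last_used.
        key = min(self.stats, key=lambda k: (self.stats[k].requests, self.stats[k].last_used))
        self.stats[key].requests += 1
        self.stats[key].last_used = time.time()
        return key


# One balancer per provider keeps usage tracking independent.
balancers = {
    "openai": KeyBalancer(["sk-key1", "sk-key2", "sk-key3"]),
    "google": KeyBalancer(["AIza-key1", "AIza-key2"]),
}
api_key = balancers["openai"].acquire()  # -> least-used OpenAI key
```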
Benefits:
- ✅ Users can combine multiple free/low-tier keys to achieve higher effective rate limits
- ✅ Cost remains controlled while bypassing single-key rate limit restrictions
- ✅ Improved service reliability and availability
- ✅ Horizontal scaling capability for enterprise deployments
Describe alternatives you've considered
Alternative 1: Request Queue with Delayed Retry
Instead of load balancing across multiple keys, implement a request queue that automatically retries failed requests after the rate limit window resets.
Implementation (sketched below):
- Maintain a FIFO queue for all LLM requests
- When rate limit error occurs, calculate wait time based on rate limit window
- Automatically retry requests after the wait period
- Apply exponential backoff for subsequent failures
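A rough sketch of this queued-retry approach, included only for comparison; `RateLimitError`, `call_llm`, and `process_queue` are placeholders, not real DeepWiki or provider APIs:

```python
import asyncio


class RateLimitError(Exception):
    """Placeholder for a provider-specific rate-limit exception."""


async def process_queue(queue: asyncio.Queue, call_llm, max_retries: int = 5):
    """Drain a FIFO queue, retrying rate-limited requests with exponential backoff."""
    while True:
        request = await queue.get()
        delay = 30.0  # initial wait roughly matching a per-minute rate-limit window
        for _ in range(max_retries):
            try:
                await call_llm(request)
                break
            except RateLimitError:
                await asyncio.sleep(delay)  # request sits idle, adding user-visible latency
                delay *= 2                  # exponential backoff on repeated failures
        queue.task_done()
```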
Why This Doesn't Work:
- ❌ Poor User Experience: Users face significant delays (30-60 seconds per rate limit hit)
- ❌ Unpredictable Latency: Response times become highly variable and unreliable
- ❌ Queue Buildup: During high concurrency, queues grow without bound, leading to request timeouts
- ❌ Resource Waste: Server resources are tied up maintaining queue state and retry timers
- ❌ No Scalability: Doesn't solve the fundamental throughput problem, just delays it
- ❌ Cascade Failures: Long queues can cause cascading failures when requests timeout while waiting
Additional context
✅ Completed - This feature has been fully implemented and a PR will be submitted later:
- Multi-key configuration via environment variables and JSON
- Load balancing with least-used + LRU strategy
- Real-time monitoring and statistics
- Full backward compatibility with single-key setups
- Comprehensive testing and documentation
Performance Metrics (from testing):
- 10 concurrent requests distributed across 5 keys
- Load balance ratio: 80%+ (1.0 = perfect balance; one possible definition is sketched after this list)
- Zero request failures due to rate limiting
- Predictable, low-latency responses
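The exact formula behind the balance ratio is not defined here; one plausible definition, assumed purely for illustration, is the minimum per-key request count divided by the maximum:

```python
def balance_ratio(request_counts: list[int]) -> float:
    """Assumed metric: min/max of per-key request counts; 1.0 means perfectly even."""
    return min(request_counts) / max(request_counts) if max(request_counts) else 1.0


print(balance_ratio([2, 2, 2, 2, 2]))  # 1.0  - 10 requests spread evenly over 5 keys
print(balance_ratio([4, 2, 1, 2, 1]))  # 0.25 - heavily skewed distribution
```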
Configuration Example:
```bash
# .env file
OPENAI_API_KEYS=sk-key1,sk-key2,sk-key3,sk-key4,sk-key5
GOOGLE_API_KEYS=AIza-key1,AIza-key2,AIza-key3
```

api/config/api_keys.json:

```json
{
  "openai": {
    "keys": ["${OPENAI_API_KEYS}"]
  },
  "google": {
    "keys": ["${GOOGLE_API_KEYS}"]
  }
}
```
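A hypothetical loader sketch showing how such a file could be expanded, resolving the `${...}` placeholders from comma-separated environment variables; `load_provider_keys` is an illustrative name, not the submitted implementation:

```python
import json
import os
import re


def load_provider_keys(path: str = "api/config/api_keys.json") -> dict[str, list[str]]:
    """Read the JSON config and expand ${ENV_VAR} entries into per-provider key lists."""
    with open(path) as f:
        config = json.load(f)

    providers: dict[str, list[str]] = {}
    for provider, entry in config.items():
        keys: list[str] = []
        for item in entry.get("keys", []):
            match = re.fullmatch(r"\$\{(\w+)\}", item)
            if match:
                # Split the referenced environment variable on commas, dropping blanks.
                raw = os.environ.get(match.group(1), "")
                keys.extend(k.strip() for k in raw.split(",") if k.strip())
            else:
                keys.append(item)
        providers[provider] = keys
    return providers
```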