
[Feature]: Add per-model max_parallel_requests limit #13930

@Swipe4057


The Feature

Introduce a new setting, model_max_parallel_requests, that lets users define the number of parallel requests allowed per key/user for each model individually.

Motivation, pitch

Currently, it's possible to set individual rate limits for each model. Here's an example from the official documentation:

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"model_rpm_limit": {"gpt-4": 2}, "model_tpm_limit": {"gpt-4": 100}}'

However, in scenarios involving local models, where a key grants access not only to LLMs but also to other model types (such as embeddings or rerankers), model_rpm_limit and model_tpm_limit can be confusing and difficult for users to manage. A much clearer and more intuitive limit would be max_parallel_requests; currently, however, this parameter cannot be set per model when generating a key. A sketch of what the proposed parameter could look like follows below.
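For illustration, the proposed model_max_parallel_requests could mirror the format of the existing model_rpm_limit map. This is only a sketch of the requested feature, not an existing API; the model names and values are placeholders:

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"model_max_parallel_requests": {"gpt-4": 5, "text-embedding-ada-002": 20}}'

With a key generated this way, the proxy would cap the key at 5 concurrent requests to gpt-4 and 20 concurrent requests to the embedding model, independently of any RPM/TPM limits.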

LiteLLM is hiring a founding backend engineer. Are you interested in joining us and shipping to all our users?

No

Twitter / LinkedIn details

No response
