
Benchmarks

Benchmarks for LiteLLM Gateway (Proxy Server) tested against a fake OpenAI endpoint.

Note: we're currently migrating to aiohttp, which has 10x higher throughput. We recommend using the aiohttp_openai/ provider for load testing.

Use this config for testing:

```yaml
model_list:
  - model_name: "fake-openai-endpoint"
    litellm_params:
      model: aiohttp_openai/any
      api_base: https://your-fake-openai-endpoint.com/chat/completions
      api_key: "test"
```
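
Once the proxy is running with this config (for example via `litellm --config config.yaml`), any OpenAI-compatible client can send traffic through it. Below is a minimal smoke-test sketch using the openai Python SDK; the local address/port (`http://0.0.0.0:4000`) and the `sk-1234` key are assumptions for a default local deployment, not part of the benchmark harness.

```python
# Minimal smoke test for the config above (a sketch, not the benchmark itself).
# Assumes the proxy was started locally, e.g.: litellm --config config.yaml
from openai import OpenAI

# Point the OpenAI client at the LiteLLM Proxy instead of api.openai.com.
# The api_key just needs to satisfy whatever auth the proxy is configured with.
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="fake-openai-endpoint",  # model_name from the config above
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```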

1 Instance LiteLLM Proxy

In these tests the median latency of directly calling the fake-openai-endpoint is 60ms.

| Metric | LiteLLM Proxy (1 Instance) |
| --- | --- |
| RPS | 475 |
| Median Latency (ms) | 100 |
| Latency overhead added by LiteLLM Proxy | 40 ms |

Key Findings

  • Single instance: 475 RPS @ 100ms latency
  • 2 LiteLLM instances: 950 RPS @ 100ms latency
  • 4 LiteLLM instances: 1900 RPS @ 100ms latency

2 Instances

Adding 1 instance doubles the RPS while maintaining the 100ms-110ms median latency.

| Metric | LiteLLM Proxy (2 Instances) |
| --- | --- |
| Median Latency (ms) | 100 |
| RPS | 950 |

Machine Spec used for testing

Each machine deploying LiteLLM had the following specs:

  • 2 CPU
  • 4GB RAM

Logging Callbacks

GCS Bucket Logging

Using GCS Bucket logging has no impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric | Basic LiteLLM Proxy | LiteLLM Proxy with GCS Bucket Logging |
| --- | --- | --- |
| RPS | 1133.2 | 1137.3 |
| Median Latency (ms) | 140 | 138 |

LangSmith Logging

Using LangSmith logging has no impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric | Basic LiteLLM Proxy | LiteLLM Proxy with LangSmith |
| --- | --- | --- |
| RPS | 1133.2 | 1135 |
| Median Latency (ms) | 140 | 132 |

Locust Settings

  • 2500 users
  • 100 user ramp-up
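
For reference, a minimal locustfile matching these settings might look like the sketch below. It assumes the ramp-up maps to Locust's spawn rate (users started per second), that the proxy is reachable at the `--host` passed on the command line, and that requests go to the OpenAI-compatible `/chat/completions` route with a placeholder key. The user count and ramp-up are supplied at run time, e.g. `locust -f locustfile.py --users 2500 --spawn-rate 100 --host http://0.0.0.0:4000`.

```python
# locustfile.py — load-test sketch against the LiteLLM Proxy (assumptions noted above).
from locust import HttpUser, task, between


class LiteLLMProxyUser(HttpUser):
    # Short think time per simulated user; tune to match your target request rate.
    wait_time = between(0.5, 1)

    @task
    def chat_completion(self):
        # Hit the OpenAI-compatible chat completions route on the proxy.
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",
                "messages": [{"role": "user", "content": "ping"}],
            },
            headers={"Authorization": "Bearer sk-1234"},  # placeholder proxy key
        )
```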