Install and Configure LiteLLM

Installing and Configuring LiteLLM Proxy on a VPS: A Unified API Interface for Managing All Your LLMs

TL;DR

In this guide, we will explore the process of deploying LiteLLM Proxy on a virtual server—a powerful tool that unifies dozens of different AI providers (OpenAI, Anthropic, Google, Azure, Ollama) into a single unified API interface, fully compatible with the OpenAI format. This allows developers and companies to centrally manage keys, limits, logging, and fault tolerance for their AI services.

Unification: One endpoint for all models (GPT-4, Claude 3.5, Gemini, Llama 3).
Savings: Built-in request caching via Redis and detailed cost monitoring.
Reliability: Automatic retries and fallback to backup models during provider outages.
Security: Access management via virtual keys with token and budget limits.
Self-hosted: Full control over data and logs on your own VPS.

1. What We Are Setting Up and Why: The Problem of AI Service Fragmentation

By 2026, the landscape of large language models (LLMs) has become extremely fragmented. Developers must integrate APIs from OpenAI for GPT-5, Anthropic for Claude 4, Google for Gemini 2.0, as well as local models via vLLM or Ollama to ensure privacy. Each provider has its own request formats, authentication methods, and limit systems.

LiteLLM Proxy solves this problem by acting as an intelligent gateway. It accepts requests in the standard OpenAI format and "translates" them into the language of the required provider. But it's not just a proxy server. It's a full-fledged management platform (Governance Layer) that allows you to:

Create virtual keys for different departments or applications, limiting their budget (e.g., no more than $10 per day).
Configure "smart" routing: if OpenAI returns a 500 error, the request is automatically sent to Anthropic.
Log every request and response to a PostgreSQL database for subsequent quality and cost analysis.
Use semantic caching to avoid paying twice for identical user questions.

Choosing a self-hosted solution on a VPS instead of using cloud aggregators (like OpenRouter) is driven by security requirements and the desire to avoid additional markups on tokens. By having your own proxy, you pay providers directly and maintain full control over where your data goes.

2. What VPS Configuration is Needed for This Task

LiteLLM Proxy itself is a lightweight Python application, but its requirements grow depending on traffic volume, the use of a database for logs, and caching in Redis.

Component	Minimum (1-5 users)	Recommended (Production)
CPU	1 Core (Shared)	2-4 Cores (Dedicated)
RAM	2 GB	4-8 GB
Disk	20 GB SSD	50 GB NVMe (for DB logs)
OS	Ubuntu 24.04 LTS	Ubuntu 24.04 / 26.04 LTS

For stable system operation, considering the PostgreSQL database and Redis cache, a suitable VPS with 4 GB of RAM is the optimal choice. This will provide a sufficient margin for processing parallel requests without delays.

When is a Dedicated server needed? If you plan to run local models (e.g., Llama 3 70B) directly on the same server via Ollama. In this case, you will need GPU power (NVIDIA A100/H100) or a huge amount of RAM for CPU inference. For the proxy layer itself, a regular VPS is more than enough.

Location: Choose a data center as close as possible to your main application servers or provider endpoints (usually the US or Europe) to minimize network latency (TTFT — Time To First Token).

3. Server Preparation: Security and Basic Software

Before installing LiteLLM, you must prepare the environment. Security is critical, as your API keys for paid services will pass through this server.

Update packages and install basic utilities:


sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git wget build-essential software-properties-common

Create a separate user to run services so as not to use root:


sudo adduser litellm-admin
sudo usermod -aG sudo litellm-admin
# Переключаемся на нового пользователя
su - litellm-admin

Configure the basic UFW firewall. We will need ports 22 (SSH), 80 (HTTP), and 443 (HTTPS):


sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

To protect against SSH brute force, install fail2ban:


sudo apt install -y fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban

4. Installing Docker and Required Components

In 2026, the most reliable way to deploy LiteLLM Proxy remains Docker. This isolates Python dependencies and simplifies updates.

Install Docker Engine and Docker Compose:


# Добавляем официальный GPG ключ Docker
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Добавляем репозиторий
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Добавляем пользователя в группу docker
sudo usermod -aG docker $USER
# Примените изменения групп без перезагрузки
newgrp docker

Verify the installation:


docker --version && docker compose version

5. Step-by-Step LiteLLM Proxy Installation

We will use a combination of three containers: LiteLLM itself, PostgreSQL (for storing keys and logs), and Redis (for request caching and rate limiting).

Create a working directory:


mkdir ~/litellm-stack && cd ~/litellm-stack

Create the docker-compose.yaml file. This configuration is current for 2026 versions and includes automatic restarts and container health checks:


cat < docker-compose.yaml
services:
  db:
    image: postgres:16-alpine
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: litellm_user
      POSTGRES_PASSWORD: your_strong_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U litellm_user -d litellm"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - ./redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - DATABASE_URL=postgresql://litellm_user:your_strong_password@db:5432/litellm
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - LITELLM_MASTER_KEY=sk-master-key-2026-very-secret
      - UI_USERNAME=admin
      - UI_PASSWORD=admin-password-change-me
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    command: ["--config", "/app/config.yaml", "--port", "4000", "--detailed_debug"]
EOF

Note: Be sure to replace your_strong_password and LITELLM_MASTER_KEY with your own unique values. The master key will be used for initial access to the admin panel and for creating virtual keys.

6. Configuration Deep Dive: Models, Secrets, and Routing

The heart of LiteLLM is the config.yaml file. Here, we define the list of models and the rules for their operation.

Let's create a configuration example:


cat < config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo
      api_key: "os.environ/OPENAI_API_KEY"
      
  - model_name: claude-3-5
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: "os.environ/ANTHROPIC_API_KEY"

  - model_name: global-llm
    model_info:
      base_model: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: "os.environ/OPENAI_API_KEY"
      tpm: 10000
      rpm: 500
  
  - model_name: global-llm
    model_info:
      base_model: claude-3-5
    litellm_params:
      model: anthropic/claude-3-5-sonnet
      api_key: "os.environ/ANTHROPIC_API_KEY"

router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true

litellm_settings:
  drop_params: true
  set_verbose: false
  cache: true
  cache_type: redis
  success_callback: ["database"]
  failure_callback: ["database"]
EOF

In this example, we have configured:

Direct models: Accessing specific GPT-4 or Claude instances.
Model group (Load Balancing): When requesting global-llm, the proxy will automatically select the less loaded model or switch to an alternative in case of failure.
Redis Caching: Repeated requests will not be sent to the provider, saving money.
Database Logging: All successful and failed calls are saved in PostgreSQL.

To pass API keys, let's create a .env file:


cat < .env
OPENAI_API_KEY=sk-proj-xxxx...
ANTHROPIC_API_KEY=sk-ant-xxxx...
EOF

Now update docker-compose.yaml so that it picks up environment variables from .env (add the env_file: .env section to the litellm service).

Launch the stack:


docker compose up -d

7. Setting up HTTPS via Caddy to Protect the API

Opening port 4000 directly to the internet is unsafe. We need a reverse proxy with automatic SSL certificate acquisition. Caddy is the ideal choice for 2026 due to its ease of configuration.


sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy

Let's configure the Caddyfile (replace api.yourdomain.com with your actual domain pointing to the server's IP):


sudo nano /etc/caddy/Caddyfile

File content:


api.yourdomain.com {
    reverse_proxy localhost:4000
    
    header {
        # Basic protection
        Strict-Transport-Security "max-age=31536000;"
        X-Content-Type-Options nosniff
        X-Frame-Options DENY
        Referrer-Policy no-referrer-when-downgrade
    }
}

Restart Caddy:


sudo systemctl restart caddy

8. Backups, Monitoring, and Maintenance

Your proxy now stores critical data: virtual user keys and cost history. Losing the PostgreSQL database would be painful.

Backup Automation

Let's create a simple script for a daily database dump to an S3-compatible storage or another server:


#!/bin/bash
# backup-db.sh
BACKUP_DIR="/home/litellm-admin/backups"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
mkdir -p $BACKUP_DIR

docker exec litellm-stack-db-1 pg_dump -U litellm_user litellm > $BACKUP_DIR/litellm_backup_$TIMESTAMP.sql

# Delete backups older than 30 days
find $BACKUP_DIR -type f -mtime +30 -name "*.sql" -delete

Add it to cron (crontab -e):

0 3 * * * /bin/bash /home/litellm-admin/backup-db.sh

Updating LiteLLM

LiteLLM developers release updates almost daily. To update without losing data:


cd ~/litellm-stack
docker compose pull
docker compose up -d

Monitoring

LiteLLM provides a built-in dashboard. After installation, it is available at https://api.yourdomain.com/ui. Use the login and password specified in docker-compose.yaml. There you will see usage graphs, active keys, and provider errors in real-time.

9. Troubleshooting + FAQ

1. "Connection refused" error when trying to access the API?

Check if the containers are running: docker compose ps. If the litellm container keeps restarting, check the logs: docker compose logs litellm. Most often, the problem is an incorrect config.yaml format or a missing DATABASE_URL.

2. How to add a new model without restarting the entire stack?

LiteLLM supports hot configuration reloading. Edit config.yaml and send a SIGHUP signal to the container or simply run docker compose up -d — Docker will only restart the changed service with minimal downtime.

3. What is the minimum VPS configuration suitable for personal use?

For a single user, 1 vCPU and 2 GB RAM are sufficient. However, if you enable detailed logging in PostgreSQL and start actively using the UI panel, the system may start swapping. 4 GB is the "gold standard" for comfortable operation.

4. What to choose — VPS or dedicated for this task?

In 95% of cases, a VPS is the best choice. A dedicated server is only needed in two scenarios: 1) You have massive traffic (millions of requests per day) requiring guaranteed CPU resources. 2) You want to run open-source models locally (Llama, Mistral) on the same hardware.

5. How to limit the budget for a specific API key?

This is done via the Admin UI or API. When creating a key, you can specify the max_budget and budget_duration parameters (e.g., $50 per month). LiteLLM will automatically block the key when the limit is reached.

6. Does LiteLLM support streaming?

Yes, LiteLLM fully supports stream: true. This is critical for chatbots so that the user sees the text as it is generated. The proxy correctly forwards data chunks from the provider to the client.

10. Conclusions and Next Steps

We have deployed a fault-tolerant, scalable, and secure gateway for working with artificial intelligence. Now you have a single entry point for all LLMs, protected by HTTPS and controlled via a database.

What to do next:

Integration: Redirect your applications (LangChain, AutoGPT, or custom scripts) to the new endpoint https://api.yourdomain.com/v1.
Optimization: Configure fallbacks in config.yaml so that your services don't go down when OpenAI undergoes maintenance.
Analytics: After a week of operation, study the LiteLLM dashboard to understand which models are costing you the most and where you can save by switching to cheaper alternatives (e.g., from GPT-4 to Claude Haiku for simple tasks).

Using your own proxy server is an important step toward a mature AI infrastructure, providing independence from a single provider's policy and full control over costs.