Deploy Open WebUI + Ollama on VPS: Build Your Private ChatGPT

Why Open WebUI + Ollama?

If you want to run your own AI assistant without paying for ChatGPT Plus every month, Open WebUI + Ollama is one of the most mature self-hosted solutions available today. Running the pair on your VPS gives you:

100% Private — Your data never leaves your server. No privacy concerns
Zero Subscription Costs — Just pay for your VPS, no API fees
Works Offline — Once models are downloaded, chatting needs no internet connection
Multi-Model Support — Run Llama, Qwen, DeepSeek, Mistral, Gemma and more simultaneously
ChatGPT-Class Experience — A familiar ChatGPT-style web interface with chat history, model switching, and file uploads
RAG Knowledge Base — Upload documents and let AI answer based on your data
Multi-User Support — Share the same server with your team

Prerequisites

Before you begin, make sure you have:

A VPS (recommended: 4 CPU cores, 8 GB RAM, 50 GB SSD)
Docker and the Docker Compose plugin (installation is covered in Step 1)
A domain name (optional, for HTTPS configuration)
Basic Linux command-line knowledge

Recommended VPS Specs

Model Size	Recommended RAM	Storage Needed	Use Case
1B-3B parameters	4 GB	10 GB	Light chat, translation, summarization
7B-8B parameters	8 GB	20 GB	General QA, code assistance
14B-20B parameters	16 GB	40 GB	Complex reasoning, professional writing
70B+ parameters	48 GB+	80 GB+	Advanced reasoning, multilingual

Tip: For most use cases, an 8 GB RAM VPS running a 7B model (like Qwen2.5-7B or Llama-3.1-8B) handles general Q&A and coding help well.
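To see which row of the table your server falls into, the helper below reads total memory from /proc/meminfo and maps it to a tier. This is a rough sketch. `recommend_tier` and `host_mem_kb` are names made up for this guide. The 4/8/16 GB thresholds follow the table. The 48 GB cut-off for 70B+ is an assumption, based on a 4-bit 70B model needing around 40 GB on its own.

```shell
#!/usr/bin/env bash
# Map total RAM (kB, as in /proc/meminfo) to a model-size tier.
# Thresholds are rough guides, not hard limits.
recommend_tier() {
  local mem_kb="$1" mem_gb
  # Round to the nearest GiB: an "8 GB" VPS usually reports slightly less.
  mem_gb=$(( ($mem_kb + 512 * 1024) / (1024 * 1024) ))
  if   [ "$mem_gb" -ge 48 ]; then echo "70B+"
  elif [ "$mem_gb" -ge 16 ]; then echo "14B-20B"
  elif [ "$mem_gb" -ge 8 ];  then echo "7B-8B"
  elif [ "$mem_gb" -ge 4 ];  then echo "1B-3B"
  else echo "under 4 GB: below the smallest tier"
  fi
}

# Read MemTotal (kB) on Linux.
host_mem_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Usage:
#   echo "CPU cores: $(nproc)"
#   echo "Suggested tier: $(recommend_tier "$(host_mem_kb)")"
```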

Step 1: Install Docker and Docker Compose

If not already installed:

# Update system
apt update && apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com | bash

# Verify installation
docker --version
docker compose version
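A quick preflight can confirm that the tools used in the rest of this guide are on the PATH. `require_cmds` is a hypothetical helper written for this guide. Compose v2 ships as a Docker plugin rather than a separate binary, so the usage line checks it with `docker compose version` instead.

```shell
#!/usr/bin/env bash
# Report every missing command; return non-zero if any are missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_cmds docker curl && docker compose version >/dev/null \
#     && echo "Docker and Compose are ready"
```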

Step 2: Create Docker Compose Configuration

Create a project directory and configuration file:

mkdir -p ~/open-webui && cd ~/open-webui
nano docker-compose.yml

Paste the following configuration:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ./ollama_data:/root/.ollama
    ports:
      # Localhost only: Ollama's API has no authentication of its own.
      - "127.0.0.1:11434:11434"
    restart: unless-stopped
    networks:
      - ai-net
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_NUM_PARALLEL=1
      - OLLAMA_MAX_LOADED_MODELS=1

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    volumes:
      - ./webui_data:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped
    networks:
      - ai-net
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=replace-with-random-secret
      - WEBUI_NAME=My AI Chat
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  ai-net:
    driver: bridge

Security note: Generate a random secret with openssl rand -base64 32 and replace WEBUI_SECRET_KEY.
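If you prefer not to paste the secret by hand, the sketch below replaces the placeholder with a freshly generated value. `set_webui_secret` is a hypothetical helper. It reads 32 random bytes from /dev/urandom and base64-encodes them, which gives the same kind of 44-character value as `openssl rand -base64 32`. It also assumes GNU `sed -i`, which most Linux VPS images provide.

```shell
#!/usr/bin/env bash
# Replace the WEBUI_SECRET_KEY placeholder with a random base64 secret.
set_webui_secret() {
  local file="${1:-docker-compose.yml}"
  local placeholder="replace-with-random-secret" secret
  if ! grep -q "$placeholder" "$file"; then
    echo "placeholder not found in $file (already replaced?)" >&2
    return 1
  fi
  secret="$(head -c 32 /dev/urandom | base64 | tr -d '\n')"
  # '|' is the sed delimiter because base64 output may contain '/'.
  sed -i "s|${placeholder}|${secret}|" "$file"
}

# Usage, from ~/open-webui:
#   set_webui_secret docker-compose.yml && grep WEBUI_SECRET_KEY docker-compose.yml
```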

Step 3: Start the Services

docker compose up -d

Check service status:

docker compose ps
docker compose logs -f

Output similar to the following (exact wording varies by version) means both services are up:

open-webui  | INFO:     Application startup complete.
ollama      | 2026/05/21 10:30:00 server.go:89: Listening on 0.0.0.0:11434
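Model servers can take a while to start on a small VPS. Instead of watching the logs, you can poll each service until it answers. `retry` is a generic helper written for this sketch. Ollama's `/api/version` endpoint is part of its REST API. The `/health` path for Open WebUI is an assumption based on current releases; if it returns 404 on your version, poll the root URL instead.

```shell
#!/usr/bin/env bash
# Run COMMAND until it succeeds, up to ATTEMPTS times, sleeping DELAY between tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}

# Usage:
#   retry 30 2 curl -fsS http://localhost:11434/api/version
#   retry 30 2 curl -fsS http://localhost:3000/health
```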

Step 4: Download AI Models

Now download and test your first model:

# Pull a model (Qwen2.5-7B is excellent for multilingual use)
docker exec ollama ollama pull qwen2.5:7b

# Other recommended models
# docker exec ollama ollama pull llama3.1:8b     # Best for English
# docker exec ollama ollama pull deepseek-r1:8b   # Strong reasoning
# docker exec ollama ollama pull gemma2:9b        # Google's offering
# docker exec ollama ollama pull mistral:7b       # Efficient and fast

# Test the model
docker exec ollama ollama run qwen2.5:7b "Hello, introduce yourself"

Download time depends on your VPS network speed. A 7B model requires ~4-5 GB of storage and may take 5-20 minutes to download.

List Downloaded Models

docker exec ollama ollama list
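When you script model downloads, it helps to pull only the models that are not already present. The sketch below parses `ollama list`, whose first column is the model NAME including its tag. `missing_models` is a hypothetical helper. Names must match exactly, so a model pulled without a tag shows up as `name:latest`.

```shell
#!/usr/bin/env bash
# Read `ollama list` output on stdin; print requested models not yet installed.
missing_models() {
  local installed model
  installed="$(awk 'NR > 1 {print $1}')"   # skip the header, keep NAME
  for model in "$@"; do
    if ! printf '%s\n' "$installed" | grep -qxF -- "$model"; then
      echo "$model"
    fi
  done
}

# Usage: pull only what is missing.
#   for m in $(docker exec ollama ollama list | missing_models qwen2.5:7b llama3.1:8b); do
#     docker exec ollama ollama pull "$m"
#   done
```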

Step 5: Access Open WebUI

Direct IP Access

If your VPS firewall allows port 3000:

http://YOUR_VPS_IP:3000

Nginx Reverse Proxy + HTTPS (Recommended)

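Install Nginx and Certbot if they are not already present, then create the site configuration:

sudo apt install -y nginx certbot python3-certbot-nginx
sudo nano /etc/nginx/sites-available/open-webui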

server {
    listen 80;
    server_name chat.your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
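        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
        # Stream replies without buffering; allow RAG uploads above Nginx's 1 MB default.
        proxy_buffering off;
        client_max_body_size 50M;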
    }
}
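If you would rather generate the file than edit it in nano, a shell function with a here-document can render the server block for any domain. `render_site` is a hypothetical helper. It escapes Nginx's `$` variables so the shell leaves them alone. It also sets two directives you should check against your own setup: `client_max_body_size 50M` raises Nginx's 1 MB default request-body limit so larger documents can be uploaded for RAG, and `proxy_buffering off` passes streamed replies through as they are generated. The 50M value is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Print an Nginx server block proxying DOMAIN to Open WebUI on a local PORT.
render_site() {
  local domain="$1" port="${2:-3000}"
  cat <<EOF
server {
    listen 80;
    server_name ${domain};

    client_max_body_size 50M;

    location / {
        proxy_pass http://127.0.0.1:${port};
        proxy_http_version 1.1;
        proxy_set_header Upgrade \$http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
        proxy_buffering off;
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
    }
}
EOF
}

# Usage:
#   render_site chat.your-domain.com 3000 | sudo tee /etc/nginx/sites-available/open-webui >/dev/null
```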

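Enable the site, test the configuration, then request a certificate. The domain's DNS record must already point to your VPS. Certbot adds the HTTPS settings to the server block and reloads Nginx:

sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d chat.your-domain.com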

Advanced Configuration

1. Set Up RAG Knowledge Base

Open WebUI has built-in RAG (Retrieval-Augmented Generation) support:

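Click the "+" button next to the chat input
Upload PDF, TXT, Markdown, or other documents
Open WebUI splits the content into chunks and indexes them for retrieval
Answers in that chat draw on the relevant passages; to reuse documents across chats, add them to a knowledge collection under Workspace → Knowledge and reference it with # in your prompt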
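For more than a handful of files, uploading through the API can be faster than using the browser. This is a sketch under stated assumptions:

- The `POST /api/v1/files/` endpoint with a Bearer API key comes from Open WebUI's API documentation at the time of writing and may change.
- The helper names are hypothetical.
- The extension list is only the set this sketch chooses to send, not Open WebUI's full list of supported formats.

Uploaded files go into Open WebUI's file store. Attaching them to a knowledge collection is a separate step.

```shell
#!/usr/bin/env bash
# Succeed if the file's extension is one this sketch uploads (case-insensitive).
is_supported_doc() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *.pdf|*.txt|*.md|*.markdown|*.docx|*.csv) return 0 ;;
    *) return 1 ;;
  esac
}

# ASSUMPTION: endpoint path and auth header follow Open WebUI's API docs;
# verify them against your installed version before relying on this.
upload_docs() {
  local dir="$1" base_url="$2" api_key="$3"
  find "$dir" -type f | while IFS= read -r f; do
    if is_supported_doc "$f"; then
      curl -fsS -X POST "${base_url}/api/v1/files/" \
        -H "Authorization: Bearer ${api_key}" \
        -H "Accept: application/json" \
        -F "file=@${f}" && echo
    fi
  done
}

# Usage (API key generated in your account settings in the web UI):
#   upload_docs ./docs http://localhost:3000 "$OPEN_WEBUI_API_KEY"
```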

2. Run Multiple Models Simultaneously

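Edit the ollama service's environment section in docker-compose.yml, then apply the change with docker compose up -d. Each loaded model occupies its own RAM, so raise these limits only if your VPS can hold them:

environment:
  - OLLAMA_KEEP_ALIVE=-1         # -1 = never unload idle models
  - OLLAMA_NUM_PARALLEL=4        # Requests served concurrently per model
  - OLLAMA_MAX_LOADED_MODELS=3   # Up to 3 models in memory at once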
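Before raising `OLLAMA_MAX_LOADED_MODELS`, add up what the models will occupy. The helper below does that arithmetic using approximate per-model sizes, such as those in the quantization table below. `ram_budget_ok` is hypothetical, and the reserved figure for the OS and Open WebUI is your own estimate. The check also ignores the extra context memory that a higher `OLLAMA_NUM_PARALLEL` adds to each model.

```shell
#!/usr/bin/env bash
# Usage: ram_budget_ok TOTAL_GB RESERVED_GB MODEL_GB [MODEL_GB...]
# Succeeds when the summed model sizes fit in TOTAL_GB minus RESERVED_GB.
ram_budget_ok() {
  local total="$1" reserved="$2"
  shift 2
  awk -v total="$total" -v reserved="$reserved" -v models="$*" '
    BEGIN {
      n = split(models, sizes, " ")
      for (i = 1; i <= n; i++) need += sizes[i]
      avail = total - reserved
      printf "need %.1f GB, available %.1f GB\n", need, avail
      if (need <= avail) exit 0
      exit 1
    }'
}

# Usage: three ~5 GB (7B Q4_K_M) models, 2 GB reserved for everything else.
#   ram_budget_ok 8 2 5 5 5    # fails on an 8 GB VPS
#   ram_budget_ok 32 2 5 5 5   # fits on a 32 GB VPS
```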

3. GPU Acceleration

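If your VPS has an NVIDIA GPU, install the NVIDIA driver and the NVIDIA Container Toolkit on the host, then add the following to the ollama service: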

services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
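After recreating the ollama container, you can confirm that a loaded model actually runs on the GPU. `ollama ps` lists loaded models with a PROCESSOR column. It shows a value such as `100% GPU`, or a CPU/GPU split when the model only partly fits in VRAM. `uses_gpu` is a hypothetical helper that only searches that output for "GPU". Running `nvidia-smi` on the host first confirms that the driver itself works.

```shell
#!/usr/bin/env bash
# Read `ollama ps` output on stdin; succeed if any loaded model mentions GPU.
uses_gpu() {
  awk 'NR > 1 && /GPU/ { found = 1 } END { if (found) exit 0; exit 1 }'
}

# Usage:
#   docker compose up -d --force-recreate ollama
#   docker exec ollama ollama run qwen2.5:7b "hi" >/dev/null
#   docker exec ollama ollama ps | uses_gpu && echo "GPU in use" || echo "CPU only"
```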

4. Integrate Cloud Models

Open WebUI also supports cloud API calls. Add connections in the web interface under Admin Panel → Settings → Connections:

OpenAI API — Plug in your API key for GPT-4 access
Google Gemini — Add your Gemini API key
Anthropic Claude — Configure Claude access
Custom Endpoints — Any OpenAI-compatible API
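Before entering a key in the web interface, you can check from the shell that it works. Most OpenAI-compatible APIs list models at `<base URL>/models`, and Ollama exposes the same route under `http://localhost:11434/v1`. The helpers are hypothetical and assume curl is installed. A wrong key or URL makes `curl -f` exit non-zero.

```shell
#!/usr/bin/env bash
# Build the model-list URL for an OpenAI-compatible base URL (trailing slash allowed).
models_url() {
  printf '%s/models\n' "${1%/}"
}

# Print the JSON model list; fails on HTTP errors such as 401 Unauthorized.
check_endpoint() {
  curl -fsS "$(models_url "$1")" -H "Authorization: Bearer $2"
}

# Usage:
#   check_endpoint https://api.openai.com/v1 "$OPENAI_API_KEY"
#   check_endpoint http://localhost:11434/v1 ollama   # Ollama ignores the key
```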

5. Enable Web Search

In Open WebUI’s “Admin Panel → Settings → Web Search”, enable search engines so the AI can fetch real-time information.

Performance Optimization

Memory Tuning

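# Set up swap space (safety net for low memory; run as root)
# If fallocate is unsupported on your filesystem, use:
#   dd if=/dev/zero of=/swapfile bs=1M count=8192
fallocate -l 8G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
grep -q '^/swapfile ' /etc/fstab || echo '/swapfile none swap sw 0 0' >> /etc/fstab

# Tune swappiness (lower = the kernel swaps less eagerly and keeps more in RAM)
sysctl vm.swappiness=10
grep -q '^vm.swappiness' /etc/sysctl.conf || echo 'vm.swappiness=10' >> /etc/sysctl.conf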

Model Quantization Guide

Different quantization levels impact memory and quality:

Quantization	Precision	Memory (7B)	Quality
Q4_K_M	4-bit	~5 GB	Recommended balance
Q5_K_M	5-bit	~6 GB	Higher precision
Q8_0	8-bit	~8 GB	Nearly lossless
fp16	16-bit	~16 GB	Original precision

For an 8 GB VPS, use qwen2.5:7b (Q4_K_M by default) or, if you have headroom, qwen2.5:7b-instruct-q5_K_M.
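The table's figures follow from simple arithmetic. Weight size is roughly the parameter count times bits per weight, divided by 8 to convert to bytes. The helper below is a back-of-envelope sketch. K-quant formats such as Q4_K_M store slightly more than their nominal bits per weight. Ollama also needs memory for the context (KV cache) and runtime buffers. That is why the table lists higher numbers than the raw result. By the same arithmetic, a 70B model at 4 bits needs about 35 GB for its weights alone.

```shell
#!/usr/bin/env bash
# Approximate weight size in GB: billions of parameters x bits per weight / 8.
# Usage: weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# weights_gb 7 4    -> 3.5   (weights only; the table says ~5 GB for Q4_K_M)
# weights_gb 7 16   -> 14.0  (weights only; the table says ~16 GB for fp16)
```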

Frequently Asked Questions

Q: Why is model response slow?

A: A 7B model on CPU generates about 5-15 tokens/second, which is normal. To speed things up: ① Use a smaller model (1.5B/3B); ② Upgrade to a higher-performance VPS CPU; ③ Add GPU acceleration.
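To see whether your speed falls in that range, measure it. Ollama's `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), so tokens per second is their ratio. `ollama run --verbose` prints the same figure as "eval rate". `tokens_per_sec` is a hypothetical helper, and the usage lines assume `jq` is installed to extract the fields from the JSON.

```shell
#!/usr/bin/env bash
# Tokens per second from eval_count and eval_duration (nanoseconds).
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Usage:
#   resp=$(curl -s http://localhost:11434/api/generate \
#     -d '{"model":"qwen2.5:7b","prompt":"Say hello","stream":false}')
#   tokens_per_sec "$(jq .eval_count <<<"$resp")" "$(jq .eval_duration <<<"$resp")"
```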

Q: Will I lose chat history after restarting containers?

A: No. The ./webui_data and ./ollama_data directories are bind-mounted from the host, so chat history, settings, and downloaded models survive container restarts and re-creation.
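Data that persists is still not the same as data that is backed up. Because the data lives in ordinary host directories, a tarball is enough for a backup. `backup_data` is a hypothetical helper. By default it archives only `webui_data`, since models in `ollama_data` can be pulled again. Stopping the stack first is a precaution, so the SQLite database Open WebUI uses by default is not copied mid-write.

```shell
#!/usr/bin/env bash
# Archive data directories (relative to PROJECT_DIR) into a dated tarball in DEST_DIR.
# Usage: backup_data PROJECT_DIR DEST_DIR [DIR...]   (default DIR: webui_data)
backup_data() {
  local project="$1" dest="$2" archive
  shift 2
  [ "$#" -gt 0 ] || set -- webui_data
  archive="${dest}/open-webui-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$project" "$@" && echo "$archive"
}

# Usage:
#   cd ~/open-webui && docker compose stop
#   backup_data ~/open-webui ~/backups webui_data
#   docker compose start
```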

Q: How do I upgrade Open WebUI?

A: Simply pull the latest image and restart:

docker compose pull
docker compose up -d --force-recreate

Q: Ollama keeps crashing due to memory limits?

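A: Crashes under memory pressure usually mean the model plus its context does not fit in RAM. Use a smaller model or quantization, keep OLLAMA_MAX_LOADED_MODELS=1 and OLLAMA_NUM_PARALLEL=1, and make sure swap is enabled. Limiting CPU threads does not reduce memory use, and ollama run has no --num-thread flag. Thread count (num_thread) and context size (num_ctx) are model parameters you can set in an interactive session:

docker exec -it ollama ollama run qwen2.5:7b
>>> /set parameter num_ctx 2048
>>> /set parameter num_thread 4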
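To make lower memory settings stick instead of re-entering them each session, you can derive a variant model from a Modelfile. `FROM`, `PARAMETER num_ctx` and `PARAMETER num_thread` are standard Modelfile instructions, and `ollama create -f` builds the variant. The helper name, the `qwen2.5-lowmem` model name, and the values are illustrative. Pick a `num_ctx` below the context length you currently run, because a smaller context shrinks the KV cache.

```shell
#!/usr/bin/env bash
# Write a Modelfile deriving a lower-memory variant of BASE.
# Usage: write_lowmem_modelfile BASE NUM_CTX NUM_THREAD OUTPUT_FILE
write_lowmem_modelfile() {
  local base="$1" ctx="$2" threads="$3" out="$4"
  cat > "$out" <<EOF
FROM ${base}
PARAMETER num_ctx ${ctx}
PARAMETER num_thread ${threads}
EOF
}

# Usage: copy the file into the container, then build and run the variant.
#   write_lowmem_modelfile qwen2.5:7b 2048 4 Modelfile.lowmem
#   docker cp Modelfile.lowmem ollama:/tmp/Modelfile.lowmem
#   docker exec ollama ollama create qwen2.5-lowmem -f /tmp/Modelfile.lowmem
#   docker exec ollama ollama run qwen2.5-lowmem "Hello"
```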

Conclusion

With the Open WebUI + Ollama combination, you can build a fully functional private AI chat platform on any VPS. Beyond the VPS itself there are no subscription or API fees, your data stays on your server, and the setup supports advanced features like multi-model switching, RAG knowledge bases, and web search.

This setup is ideal for:

Developers — Code assistance, technical Q&A, local document search
Small teams — Shared AI assistant, reduced API costs
Privacy-conscious users — Medical, legal, financial data stays on your server
Offline environments — AI capabilities in air-gapped networks

Get started today! Pick a capable VPS, follow this guide, and you’ll have your own private AI assistant up and running in minutes.