What is Harbor?
Harbor (2,900+ ⭐ on GitHub) is a CLI tool and companion app that spins up a complete local LLM stack with a single command. Think of it as Docker Compose for the AI world — it wires together backends (Ollama, llama.cpp, vLLM), frontends (Open WebUI, Lobe Chat, LibreChat), and supporting services (SearXNG for web search, Speaches for voice, ComfyUI for images) so they all work together out of the box.
# That's it. One command.
harbor up
# Open WebUI + Ollama are now running on your VPS.
No more stitching together Docker Compose files by hand, configuring reverse proxies, or debugging cross-service connectivity. Harbor handles all of that for you.
Why Harbor on a VPS?
Running LLMs on a VPS is increasingly practical thanks to:
- Cheap GPU cloud options: VPS providers like Vast.ai, RunPod, and TensorDock offer $0.30–$0.80/hr GPU instances
- Quantized models: 7B–14B parameter models (Qwen2.5, Phi-4, Llama 3) run on 6–16GB VRAM with GGUF quantization
- Data privacy: Your API calls, prompts, and documents never leave your infrastructure
Harbor makes the software configuration part trivial, so you can focus on actually using the models.
Prerequisites
Before installing Harbor, your VPS needs:
| Requirement | Minimum | Recommended |
|---|---|---|
| Docker Engine | 24.x | 27.x+ |
| Docker Compose | 2.23.1+ | 2.30+ |
| RAM | 8 GB | 16 GB+ |
| Disk | 20 GB | 50 GB+ |
| GPU (optional) | NVIDIA with 6 GB VRAM | NVIDIA with 12 GB+ VRAM |
Verify Docker is ready:
docker --version
docker compose version
Step 1: Install Harbor
Harbor offers a one-liner install script:
curl https://raw.githubusercontent.com/av/harbor/refs/heads/main/install.sh | bash
This installs the harbor CLI to /usr/local/bin. Verify it works:
harbor --version
harbor doctor # Checks Docker, disk space, and GPU availability
No GPU? No problem. Harbor will run models on CPU. For 7B models with 8 GB RAM, expect 3–8 tokens/sec — usable for chat and batch processing.
Step 2: Deploy the Default Stack
The default stack includes Ollama (backend) + Open WebUI (frontend). Start it with:
harbor up
Harbor will:
- Pull the latest Docker images
- Start Ollama on
localhost:11434 - Start Open WebUI on
localhost:3000 - Wire them together automatically
When you see “Services started successfully”, open the web UI:
harbor open
First time? Create an admin account in Open WebUI, then pull a model from the admin panel or via CLI:
# Pull a model from the VPS terminal
docker exec -it $(docker ps -q -f name=ollama) ollama pull qwen2.5:7b
# Or try a smaller model for 8 GB RAM
docker exec -it $(docker ps -q -f name=ollama) ollama pull phi-4-mini:3.8b
Alternatively, use Harbor’s built-in model management:
harbor ollama pull qwen2.5:7b
Step 3: Add Supporting Services
Harbor’s real power is adding services on top of the base stack:
# Add web search RAG (SearXNG) + voice (Speaches)
harbor up searxng speaches
This enables:
- SearXNG → Open WebUI can search the web and feed results into LLM context (Web RAG)
- Speaches → OpenAI-compatible speech-to-text and text-to-speech (whisper + TTS)
Other useful services to try:
# Image generation
harbor up comfyui
# Alternative inference backends
harbor up llamacpp # CPU-friendly GGUF inference
harbor up vllm # High-throughput GPU inference
# Alternative frontends
harbor up lobechat # Modern UI with multi-provider support
harbor up dify # LLM app development platform
Step 4: Enable GPU Acceleration (NVIDIA)
If your VPS has an NVIDIA GPU, enable GPU passthrough to Docker:
Ubuntu/Debian:
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
After installing, Harbor auto-detects the nvidia capability and passes --gpus all to containers that support it.
Step 5: Expose Harbor Securely
To access Harbor’s web UI from outside your VPS, use Cloudflare Tunnel (built into Harbor):
# Configure tunnel for Open WebUI
harbor tunnels add webui
This starts cloudflared as a sidecar and gives you a *.trycloudflare.com URL. For production:
- Set up a Cloudflare domain with a CNAME record
- Configure
cloudflaredwith your tunnel token - Add authentication (Open WebUI already requires login)
⚠️ Security warning: Never expose Open WebUI to the internet without authentication. Harbor’s default setup requires a login, but double-check your config.
Step 6: Switch Inference Backends
Harbor supports multiple LLM backends. Here’s how to choose:
| Backend | Best For | GPU Required | Speed |
|---|---|---|---|
| Ollama | General use, easy model management | Optional | Good |
| llama.cpp | CPU inference, GGUF format | No | Good on CPU |
| vLLM | High-throughput API serving | Yes | Excellent |
| TabbyAPI | ExLlamaV2, large context | Yes | Very fast |
| SGLang | Structured outputs, VLMs | Yes | Excellent |
Example: Switch to vLLM for production API serving:
# Stop current stack, restart with vLLM
harbor down
harbor up vllm
# vLLM API is now at localhost:8000 with OpenAI-compatible endpoints
You can even run multiple backends simultaneously — Harbor connects them all to Open WebUI.
Harbor Cheatsheet
harbor up # Start default stack (Ollama + Open WebUI)
harbor up searxng speaches # Add services
harbor up --no-defaults vllm # Start only vLLM (skip defaults)
harbor down # Stop all services
harbor open # Open web UI in browser
harbor logs webui # Tail logs for a specific service
harbor ps # Show running services
harbor doctor # System compatibility check
harbor update # Update Harbor CLI
harbor config set ui.autoopen true # Auto-open browser
harbor ollama pull qwen2.5:7b # Pull model via Ollama
harbor tunnels add webui # Expose via Cloudflare
Resource Estimates by Model Size
| Model | Size | RAM/VRAM | Tokens/sec (CPU) | Tokens/sec (GPU) |
|---|---|---|---|---|
| Phi-4-mini (3.8B) Q4 | 2.5 GB | 4 GB | 15–25 | 80–120 |
| Qwen2.5-7B Q4 | 4.5 GB | 6 GB | 5–10 | 40–60 |
| Llama 3.1-8B Q4 | 5 GB | 8 GB | 4–8 | 35–55 |
| DeepSeek-R1-Distill-7B Q4 | 5 GB | 8 GB | 4–7 | 30–50 |
| Qwen2.5-14B Q4 | 9 GB | 14 GB | 2–4 | 20–35 |
| Mistral-Small-24B Q4 | 14 GB | 20 GB | 1–2 | 12–20 |
Real-World VPS Configurations
Budget setup ($10–20/mo):
- 2 vCPU, 8 GB RAM, no GPU
- Run Phi-4-mini or Qwen2.5:7B on CPU via Ollama
- Harbor + Open WebUI + SearXNG for web RAG
- ~5–10 tokens/sec — fine for casual chat
Mid-range ($30–50/mo):
- 4 vCPU, 16 GB RAM, NVIDIA T4 (16 GB VRAM)
- Run Llama 3.1-8B or Qwen2.5-14B with vLLM
- Add Speaches for voice I/O, ComfyUI for images
- ~40–60 tokens/sec — production-ready
High-end ($100–150/mo):
- 8 vCPU, 32 GB RAM, NVIDIA L40S (48 GB VRAM)
- Run Mistral-Small-24B or Qwen2.5-32B
- Full stack: vLLM + Open WebUI + Dify + ComfyUI + SearXNG
- ~60+ tokens/sec — team usage
Conclusion
Harbor eliminates the hardest part of self-hosting AI — the configuration. With one command, you get a complete, production-ready LLM stack on your VPS. The 50+ available services mean you can expand from a simple chat interface to a full AI platform (RAG, voice, images, workflows) incrementally, as your needs grow.
The AI self-hosting ecosystem has matured to the point where setting up your own ChatGPT replacement takes minutes, not days. Harbor is the tool that makes that possible.
Next steps:
- Browse all 50+ services:
harbor ls - Join the Harbor Discord
- Try deploying with a GPU-equipped VPS from Vast.ai, RunPod, or TensorDock