Harbor: One-Command AI Stack Deployment for Your VPS

Deploy a complete LLM stack (Ollama, Open WebUI, vLLM, llama.cpp, ComfyUI) on your VPS with a single command using Harbor — no manual configuration needed

What is Harbor?

Harbor (2,900+ ⭐ on GitHub) is a CLI tool and companion app that spins up a complete local LLM stack with a single command. Think of it as Docker Compose for the AI world — it wires together backends (Ollama, llama.cpp, vLLM), frontends (Open WebUI, Lobe Chat, LibreChat), and supporting services (SearXNG for web search, Speaches for voice, ComfyUI for images) so they all work together out of the box.

# That's it. One command.
harbor up
# Open WebUI + Ollama are now running on your VPS.

No more stitching together Docker Compose files by hand, configuring reverse proxies, or debugging cross-service connectivity. Harbor handles all of that for you.


Why Harbor on a VPS?

Running LLMs on a VPS is increasingly practical thanks to:

  • Cheap GPU cloud options: VPS providers like Vast.ai, RunPod, and TensorDock offer $0.30–$0.80/hr GPU instances
  • Quantized models: 7B–14B parameter models (Qwen2.5, Phi-4, Llama 3) run on 6–16GB VRAM with GGUF quantization
  • Data privacy: Your API calls, prompts, and documents never leave your infrastructure

Harbor makes the software configuration part trivial, so you can focus on actually using the models.


Prerequisites

Before installing Harbor, your VPS needs:

RequirementMinimumRecommended
Docker Engine24.x27.x+
Docker Compose2.23.1+2.30+
RAM8 GB16 GB+
Disk20 GB50 GB+
GPU (optional)NVIDIA with 6 GB VRAMNVIDIA with 12 GB+ VRAM

Verify Docker is ready:

docker --version
docker compose version

Step 1: Install Harbor

Harbor offers a one-liner install script:

curl https://raw.githubusercontent.com/av/harbor/refs/heads/main/install.sh | bash

This installs the harbor CLI to /usr/local/bin. Verify it works:

harbor --version
harbor doctor   # Checks Docker, disk space, and GPU availability

No GPU? No problem. Harbor will run models on CPU. For 7B models with 8 GB RAM, expect 3–8 tokens/sec — usable for chat and batch processing.


Step 2: Deploy the Default Stack

The default stack includes Ollama (backend) + Open WebUI (frontend). Start it with:

harbor up

Harbor will:

  1. Pull the latest Docker images
  2. Start Ollama on localhost:11434
  3. Start Open WebUI on localhost:3000
  4. Wire them together automatically

When you see “Services started successfully”, open the web UI:

harbor open

First time? Create an admin account in Open WebUI, then pull a model from the admin panel or via CLI:

# Pull a model from the VPS terminal
docker exec -it $(docker ps -q -f name=ollama) ollama pull qwen2.5:7b
# Or try a smaller model for 8 GB RAM
docker exec -it $(docker ps -q -f name=ollama) ollama pull phi-4-mini:3.8b

Alternatively, use Harbor’s built-in model management:

harbor ollama pull qwen2.5:7b

Step 3: Add Supporting Services

Harbor’s real power is adding services on top of the base stack:

# Add web search RAG (SearXNG) + voice (Speaches)
harbor up searxng speaches

This enables:

  • SearXNG → Open WebUI can search the web and feed results into LLM context (Web RAG)
  • Speaches → OpenAI-compatible speech-to-text and text-to-speech (whisper + TTS)

Other useful services to try:

# Image generation
harbor up comfyui

# Alternative inference backends
harbor up llamacpp   # CPU-friendly GGUF inference
harbor up vllm       # High-throughput GPU inference

# Alternative frontends
harbor up lobechat   # Modern UI with multi-provider support
harbor up dify       # LLM app development platform

Step 4: Enable GPU Acceleration (NVIDIA)

If your VPS has an NVIDIA GPU, enable GPU passthrough to Docker:

Ubuntu/Debian:

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Verify GPU access:

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

After installing, Harbor auto-detects the nvidia capability and passes --gpus all to containers that support it.


Step 5: Expose Harbor Securely

To access Harbor’s web UI from outside your VPS, use Cloudflare Tunnel (built into Harbor):

# Configure tunnel for Open WebUI
harbor tunnels add webui

This starts cloudflared as a sidecar and gives you a *.trycloudflare.com URL. For production:

  1. Set up a Cloudflare domain with a CNAME record
  2. Configure cloudflared with your tunnel token
  3. Add authentication (Open WebUI already requires login)

⚠️ Security warning: Never expose Open WebUI to the internet without authentication. Harbor’s default setup requires a login, but double-check your config.


Step 6: Switch Inference Backends

Harbor supports multiple LLM backends. Here’s how to choose:

BackendBest ForGPU RequiredSpeed
OllamaGeneral use, easy model managementOptionalGood
llama.cppCPU inference, GGUF formatNoGood on CPU
vLLMHigh-throughput API servingYesExcellent
TabbyAPIExLlamaV2, large contextYesVery fast
SGLangStructured outputs, VLMsYesExcellent

Example: Switch to vLLM for production API serving:

# Stop current stack, restart with vLLM
harbor down
harbor up vllm
# vLLM API is now at localhost:8000 with OpenAI-compatible endpoints

You can even run multiple backends simultaneously — Harbor connects them all to Open WebUI.


Harbor Cheatsheet

harbor up                    # Start default stack (Ollama + Open WebUI)
harbor up searxng speaches  # Add services
harbor up --no-defaults vllm # Start only vLLM (skip defaults)
harbor down                  # Stop all services
harbor open                  # Open web UI in browser
harbor logs webui            # Tail logs for a specific service
harbor ps                    # Show running services
harbor doctor                # System compatibility check
harbor update                # Update Harbor CLI
harbor config set ui.autoopen true  # Auto-open browser
harbor ollama pull qwen2.5:7b       # Pull model via Ollama
harbor tunnels add webui            # Expose via Cloudflare

Resource Estimates by Model Size

ModelSizeRAM/VRAMTokens/sec (CPU)Tokens/sec (GPU)
Phi-4-mini (3.8B) Q42.5 GB4 GB15–2580–120
Qwen2.5-7B Q44.5 GB6 GB5–1040–60
Llama 3.1-8B Q45 GB8 GB4–835–55
DeepSeek-R1-Distill-7B Q45 GB8 GB4–730–50
Qwen2.5-14B Q49 GB14 GB2–420–35
Mistral-Small-24B Q414 GB20 GB1–212–20

Real-World VPS Configurations

Budget setup ($10–20/mo):

  • 2 vCPU, 8 GB RAM, no GPU
  • Run Phi-4-mini or Qwen2.5:7B on CPU via Ollama
  • Harbor + Open WebUI + SearXNG for web RAG
  • ~5–10 tokens/sec — fine for casual chat

Mid-range ($30–50/mo):

  • 4 vCPU, 16 GB RAM, NVIDIA T4 (16 GB VRAM)
  • Run Llama 3.1-8B or Qwen2.5-14B with vLLM
  • Add Speaches for voice I/O, ComfyUI for images
  • ~40–60 tokens/sec — production-ready

High-end ($100–150/mo):

  • 8 vCPU, 32 GB RAM, NVIDIA L40S (48 GB VRAM)
  • Run Mistral-Small-24B or Qwen2.5-32B
  • Full stack: vLLM + Open WebUI + Dify + ComfyUI + SearXNG
  • ~60+ tokens/sec — team usage

Conclusion

Harbor eliminates the hardest part of self-hosting AI — the configuration. With one command, you get a complete, production-ready LLM stack on your VPS. The 50+ available services mean you can expand from a simple chat interface to a full AI platform (RAG, voice, images, workflows) incrementally, as your needs grow.

The AI self-hosting ecosystem has matured to the point where setting up your own ChatGPT replacement takes minutes, not days. Harbor is the tool that makes that possible.

Next steps:

  • Browse all 50+ services: harbor ls
  • Join the Harbor Discord
  • Try deploying with a GPU-equipped VPS from Vast.ai, RunPod, or TensorDock

📺 看视频版教程 → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials