Why Self-Host AI Tools?
As AI technology evolves rapidly, a growing number of open-source AI tools can run entirely on your own server. Benefits include:
- 🔒 Data Privacy: Sensitive data never leaves your server
- 💰 Cost Control: Pay only for your hardware, no API subscription fees
- ⚡ Low Latency: Local inference, no network delays
- 🎯 Full Customization: Choose models and parameters freely
Hardware Requirements
Self-hosting AI tools comes with real hardware requirements. Recommended VPS specs by workload:
| Purpose | Minimum | Recommended |
|---|---|---|
| LLM (7B model) | 8GB RAM, 4 vCPU | 16GB RAM, 8 vCPU + GPU |
| Speech-to-Text | 4GB RAM, 2 vCPU | 8GB RAM, 4 vCPU |
| Image Generation | 8GB RAM + 4GB VRAM | 16GB RAM + 8GB VRAM |
⚠️ Note: For GPU acceleration, consider providers like Hetzner (GPU cloud instances), RunPod, or Vast.ai.
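Before choosing tools, it is worth confirming what your server actually has. The commands below are standard Linux utilities; nvidia-smi will only be present if NVIDIA drivers are installed:
# Check CPU cores, memory, and disk
nproc
free -h
df -h /
# GPU name and VRAM (only with NVIDIA drivers installed)
nvidia-smi --query-gpu=name,memory.total --format=csv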
Tool 1: Ollama — Run LLMs Locally
Ollama is the simplest way to run large language models locally. It supports Llama, Mistral, Qwen, and many other model families.
Installation
# One-command Docker deploy
docker run -d --name ollama -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama
# Pull and run a model
docker exec -it ollama ollama pull llama3.2:1b
docker exec -it ollama ollama run llama3.2:1b
# API call
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "What is self-hosting?",
  "stream": false
}'
Recommended Models
| Model | Parameters | RAM | Use Case |
|---|---|---|---|
| llama3.2:1b | 1B | <2GB | Lightweight Q&A |
| llama3.2:3b | 3B | ~3GB | General chat |
| qwen2.5:7b | 7B | ~8GB | Chinese optimized |
| mistral:7b | 7B | ~8GB | English reasoning |
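With the container running, you can confirm which models are downloaded and how much memory a loaded model actually consumes:
# List downloaded models
docker exec -it ollama ollama list
# Show loaded models and their memory footprint
docker exec -it ollama ollama ps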
Tool 2: LocalAI — OpenAI API Alternative
LocalAI is a drop-in replacement for the OpenAI API, supporting LLMs, text-to-speech, image generation, and more.
# Docker Compose deploy
mkdir -p ~/localai && cd ~/localai
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  localai:
    image: localai/localai:latest
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models
    environment:
      - THREADS=4
      - CONTEXT_SIZE=2048
    command: ["/usr/bin/local-ai"]
EOF
docker compose up -d
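The first start can take a moment while the container initializes. Recent LocalAI versions expose a /readyz health endpoint you can poll (adjust if your version differs):
# Returns HTTP 200 once the API is ready
curl -i http://localhost:8080/readyz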
Usage
# Chat completion (OpenAI SDK compatible)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2-3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
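The model name must match a model file or configuration you have placed in ./models. To see which models the server currently exposes, query the OpenAI-compatible model list:
# List models known to the server
curl http://localhost:8080/v1/models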
Tool 3: OpenAI Whisper — Speech-to-Text
Whisper is an open-source speech recognition model supporting 99+ languages.
# Docker deploy
docker run -d --name whisper \
  -p 9000:9000 \
  -v whisper-data:/data \
  onerahmet/openai-whisper-asr-webservice:latest
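A quick transcription test, assuming the webservice's default /asr endpoint and a hypothetical local file meeting.mp3 (check the image's docs for the exact parameters of your version):
# Upload an audio file for transcription (meeting.mp3 is an example file)
curl -X POST "http://localhost:9000/asr?task=transcribe&language=en&output=txt" \
  -F "audio_file=@meeting.mp3"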
Use Cases:
- Meeting transcription
- Auto-generated video captions
- Voice input systems
Tool 4: Stable Diffusion — Image Generation
Stable Diffusion generates images from text prompts; the most common self-hosted frontend is the Automatic1111 WebUI:
# Docker deploy (requires GPU)
docker run -d --name sd-webui \
  --gpus all \
  -p 7860:7860 \
  -v models:/app/stable-diffusion-webui/models \
  abdibrokhim/stable-diffusion-webui:latest
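Once up, you can generate images in the browser at port 7860. If the container starts the WebUI with its API enabled (the --api flag; this depends on the image), a minimal text-to-image call looks like this:
# Generate an image via the Automatic1111 API (response holds base64 images)
curl -X POST http://localhost:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a lighthouse at sunset", "steps": 20}'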
Tool 5: LobeChat — AI Chat Interface
LobeChat is a modern AI chat UI supporting Ollama, LocalAI, and more.
# Docker deploy
# Inside the container, "localhost" is the container itself, so point the Ollama
# URL at the host via host.docker.internal (--add-host makes this resolve on Linux)
docker run -d --name lobe-chat \
  -p 3210:3210 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_PROXY_URL=http://host.docker.internal:11434 \
  lobehub/lobe-chat:latest
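Once running, open http://<your-server-ip>:3210 in a browser and select the Ollama provider in LobeChat's settings; requests will reach Ollama through the proxy URL configured above.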
Stack Architecture
Recommended self-hosted AI stack:
User → Nginx → LobeChat (frontend)
├── Ollama (LLM inference)
├── LocalAI (OpenAI-compatible API)
└── Whisper (speech recognition)
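A quick way to verify the stack is wired up is to probe each service. This sketch assumes all services run on one machine with the ports configured above (the Whisper webservice is FastAPI-based, so its /docs page should answer):
# Report the HTTP status of each backend
for url in \
  http://localhost:3210 \
  http://localhost:11434/api/tags \
  http://localhost:8080/v1/models \
  http://localhost:9000/docs; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  echo "$url -> $code"
done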
Summary
In 2025, self-hosting AI tools has evolved from a niche experiment to a practical solution. With falling hardware costs and improving model optimization, running AI on your personal VPS is more accessible than ever.
Quick Start
- Start with Ollama + LobeChat for the simplest setup
- Add Whisper for speech processing as needed
- Add Stable Diffusion when GPU is available
💡 Tip: If your VPS has limited resources, start with 1B-3B parameter models and scale up gradually.