<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI Deployment on SelfVPS Guide</title><link>https://selfvps.net/en/categories/ai-deployment/</link><description>Recent content in AI Deployment on SelfVPS Guide</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><lastBuildDate>Sat, 16 May 2026 14:00:00 +0800</lastBuildDate><atom:link href="https://selfvps.net/en/categories/ai-deployment/index.xml" rel="self" type="application/rss+xml"/><item><title>Deploying Open-Source AI Tools: LocalAI, Ollama, Stable Diffusion &amp; More on Your VPS</title><link>https://selfvps.net/en/post/deploying-open-source-ai-tools/</link><pubDate>Sat, 16 May 2026 14:00:00 +0800</pubDate><guid>https://selfvps.net/en/post/deploying-open-source-ai-tools/</guid><description>&lt;h2 id="why-self-host-ai-tools"&gt;Why Self-Host AI Tools?
&lt;/h2&gt;&lt;p&gt;As open-source AI matures, a growing number of tools can run entirely on your own server. Benefits include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;🔒 &lt;strong&gt;Data Privacy&lt;/strong&gt;: Sensitive data never leaves your server&lt;/li&gt;
&lt;li&gt;💰 &lt;strong&gt;Cost Control&lt;/strong&gt;: Pay only for your hardware, no API subscription fees&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Low Latency&lt;/strong&gt;: Local inference, no network delays&lt;/li&gt;
&lt;li&gt;🎯 &lt;strong&gt;Full Customization&lt;/strong&gt;: Choose models and parameters freely&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="hardware-requirements"&gt;Hardware Requirements
&lt;/h2&gt;&lt;p&gt;Self-hosting AI tools is resource-intensive, so size your VPS to the workload. Recommended specs:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;th&gt;Minimum&lt;/th&gt;
 &lt;th&gt;Recommended&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;LLM (7B model)&lt;/td&gt;
 &lt;td&gt;8GB RAM, 4 vCPU&lt;/td&gt;
 &lt;td&gt;16GB RAM, 8 vCPU + GPU&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Speech-to-Text&lt;/td&gt;
 &lt;td&gt;4GB RAM, 2 vCPU&lt;/td&gt;
 &lt;td&gt;8GB RAM, 4 vCPU&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Image Generation&lt;/td&gt;
 &lt;td&gt;8GB RAM + 4GB VRAM&lt;/td&gt;
 &lt;td&gt;16GB RAM + 8GB VRAM&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

 &lt;blockquote&gt;
 &lt;p&gt;⚠️ &lt;strong&gt;Note&lt;/strong&gt;: For GPU acceleration, consider providers like Hetzner (GPU cloud instances), RunPod, or Vast.ai.&lt;/p&gt;

 &lt;/blockquote&gt;
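&lt;p&gt;Before choosing a model, it is worth confirming what the server actually provides. A quick sketch using standard Linux tools (output format varies by distro):&lt;/p&gt;

```shell
# Inspect available compute before pulling models
nproc                                            # vCPU count
free -h | awk '/^Mem:/ {print $2, "total RAM"}'  # installed memory
df -h / | awk 'NR==2 {print $4, "free disk"}'    # space left for model files
```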
&lt;h2 id="tool-1-ollama--run-llms-locally"&gt;Tool 1: Ollama — Run LLMs Locally
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://ollama.com" target="_blank" rel="noopener"
 &gt;Ollama&lt;/a&gt; is the simplest way to run large language models locally. It supports Llama, Mistral, Qwen, and many other model families.&lt;/p&gt;
&lt;h3 id="installation"&gt;Installation
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# One-command Docker deploy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker run -d --name ollama -p 11434:11434 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -v ollama:/root/.ollama &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ollama/ollama
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Pull and run a model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; -it ollama ollama pull llama3.2:1b
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; -it ollama ollama run llama3.2:1b
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# API call&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl http://localhost:11434/api/generate -d &lt;span class="s1"&gt;&amp;#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;model&amp;#34;: &amp;#34;llama3.2:1b&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;prompt&amp;#34;: &amp;#34;What is self-hosting?&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;stream&amp;#34;: false
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt;}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="recommended-models"&gt;Recommended Models
&lt;/h3&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Parameters&lt;/th&gt;
 &lt;th&gt;RAM&lt;/th&gt;
 &lt;th&gt;Use Case&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;llama3.2:1b&lt;/td&gt;
 &lt;td&gt;1B&lt;/td&gt;
 &lt;td&gt;&amp;lt;2GB&lt;/td&gt;
 &lt;td&gt;Lightweight Q&amp;amp;A&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;llama3.2:3b&lt;/td&gt;
 &lt;td&gt;3B&lt;/td&gt;
 &lt;td&gt;~3GB&lt;/td&gt;
 &lt;td&gt;General chat&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;qwen2.5:7b&lt;/td&gt;
 &lt;td&gt;7B&lt;/td&gt;
 &lt;td&gt;~8GB&lt;/td&gt;
 &lt;td&gt;Chinese optimized&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;mistral:7b&lt;/td&gt;
 &lt;td&gt;7B&lt;/td&gt;
 &lt;td&gt;~8GB&lt;/td&gt;
 &lt;td&gt;English reasoning&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
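&lt;p&gt;Recent Ollama releases also expose an OpenAI-compatible endpoint on the same port, so existing OpenAI SDK clients can point at the VPS unchanged. A minimal sketch, assuming the llama3.2:1b model pulled above:&lt;/p&gt;

```shell
# OpenAI-style chat completion against the Ollama container started earlier
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3.2:1b",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```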
&lt;h2 id="tool-2-localai--openai-api-alternative"&gt;Tool 2: LocalAI — OpenAI API Alternative
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://localai.io" target="_blank" rel="noopener"
 &gt;LocalAI&lt;/a&gt; is a drop-in replacement for OpenAI&amp;rsquo;s API, supporting LLM, TTS, image generation, and more.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Docker Compose deploy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p ~/localai &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ~/localai
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; docker-compose.yml &lt;span class="s"&gt;&amp;lt;&amp;lt; &amp;#39;EOF&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;version: &amp;#39;3.8&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;services:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; localai:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; image: localai/localai:latest
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; ports:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; - &amp;#34;8080:8080&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; volumes:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; - ./models:/build/models
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; environment:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; - THREADS=4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; - CONTEXT_SIZE=2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; command: [&amp;#34;/usr/bin/local-ai&amp;#34;]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker compose up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="usage"&gt;Usage
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Chat completion (OpenAI SDK compatible)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl http://localhost:8080/v1/chat/completions -d &lt;span class="s1"&gt;&amp;#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;model&amp;#34;: &amp;#34;llama3.2-3b&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt; &amp;#34;messages&amp;#34;: [{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: &amp;#34;Hello!&amp;#34;}]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s1"&gt;}&amp;#39;&lt;/span&gt;
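&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# List the models LocalAI can serve (OpenAI-style route; output depends on your model gallery)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl http://localhost:8080/v1/models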
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="tool-3-openai-whisper--speech-to-text"&gt;Tool 3: OpenAI Whisper — Speech-to-Text
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/openai/whisper" target="_blank" rel="noopener"
 &gt;Whisper&lt;/a&gt; is an open-source speech recognition model supporting 99+ languages.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Docker deploy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker run -d --name whisper &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -p 9000:9000 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -v whisper-data:/data &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; onerahmet/openai-whisper-asr-webservice:latest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Meeting transcription&lt;/li&gt;
&lt;li&gt;Auto-generated video captions&lt;/li&gt;
&lt;li&gt;Voice input systems&lt;/li&gt;
&lt;/ul&gt;
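&lt;p&gt;Once the container is up, transcription is a single HTTP call. A sketch against the webservice&amp;rsquo;s ASR route (meeting.mp3 is a placeholder; check the image&amp;rsquo;s documentation for the parameters your version accepts):&lt;/p&gt;

```shell
# Upload an audio file; the JSON response contains the recognized text
curl -F "audio_file=@meeting.mp3" "http://localhost:9000/asr?output=json"
```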
&lt;h2 id="tool-4-stable-diffusion--image-generation"&gt;Tool 4: Stable Diffusion — Image Generation
&lt;/h2&gt;&lt;p&gt;Deploy via &lt;a class="link" href="https://github.com/AUTOMATIC1111/stable-diffusion-webui" target="_blank" rel="noopener"
 &gt;Automatic1111 WebUI&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Docker deploy (requires GPU)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker run -d --name sd-webui &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --gpus all &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -p 7860:7860 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -v models:/app/stable-diffusion-webui/models &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; abdibrokhim/stable-diffusion-webui:latest
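&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Generate an image via the REST API (assumes the WebUI was launched with --api)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -X POST http://localhost:7860/sdapi/v1/txt2img &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -H &lt;span class="s1"&gt;&amp;#39;Content-Type: application/json&amp;#39;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -d &lt;span class="s1"&gt;&amp;#39;{&amp;#34;prompt&amp;#34;: &amp;#34;a lighthouse at sunset&amp;#34;, &amp;#34;steps&amp;#34;: 20}&amp;#39;&lt;/span&gt;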
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="tool-5-lobechat--ai-chat-interface"&gt;Tool 5: LobeChat — AI Chat Interface
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/lobehub/lobe-chat" target="_blank" rel="noopener"
 &gt;LobeChat&lt;/a&gt; is a modern AI chat UI supporting Ollama, LocalAI, and more.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Docker deploy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker run -d --name lobe-chat &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -p 3210:3210 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --add-host&lt;span class="o"&gt;=&lt;/span&gt;host.docker.internal:host-gateway &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -e &lt;span class="nv"&gt;OLLAMA_PROXY_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://host.docker.internal:11434 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; lobehub/lobe-chat:latest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="stack-architecture"&gt;Stack Architecture
&lt;/h2&gt;&lt;p&gt;Recommended self-hosted AI stack:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;User → Nginx → LobeChat (frontend)
 ├── Ollama (LLM inference)
 ├── LocalAI (OpenAI-compatible API)
 └── Whisper (speech recognition)
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="summary"&gt;Summary
&lt;/h2&gt;&lt;p&gt;Self-hosting AI tools has evolved from a niche experiment into a practical everyday option. With falling hardware costs and steadily improving model optimization, running AI on your own VPS is more accessible than ever.&lt;/p&gt;
&lt;h3 id="quick-start"&gt;Quick Start
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Start with &lt;strong&gt;Ollama + LobeChat&lt;/strong&gt; for the simplest setup&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;Whisper&lt;/strong&gt; for speech processing as needed&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;Stable Diffusion&lt;/strong&gt; when GPU is available&lt;/li&gt;
&lt;/ol&gt;
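&lt;p&gt;Once the pieces are running, a short loop can confirm that each service answers on its default port (ports taken from the commands above; adjust if you remapped them):&lt;/p&gt;

```shell
# Probe each service's default port and report its status
for svc in Ollama:11434 LocalAI:8080 Whisper:9000 LobeChat:3210; do
  name=${svc%%:*}   # text before the colon
  port=${svc##*:}   # text after the colon
  if curl -sf -o /dev/null "http://localhost:${port}"; then
    echo "${name} (port ${port}): up"
  else
    echo "${name} (port ${port}): down"
  fi
done
```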

 &lt;blockquote&gt;
 &lt;p&gt;💡 &lt;strong&gt;Tip&lt;/strong&gt;: If your VPS has limited resources, start with 1B-3B parameter models and scale up gradually.&lt;/p&gt;

 &lt;/blockquote&gt;</description></item></channel></rss>