Data loss is every VPS owner’s nightmare. One accidental rm -rf, one ransomware attack, one disk failure — a few seconds of misfortune can destroy months of work. Traditional backup solutions are either too complex to configure or too simple to be reliable.
The good news: LLM Agents can turn backup and disaster recovery (DR) into an automated, verifiable, and intelligent system. This guide walks you through building an AI-driven backup system that handles:
- Automated backup strategy planning (full + incremental)
- Smart backup integrity verification
- Scheduled disaster recovery drills
- Anomaly detection and proactive defense
Why You Need an AI Backup Agent
Traditional backup approaches all have tradeoffs:
| Approach | Pros | Cons |
|---|---|---|
| Manual rsync / scp | Simple, direct | Easy to forget, no verification, unreliable |
| cron + shell scripts | Automated schedule | Silent failures, nobody tests recovery |
| Commercial backup tools | Full-featured | Expensive, over-engineered, vendor lock-in |
An AI Agent aims to combine both: the lightweight automation of scripts with reliability checks closer to those of commercial tools. Crucially, an Agent can use context: when a backup fails, it does not just retry blindly but adjusts its strategy based on the failure reason.
Architecture Overview
Our backup Agent is built on four core layers:
┌─────────────────────────────────────┐
│ Agent Orchestrator │ ← LLM-powered decision core
├─────────────────────────────────────┤
│ Strategy Engine │ Exec Engine │ Verify │
├─────────────────────────────────────┤
│ Restic │ Borg │ rsync │ S3 │ ← Underlying tools
├─────────────────────────────────────┤
│ Local Storage │ Remote VPS │ Object │ ← Storage backends
└─────────────────────────────────────┘
Agent Orchestrator is the brain. It decides based on predefined policies and real-time state:
- When to run backups (full vs incremental)
- Where to send them (local, remote, S3)
- How to verify after completion
- What to do when anomalies arise
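The four decisions above can also be sketched as a deterministic fallback that runs when the LLM is unavailable. This is a minimal sketch, not the article's implementation: `BackupState`, `decide`, and the action names are hypothetical, and the thresholds mirror the decision rules listed in the Agent prompt later in this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupState:
    """Hypothetical snapshot of what the orchestrator knows right now."""
    now: datetime
    last_backup: Optional[datetime]
    last_full_backup: Optional[datetime]
    last_verify_ok: bool
    disk_free_percent: float


def decide(state: BackupState) -> List[str]:
    """Return an ordered list of action names for the current state."""
    actions: List[str] = []
    if state.disk_free_percent < 10:
        actions += ["prune_snapshots", "alert"]
    if not state.last_verify_ok:
        actions += ["alert", "switch_target"]
    full_age_exceeded = (
        state.last_full_backup is None
        or state.now - state.last_full_backup > timedelta(days=7)
    )
    backup_age_exceeded = (
        state.last_backup is None
        or state.now - state.last_backup > timedelta(hours=24)
    )
    if full_age_exceeded:
        actions.append("full_backup")
    elif backup_age_exceeded:
        actions.append("incremental_backup")
    # Drop repeated actions (for example two alerts) while keeping order
    return list(dict.fromkeys(actions))
```

A fallback like this keeps backups running on a fixed policy even when the LLM API is down or returns an unusable answer.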
Step 1: Choose Your Backup Tool
The AI Agent doesn’t reinvent the wheel — it orchestrates proven backup tools. Here are three options:
Option A: Restic (Recommended)
Features: Encrypted, deduplicated, multi-backend (local/S3/SFTP)
Install: apt install restic
Option B: BorgBackup
Features: Superior compression, high dedup efficiency
Install: apt install borgbackup
Option C: rsync + hardlinks
Features: Simple, universal, zero dependencies
Install: Preinstalled on most Linux distributions (otherwise apt install rsync)
This guide uses Restic — it offers the best balance of encryption, deduplication, and multi-backend support.
Step 2: Build Your AI Backup Agent
Let’s create the core backup script and wrap it with LLM-driven decision logic.
2.1 Backup Script Foundation
#!/usr/bin/env python3
"""backup_agent.py — AI-driven backup agent"""
import os
import json
import subprocess
import datetime
from typing import Dict, List
# ── Configuration ──────────────────────────
REPOSITORIES = {
    "local": "/mnt/backup/vps",
    "s3": "s3:https://s3.amazonaws.com/my-bucket/vps-backup",
}
PATHS = ["/var/www", "/etc", "/opt/docker", "/home"]
# Read the repository password from the environment and fail fast if it is
# missing; a hardcoded fallback password would silently weaken the backups
RESTIC_PASSWORD = os.environ["RESTIC_PASSWORD"]
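def run_restic(args: List[str]) -> str:
    """Execute a restic command and return its stdout; raise on failure."""
    cmd = ["restic", "-r", REPOSITORIES["local"]] + args
    env = {**os.environ, "RESTIC_PASSWORD": RESTIC_PASSWORD}
    result = subprocess.run(cmd, capture_output=True, text=True, env=env)
    if result.returncode != 0:
        # Surface failures instead of silently returning empty output
        raise RuntimeError(f"restic {args[0]} failed: {result.stderr.strip()}")
    return result.stdout

def get_last_backup_time(snapshots: List[Dict]) -> str:
    """Return the timestamp of the newest snapshot, or "" if there are none."""
    # restic reports RFC 3339 timestamps; the lexical maximum is the latest
    # as long as all snapshots were taken with the same UTC offset
    return max((s["time"] for s in snapshots), default="")

def check_backup_health() -> Dict:
    """Check the health status of existing backups."""
    # `or []` covers restic versions that print null for an empty repository
    snapshots = json.loads(run_restic(["snapshots", "--json"])) or []
    stats = json.loads(run_restic(["stats", "--json"]))
    return {
        "snapshot_count": len(snapshots),
        "stats": stats,
        "last_backup": get_last_backup_time(snapshots),
    }

def create_backup(paths: List[str], tag: str = "auto") -> str:
    """Create a new backup snapshot."""
    return run_restic(["backup"] + paths + ["--tag", tag])

def verify_backup(snapshot_id: str) -> bool:
    """Verify repository integrity.

    `restic check` works on the whole repository, so snapshot_id is only
    informational here. A non-zero exit code means the check failed.
    """
    try:
        run_restic(["check", "--read-data-subset", "10%"])
    except RuntimeError:
        return False
    return True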
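The Agent needs structured results from each backup run, not free-form text. One way to get them, sketched below, is to run `restic backup --json` and pick out the summary message. The helper name `parse_backup_summary` is hypothetical, and the field names in the usage note (`snapshot_id`, `files_new`, `data_added`) reflect my understanding of restic's JSON output; check them against the restic version you run.

```python
import json
from typing import Dict, Optional


def parse_backup_summary(output: str) -> Optional[Dict]:
    """Return the summary message from `restic backup --json` output, if any."""
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as warnings
        if isinstance(message, dict) and message.get("message_type") == "summary":
            return message
    return None
```

Passing the output of a backup run with `--json` added to its arguments through this helper gives the Agent a dictionary it can put straight into the LLM context, for example to report the new snapshot ID or how much data was added.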
2.2 AI Agent Decision Layer
The real intelligence comes from LLM Agent decision-making. We wrap the backup logic with an Agent framework that supports tool calling:
# agent_decision.py: LLM-driven backup decisions
import datetime
import json
import shutil

from backup_agent import REPOSITORIES, check_backup_health

agent_instruction = """
You are a VPS backup management Agent. Your responsibilities:
1. Decide whether to execute backups based on health reports
2. Choose the optimal strategy (full/incremental/differential)
3. Verify backup integrity
4. Execute recovery plans when anomalies occur

Available tools:
- check_health: Check backup status
- create_backup: Execute backup
- verify_backup: Verify backup integrity
- list_snapshots: List all snapshots
- prune_snapshots: Clean up expired snapshots

Decision rules:
- If last backup > 24h ago → run incremental backup
- If last full backup > 7 days ago → run full backup
- If verification finds errors → alert and switch backup target
- If disk remaining < 10% → trigger prune and alert
"""

def get_disk_usage() -> dict:
    """Report free space on the filesystem that holds the local repository."""
    usage = shutil.disk_usage(REPOSITORIES["local"])
    return {
        "total_bytes": usage.total,
        "free_percent": round(100 * usage.free / usage.total, 1),
    }

def agent_decision_cycle():
    """Main decision loop for the backup agent."""
    health = check_backup_health()
    # Build context for the LLM
    context = {
        "health_report": health,
        "disk_usage": get_disk_usage(),
        "current_time": datetime.datetime.now().isoformat(),
    }
    # Call LLM for decision (pseudo-code)
    # decision = llm.call(agent_instruction + json.dumps(context))
    # execute_decision(decision)
    # Until an LLM client is wired in, return the context for inspection
    return context
For a full implementation, you’ll need to connect to an LLM API (OpenAI / DeepSeek / Claude) or use a tool-calling Agent framework like Hermes Agent.
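Whichever LLM or framework you connect, the model's answer should be validated before anything runs. The sketch below shows one possible shape for the `execute_decision` step from the pseudo-code. It assumes the LLM is asked to reply with a JSON object of the form `{"tool": ..., "args": {...}}`; that format and the `tools` registry are assumptions for illustration, not part of any particular framework.

```python
import json
from typing import Any, Callable, Dict


def execute_decision(raw_decision: str, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Validate an LLM decision and run the matching tool.

    Only names present in `tools` can be called, so the model cannot
    trigger anything outside the allow-list.
    """
    try:
        decision = json.loads(raw_decision)
    except json.JSONDecodeError as exc:
        raise ValueError(f"decision is not valid JSON: {exc}") from exc
    if not isinstance(decision, dict):
        raise ValueError("decision must be a JSON object")
    name = decision.get("tool")
    if not isinstance(name, str) or name not in tools:
        raise ValueError(f"unknown tool requested: {name!r}")
    args = decision.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    return tools[name](**args)
```

In practice the registry would map the tool names from the prompt to the functions in backup_agent.py, for example `{"create_backup": create_backup, "verify_backup": verify_backup}`. Rejecting unknown names keeps a malformed or manipulated reply from turning into an arbitrary command.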
Step 3: Automated Backup Strategy
A reliable backup strategy follows the 3-2-1 rule:
- 3 copies (1 original + 2 backups)
- 2 different media types (local + remote)
- 1 off-site copy (S3 / another VPS)
Our AI Agent operates on this schedule:
| Frequency | Type | Target | Retention |
|---|---|---|---|
| Daily at 03:00 | Incremental | Local | 30 days |
| Sunday at 03:00 | Full | Local + S3 | 12 weeks |
| 1st of month 03:00 | Full | Local + S3 + Remote | 12 months |
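The Agent dynamically adjusts this plan. For example, if it detects unusually large file changes, it triggers a full backup instead of the scheduled incremental. Note that Restic deduplicates every snapshot against data already in the repository, so each run uploads only new data; with Restic, "full" is best read as a run that re-reads every file (restic backup --force) rather than a separate archive format.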
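The schedule table can be sketched as a small planner that the Agent consults before each 03:00 run. `BackupPlan`, `plan_for`, and `retention_args` are hypothetical names. The `restic forget` flags only approximate the Retention column, since Restic applies one retention policy per repository rather than one per backup type.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class BackupPlan:
    kind: str           # "full" or "incremental", as in the schedule table
    targets: List[str]  # repository names to write to
    tag: str            # tag passed to restic backup --tag


def plan_for(day: date) -> BackupPlan:
    """Pick the most specific row of the schedule table for a date."""
    if day.day == 1:
        return BackupPlan("full", ["local", "s3", "remote"], "monthly")
    if day.weekday() == 6:  # Monday is 0, so Sunday is 6
        return BackupPlan("full", ["local", "s3"], "weekly")
    return BackupPlan("incremental", ["local"], "daily")


def retention_args() -> List[str]:
    """restic forget arguments approximating the Retention column."""
    return [
        "forget",
        "--keep-daily", "30",
        "--keep-weekly", "12",
        "--keep-monthly", "12",
        "--prune",
    ]
```

When the first of the month falls on a Sunday, the monthly row wins because it covers a superset of the weekly targets.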
Step 4: Automated Disaster Recovery
Backups only matter if you can restore from them. Our AI Agent runs automated, unattended DR drills:
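import shutil  # used to clean up the restored copy

def disaster_recovery_drill():
    """
    Automated DR drill process:
    1. Select latest backup
    2. Restore critical files to a temp directory
    3. Verify that each critical file was restored
    4. Generate recovery report
    """
    restore_test_dir = "/tmp/dr-drill-" + datetime.datetime.now().strftime("%Y%m%d")
    os.makedirs(restore_test_dir, exist_ok=True)
    critical_paths = [
        "/etc/nginx/nginx.conf",
        "/etc/ssh/sshd_config",
        "/var/www/html/index.html",
    ]
    # 1. Restore the critical files from the latest snapshot.
    #    `restic mount` blocks in the foreground and restic has no `unmount`
    #    subcommand, so this drill uses `restic restore` instead of a mount.
    restore_args = ["restore", "latest", "--target", restore_test_dir]
    for path in critical_paths:
        restore_args += ["--include", path]
    run_restic(restore_args)
    # 2. Verify critical files (restic recreates the original path under the target)
    ok_count = 0
    for path in critical_paths:
        local_path = restore_test_dir + path
        if os.path.exists(local_path):
            ok_count += 1
            print(f"✅ {path} exists")
        else:
            print(f"❌ {path} missing!")
    # 3. Clean up the restored copy
    shutil.rmtree(restore_test_dir, ignore_errors=True)
    # 4. Generate report
    total_count = len(critical_paths)
    all_ok = ok_count == total_count
    report = "\n".join([
        "Disaster Recovery Drill Report",
        f"Date: {datetime.datetime.now()}",
        f"Status: {'✅ PASS' if all_ok else '❌ FAIL'}",
        f"Critical files: {ok_count}/{total_count}",
    ])
    return report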
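Checking that a file exists does not show that its contents survived. A drill can go one step further and compare checksums of restored files against the live system. The following is a minimal sketch using only the standard library; `sha256_of` and `compare_restored` are hypothetical helpers, and the paths are given relative to each root (no leading slash).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restored(live_root: Path, restored_root: Path,
                     rel_paths: List[str]) -> Dict[str, str]:
    """Classify each relative path as 'match', 'differs', or 'missing'."""
    results: Dict[str, str] = {}
    for rel in rel_paths:
        live, restored = live_root / rel, restored_root / rel
        if not restored.is_file():
            results[rel] = "missing"
        elif live.is_file() and sha256_of(live) == sha256_of(restored):
            results[rel] = "match"
        else:
            results[rel] = "differs"
    return results
```

A "differs" result is not always a failure, since the live file may have changed after the snapshot was taken. It is a signal for the Agent to look at, while "missing" should fail the drill outright.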
Quick Recovery Guide
When real disaster strikes, the Agent follows this process:
1. Assess damage → 2. Select recovery point → 3. Provision new VPS → 4. Restore data → 5. Verify services
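# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are exported in the shell
# Quick full system restore
restic restore latest --target /mnt/restore/
# Restore specific directories
restic restore latest --target /mnt/restore/ \
  --include /etc \
  --include /var/www
# Remote restore from S3 (writes over the live filesystem; run it on the new VPS or from a rescue environment)
restic -r s3:https://s3.amazonaws.com/my-bucket/vps-backup \
  restore latest --target /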
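When the Agent assembles these restore commands itself, it helps to build the argument list in one place and guard against restoring over the root filesystem by accident. This is a hedged sketch: `build_restore_command` and its `allow_root` switch are hypothetical, and it only uses the `restore`, `--target`, `--include`, and `-r` options shown above.

```python
from typing import List, Optional, Sequence


def build_restore_command(target: str, snapshot: str = "latest",
                          repo: Optional[str] = None,
                          includes: Sequence[str] = (),
                          allow_root: bool = False) -> List[str]:
    """Build a restic restore argument list suitable for subprocess.run."""
    if target.rstrip("/") == "" and not allow_root:
        # "/" (or an empty target) must be requested explicitly
        raise ValueError("refusing to restore over / without allow_root=True")
    cmd = ["restic"]
    if repo:
        cmd += ["-r", repo]
    cmd += ["restore", snapshot, "--target", target]
    for path in includes:
        cmd += ["--include", path]
    return cmd
```

Returning a list rather than a shell string lets the command be passed to `subprocess.run` without shell quoting issues.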
Step 5: Smart Monitoring & Proactive Defense
The backup Agent does more than scheduled backups — it prevents data disasters:
Anomaly Detection
def anomaly_detection():
    """
    Detect abnormal behavior that could lead to data loss:
    - Mass file deletions
    - Unusual file change patterns
    - Encryption activity (sudden file extension changes)
    """
    changes = get_recent_file_changes(hours=24)
    alerts = []
    if changes.get("deleted_count", 0) > 100:
        alerts.append("⚠️ Mass deletions detected — immediate full backup recommended")
    if changes.get("encrypted_pattern"):
        alerts.append("🚨 Possible ransomware activity! Initiating emergency backup")
    return alerts
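The function above relies on a `get_recent_file_changes` helper that is not shown. One way such a helper could derive its numbers, sketched here under assumptions, is to compare two file inventories (lists of paths taken 24 hours apart). `summarize_changes` and the suffix list are hypothetical; real ransomware uses many other extensions, so treat the list as a placeholder.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

# Hypothetical examples only; extend this set for real use
SUSPICIOUS_SUFFIXES = {".locked", ".encrypted", ".crypt"}


def summarize_changes(before: Iterable[str], after: Iterable[str]) -> Dict:
    """Compare two path inventories and summarize deletions and renames."""
    before_set, after_set = set(before), set(after)
    deleted = before_set - after_set
    created = after_set - before_set
    # A file that vanished while "<same path><suspicious suffix>" appeared
    # looks like an in-place encryption rename
    renamed_suspicious = {
        path for path in created
        if PurePosixPath(path).suffix.lower() in SUSPICIOUS_SUFFIXES
        and str(PurePosixPath(path).with_suffix("")) in deleted
    }
    return {
        "deleted_count": len(deleted),
        "created_count": len(created),
        "encrypted_pattern": len(renamed_suspicious) > 0,
    }
```

The returned keys match the ones `anomaly_detection` reads (`deleted_count` and `encrypted_pattern`), so the result can be used directly as its `changes` dictionary.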
Automated Defense Response
When the Agent detects anomalies, it immediately triggers protective measures:
- Instant snapshot: Create an emergency snapshot of current state
- Data locking: Set recent backups to immutable mode
- Target failover: Switch to backup destination if primary is unreachable
- Alert dispatch: Notify via Telegram/Email/Slack
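As an illustration of the alert dispatch step, the sketch below builds and sends a message through the Telegram Bot API `sendMessage` method using only the standard library. The function names are hypothetical, and `chat_id` is an assumption: you would need to supply the ID of the chat that should receive alerts alongside the bot token.

```python
import json
import urllib.request


def build_telegram_alert(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a sendMessage request for the Telegram Bot API."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request: urllib.request.Request, timeout: float = 10.0) -> bool:
    """Send the request; return True on HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Separating request construction from sending makes the alert path easy to test without network access. A failed alert returns False rather than raising, so it cannot interrupt the emergency snapshot.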
Real-World Deployment
Here’s a complete docker-compose setup, running the backup Agent as a microservice:
# docker-compose.yml
version: '3.8'
services:
  backup-agent:
    # python:3.11-slim does not ship restic; install it in a derived image
    image: python:3.11-slim
    volumes:
      - ./backup_agent.py:/app/agent.py
      # Host paths appear under /data in the container, so PATHS in the
      # script must point at /data/www and /data/etc
      - /var/www:/data/www:ro
      - /etc:/data/etc:ro
      - /mnt/backup:/backup
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - RESTIC_PASSWORD=${RESTIC_PASSWORD}
      - LLM_API_KEY=${LLM_API_KEY}
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
      - BACKUP_SCHEDULE=0 3 * * *
    command: python /app/agent.py
    restart: unless-stopped
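The compose file passes `BACKUP_SCHEDULE` as a cron expression, but the container runs a plain Python process, so the script has to interpret that value itself. A minimal sketch, assuming only daily schedules of the form `MINUTE HOUR * * *` need to be supported (`next_daily_run` is a hypothetical helper; a cron parsing library would be needed for anything richer):

```python
from datetime import datetime, timedelta


def next_daily_run(schedule: str, now: datetime) -> datetime:
    """Return the next run time for a 'MINUTE HOUR * * *' cron expression."""
    fields = schedule.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError("only 'MINUTE HOUR * * *' schedules are supported in this sketch")
    minute, hour = int(fields[0]), int(fields[1])
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute or hour out of range")
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow
        candidate += timedelta(days=1)
    return candidate
```

The agent's main loop could then sleep until `next_daily_run(os.environ["BACKUP_SCHEDULE"], datetime.now())` and run one decision cycle when it wakes.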
Backup & Recovery Best Practices
Hard-won lessons from production deployments:
- Test before you trust: Any backup strategy must pass full restore tests. Our Agent runs them monthly.
- Encrypt everything: Restic encrypts every repository with AES-256 in counter mode and authenticates data with Poly1305-AES, so even if storage is compromised, your data stays unreadable without the password.
- Geographic redundancy: At least two geographically separate backup targets. We use local + S3 + remote VPS.
- Immutable backups: Serve the Restic repository through rest-server with the --append-only option to prevent backup tampering or deletion.
- Back up metadata too: Include database exports, Docker volumes, and container configurations alongside files.
- Monitor everything: Agent decision logs and backup status go to your monitoring system via webhooks.
Conclusion
Data disasters aren’t a matter of if, but when. Using an AI Agent to drive your backup and disaster recovery means you’re no longer relying on manual operations or fragile scripts — you have a 24/7 backup engineer that thinks, decides, and acts autonomously.
Backups shouldn’t be a burden — they should give you peace of mind. Let the AI Agent carry that weight so you can sleep soundly.
Tools used: Restic backup tool, Python 3.11, Hermes Agent / OpenAI API. Full example implementations available on GitHub.
