Block-level nightly backup of a Raspberry Pi to AWS S3. Restore a complete, bootable Pi to new hardware in one command. No manual setup, no secrets to re-enter, no git clones.
Think of it as an AMI for your Pi.
BACKUP (runs on Pi nightly via cron)
1. Quiesce the database
├─ MariaDB/MySQL detected → FLUSH TABLES WITH READ LOCK (zero downtime)
└─ No DB detected → stop Docker briefly (~10s)
2. Sync filesystem (instant)
└─ flush dirty pages and drop caches
3. Save partition table (GPT/MBR)
└─ sfdisk -d /dev/nvme0n1 ──► S3
4. Image each partition with partclone (~5–15 min for typical used data)
└─ partclone reads the filesystem allocation map and skips
unallocated blocks — only used data is transferred
   /dev/nvme0n1p1 ──► partclone.ext4 ──► pigz ──► aws s3 cp ──► S3
   /dev/nvme0n1p2 ──► partclone.ext4 ──► pigz ──► aws s3 cp ──► S3
   /dev/mmcblk0p1 ──► partclone.vfat ──► pigz ──► aws s3 cp ──► S3
   (boot firmware)    partition-aware     parallel   streaming,
                      clone               gzip       no local file
5. Release DB lock / restart Docker (writes resume)
6. Upload manifest JSON (metadata: partitions, sizes, duration)
RESTORE (run on a Linux machine or another Pi)
1. Download partition table from S3
└─ sfdisk /dev/target (recreates GPT layout)
2. Restore each partition with partclone
└─ S3 ──► gunzip ──► partclone.restore ──► /dev/target
3. Boot — root filesystem auto-expands to fill device on first boot
dd reads every sector on the device regardless of whether it's used. On a 954 GB NVMe that's 28% full, dd reads 954 GB. partclone reads the filesystem allocation bitmap and skips unallocated blocks — it reads only the ~28 GB of used data. Same result, 20× less data.
| | dd | partclone |
|---|---|---|
| Reads | every sector (used + empty) | used blocks only |
| Speed on 954 GB NVMe (28% full) | ~90 min | ~5 min |
| S3 upload size | ~10 GB (compressed zeros) | ~3–5 GB |
| Restore | gunzip \| dd | partclone per partition |
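The gap partclone exploits is the same one a sparse file makes visible: a file's logical size can be far larger than the blocks actually allocated to it. A quick illustration (not part of pi2s3) on any Linux box:

```shell
# Illustration only: logical size vs. allocated blocks.
# partclone wins for the same reason `du` is small here: it transfers
# allocated data, not the device's full logical extent.
truncate -s 1G sparse.img    # 1 GB logical size, almost nothing allocated
printf 'data' >> sparse.img  # append a few real bytes after the hole
ls -l sparse.img             # reports the full ~1 GB logical size
du -k sparse.img             # reports only the handful of allocated KB
rm sparse.img
```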
No config needed. The default DB_CONTAINER="auto" automatically scans for a running MariaDB/MySQL container. If found, pi2s3 uses FLUSH TABLES WITH READ LOCK (FTWRL) instead of stopping Docker. The lock is held only while sync and drop_caches complete (typically under 10 seconds), then released — DB writes resume while partclone images the partitions. Same technique as mariabackup/xtrabackup.
What happens automatically:
- Kills any orphaned `pi2s3-lock` connections left by previous crashed backups (`db_kill_orphaned_locks`)
- Scans running containers for any MariaDB/MySQL image
- Auto-reads `MYSQL_ROOT_PASSWORD`/`MARIADB_ROOT_PASSWORD` from the container env
- Issues `FLUSH TABLES WITH READ LOCK` via a persistent background connection
- Runs `sync` + `drop_caches` to flush InnoDB dirty pages to disk (~5–10 seconds)
- Releases the lock — all containers stay up for the full imaging window
- Images all partitions with `partclone` — site serves reads and writes throughout
- Reports probe pass/fail in the ntfy notification
InnoDB replays any redo log entries written during imaging on the next startup — fuzzy snapshots taken after lock release are fully consistent and bootable.
If no MariaDB/MySQL container is found (or the lock fails), the script falls back to STOP_DOCKER=true automatically and logs why.
A background site availability probe runs every PROBE_INTERVAL seconds (default: 60) during imaging — cache-busted requests to confirm the site stays up. PROBE_LATEST_POST=true (default) auto-discovers the latest WordPress post via REST API and probes real dynamic content instead of the homepage.
If no MariaDB/MySQL container is detected, containers are stopped for the duration of partition imaging — typically 5–15 minutes at 2am. Docker is restarted immediately after all partitions are imaged.
This is still far better than the old dd approach (60–90 minutes on a full NVMe), and gives a fully consistent image. On restore, no recovery step is needed.
| Data | Location | Covered |
|---|---|---|
| OS + kernel + packages | `/dev/nvme0n1` | ✅ |
| systemd services (cloudflared, watchdog) | `/dev/nvme0n1` | ✅ |
| Docker runtime + all images | `/dev/nvme0n1` | ✅ |
| Docker volumes (databases, uploads) | `/dev/nvme0n1` | ✅ |
| App config + `.env` files | `/dev/nvme0n1` | ✅ |
| SSH authorized keys | `/dev/nvme0n1` | ✅ |
| Cron jobs | `/dev/nvme0n1` | ✅ |
| GPT partition table | `/dev/nvme0n1` | ✅ |
| Boot firmware (config.txt, cmdline.txt) | `/boot/firmware` partition | ✅ |
| NVMe performance tuning | `/dev/nvme0n1` | ✅ |
Split-device setups: if your Docker data root is on a different physical device than your OS (e.g. SD card boots, USB NVMe holds data), the backup script detects this and warns you. Set `BACKUP_EXTRA_DEVICE` in `config.env` to image both devices.
On the Pi (backup):
- Raspberry Pi OS (Bookworm or Trixie, 64-bit recommended)
- AWS CLI v2 — installed automatically by `install.sh`
- `partclone` — installed automatically by `install.sh`
- `pigz` — installed automatically by `install.sh` (parallel gzip, much faster than `gzip` on Pi 5's quad-core)
- AWS credentials — see IAM policy below
For restore (Linux):
- Linux machine with `sfdisk` (util-linux) and `partclone` installed
- AWS CLI v2 with read access to your bucket
- `python3` (for manifest parsing — standard on all modern Linux distros)
- `jq` — required for manifest field parsing: `sudo apt install jq`
- `pv` — optional, for a live progress bar: `sudo apt install pv`
- `losetup` (util-linux) — required for `--extract` partial restores (standard on all Linux distros)
macOS note: The restore script requires Linux because `sfdisk` and `partclone` are not available on macOS. The easiest approach: boot the new Pi from a minimal SD card, attach the target NVMe, SSH in, and run `pi-image-restore.sh` from there.
| Hardware | OS | Status | Notes |
|---|---|---|---|
| Pi 5 + NVMe | Raspberry Pi OS Bookworm 64-bit | ✅ Tested | Reference platform |
| Pi 5 + SD card | Raspberry Pi OS Bookworm 64-bit | ✅ Tested | Slower upload (~90 min for full card) |
| Pi 4 + USB SSD | Raspberry Pi OS Bookworm 64-bit | ✅ Expected | Same kernel, same tools |
| Pi 4 + SD card | Raspberry Pi OS Bookworm 64-bit | ✅ Expected | |
| Pi 4 + NVMe (via HAT) | Raspberry Pi OS Bookworm 64-bit | ✅ Expected | HAT presents as /dev/nvme0n1 |
| Pi 3B/3B+ | Raspberry Pi OS Bookworm 64-bit | | Single-core pigz, no parallel imaging; expect 2–4× slower |
| Pi Zero 2W | Raspberry Pi OS Bookworm 64-bit | | 512 MB RAM; 1-core compression; slow but functional |
| Any Pi + 32-bit OS (armv7l) | Raspberry Pi OS Legacy 32-bit | ❌ Not supported | AWS CLI v2 has no official armv7l build; install.sh exits with a clear error |
| Non-Raspberry Pi Linux (x86_64, etc.) | Any 64-bit Linux | 🔬 Untested | partclone must be installed manually; no Pi model detection |
64-bit OS strongly recommended — AWS CLI v2 has full aarch64 support; the 32-bit (armv7l) path requires manual AWS CLI install from source or a third-party build.
Minimum permissions required. Create a dedicated IAM user and attach this policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:CreateBucket", "s3:ListBucket"],
"Resource": "arn:aws:s3:::YOUR-BUCKET-NAME"
},
{
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
"Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
},
{
"Effect": "Allow",
"Action": ["s3:PutLifecycleConfiguration", "s3:GetLifecycleConfiguration"],
"Resource": "arn:aws:s3:::YOUR-BUCKET-NAME"
}
]
}

The policy file is included at iam-policy.json. Print it with your bucket name substituted:
bash ~/pi2s3/install.sh --iam-policy

Apply via AWS CLI (one-time setup from any machine with IAM access):
# Create a dedicated user
aws iam create-user --user-name pi2s3
# Attach the policy (replace YOUR-BUCKET-NAME first)
aws iam put-user-policy --user-name pi2s3 \
--policy-name pi2s3 \
--policy-document file://iam-policy.json
# Generate credentials — paste output into 'aws configure' on the Pi
aws iam create-access-key --user-name pi2s3

curl -sL pi2s3.com/install | bash

SSH into your Pi and paste. Handles everything — installs dependencies, prompts for S3 bucket and region, configures cron, runs a dry-run test.
git clone https://github.com/andrewbakercloudscale/pi2s3.git ~/pi2s3
cd ~/pi2s3
bash install.sh

install.sh will:
- Prompt for your S3 bucket and AWS region (ntfy URL is optional)
- Write `config.env` (gitignored — never committed)
- Install `partclone`, `pigz`, and AWS CLI v2 if not present (64-bit OS required)
- Verify AWS access — offers to create the bucket if it doesn't exist yet
- Set up S3 lifecycle policy
- Install the nightly cron job (2:00am by default)
- Run a `--dry-run` to confirm everything works
- Offer to run a real backup immediately
The installer offers to run a backup at the end. To run one manually:
bash ~/pi2s3/pi-image-backup.sh --force

Takes 3–10 minutes depending on how full the device is and your network speed. You'll get an ntfy push notification when done (if NTFY_URL is configured).
All settings live in config.env (copy from config.env.example):
# Required
S3_BUCKET="your-bucket-name"
S3_REGION="us-east-1"
# Optional — push notifications via ntfy.sh (free hosted service)
NTFY_URL="" # e.g. https://ntfy.sh/my-pi-backups (blank = silent)
# Retention (default: 60 images)
MAX_IMAGES=60
# Per-host override (multi-Pi): hyphens → underscores in hostname
# MAX_IMAGES_my_pi_5=30
# AWS
AWS_PROFILE="" # blank = default profile or instance role
S3_STORAGE_CLASS="STANDARD_IA" # ~40% cheaper than STANDARD for backups
# Zero-downtime DB lock (recommended for MariaDB/MySQL setups)
DB_CONTAINER="auto" # "auto" | "container-name" | "" (native)
DB_ROOT_PASSWORD="" # blank = auto-read from container env
# Site availability probe (used with DB lock)
PROBE_URL="" # blank = auto-detect from CF_SITE_HOSTNAME
PROBE_LATEST_POST=true # probe latest WP post via REST API instead of homepage
PROBE_INTERVAL=60 # seconds between probes
# Backup behaviour (fallback if DB_CONTAINER not set)
STOP_DOCKER=true # stop Docker briefly for DB consistency (~10s)
DOCKER_STOP_TIMEOUT=30 # seconds to wait for containers to stop
CRON_SCHEDULE="0 2 * * *" # 2:00am daily
# Bandwidth throttle (requires: sudo apt install pv)
AWS_TRANSFER_RATE_LIMIT="" # e.g. "2m" = 2 MB/s, "500k" = 500 KB/s. blank = unlimited
# Client-side encryption (requires: sudo apt install gpg)
BACKUP_ENCRYPTION_PASSPHRASE="" # blank = S3 SSE only. Set to encrypt before upload.
# Pre/post backup hooks — stop/start non-Docker services around imaging
PRE_BACKUP_CMD="" # e.g. "systemctl stop nginx php8.2-fpm mariadb"
POST_BACKUP_CMD="" # e.g. "systemctl start mariadb php8.2-fpm nginx"
# Split-device (advanced)
BACKUP_EXTRA_DEVICE="" # image a second device alongside boot (see below)
# Post-backup auto-verify
BACKUP_AUTO_VERIFY=true # re-check S3 after every backup; result in ntfy notification
# Post-backup container safety check
POST_BACKUP_CHECK_ENABLED=true # separate cron ~30 min after backup confirms containers came back up
POST_BACKUP_CHECK_SCHEDULE="30 2 * * *" # adjust if backup typically runs longer than 30 min
# Pre-backup health checks
PREFLIGHT_ENABLED=true # check container health, free disk, I/O errors before imaging
PREFLIGHT_MIN_FREE_MB=500 # abort if less than this much free disk space (MB)
PREFLIGHT_ABORT_ON_WARN=false # false = warn but proceed; true = abort on any preflight warning
# Missed backup alert
STALE_CHECK_ENABLED=true # daily cron checks S3 for a recent backup; ntfy if none found
STALE_CHECK_SCHEDULE="0 6 * * *" # run well after backup window (default: 6am)
STALE_BACKUP_HOURS=25 # alert if no backup seen within this many hours
# Notifications
NTFY_LEVEL="all" # "all" | "failure"

Set BACKUP_ENCRYPTION_PASSPHRASE to encrypt every partition image with GPG AES-256 before upload. Even full S3 bucket access is useless without the passphrase.
# config.env
BACKUP_ENCRYPTION_PASSPHRASE="my-strong-passphrase"

Requires: `sudo apt install gpg`
The restore script reads the encryption field in the manifest and decrypts inline. If the passphrase is not in config.env, it prompts interactively.
Keep the passphrase safe. If your Pi dies and you lose `config.env`, you cannot restore your backups. Store the passphrase in a password manager (1Password, Bitwarden, etc.) or write it down and store it offline — somewhere independent of the Pi. `config.env` also contains your AWS credentials; treat it like a secrets file.
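A minimal sketch of the symmetric-encryption stage described above, assuming GnuPG 2.1+ (for the `--pinentry-mode loopback` flag) and a stand-in byte stream; the real pi2s3 pipeline stages and file names may differ:

```shell
# Sketch, not the actual pi2s3 code: encrypt a stream with AES-256
# before it would be handed to `aws s3 cp`.
PASS="my-strong-passphrase"

printf 'fake partition stream' |
  gpg --batch --symmetric --cipher-algo AES256 \
      --pinentry-mode loopback --passphrase "$PASS" > part.img.gz.gpg

# Restore side: decrypt inline before partclone.restore.
gpg --batch --decrypt --pinentry-mode loopback \
    --passphrase "$PASS" part.img.gz.gpg 2>/dev/null > part.img.gz
rm part.img.gz.gpg part.img.gz
```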
At ~3–5 GB compressed per image (128 GB NVMe, ~25% full):
| Retention | S3 storage | Monthly cost (STANDARD_IA) |
|---|---|---|
| 7 images | ~25 GB | <$1/month |
| 30 images | ~120 GB | ~$2/month |
| 60 images | ~240 GB | ~$3/month |
Costs vary by region. af-south-1 (Cape Town) is slightly higher than us-east-1.
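The table follows from simple per-GB arithmetic. A sketch, assuming the commonly quoted us-east-1 STANDARD_IA rate of roughly $0.0125 per GB-month (check current AWS pricing for your region):

```shell
# Back-of-envelope storage cost. The rate is an assumption: roughly the
# us-east-1 STANDARD_IA price per GB-month at time of writing.
rate=0.0125    # USD per GB-month
image_gb=4     # compressed size per nightly image

for retained in 7 30 60; do
  awk -v gb="$image_gb" -v n="$retained" -v r="$rate" \
      'BEGIN { printf "%d images: $%.2f/month\n", n, gb * n * r }'
done
```

Request and retrieval fees for STANDARD_IA are extra, but negligible at backup scale.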
pi-image-backup.sh [options]
--force Skip the duplicate-check (run even if today's backup exists)
--dry-run Show what would happen without uploading anything
--setup Create S3 lifecycle policy (run once after install)
--list List all backups in S3 with size and hostname
--verify Verify latest S3 backup files exist and are non-zero
--verify=DATE Verify specific date (YYYY-MM-DD)
--stale-check Ntfy alert if latest backup is older than STALE_BACKUP_HOURS
--cost Show S3 storage used and estimated monthly cost
--no-stop-docker Skip Docker stop (for daytime test runs with no downtime)
--help Show usage
SHA-256 checksums are computed in-flight during upload via tee >(sha256sum ...) — the compressed stream forks to the hash and S3 simultaneously with no re-download. Stored per partition in the manifest.
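The pattern can be sketched with a local file standing in for the `aws s3 cp -` stage (hypothetical file names; the real script's plumbing may differ):

```shell
# Fork the compressed stream: one copy to the hash, one to the "upload".
# `upload.bin` stands in for `aws s3 cp - s3://bucket/key`.
printf 'compressed image bytes' |
  tee >(sha256sum | awk '{print $1}' > stream.sha256) > upload.bin
sleep 1   # give the process substitution time to flush

cat stream.sha256   # checksum computed without re-reading the upload
rm stream.sha256 upload.bin
```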
Failure ntfy alerts include the last 10 lines of the backup log for immediate triage without needing to SSH in.
Each backup creates a dated folder in S3:
s3://your-bucket/pi-image-backup/
2026-04-14/
partition-table-20260414_020045.sfdisk ← GPT layout (applied first on restore)
nvme0n1p1-20260414_020045.img.gz ← partclone image, partition 1
nvme0n1p2-20260414_020045.img.gz ← partclone image, partition 2
mmcblk0p1-boot-fw-20260414_020045.img.gz ← boot firmware (if on separate device)
manifest-20260414_020045.json ← metadata
The manifest records hostname, Pi model, OS, partition layout, sizes, duration, storage class, and SHA-256 checksums computed in-flight during upload (no re-download needed). The --verify flag checks all files listed in the manifest exist and are non-zero in S3, and prints the stored checksums.
Each partition entry in the manifest includes:
{
"name": "nvme0n1p2",
"fstype": "ext4",
"tool": "partclone.ext4",
"size_bytes": 127363883008,
"compressed_bytes": 2987654321,
"sha256": "e3b0c44298fc1c149afb...",
"key": "pi-image-backup/2026-04-16/nvme0n1p2-20260416_020045.img.gz"
}

Old images beyond MAX_IMAGES are deleted automatically.
See what's in S3 with sizes and hostnames:
bash ~/pi2s3/pi-image-backup.sh --list

Output:
[1] 2026-04-16 4.2G compressed (my-pi-5)
[2] 2026-04-15 4.1G compressed (my-pi-5)
[3] 2026-04-14 4.0G compressed (my-pi-5)
Total: 3 backup(s)
Check that all files listed in the manifest exist and are non-zero in S3. Prints the stored SHA-256 checksums. Runs against the latest backup unless a date is specified:
# Verify latest
bash ~/pi2s3/pi-image-backup.sh --verify
# Verify a specific date
bash ~/pi2s3/pi-image-backup.sh --verify=2026-04-15

Output:
OK nvme0n1p1-20260415_020045.img.gz (512M)
OK nvme0n1p2-20260415_020045.img.gz (3.7G)
OK partition-table (1234 bytes)
Checksums (SHA-256 of compressed upload):
e3b0c44298fc1c149afb4c8996fb92427ae41e4649b934ca495991b7852b855
a87ff679a2f3e71d9181a67b7542122c04521ead5f5a64afe10c2b7d64dd5c6
VERIFY OK. All backup files present in S3.
This is the same check BACKUP_AUTO_VERIFY=true runs automatically after each backup. Use --verify to re-check at any time — after a suspected S3 issue, before a planned restore, or as part of a monthly audit.
pi-image-restore.sh [options]
--list List all available backups
--date YYYY-MM-DD Use a specific backup (default: latest)
--device /dev/... Target device for full restore
--yes Skip confirmation prompts
--resize Expand last partition to fill device after restore
--host <hostname> Select a specific host's backups (multi-Pi setups)
--extract <path> Extract a file or directory from a backup (Linux only)
--partition <name> Partition to mount for --extract (default: largest non-boot)
--verify /dev/... Verify a flashed device against S3 manifest (dd format)
--rate-limit <speed> Cap NVMe write throughput (e.g. 10m = 10 MB/s).
Prevents PCIe write storms on Pi 5 that can cause
kernel panics. Applied after gunzip for direct control
over the uncompressed write rate. Requires pv.
--post-restore <script> Run a script inside the restored root before reboot.
RESTORE_ROOT is exported pointing to the mounted partition.
extras/post-restore-nvme-boot.sh wires up NVMe boot:
updates fstab + cmdline.txt for the new Pi's SD card.
extras/post-restore-example.sh: hostname, tunnel, .env.
Full step-by-step runbook: RECOVERY.md — the document to open when your Pi is dead and you need to restore from scratch.
Flash Raspberry Pi OS Lite (64-bit) to an SD card, boot the new Pi, SSH in, and run:
curl -sL pi2s3.com/restore | bash

This installs partclone, pigz, pv, and AWS CLI v2, prompts for your AWS credentials, clones pi2s3, and hands off to the interactive restore script. Have your AWS access key and secret ready (from your password manager).
From any machine with AWS access, confirm the S3 image is intact:
bash ~/pi2s3/test-recovery.sh --pre-flash

Checks AWS access, confirms image exists and is non-zero, reads the manifest, estimates flash time, prints the restore command.
Flash Raspberry Pi OS Lite (64-bit) to an SD card using Raspberry Pi Imager. Boot the new Pi, SSH in, then run the restore one-liner above.
Requires Linux — `sfdisk` and `partclone` are not available on macOS. The restore script runs on the new Pi itself.
bash ~/pi2s3/pi-image-restore.sh

Interactive prompts let you pick the backup date and target device. Streams directly from S3 — no local download needed.
Or restore a specific date non-interactively:
bash ~/pi2s3/pi-image-restore.sh --date 2026-04-13 --device /dev/nvme0n1 --yes

Install pv for a live progress bar:
sudo apt install pv

What happens during restore:
- Partition table downloaded from S3 and applied to target device with `sfdisk`
- Each partition streamed from S3 → `gunzip` → `partclone.restore` with inline checksum verification
- `fsck` run on each ext2/3/4 partition to clear journal state from live backup
- Boot firmware partition restored separately (if it was on a separate device)
Pi 5 with NVMe — throttle writes to prevent kernel panics:
sudo apt install pv
bash ~/pi2s3/pi-image-restore.sh --device /dev/nvme0n1 --resize --yes --rate-limit 10m

--rate-limit 10m caps the uncompressed byte rate into partclone at 10 MB/s. Increase if stable — full speed is fine on most setups; some Pi 5 + NVMe combinations trigger PCIe watchdog resets at sustained high write rates.
No target device needed. Streams the partition from S3, mounts it via a loop device (Linux kernel feature: treats a regular file as a block device), and copies the requested path to ./pi2s3-extract-<date>/.
# Recover /home/pi from the latest backup
bash ~/pi2s3/pi-image-restore.sh --extract /home/pi
# Recover /etc from a specific date
bash ~/pi2s3/pi-image-restore.sh --extract /etc --date 2026-04-16
# Specify which partition (default: largest non-boot partition = root fs)
bash ~/pi2s3/pi-image-restore.sh --extract /var/lib/docker --partition nvme0n1p2

Linux only — requires `losetup` (standard in util-linux) and `mount`. Only works with partclone-format backups (all backups since v1.1).
Insert the storage into the new Pi and power on. Raspberry Pi OS automatically expands the root filesystem to fill the device on first boot.
Clear the old SSH host key on your Mac (the restored Pi has the same key as the original):
ssh-keygen -R raspberrypi.local
ssh-keygen -R <ip-address>
ssh pi@raspberrypi.local

bash ~/pi2s3/test-recovery.sh --post-boot

Checks: config.env present and configured, filesystem expansion, NVMe mount, Docker + all containers, Cloudflare tunnel, cron jobs, MariaDB tables, memory, load. PASS/FAIL/WARN per check.
bash ~/pi2s3/test-recovery.sh --guide

Prints the complete step-by-step recovery guide.
Restore a backup to a second Pi and have it come up as a different site — different Cloudflare tunnel, hostname, and .env variables — without any manual editing after reboot.
bash ~/pi2s3/pi-image-restore.sh \
--date latest --device /dev/nvme0n1 --resize --yes \
  --post-restore extras/post-restore-nvme-boot.sh

After restore completes, pi-image-restore.sh mounts the restored root partition, exports RESTORE_ROOT pointing to it, and runs your script. Changes are written directly to the target device before the first boot.
NVMe boot wiring: extras/post-restore-nvme-boot.sh — handles the two things that differ between Pi hardware:
- Updates `/etc/fstab` on the restored root — swaps the original Pi's SD card PARTUUID with the new Pi's SD card PARTUUID for `/boot/firmware`.
- Updates `/boot/firmware/cmdline.txt` on the running SD card so the next boot roots into the NVMe instead of the SD card.
Optionally renames the hostname: NEW_HOSTNAME=my-pi-qa bash pi-image-restore.sh ...
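The fstab half of that wiring can be sketched as a PARTUUID substitution against the mounted root (hypothetical PARTUUIDs, with a temp dir standing in for the real RESTORE_ROOT; the actual extras script's logic may differ):

```shell
# Sketch of the /etc/fstab PARTUUID swap; UUIDs are made up.
RESTORE_ROOT=$(mktemp -d)           # stands in for the mounted root
mkdir -p "$RESTORE_ROOT/etc"
cat > "$RESTORE_ROOT/etc/fstab" <<'EOF'
PARTUUID=aaaaaaaa-01  /boot/firmware  vfat  defaults  0  2
PARTUUID=aaaaaaaa-02  /               ext4  defaults  0  1
EOF

OLD_PARTUUID="aaaaaaaa-01"          # original Pi's SD boot partition
NEW_PARTUUID="bbbbbbbb-01"          # new Pi's SD boot partition
sed -i "s|PARTUUID=${OLD_PARTUUID}|PARTUUID=${NEW_PARTUUID}|" \
    "$RESTORE_ROOT/etc/fstab"

grep /boot/firmware "$RESTORE_ROOT/etc/fstab"
rm -r "$RESTORE_ROOT"
```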
App customisation: extras/post-restore-example.sh — covers hostname rename, Cloudflare tunnel credential swap, .env substitution, and SSH host key regeneration.
A bootable Raspberry Pi OS Lite image with pi2s3 and all dependencies pre-installed. Flash to a USB stick or SD card, plug into any Pi 5, power on — it auto-logs in and launches the restore wizard. No laptop, no internet, no Raspberry Pi Imager session needed at restore time.
Download a pre-built image from GitHub Releases (tagged recovery-usb/YYYY-MM-DD), or build your own:
# Linux, ~15 min, ~6 GB free disk (x86_64 needs: sudo apt install qemu-user-static binfmt-support)
bash ~/pi2s3/extras/build-recovery-usb.sh
# → pi2s3-recovery-usb-YYYY-MM-DD.img.xz

Flash with Raspberry Pi Imager → Use custom image. Default SSH password: recovery.
On first boot: Pi auto-logs in on tty1 → prompts for S3 bucket and AWS credentials if not yet configured → hands off to the interactive restore wizard.
To publish a release, trigger the Build Recovery USB Image workflow in GitHub Actions.
Configure a Pi 5 EEPROM to fall back to the pi2s3 restore environment over HTTP when no NVMe is present. No USB, no SD card, no laptop needed — just power and ethernet.
One-time setup per Pi:
bash ~/pi2s3/extras/setup-netboot.sh
sudo reboot

This adds HTTP_HOST=boot.pi2s3.com to the EEPROM and sets BOOT_ORDER to try NVMe first with HTTP as fallback. When the Pi boots with a blank or missing NVMe, it fetches kernel8.img + initrd.img from boot.pi2s3.com (CloudFront → S3) and boots the pi2s3 restore environment entirely in RAM.
Trigger immediate netboot (e.g. to restore to a new NVMe):
bash ~/pi2s3/extras/setup-netboot.sh --force # HTTP first
sudo reboot
# After recovery, restore normal boot order:
bash ~/pi2s3/extras/setup-netboot.sh # NVMe first, HTTP fallback

Build and publish boot files:
bash ~/pi2s3/extras/build-netboot-image.sh --upload s3://boot.pi2s3.com/

Or trigger the Build Netboot Image workflow in GitHub Actions.
Infrastructure: S3 bucket + CloudFront distribution at boot.pi2s3.com. Terraform in extras/terraform/boot-infrastructure/ with both automated and manual setup instructions.
Deploy the same backup to 10–100 Pis from a single CSV manifest. Each Pi gets the base image plus a per-Pi post-restore script that sets the hostname, Cloudflare tunnel credentials, SSH keys, and any .env variables.
Pis must be in recovery mode before deploying — use netboot (recommended for fleets) or the recovery USB image.
Manifest (fleet.csv):
# name,host,date,device,post_restore_script
pi-classroom-01,192.168.1.101,latest,/dev/nvme0n1,./post-restore/classroom.sh
pi-classroom-02,192.168.1.102,latest,/dev/nvme0n1,./post-restore/classroom.sh
pi-office,192.168.1.50,2026-04-20,/dev/nvme0n1,./post-restore/office.sh

Deploy:
# Sequential (default)
bash ~/pi2s3/extras/fleet-deploy.sh fleet.csv
# All Pis simultaneously
bash ~/pi2s3/extras/fleet-deploy.sh fleet.csv --parallel
# Show plan without deploying
bash ~/pi2s3/extras/fleet-deploy.sh fleet.csv --dry-run
# Deploy a single Pi by name (for re-runs or failures)
bash ~/pi2s3/extras/fleet-deploy.sh fleet.csv --only pi-classroom-02

fleet-deploy.sh SSHes into each Pi, copies config.env and the per-Pi post-restore script, then runs pi-image-restore.sh non-interactively. Per-Pi logs are saved to fleet-deploy-logs-<timestamp>/.
See extras/fleet-example/ for a complete worked example including a classroom post-restore template that auto-sets the hostname from the Pi's IP address.
test-recovery.sh --pre-flash [--date YYYY-MM-DD]
test-recovery.sh --post-boot
test-recovery.sh --guide
--pre-flash (Mac) — run before flashing:
- Validates `config.env` and AWS connectivity
- Confirms image file exists and is non-zero size
- Reads manifest (hostname, Pi model, OS, device, compressed size)
- Estimates flash time
- Prints go/no-go with exact restore command
--post-boot (new Pi) — run after first boot:
- OS version, kernel, uptime
- Filesystem expansion (is root partition using the full device?)
- NVMe mounted at `/mnt/nvme`
- Docker daemon + all containers running
- Docker data-root on correct device
- Cloudflare tunnel active
- Cron jobs present (pi2s3 backup + app-layer backup)
- MariaDB responding + has tables
- HTTP check on localhost
- Memory and load
- SSH host key reminder
Exit code 0 = all passed. Exit code 1 = one or more failures.
If your Pi boots from SD card but stores Docker data on a separate NVMe or USB drive, the backup script detects this during preflight:
WARNING: Docker data is on a DIFFERENT device than boot!
Boot device: /dev/mmcblk0 (will be imaged)
Docker data: /dev/sda (NOT in this image)
Fix by adding to config.env:
BACKUP_EXTRA_DEVICE="/dev/sda"

The script will then image both devices, storing the second as pi-image-extra-sda-<timestamp>.img.gz alongside the boot image.
If you run services outside of Docker — native MySQL, MariaDB, nginx, php-fpm, or any other systemd service — use PRE_BACKUP_CMD and POST_BACKUP_CMD to stop them before imaging and restart them after.
# config.env
STOP_DOCKER=false # no Docker on this Pi
PRE_BACKUP_CMD="systemctl stop nginx php8.2-fpm mariadb"
POST_BACKUP_CMD="systemctl start mariadb php8.2-fpm nginx"

Imaging takes 5–15 minutes. MariaDB/nginx are down only for that window.
- `PRE_BACKUP_CMD` runs after preflight, before imaging. If it exits non-zero the backup is aborted immediately — no partial image is taken.
- `POST_BACKUP_CMD` runs after imaging completes. The `on_exit` crash trap also calls it if the script dies mid-imaging, so your services always come back up even on failure.
- `STOP_DOCKER=true` and hooks can coexist: Docker stops first, then `PRE_BACKUP_CMD`, then imaging, then `POST_BACKUP_CMD`, then Docker restarts.
Any shell command or script path works:
PRE_BACKUP_CMD="/usr/local/bin/my-pre-backup.sh"
POST_BACKUP_CMD="/usr/local/bin/my-post-backup.sh"

pi2s3 supports multiple Pis sharing a single S3 bucket. Each Pi stores its backups under pi-image-backup/<hostname>/ automatically — no config needed.
Add per-host retention overrides in config.env (replace hyphens in hostname with underscores):
MAX_IMAGES=60 # global default
MAX_IMAGES_my_pi_5=30 # override for host "my-pi-5"
MAX_IMAGES_pi_zero=7 # override for host "pi-zero"

If multiple Pi hostnames exist in the bucket, pi-image-restore.sh prompts you to choose. Or specify directly:
bash ~/pi2s3/pi-image-restore.sh --host my-pi-5

The --stale-check cron runs on each Pi independently and checks only that Pi's own backups under its hostname prefix.
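The hyphen-to-underscore lookup can be sketched with bash indirect expansion (a sketch of the mechanism, not pi2s3's actual code):

```shell
# Per-host retention lookup: "my-pi-5" -> MAX_IMAGES_my_pi_5.
MAX_IMAGES=60
MAX_IMAGES_my_pi_5=30

host="my-pi-5"                    # normally: host=$(hostname)
key="MAX_IMAGES_${host//-/_}"     # hyphens are not valid in var names
retention="${!key:-$MAX_IMAGES}"  # per-host override, else global default
echo "$host keeps $retention images"   # → my-pi-5 keeps 30 images
```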
See actual S3 usage and estimated monthly cost for the current host:
bash ~/pi2s3/pi-image-backup.sh --cost

extras/ — only needed if you run a Cloudflare tunnel. Script lives at `extras/cf-tunnel-watchdog.sh`.
An optional self-healing monitor that runs every 5 minutes as a root cron job. If your site or Cloudflare tunnel goes down, it automatically recovers through three escalating phases before rebooting the Pi as a last resort.
Every 5 min (root cron)
↓
Check 1: Any Docker containers stopped?
Check 2: HTTP probe on localhost — 5xx or connection failure?
Check 3: cloudflared ha_connections > 0? (if metrics endpoint available)
↓
All OK → log and exit
↓
Something down:
Phase 1 (attempts 1–4, 0–20 min)
→ start stopped containers + restart cloudflared
→ verify, notify recovery or continue
Phase 2 (attempts 5–8, 20–40 min)
→ docker compose down/up (full stack restart) + cloudflared
→ verify, notify recovery or continue
Phase 3 (attempt 9+, 40+ min)
→ dump diagnostics to /var/log/pi2s3-watchdog-prediag.log
→ reboot Pi (max once per 6 hours — rate-limited)
→ if rate-limited: "manual needed" alert sent, exit without reboot
Push notifications via ntfy at every stage: first failure, each phase escalation, recovery, and stuck-down alerts.
In config.env:
CF_WATCHDOG_ENABLED=true
CF_SITE_HOSTNAME="your-site.com" # used in push notification titles
CF_HTTP_PORT=80 # local port to probe
CF_COMPOSE_DIR="" # auto-detected, or set explicitly

Then install:
bash ~/pi2s3/install.sh --watchdogOr set CF_WATCHDOG_ENABLED=true before running the initial install.sh and it installs automatically as part of setup.
| Setting | Default | Description |
|---|---|---|
| `CF_WATCHDOG_ENABLED` | `false` | Set `true` to install |
| `CF_SITE_HOSTNAME` | hostname | Used in ntfy notification titles |
| `CF_HTTP_PORT` | `80` | Local port for HTTP probe |
| `CF_HTTP_PROBE_PATH` | `/` | URL path to probe |
| `CF_METRICS_URL` | `http://127.0.0.1:20241/metrics` | cloudflared metrics endpoint |
| `CF_COMPOSE_DIR` | auto-detect | Path to docker-compose.yml |
| `CF_PHASE1_MAX` | `4` | Attempts before full stack restart |
| `CF_PHASE2_MAX` | `8` | Attempts before Pi reboot |
| `CF_REBOOT_MIN_INTERVAL` | `21600` | Seconds between reboots (6 hours) |
cloudflared metrics: only checked if the endpoint is reachable. Enable in your cloudflared config with `metrics: localhost:20241`. If not configured, the watchdog skips the tunnel connection check and relies on Docker + HTTP checks only.
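The connection check amounts to grepping the Prometheus-format metrics text for the ha_connections gauge. A sketch with a canned sample standing in for the `curl` to the metrics endpoint (the exact metric name in your cloudflared build may differ):

```shell
# Sample line standing in for:
#   curl -s http://127.0.0.1:20241/metrics
metrics='cloudflared_tunnel_ha_connections 4'

conns=$(printf '%s\n' "$metrics" |
        awk '/ha_connections/ {print $2; exit}')
if [ "${conns:-0}" -gt 0 ]; then
  echo "tunnel OK ($conns HA connections)"
else
  echo "tunnel DOWN or metrics unreachable"
fi
```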
# Install / reinstall watchdog
bash ~/pi2s3/install.sh --watchdog
# Manual test run
sudo /usr/local/bin/pi2s3-watchdog.sh
# Live log tail
sudo journalctl -t pi2s3-watchdog -f
# View today's watchdog activity
sudo journalctl -t pi2s3-watchdog --since today
# View pre-reboot diagnostics (if a watchdog reboot occurred)
sudo cat /var/log/pi2s3-watchdog-prediag.log

extras/ — only needed if you run WordPress with PHP-FPM. Script lives at `extras/fpm-saturation-monitor.sh`. Configuration: `extras/config.env.example`.
When all PHP-FPM workers are exhausted, WordPress serves 504 errors — but WP-Cron itself is also stuck, so any cron-based alerting goes silent at the worst moment. fpm-saturation-monitor.sh runs as a host cron (not inside Docker), so it fires regardless of PHP-FPM state.
Three checks run every minute:
| Check | Condition | Action |
|---|---|---|
| HTTP probe | `curl` to `FPM_PROBE_URL` times out or returns 5xx | Increment saturation counter |
| Long DB queries | > 15 s queries from `wordpress` user in PROCESSLIST | Increment saturation counter |
| Orphaned backup lock | `pi2s3-lock` sleep connection in PROCESSLIST | Kill immediately + alert (once per 30 min) |
After FPM_SATURATION_THRESHOLD consecutive saturated checks (default: 3) an ntfy alert fires. Set FPM_AUTO_RESTART=true to automatically restart the WordPress container instead of waiting for manual intervention — an ntfy notification confirms the restart with a cooldown between attempts. A recovery notification fires when the site returns to normal.
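The consecutive-check logic can be sketched with a small state file carrying the count between one-minute cron runs (a sketch of the mechanism only; `check_saturated` and the state path are hypothetical):

```shell
FPM_SATURATION_THRESHOLD=3
STATE=/tmp/fpm-sat-count            # persists between cron invocations

check_saturated() { return 0; }     # stand-in probe: 0 = saturated

if check_saturated; then
  count=$(( $(cat "$STATE" 2>/dev/null || echo 0) + 1 ))
  echo "$count" > "$STATE"
  if [ "$count" -ge "$FPM_SATURATION_THRESHOLD" ]; then
    echo "ALERT: $count consecutive saturated checks"
  fi
else
  echo 0 > "$STATE"                 # one healthy check resets the streak
fi
```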
# 1. Add to crontab on the Pi host
crontab -e
# Paste: * * * * * /home/pi/pi2s3/extras/fpm-saturation-monitor.sh 2>/dev/null

Add the FPM settings from extras/config.env.example to ~/pi2s3/config.env. Key settings:
FPM_SATURATION_THRESHOLD=3 # consecutive saturated checks before alerting
FPM_PROBE_URL=http://localhost:80/
FPM_WP_CONTAINER=wordpress
FPM_DB_CONTAINER=mariadb
FPM_AUTO_RESTART=false # true = restart container automatically on saturation

| Setting | Default | Description |
|---|---|---|
| `FPM_SATURATION_THRESHOLD` | `3` | Consecutive saturated checks before alert |
| `FPM_PROBE_URL` | `http://localhost:8082/` | URL to probe for liveness |
| `FPM_PROBE_TIMEOUT` | `5` | curl timeout in seconds |
| `FPM_WP_CONTAINER` | `pi_wordpress` | WordPress Docker container name |
| `FPM_DB_CONTAINER` | `pi_mariadb` | MariaDB Docker container name |
| `FPM_ALERT_COOLDOWN` | `1800` | Seconds between repeat saturation alerts (30 min) |
| `FPM_AUTO_RESTART` | `false` | Set `true` to auto-restart on saturation |
| `FPM_RESTART_COOLDOWN` | `1200` | Seconds between auto-restarts (20 min) |
| `FPM_CALLBACK_URL` | — | CloudScale Devtools plugin admin-ajax.php URL |
| `FPM_CALLBACK_TOKEN` | — | Token from Debug AI tab in plugin |
The CloudScale Cyber and Devtools plugin (Debug AI tab → PHP-FPM Saturation Monitor) shows:
- Configurable settings (threshold, cooldown, probe URL, containers)
- A pre-filled config.env snippet with a one-click copy button
- Last saturation event timestamp and reason (requires FPM_CALLBACK_URL / FPM_CALLBACK_TOKEN)
- Restart events reported separately (type=restarted) when FPM_AUTO_RESTART=true
An optional daily "I'm alive" push notification via ntfy. If the notification stops arriving, the Pi is down or unreachable.
Enable in config.env:
NTFY_HEARTBEAT_ENABLED=true
NTFY_HEARTBEAT_SCHEDULE="0 8 * * *"   # 8:00am daily

Then install (or re-run install):

bash ~/pi2s3/install.sh

Each heartbeat includes: uptime, RAM usage, disk usage, Docker container count.
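Those fields can all be gathered with standard tools — a sketch (the exact format the real script sends is its own; this only shows where each value would come from, and the ntfy topic is a placeholder):

```shell
#!/usr/bin/env bash
# Sketch: assemble a heartbeat line. Output format is an assumption.
build_heartbeat() {
  local up ram disk containers
  up=$(uptime -p 2>/dev/null || echo "up unknown")              # uptime
  ram=$(free -m | awk '/^Mem:/ {printf "%d/%d MB", $3, $2}')    # RAM used/total
  disk=$(df -h / | awk 'NR==2 {print $5}')                      # root disk use %
  containers=$(docker ps -q 2>/dev/null | wc -l)                # running containers
  printf 'alive | %s | RAM %s | disk %s | %s containers\n' \
    "$up" "$ram" "$disk" "$containers"
}
# The script would then push it, e.g.:
#   curl -s -d "$(build_heartbeat)" "https://ntfy.sh/$NTFY_TOPIC"
```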
Pull the latest code and redeploy:
bash ~/pi2s3/install.sh --upgrade

This:

- Runs git pull in the repo directory
- Redeploys the watchdog binary to /usr/local/bin/pi2s3-watchdog.sh (if installed)
- Refreshes the backup cron schedule in case CRON_SCHEDULE changed
The --status command also detects if the watchdog binary is stale (source updated but binary not redeployed):
bash ~/pi2s3/install.sh --status

pi2s3 captures the full machine state but is large (~3–5 GB per image). For cheap, fast, granular data recovery (restore just the database, single-file recovery, cross-version migrations), run an app-layer backup alongside:
| | pi2s3 | App-layer backup |
|---|---|---|
| What's backed up | Entire disk | DB + uploads + config files |
| Compressed size | ~3–5 GB | ~500 MB |
| Restore scenario | Pi hardware failure, OS corruption | DB corruption, accidental delete |
| Restore process | Flash + boot | docker restore commands |
| Knowledge needed | None | Some |
| Cost (60 days) | ~$3/month | <$1/month |
Both are complementary. pi2s3 for disaster recovery; app-layer for day-to-day data safety.
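A minimal app-layer companion might look like the sketch below (the container name "mariadb", the paths, and the bucket are placeholders — adapt to your stack; DRY_RUN, on by default here, prints each command instead of running it):

```shell
#!/usr/bin/env bash
# Sketch of a nightly app-layer backup to run alongside pi2s3.
# Container name, paths, and bucket are illustrative assumptions.
set -u
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

STAMP=$(date +%Y%m%d)
# Logical dump: restorable into any MariaDB version, unlike a block image
# (credentials omitted for brevity)
run sh -c "docker exec mariadb mariadb-dump --all-databases | gzip > db-$STAMP.sql.gz"
# Uploads and config travel together
run tar czf "files-$STAMP.tar.gz" wp-content/uploads config.env
# Separate S3 prefix from the pi2s3 disk images
run aws s3 cp "db-$STAMP.sql.gz" "s3://your-bucket/app/"
run aws s3 cp "files-$STAMP.tar.gz" "s3://your-bucket/app/"
```

Run it from cron like the main backup; flip DRY_RUN=0 once the printed commands look right.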
If a restore fails or produces unexpected results, run the built-in diagnostic:
sudo bash ~/pi2s3/extras/diagnose-restore.sh

10 sections: power/voltage (throttle state, rail voltages, dmesg UV count), hardware (NVMe SMART, EEPROM boot order, watchdog), restore log completeness (UV events per interval, download speed), WiFi (SSID, signal, password encoding), corporate proxy/firewall, internet + AWS reachability (10-ping packet loss), active processes, boot config with PARTUUID cross-check, kernel messages, and S3 manifest JSON validation.
Saves a full report to /var/log/pi2s3-diagnose-TIMESTAMP.log. Attach that file when opening a GitHub issue.
Solid red with no green ACT LED means the Pi firmware can't read the SD card or cmdline.txt has a bad root=PARTUUID= value. Run this on your Mac with the SD card inserted:
bash ~/pi2s3/extras/recover-sd-boot.sh

Auto-detects the SD card at /Volumes/bootfs. Shows the current cmdline.txt, diagnoses the root= parameter, and restores from cmdline.txt.bak (saved automatically by post-restore-nvme-boot.sh before any edits). If no backup exists, prints step-by-step instructions for all recovery scenarios.
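The central check is simple enough to show — a sketch of the idea (assumed logic, not the script's actual source): pull the root=PARTUUID= value out of cmdline.txt so it can be compared against the partition that really exists.

```shell
#!/usr/bin/env bash
# Sketch: extract the PARTUUID that cmdline.txt tells the kernel to boot from.
cmdline_partuuid() {
  grep -o 'root=PARTUUID=[^ ]*' "$1" | head -1 | cut -d= -f3
}
# Usage on the Mac-mounted SD card:
#   cmdline_partuuid /Volumes/bootfs/cmdline.txt
# On Linux, cross-check against the real partition:
#   blkid -o value -s PARTUUID /dev/mmcblk0p2
```

If the two values differ, the firmware finds the kernel but the kernel never finds its root filesystem — exactly the solid-red, no-ACT symptom described above.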
Cannot detect boot device
The script couldn't identify which device the Pi boots from. Check:
findmnt -n -o SOURCE /
lsblk

Override manually by setting BOOT_DEV at the top of pi-image-backup.sh.
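Deriving the parent disk from the root partition is mostly string handling — one way to sketch it (illustrative; the script's actual detection may differ):

```shell
#!/usr/bin/env bash
# Sketch: map a root partition device to its parent disk.
parent_disk() {
  case "$1" in
    *[0-9]p[0-9]*) echo "${1%p*}" ;;      # /dev/nvme0n1p2 → /dev/nvme0n1
    *)             echo "${1%%[0-9]*}" ;; # /dev/sda2      → /dev/sda
  esac
}
# e.g.:  BOOT_DEV=$(parent_disk "$(findmnt -n -o SOURCE /)")
```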
partclone not found
Re-run install.sh or install manually:
sudo apt install partclone

Cannot reach s3://your-bucket/
Check credentials and IAM permissions:
aws s3 ls s3://your-bucket/
aws sts get-caller-identity
# Print the required policy with your bucket name:
bash ~/pi2s3/install.sh --iam-policy

aws CLI not found
Re-run install.sh or install manually:
# Pi (aarch64)
curl -sL https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip -o awscliv2.zip
unzip awscliv2.zip && sudo ./aws/install

Backup takes too long
Install pigz for parallel compression (4× faster on Pi 5):
sudo apt install pigz

Filesystem didn't expand after restore
sudo raspi-config --expand-rootfs
sudo reboot

SSH host key conflict after restore
ssh-keygen -R raspberrypi.local
ssh-keygen -R <ip-address>

bash ~/pi2s3/install.sh --status    # show cron, log tail, dependency versions, stale binary check
bash ~/pi2s3/install.sh --uninstall # remove all cron jobs and logrotate config
bash ~/pi2s3/install.sh --upgrade   # git pull + redeploy watchdog binary + refresh cron

MIT