Getting Started

pg_hardstorage is a PostgreSQL backup tool built around continuous WAL streaming over the replication protocol. In production you run two processes side by side: wal stream continuously receives WAL from PG and commits every completed 16 MiB segment into the repo, and backup lays down a base backup on a schedule (e.g. nightly) that the stream rolls forward from. Daily backup + always-on stream = PITR to any point up to the end of the last committed segment.

It works against managed PG (RDS, Cloud SQL, Azure DB) the same as bare metal, deduplicates and encrypts content-addressed chunks, and restores with PITR. PG 15+, Apache 2.0.

This page gets you from zero to a running streamer, a first base backup, and a restored data dir in five minutes. After that, see the operator guide.


1. Install

Pre-built binary

Releases ship as static linux/{amd64,arm64} and darwin/arm64 tarballs (Windows is CLI-only). Grab the matching one from github.com/cybertec-postgresql/pg_hardstorage/releases, verify the cosign signature, and drop the binary on your $PATH:

curl -LO https://github.com/cybertec-postgresql/pg_hardstorage/releases/download/v0.1.1/pg_hardstorage_linux_amd64.tar.gz
tar xzf pg_hardstorage_linux_amd64.tar.gz
sudo install -m 0755 pg_hardstorage /usr/local/bin/
pg_hardstorage version

.deb (Debian / Ubuntu)

sudo dpkg -i pg-hardstorage_0.1.1_amd64.deb

The package installs the binary at /usr/bin/pg_hardstorage, drops a systemd unit at /lib/systemd/system/pg_hardstorage.service, and creates /etc/pg_hardstorage/, /var/lib/pg_hardstorage/, /var/log/pg_hardstorage/ with mode 0750 owned by pg-hardstorage.

.rpm (Fedora / RHEL / Rocky / Alma)

sudo rpm -i pg-hardstorage-0.1.1-1.x86_64.rpm

Same layout as the .deb.

Container image

docker pull ghcr.io/cybertec-postgresql/pg_hardstorage:v0.1.1
docker run --rm ghcr.io/cybertec-postgresql/pg_hardstorage:v0.1.1 version

The image is distroless. Mount a config dir at /etc/pg_hardstorage and a state dir at /var/lib/pg_hardstorage; both must be writable by UID 65532.
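A minimal sketch of that mount layout. The host directories (hs-config and hs-state under the current directory) and the hs_docker helper name are assumptions for illustration; only the image tag, the two container paths, and UID 65532 come from the text above.

```shell
# Hypothetical host directories; adjust to your layout. They must be absolute
# paths, otherwise docker treats them as named volumes.
HS_CONFIG_DIR=${HS_CONFIG_DIR:-$PWD/hs-config}
HS_STATE_DIR=${HS_STATE_DIR:-$PWD/hs-state}
HS_IMAGE=ghcr.io/cybertec-postgresql/pg_hardstorage:v0.1.1

# Run any pg_hardstorage subcommand in the container with both dirs mounted.
hs_docker() {
    docker run --rm \
        -v "$HS_CONFIG_DIR:/etc/pg_hardstorage" \
        -v "$HS_STATE_DIR:/var/lib/pg_hardstorage" \
        "$HS_IMAGE" "$@"
}

# One-time setup (needs root): make both dirs writable by the image's UID.
#   mkdir -p "$HS_CONFIG_DIR" "$HS_STATE_DIR"
#   sudo chown -R 65532 "$HS_CONFIG_DIR" "$HS_STATE_DIR"
#
# Then, for example:
#   hs_docker version
```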

From source

git clone https://github.com/cybertec-postgresql/pg_hardstorage
cd pg_hardstorage
make                       # produces bin/pg_hardstorage
sudo install -m 0755 bin/pg_hardstorage /usr/local/bin/

Requires Go 1.26+. make test runs the full unit suite under the race detector; make test-integration exercises a real PostgreSQL 17 container via testcontainers-go (needs Docker).


2. Five-minute quickstart

2.1 Provision a replication user on PostgreSQL

CREATE ROLE pgbackup REPLICATION LOGIN PASSWORD '<strong-password>';

Add a pg_hba.conf line that allows the agent host to replicate as that role:

host  replication  pgbackup  10.0.0.5/32  scram-sha-256

Reload PG (SELECT pg_reload_conf()).
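Before moving on, it can help to confirm the role really has both attributes. The sketch below assumes psql is installed and that you pass a connection string for any role that can read pg_roles; the function name is made up for this example.

```shell
# Prints "t" when the pgbackup role exists with both LOGIN and REPLICATION,
# "f" when it lacks one of them, and nothing when the role does not exist.
# $1 is a libpq connection string (hypothetical; use your admin connection).
role_can_replicate() {
    psql -d "$1" -At -c \
        "SELECT rolcanlogin AND rolreplication FROM pg_roles WHERE rolname = 'pgbackup'"
}

# Example:
#   role_can_replicate 'postgres://postgres@db1.example.com/postgres'
```

This only checks the role attributes; it does not exercise the pg_hba.conf rule, which is only tested when the agent host actually opens a replication connection.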

2.2 Create a repository

pg_hardstorage repo init file:///srv/backups

The repo is a directory (or S3 bucket) that holds chunks, manifests, and WAL. One repo can hold many deployments. repo init never overwrites an existing repo: re-running it against the same URL returns conflict.repo_exists (exit 7) rather than reinitializing it.

S3 works the same way:

pg_hardstorage repo init 's3://acme-backups/?region=eu-central-1'

Other backends use the same shape — pick the URL scheme that matches your storage:

| Backend | Example URL |
| --- | --- |
| Local filesystem | file:///srv/backups |
| AWS S3 / MinIO / R2 / B2 | s3://acme-backups/?region=eu-central-1 |
| Google Cloud Storage | gcs://acme-backups/ |
| Azure Blob | azblob://account.blob.core.windows.net/container/ |
| Remote host via SSH (SFTP) | sftp://backup@nas.example.com/srv/backups |
| Remote host via SSH (ssh-exec) | scp://backup@nas.example.com/srv/backups |

sftp:// and scp:// both ride SSH; pick sftp:// by default and scp:// when the remote disables the SFTP subsystem. See Add an SFTP repository and Add an SCP repository for the auth / known_hosts / extras-map setup.
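Because repo init returns conflict.repo_exists (exit 7) when the repo already exists, a provisioning script that may run more than once probably wants to treat that code as success. A minimal sketch, assuming exit 7 is used only for that condition; ensure_repo is a hypothetical helper name.

```shell
# Create the repo if it is missing; treat "already exists" (exit 7) as success.
# Any other non-zero exit is passed through to the caller.
ensure_repo() {
    local url="$1" rc=0
    pg_hardstorage repo init "$url" || rc=$?
    case $rc in
        0) echo "repo created: $url" ;;
        7) echo "repo already exists: $url" ;;
        *) echo "repo init failed (exit $rc): $url" >&2
           return "$rc" ;;
    esac
}

# Example:
#   ensure_repo file:///srv/backups
```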

2.3 Validate PG is ready to stream (one-shot preflight)

wal stream runs an automatic preflight on every start, but you can run it standalone first to confirm the source PostgreSQL satisfies the replication requirements before you wire systemd:

pg_hardstorage wal preflight db1 \
    --pg-connection 'postgres://pgbackup@db1.example.com/postgres'

Fatal findings (wal_level.too_low, max_replication_slots.full, max_wal_senders.saturated, role.no_replication) make the command exit non-zero with a suggestion: block on each finding. Warnings (max_slot_wal_keep_size.set, idle_replication_slot_timeout.set on PG 17+) surface but don't block.

2.4 Start the WAL streamer (this is the always-on core)

With preflight clean, start the WAL streamer. It is the headline feature of pg_hardstorage and the process you keep running 24/7:

pg_hardstorage wal stream db1 \
    --pg-connection 'postgres://pgbackup@db1.example.com/postgres' \
    --repo file:///srv/backups

The agent issues CREATE_REPLICATION_SLOT pg_hardstorage_db1 PHYSICAL RESERVE_WAL if the slot is absent — RESERVE_WAL pins the slot's restart_lsn immediately at create time, so PG retains WAL from that moment on. Then it issues START_REPLICATION SLOT pg_hardstorage_db1 PHYSICAL against the slot. The stream is gap-free across agent restarts. Supervise it with systemd (the package ships pg_hardstorage@<deployment>.service for exactly this) or your container scheduler.

--skip-preflight is the explicit override if you've already audited PG; --no-slot is the explicit escape hatch for archive-only deployments that guarantee WAL retention through another mechanism (both emit loud warnings — using either is deliberate).

Leave it running. The remaining steps run in a second terminal or under a separate scheduler.
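To confirm from the PostgreSQL side that the slot exists and is being consumed, you can query the pg_replication_slots view. This is a sketch that assumes the slot is named pg_hardstorage_<deployment>, as in the db1 example above; slot_status is a hypothetical helper name.

```shell
# Show name, type, active flag and restart_lsn for a deployment's slot.
# $1 is a libpq connection string, $2 the deployment name (e.g. db1).
# The deployment name is interpolated into SQL, so pass trusted values only.
slot_status() {
    psql -d "$1" -At -F ' ' -c \
        "SELECT slot_name, slot_type, active, restart_lsn
           FROM pg_replication_slots
          WHERE slot_name = 'pg_hardstorage_${2}'"
}

# Example:
#   slot_status 'postgres://pgbackup@db1.example.com/postgres' db1
```

An empty result means the slot has not been created; active = f while the streamer is supposed to be running is worth investigating, since the slot keeps retaining WAL either way.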

2.5 Take the first base backup

With the streamer running concurrently, take a base backup. The two processes share the repo URL but do not coordinate beyond that — backup streams a BASE_BACKUP over its own replication connection while wal stream keeps shipping WAL.

The wizard probes PG, generates a signing keypair and a KEK, writes pg_hardstorage.yaml, and (by default) takes the first backup:

pg_hardstorage init \
    --pg-connection 'postgres://pgbackup@db1.example.com/postgres' \
    --repo file:///srv/backups \
    --deployment db1 \
    --yes

To take a backup later without going through the wizard (this is the command your scheduler runs nightly):

pg_hardstorage backup db1 \
    --pg-connection 'postgres://pgbackup@db1.example.com/postgres' \
    --repo file:///srv/backups

In production: schedule backup (cron / systemd timer / k8s CronJob), supervise wal stream (systemd / k8s Deployment). The base backup is the periodic anchor; the streamer is what makes PITR byte-precise between anchors.
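One way to wire the scheduled half is a small wrapper that cron or a systemd timer calls. A minimal sketch: the function name, script path, and log path are hypothetical, and the only pg_hardstorage invocation is the backup command shown above.

```shell
# Run one base backup, log start and exit status with UTC timestamps,
# and hand the exit status back to the scheduler.
nightly_backup() {
    local deployment="$1" pg_conn="$2" repo="$3" rc=0
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) backup start: $deployment"
    pg_hardstorage backup "$deployment" \
        --pg-connection "$pg_conn" \
        --repo "$repo" || rc=$?
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) backup exit=$rc: $deployment"
    return "$rc"
}

# In a script (hypothetical path /usr/local/bin/nightly-backup.sh), end with:
#   nightly_backup db1 'postgres://pgbackup@db1.example.com/postgres' file:///srv/backups
#
# crontab entry, nightly at 04:00 server time:
#   0 4 * * * /usr/local/bin/nightly-backup.sh >> /var/log/pg_hardstorage/backup.log 2>&1
```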

2.6 Restore

pg_hardstorage restore db1 latest \
    --target /var/lib/postgresql/restored \
    --repo file:///srv/backups

PITR via natural-language time:

pg_hardstorage restore db1 latest \
    --target /var/lib/postgresql/restored \
    --repo file:///srv/backups \
    --to "5 minutes ago"

Or to a specific LSN: --to-lsn 0/3000028. Or to a named restore point: --to-name pre_release.

The restore writes a managed recovery.signal and a managed block in postgresql.auto.conf whose restore_command invokes pg_hardstorage wal fetch <deployment> %f %p --repo .... Start PG; recovery proceeds.
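The %f that PostgreSQL passes to restore_command is a WAL segment file name. To see which segment a given --to-lsn target falls in, the standard PostgreSQL naming rule can be sketched as below. It assumes the default 16 MiB segment size mentioned earlier; the timeline is a parameter because it depends on your cluster's history (1 is only a default for illustration).

```shell
# Map an LSN such as 0/3000028 to the name of the WAL segment containing it.
# Usage: lsn_to_walfile <lsn> [timeline]
lsn_to_walfile() {
    local lsn="$1" timeline="${2:-1}"
    local hi="${lsn%%/*}" lo="${lsn##*/}"        # hex halves around the slash
    local seg_size=$((16 * 1024 * 1024))         # 16 MiB per segment
    local pos=$(( (0x$hi << 32) + 0x$lo ))       # absolute byte position
    local segno=$(( pos / seg_size ))
    # 4 GiB / 16 MiB = 256 segments per high-word value
    printf '%08X%08X%08X\n' "$timeline" $((segno / 256)) $((segno % 256))
}

lsn_to_walfile 0/3000028    # prints 000000010000000000000003
```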

A pg_verifybackup check runs against the data dir before the restore declares success. Skip it with --verify=skip only if you know what you are doing — exit 9 means the verifier said no.
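In an automated restore drill it is useful to tell a verifier rejection apart from other failures. A minimal sketch that branches on exit 9 as described above; the helper name and messages are hypothetical, and the meaning of any other non-zero code is not assumed.

```shell
# Restore the latest backup and report the outcome by exit status.
# Usage: restore_latest <deployment> <target-dir> <repo-url>
restore_latest() {
    local deployment="$1" target="$2" repo="$3" rc=0
    pg_hardstorage restore "$deployment" latest \
        --target "$target" \
        --repo "$repo" || rc=$?
    case $rc in
        0) echo "restore verified: $target" ;;
        9) echo "exit 9: the verifier rejected the restored data dir $target" >&2 ;;
        *) echo "restore failed (exit $rc)" >&2 ;;
    esac
    return "$rc"
}

# Example:
#   restore_latest db1 /var/lib/postgresql/restored file:///srv/backups
```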


3. Verifying the install

$ pg_hardstorage version
pg_hardstorage v0.1.1 (abc1234, built 2026-04-29T12:00:00Z)

doctor is the single-command "is anything wrong" check:

$ pg_hardstorage doctor
db1 — PG 17.2 — primary @ db1.example.com
  ✓ PostgreSQL reachable
  ✓ Replication slot 'pg_hardstorage_db1' active, lag 12s
  ✓ Last backup 47m ago
  ✓ Repository file:///srv/backups writable
  ✓ KMS keyring present (~/.config/pg_hardstorage/keyring)
  ✓ Schedule: next at 04:00 UTC
Summary: 1 healthy.

Any failing line carries a Suggested fix: block underneath. Run pg_hardstorage doctor -o json for a machine-readable form.

doctor exits 0 when healthy. With --exit-on-issues it exits 10 when there are findings; wire that into your alerting if you want a hard fail signal.
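A sketch of that wiring for a cron job or monitoring probe. It separates "doctor found issues" (exit 10) from "doctor itself could not run" (any other non-zero code, whose meaning is not assumed here); health_check is a hypothetical helper name.

```shell
# Return 0 when healthy; otherwise print a one-line reason on stderr and
# pass doctor's exit status back so the scheduler can alert on it.
health_check() {
    local rc=0
    pg_hardstorage doctor --exit-on-issues || rc=$?
    case $rc in
        0)  return 0 ;;
        10) echo "pg_hardstorage doctor reported findings" >&2 ;;
        *)  echo "pg_hardstorage doctor did not complete (exit $rc)" >&2 ;;
    esac
    return "$rc"
}

# Example: run it from cron and let a non-zero exit trigger your alert hook.
#   health_check || echo "backup health check failed" >&2
```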


4. What's next

  • Operator guide — daily operations, retention, verification, encryption, sinks, troubleshooting pointers.
  • Architecture — how the data plane is built, why it talks the replication protocol, what the on-disk layout looks like.
  • Runbooks — copy-paste procedures for the seven scenarios that wake an on-call DBA at 3am.
  • API reference — REST surface for the control plane.