Skip to content

Add offline backup for foremanctl#507

Open
sjha4 wants to merge 1 commit into
theforeman:masterfrom
sjha4:backup-offline-implementation
Open

Add offline backup for foremanctl#507
sjha4 wants to merge 1 commit into
theforeman:masterfrom
sjha4:backup-offline-implementation

Conversation

@sjha4

@sjha4 sjha4 commented May 13, 2026

Copy link
Copy Markdown
Contributor

Why are you introducing these changes? (Problem description, related links)

This implements offline backup with foremanctl.

What are the changes introduced in this pull request?

  • Offline backup

    • Adds foremanctl backup command for offline backups
    • Implements comprehensive backup of core deployment components:
      • All databases (foreman, candlepin, pulp, IOP databases)
      • Foremanctl state directory
      • Pulp content directory (optional via --skip-pulp-content)
      • Container image metadata (digests for restore compatibility)
    • Preflight checks to ensure backup safety:
      • Verifies no running Foreman/Pulp tasks (with --wait-for-tasks option)
      • Database integrity verification using PostgreSQL amcheck extension
    • Automatic service restoration on backup failure
    • Adds comprehensive user documentation at docs/user/backup.md
    • New roles: backup, check_database_index

How to test this pull request

I got a foremanctl box with normal deploy. On this box, clone foremanctl repo and checkout this branch.
cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl

Then try ./foremanctl --help

Also,

(.venv) [root@ip-10-0-167-40 foremanctl]# ./foremanctl backup  --help
usage: foremanctl backup [-h] [-v] [--skip-pulp-content][--wait-for-tasks] backup_dir

Create offline backup of Foreman databases and configuration

positional arguments:
  backup_dir            Directory where backup files will be stored

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --skip-pulp-content   Skip Pulp content directory backup
  --wait-for-tasks      Wait for running tasks to complete instead of failing immediately

You can run :
Command:

cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl
./foremanctl backup /var/tmp/foreman-backup-test --wait-for-tasks

Steps to reproduce:

  • On a foremanctl box, run foremanctl backup BACKUP_DIR

Checklist

  • Tests added/updated (if applicable)
  • Documentation updated (if applicable)

@sjha4 sjha4 marked this pull request as ready for review May 13, 2026 19:28
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/backup.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 2 times, most recently from 15ba9dd to c6de1eb Compare May 14, 2026 14:14

@ianballou ianballou left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an automated test run, take these with a grain of salt, but I wanted to post early in case they're real for extra time to test and fix.

Firstly, the DBs did get backed up, so awesome, but there were some hiccups that stopped the full process from running:

Bug #1: podman_network.yaml fails when no custom networks exist

File: src/playbooks/backup/tasks/podman_network.yaml
Severity: Blocker — prevents backup from completing
Description: The shell command podman network ls --format '{{.Name}}' | grep -v '^podman$' | while read net; do ... returns exit code 1 when there are no custom networks, because grep -v finds no matching lines.
Fix: Add || true after the grep, or use a different approach:

# Option A: tolerate empty result
  failed_when: networks_json.rc not in [0, 1]

# Option B: check first, skip if no custom networks

Bug #2: Wrong Foreman tasks API endpoint

File: src/playbooks/backup/tasks/preflight.yaml
Severity: High — preflight silently skips running task detection
Description: Uses https://{{ fqdn }}/api/v2/tasks?state=running which returns 404. The correct endpoint is https://{{ fqdn }}/foreman_tasks/api/tasks?state=running&search=state%3Drunning. Because failed_when: false is set, the error is silently ignored.
Impact: Backups will proceed even with running Foreman tasks, risking data inconsistency.

Bug #3: pg_isready and pg_dump not available on host

... I cut the output here, I'm not sure why these commands weren't on my box. It's not related to this PR I don't think.

Bug #4: Hardcoded parameters.yaml path in metadata task

File: src/playbooks/backup/tasks/metadata.yaml
Severity: Low — affects metadata accuracy only
Description: ansible.builtin.slurp reads from /var/lib/foremanctl/parameters.yaml but foremanctl's state directory is configurable via OBSAH_STATE. In dev/vagrant setups, the actual path is different (e.g., /vagrant/.var/lib/foremanctl/parameters.yaml).
Impact: enabled_features: [] in metadata despite features being configured. Doesn't affect DB dump correctness.
Fix: Use the state_dir variable instead of hardcoding the path.

@sjha4

sjha4 commented May 14, 2026

Copy link
Copy Markdown
Contributor Author

Will update the tasks endpoint and parameters.yaml path..

About the podman networks, those are created for IOP.. Like https://github.com/theforeman/foremanctl/blob/master/src/roles/iop_network/tasks/main.yaml so I'd assume we have that present in production deployments. We can add some handling for when it's not.

@aidenfine

Copy link
Copy Markdown
Contributor

Maybe nitpick but does it make sense to include the not yet implemented flags when doing backup --help? Would you be opposed to just leaving them out in this PR?

@sjha4

sjha4 commented May 14, 2026

Copy link
Copy Markdown
Contributor Author

Maybe nitpick but does it make sense to include the not yet implemented flags when doing backup --help? Would you be opposed to just leaving them out in this PR?

I am fine either way but it's helpful guidance for future PRs and documentation to look at.

Comment thread src/playbooks/backup/tasks/podman_network.yaml Outdated
Comment thread src/playbooks/backup/tasks/podman_volumes.yaml Outdated
Comment thread src/roles/backup/tasks/preflight.yaml Outdated
Comment thread src/playbooks/backup/tasks/quadlet_files.yaml Outdated
Comment thread src/playbooks/backup/tasks/foremanctl_state.yaml Outdated
Comment thread src/playbooks/backup/tasks/quadlet_files.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 4 times, most recently from 4da7174 to 10c3e00 Compare May 15, 2026 17:44
Comment thread src/playbooks/backup/tasks/database_detection.yaml Outdated
Comment thread src/playbooks/backup/tasks/database_dumps.yaml Outdated
Comment thread src/playbooks/backup/tasks/foremanctl_state.yaml Outdated
Comment thread src/playbooks/backup/tasks/foremanctl_state.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/podman_secrets.yaml Outdated
Comment thread src/playbooks/backup/tasks/podman_volumes.yaml Outdated
@evgeni

evgeni commented May 19, 2026

Copy link
Copy Markdown
Member

not a full review (I stopped somewhere around the secrets backup), but overall this feels a lot like "let's write a huge bash script and then wrap it in YAML" and not like Ansible :(

@sjha4 sjha4 force-pushed the backup-offline-implementation branch from b066ac9 to 16293d2 Compare June 10, 2026 21:05
Comment thread src/roles/backup/tasks/main.yaml
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 4 times, most recently from a67e4d0 to 273fa50 Compare June 11, 2026 20:31
@sjha4

sjha4 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@ehelms @evgeni @ianballou Does this 🍞 look baked enough? We have some follow up PRs in this area we are starting to focus on based on this PR if we think this looks good.

Comment thread src/playbooks/backup/backup.yaml Outdated
Comment thread src/roles/backup/tasks/main.yaml Outdated
Comment thread src/roles/backup/tasks/main.yaml Outdated
Comment thread src/roles/backup/tasks/main.yaml Outdated
Comment thread src/roles/backup/tasks/main.yaml Outdated
Comment thread src/roles/backup/tasks/main.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 2 times, most recently from e82328e to 40a2bd1 Compare June 16, 2026 14:15
Comment thread src/roles/backup/tasks/main.yaml Outdated
Comment thread src/roles/check_database_index/tasks/main.yml Outdated
Comment thread src/roles/backup/tasks/pulp_content.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 4 times, most recently from 0e59f8e to 78230ae Compare June 16, 2026 15:42

@ianballou ianballou left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments, overall this is working well. Overview of my testing:

- All three databases dumped and valid
- State directory archived (passwords, OAuth keys, parameters.yaml, and certs when `OBSAH_STATE=/var/lib/foremanctl`)
- Pulp content + encryption keys captured
- `--skip-pulp-content` works
- Preflight checks (tasks, amcheck) work
- Services always come back up after backup or failure
- No dead code
- Restore works (with some bugs caused by the restore code, not backup)

The comments are minor and non-blocking from my standpoint, so I'm going to ack this. If we discover anything else during restore (which I don't expect since I tested restore), I think we can address it there.

- "../../vars/flavors/{{ flavor }}.yml"
- "../../vars/{{ certificates_source }}_certificates.yml"
- "../../vars/foreman.yml"
- "../../vars/database.yml"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of database.yml causes password files to be created even for databases that don't exist. In my case, it was for the IoP databases on my vanilla Katello installation.

I'm not sure this should be fixed by backup/restore. Without being an Ansible expert, I wonder if backup/restore should not be responsible for filtering out databases that are outside of the scope of features used in the installation. Otherwise we'll have having feature filtering logic spread all over.

This is not a critical issue I don't think. Side effect is that my machine ended up with:

-rw------- 1 root root 21 Jun 16 14:46 iop-advisor-db-password
-rw------- 1 root root 21 Jun 16 14:46 iop-inventory-db-password
-rw------- 1 root root 21 Jun 16 14:46 iop-remediation-db-password
-rw------- 1 root root 21 Jun 16 14:46 iop-vmaas-db-password
-rw------- 1 root root 21 Jun 16 14:46 iop-vulnerability-db-password

even though I had no IoP DBs. @ehelms do you think this is worth a broader issue?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - please file a github issue.

Comment thread src/roles/backup/tasks/main.yaml Outdated
Comment thread src/roles/check_database_index/tasks/main.yml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 2 times, most recently from 0ee6604 to 45db441 Compare June 16, 2026 20:15
Comment thread src/roles/backup/tasks/metadata.yaml Outdated
hostname: "{{ ansible_fqdn }}"
os_version: "{{ ansible_distribution }} {{ ansible_distribution_version }}"
foremanctl_version: "{{ ansible_facts.packages['foremanctl'][0].version | default('unknown') if 'foremanctl' in ansible_facts.packages else 'unknown' }}"
online: false

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be type: offline

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to type: offline ✅

Comment thread src/roles/backup/tasks/metadata.yaml Outdated
incremental: false
timestamp: "{{ backup_timestamp }}"
databases: "{{ backup_databases_to_backup }}"
iop_enabled: "{{ 'iop' in enabled_features }}"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iop will be in the features list and is not needed here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. 👍🏼

@ehelms ehelms left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor nitpicks. I am ok with adding tests as a follow up PR.

Implements comprehensive offline backup functionality for Foreman deployments:
- Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs)
- Backs up podman secrets, networks, volumes, quadlet files
- Backs up systemd units and foremanctl state
- Includes metadata with container image digests for restore compatibility
- Preflight checks for running tasks and database integrity (amcheck)
- Automatic service restoration on failure

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sjha4 sjha4 force-pushed the backup-offline-implementation branch from 45db441 to f027813 Compare June 17, 2026 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants