01 Backup layers
| Layer | What's protected | Destination | Recovery |
|---|---|---|---|
| Longhorn volume backups | All stateful Kubernetes PVCs (Vault, PostgreSQL, Nextcloud, Authentik, Immich) | NAS NFS: /var/nfs/shared/backups/pvcs | Restore from Longhorn UI during cluster rebuild |
| PostgreSQL SQL dumps | All databases in the shared PostgreSQL instance | NAS NFS: /mnt/nas/backups | Restore from .sql dump file with psql |
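| NAS storage protection | All NAS data including backups themselves | Remote NAS (off-site) | Replace failed drive (RAID 5 rebuild) or restore from the remote NAS backup |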
02 Longhorn volume backups
Longhorn's recurring job system automatically snapshots and backs up every PVC that opts in. The default schedule runs against any PVC labeled recurring-job-group.longhorn.io/default: enabled.
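As a minimal sketch of how a PVC opts in to the default group, the manifest below uses the vault-data-lh claim from this page. The storage size and storageClassName are assumptions for illustration, not values taken from this setup. Longhorn's documentation describes these keys as PVC labels, and recent Longhorn versions also expect the recurring-job.longhorn.io/source label before PVC labels are synced to the volume; check this against the Longhorn version in use.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vault-data-lh
  namespace: security
  labels:
    # Assumption: needed on recent Longhorn versions so that PVC labels
    # are propagated to the underlying Longhorn volume.
    recurring-job.longhorn.io/source: enabled
    # Opts this PVC in to every recurring job in the "default" group.
    recurring-job-group.longhorn.io/default: enabled
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumption: default Longhorn StorageClass name
  resources:
    requests:
      storage: 1Gi             # hypothetical size
```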
| Namespace | PVC name | Data stored |
|---|---|---|
| security | vault-data-lh | Vault KV secrets (all app credentials) |
| db | postgres-data-lh | All PostgreSQL databases |
| cloud | nextcloud-data-lh | Nextcloud application data |
| security | authentik-media-lh | Authentik uploaded media |
| security | authentik-templates-lh | Authentik custom templates |
| immich | immich-pvc (see note) | Immich app data (not the photo library) |
Note: the Immich photo library is stored at /mnt/nas/immich, a direct NFS mount from the NAS. That data is protected by the NAS storage layer, not Longhorn. A cluster rebuild does not affect the photo library: it lives on the NAS and is remounted on first deploy.

The backup target and polling interval are set in the Longhorn Helm values:
defaultSettings:
  backupTarget: nfs://172.20.20.2:/var/nfs/shared/backups/pvcs
  backupTargetCredentialSecret: ""   # no auth needed (NAS on trusted Servers VLAN)
  backupPollInterval: 300            # seconds between polls
03 Backup schedule
All layers run on a coordinated schedule. Longhorn snapshots run first, backups to NAS run 30 minutes later, and NAS snapshots run 29 minutes after that — giving Longhorn time to finish before the NAS preserves its own state.
| Layer | Event | Times (daily) |
|---|---|---|
| Longhorn | Volume snapshot | 12:00 AM · 6:00 AM · 12:00 PM · 6:00 PM |
| Longhorn → NAS | Backup to NFS target | 12:30 AM · 6:30 AM · 12:30 PM · 6:30 PM |
| NAS (UNAS 4) | Filesystem snapshot (RAID 5) | 12:59 AM · 6:59 AM · 12:59 PM · 6:59 PM |
| NAS → Remote NAS | Off-site backup | 3:00 AM (daily) |
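The two Longhorn rows in the table above could be expressed as RecurringJob resources along the following lines. This is a sketch, not the manifests used here: the job names, retain counts, and concurrency values are hypothetical, and it assumes the cron expressions are evaluated in the same timezone as the times listed in the table.

```yaml
# Snapshot at 00:00, 06:00, 12:00, 18:00
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-6h            # hypothetical name
  namespace: longhorn-system
spec:
  task: snapshot
  cron: "0 */6 * * *"
  groups:
    - default                  # matches recurring-job-group.longhorn.io/default
  retain: 4                    # assumption: keep one day of snapshots
  concurrency: 2               # assumption
---
# Backup to the NFS target 30 minutes after each snapshot
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-6h              # hypothetical name
  namespace: longhorn-system
spec:
  task: backup
  cron: "30 */6 * * *"
  groups:
    - default
  retain: 8                    # assumption: keep two days of backups
  concurrency: 2               # assumption
```

The NAS snapshot at :59 and the 3:00 AM off-site backup are configured on the NAS itself, so they have no Kubernetes manifest.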
04 PostgreSQL SQL dumps
A pg-dump CronJob in the db namespace runs periodic SQL dumps of all databases and writes them to /mnt/nas/backups. This provides a second, SQL-level backup independent of Longhorn, useful for restoring specific tables or schemas as of the last dump without restoring an entire volume.
# list recent dump jobs
$ kubectl get jobs -n db --sort-by=.metadata.creationTimestamp
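# check the last dump succeeded (Jobs created by a CronJob get a generated
# name suffix, so read logs from the newest Job instead of job-name=pg-dump)
$ kubectl logs -n db "$(kubectl get jobs -n db --sort-by=.metadata.creationTimestamp -o name | tail -n 1)" --tail=20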
# list dump files on the NAS
$ ls -lh /mnt/nas/backups/*.sql 2>/dev/null
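For orientation, a CronJob of this kind might look roughly like the sketch below. Only the name pg-dump, the db namespace, and the /mnt/nas/backups path come from this page. The schedule, image tag, service hostname, Secret name, and the nas-backups claim are all hypothetical placeholders, and pg_dumpall is assumed as the dump tool because the job covers every database.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-dump
  namespace: db
spec:
  schedule: "15 */6 * * *"            # assumption: the real schedule is not stated
  concurrencyPolicy: Forbid           # never run two dumps at once
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16      # assumption: must be >= the server's major version
              command: ["/bin/sh", "-c"]
              args:
                - |
                  set -e
                  # pg_dumpall reads PGHOST, PGUSER and PGPASSWORD from the environment
                  pg_dumpall > "/mnt/nas/backups/all-$(date +%Y%m%d-%H%M).sql"
              env:
                - name: PGHOST
                  value: postgres.db.svc.cluster.local   # hypothetical service name
                - name: PGUSER
                  value: postgres                        # hypothetical superuser
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials         # hypothetical Secret
                      key: password
              volumeMounts:
                - name: nas-backups
                  mountPath: /mnt/nas/backups
          volumes:
            - name: nas-backups
              persistentVolumeClaim:
                claimName: nas-backups                   # hypothetical NFS-backed claim
```

Timestamped filenames keep older dumps available for comparison; pruning them is left out of this sketch.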
05 NAS storage protection
The NAS itself needs protection: if it fails, the local copies of all Longhorn backups and SQL dumps are lost. The UNAS 4 uses RAID 5, so a single drive failure results in no data loss. Snapshots and off-site replication provide further recovery options.
| Property | Value |
|---|---|
| RAID / redundancy | RAID 5 — single drive failure tolerated. NAS self-rebuilds when a replacement drive is inserted. |
| Snapshot schedule | Every 6 hours at 00:59, 06:59, 12:59, 18:59, timed 29 min after Longhorn backups start. |
| Off-site backup | Backed up to a remote NAS every 24 hours at 3:00 AM. |
06 Recovery objectives
| Scenario | RPO (max data loss) | RTO (recovery time) | Method |
|---|---|---|---|
| k3s node failure | Since last Longhorn backup (≤6 hours) | ~30 min | Full rebuild (see the rebuild guide) |
| Single app failure | 0 (redeploy from Git) | <5 min | ArgoCD re-sync or pod restart |
| Database corruption | Since last Longhorn backup or pg-dump (≤6 hours) | ~15 min | Restore from Longhorn backup (preferred); fall back to .sql dump if backup is not recent enough |
| NAS drive failure | 0 — RAID 5, no data loss on single drive failure | RAID rebuild time (varies by drive size) | Replace failed drive; NAS automatically rebuilds RAID array |
| NAS server failure | Since last off-site backup (≤24 hours) | ~1 hour | Restore from remote NAS backup to replacement hardware |
| Vault data loss | Since last Longhorn backup (≤6 hours) | ~15 min | Vault data lives in a Longhorn PVC — restore the PVC, then unseal with the offline unseal key |
| Vault unseal key lost | N/A | Significantly longer — manual | Without the unseal key, Vault cannot be unsealed. Recovery requires manual re-initialization and re-seeding all secrets. Store the unseal key offline and never commit it to Git. |