aly badawy/homelab

HashiCorp Vault

The single source of truth for all application secrets. Vault stores credentials as key-value pairs in KV v2; the External Secrets Operator (ESO) syncs them into app namespaces as Kubernetes Secrets. Post-reboot recovery is fully automatic.

Unsealed · namespace: security · KV v2 · auto-unseal CronJob

Vault runs as a StatefulSet in the security namespace, backed by a Longhorn PVC. It uses the file storage backend (single-node, no HA needed). TLS is terminated by ingress-nginx, so Vault's listener runs plain HTTP inside the cluster. A CronJob unseals it automatically after every reboot, within about a minute of Vault's storage becoming available again.

01 Architecture

The secrets flow is one-directional and automatic:

  1. Secrets are written to Vault KV v2 under secret/ once (manually, or restored from Longhorn backup).
  2. ESO reads from Vault via the ClusterSecretStore/k8s-secrets using Kubernetes auth.
  3. Each app has one or more ExternalSecret resources that declare what keys to fetch and where to put them.
  4. ESO creates/updates native Kubernetes Secret objects in the app namespace. Apps consume these normally — they have no awareness of Vault.
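
As an illustration of step 3, here is a minimal sketch of what one ExternalSecret could look like, rendered by a small shell helper. The store name k8s-secrets and the postgres-secret entry come from this page; the helper name, the external-secrets.io/v1beta1 API version, the 1h refresh interval, and the databases namespace are assumptions and may not match the real manifests.

```shell
#!/bin/sh
# Print an ExternalSecret manifest that copies every key of one Vault KV v2
# entry into a Kubernetes Secret of the same name.
# Assumed for illustration: API version v1beta1 and a 1h refresh interval.
render_external_secret() {
  name="$1"       # Vault entry under secret/ and name of the resulting Secret
  namespace="$2"  # app namespace that receives the Secret
  cat <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: k8s-secrets
  target:
    name: ${name}
  dataFrom:
    - extract:
        key: ${name}
EOF
}

# Usage (needs cluster access; "databases" is a hypothetical namespace):
#   render_external_secret postgres-secret databases | kubectl apply -f -
```

Using dataFrom with extract pulls all keys of the Vault entry at once, which fits the flat key-value layout described here; listing keys individually under data is the alternative when only some keys are wanted.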

Rebuild requires one secret. On a fresh node, restoring the Longhorn PVC restores Vault's data (all secrets intact). The only thing that must be seeded manually is the vault-unseal-key Kubernetes Secret; everything else recovers automatically.
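
Seeding that one Secret might look like the sketch below. The Secret name and namespace come from this page; the helper name and the data key name `key` are assumptions, so use whatever key the CronJob actually reads.

```shell
#!/bin/sh
# Create or update the vault-unseal-key Secret in the security namespace.
# Assumption: the CronJob reads the unseal key from a data key called "key".
seed_unseal_key() {
  # Abort with a message if the caller did not provide the key.
  : "${UNSEAL_KEY:?set UNSEAL_KEY to the Vault unseal key first}"
  # Render the Secret client-side, then apply it so reruns are idempotent.
  kubectl create secret generic vault-unseal-key \
    --namespace security \
    --from-literal=key="$UNSEAL_KEY" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage:
#   UNSEAL_KEY='<unseal-key>' seed_unseal_key
```

Piping a client-side dry run into kubectl apply avoids the "already exists" error that a plain kubectl create would give on a second run.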

Key facts

Property         Value
Namespace        security
Storage backend  file: /vault/data on Longhorn PVC vault-data-lh
Secrets engine   KV v2 at secret/
Auth method      Kubernetes auth: ESO ServiceAccount eso-vault-auth
Listener         Plain HTTP on [::]:8200; TLS handled by ingress-nginx
UI               vault.in.alybadawy.com (LAN and VPN only)
AppArmor         Unconfined (required): Go 1.14+ sends SIGURG to itself for goroutine preemption, and the default containerd profile denies it
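
One detail behind the "KV v2 at secret/" and "Kubernetes auth" rows: a KV v2 mount serves values under secret/data/ and version metadata under secret/metadata/, so a read-only policy for ESO has to name those prefixes. The sketch below renders such a policy. The helper name, the policy name eso-read, the role name eso, and the external-secrets namespace are all hypothetical; only the ServiceAccount name eso-vault-auth comes from this page.

```shell
#!/bin/sh
# Print a read-only Vault policy for a KV v2 mount (default mount: secret).
render_eso_policy() {
  mount="${1:-secret}"
  cat <<EOF
# Secret values live under <mount>/data/ in KV v2
path "${mount}/data/*" {
  capabilities = ["read"]
}
# Version metadata lives under <mount>/metadata/ in KV v2
path "${mount}/metadata/*" {
  capabilities = ["read", "list"]
}
EOF
}

# Usage (names other than eso-vault-auth are placeholders):
#   render_eso_policy secret | vault policy write eso-read -
#   vault write auth/kubernetes/role/eso \
#     bound_service_account_names=eso-vault-auth \
#     bound_service_account_namespaces=external-secrets \
#     policies=eso-read ttl=1h
```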

02 Auto-unseal CronJob

Vault starts sealed after every reboot. The vault-auto-unseal CronJob runs every minute and unseals it as soon as Longhorn reattaches the PVC (typically 4–5 minutes after boot).

Why timeout 5 vault status? Plain vault status hangs at the HTTP layer while Longhorn is still reattaching storage — the Vault pod is running but not serving requests. The timeout 5 kills the hung call so the CronJob retries next minute instead of blocking.

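The logic is simple: first check TCP connectivity with nc, then check seal status. Exit code 2 means sealed → unseal. Exit code 0 means already unsealed → do nothing. Any other code, including the one timeout returns when it kills a hung call, means Vault is not ready yet, so the job exits and retries on its next run.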

k8s/components/vault/auto-unseal-cronjob.yaml (key section) bash
# runs every minute in the security namespace
# 1. check TCP port is reachable
if ! nc -z -w 5 "$VAULT_HOST" "$VAULT_PORT" 2>/dev/null; then
  echo "Vault port not reachable — will retry next minute"
  exit 1
fi
# 2. check seal status (timeout guards against hung HTTP)
timeout 5 vault status 2>/dev/null; STATUS=$?
if [ "$STATUS" -eq 2 ]; then
  echo "Vault is sealed — unsealing"
  vault operator unseal "$UNSEAL_KEY"
elif [ "$STATUS" -eq 0 ]; then
  echo "Vault is unsealed — nothing to do"
else
  echo "Vault not ready yet (status $STATUS) — will retry next minute"
  exit 1
fi
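
The branching above can be restated as a small lookup, which makes each exit code's meaning explicit. This is only a sketch: the function name is made up, and the 124 case assumes GNU coreutils timeout, which exits 124 when it kills the command (other timeout implementations may differ).

```shell
#!/bin/sh
# Map the exit code of `timeout 5 vault status` to the action the job takes.
classify_vault_status() {
  case "$1" in
    0)   echo "noop"   ;;  # vault status: unsealed
    2)   echo "unseal" ;;  # vault status: sealed
    124) echo "retry"  ;;  # GNU timeout killed a hung call
    *)   echo "retry"  ;;  # 1 = error talking to Vault, or anything unexpected
  esac
}

# Usage:
#   timeout 5 vault status >/dev/null 2>&1; action="$(classify_vault_status $?)"
```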

03 Post-reboot recovery sequence

After a reboot, recovery is fully automatic — no human intervention required. The full sequence takes ~6 minutes:

  1. Vault pod starts sealed. ESO enters exponential backoff (ClusterSecretStore degraded).
  2. vault-auto-unseal retries every minute. Longhorn reattaches the PVC at ~4–5 min.
  3. Vault unseals. vault-0 passes its readiness probe.
  4. eso-recovery CronJob detects Vault ready + store degraded → restarts ESO deployments to clear backoff.
  5. ESO reconnects to Vault. All ExternalSecret resources sync. Apps recover.

Nothing to do after a reboot. Walk away; the cluster heals itself. The eso-recovery CronJob exists specifically because ESO's exponential backoff would otherwise keep the store degraded even after Vault is back. Restarting ESO clears the backoff immediately.
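
The eso-recovery check in step 4 could be sketched roughly as follows. This is not the actual CronJob script: the function names and the external-secrets namespace are assumptions, and it simply compares the Ready condition of vault-0 with the Ready condition of the k8s-secrets store.

```shell
#!/bin/sh
# Succeed only when Vault is ready but the store is still not ready.
# $1: Ready condition of pod vault-0 ("True" or "False")
# $2: Ready condition of ClusterSecretStore k8s-secrets
should_restart_eso() {
  [ "$1" = "True" ] && [ "$2" != "True" ]
}

# Read both conditions from the cluster and restart ESO if needed.
eso_recovery() {
  vault_ready="$(kubectl get pod vault-0 -n security \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  store_ready="$(kubectl get clustersecretstore k8s-secrets \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  if should_restart_eso "$vault_ready" "$store_ready"; then
    # Restarts every Deployment in the (assumed) ESO namespace.
    kubectl rollout restart deployment -n "${ESO_NAMESPACE:-external-secrets}"
  fi
}
```

Keeping the decision in its own function means it can be checked without a cluster, and a run where Vault is still sealed is a no-op rather than a pointless restart.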

04 Managing secrets

Secrets are stored as flat key-value pairs in Vault KV v2. To read, write, or rotate a secret:

vault CLI (port-forward or via vault.in.alybadawy.com) bash
# port-forward for CLI access
$ kubectl port-forward -n security svc/vault 8200:8200
$ export VAULT_ADDR=http://localhost:8200

# list all secrets
$ vault kv list secret/
postgres-secret  authentik-db  authentik-secret  grafana-admin  ...

# write a secret (all keys for a path in one command)
$ vault kv put secret/postgres-secret \
    POSTGRES_USER="homelab" \
    POSTGRES_PASSWORD="<strong-password>" \
    POSTGRES_DB="homelab"

# rotate one key without touching the others
$ vault kv patch secret/postgres-secret POSTGRES_PASSWORD="<new-password>"

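# KV v2 versions every write automatically: inspect history, read an old version, roll back if needed
$ vault kv metadata get secret/postgres-secret
$ vault kv get -version=1 secret/postgres-secret
$ vault kv rollback -version=1 secret/postgres-secret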
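
For rotation, a small wrapper can generate the new value and patch a single key in one step. This is a sketch under assumptions: both helper names are invented here, and hex output from /dev/urandom is just one simple way to get a random value that is safe to pass on a command line.

```shell
#!/bin/sh
# Print 2*N random hex characters (default N=24 bytes, so 48 characters).
gen_password() {
  od -An -N"${1:-24}" -tx1 /dev/urandom | tr -d ' \n'
}

# Replace one key of an entry under secret/ and leave the other keys alone.
rotate_key() {
  path="$1"  # entry name, e.g. postgres-secret
  key="$2"   # key inside the entry, e.g. POSTGRES_PASSWORD
  vault kv patch "secret/${path}" "${key}=$(gen_password 24)"
}

# Usage (with VAULT_ADDR set and a valid token):
#   rotate_key postgres-secret POSTGRES_PASSWORD
```

ESO picks up the new value on its next refresh of the matching ExternalSecret; the application itself may still need a restart to read the updated Kubernetes Secret.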

05 Access

The Vault UI is available at vault.in.alybadawy.com on the LAN or over VPN. Before ingress is live (e.g. right after a fresh install), use a port-forward:

access before ingress is live bash
$ kubectl port-forward -n security svc/vault 8200:8200
# open http://localhost:8200

last updated 2026-06-08