Monitoring — homelab.badawy

The monitoring stack runs in the monitor namespace and is deployed as a single ArgoCD Application at sync-wave 2, after ingress-nginx and cert-manager. Prometheus keeps 30 days of metrics on a 20 GiB local-path volume; Grafana persists dashboards to a separate 5 GiB volume. Alertmanager is enabled, but its alert routing has not been configured yet.
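For orientation, here is a minimal sketch of what such an Application could look like. From this page it takes only the `monitor` name and namespace, the sync-wave value, the kube-prometheus-stack chart and the values-file path. Everything else is a placeholder or an assumption: the Argo CD namespace, the project, the Git repo URL, the chart version, and the multi-source layout (chart from the Helm repo, values from Git).

```yaml
# Sketch only: repoURL, targetRevision and project are placeholders, not this repo's values.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitor
  namespace: argocd                      # assumed Argo CD namespace
  annotations:
    argocd.argoproj.io/sync-wave: "2"    # lower waves (ingress-nginx, cert-manager) sync first
spec:
  project: default                       # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: monitor
  sources:
    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "<chart-version>"  # placeholder: pin an exact chart version
      helm:
        valueFiles:
          - $values/k8s/components/monitor/helm-values.yaml
    - repoURL: https://github.com/<owner>/<repo>.git   # placeholder
      targetRevision: main
      ref: values                        # exposes this repo as $values above
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true             # the chart's CRDs are too large for client-side apply
```

A sync-wave annotation on an Application only orders anything when the Applications are themselves synced by a parent (app-of-apps). In Argo CD 1.8 and later, that parent does not wait for a child Application to become healthy before the next wave unless a custom health check for `argoproj.io/Application` is added to `argocd-cm`.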

01 Stack components

Component	URL	Notes
Grafana	grafana.in.alybadawy.com	Admin credentials from Vault `secret/grafana-admin`
Prometheus	prometheus.in.alybadawy.com	30-day retention, 20 GiB PV on `local-path`
Alertmanager	alerts.in.alybadawy.com	Enabled — routing config TBD
node-exporter	DaemonSet (no UI)	CPU, memory, disk, network per node
kube-state-metrics	Deployment (no UI)	Kubernetes object state metrics (pods, deployments, etc.)
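The excerpt below doesn't show how these hostnames are exposed. A plausible sketch for Grafana uses the chart's standard `grafana.ingress` values, the `nginx` ingress class and a cert-manager `ClusterIssuer`. The issuer and TLS secret names are hypothetical.

```yaml
# Hypothetical ingress values for Grafana; not part of the helm-values.yaml excerpt on this page.
grafana:
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: "<cluster-issuer>"   # placeholder issuer name
    hosts:
      - grafana.in.alybadawy.com
    tls:
      - secretName: grafana-tls                            # hypothetical secret name
        hosts:
          - grafana.in.alybadawy.com
```

In kube-prometheus-stack, `prometheus.ingress` and `alertmanager.ingress` accept the same `ingressClassName`, `hosts` and `tls` keys for the other two hostnames.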

02 Key configuration

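Several kube-prometheus-stack defaults are disabled because they assume the control plane runs as separate pods with scrapeable metrics endpoints. In k3s, the scheduler and controller-manager run inside the k3s server process, and a single-node install uses SQLite (via kine) instead of etcd by default. The chart's default `kubeEtcd`, `kubeScheduler` and `kubeControllerManager` targets would therefore have nothing to scrape. The Prometheus operator's admission webhooks, and the operator TLS endpoint that serves them, are also disabled to simplify the install.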

k8s/components/monitor/helm-values.yaml (key sections) yaml

prometheusOperator:
  tls:
    enabled: false   # operator's own HTTPS endpoint; it mainly serves the admission webhooks below
  admissionWebhooks:
    enabled: false   # skips the webhook cert Jobs; PrometheusRule objects are no longer validated on apply

grafana:
  admin:
    existingSecret: grafana-admin  # ESO syncs this from Vault secret/grafana-admin
    userKey: admin-user
    passwordKey: admin-password
  persistence:
    enabled: true
    storageClassName: local-path  # not Longhorn — no backup needed for dashboards
    size: 5Gi

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: homelab
      env: prod
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path
          resources:
            requests:
              storage: 20Gi

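# disabled: k3s runs scheduler/controller-manager inside its server process and
# a single-node install uses SQLite (not etcd), so these default targets have nothing to scrape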
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false

Why local-path instead of Longhorn? Prometheus metrics and Grafana dashboards are useful but expendable. Keeping them on local-path keeps them out of Longhorn's backup job list and takes Prometheus's steady write load off Longhorn. The trade-off: if the node is destroyed, metrics history is lost for good, while dashboards can be reimported from their JSON exports.
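The JSON reimport can be made automatic by keeping each exported dashboard in Git as a labelled ConfigMap. This sketch assumes the chart's default Grafana dashboard sidecar, which kube-prometheus-stack enables and which watches for ConfigMaps labelled `grafana_dashboard: "1"`. A rebuilt Grafana then loads the dashboards without any PV restore. The ConfigMap name and dashboard JSON below are hypothetical.

```yaml
# Hypothetical dashboard ConfigMap picked up by the Grafana dashboard sidecar.
apiVersion: v1
kind: ConfigMap
metadata:
  name: homelab-overview-dashboard   # hypothetical name
  namespace: monitor                 # same namespace as Grafana
  labels:
    grafana_dashboard: "1"           # assumed default sidecar label and value
data:
  homelab-overview.json: |
    {
      "title": "Homelab overview",
      "uid": "homelab-overview",
      "panels": []
    }
```

In practice the `data` entry would hold the full JSON exported from Grafana rather than this empty placeholder.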

03 Grafana credentials

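Grafana admin credentials are managed through Vault and the External Secrets Operator (ESO). An ExternalSecret reads `secret/grafana-admin` from Vault and creates the `grafana-admin` Kubernetes Secret, which the Helm values reference through `grafana.admin.existingSecret`.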
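A minimal ExternalSecret sketch for this. It assumes a Vault-backed `ClusterSecretStore` named `vault-backend` (hypothetical) whose KV v2 mount path is `secret`, so the remote key is just `grafana-admin`. The target name and keys must match `existingSecret`, `userKey` and `passwordKey` in the Helm values.

```yaml
# Sketch: the store name "vault-backend" is hypothetical.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: monitor
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend            # hypothetical store name
  target:
    name: grafana-admin            # must match grafana.admin.existingSecret
    creationPolicy: Owner
  data:
    - secretKey: admin-user        # must match grafana.admin.userKey
      remoteRef:
        key: grafana-admin         # assumes the store's KV v2 path is "secret"
        property: admin-user
    - secretKey: admin-password    # must match grafana.admin.passwordKey
      remoteRef:
        key: grafana-admin
        property: admin-password
```

Older ESO releases serve this resource as `external-secrets.io/v1beta1`, with the same fields used here.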

setting grafana credentials in Vault bash

$ vault kv put secret/grafana-admin \
    admin-user="admin" \
    admin-password="<strong-password>"

ArgoCD ignoreDifferences. The monitor Application has an ignoreDifferences block for the Grafana secret and checksum annotation. This prevents ArgoCD from trying to revert Grafana's auto-generated secret hash on every sync. Without it, the app would show a permanent diff.
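A sketch of such a block on the `monitor` Application. The resource names assume the Helm release name defaults to the Application name (`monitor`), which makes the Grafana objects `monitor-grafana`; adjust them to the real names. In the JSON Pointer, `~1` is the escape for the `/` in `checksum/secret`.

```yaml
# Fragment of the monitor Application spec; resource names are assumed, not confirmed.
spec:
  ignoreDifferences:
    - group: ""                    # core API group
      kind: Secret
      name: monitor-grafana        # assumed chart-rendered Grafana secret
      namespace: monitor
      jsonPointers:
        - /data
    - group: apps
      kind: Deployment
      name: monitor-grafana        # assumed Grafana Deployment name
      namespace: monitor
      jsonPointers:
        - /spec/template/metadata/annotations/checksum~1secret
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true   # optional: also leave these fields alone during sync
```

By default `ignoreDifferences` only affects diffing. The `RespectIgnoreDifferences=true` sync option also stops Argo CD from overwriting those fields when it syncs.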

last updated 2026-06-08