aly badawy/homelab

Monitoring

The full kube-prometheus-stack: Prometheus scrapes metrics from the cluster components and evaluates alert rules, Grafana visualises the metrics, and Alertmanager routes the alerts that Prometheus fires. Grafana admin credentials come from Vault via ESO, so no secrets are hardcoded.

Status: Healthy · namespace: monitor · retention: 30d · chart: kube-prometheus-stack

The monitoring stack runs in the monitor namespace and is deployed as a single ArgoCD Application at sync-wave 2 (after ingress-nginx and cert-manager). Prometheus stores 30 days of metrics on local-path storage; Grafana persists dashboards to a separate 5 GiB volume. Alertmanager is enabled but alert routing is configured separately.
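
As a rough illustration of the deployment described above, an ArgoCD Application for this stack might look like the sketch below. Only the namespace (monitor), the chart (kube-prometheus-stack) and the sync-wave value come from this page. The Application name, the argocd namespace, the project, the chart version placeholder and the sync options are assumptions, and the wiring of helm-values.yaml into the Application is left out.

```yaml
# Sketch only: names and placeholders marked "assumed" are not from the real repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitor                          # assumed Application name
  namespace: argocd                      # assumed ArgoCD install namespace
  annotations:
    argocd.argoproj.io/sync-wave: "2"    # after ingress-nginx and cert-manager
spec:
  project: default                       # assumed project
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: "<chart-version>"    # placeholder; pin to the version in use
  destination:
    server: https://kubernetes.default.svc
    namespace: monitor
  syncPolicy:
    syncOptions:
      - CreateNamespace=true             # assumed; omit if the namespace is managed elsewhere
```

The sync-wave annotation on an Application only has an effect when a parent Application (an app-of-apps setup) syncs it, which this sketch assumes.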

01 Stack components

Component | URL | Notes
Grafana | grafana.in.alybadawy.com | Admin credentials from Vault secret/grafana-admin
Prometheus | prometheus.in.alybadawy.com | 30-day retention, 20 GiB PV on local-path
Alertmanager | alerts.in.alybadawy.com | Enabled; routing config TBD
node-exporter | DaemonSet (no UI) | CPU, memory, disk, network per node
kube-state-metrics | Deployment (no UI) | Kubernetes object state metrics (pods, deployments, etc.)

02 Key configuration

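Several kube-prometheus-stack defaults are disabled because they scrape control-plane components (etcd, scheduler, controller-manager) as separate endpoints, which a single-node k3s cluster does not expose: k3s embeds the scheduler and controller-manager in its own server process and, by default, uses SQLite rather than etcd on a single server. The Prometheus operator's admission webhooks are also disabled to simplify the install.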

k8s/components/monitor/helm-values.yaml (key sections) yaml
prometheusOperator:
  tls:
    enabled: false  # operator TLS is only needed for the admission webhooks, which are off
  admissionWebhooks:
    enabled: false  # simplified install; not needed for single-node

grafana:
  admin:
    existingSecret: grafana-admin  # ESO syncs this from Vault secret/grafana-admin
    userKey: admin-user
    passwordKey: admin-password
  persistence:
    enabled: true
    storageClassName: local-path  # not Longhorn — no backup needed for dashboards
    size: 5Gi

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: homelab
      env: prod
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path
          resources:
            requests:
              storage: 20Gi

# disabled — these components don't exist in single-node k3s
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false

Why local-path instead of Longhorn? Prometheus and Grafana store metrics and dashboard configs, which are useful but not critical. Using local-path keeps Longhorn's backup job list clean and reduces Longhorn I/O. The trade-off: if the node is destroyed, metrics history is lost, but dashboards can be reimported from JSON.

03 Grafana credentials

Grafana admin credentials are managed through Vault → ESO. The grafana-admin Kubernetes Secret is created by an ExternalSecret that pulls from secret/grafana-admin in Vault.
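
For orientation, an ExternalSecret along these lines would produce the grafana-admin Secret with the admin-user and admin-password keys that the Helm values expect. This is a minimal sketch, not the manifest from the repo: the store name vault-backend, the ClusterSecretStore kind, the refresh interval and the assumption that the store points at the secret KV mount are all hypothetical, and the apiVersion depends on the installed ESO release.

```yaml
# Sketch only: store name, kind, and refresh interval are assumptions.
apiVersion: external-secrets.io/v1beta1  # newer ESO releases also serve external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: monitor
spec:
  refreshInterval: 1h                    # assumed; how often ESO re-reads Vault
  secretStoreRef:
    kind: ClusterSecretStore             # assumed; could be a namespaced SecretStore
    name: vault-backend                  # hypothetical store name
  target:
    name: grafana-admin                  # Secret referenced by grafana.admin.existingSecret
  dataFrom:
    - extract:
        key: grafana-admin               # assumes the store's path is the `secret` KV mount
```

Using dataFrom with extract copies every property of the Vault entry into the Secret, so the admin-user and admin-password keys written by the vault kv put command map directly to the userKey and passwordKey settings.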

setting grafana credentials in Vault bash
$ vault kv put secret/grafana-admin \
    admin-user="admin" \
    admin-password="<strong-password>"

ArgoCD ignoreDifferences. The monitor Application has an ignoreDifferences block for the Grafana secret and checksum annotation. This prevents ArgoCD from trying to revert Grafana's auto-generated secret hash on every sync. Without it, the app would show a permanent diff.
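
The exact block is not shown on this page, so the following is only a sketch of what such an ignoreDifferences section could look like. The resource names are placeholders (they depend on the Helm release name), and the choice of ignoring the Secret's /data field and the Deployment's checksum/secret pod annotation is an assumption about which fields drift.

```yaml
# Sketch only: fragment of the Application spec; resource names are placeholders.
spec:
  ignoreDifferences:
    - group: ""
      kind: Secret
      name: "<grafana-secret-name>"        # placeholder; depends on the release name
      namespace: monitor
      jsonPointers:
        - /data                            # assumed: ignore the generated secret contents
    - group: apps
      kind: Deployment
      name: "<grafana-deployment-name>"    # placeholder; depends on the release name
      namespace: monitor
      jsonPointers:
        # "/" inside a key is escaped as "~1" in a JSON pointer (RFC 6901)
        - /spec/template/metadata/annotations/checksum~1secret
```
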
last updated 2026-06-08