Monitoring

Prometheus-compatible metrics stack with VictoriaMetrics, Grafana, Loki, and Fluent Bit.

This namespace deploys the full observability stack for the homelab cluster. It combines VictoriaMetrics for metrics storage, Grafana for dashboards, Loki for log aggregation, and Fluent Bit as the log collector DaemonSet. Self-hosting this stack avoids cloud observability costs while providing full access to all cluster telemetry.

Alternatives considered

Cloud Hosted

Tool	Open Source	Free Tier	Monthly Cost
Grafana Cloud	Yes	Limited	From $19/mo
Datadog	No	No	From $15/host
New Relic	No	Limited	Pay-as-you-go

Installation

Architecture

HelmReleases: victoria-stack (VictoriaMetrics operator, vmsingle, vmstack + Grafana), loki, fluent-bit
DaemonSet: Fluent Bit log collector on all nodes, mounts /var/log
Additional: a grafana-to-ntfy Deployment that proxies Grafana alerts to ntfy, and an OpenTelemetry Collector
Storage: Longhorn-encrypted PVCs for VictoriaMetrics and Loki; S3/MinIO for Loki chunk storage
Networking: HTTPRoutes for Grafana, VictoriaMetrics UI, and Loki
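
As a rough illustration of the Networking entry above, an HTTPRoute for Grafana might look like the sketch below. The Gateway name and namespace, the hostname, and the backend Service name and port are placeholders chosen for the example, not values taken from this repository.

```yaml
# Sketch only: every name marked "hypothetical" is a placeholder.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  parentRefs:
    - name: example-gateway              # hypothetical Gateway managed by Envoy Gateway
      namespace: example-gateway-ns      # hypothetical
  hostnames:
    - grafana.example.com                # hypothetical hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: example-grafana-service  # hypothetical Service name
          port: 80                       # placeholder port
```

The VictoriaMetrics UI and Loki routes would presumably follow the same shape with their own hostnames and backends.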

Security

Fluent Bit runs as root (runAsUser: 0) because it needs read access to node logs under /var/log
grafana-to-ntfy runs as runAsUser: 1000, runAsNonRoot: true, readOnlyRootFilesystem: true, capabilities dropped
All secrets SOPS-encrypted with age
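
The settings listed above map onto container-level fields roughly as follows. This is a sketch of two pod spec fragments, not a copy of the deployed manifests: dropping ALL capabilities, the read-only log mount, and the volume name are assumptions made for the example.

```yaml
# Fragment 1: grafana-to-ntfy container, hardened as described above.
containers:
  - name: grafana-to-ntfy
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL            # assumption: "capabilities dropped" means all of them
---
# Fragment 2: Fluent Bit container, running as root to read node logs.
containers:
  - name: fluent-bit
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: varlog       # hypothetical volume name
        mountPath: /var/log
        readOnly: true     # assumption: the collector only needs to read logs
volumes:
  - name: varlog
    hostPath:
      path: /var/log
```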

Updates

Managed by Renovate. grafana-to-ntfy and otelcol images are digest-pinned.

Data Management

PVCs: Longhorn-encrypted PVCs for VictoriaMetrics (vmsingle, vlsingle) and Loki data
S3: MinIO bucket holding Loki chunks for long-term log retention
Backups: No k8up schedule is present, so data durability relies on Longhorn replication alone.
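
For orientation, a claim against an encrypted Longhorn StorageClass could look like the sketch below. In this stack the charts most likely create the PVCs from their values, so treat this as an illustration of the resulting object only; the claim name, StorageClass name, and size are hypothetical.

```yaml
# Sketch only: names and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-loki-data                       # hypothetical
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: example-longhorn-encrypted  # hypothetical encrypted StorageClass
  resources:
    requests:
      storage: 10Gi                             # placeholder size
```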

User Management

Grafana OIDC is configured through GF_AUTH_GENERIC_OAUTH_* environment variables loaded from a SOPS-encrypted secret. Users authenticate against the cluster's OIDC provider.
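
A decrypted view of such a secret might look like the sketch below. The variable names follow Grafana's GF_<SECTION>_<KEY> convention for the auth.generic_oauth section; the secret name, client ID, display name, and all URLs are hypothetical, and the real secret may set more or fewer keys.

```yaml
# Sketch only: values are placeholders, and the real file is SOPS-encrypted at rest.
apiVersion: v1
kind: Secret
metadata:
  name: example-grafana-oidc   # hypothetical
  namespace: monitoring
type: Opaque
stringData:
  GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
  GF_AUTH_GENERIC_OAUTH_NAME: "SSO"                                   # label on the login button
  GF_AUTH_GENERIC_OAUTH_CLIENT_ID: "grafana"                          # hypothetical
  GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET: "REPLACE_ME"
  GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email"
  GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://idp.example.com/authorize" # hypothetical
  GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://idp.example.com/token"    # hypothetical
  GF_AUTH_GENERIC_OAUTH_API_URL: "https://idp.example.com/userinfo"   # hypothetical
```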

Configuration Management

Helm chart values in ConfigMaps for victoria-stack, Loki, and Fluent Bit
Grafana OIDC credentials, SMTP config, and ntfy auth from SOPS-encrypted secrets
ntfy-auth secret used by grafana-to-ntfy for push notification delivery
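
Assuming the HelmReleases are Flux resources, chart values kept in a ConfigMap are typically wired in through valuesFrom. The sketch below uses the Fluent Bit chart name and version listed under Kubernetes Metadata; the ConfigMap name, the HelmRepository namespace, and the interval are placeholders.

```yaml
# Sketch only: assumes Flux's HelmRelease API; placeholder names are marked.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: fluent-bit
  namespace: monitoring
spec:
  interval: 1h                          # placeholder reconcile interval
  chart:
    spec:
      chart: fluent-bit
      version: 0.56.0
      sourceRef:
        kind: HelmRepository
        name: fluent-bit
        namespace: example-flux-ns      # hypothetical
  valuesFrom:
    - kind: ConfigMap
      name: example-fluent-bit-values   # hypothetical ConfigMap holding the values
      valuesKey: values.yaml
```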

Administration

Usage

Access Grafana to view cluster dashboards, query metrics with PromQL/MetricsQL, and browse logs via Loki. Alerts configured in Grafana are forwarded to ntfy via the grafana-to-ntfy proxy service. Fluent Bit collects container logs from all nodes automatically.
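
One way to point Grafana alerts at the proxy is a provisioned webhook contact point, sketched below in Grafana's alerting provisioning format. The contact point name, receiver uid, and the Service DNS name are assumptions; port 8080 matches the container port in the rendered Deployment, and any authentication the proxy expects is omitted here.

```yaml
# Sketch only: Grafana alerting provisioning file with a single webhook receiver.
apiVersion: 1
contactPoints:
  - orgId: 1
    name: ntfy                # hypothetical contact point name
    receivers:
      - uid: ntfy-webhook     # hypothetical uid
        type: webhook
        settings:
          # Assumption: a Service named grafana-to-ntfy exposes port 8080.
          url: http://grafana-to-ntfy.monitoring.svc:8080
          httpMethod: POST
```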

Cluster-specific deviations from the above live in the per-cluster README — see k8s/apps/talos/monitoring/README.md.

Cluster Deployment

App URLs

Depends on

Envoy Gateway

Monitoring — Talos cluster

Cluster-specific notes only. General product info, "why we use it", and alternatives live in docusaurus/docs/apps/monitoring.mdx.

Deviations from defaults

Defaults live in docusaurus/docs/apps/monitoring.mdx — document anything this cluster does differently here, with a one-line reason.

Kubernetes Metadata

HelmRelease: fluent-bit@0.56.0
HelmRelease: loki@7.0.0
HelmRelease: victoria-metrics-k8s-stack@0.82.0
HelmRepo: fluent-bit (https://fluent.github.io/helm-charts)
HelmRepo: loki (https://grafana.github.io/helm-charts)
HelmRepo: victoria-metrics (https://victoriametrics.github.io/helm-charts/)
Image: kittyandrew/grafana-to-ntfy:latest@sha256:e1386f61db297b37ba4a6a056dfc370e9bee16c0cf394a24b581bdf0cd859a3e

Rendered manifests (kustomize build)

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana-to-ntfy
  name: grafana-to-ntfy
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-to-ntfy
  strategy:
    rollingUpdate: null
    type: Recreate
  template:
    metadata:
      labels:
        app: grafana-to-ntfy
    spec:
      containers:
        - envFrom:
            - secretRef:
                name: grafana-to-ntfy
          image: kittyandrew/grafana-to-ntfy:latest@sha256:e1386f61db297b37ba4a6a056dfc370e9bee16c0cf394a24b581bdf0cd859a3e
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: grafana-to-ntfy
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 5
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
      terminationGracePeriodSeconds: 60
