Monitoring
Prometheus-compatible metrics stack with VictoriaMetrics, Grafana, Loki, and Fluent Bit.
This namespace deploys the full observability stack for the homelab cluster. It combines VictoriaMetrics for metrics storage, Grafana for dashboards, Loki for log aggregation, and Fluent Bit as the log collector DaemonSet. Self-hosting this stack avoids cloud observability costs while providing full access to all cluster telemetry.
Alternatives considered
Cloud Hosted
| Tool | Open Source | Free Tier | Monthly Cost |
|---|---|---|---|
| Grafana Cloud | Yes | Limited | From $19/mo |
| Datadog | No | No | From $15/host |
| New Relic | No | Limited | Pay-as-you-go |
Installation
Architecture
- HelmReleases: victoria-stack (VictoriaMetrics operator, vmsingle, vmstack + Grafana), loki, fluent-bit
- DaemonSet: Fluent Bit log collector on all nodes, mounts /var/log
- Additional: grafana-to-ntfy Deployment proxies Grafana alerts to ntfy; OpenTelemetry Collector
- Storage: Longhorn-encrypted PVCs for VictoriaMetrics and Loki; S3/MinIO for Loki chunk storage
- Networking: HTTPRoutes for Grafana, VictoriaMetrics UI, and Loki
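As an illustration of the networking item, a Gateway API HTTPRoute for Grafana might look roughly like the sketch below. The Gateway name and namespace, the hostname, and the backend Service name and port are all hypothetical placeholders, not values taken from this repository.

```yaml
# Sketch only: every name, hostname, and port below is a hypothetical placeholder.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  parentRefs:
    - name: internal-gateway        # hypothetical Gateway
      namespace: gateway-system     # hypothetical namespace
  hostnames:
    - grafana.example.internal      # hypothetical hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: grafana             # hypothetical Service name
          port: 80                  # hypothetical Service port
```

The VictoriaMetrics UI and Loki routes would presumably follow the same shape with their own hostnames and backend Services.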
Security
- Fluent Bit runs as runAsUser: 0 (requires node log access)
- grafana-to-ntfy runs as runAsUser: 1000, runAsNonRoot: true, readOnlyRootFilesystem: true, capabilities dropped
- All secrets SOPS-encrypted with age
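The grafana-to-ntfy hardening described above could be expressed with a pod template fragment along these lines. This is a sketch, not the deployed manifest: whether the user settings sit at pod or container level, the exact capability list, and the allowPrivilegeEscalation setting are assumptions.

```yaml
# Deployment fragment for grafana-to-ntfy (sketch; field placement is an assumption).
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsNonRoot: true
      containers:
        - name: grafana-to-ntfy
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false   # assumption, not stated in this document
            capabilities:
              drop: ["ALL"]                   # assumption: "capabilities dropped" read as drop ALL
```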
Updates
Managed by Renovate. grafana-to-ntfy and otelcol images are digest-pinned.
Data Management
- PVCs: Longhorn-encrypted PVCs for VictoriaMetrics (vmsingle, vlsingle) and Loki data
- S3: MinIO-backed Loki chunk storage for long-term log retention
- Backups: No k8up schedule is present. Data durability relies on Longhorn replication.
User Management
Grafana OIDC is configured through GF_AUTH_GENERIC_OAUTH_* environment variables read from a SOPS-encrypted secret. Users authenticate through the cluster's OIDC provider.
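For orientation, the decrypted form of such a secret might resemble the sketch below. The variable names follow Grafana's GF_AUTH_GENERIC_OAUTH_* convention for the generic OAuth section; the Secret name, the client ID, and all URLs are hypothetical placeholders, and the real file would be stored SOPS-encrypted.

```yaml
# Sketch of the decrypted Secret; names, client ID, and URLs are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: grafana-oidc                # hypothetical Secret name
  namespace: monitoring
type: Opaque
stringData:
  GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
  GF_AUTH_GENERIC_OAUTH_NAME: "SSO"
  GF_AUTH_GENERIC_OAUTH_CLIENT_ID: "grafana"
  GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET: "<client-secret>"
  GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email"
  GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://idp.example.internal/authorize"
  GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://idp.example.internal/token"
  GF_AUTH_GENERIC_OAUTH_API_URL: "https://idp.example.internal/userinfo"
```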
Configuration Management
- Helm chart values in ConfigMaps for victoria-stack, Loki, and Fluent Bit
- Grafana OIDC credentials, SMTP config, and ntfy auth from SOPS-encrypted secrets
- ntfy-auth secret used by grafana-to-ntfy for push notification delivery
Administration
Usage
Access Grafana to view cluster dashboards, query metrics with PromQL/MetricsQL, and browse logs via Loki. Alerts configured in Grafana are forwarded to ntfy via the grafana-to-ntfy proxy service. Fluent Bit collects container logs from all nodes automatically.
Cluster-specific deviations from the above live in the per-cluster README — see k8s/apps/talos/monitoring/README.md.
Cluster Deployment
Monitoring — Talos cluster
Cluster-specific notes only. General product info, "why we use it", and alternatives live in docusaurus/docs/apps/monitoring.mdx.
Deviations from defaults
Defaults live in docusaurus/docs/apps/monitoring.mdx — document anything this cluster does differently here, with a one-line reason.
- HelmRelease: fluent-bit@0.56.0
- HelmRelease: loki@7.0.0
- HelmRelease: victoria-metrics-k8s-stack@0.82.0
- HelmRepo: fluent-bit (https://fluent.github.io/helm-charts)
- HelmRepo: loki (https://grafana.github.io/helm-charts)
- HelmRepo: victoria-metrics (https://victoriametrics.github.io/helm-charts/)
- Image: kittyandrew/grafana-to-ntfy:latest@sha256:e1386f61db297b37ba4a6a056dfc370e9bee16c0cf394a24b581bdf0cd859a3e
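Assuming the HelmRelease and HelmRepo entries above are Flux resources, the fluent-bit pair might be declared roughly as follows. The chart version and repository URL come from the list above; the intervals, the HelmRepository namespace, and the values ConfigMap name are hypothetical.

```yaml
# Sketch assuming Flux; intervals and the ConfigMap name are hypothetical.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: fluent-bit
  namespace: monitoring             # assumption: repo lives next to the release
spec:
  interval: 1h                      # hypothetical
  url: https://fluent.github.io/helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: fluent-bit
  namespace: monitoring
spec:
  interval: 1h                      # hypothetical
  chart:
    spec:
      chart: fluent-bit
      version: "0.56.0"
      sourceRef:
        kind: HelmRepository
        name: fluent-bit
  valuesFrom:
    - kind: ConfigMap
      name: fluent-bit-values       # hypothetical ConfigMap holding chart values
      valuesKey: values.yaml
```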
Rendered manifests (kustomize build)
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana-to-ntfy
  name: grafana-to-ntfy
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-to-ntfy
  strategy:
    rollingUpdate: null
    type: Recreate
  template:
    metadata:
      labels:
        app: grafana-to-ntfy
    spec:
      containers:
      - envFrom:
        - secretRef:
            name: grafana-to-ntfy
        image: kittyandrew/grafana-to-ntfy:latest@sha256:e1386f61db297b37ba4a6a056dfc370e9bee16c0cf394a24b581bdf0cd859a3e
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: grafana-to-ntfy
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
      terminationGracePeriodSeconds: 60