NetBird

NetBird is a managed WireGuard mesh that connects every environment in this homelab as if they shared a LAN. Three sites — edge, production, and home — register as peers; identity-based ACLs decide who can reach what.

The high-level wiring lives on the Fabric overview; this page is the deep-dive.

Why a mesh, not a hub-and-spoke VPN

Traditional site-to-site VPNs route everything through a central concentrator. That works, but:

  • The concentrator becomes a bottleneck and a single point of failure.
  • Every new site needs either an explicit tunnel to every other site (the tunnel count grows as O(n²)) or a star topology, which adds latency.
  • ACLs are usually IP-based; renumbering hurts.
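To put rough numbers on the tunnel-count point, here is a small sketch of how many hand-configured site-to-site tunnels each topology needs. It assumes the star's concentrator is itself one of the sites; the function names are illustrative only.

```python
def full_mesh_tunnels(sites: int) -> int:
    """Tunnels needed when every site pairs directly with every other site."""
    return sites * (sites - 1) // 2


def star_tunnels(sites: int) -> int:
    """Tunnels needed when every site connects only to one concentrator site.

    Assumption: the concentrator is one of the counted sites.
    """
    return max(sites - 1, 0)


for n in (3, 5, 10):
    print(f"{n} sites: full mesh = {full_mesh_tunnels(n)}, star = {star_tunnels(n)}")
# 3 sites: full mesh = 3, star = 2
# 5 sites: full mesh = 10, star = 4
# 10 sites: full mesh = 45, star = 9
```

With three sites the difference is small, but the full-mesh count grows quadratically, which is why maintaining it by hand does not scale.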

NetBird is identity-based and full-mesh by default:

  • Each peer authenticates against an OIDC provider (here: Keycloak, via auth.kueber.eu).
  • Peers establish direct WireGuard tunnels when they can; relay servers cover the path-blocked cases.
  • Policies reference groups (administrators, production, edge_sidecar_envoy, …), not IPs — so renumbering a subnet doesn't break access rules.

Tofu-managed state

State for users, groups, tokens, and the shared DNS zone lives in the dedicated tofu/environment/netbird environment. Each per-site environment consumes that state through data lookups only, so a per-site apply cannot accidentally change the shared identity layer.

tofu/environment/
├── netbird/ ← owns: groups, users, tokens, DNS zone, cross-cutting policies
├── edge/ ← consumes netbird state, registers edge peers + edge resources
├── production/ ← consumes netbird state, registers production peers + resources
└── home/ ← consumes netbird state, registers home peers + resources

Per-environment service users (tofu_env_{netbird,production,home,edge}) authenticate to the NetBird API with one-year tokens, scoped to only their environment.

Networks, peers, and routers

| Concept | What it is |
| --- | --- |
| Peer | A device or workload running the NetBird agent |
| Network | A logical bucket, typically one per site (edge, production, home) |
| Resource | A subnet exposed by a network (e.g. 192.168.100.0/24 in production) |
| Routing peer | A peer that advertises a resource subnet; traffic from other peers to that subnet is funneled through it |
| Group | A label assigned to peers; ACL policies reference groups, not peers |

For each site, one or more peers are designated as routing peers. They get a low metric so the mesh prefers them as the next hop into that site's subnets.
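The preference for a low metric can be pictured as "pick the reachable routing peer with the smallest metric". The sketch below is only a model of that idea, not NetBird's actual route selection, which may weigh more than the metric. The `RoutingPeer` type, the metric values, and the `connected` flags are hypothetical; the peer names are the production router names with the braces expanded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoutingPeer:
    name: str
    metric: int        # lower is preferred (hypothetical values below)
    connected: bool    # whether the peer is currently reachable


def pick_next_hop(peers: Sequence[RoutingPeer]) -> Optional[RoutingPeer]:
    """Return the connected routing peer with the lowest metric, or None."""
    candidates = [p for p in peers if p.connected]
    return min(candidates, key=lambda p: p.metric, default=None)


production_routers = [
    RoutingPeer("lxc-proxmox1-netbird", metric=10, connected=False),
    RoutingPeer("lxc-proxmox2-netbird", metric=10, connected=True),
    RoutingPeer("lxc-proxmox3-netbird", metric=20, connected=True),
]

chosen = pick_next_hop(production_routers)
print(chosen.name if chosen else "no route into site")
# lxc-proxmox2-netbird
```

Under this model, losing one router simply shifts traffic to the next-best connected peer; the site only becomes unreachable when no routing peer is connected.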

Where the routers run

| Site | Routing peer(s) | Notes |
| --- | --- | --- |
| edge | control-plane-1 (Talos node) | NetBird daemon installed alongside the cluster service |
| production | lxc-proxmox{1,2,3}-netbird LXCs on each Proxmox node | Separate LXCs to keep routing isolated from the cluster |
| home | One reusable peer keyed with routing-peers-home | Typically a long-lived box on the management VLAN |

The production routers are on the management VLAN (100), with extra NICs into VLAN 104 (storage) and 105 (public). They egress to the energy LAN (192.168.178.0/24) via their default route, so no extra interface is needed.

Cross-cutting groups and policies

| Group | Owner env | Purpose |
| --- | --- | --- |
| administrators | netbird | Full bidirectional access to every environment |
| sidecars | netbird | Workload sidecars that need overlay membership |

| Policy | Source | Destination | Notes |
| --- | --- | --- | --- |
| Admin Edge Policy | administrators | edge | tcp, bidirectional |
| Admin Production Policy | administrators | production | all, bidirectional |
| Admin Home Policy | administrators | home | all, bidirectional |
| Sidecar Envoy Access Prod Public | edge_sidecar_envoy | production resource public | tcp, bidirectional |

The last policy is what lets the edge Envoy gateway reach the public-VLAN workloads in the production cluster — this is the path that powers the edge → production traffic chain.
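As a way to reason about the policies listed above, the following sketch evaluates a connection against them by group label rather than by IP. It is a simplified model, not NetBird's policy engine: it assumes anything not matched by a policy is denied, and it treats the resource destination ("production resource public") as just another label. The `Policy` type and `is_allowed` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Policy:
    name: str
    source: str
    destination: str
    protocol: str            # "tcp" or "all"
    bidirectional: bool = True


POLICIES = (
    Policy("Admin Edge Policy", "administrators", "edge", "tcp"),
    Policy("Admin Production Policy", "administrators", "production", "all"),
    Policy("Admin Home Policy", "administrators", "home", "all"),
    Policy("Sidecar Envoy Access Prod Public",
           "edge_sidecar_envoy", "production resource public", "tcp"),
)


def is_allowed(src: Set[str], dst: Set[str], protocol: str,
               policies: Iterable[Policy] = POLICIES) -> bool:
    """True if any policy permits src -> dst over the given protocol."""
    for p in policies:
        if p.protocol not in ("all", protocol):
            continue
        forward = p.source in src and p.destination in dst
        reverse = p.bidirectional and p.source in dst and p.destination in src
        if forward or reverse:
            return True
    return False  # assumption: no matching policy means no access


print(is_allowed({"administrators"}, {"edge"}, "tcp"))       # True
print(is_allowed({"administrators"}, {"edge"}, "udp"))       # False
print(is_allowed({"edge_sidecar_envoy"}, {"home"}, "tcp"))   # False
```

Because the check only looks at labels, renumbering a subnet changes nothing here; only group membership and the policy list matter.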

DNS — sys.kueber.eu

The zone is owned by the netbird environment; entries are added per-env via the modules/netbird/dns_record module.

| Subdomain | Source env |
| --- | --- |
| *.production.sys.kueber.eu (proxmox1/2/3, truenas) | production |
| *.home.sys.kueber.eu (unifi-home, synology, home-assistant) | home |

Mesh peers can resolve those names directly; non-mesh clients can't (the zone isn't published to the public DNS).
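The naming scheme makes it possible to tell from a hostname alone which environment owns the record. A minimal sketch of that mapping follows; the `source_env` helper is hypothetical, and the example hostnames are composed from the wildcard patterns and host labels in the table above.

```python
from typing import Optional

ZONE = "sys.kueber.eu"
ENVS = ("production", "home")


def source_env(fqdn: str) -> Optional[str]:
    """Return the env whose wildcard subdomain covers fqdn, else None."""
    name = fqdn.rstrip(".").lower()   # tolerate a trailing root dot
    for env in ENVS:
        # A wildcard covers names below the subdomain, not the subdomain itself.
        if name.endswith(f".{env}.{ZONE}"):
            return env
    return None


print(source_env("proxmox1.production.sys.kueber.eu"))   # production
print(source_env("synology.home.sys.kueber.eu"))         # home
print(source_env("auth.kueber.eu"))                      # None
```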

Operational notes

  • Peer churn. When a routing peer fails, the mesh's reach into that site falls back to whichever routing peers remain. The production site runs three routers for that reason.
  • Token rotation. Service-user tokens are valid for one year. Rotation is handled on the tofu side.
  • Adding a new site. Create a new tofu/environment/<site> consuming the netbird env's data sources, declare the network/resources/groups/policies, and register one or more routing peers. The pattern is the same for every site.
  • Adding a workload to the mesh. Run the agent, register it with a setup key scoped to the right group, and add a policy that lets it reach what it needs. No IP plumbing.
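For the token-rotation note, a simple expiry check can flag service-user tokens that are close to their one-year limit. The sketch below is one possible approach, not part of the existing tooling: the expiry dates, the 30-day warning window, and the `needs_rotation` helper are all hypothetical, and only the service-user names come from this page.

```python
from datetime import date, timedelta


def needs_rotation(expires_on: date, today: date, warn_days: int = 30) -> bool:
    """True if the token expires within warn_days of today (or already has)."""
    return expires_on - today <= timedelta(days=warn_days)


# Hypothetical expiry dates, for illustration only.
token_expiry = {
    "tofu_env_edge": date(2025, 1, 15),
    "tofu_env_home": date(2025, 9, 1),
}

today = date(2025, 1, 1)
due = [name for name, exp in token_expiry.items() if needs_rotation(exp, today)]
print(due)
# ['tofu_env_edge']
```

A check like this could run on a schedule so that a token is re-issued through tofu before any per-site apply starts failing on an expired credential.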