NetBird
NetBird is a managed WireGuard mesh that connects every environment in this homelab as if they shared a LAN. Three sites — edge, production, and home — register as peers; identity-based ACLs decide who can reach what.
The high-level wiring lives on the Fabric overview; this page is the deep-dive.
Why a mesh, not a hub-and-spoke VPN
Traditional site-to-site VPNs route everything through a central concentrator. That works, but:
- The concentrator becomes a bottleneck and a single point of failure.
- Every new site needs explicit tunnels to every other site (n² problem) or a star, which adds latency.
- ACLs are usually IP-based; renumbering hurts.
NetBird is identity-based and full-mesh by default:
- Each peer authenticates against an OIDC provider (here: Keycloak, via auth.kueber.eu).
- Peers establish direct WireGuard tunnels when they can; relay servers cover the path-blocked cases.
- Policies reference groups (administrators, production, edge_sidecar_envoy, …), not IPs, so renumbering a subnet doesn't break access rules.
Tofu-managed state
State for users, groups, tokens, and the shared DNS zone lives in the dedicated tofu/environment/netbird environment. Each per-site environment then consumes that state via data lookups, so you can never accidentally drift the shared identity layer from a per-site apply.
tofu/environment/
├── netbird/ ← owns: groups, users, tokens, DNS zone, cross-cutting policies
├── edge/ ← consumes netbird state, registers edge peers + edge resources
├── production/ ← consumes netbird state, registers production peers + resources
└── home/ ← consumes netbird state, registers home peers + resources
Per-environment service users (tofu_env_{netbird,production,home,edge}) authenticate to the NetBird API with one-year tokens, scoped to only their environment.
Networks, peers, and routers
| Concept | What it is |
|---|---|
| Peer | A device or workload running the NetBird agent |
| Network | A logical bucket — typically one per site (edge, production, home) |
| Resource | A subnet exposed by a network (e.g. 192.168.100.0/24 in production) |
| Routing peer | A peer that advertises a resource subnet — traffic from other peers to that subnet is funneled through it |
| Group | A label assigned to peers; ACL policies reference groups, not peers |
For each site, one or more peers are designated as routing peers. They get a low metric so the mesh prefers them as the next hop into that site's subnets.
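The metric-based preference can be pictured with a minimal sketch. This is a simplified model, not NetBird's actual route selection: the `RoutingPeer` class, the `pick_next_hop` helper, and the metric values are all assumptions made for illustration. Only the peer names follow the production naming pattern on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingPeer:
    name: str
    metric: int       # lower wins; the values used below are made up
    connected: bool   # whether the peer is currently reachable on the mesh


def pick_next_hop(routers: list[RoutingPeer]) -> Optional[RoutingPeer]:
    """Return the connected routing peer with the lowest metric, or None."""
    candidates = [r for r in routers if r.connected]
    return min(candidates, key=lambda r: r.metric, default=None)


production = [
    RoutingPeer("lxc-proxmox1-netbird", metric=100, connected=False),
    RoutingPeer("lxc-proxmox2-netbird", metric=200, connected=True),
    RoutingPeer("lxc-proxmox3-netbird", metric=300, connected=True),
]

hop = pick_next_hop(production)
print(hop.name if hop else "site unreachable")
```

In this model a site with a single routing peer becomes unreachable as soon as that peer drops, which is the argument for running more than one.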
Where the routers run
| Site | Routing peer(s) | Notes |
|---|---|---|
| edge | control-plane-1 (Talos node) | NetBird daemon installed alongside the cluster service |
| production | lxc-proxmox{1,2,3}-netbird LXCs on each Proxmox node | Separate LXCs to keep routing isolated from the cluster |
| home | One reusable peer keyed with routing-peers-home | Typically a long-lived box on the management VLAN |
The production routers are on the management VLAN (100), with extra NICs into VLAN 104 (storage) and 105 (public). They egress to the energy LAN (192.168.178.0/24) via their default route, so no extra interface is needed.
Cross-cutting groups and policies
| Group | Owner env | Purpose |
|---|---|---|
| administrators | netbird | Full bidirectional access to every environment |
| sidecars | netbird | Workload sidecars that need overlay membership |
| Policy | Source | Destination | Notes |
|---|---|---|---|
| Admin Edge Policy | administrators | edge | tcp, bidirectional |
| Admin Production Policy | administrators | production | all, bidirectional |
| Admin Home Policy | administrators | home | all, bidirectional |
| Sidecar Envoy Access Prod Public | edge_sidecar_envoy | production resource public | tcp, bidirectional |
The last policy is what lets the edge Envoy gateway reach the public-VLAN workloads in the production cluster — this is the path that powers the edge → production traffic chain.
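As a rough mental model of how the policy table reads, the sketch below encodes the four rows and checks whether a connection between two sets of groups is covered. It assumes default-deny and treats "production resource public" as an opaque label; the `Policy` class and `is_allowed` function are hypothetical and do not reflect how NetBird evaluates policies internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    source: str
    destination: str
    protocol: str            # "all" or a single protocol such as "tcp"
    bidirectional: bool = True


# The four rows of the policy table above.
POLICIES = [
    Policy("Admin Edge Policy", "administrators", "edge", "tcp"),
    Policy("Admin Production Policy", "administrators", "production", "all"),
    Policy("Admin Home Policy", "administrators", "home", "all"),
    Policy("Sidecar Envoy Access Prod Public", "edge_sidecar_envoy",
           "production resource public", "tcp"),
]


def is_allowed(src_groups, dst_groups, protocol, policies=POLICIES) -> bool:
    """True if any policy covers src -> dst for this protocol (default deny)."""
    for p in policies:
        if p.protocol not in ("all", protocol):
            continue
        forward = p.source in src_groups and p.destination in dst_groups
        reverse = (p.bidirectional
                   and p.source in dst_groups
                   and p.destination in src_groups)
        if forward or reverse:
            return True
    return False


print(is_allowed({"edge_sidecar_envoy"}, {"production resource public"}, "tcp"))
print(is_allowed({"edge_sidecar_envoy"}, {"home"}, "tcp"))
```

Note that nothing in the check mentions an IP address: group membership alone decides the outcome, which is why renumbering a subnet leaves the rules intact.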
DNS — sys.kueber.eu
The zone is owned by the netbird environment; entries are added per-env via the modules/netbird/dns_record module.
| Subdomain | Source env |
|---|---|
| *.production.sys.kueber.eu (proxmox1/2/3, truenas) | production |
| *.home.sys.kueber.eu (unifi-home, synology, home-assistant) | home |
Mesh peers can resolve those names directly; non-mesh clients can't (the zone isn't published to the public DNS).
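The naming convention makes it easy to tell which environment owns a record. A minimal sketch, assuming only the two subdomains listed in the table (the `source_env` helper is hypothetical and performs no DNS lookup):

```python
from typing import Optional

ZONE = "sys.kueber.eu"
ENVS = ("production", "home")  # the source envs listed in the table above


def source_env(fqdn: str) -> Optional[str]:
    """Return the env that owns a name under <env>.sys.kueber.eu, else None."""
    name = fqdn.rstrip(".").lower()
    for env in ENVS:
        if name.endswith(f".{env}.{ZONE}"):
            return env
    return None


print(source_env("proxmox1.production.sys.kueber.eu"))
print(source_env("synology.home.sys.kueber.eu"))
print(source_env("auth.kueber.eu"))
```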
Operational notes
- Peer churn. If a routing peer fails, the mesh can reach that site only through whatever fallback is available. The production site runs three routers for that reason.
- Token rotation. Service-user tokens are valid for one year. Rotation is a tofu-side concern.
- Adding a new site. Create a new tofu/environment/<site> consuming the netbird env's data sources, declare the network/resources/groups/policies, and register one or more routing peers. The pattern is the same for every site.
- Adding a workload to the mesh. Run the agent, register it with a setup key scoped to the right group, and add a policy that lets it reach what it needs. No IP plumbing.
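For the token-rotation note, a reminder check could look like the sketch below. It only does date arithmetic; the `needs_rotation` helper, the 30-day lead time, and the example dates are assumptions, and where the expiry date is read from is left open.

```python
from datetime import date, timedelta


def needs_rotation(expires_on: date, today: date, lead_days: int = 30) -> bool:
    """True once `today` falls within `lead_days` of the token's expiry."""
    return today >= expires_on - timedelta(days=lead_days)


# Hypothetical token issued on 2025-01-01 with a one-year (365-day) lifetime.
issued = date(2025, 1, 1)
expires = issued + timedelta(days=365)

print(needs_rotation(expires, today=date(2025, 6, 1)))
print(needs_rotation(expires, today=date(2025, 12, 15)))
```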