Cilium
Cilium is an eBPF-powered CNI that replaces both the in-tree pod networking and kube-proxy, and layers identity-aware network policy and observability on top. In this homelab it is the only CNI — there is no kube-proxy running, no overlay network, and no separate ingress controller for L4.
Why Cilium
A few specific properties make it the right fit here:
- kube-proxy replacement. One fewer DaemonSet, one fewer set of iptables rules to reason about. Service load balancing is done in eBPF on every node.
- Native routing. Pod traffic flows on the underlying L2 — no VXLAN/Geneve tunnel tax. The two clusters interconnect via NetBird at the host level (see the Fabric overview), so a tunnel inside a tunnel would be wasted overhead.
- WireGuard node-to-node encryption. Pod-to-pod traffic between nodes is encrypted without an extra mesh.
- L2 announcements. `LoadBalancer` services get IPs that are ARP-announced on the local network. No external LB controller, no MetalLB.
- Hubble. Flow-level observability with no extra agent, useful when debugging "why can't this pod reach that service" without reaching for `tcpdump` on a node.
- CiliumNetworkPolicy. Identity-aware policies that survive pod IP churn, plus DNS-based egress rules that don't fall apart the moment a CDN rotates IPs.
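As a rough illustration of the DNS-based egress rules mentioned above, the sketch below lets one workload reach a single hostname over HTTPS. The policy name, namespace, labels, and hostname are hypothetical and not taken from this repo. The kube-dns selector assumes the cluster's DNS pods carry the usual `k8s-app: kube-dns` label.

```yaml
# Sketch only: names, namespace, labels, and hostname are illustrative assumptions.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-example-com   # hypothetical
  namespace: demo                  # hypothetical
spec:
  # The policy follows the pods' identity (labels), not their IPs.
  endpointSelector:
    matchLabels:
      app: demo-client             # hypothetical
  egress:
    # Allow DNS lookups and let Cilium's DNS proxy observe the answers,
    # which is what makes the toFQDNs rule below resolvable.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Allow HTTPS to whatever IPs the name currently resolves to.
    - toFQDNs:
        - matchName: "example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Because the allow-list is keyed on the DNS name, the rule keeps working when the addresses behind that name change.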
Alternatives considered
| Option | Why not |
|---|---|
| Calico | Mature CNI; eBPF dataplane exists but is less integrated than Cilium's |
| Flannel | Simple overlay; no policy without Calico-on-top |
| Canal | Flannel + Calico policy; redundant once you have Cilium |
| In-tree kube-proxy + chosen CNI | Two systems where one would do |
Installation
The general shape of a Cilium deployment in this homelab:
- HelmRelease via Flux, with the chart pinned by major.minor and tracked by Renovate.
- Native routing mode — no VXLAN/Geneve. The pod CIDR is unique within the homelab so packets can route between clusters over the NetBird mesh without re-NATing.
- kube-proxy disabled at the OS layer: see the matching Talos `patch-disable-kube-proxy.yaml`. Cilium then takes over service handling with `kubeProxyReplacement: true` in the Helm values.
- WireGuard node-to-node encryption for pod-to-pod traffic between nodes.
- L2 announcements declared in `infrastructure/<cluster>/configs/cilium-*.yaml`. The IP pool is on the public VLAN; the LB-IPAM allocator picks an address from it when a `Service` of type `LoadBalancer` is created.
- Hubble (relay + UI) enabled with a single replica. UI exposed internally only, never publicly routed.
- Operator replicas: 2, so that one replica stays available during upgrades.
- `cilium-agent` capabilities: `NET_ADMIN`, `NET_RAW`, `SYS_ADMIN`, `SYS_RESOURCE`, `IPC_LOCK`, plus the file-permission set (`CHOWN`, `KILL`, `DAC_OVERRIDE`, `FOWNER`, `SETGID`, `SETUID`). These are needed to load eBPF programs and manipulate the network stack and aren't reducible. The `cleanCiliumState` init container needs a subset.
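For orientation, a Talos machine-config patch that turns off kube-proxy typically looks something like the sketch below. This is not the contents of the repo's `patch-disable-kube-proxy.yaml`, which may differ; the commented-out `cni` stanza is an assumption about how the built-in CNI would be disabled so that Cilium can be installed through Helm.

```yaml
# Sketch of a Talos machine-config patch; the real patch file in this repo may differ.
cluster:
  proxy:
    # Stop Talos from deploying the kube-proxy DaemonSet.
    disabled: true
  # Assumption: the default CNI is usually disabled alongside kube-proxy,
  # possibly in a separate patch, so that Cilium is the only CNI.
  # network:
  #   cni:
  #     name: none
```

With kube-proxy gone, nothing programs Service rules until Cilium is running with `kubeProxyReplacement: true`, which is why the two settings belong together.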
Administration
- Status check. `cilium status` from the local CLI, or `kubectl -n kube-system exec cilium-<pod> -- cilium status` from inside the cluster (recent agent images name the in-pod binary `cilium-dbg`). First glance for "is the CNI healthy."
- Connectivity probe. `cilium connectivity test` runs a battery of pod-to-pod / pod-to-service / pod-to-external assertions. Useful after a kernel or CNI upgrade.
- Endpoint introspection. `kubectl exec` into the agent and run `cilium endpoint list` to see every pod's identity, labels, and policy enforcement status. The fastest way to start figuring out why a specific pod is being denied egress.
- Hubble flow inspection. `hubble observe` (CLI) or the Hubble UI for visual debugging. Filter by namespace, pod, verdict (for example `DROPPED`, where policy denials show up with the drop reason `POLICY_DENIED`), or protocol.
- Upgrades. Renovate keeps the chart version current; auto-merge applies to patch versions only per the Renovate policy. Minor / major bumps need a human read because Cilium's defaults occasionally shift in non-patch releases.
Usage
Cilium runs the cluster's pod networking, service load balancing, and policy enforcement. From an app's perspective most of this is transparent:
- Pod-to-pod / pod-to-service traffic "just works" — Cilium handles routing in eBPF.
- `NetworkPolicy` and `CiliumNetworkPolicy` declared in any namespace get enforced. Apps that pull in the app-network-policy component get a default-deny baseline that they extend per app.
- `LoadBalancer` services get an IP from the configured Cilium LB-IPAM pool, announced over ARP on the underlying L2.
- Hubble is the diagnosis tool when traffic isn't reaching where it should. `policy_denied` verdicts are the breadcrumb to a missing allow-rule in the originating namespace.
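The `LoadBalancer` flow described above can be sketched with three objects: an LB-IPAM pool, an L2 announcement policy, and a Service that consumes them. Everything here is illustrative. The names, the `192.0.2.0/24` documentation range, and the interface pattern are assumptions, not the values in `infrastructure/<cluster>/configs/cilium-*.yaml`. The `cilium.io/v2alpha1` API version is also an assumption: newer releases may serve the pool CRD under a newer version, so check `kubectl api-resources` on the cluster.

```yaml
# Sketch only: names, address range, and interface pattern are assumptions.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: demo-pool                  # hypothetical
spec:
  blocks:
    # Addresses LB-IPAM may hand out (documentation range, not the real VLAN).
    - start: "192.0.2.240"
      stop: "192.0.2.250"
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: demo-l2-policy             # hypothetical
spec:
  # Answer ARP for LoadBalancer IPs on matching host interfaces.
  loadBalancerIPs: true
  interfaces:
    - ^eth[0-9]+                   # assumption: adjust to the nodes' NIC names
---
apiVersion: v1
kind: Service
metadata:
  name: demo-web                   # hypothetical
  namespace: demo                  # hypothetical
spec:
  type: LoadBalancer               # triggers allocation from the pool above
  selector:
    app: demo-web
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
```

Once the Service is admitted, its `status.loadBalancer.ingress` should show an address from the pool, and one node answers ARP requests for it.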
Cluster Deployment
- Talos
- Edge
Cilium — Talos cluster
Cluster-specific notes only. General product info, "why we use it", and alternatives live in docusaurus/docs/platform/cilium.mdx.
Deviations from defaults
Defaults live in docusaurus/docs/platform/cilium.mdx — document anything this cluster does differently here, with a one-line reason.
- HelmRelease: `cilium@1.19.4`
- HelmRepo: `cilium` (https://helm.cilium.io/)
Rendered manifests (kustomize build)
```yaml
apiVersion: v1
data:
  values.yaml: |
    # yaml-language-server: $schema=https://raw.githubusercontent.com/cilium/cilium/refs/heads/main/install/kubernetes/cilium/values.schema.json
    ipv4NativeRoutingCIDR: "10.100.0.0/16"
    autoDirectNodeRoutes: true
    routingMode: native
    #MTU: 1450
    k8sServiceHost: "localhost"
    k8sServicePort: "7445"
    kubeProxyReplacement: true
    encryption:
      enabled: true
      type: wireguard
    ipam:
      operator:
        clusterPoolIPv4PodCIDRList: "10.100.0.0/16"
    bpf:
      masquerade: false
      datapathMode: veth
    bandwidthManager:
      enabled: true
      bbr: true
    l2announcements:
      enabled: true
    envoy:
      enabled: false
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
        replicas: 1
    operator:
      replicas: 2
    # for Talos
    cgroup:
      autoMount:
        enabled: false
      hostRoot: "/sys/fs/cgroup"
    securityContext:
      capabilities:
        ciliumAgent:
          - CHOWN
          - KILL
          - NET_ADMIN
          - NET_RAW
          - IPC_LOCK
          - SYS_ADMIN
          - SYS_RESOURCE
          - DAC_OVERRIDE
          - FOWNER
          - SETGID
          - SETUID
        cleanCiliumState:
          - NET_ADMIN
          - SYS_ADMIN
          - SYS_RESOURCE
kind: ConfigMap
metadata:
  name: cilium-values-h8fhk46dg6
  namespace: kube-system
```
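To show how a values ConfigMap like the one above is usually consumed, here is a hedged sketch of a Flux `HelmRepository` and `HelmRelease` pair. Only the chart name, version, repository URL, and ConfigMap name come from this page. The `flux-system` namespace, the intervals, and the overall layout are assumptions and may not match the manifests in this repo.

```yaml
# Sketch only: namespaces and intervals are assumptions, not the repo's manifests.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: cilium
  namespace: flux-system           # assumption
spec:
  interval: 1h                     # assumption
  url: https://helm.cilium.io/
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
  namespace: kube-system           # must match the ConfigMap's namespace
spec:
  interval: 30m                    # assumption
  chart:
    spec:
      chart: cilium
      version: "1.19.4"
      sourceRef:
        kind: HelmRepository
        name: cilium
        namespace: flux-system     # assumption
  valuesFrom:
    # Kustomize appends the content hash to the generated ConfigMap name and
    # rewrites references to it, so a values change forces a new reconcile.
    - kind: ConfigMap
      name: cilium-values-h8fhk46dg6
      valuesKey: values.yaml
```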
- HelmRelease: `cilium@1.19.4`
- HelmRepo: `cilium` (https://helm.cilium.io/)
Rendered manifests (kustomize build)
```yaml
apiVersion: v1
data:
  values.yaml: |
    k8sServiceHost: "localhost"
    k8sServicePort: "7445"
    kubeProxyReplacement: true
    encryption:
      enabled: true
      type: wireguard
    ipam:
      mode: kubernetes
      operator:
        clusterPoolIPv4PodCIDRList: "10.10.0.0/16"
    bpf:
      masquerade: false
      datapathMode: veth
    bandwidthManager:
      enabled: true
      bbr: true
    envoy:
      enabled: false
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
        replicas: 1
    operator:
      replicas: 1
    securityContext:
      capabilities:
        ciliumAgent:
          - CHOWN
          - KILL
          - NET_ADMIN
          - NET_RAW
          - IPC_LOCK
          - SYS_ADMIN
          - SYS_RESOURCE
          - DAC_OVERRIDE
          - FOWNER
          - SETGID
          - SETUID
        cleanCiliumState:
          - NET_ADMIN
          - SYS_ADMIN
          - SYS_RESOURCE
    cgroup:
      hostRoot: /sys/fs/cgroup
      autoMount:
        enabled: false
kind: ConfigMap
metadata:
  name: cilium-values-hht8k76hb8
  namespace: kube-system
```