Cilium

Cilium is an eBPF-powered CNI that provides pod networking, replaces kube-proxy, and layers identity-aware network policy and observability on top. In this homelab it is the only CNI: there is no kube-proxy running, no overlay network, and no separate ingress controller for L4.

Why Cilium

A few specific properties make it the right fit here:

  • kube-proxy replacement. One fewer DaemonSet, one fewer set of iptables rules to reason about. Service load balancing is done in eBPF on every node.
  • Native routing. Pod traffic flows on the underlying L2 — no VXLAN/Geneve tunnel tax. The two clusters interconnect via NetBird at the host level (see the Fabric overview), so a tunnel inside a tunnel would be wasted overhead.
  • WireGuard node-to-node encryption. Pod-to-pod traffic between nodes is encrypted without an extra mesh.
  • L2 announcements. LoadBalancer services get IPs that are ARP-announced on the local network. No external LB controller, no MetalLB.
  • Hubble. Flow-level observability with no extra agent — useful when debugging "why can't this pod reach that service" without reaching for tcpdump on a node.
  • CiliumNetworkPolicy. Identity-aware policies that survive pod IP churn, plus DNS-based egress rules that don't fall apart the moment a CDN rotates IPs.
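
The DNS-based egress rules in the last item can be sketched as a CiliumNetworkPolicy that uses toFQDNs. This is a minimal illustration, not a policy taken from this repository: the demo namespace, the app: demo label, and the hostname api.example.com are placeholders. The first egress rule lets the pod query cluster DNS through Cilium's DNS proxy, which is how the agent learns which IPs the name currently resolves to.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-example   # hypothetical name
  namespace: demo                 # hypothetical namespace
spec:
  # Applies to pods carrying this (placeholder) label.
  endpointSelector:
    matchLabels:
      app: demo
  egress:
    # Allow DNS lookups to kube-dns and let Cilium inspect them,
    # so resolved IPs can be matched against toFQDNs below.
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Allow HTTPS to the named host, whatever IPs it resolves to.
    - toFQDNs:
        - matchName: "api.example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Because the allow-list is keyed on the DNS name rather than on addresses, the rule keeps working when the provider behind the name changes its IPs.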

Alternatives considered

| Option | Why not |
| --- | --- |
| Calico | Mature CNI; eBPF dataplane exists but is less integrated than Cilium's |
| Flannel | Simple overlay; no policy without Calico-on-top |
| Canal | Flannel + Calico policy; redundant once you have Cilium |
| In-tree kube-proxy + chosen CNI | Two systems where one would do |

Installation

The general shape of a Cilium deployment in this homelab:

  • HelmRelease via Flux, with the chart pinned by major.minor and tracked by Renovate.
  • Native routing mode — no VXLAN/Geneve. The pod CIDR is unique within the homelab so packets can route between clusters over the NetBird mesh without re-NATing.
  • kube-proxy disabled at the OS layer — see the matching Talos patch-disable-kube-proxy.yaml. Cilium then enables kubeProxyReplacement: true in Helm values.
  • WireGuard node-to-node encryption for pod-to-pod traffic between nodes.
  • L2 announcements declared in infrastructure/<cluster>/configs/cilium-*.yaml. The IP pool is on the public VLAN; the LB-IPAM allocator picks one when a Service of type LoadBalancer is created.
  • Hubble (relay + UI) enabled with a single replica. UI exposed internally only — never publicly routed.
  • Operator replicas: 2, so one replica keeps running while the other is replaced during an upgrade.
  • cilium-agent capabilities: NET_ADMIN, NET_RAW, SYS_ADMIN, SYS_RESOURCE, IPC_LOCK, plus CHOWN, KILL, DAC_OVERRIDE, FOWNER, SETGID, and SETUID for file-ownership and process handling. These are needed to load eBPF programs and manipulate the network stack, and cannot be reduced further. The cleanCiliumState init container needs only a subset (NET_ADMIN, SYS_ADMIN, SYS_RESOURCE).
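
As a rough sketch of how the Flux and kube-proxy items fit together, the two documents below show a Talos machine-config patch that turns off kube-proxy and a Flux HelmRelease that installs the chart with values read from a ConfigMap. Neither is copied from this repository. The real contents of patch-disable-kube-proxy.yaml are not shown on this page, and the HelmRepository name, the interval, the version range, the ConfigMap base name cilium-values, and the cni.name: none setting are all assumptions. The helm.toolkit.fluxcd.io/v2 API version also assumes a recent Flux release; older ones use a beta version of the same kind.

```yaml
# Document 1: assumed shape of a Talos machine-config patch (lives in its own file).
# cluster.proxy.disabled stops Talos from deploying kube-proxy.
# cni.name: none is an assumption here: it keeps Talos from installing its
# default CNI so that Cilium can be installed through Helm instead.
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
---
# Document 2: hypothetical Flux HelmRelease (applied to the cluster, not to Talos).
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
  namespace: kube-system
spec:
  interval: 30m              # placeholder reconcile interval
  chart:
    spec:
      chart: cilium
      version: "1.16.x"      # placeholder range, not the version pinned in this repo
      sourceRef:
        kind: HelmRepository
        name: cilium         # hypothetical HelmRepository name
        namespace: flux-system
  valuesFrom:
    # Values come from a generated ConfigMap; the rendered name carries a
    # kustomize hash suffix, so the base name here is an assumption.
    - kind: ConfigMap
      name: cilium-values
      valuesKey: values.yaml
```

Keeping the values in a ConfigMap rather than inline means a values change produces a new hashed ConfigMap name, which gives Flux an unambiguous signal to reconcile the release.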

Administration

  • Status check. cilium status from the local CLI, or kubectl -n kube-system exec cilium-<pod> -- cilium status from inside the cluster. First glance for "is the CNI healthy."
  • Connectivity probe. cilium connectivity test runs a battery of pod-to-pod / pod-to-service / pod-to-external assertions. Useful after a kernel or CNI upgrade.
  • Endpoint introspection. kubectl exec into the agent and run cilium endpoint list to see every endpoint's identity, labels, and policy enforcement status. Usually the fastest way to figure out why a specific pod is being denied egress.
  • Hubble flow inspection. hubble observe (CLI) or the Hubble UI for visual debugging. Filter by namespace, pod, verdict (for example DROPPED, where policy drops carry the reason "Policy denied"), or protocol.
  • Upgrades. Renovate keeps the chart version current; auto-merge applies to patch versions only per the Renovate policy. Minor / major bumps need a human read because Cilium's defaults occasionally shift in non-patch releases.

Usage

Cilium runs the cluster's pod networking, service load balancing, and policy enforcement. From an app's perspective most of this is transparent:

  • Pod-to-pod / pod-to-service traffic "just works" — Cilium handles routing in eBPF.
  • NetworkPolicy and CiliumNetworkPolicy declared in any namespace get enforced. Apps that pull in the app-network-policy component get a default-deny baseline that they extend per app.
  • LoadBalancer services get an IP from the configured Cilium LB-IPAM pool, announced over ARP on the underlying L2.
  • Hubble is the diagnosis tool when traffic isn't reaching where it should. Flows dropped with a "Policy denied" reason are the breadcrumb to a missing allow-rule, often in the originating namespace.
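
The LoadBalancer flow described above depends on two Cilium custom resources: an IP pool for LB-IPAM and an L2 announcement policy. A minimal sketch follows, assuming the cilium.io/v2alpha1 API version and the blocks field (both vary by Cilium release, and older releases use cidrs instead of blocks). The names, the address range, and the interface pattern are placeholders; the real definitions live in the cluster's cilium-*.yaml config files.

```yaml
# Pool that LB-IPAM allocates from. 192.0.2.0/24 is a documentation-only
# range; substitute addresses from the VLAN the nodes sit on.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: example-pool            # hypothetical name
spec:
  blocks:
    - start: "192.0.2.240"
      stop: "192.0.2.250"
---
# Policy that tells the agents to answer ARP for allocated LoadBalancer IPs.
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: example-l2-policy       # hypothetical name
spec:
  interfaces:
    - ^eth[0-9]+                # assumed NIC naming; adjust to the nodes' interfaces
  loadBalancerIPs: true
  externalIPs: false
---
# An app-side Service: nothing Cilium-specific is needed beyond the type.
apiVersion: v1
kind: Service
metadata:
  name: demo                    # placeholder app
  namespace: demo
spec:
  type: LoadBalancer
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
```

Once the Service exists, LB-IPAM should assign it an address from the pool, and one node at a time answers ARP requests for that address on the matching interface.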

Cluster Deployment

Cilium — Talos cluster

Cluster-specific notes only. General product info, "why we use it", and alternatives live in docusaurus/docs/platform/cilium.mdx.

Deviations from defaults

Defaults live in docusaurus/docs/platform/cilium.mdx — document anything this cluster does differently here, with a one-line reason.

Kubernetes Metadata
Rendered manifests (kustomize build)
apiVersion: v1
data:
  values.yaml: >
    # yaml-language-server:
    $schema=https://raw.githubusercontent.com/cilium/cilium/refs/heads/main/install/kubernetes/cilium/values.schema.json


    ipv4NativeRoutingCIDR: "10.100.0.0/16"

    autoDirectNodeRoutes: true

    routingMode: native


    #MTU: 1450


    k8sServiceHost: "localhost"

    k8sServicePort: "7445"


    kubeProxyReplacement: true


    encryption:
      enabled: true
      type: wireguard

    ipam:
      operator:
        clusterPoolIPv4PodCIDRList: "10.100.0.0/16"

    bpf:
      masquerade: false
      datapathMode: veth

    bandwidthManager:
      enabled: true
      bbr: true

    l2announcements:
      enabled: true

    envoy:
      enabled: false

    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
        replicas: 1

    operator:
      replicas: 2

    # for Talos

    cgroup:
      autoMount:
        enabled: false
      hostRoot: "/sys/fs/cgroup"

    securityContext:
      capabilities:
        ciliumAgent:
          - CHOWN
          - KILL
          - NET_ADMIN
          - NET_RAW
          - IPC_LOCK
          - SYS_ADMIN
          - SYS_RESOURCE
          - DAC_OVERRIDE
          - FOWNER
          - SETGID
          - SETUID
        cleanCiliumState:
          - NET_ADMIN
          - SYS_ADMIN
          - SYS_RESOURCE
kind: ConfigMap
metadata:
  name: cilium-values-h8fhk46dg6
  namespace: kube-system