Summary
A homelab Kubernetes cluster running main-amd64 (current main HEAD 4d98156, with the Apr 6-12 PR series merged: dual-stack-preservation, service-suffix-annotation, go-1-26 bump) has been observing two compounding problems:
- OOMKill crash loop on the chart's default 128Mi memory limit: the pod restarts every ~5 minutes, and a single pod accumulated 387 restarts over 4 days. Bumping resources.limits.memory to 512Mi resolves the OOM (steady-state usage is ~80 MiB).
- Steady-state reconcile loop: even with the OOM fixed, the operator writes:
  dynamicprefixes.dynamic-prefix.io PUT at 3.6/s
  ciliumloadbalancerippools.cilium.io PUT at 2.3/s
  ciliumloadbalancerippools.cilium.io PATCH at 1.2/s
  Total: ~7.1 writes/sec sustained.
This sustained churn drove a kube-apiserver heap allocation of ~1.5 GB in DeepCopyJSONValue/schemaCoercingConverter per apiserver, causing a +3 GB cluster-wide memory step that took 4 days to root-cause.
The reconcile loop appears to be self-perpetuating; the last known-good release is v0.0.2 (2026-03-14), pre-merge of #9, #10, #13. Pinning to v0.0.2 eliminates the write storm.
Reproduction
- Operator running with: image.tag: main-amd64, pullPolicy: Always
- Workload: ~100 cluster-wide Services (mix of LoadBalancer + ClusterIP), 3 DynamicPrefix CRs, 3 CiliumLoadBalancerIPPools, 2 referenced by each DynamicPrefix
- Cluster: 3-node Talos / Kubernetes 1.34, Cilium 1.19.x
Evidence
Reconcile loop pattern (operator startup logs)
INFO Pool synced successfully pool=dmz-ipv6-network blockCount=3
INFO DynamicPrefix changed, enqueuing referencing pools dynamicPrefix=dmz-ipv6 poolCount=2
INFO Syncing pool pool=dmz-ipv6-dynamic
INFO Pool synced successfully pool=dmz-ipv6-dynamic blockCount=3
INFO Syncing pool pool=dmz-ipv6-subnet
INFO Pool synced successfully pool=dmz-ipv6-subnet blockCount=3
INFO DynamicPrefix changed, enqueuing referencing pools dynamicPrefix=dmz-ipv6-subnet poolCount=1
INFO cilium.io/v2alpha1 CiliumBGPAdvertisement is deprecated; use cilium.io/v2 CiliumBGPAdvertisement
INFO Reconciling BGP advertisements dynamicPrefix=dmz-ipv6
After steady-state is reached, "Pool synced successfully" → "DynamicPrefix changed, enqueuing referencing pools" → "Syncing pool" cycles back without observable workload changes.
Apiserver write rate (apiserver_request_total)
3.600/s PUT dynamicprefixes.dynamic-prefix.io
2.277/s PUT ciliumloadbalancerippools.cilium.io
1.200/s PATCH ciliumloadbalancerippools.cilium.io
These are sustained for hours with no underlying change to Services, the operator's own pod, or the upstream RA stream.
Apiserver heap profile
go tool pprof on a freshly-restarted apiserver after 25 minutes:
flat% func
54.51% k8s.io/apimachinery/pkg/runtime.DeepCopyJSONValue ← 1.21 GB
9.37% DeepCopyJSONValue (different line)
5.32% sigs.k8s.io/json/.../unquote
Cumulative call chain:
etcd3.watchChan.serialProcessEvents (1.68 GB)
→ schemaCoercingConverter.ConvertToVersion (1.50 GB)
→ unstructured.DeepCopyObject → DeepCopyJSON → DeepCopyJSONValue (1.48 GB)
The dynamicprefixes and ciliumloadbalancerippools CRDs feed into this allocation path on each WATCH event.
Hypothesised root cause (please verify in source)
The chain looks like:
1. The poolsync controller writes to a CiliumLoadBalancerIPPool (status, blockCount, or annotation).
2. The pool watch fires.
3. The dynamicprefix controller (or poolsync's reconciler) sees the pool change and re-enqueues the referencing DynamicPrefix.
4. DynamicPrefix reconciliation runs and enqueues the referencing pools (the "enqueuing referencing pools" log line), leading back to (1).
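To make the suspected chain concrete, here is a minimal standard-library simulation of it. It is only a sketch of the hypothesis, not the operator's code: the Pool type, the simulate function and the example prefix are all hypothetical, and the work queue is reduced to a counter.

```go
package main

import "fmt"

// Pool is a hypothetical, stripped-down stand-in for CiliumLoadBalancerIPPool.
type Pool struct {
	Blocks          []string
	ResourceVersion int // bumped on every accepted write
}

// simulate runs the hypothesised chain for at most maxSteps reconciles and
// returns the number of writes issued. The desired state never changes, yet
// each unconditional write produces a watch event that re-enqueues the work.
func simulate(maxSteps int) int {
	desired := []string{"2001:db8:1::/64"} // example prefix, never changes
	pool := &Pool{Blocks: []string{"2001:db8:1::/64"}}

	writes := 0
	queue := 1 // a single initial reconcile request
	for step := 0; step < maxSteps && queue > 0; step++ {
		queue-- // dequeue one reconcile request

		// (1) Unconditional write, even though nothing differs.
		pool.Blocks = desired
		pool.ResourceVersion++
		writes++

		// (2)-(4) The write fires the pool watch, the DynamicPrefix is
		// re-enqueued, and its reconcile enqueues the pool again.
		queue++
	}
	return writes
}

func main() {
	// The queue never drains: the write count is bounded only by maxSteps.
	fmt.Println("writes:", simulate(1000))
}
```

In this model the number of writes always equals the step cap, which matches the observed behaviour of a write rate that stays flat with no workload changes. Whether the real controllers are wired this way still needs checking in source.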
Common fixes for this pattern:
- Compare the desired vs actual CiliumLoadBalancerIPPool.spec (or .status if writing status) with equality.Semantic.DeepEqual before issuing the Patch / Update. Don't write if no field actually changed.
- Use controllerutil.CreateOrPatch with an idempotent mutator function so identical desired state → empty Patch → no apiserver write.
- Filter the watch on CiliumLoadBalancerIPPool to ignore events where metadata.managedFields shows the change came from this controller's manager name (avoid self-trigger).
- Add a predicate.GenerationChangedPredicate on the CiliumLoadBalancerIPPool source so only spec changes drive reconciles, not status-only ones (assuming the pool has a status sub-resource).
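The first and last of these fixes can be sketched with the standard library alone. This is an illustration under stated assumptions, not the operator's implementation: PoolSpec, Pool, fakeClient, syncPool and generationChanged are hypothetical names, reflect.DeepEqual stands in for equality.Semantic.DeepEqual, and a plain generation comparison stands in for predicate.GenerationChangedPredicate.

```go
package main

import (
	"fmt"
	"reflect"
)

// PoolSpec and Pool are hypothetical, simplified stand-ins for the real
// CiliumLoadBalancerIPPool types.
type PoolSpec struct {
	Blocks []string
}

type Pool struct {
	Generation int64 // modelled as bumped on spec changes only
	Spec       PoolSpec
}

// fakeClient counts writes so the effect of the guard is visible.
type fakeClient struct{ writes int }

func (c *fakeClient) Update(p *Pool, spec PoolSpec) {
	p.Spec = spec
	p.Generation++
	c.writes++
}

// syncPool writes only when the desired spec differs from the observed one.
// reflect.DeepEqual is a stand-in for equality.Semantic.DeepEqual here.
func syncPool(c *fakeClient, p *Pool, desired PoolSpec) bool {
	if reflect.DeepEqual(p.Spec, desired) {
		return false // nothing changed: no write, so no watch event
	}
	c.Update(p, desired)
	return true
}

// generationChanged mimics the idea behind a generation-based predicate:
// only update events that change metadata.generation pass the filter.
func generationChanged(oldGen, newGen int64) bool {
	return oldGen != newGen
}

func main() {
	c := &fakeClient{}
	p := &Pool{Generation: 1, Spec: PoolSpec{Blocks: []string{"2001:db8:1::/64"}}}

	// 100 reconciles with an unchanged desired state issue no writes.
	for i := 0; i < 100; i++ {
		syncPool(c, p, PoolSpec{Blocks: []string{"2001:db8:1::/64"}})
	}
	fmt.Println("writes with unchanged desired state:", c.writes)

	// A real prefix change still goes through.
	syncPool(c, p, PoolSpec{Blocks: []string{"2001:db8:2::/64"}})
	fmt.Println("writes after a real prefix change:", c.writes)

	// A status-only update leaves the generation untouched and is filtered.
	fmt.Println("status-only event passes filter:", generationChanged(2, 2))
}
```

One difference to keep in mind when porting this to the real types: reflect.DeepEqual treats a nil slice and an empty slice as different, whereas equality.Semantic.DeepEqual treats them as equal, so the apimachinery helper is the safer choice for objects that round-trip through the apiserver.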
Mitigations applied downstream
- Bumped the chart's default resources.limits.memory from 128Mi to 512Mi in our values override (the OOM fix).
- Pinned image.tag to v0.0.2 and set pullPolicy: IfNotPresent. v0.0.2 is the last release before the post-merge build.
For others using the chart at default resource limits with a busy cluster: the 128Mi default may be insufficient for the post-merge build. Consider raising the chart default to 256Mi or 512Mi.
Suggested next steps
- Add equality.Semantic.DeepEqual guards in the poolsync reconciler before writing pool spec/status
- Do the same in the dynamicprefix reconciler before writing back to DynamicPrefix
- Raise the chart default resources.limits.memory to 256Mi or 512Mi
Happy to help test fixes: we can roll a candidate image into our cluster as a regression check.