Non-leader units get stuck with idle/waiting for leadership status
Bug #1903313 reported by Camille Rodriguez
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
CoreDNS Charm | Fix Released | High | Unassigned | 1.24
MetalLB Operator | Fix Released | High | Unassigned | 1.25
Bug Description
When deploying metallb in a multi-node setup, juju status reports the "leader" speaker as active/idle, but the non-leader units as waiting for leadership.
This seems to be only a juju status bug; it does not affect how the metallb speakers are deployed in Kubernetes. Since the speaker is a daemonset, the Kubernetes status shows the speaker pods deploying on each node.
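For anyone reproducing this, here is a minimal Python sketch of that cross-check, reading the DaemonSet from the Kubernetes API directly rather than trusting juju status. The namespace and resource names below (a model named "metallb-system", an application named "metallb-speaker", and the juju-app pod label) are assumptions; adjust them to your deployment.

```python
# Minimal sketch: check the speaker DaemonSet in Kubernetes directly,
# independently of what `juju status` reports.
# ASSUMPTIONS: the model/namespace is "metallb-system", the application
# is "metallb-speaker", and Juju labels the workload pods with "juju-app".
from kubernetes import client, config

config.load_kube_config()

apps = client.AppsV1Api()
ds = apps.read_namespaced_daemon_set("metallb-speaker", "metallb-system")
print(f"desired: {ds.status.desired_number_scheduled}, "
      f"ready: {ds.status.number_ready}")

# The individual speaker pods and the nodes they were scheduled on.
core = client.CoreV1Api()
pods = core.list_namespaced_pod(
    "metallb-system", label_selector="juju-app=metallb-speaker")
for pod in pods.items:
    print(pod.metadata.name, pod.spec.node_name, pod.status.phase)
```

If the daemonset reports every node ready here while juju status still shows units waiting for leadership, that supports the description above: only the reported status is wrong.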
However, this bug shows that multi-node setups, such as a microk8s cluster or a Charmed Kubernetes deployment, should be tested more thoroughly. I am planning to run more tests in a multi-node scenario to find out whether anything else is broken.
Cheers!
Changed in operator-metallb:
importance: Undecided → Medium
status: New → Triaged
tags: added: sts
Changed in charm-coredns:
status: Incomplete → New
Changed in charm-coredns:
status: Triaged → Fix Committed
Changed in operator-metallb:
status: Triaged → Fix Committed
Changed in charm-coredns:
assignee: nobody → Peter De Sousa (pjds)
Changed in operator-metallb:
status: Fix Committed → Triaged
Changed in charm-coredns:
milestone: none → 1.23+ck1
tags: added: backport-needed
Changed in charm-coredns:
assignee: Peter De Sousa (pjds) → nobody
Changed in charm-coredns:
milestone: 1.23+ck1 → 1.24
Changed in charm-coredns:
status: Fix Committed → Fix Released
Changed in charm-coredns:
status: Triaged → Fix Committed
tags: removed: backport-needed
Changed in charm-coredns:
status: Fix Committed → Fix Released
Changed in operator-metallb:
milestone: none → 1.25
status: Triaged → Fix Committed
Changed in operator-metallb:
status: Fix Committed → Fix Released
This is an issue with the current approach Juju takes for associating units with workload pods. In theory, a "unit" in Juju represents one of the pods of the workload. In practice, these "units" are only logical: the charm runs in a single, separate operator pod, which is then invoked in the context of each logical unit. The relationship between the charm and the workload via Juju is one-way: in the "leader" context the charm can set the pod spec, but no unit context has any way (via Juju) to check the status of the workload pod that the unit is associated with.
So, we could change the charm to report "active" for all unit contexts, but that still might not match the reality of the workload pod. But perhaps there are some non-Juju ways for the charm to query the status of the workload pod?
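One such non-Juju way is sketched below: each unit context asks the Kubernetes API for its own workload pod and sets unit status from the pod phase, instead of waiting on leadership. This is illustrative only, not the actual fix in either charm; it assumes the ops framework, in-cluster API access from the operator pod, and the usual `<app>-<n>` pod naming that Juju uses for podspec charms.

```python
# Hypothetical sketch of the "non-Juju way" suggested above: report each
# unit's status from its workload pod rather than from leadership.
# ASSUMPTIONS: ops framework, `kubernetes` client available in the operator
# pod, and workload pods named like "metallb-speaker-1" for unit
# "metallb-speaker/1" in a namespace matching the model name.
from kubernetes import client, config
from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus, WaitingStatus


class SpeakerCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.update_status, self._on_update_status)

    def _on_update_status(self, event):
        config.load_incluster_config()  # the operator pod runs in-cluster
        core = client.CoreV1Api()
        # e.g. unit "metallb-speaker/1" -> pod "metallb-speaker-1"
        pod_name = self.unit.name.replace("/", "-")
        pod = core.read_namespaced_pod(pod_name, self.model.name)
        if pod.status.phase == "Running":
            self.unit.status = ActiveStatus()
        else:
            self.unit.status = WaitingStatus(f"pod phase: {pod.status.phase}")


if __name__ == "__main__":
    main(SpeakerCharm)
```

The trade-off is that the charm now talks to the Kubernetes API behind Juju's back, so its reported status can drift from what Juju itself knows about the application.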
This won't apply to the upcoming sidecar approach, but I don't know much about how that will work.