Leadership change not resulting in appropriate leader-set updates
Bug #1925085 reported by
Paul Goins
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Etcd Charm | Triaged | Medium | Unassigned |
Bug Description
Hi,
I recently had to redeploy the host of an etcd unit. I didn't pay close attention, but I suspect it was the leader unit, because the leader_address returned by leader-get no longer matched the address of any active etcd unit.
When the etcd unit I had removed was redeployed, it tried to cluster against the incorrect address it pulled from leader-get's leader_address value, and so failed to cluster. I had to manually run leader-set to point leader_address at the IP of the active leader, after which clustering worked.
I also suspect that the "cluster" field returned by leader-get needs to be updated; in my case it refers to a non-existent node as well.
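For anyone hitting the same problem, the manual workaround described above might look roughly like this. It is only a sketch: the unit name `etcd/1` and the address value are placeholders, not values from this report, and the exact format of `leader_address` (bare IP versus a URL with scheme and port) is an assumption, so copy the format of the existing value shown by `leader-get`. Juju only allows the current leader unit to write leader settings, so the command has to target that unit.

```shell
# Placeholders (hypothetical): the current leader unit and the value to store.
LEADER_UNIT="etcd/1"
NEW_LEADER_ADDRESS="10.0.0.12"

# Build the key=value argument that leader-set expects.
leader_set_arg() {
    printf 'leader_address=%s' "$1"
}

# Always show what would be run; only touch the model when APPLY=1 is set.
echo "juju run --unit $LEADER_UNIT -- leader-set $(leader_set_arg "$NEW_LEADER_ADDRESS")"
if [ "${APPLY:-0}" = "1" ]; then
    juju run --unit "$LEADER_UNIT" -- leader-set "$(leader_set_arg "$NEW_LEADER_ADDRESS")"
fi
```

Running it without `APPLY=1` only prints the command, which makes it easy to check the unit name and value before changing anything. The "cluster" field is left alone here, since its expected contents are not described in this report.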
I can reproduce this in Charmed Kubernetes 1.21+ck1 (bundle rev 655). Juju 2.9.0. It happens when leadership is handed back to a unit that used to be a leader in the past.
Steps to reproduce:
1) `juju deploy cs:charmed-kubernetes`
2) Wait for deployment to settle
3) Force leadership to change by stopping jujud, e.g.
`juju ssh etcd/0 -- sudo systemctl stop jujud-machine-1`
Wait for Juju to assign a new leader unit. Repeat until all etcd units have been assigned leadership at least once.
4) Re-enable jujud, e.g.
`juju ssh etcd/0 -- sudo systemctl start jujud-machine-1`
Repeat for all units as necessary.
5) `juju remove-unit etcd/leader`
6) `juju add-unit etcd`
7) Wait for the new unit to get stuck with the "Waiting to retry etcd registration" status.
8) Check leader data:
`juju run --unit etcd/3 -- leader-get`
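To judge whether the `leader_address` value returned in step 8 is stale, it can be compared against the addresses of the etcd units that currently exist. The helper below is a minimal sketch of that comparison; the function name and all addresses are hypothetical, it assumes the value is either a bare IPv4 address or a URL of the form `scheme://ip:port`, and it does not handle IPv6. The unit addresses are passed in by hand (for example, copied from `juju status etcd`).

```shell
# Return 0 if leader_address points at one of the given unit IPs, 1 otherwise.
# Usage: leader_address_is_current LEADER_ADDRESS UNIT_IP [UNIT_IP...]
leader_address_is_current() {
    host="${1#*://}"   # drop a leading scheme such as https://, if present
    host="${host%%:*}" # drop a trailing :port, if present
    shift
    for ip in "$@"; do
        if [ "$host" = "$ip" ]; then
            return 0
        fi
    done
    return 1
}

# Example with made-up values: 10.0.0.1 is not among the live units.
if leader_address_is_current "https://10.0.0.1:2379" 10.0.0.12 10.0.0.13; then
    echo "leader_address matches an active unit"
else
    echo "leader_address is stale"
fi
```

The comparison is an exact match on the host part rather than a substring match, so `10.0.0.1` is not mistaken for `10.0.0.12`.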