Cluster unhealthy after replacing a unit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Etcd Charm | New | Undecided | Unassigned |
Bug Description
ubuntu@juju:~$ juju status etcd
Model      Controller        Cloud/Region        Version  SLA          Timestamp
openstack  maas-controller1  maas-cloud/default  3.2.4    unsupported  10:35:09Z

App   Version  Status   Scale  Charm  Channel  Rev  Exposed  Message
etcd  3.4.22   blocked      3  etcd   stable   760  no       UnHealthy with 3 known peers

Unit      Workload  Agent  Machine   Public address  Ports     Message
etcd/7*   blocked   idle   7/lxd/26  192.168.70.93   2379/tcp  UnHealthy with 3 known peers
etcd/8    blocked   idle   8/lxd/27  192.168.70.92   2379/tcp  UnHealthy with 3 known peers
etcd/10   waiting   idle   3/lxd/47  192.168.70.155            Waiting to retry etcd registration

Machine   State    Address         Inst id       Base          AZ       Message
3         started  192.168.6.113   op4           ubuntu@22.04  default  Deployed
3/lxd/47  started  192.168.70.155  juju-206dbb-
7         started  192.168.6.101   xen01         ubuntu@22.04  xensrv   Deployed
7/lxd/26  started  192.168.70.93   juju-206dbb-
8         started  192.168.6.110   op1           ubuntu@22.04  default  Deployed
8/lxd/27  started  192.168.70.92   juju-206dbb-
I removed a unit using --force (without --force the removal did not work) and then added one unit back (etcd/10).
Looking at the log of the new unit (etcd/10):
ubuntu@juju:~$ juju debug-log --include etcd/10
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:30 ERROR unit.etcd/
unit-etcd-10: 10:28:30 ERROR unit.etcd/
unit-etcd-10: 10:28:30 ERROR unit.etcd/
unit-etcd-10: 10:28:30 ERROR unit.etcd/
unit-etcd-10: 10:28:30 WARNING unit.etcd/
unit-etcd-10: 10:28:30 INFO unit.etcd/
unit-etcd-10: 10:28:30 INFO unit.etcd/
unit-etcd-10: 10:28:30 INFO unit.etcd/
unit-etcd-10: 10:28:31 INFO juju.worker.
unit-etcd-10: 10:28:27 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
unit-etcd-10: 10:28:28 INFO unit.etcd/
It seems to me that the removed unit is still registered as a member of the cluster (client: endpoint https:/
How can I recover from this?
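One way to check whether the removed unit is still registered is to compare the peer URLs reported by `etcdctl member list` on a surviving unit with the unit addresses shown in `juju status`. The shell function below is a minimal sketch of that comparison, not something provided by the charm: `find_stale_members` is a hypothetical name, and it assumes the comma-separated output of `etcdctl member list` in etcd 3.4 (ID, status, name, peer URLs, client URLs, is-learner) and IPv4 peer URLs.

```shell
# Hypothetical helper (not part of the etcd charm).
# Reads `etcdctl member list` output on stdin and prints the ID of every
# member whose first peer URL host is not among the addresses given as
# arguments. Assumes the etcd 3.4 format:
#   ID, status, name, peer URLs, client URLs, is-learner
find_stale_members() {
  awk -F', ' -v live="$*" '
    BEGIN {
      # Build a lookup table of the live unit addresses.
      n = split(live, ips, " ")
      for (i = 1; i <= n; i++) known[ips[i]] = 1
    }
    NF >= 4 {
      # Several peer URLs are joined with ","; only the first is checked.
      split($4, urls, ",")
      peer = urls[1]
      # Strip scheme and port, e.g. https://10.0.0.1:2380 -> 10.0.0.1
      sub(/^[a-z]+:\/\//, "", peer)
      sub(/:[0-9]+$/, "", peer)
      if (!(peer in known)) print $1
    }'
}
```

Pipe the member list into it and pass the three unit addresses from `juju status` (192.168.70.93, 192.168.70.92, 192.168.70.155) as arguments. Any ID it prints is a candidate for `etcdctl member remove <ID>`, but confirm that the ID really belongs to the removed unit first, because removing a live member reduces the quorum. The endpoint and TLS flags that etcdctl needs on a charm-managed cluster are not shown here.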