reducing number of nodes related in hacluster does not remove the nodes from corosync/pacemaker

Bug #1821109 reported by Drew Freiberger
This bug affects 3 people
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

There seems to be an issue where removing units from a clustered application does not remove the leaving units from corosync/pacemaker.

I had three machines:
juju-32871d-12-lxd-4
juju-32871d-0-lxd-3
juju-32871d-1-lxd-4

I added 3 more (to migrate from lxd to kvm):

glance-purestorage-1
glance-purestorage-2
glance-purestorage-3

The cluster upticked properly to see all 6 nodes.

I then removed 2 nodes and they have left juju cleanly, but they still remain in crm config show and /etc/corosync/corosync.conf.

To reproduce:
juju deploy <some ha-consuming service> --num-units 3
juju deploy hacluster
juju config hacluster cluster_count=3
juju add-relation <haservice>:hacluster hacluster
juju-wait
check and dump: juju ssh hacluster 'sudo crm status; sudo crm config show; sudo corosync-quorumtool'

juju add-unit <ha-consuming service>
juju add-unit <ha-consuming service>
juju add-unit <ha-consuming service>

check and dump: juju ssh hacluster 'sudo crm status; sudo crm config show; sudo corosync-quorumtool'

juju remove-unit <ha-consuming service>/0
juju remove-unit <ha-consuming service>/1

check and dump: juju ssh hacluster 'sudo crm status; sudo crm config show; sudo corosync-quorumtool'

Charms on this site use openstack-origin=cloud:xenial-pike and are running cs:glance-275 and cs:hacluster-49.

Drew Freiberger (afreiberger) wrote :

Part of the workaround is to run the following on the remaining nodes:

sudo crm_node -l |grep lost

for each lost node:

sudo crm_node -R <dead node>

This doesn't clean up the corosync.conf file, though.

It seems the charm doesn't handle hacluster-relation-departed hooks at all.
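
The manual steps above could be combined into a small loop. The following is only a sketch, not something the charm provides: it assumes `crm_node -l` prints one `<id> <name> <state>` line per node, and the function names `lost_nodes` and `remove_lost_nodes` are made up for illustration. Like the manual steps, it does not touch corosync.conf.

```shell
#!/bin/sh
# Read `crm_node -l` style output on stdin and print the name of every
# node whose state column is "lost".
# Assumption: each line looks like "<id> <name> <state>".
lost_nodes() {
    awk '$3 == "lost" { print $2 }'
}

# Run on one of the remaining cluster nodes: remove each lost node
# from pacemaker, one at a time.
remove_lost_nodes() {
    sudo crm_node -l | lost_nodes | while read -r node; do
        echo "removing lost node: $node"
        sudo crm_node -R "$node"
    done
}
```

Sourcing the file only defines the functions; calling `remove_lost_nodes` on a remaining node performs the removal. Checking `sudo crm_node -l | lost_nodes` first shows which nodes would be affected.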

Ryan Beisner (1chb1n)
tags: added: scaleback
Ryan Beisner (1chb1n)
Changed in charm-hacluster:
importance: Undecided → High
Drew Freiberger (afreiberger) wrote :

After running crm_node -R, you can then run 'juju run --application <hacluster> hooks/config-changed' to trigger an update of the corosync.conf file and of the running corosync status.

To see the status of cluster quorum, check the corosync-quorumtool output on all hosts for the "Total votes:" and "Quorum:" counts before removing any additional units, so that the removal does not drop the cluster below quorum.
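
The quorum check described above could be scripted roughly as follows. This is a sketch under assumptions: it expects the `Total votes:` and `Quorum:` lines in the usual `corosync-quorumtool` layout, it assumes each unit holds one vote, and the function name `can_remove` is hypothetical.

```shell
#!/bin/sh
# Read corosync-quorumtool output on stdin and decide whether removing
# $1 units would leave the total votes at or above the current quorum.
# Exit status: 0 = safe, 1 = would drop below quorum, 2 = could not parse.
# Assumption: one vote per unit.
can_remove() {
    awk -v n="$1" '
        /^Total votes:/ { total = $3 }
        /^Quorum:/      { quorum = $2 }
        END {
            if (total == "" || quorum == "") exit 2
            if ((total - n) >= (quorum + 0)) exit 0
            exit 1
        }
    '
}
```

For example, `sudo corosync-quorumtool | can_remove 2 && echo "safe to remove 2 units"` on each host. The comparison uses the quorum value reported at the time of the check.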

Ryan Beisner (1chb1n)
Changed in charm-hacluster:
milestone: none → 19.07
Changed in charm-hacluster:
status: New → Triaged
David Ames (thedac)
Changed in charm-hacluster:
milestone: 19.07 → 19.10
Trent Lloyd (lathiat) wrote :

Seems like this is a duplicate of this bug, can we confirm?
https://bugs.launchpad.net/charm-hacluster/+bug/1400481

Drew Freiberger (afreiberger) wrote :

Confirmed, this is a duplicate of bug 1400481. This bug does list workaround information that is missing from that bug and may be useful for future travelers.
