Removing unit from hacluster doesn't properly remove node from corosync

Bug #1400481 reported by Billy Olsen
This bug affects 18 people
Affects                              Status        Importance  Assigned to   Milestone
OpenStack Charm Guide                Fix Released  High        Unassigned
OpenStack HA Cluster Charm           Fix Released  Critical    Felipe Reyes
hacluster (Juju Charms Collection)   Invalid       Undecided   Unassigned

Bug Description

[Description]
The hacluster charm doesn't properly support the hanode-relation-departed hook. This is also noted in the charm's own TODO list. This relation needs to be handled in order to set the appropriate quorum count.

When destroying or removing a service unit from the hacluster service, the node remains listed as offline in the corosync status output. To fully remove the node, the corosync service should first be stopped on the node being removed, and the node should then be removed from the cluster resource manager on one of the remaining nodes.

Note that when a unit is added after removing one or more units, the charm does correctly adjust the nodelist or the expected_votes count to match the number of votes expected in the cluster.

[Impact]
The number of nodes required for quorum may be incorrect, making it impossible to form quorum in clusters with a small number of nodes. The two-node special case may not be enabled when the number of nodes is 2.

[Test Case]
1. Deploy a service with 3 units that accepts the hacluster subordinate charm (e.g. keystone)
2. Relate the service and hacluster
3. Remove one of the service units (either juju destroy-unit or juju remove-unit)

Observe:
- /etc/corosync/corosync.conf still contains an incorrect nodelist (unicast) or expected_votes (multicast); see the example fragment below
- the two_node option is not specified in the quorum section
- sudo crm status continues to report the removed unit as offline
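
For reference, these settings live in /etc/corosync/corosync.conf. A minimal fragment showing what the file should contain once the cluster has been scaled back to two members (addresses and node ids below are purely illustrative):

# Unicast case: peers are listed explicitly (multicast deployments carry
# "expected_votes: N" in the quorum section instead of a nodelist).
nodelist {
    node {
        ring0_addr: 10.5.9.10
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.9.11
        nodeid: 1001
    }
}
quorum {
    provider: corosync_votequorum
    # Should be set once only two members remain:
    two_node: 1
}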

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Yes, can confirm. After following the Test Case above, crm status gives:

sudo crm status
Last updated: Thu Jan 5 13:46:18 2017 Last change: Thu Jan 5 12:55:37 2017 by hacluster via crmd on juju-0388cc-default-1
Stack: corosync
Current DC: juju-0388cc-default-1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 4 resources configured

Online: [ juju-0388cc-default-1 juju-0388cc-default-2 ]
OFFLINE: [ juju-0388cc-default-3 ]

Full list of resources:

 Resource Group: grp_ks_vips
     res_ks_ens2_vip (ocf::heartbeat:IPaddr2): Started juju-0388cc-default-1
 Clone Set: cl_ks_haproxy [res_ks_haproxy]
     Started: [ juju-0388cc-default-1 juju-0388cc-default-2 ]
     Stopped: [ juju-0388cc-default-3 ]

Changed in hacluster (Juju Charms Collection):
status: New → Confirmed
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Note there is an additional problem on (at least) 16.04 xenial: the 'parallax' module is required for 'crm cluster health' (or any cluster operation that involves ssh). Its absence makes crm blow up, which makes a fix a bit more awkward.

I'm exploring whether 'crm node delete' can be used instead.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Just some thoughts on how to deal with it:

If the unit is removed using "juju remove-unit ..." or "juju destroy-machine ...", then the other hacluster subordinates (attached to the remaining units) get a 'hanode-relation-departed' hook call with relation data such as {u'ready': u'True', u'private-address': u'10.5.9.224'}

i.e. they are notified that the unit is leaving while it is still active, and they receive the IP address of the departing unit.

However, due to the async nature of the removal, a "crm node delete ..." will hang until the unit has actually been deleted and corosync/pacemaker notice. So it might be best to record the departing node in the kv() store during the departed hook, and then check that the machine has gone and delete it during an update-status?
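
A rough sketch of that idea (purely illustrative, not the fix that eventually landed): it assumes charmhelpers' unitdata kv() store, that reverse DNS maps the departed address back to the pacemaker node name, and a particular 'crm_node -l' output format. Since a later comment objects to doing mutations from update-status, the pruning step is shown as a standalone helper:

import socket
import subprocess

from charmhelpers.core import unitdata
from charmhelpers.core.hookenv import relation_get

def hanode_relation_departed():
    # Remember which peer is leaving; at this point it is typically still
    # alive, so deleting it from the cluster right now would hang.
    db = unitdata.kv()
    departed = db.get('departed-nodes', [])
    departed.append(relation_get('private-address'))
    db.set('departed-nodes', departed)
    db.flush()

def prune_departed_nodes():
    # Run later, from whichever hook is deemed acceptable, once
    # corosync/pacemaker have noticed that the peer is gone.
    db = unitdata.kv()
    pending = []
    for addr in db.get('departed-nodes', []):
        name = socket.gethostbyaddr(addr)[0]  # assumption: reverse DNS yields the node name
        nodes = subprocess.check_output(['crm_node', '-l']).decode()
        if '%s lost' % name in nodes:  # assumption about 'crm_node -l' output
            subprocess.check_call(['crm', '-w', '-F', 'node', 'delete', name])
        else:
            pending.append(addr)
    db.set('departed-nodes', pending)
    db.flush()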

James Page (james-page)
Changed in charm-hacluster:
status: New → Confirmed
Changed in hacluster (Juju Charms Collection):
status: Confirmed → Invalid
Revision history for this message
James Page (james-page) wrote :

The stop hook makes an attempt:

@hooks.hook()
def stop():
    # Try to remove this node from the cluster before tearing it down.
    cmd = 'crm -w -F node delete %s' % socket.gethostname()
    pcmk.commit(cmd)
    # Then purge the cluster packages from the unit.
    apt_purge(['corosync', 'pacemaker'], fatal=True)

However, I suspect that by this point the knowledge of the other units in the cluster has already been removed from its configuration file, so this won't actually work.

Revision history for this message
James Page (james-page) wrote :

-1 on mutations or side-effects during update-status.

Changed in charm-hacluster:
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: canonical-bootstack
Tytus Kurek (tkurek)
tags: added: 4010 cpe-onsite
Revision history for this message
Drew Freiberger (afreiberger) wrote :

It was noted in duplicate bug 1806505 that this should be non-impacting, as the config files are updated and the only issue is the lingering nodes in the running config. However, there is a use case where this creates a critical outage for services.

In the use case where you have a deployed 3-node application and you have to deploy 3 new units and remove the original 3 (e.g. while migrating the entire application from metal to lxd, from lxd to kvm, or from old hardware to new), you end up with a loss of quorum in the running corosync environment.

Consider the quorum counts as you add nodes:
3 node cluster, quorum min = 2
4 node cluster, quorum min = 3
5 node cluster, quorum min = 3
6 node cluster, quorum min = 4
Remove 3 nodes from the 6-node cluster and there is now no way to reach quorum without cleanup; the VIP resource goes offline, with crm showing the cluster without quorum (see the arithmetic sketch below).
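
A back-of-the-envelope sketch of that arithmetic, assuming the usual votequorum majority rule of floor(votes / 2) + 1 (without the two_node special case):

def quorum(expected_votes):
    # Majority quorum: more than half of the expected votes.
    return expected_votes // 2 + 1

for n in (3, 4, 5, 6):
    print(n, quorum(n))   # 3 -> 2, 4 -> 3, 5 -> 3, 6 -> 4

# After removing 3 of the 6 units, expected_votes is still 6, so the
# required quorum stays at 4 while only 3 votes remain: quorum can never
# be reached until the stale nodes are cleaned up.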

I believe the current workaround is to remove the dead nodes with 'crm node remove' and to run the config-changed hook on each hacluster unit to update the running corosync.

More notes in the duplicate bug 1821109.

Revision history for this message
David Ames (thedac) wrote :

Raised the priority. Note the 19.10 milestone is a lie. We don't have the 20.01 milestone created yet.

Changed in charm-hacluster:
importance: Medium → Critical
milestone: none → 19.10
Ryan Beisner (1chb1n)
tags: added: scaleback
Revision history for this message
Andrea Ieri (aieri) wrote :

As an extension to what Drew described, consider the case of a 3-unit percona cluster in which you have replaced one unit. Everything seems to be working fine, but you're actually sitting on a time bomb: as soon as a single unit fails, corosync quorum is lost, all the resources get stopped, and your DB stops responding. Even though quorum in a 3-node cluster should be 2 votes, it has been inflated to 3 in a way that is not at all obvious.
The above is also described in more detail in the proposed nrpe check in LP#1835418.

David Ames (thedac)
Changed in charm-hacluster:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-hacluster:
milestone: 20.01 → 20.05
Revision history for this message
Chris Sanders (chris.sanders) wrote :

I'm subscribing ~field-high as this appears to have slipped several releases, and it is going to impact a major re-architecture/migration event we have planned in the near future. If this is in 20.05 we'll be fine; if it's later, it could be problematic.

David Ames (thedac)
Changed in charm-hacluster:
milestone: 20.05 → 20.08
Alvaro Uria (aluria)
Changed in charm-hacluster:
assignee: nobody → Alvaro Uria (aluria)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-hacluster (master)

Fix proposed to branch: master
Review: https://review.opendev.org/741592

Changed in charm-hacluster:
status: Triaged → In Progress
James Page (james-page)
Changed in charm-hacluster:
milestone: 20.08 → none
Changed in charm-hacluster:
assignee: Alvaro Uria (aluria) → Aurelien Lourot (aurelien-lourot)
tags: added: sts
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

I'm not actively working on this anymore. The last state is https://review.opendev.org/741592. The review is in good shape but there are two items left to be addressed.

Changed in charm-hacluster:
assignee: Aurelien Lourot (aurelien-lourot) → nobody
Felipe Reyes (freyes)
Changed in charm-hacluster:
assignee: nobody → Felipe Reyes (freyes)
Revision history for this message
Drew Freiberger (afreiberger) wrote :

I did recently watch this happen live.

hacluster properly removed the node from crm, then uninstalled pacemaker/corosync, but they ended up re-installed. During the stop hook, it looks like pacemaker is restarted with the on-disk configs, which causes the node to re-join the cluster.

https://pastebin.canonical.com/p/yFc8BJKtDV/

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

This has landed \o/ I'll now resurrect the release notes review [0] and move it forward.

[0] https://review.opendev.org/#/c/741626/

Changed in charm-hacluster:
status: In Progress → Fix Committed
milestone: none → 21.04
Changed in charm-guide:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Aurelien Lourot (aurelien-lourot)
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

FYI, this introduced a regression: lp:1920124

Changed in charm-guide:
status: In Progress → Fix Committed
assignee: Aurelien Lourot (aurelien-lourot) → nobody
milestone: none → 21.04
Changed in charm-hacluster:
status: Fix Committed → Fix Released
Changed in charm-guide:
status: Fix Committed → Fix Released
Revision history for this message
Trent Lloyd (lathiat) wrote :

I ran into an issue related to this one and want to note the workaround for future travellers.

After pausing 1 unit, all services went down and all VIP/haproxy resources were showing as Stopped in "crm status". In syslog we can see we had no quorum:

[syslog]
pacemaker-schedulerd[PID]: warning: Fencing and resource management disabled due to lack of quorum
pacemaker-schedulerd[PID]: notice: * Start res_neutron_xxxxxx_vip ( hostname1 ) due to no quorum (blocked)

Pacemaker commands, including "crm status", "crm configure show" and "crm_node -l", all showed 3 nodes as expected (2 online, 1 offline). However, we had no quorum.

The stale (removed) node only shows up in corosync status commands. corosync.conf had been corrected and synchronised on all 3 nodes.

The solution was simply to run "corosync-cfgtool -R" to reload the configuration. It updated the ring from the config file, and then quorum was achieved and the services started.

There have been a couple of fixes to the 'update-ring' action, and there is a newer 'delete-node-from-ring' action, but as we had 1 node down it wasn't clear whether those would work correctly, so we tried corosync-cfgtool, which worked.

So it seems we sometimes simply miss a reload somewhere.

From a sosreport we saw:

[sos_commands/corosync/corosync-quorumtool_-s]
Quorum information
------------------
Nodes: 2

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 2
Quorum: 3 Activity blocked

[sos_commands/corosync/corosync-cmapctl]
(this node no longer existed)
nodelist.node.0.nodeid (u32) = 1000
nodelist.node.0.ring0_addr (str) = X.X.X.1

(these are the correct nodes showing everywhere else)
nodelist.node.1.nodeid (u32) = 1001
nodelist.node.1.ring0_addr (str) = X.X.X.2
nodelist.node.2.nodeid (u32) = 1003
nodelist.node.2.ring0_addr (str) = X.X.X.3
nodelist.node.3.nodeid (u32) = 1002
nodelist.node.3.ring0_addr (str) = X.X.X.4

(the bad node 1000 also listed)
runtime.members.1000.config_version (u64) = 0
runtime.members.1000.ip (str) = r(0) ip(10.101.223.105)
runtime.members.1000.join_count (u32) = 1
runtime.members.1000.status (str) = left

runtime.votequorum.ev_barrier (u32) = 4
runtime.votequorum.highest_node_id (u32) = 1003
runtime.votequorum.lowest_node_id (u32) = 1001
runtime.votequorum.this_node_id (u32) = 1002
runtime.votequorum.two_node (u8) = 0
