Adding a 3rd hacluster unit frequently makes ha-relation-changed loop on crm node list

Bug #1424048 reported by JuanJo Ciarlante on 2015-02-20
This bug affects 4 people
Affects                             Status     Importance  Assigned to  Milestone
OpenStack hacluster charm           Confirmed  High        Unassigned
hacluster (Juju Charms Collection)  Invalid    High        Unassigned

Bug Description

FYI this happens when deploying HA OpenStack with the 15.01 release
charms, using hacluster in unicast mode. It's a staged deployment
where we first deploy all HA services with 2 units, relate them
(for the OpenStack services), and finally add the 3rd unit to all HA'd ones.

We're repeatedly seeing hacluster fail to settle on the 3rd
unit (/2). Drilling down, we found that the 3rd unit is carrying an
incomplete corosync.conf with "two_node: 1" and only 2 nodes listed
for unicast, while the others are already running with the 3-node
setup: http://paste.ubuntu.com/10329322/
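
The mismatch described above (a "two_node: 1" setting plus a 2-entry nodelist on a unit that should be part of a 3-node cluster) can be checked mechanically. The following is a minimal sketch, not part of the charm: the function names and the sample addresses are made up for illustration, and the parsing only looks for `node {` blocks and a `two_node` line rather than implementing the full corosync.conf grammar.

```python
import re

# Hypothetical sample resembling the stale file reported on unit /2.
# Addresses and node IDs are invented for illustration only.
SAMPLE = """
totem {
    version: 2
    transport: udpu
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
nodelist {
    node {
        ring0_addr: 10.0.0.1
        nodeid: 1000
    }
    node {
        ring0_addr: 10.0.0.2
        nodeid: 1001
    }
}
"""


def summarize_corosync_conf(text):
    """Count node blocks and report whether two_node is enabled."""
    # Strip '#' comments so commented-out settings are not counted.
    body = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    node_count = len(re.findall(r"^\s*node\s*\{", body, flags=re.MULTILINE))
    match = re.search(r"^\s*two_node\s*:\s*(\d+)", body, flags=re.MULTILINE)
    two_node = bool(match and int(match.group(1)) == 1)
    return {"node_count": node_count, "two_node": two_node}


def is_stale(summary, expected_nodes):
    """True if the file does not match the expected cluster size."""
    wrong_count = summary["node_count"] != expected_nodes
    wrong_two_node = summary["two_node"] and expected_nodes != 2
    return bool(wrong_count or wrong_two_node)


if __name__ == "__main__":
    summary = summarize_corosync_conf(SAMPLE)
    print(summary, "stale for 3 nodes:", is_stale(summary, 3))
```

On a real unit the text would come from reading the local corosync.conf; comparing the summary across units /0, /1 and /2 should show which one is carrying the old 2-node configuration.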

The charm then loops on 'crm node list', which never settles. Not even
a manual kill and restart of corosync and pacemaker helps, as /2 can't join
the 3-node cluster (which the other units expect).
Manually copying corosync.conf from /0 to /2 and restarting
corosync+pacemaker works: 'crm node list' then succeeds
and /2 joins the cluster.

JuanJo Ciarlante (jjo) wrote :

FYI this looks like a race, as the same (repeated) deployment
sometimes fails on different hacluster subordinates (keystone,
cinder, glance).

tags: added: canonical-bootstack
Paul Gear (paulgear) wrote :

I've seen this several times during deploys. However, the loop on 'crm node list' is not a definitive indicator - there are some cases where pacemaker fails to start on a node which has a correct corosync.conf, and all that is needed is to restart pacemaker.

JuanJo Ciarlante (jjo) wrote :

FYI this has corosync_transport: unicast.

We changed our deployment sequence to deploy the 3 HA
units at once, then relate the OpenStack services, and got
a different issue on some of them (the affected units
changed each time I redeployed; I tried a couple of times):

$ juju run --timeout=10s --service=keystone 'sudo crm status 2>/dev/null|egrep Started:'
- Error: command timed out
  MachineId: 0/lxc/6
  Stdout: ""
  UnitId: keystone/0
- MachineId: 1/lxc/3
  Stdout: ' Started: [ juju-machine-1-lxc-3 juju-machine-2-lxc-3 ]

'
  UnitId: keystone/1
- MachineId: 2/lxc/3
  Stdout: ' Started: [ juju-machine-1-lxc-3 juju-machine-2-lxc-3 ]

'
  UnitId: keystone/2

Logging into the timed-out unit shows that pacemaker is not started and
that hanode-relation-changed loops endlessly on a failing
'crm node list'. After starting pacemaker there, the hook
completed OK: http://paste.ubuntu.com/10454923/
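
For diagnosis it can help to distinguish "not ready yet" from "will never become ready". Below is a minimal sketch of a bounded wait around 'crm node list', as an alternative to an endless loop. It is not the charm's hook code: the function names, attempt count and delay are assumptions chosen for illustration, and success is judged only by the command's exit status.

```python
import subprocess
import time


def crm_node_list_ok():
    """Return True if 'crm node list' exits with status 0."""
    try:
        result = subprocess.run(
            ["crm", "node", "list"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # crm is missing or hung: treat as "not ready".
        return False
    return result.returncode == 0


def wait_until(probe, attempts=12, delay=5.0, sleep=time.sleep):
    """Call probe() up to 'attempts' times, sleeping between tries.

    Returns the 1-based attempt number that succeeded, or None if
    every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


if __name__ == "__main__":
    if wait_until(crm_node_list_ok) is None:
        print("crm node list never succeeded; check that pacemaker is running")
```

Giving up after a fixed number of attempts leaves room to report the failure, for example to point at pacemaker not being started, instead of spinning inside the hook.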

Billy Olsen (billy-olsen) wrote :

JuanJo or Paul, do you have any juju logs or syslogs you could attach for analysis? A sosreport works as well.

Billy Olsen (billy-olsen) wrote :

JuanJo, have you set the cluster_count config option in the hacluster charm? If not, when you deploy a 3-node cluster at once, try setting cluster_count to 3 as well, so that the charm waits to see 3 nodes before it starts the clustering process.
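
The gating that cluster_count is meant to provide can be sketched as follows. This is an illustration of the idea only, not the charm's implementation: the function names are hypothetical and the addresses in the usage lines are made up.

```python
def nodes_still_missing(known_addresses, cluster_count):
    """Return how many more distinct nodes are needed before clustering."""
    if cluster_count < 1:
        raise ValueError("cluster_count must be at least 1")
    # De-duplicate: the same peer may be reported more than once.
    seen = set(known_addresses)
    return max(0, cluster_count - len(seen))


def ready_to_cluster(known_addresses, cluster_count):
    """True once at least cluster_count distinct nodes are known."""
    return nodes_still_missing(known_addresses, cluster_count) == 0


if __name__ == "__main__":
    peers = ["10.0.0.1", "10.0.0.2"]
    print(ready_to_cluster(peers, 3), nodes_still_missing(peers, 3))
    print(ready_to_cluster(peers + ["10.0.0.3"], 3))
```

With a gate like this, a unit that has only heard from one peer would hold off writing its configuration instead of rendering a 2-node file while the rest of the cluster expects three.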

James Page (james-page) wrote :

JuanJo

I think I've tracked this problem down to the following situation - I can repro on a 3 node cluster expansion:

1) charms deployed with hacluster with unicast/cluster_count=3 (which is right for an initial three node cluster)

bootstraps fine - cluster forms OK

2) juju add-unit <service>

The additional unit spins: corosync is unable to start up correctly, so 'crm node list' just sits in the loop.

Looking at the debug output of corosync and the local corosync.conf, the new unit only has a partial node list of three nodes, one being itself, and this is confusing the votequorum function.

I tried overriding the expected_votes calculation (which is based on the nodelist) with an explicit votequorum configuration (as was already done for multicast). This appears to work, and I can now expand the cluster reliably.
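
As a rough illustration of what an explicit votequorum configuration looks like, the sketch below renders a corosync.conf quorum section with expected_votes set directly instead of being derived from the nodelist. The function name is hypothetical and this is not the charm's template; only the `provider`, `expected_votes` and `two_node` keys are real corosync options.

```python
def render_quorum(expected_votes):
    """Render a quorum section with an explicit expected_votes value."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    lines = [
        "quorum {",
        "    provider: corosync_votequorum",
        "    expected_votes: {}".format(expected_votes),
    ]
    # two_node only makes sense for a cluster of exactly two nodes.
    if expected_votes == 2:
        lines.append("    two_node: 1")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_quorum(3))
```

Keeping two_node tied to a size of exactly 2 avoids the combination seen in the original report, where a unit joining a 3-node cluster still carried "two_node: 1".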

Changed in hacluster (Juju Charms Collection):
status: New → Confirmed
importance: Undecided → High
James Page (james-page) wrote :

Hmm - I think that this actually breaks depending on which unit is the current DC owner - the existing units have the new unit in their nodelist, so start to send data, and if the current owner is not in the list, then we get this situation.

Based on that, I still get breaks on new units joining the cluster.

James Page (james-page) wrote :

The workaround for now is to increase the cluster_count configuration in line with the target cluster size prior to adding units.

James Page (james-page) on 2015-03-10
Changed in hacluster (Juju Charms Collection):
milestone: none → 15.04
James Page (james-page) on 2015-04-23
tags: added: openstack
Changed in hacluster (Juju Charms Collection):
milestone: 15.04 → 15.07
James Page (james-page) on 2015-08-10
Changed in hacluster (Juju Charms Collection):
milestone: 15.07 → 15.10
James Page (james-page) on 2015-10-22
Changed in hacluster (Juju Charms Collection):
milestone: 15.10 → 16.01
James Page (james-page) on 2016-01-28
Changed in hacluster (Juju Charms Collection):
milestone: 16.01 → 16.04
James Page (james-page) on 2016-04-22
Changed in hacluster (Juju Charms Collection):
milestone: 16.04 → 16.07
Liam Young (gnuoy) on 2016-07-29
Changed in hacluster (Juju Charms Collection):
milestone: 16.07 → 16.10
James Page (james-page) on 2016-10-14
Changed in hacluster (Juju Charms Collection):
milestone: 16.10 → 17.01
James Page (james-page) on 2017-02-23
Changed in charm-hacluster:
importance: Undecided → High
status: New → Confirmed
Changed in hacluster (Juju Charms Collection):
status: Confirmed → Invalid