Corosync - 100% CPU consumption - HA not working

Bug #1379484 reported by Jorge Niedbalski
This bug affects 5 people
Affects: hacluster (Juju Charms Collection)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

The following scenario was brought to my attention:
------------

One keystone HA deployment has the following errors:

Aug 18 07:54:24 r1-keystone pengine[1232]: warning: stage6: Node r2-keystone-bk is unclean!
Aug 18 07:54:24 r1-keystone pengine[1232]: warning: stage6: YOUR RESOURCES ARE NOW LIKELY COMPROMISED
Aug 18 07:54:24 r1-keystone pengine[1232]: error: stage6: ENABLE STONITH TO KEEP YOUR RESOURCES SAFE
Aug 18 07:54:24 r1-keystone pengine[1232]: error: process_pe_message: Calculated Transition 2315: /var/lib/pacemaker/pengine/pe-error-0.bz2

Keystone cluster CIB :

<cib epoch="19" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Tue Aug 12 08:23:19 2014" update-origin="r2-horizon" update-client="crmd" have-quorum="1">

<cib epoch="19" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Fri Jul 25 06:02:30 2014" update-origin="r2-horizon" update-client="crmd" have-quorum="1" dc-uuid="169738387">

It has never been synchronized.

------------
What is the recommended way to deploy keystone together with pacemaker + corosync?

Tags: openstack cts ha
tags: added: cts
tags: added: ha openstack
Revision history for this message
James Page (james-page) wrote :

This looks like you may have a conflicting cluster configuration within your deployment; specifically, perhaps two services (keystone and horizon) are using the same multicast port?
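For illustration only (not taken from this deployment), such a conflict would mean two otherwise separate clusters rendering corosync.conf files that share the same multicast group; all addresses and ports below are placeholders:

   # Hypothetical /etc/corosync/corosync.conf excerpt on the keystone units;
   # the horizon cluster ends up with the same mcastaddr/mcastport pair
   totem {
       version: 2
       interface {
           ringnumber: 0
           bindnetaddr: 10.0.0.0
           mcastaddr: 226.94.1.1
           mcastport: 5405
       }
   }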

James Page (james-page)
Changed in keystone (Juju Charms Collection):
status: New → Incomplete
Revision history for this message
Juan L. Negron (negronjl) wrote :

Hi Jorge:

To clarify things here, can you paste your configuration?

-Juan

Revision history for this message
Ante Karamatić (ivoks) wrote :

IMO, this was caused by bad corosync configuration and should be fixed in /next hacluster charm.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Possibly a symptom of https://bugs.launchpad.net/charms/+source/keystone/+bug/1403132. If so, ensuring a different mcastaddr is used for each cluster should fix this.
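If multicast is kept, one way to do that (assuming the hacluster charm's corosync_mcastaddr/corosync_mcastport options and illustrative service names) would be:

   # Give each cluster its own multicast group/port (values are examples only)
   juju set hacluster-keystone corosync_mcastaddr=226.94.1.1 corosync_mcastport=5405
   juju set hacluster-horizon corosync_mcastaddr=226.94.1.2 corosync_mcastport=5406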

Revision history for this message
James Page (james-page) wrote :

I spent time before christmas re-testing the various deployment options for the hacluster charm; I specifically saw this issue when deploying services in LXC containers under the MAAS provider; it would appear that the various bridges and network interfaces involved in connecting a container to the outside world are a bit flaky from a multicast perspective.

My recommendation is that you update to the latest version of the hacluster charm and then switch to using unicast; you'll most likely need to do the following if corosync is spinning at 100% CPU time:

   juju run --service keystone "sudo pkill -9 corosync"
   juju upgrade-charm hacluster-keystone && juju set hacluster-keystone corosync_transport=unicast

After a short while, your cluster should be reconfigured to use unicast, and all services should be back up and running. You'll also need to tidy up the old node records (the unicast ones will start at 1001 onwards - don't delete those):

   sudo crm node list (to list the node IDs)
   sudo crm configure
      delete <id>

This can be done from one of the nodes in the cluster.
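For reference, after switching the transport the rendered /etc/corosync/corosync.conf should move from a multicast interface block to udpu with an explicit nodelist; a sketch with placeholder addresses and nodeids:

   totem {
       version: 2
       transport: udpu
   }
   nodelist {
       node {
           ring0_addr: 10.0.0.11
           nodeid: 1000
       }
       node {
           ring0_addr: 10.0.0.12
           nodeid: 1001
       }
       node {
           ring0_addr: 10.0.0.13
           nodeid: 1002
       }
   }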

Changed in keystone (Juju Charms Collection):
status: Incomplete → Triaged
importance: Undecided → High
James Page (james-page)
summary: - Keystone HA - Corosync - 100% CPU consumption - HA not working
+ Corosync - 100% CPU consumption - HA not working
Revision history for this message
Paul Gear (paulgear) wrote :

I'm seeing this with unicast transport on hacluster charm r41. I'm currently trying to get sufficient logging out of corosync to diagnose the issue.

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Quick note: we are seeing this behavior even with unicast transport enabled on LXC containers.

Revision history for this message
Xiang Hui (xianghui) wrote :

We have reproduced this issue with unicast, but so far only with maas+lxc; it has not happened with local+lxc.
Deploy three keystone nodes with hacluster by setting 'corosync_transport=udpu':
unit        hostname              id    ip
keystone/0  juju-machine-1-lxc-0  1002  192.168.100.67
keystone/1  juju-machine-4-lxc-0  1001  192.168.100.69
keystone/2  juju-machine-5-lxc-0  1000  192.168.100.71

After adding the keystone-hacluster relation and waiting for a while, a split brain happened: keystone/1 and keystone/2 form one cluster, while keystone/0 is on its own.

keystone/1/2:
root@juju-machine-5-lxc-0:~# crm status
Online: [ juju-machine-4-lxc-0 juju-machine-5-lxc-0 ]

 Resource Group: grp_ks_vips
     res_ks_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-4-lxc-0
 Clone Set: cl_ks_haproxy [res_ks_haproxy]
     Started: [ juju-machine-4-lxc-0 juju-machine-5-lxc-0 ]

root@juju-machine-5-lxc-0:~# corosync-quorumtool -l
Membership information
----------------------
    Nodeid Votes Name
      1001 1 192.168.100.69
      1000 1 192.168.100.71 (local)

keystone/0:
root@juju-machine-1-lxc-0:~# crm status
Last updated: Fri Feb 6 08:18:26 2015
Last change: Fri Feb 6 07:17:28 2015 via crmd on juju-machine-1-lxc-0
Stack: corosync
Current DC: juju-machine-1-lxc-0 (1002) - partition WITHOUT quorum
Version: 1.1.10-42f2063
1 Nodes configured
2 Resources configured

Online: [ juju-machine-1-lxc-0 ]

 Resource Group: grp_ks_vips
     res_ks_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-1-lxc-0
 Clone Set: cl_ks_haproxy [res_ks_haproxy]
     Started: [ juju-machine-1-lxc-0 ]

root@juju-machine-1-lxc-0:~# corosync-quorumtool -l
(hang forever ....)

Turning to the keystone/0 node and checking the CPU, we find it is 100% used by corosync.
Tasks: 42 total, 2 running, 40 sleeping, 0 stopped, 0 zombie
%Cpu(s):100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 1017896 total, 932296 used, 85600 free, 19148 buffers
KiB Swap: 1770492 total, 5572 used, 1764920 free. 409312 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18637 root 20 0 704252 199272 34016 R 99.9 19.6 44:40.43 corosync

From the netstat output, one interesting finding is that the Recv-Q size has a value of 320256, which is higher than normal. One explanation found online is that the Recv-Q getting stuck at some point increased the CPU usage of the corosync process that uses the socket; we are not sure whether we should bump the Recv-Q buffer or whether this is related to some kind of bottleneck on the juju-br0 linux bridge interface.
And after simply doing pkill -9 corosync and restarting corosync/pacemaker, the whole cluster is back to normal.

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 320256 0 192.168.100.67:5434 0.0.0.0:* ...
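If it helps anyone reproducing this, a simple way to keep an eye on the corosync socket queues (generic commands, nothing charm-specific):

   # Poll the UDP sockets owned by corosync and watch Recv-Q/Send-Q grow
   sudo watch -n 5 'netstat -unap | grep corosync'
   # or, with iproute2:
   sudo watch -n 5 'ss -unap | grep corosync'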


Revision history for this message
Xiang Hui (xianghui) wrote :

The attachment syslog.100cpu above is the syslog on keystone/0 when the split brain shows up; syslog.after-restart-corosync is from after restarting corosync, when everything is back to normal on keystone/0.

Revision history for this message
Matt Rae (mattrae) wrote :

Noting that this bug happens with all charms which can use the hacluster subordinate charm, not just keystone.

Changed in hacluster (Juju Charms Collection):
status: New → Triaged
importance: Undecided → Critical
importance: Critical → High
no longer affects: keystone (Juju Charms Collection)
Revision history for this message
James Page (james-page) wrote :

I'd recommend setting the cluster_count configuration to 3 - this will ensure that any isolated node without quorum will shut down its services, rather than assuming that it's the only one left.
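Assuming a subordinate named hacluster-keystone as in the examples above, that would be:

   juju set hacluster-keystone cluster_count=3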

Revision history for this message
Billy Olsen (billy-olsen) wrote :

I spent a while trying to recreate this tonight, and while I didn't recreate the split brain, I did recreate corosync going into a 100% CPU spin. In this scenario, corosync locked up early, pacemaker was unable to communicate with corosync, and crm -S would continually fail. I gathered a core dump of the process while it was running and got the current backtrace of the process from an attached gdb session. Unfortunately corosync was not in debug mode, but I will attach an sosreport and a core dump I triggered from the machine.
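For anyone wanting to capture the same data, a rough sequence with stock gdb (no charm-specific tooling; the core file path is just an example) is:

   # Attach to the spinning corosync process, dump all thread backtraces,
   # and write a core file for offline analysis
   sudo gdb -p $(pidof corosync) \
       -ex 'set pagination off' \
       -ex 'thread apply all bt' \
       -ex 'gcore /tmp/corosync.core' \
       -ex detach -ex quit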

Revision history for this message
Billy Olsen (billy-olsen) wrote :
Revision history for this message
Billy Olsen (billy-olsen) wrote :
Revision history for this message
Billy Olsen (billy-olsen) wrote :
Changed in hacluster (Juju Charms Collection):
importance: High → Critical
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

From the core generated by @billy-olsen it seems that corosync thread 2 is looping on rrp_* and then jumping into joinlist_inform_clients for every process in the group (crmd, pacemaker, and so on) as part of the sync barrier stage.

What I don't know at this specific point (because there are no logs) is whether the service synchronization was marked as done or was rejected for some reason. What's interesting is that this phase occurs right after the totem ring is formed.

Related to this, there is an interesting upstream patch to make sure that nodes which have left are really removed when the cpg process exits [0].

From @xianghui's syslog, the only interesting thing I am observing is a continuous loop over:

Sep 03 17:28:15 [22111] dev-cluster2-node4 corosync debug [QB ] qb_ipcs_disconnect(22112-22133-19) state:2
Sep 03 17:28:15 [22111] dev-cluster2-node4 corosync debug [QB ] epoll_ctl(del): Bad file descriptor (9)
Sep 03 17:28:15 [22111] dev-cluster2-node4 corosync debug [MAIN ] cs_ipcs_connection_closed()

This last one could be related to this upstream patch [1]; it is worth checking whether the Ubuntu version includes the fixes for both bugs.

[0] http://lists.corosync.org/pipermail/discuss/2014-February/003019.html
[1] https://bugzilla.redhat.com/show_bug.cgi?id=959189

If, in the meantime, anybody from the community wants to help hunt this bug, please proceed as follows:

1) Enable corosync debugging; with juju, use 'juju set hacluster debug=True'

2) Once you reach the split-brain scenario, please run the following commands on all of the cluster's nodes:
2.1) sudo crm_report -f 00:00

This will generate a file named pcmk-<datetime> in the current working directory. Please attach this file to this case.

2.2) sudo killall -USR1 crmd
2.3) sudo killall -TRAP crmd

This command will generate a directory under /var/lib/pacemaker/blackbox/crmd-*. Please attach that directory (/var/lib/pacemaker/blackbox/crmd-*) to this case.

3) Please make sure to grab a full sosreport at this point, and also compress /var/lib/pacemaker and /var/lib/corosync and upload them here (a consolidated sketch of these steps is below).
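For convenience, a rough consolidation of the above, run on each node once the split brain is visible (the tarball name is just an example):

   sudo crm_report -f 00:00        # produces pcmk-<datetime> in the current directory
   sudo killall -USR1 crmd         # enable the crmd blackbox
   sudo killall -TRAP crmd         # dump it under /var/lib/pacemaker/blackbox/crmd-*
   sudo sosreport
   sudo tar czf /tmp/ha-state.tar.gz /var/lib/pacemaker /var/lib/corosync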

Thanks

Revision history for this message
Matt Rae (mattrae) wrote :

We were originally on an older version of the charms when we ran into this issue. Noting that after using a newer version of the charms, we haven't seen this issue so far.

It's difficult to tell the history of the newer charms because it appears that the bzr repositories were reinitialized at one point, so most of the log is missing. I was able to confirm, though, that the newer versions contain the fix for lp: 1394008.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

So we have been seeing this kind of behaviour in a number of places over the last few months, specifically when running corosync clusters in lxc containers. Bumping the lxc veth MTU to a higher value than the one inside the container has proven to be a decent workaround up until now, and support for this exists in Juju (although it is broken and in the process of being fixed [1]). A number of people have pinpointed some issues in corosync itself, but the biggest culprit looks to be [2], for which we now have an SRU in trusty-proposed. Please upgrade to this package (and manually resolve any corosync cluster issues - restarting seems to fix them) and report back if you still see this issue.

[1] https://bugs.launchpad.net/juju-core/+bug/1442257
[2] https://bugs.launchpad.net/ubuntu/+source/libqb/+bug/1341496
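For illustration, the workaround and the package update would look roughly like this (the veth name is a placeholder for the host-side interface of the affected container, the MTU value is just an example, and the libqb runtime package is assumed to be libqb0):

   # On the LXC host: raise the MTU of the container's veth above the MTU inside the container
   sudo ip link set dev vethXXXXXX mtu 9000

   # On the corosync nodes, with trusty-proposed enabled: pull in the libqb fix from [2]
   sudo apt-get update && sudo apt-get install -t trusty-proposed libqb0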

Changed in hacluster (Juju Charms Collection):
importance: Critical → Medium
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Since we have not had reports of this problem for a while, particularly since the aforementioned fixes were released, I am going to mark this as no longer an issue. If anybody hits this issue again, please reopen the ticket.

Changed in hacluster (Juju Charms Collection):
status: Triaged → Invalid