Activity log for bug #1341496

Date Who What changed Old value New value Message
2014-07-14 09:30:13 Tomasz Kontusz bug added bug
2014-08-04 18:33:22 Launchpad Janitor libqb (Ubuntu): status New Confirmed
2014-09-12 15:45:42 Robie Basak libqb (Ubuntu): assignee Kick In (kick-d)
2014-09-19 10:29:17 zblk bug added subscriber zblk
2014-10-20 10:42:00 Adrián Santos Marrero bug added subscriber Adrián Santos Marrero
2014-11-05 10:12:20 Thilo Uttendorfer bug added subscriber Thilo Uttendorfer
2014-11-30 08:44:45 Kees B bug added subscriber Kees B
2014-12-05 11:20:23 Roberto Suarez bug added subscriber Roberto Suarez
2015-02-19 13:19:33 devweasel bug added subscriber Alexander J. Trentini
2015-03-02 19:49:38 Markus bug added subscriber Markus
2015-03-03 10:51:28 Merritt Krakowitzer bug added subscriber Merritt Krakowitzer
2015-03-04 10:43:06 Dennis S. bug added subscriber Dennis S.
2015-04-23 10:14:14 Mario Splivalo bug added subscriber Mario Splivalo
2015-04-28 16:43:13 Billy Olsen description

Old value:

    $ lsb_release -rd
    Description: Ubuntu 14.04 LTS
    Release: 14.04

    $ apt-cache policy libqb0
    libqb0:
      Installed: 0.16.0.real-1ubuntu3
      Candidate: 0.16.0.real-1ubuntu3
      Version table:
     *** 0.16.0.real-1ubuntu3 0
            500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
            100 /var/lib/dpkg/status

    Corosync sometimes hangs inside libqb. I've looked at a hung process with gdb, and I think I've found the problem.
    The problem is the loop here: https://github.com/ClusterLabs/libqb/blob/v0.16.0/lib/ringbuffer.c#L451
    This was fixed in 0.17.0, see: https://github.com/ClusterLabs/libqb/blob/v0.17.0/lib/ringbuffer.c#L451
    I think bumping to 0.17.0 should fix this (at least in backports? Please?)

New value:

    $ lsb_release -rd
    Description: Ubuntu 14.04 LTS
    Release: 14.04

    $ apt-cache policy libqb0
    libqb0:
      Installed: 0.16.0.real-1ubuntu3
      Candidate: 0.16.0.real-1ubuntu3
      Version table:
     *** 0.16.0.real-1ubuntu3 0
            500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
            100 /var/lib/dpkg/status

    Corosync sometimes hangs inside libqb. I've looked at a hung process with gdb, and I think I've found the problem.
    The problem is the loop here: https://github.com/ClusterLabs/libqb/blob/v0.16.0/lib/ringbuffer.c#L451
    This was fixed in 0.17.0, see: https://github.com/ClusterLabs/libqb/blob/v0.17.0/lib/ringbuffer.c#L451
    I think bumping to 0.17.0 should fix this (at least in backports? Please?)

    --------------------------------------------------------------------------

    [Impact]

     * libqb does not currently handle ring buffer allocation errors properly. As a result, corosync frequently ends up in an infinite loop (consuming 100% CPU): it repeatedly tries and fails to allocate space from the ring buffer, because the logic that runs when an attempt to reclaim space fails is wrong. This patch ensures that when the reclaim fails, libqb errors out gracefully and allows corosync to proceed with execution.

     * This is fixed by cherry-picking the following 2 commits:
       - https://github.com/ClusterLabs/libqb/commit/00082df49f045053d03bba7713bfff35d2448459
       - https://github.com/ClusterLabs/libqb/commit/47c690dbbc75957ac2354844b8fbf0a9f4791a87

    [Test Case]

    There is a test case in comment #2. A simple way I found to recreate the problem (I used juju to replicate):

     1. Deploy a 2 node percona-cluster w/ corosync and pacemaker.
     2. Scale the number of units from 2 to 5 nodes.
     3. Observe that one of the corosync instances reaches 100% CPU usage and stays stuck.

    e.g.

    juju bootstrap
    # install percona-cluster
    juju deploy -n 2 cs:trusty/percona-cluster
    juju deploy cs:trusty/hacluster
    # configure corosync to use unicast for communication
    juju set hacluster corosync_transport=udpu
    # configure the virtual ip for corosync
    juju set percona-cluster vip=<your-vip>
    # cause juju to configure the corosync/pacemaker configuration with percona-cluster.
    juju add-relation percona-cluster hacluster
    # wait for juju debug-log to go quiet.
    # then expand the cluster by 3 nodes.
    juju add-unit -n 3 percona-cluster

    [Regression Potential]

     * As a result of the changes, a blackbox log entry may be dropped, or a ring may be discarded and a new ring created.
       - If a log entry is dropped, some information may be missing from the blackbox used later for analysis. However, upstream has determined that missing a log entry is preferable to hanging the corosync process.
       - Rings are discarded as part of the normal corosync communication process, and corosync already knows how to handle this situation properly, so the risk in this area is small.
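The failure mode described under [Impact] can be illustrated with a toy model. The sketch below is not libqb code and does not reproduce the cherry-picked commits; ToyRing, reclaim_oldest, alloc and the pinned flag are hypothetical names made up for this illustration. It only shows the general idea from the description: when reclaiming space fails, the allocation returns an error instead of retrying forever.

```python
import errno


class ToyRing:
    """Hypothetical stand-in for a ring buffer; not libqb's data structure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.chunks = []      # sizes of committed chunks, oldest first
        self.pinned = False   # when True, the oldest chunk cannot be reclaimed

    def free_space(self):
        return self.capacity - sum(self.chunks)

    def reclaim_oldest(self):
        """Drop the oldest chunk; return 0 on success or a negative errno."""
        if not self.chunks or self.pinned:
            return -errno.EAGAIN
        self.chunks.pop(0)
        return 0

    def alloc(self, size):
        """Reserve `size` units, reclaiming old chunks if needed.

        Returns 0 on success or a negative errno. A failed reclaim ends
        the loop; ignoring that result is what would make it spin forever.
        """
        if size > self.capacity:
            return -errno.ENOMEM
        while self.free_space() < size:
            rc = self.reclaim_oldest()
            if rc != 0:
                return rc     # give up and let the caller carry on
        self.chunks.append(size)
        return 0
```

In this model a caller that receives a negative return value would drop the entry it was trying to write, which corresponds to the dropped blackbox log entry mentioned under [Regression Potential].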
2015-04-28 16:44:57 Billy Olsen attachment added lp1341496.debdiff https://bugs.launchpad.net/ubuntu/+source/libqb/+bug/1341496/+attachment/4386861/+files/lp1341496.debdiff
2015-04-28 16:49:31 Billy Olsen bug added subscriber Ubuntu Sponsors Team
2015-04-28 16:59:32 Chris J Arges nominated for series Ubuntu Utopic
2015-04-28 16:59:32 Chris J Arges bug task added libqb (Ubuntu Utopic)
2015-04-28 16:59:32 Chris J Arges nominated for series Ubuntu Trusty
2015-04-28 16:59:32 Chris J Arges bug task added libqb (Ubuntu Trusty)
2015-04-28 17:00:10 Chris J Arges libqb (Ubuntu Trusty): importance Undecided Medium
2015-04-28 17:00:12 Chris J Arges libqb (Ubuntu Utopic): importance Undecided Medium
2015-04-28 17:00:15 Chris J Arges libqb (Ubuntu Trusty): status New In Progress
2015-04-28 17:00:17 Chris J Arges libqb (Ubuntu Utopic): status New In Progress
2015-04-28 17:16:20 Chris J Arges removed subscriber Ubuntu Sponsors Team
2015-04-29 01:28:03 Benjamin Kendinibilir bug added subscriber Benjamin Kendinibilir
2015-04-29 07:17:02 Vytas bug added subscriber Vytas
2015-05-01 19:44:30 Brian Murray libqb (Ubuntu): status Confirmed Fix Released
2015-05-01 19:44:43 Brian Murray libqb (Ubuntu Trusty): status In Progress Fix Committed
2015-05-01 19:44:45 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2015-05-01 19:44:47 Brian Murray bug added subscriber SRU Verification
2015-05-01 19:44:57 Brian Murray tags verification-needed
2015-05-05 13:01:30 Thomas bug added subscriber Thomas Klaver
2015-05-06 14:54:08 Nobuto Murata bug added subscriber Nobuto Murata
2015-05-07 17:46:49 Rahul Krishna Upadhyaya bug added subscriber Rahul Krishna Upadhyaya
2015-05-11 22:08:47 Dean Henrichsmeyer bug added subscriber Landscape
2015-05-12 23:36:20 Matt Rae tags verification-needed verification-done
2015-05-13 15:12:18 Launchpad Janitor libqb (Ubuntu Trusty): status Fix Committed Fix Released
2015-05-13 15:12:21 Chris J Arges removed subscriber Ubuntu Stable Release Updates Team
2015-05-24 17:13:47 Launchpad Janitor branch linked lp:ubuntu/trusty-proposed/libqb
2016-04-24 10:46:50 Rolf Leggewie libqb (Ubuntu Utopic): status In Progress Won't Fix