Could not patch cib; leading to no haproxy running

Bug #1526271 reported by Adam Collard
This bug affects 1 person
Affects                             Status        Importance  Assigned to  Milestone
hacluster (Juju Charms Collection)  Invalid       Undecided   Unassigned
haproxy (Ubuntu)                    Fix Released  Critical    Unassigned
  Trusty                            Fix Released  Critical    James Page
  Vivid                             Invalid       Critical    Unassigned
  Wily                              Invalid       Critical    Unassigned
  Xenial                            Fix Released  Critical    Unassigned

Bug Description

Using Juju 1.24.7, cs:trusty/hacluster-26, with attached Juju status.

The hacluster unit log shows that it failed to configure the no-quorum-policy, which then led to haproxy not running on the unit that was the leader and held the VIP. The symptom I saw was "Connection refused" when talking to the VIP.

2015-12-15 00:19:22 DEBUG juju-log hanode:15: Ready to form cluster - informing peers
2015-12-15 00:19:22 DEBUG juju-log hanode:15: Parsing cluster configuration using rid: ha:26, unit: keystone/0
2015-12-15 00:19:23 DEBUG juju-log hanode:15: Configuring and (maybe) restarting corosync
2015-12-15 00:19:23 INFO juju-log hanode:15: Writing file /etc/default/corosync root:root 444
2015-12-15 00:19:23 INFO juju-log hanode:15: Writing file /etc/corosync/uidgid.d/hacluster root:root 444
2015-12-15 00:19:23 INFO juju-log hanode:15: Writing file /etc/corosync/authkey root:root 400
2015-12-15 00:19:23 INFO juju-log hanode:15: Writing file /etc/corosync/corosync.conf root:root 444
2015-12-15 00:19:23 INFO hanode-relation-changed * Restarting corosync daemon corosync
2015-12-15 00:19:23 INFO hanode-relation-changed notice [MAIN ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
2015-12-15 00:19:23 INFO hanode-relation-changed info [MAIN ] Corosync built-in features: dbus testagents rdma watchdog augeas pie relro bindnow
2015-12-15 00:19:23 INFO hanode-relation-changed ...done.
2015-12-15 00:19:28 INFO hanode-relation-changed Starting Pacemaker Cluster Manager: [ OK ]
2015-12-15 00:19:28 DEBUG juju-log hanode:15: Applying global cluster configuration
2015-12-15 00:19:28 DEBUG juju-log hanode:15: Configuring no-quorum-policy to stop
2015-12-15 00:19:28 INFO hanode-relation-changed Call cib_apply_diff failed (-206): Application of an update diff failed
2015-12-15 00:19:28 INFO hanode-relation-changed ERROR: could not patch cib (rc=206)
2015-12-15 00:19:28 INFO hanode-relation-changed INFO: offending xml diff: <diff crm_feature_set="3.0.7">
2015-12-15 00:19:28 INFO hanode-relation-changed <diff-removed>
2015-12-15 00:19:28 INFO hanode-relation-changed <cib epoch="22"/>
2015-12-15 00:19:28 INFO hanode-relation-changed </diff-removed>
2015-12-15 00:19:28 INFO hanode-relation-changed <diff-added>
2015-12-15 00:19:28 INFO hanode-relation-changed <cib epoch="23" num_updates="10" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Tue Dec 15 00:19:29 2015" update-origin="juju-machine-0-lxc-8" update-client="cibadmin" have-quorum="1" dc-uuid="1002"/>
2015-12-15 00:19:28 INFO hanode-relation-changed </diff-added>
2015-12-15 00:19:28 INFO hanode-relation-changed </diff>
2015-12-15 00:19:28 INFO hanode-relation-changed
2015-12-15 00:19:28 INFO hanode-relation-changed
2015-12-15 00:19:29 DEBUG juju-log hanode:15: Checking monitor host configuration
2015-12-15 00:19:29 INFO juju-log hanode:15: Disabling STONITH
2015-12-15 00:19:29 DEBUG juju-log hanode:15: Deleting Resources
2015-12-15 00:19:29 DEBUG juju-log hanode:15: Configuring Resources: {'res_ks_eth0_vip': 'ocf:heartbeat:IPaddr2', 'res_ks_haproxy': 'lsb:haproxy'}
2015-12-15 00:19:29 INFO hanode-relation-changed Removing any system startup links for /etc/init.d/haproxy ...
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc0.d/K20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc1.d/K20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc2.d/S20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc3.d/S20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc4.d/S20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc5.d/S20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed /etc/rc6.d/K20haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed * Stopping haproxy haproxy
2015-12-15 00:19:29 INFO hanode-relation-changed ...done.

--

keystone/2 has a running haproxy and is behaving as expected

Error looks remarkably similar to https://bugs.launchpad.net/fuel/+bug/1363908 where they moved away from using crm:

"We need to use cibadmin -P instead of crm to avoid such problems as it can lead to cluster in unconfigured state and to following problems with cluster scalability and failover."

tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Adam Collard (adam-collard) wrote :

FWIW, manually starting haproxy seems to have helped

description: updated
David Britton (dpb) wrote :

It's looking like this is caused by haproxy (1.4.24-2ubuntu0.3) trusty; urgency=medium which landed in trusty-updates on 09-DEC-2015

James Page (james-page) wrote :

The EXIT trap in the init script calls 'exit' directly, which overrides the return code of the actual operation.

This is already fixed in Xenial and Wily; looking at Vivid and Trusty now.
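
The effect of an EXIT trap that calls exit can be reproduced with a minimal sketch. This assumes the trap handler ends with an explicit `exit 0`; the two `sh -c` one-liners and the function names are hypothetical stand-ins, not the actual debian/haproxy.init code.

```shell
#!/bin/sh
# Hypothetical stand-ins for an init script; NOT the real haproxy init script.

# Buggy pattern: the EXIT trap calls "exit 0" itself, which replaces
# the status the script was about to return (3 here).
buggy_status() {
    sh -c 'trap "exit 0" EXIT; exit 3'
}

# Safer pattern: the trap only runs a no-op (where cleanup would go)
# and never calls exit, so the original status 3 is preserved.
fixed_status() {
    sh -c 'trap ":" EXIT; exit 3'
}

rc=0; buggy_status || rc=$?
echo "buggy: $rc"    # prints "buggy: 0"

rc=0; fixed_status || rc=$?
echo "fixed: $rc"    # prints "fixed: 3"
```

Exit code 3 is used here because it is the LSB init-script status code for "program is not running"; a caller that only sees 0 has no way to notice that the service is down.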

Changed in haproxy (Ubuntu Xenial):
status: New → Fix Released
Changed in haproxy (Ubuntu Wily):
importance: Undecided → Critical
Changed in haproxy (Ubuntu Trusty):
importance: Undecided → Critical
Changed in haproxy (Ubuntu Vivid):
importance: Undecided → Critical
Changed in haproxy (Ubuntu Xenial):
importance: Undecided → Critical
Changed in haproxy (Ubuntu Wily):
status: New → Invalid
Changed in haproxy (Ubuntu Trusty):
status: New → Confirmed
James Page (james-page) wrote :

Only impacts trusty; all other series have the later version of the fix for bug 1481737

Changed in haproxy (Ubuntu Vivid):
status: New → Invalid
James Page (james-page) wrote :

SRU information

[Impact]
Management of haproxy via tools such as corosync and pacemaker is currently broken because the init script always returns exit code 0, so these tools cannot tell whether haproxy is running or not.

[Test Case]
sudo apt-get install haproxy
<edit /etc/haproxy/haproxy.cfg to have listeners>
sudo service haproxy status -> will always return code 0
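
The check in the test case can be made explicit with a small helper. This is only a sketch: `report_status` is a hypothetical function, not part of haproxy or the SRU tooling, and it reads the exit code using the LSB init-script convention for the status action (0 = running, 3 = not running).

```shell
#!/bin/sh
# report_status is a hypothetical helper for illustration only.
# It runs the given command, discards its output, and prints how the
# exit code reads under the LSB "status" convention.
report_status() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "running (exit code 0)" ;;
        3) echo "not running (exit code 3)" ;;
        *) echo "other status (exit code $rc)" ;;
    esac
    return "$rc"
}
```

It could be invoked as `report_status sudo service haproxy status` after stopping haproxy; with the affected package the expectation from this report is that it still reports exit code 0.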

[Regression Potential]
Equivalent behaviour already in >= vivid, so minimal.

Changed in haproxy (Ubuntu Trusty):
status: Confirmed → In Progress
assignee: nobody → James Page (james-page)
Changed in hacluster (Juju Charms Collection):
status: New → Invalid
James Page (james-page) wrote :

David - we should have this into proposed in the next hour or so - can you tweak your testing to use trusty-proposed?

Nice to have validation that it resolves your specific issue as well.

Adam Conrad (adconrad) wrote : Please test proposed package

Hello Adam, or anyone else affected,

Accepted haproxy into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/haproxy/1.4.24-2ubuntu0.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in haproxy (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
David Britton (dpb) wrote :

I have installed haproxy as part of an openstack (icehouse/trusty) installation using the trusty-proposed archive. We no longer see the issue with the updated package:

Package version tested:

1.4.24-2ubuntu0.4

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package haproxy - 1.4.24-2ubuntu0.4

---------------
haproxy (1.4.24-2ubuntu0.4) trusty; urgency=medium

  * debian/haproxy.init: Ensure that EXIT trap does not override the
    return status of the init script, which causes issues in tools that
    check return codes such as pacemaker (LP: #1526271).

 -- James Page <email address hidden> Tue, 15 Dec 2015 15:07:13 +0000

Changed in haproxy (Ubuntu Trusty):
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Update Released

The verification of the Stable Release Update for haproxy has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
