HA deploys unreliable, pacemaker dies
Bug #1421488 reported by Paul Gear
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack HA Cluster Charm | Fix Released | Undecided | Billy Olsen |
hacluster (Juju Charms Collection) | Invalid | Undecided | Billy Olsen |
keystone (Juju Charms Collection) | Invalid | Undecided | Unassigned |
Bug Description
In fresh deploys of HA keystone, I'm getting unreliable behaviour. One of the most common problems is that pacemaker dies on the 3rd node. Symptoms of this are: higher load on the failing node, pacemaker down, and the hanode-
tags: removed: backport-potential
Changed in charm-hacluster:
assignee: nobody → Billy Olsen (billy-olsen)
status: New → Confirmed
Changed in hacluster (Juju Charms Collection):
status: Confirmed → Invalid
Here's the output of 'crm status' on all 3 nodes:
root@juju-machine-0-lxc-6:~# crm status
Last updated: Fri Feb 13 02:09:47 2015
Last change: Fri Feb 13 01:39:53 2015 via crmd on juju-machine-0-lxc-6
Stack: corosync
Current DC: juju-machine-1-lxc-3 (1000) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured
Online: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 ]
 Resource Group: grp_ks_vips
     res_ks_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-0-lxc-6
 Clone Set: cl_ks_haproxy [res_ks_haproxy]
     Started: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 ]
root@juju-machine-1-lxc-3:~# crm status
Last updated: Fri Feb 13 02:10:51 2015
Last change: Fri Feb 13 01:39:53 2015 via crmd on juju-machine-0-lxc-6
Stack: corosync
Current DC: juju-machine-1-lxc-3 (1000) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured
Online: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 ]
 Resource Group: grp_ks_vips
     res_ks_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-0-lxc-6
 Clone Set: cl_ks_haproxy [res_ks_haproxy]
     Started: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 ]
root@juju-machine-2-lxc-6:~# crm status
Could not establish cib_ro connection: Connection refused (111)
ERROR: crm_mon exited with code 107 and said: Connection to cluster failed: Transport endpoint is not connected
And, restarting pacemaker on node 2:
root@juju-machine-2-lxc-6:~# service pacemaker restart
Pacemaker Cluster Manager is already stopped[ OK ]
Starting Pacemaker Cluster Manager: [ OK ]
root@juju-machine-2-lxc-6:~# crm status
Last updated: Fri Feb 13 02:12:05 2015
Last change: Fri Feb 13 02:12:04 2015 via crmd on juju-machine-1-lxc-3
Stack: corosync
Current DC: juju-machine-1-lxc-3 (1000) - partition with quorum
Version: 1.1.10-42f2063
3 Nodes configured
4 Resources configured
Online: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 juju-machine-2-lxc-6 ]
 Resource Group: grp_ks_vips
     res_ks_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-0-lxc-6
 Clone Set: cl_ks_haproxy [res_ks_haproxy]
     Started: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 juju-machine-2-lxc-6 ]
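
For anyone wanting to spot this condition automatically, here is a minimal sketch (not part of the charm) that reads saved 'crm status' text and reports which expected nodes are absent from the "Online:" line. The function names online_nodes and missing_nodes are hypothetical, the expected node list is assumed from the hostnames in this report, and the parsing assumes the single-line "Online: [ ... ]" format shown in the output earlier in this report.

```python
import re


def online_nodes(status_text):
    """Return the node names on the 'Online: [ ... ]' line, or [] if there is none."""
    match = re.search(r"^Online:\s*\[(.*?)\]", status_text, re.MULTILINE)
    return match.group(1).split() if match else []


def missing_nodes(status_text, expected):
    """Return the expected nodes that the status text does not list as online."""
    online = set(online_nodes(status_text))
    return [node for node in expected if node not in online]


# Sample based on the two-node output in this report (node 2 had not joined).
sample = """\
2 Nodes configured
3 Resources configured
Online: [ juju-machine-0-lxc-6 juju-machine-1-lxc-3 ]
"""

# Assumed expected membership: the three hostnames seen in this report.
expected = [
    "juju-machine-0-lxc-6",
    "juju-machine-1-lxc-3",
    "juju-machine-2-lxc-6",
]

print(missing_nodes(sample, expected))
```

With the sample above, the node reported as missing is juju-machine-2-lxc-6. If 'crm status' fails outright, as it did on node 2, there is no "Online:" line and every expected node is reported missing.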