Context:
- OS deploy: cloud:trusty-mitaka
- juju version: 1.25.5
- affected service: swift-proxy (already at stable/17.02)
I've upgraded several of the *-hacluster subordinate
services from lp:charms/trusty/hacluster;revno=56
to stable/17.02. One of them showed the following
state after the upgrade, hinting at a duplicate node
addition: the same uname registered under a second,
different node id (note the ids in the 1000s range):
* note the same nodes listed as both Online and OFFLINE:
~# crm status
Last updated: Mon Mar 27 15:00:18 2017
Last change: Mon Mar 27 14:52:02 2017 via crmd on juju-machine-0-lxc-22
Stack: corosync
Current DC: juju-machine-2-lxc-19 (1011) - partition with quorum
Version: 1.1.10-42f2063
6 Nodes configured
7 Resources configured
Online: [ juju-machine-0-lxc-22 juju-machine-1-lxc-20 juju-machine-2-lxc-19 ]
OFFLINE: [ juju-machine-0-lxc-22 juju-machine-1-lxc-20 juju-machine-2-lxc-19 ]
Resource Group: grp_swift_vips
res_swift_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-1-lxc-20
Clone Set: cl_swift_haproxy [res_swift_haproxy]
Started: [ juju-machine-0-lxc-22 juju-machine-1-lxc-20 juju-machine-2-lxc-19 ]
* duplicated nodes confirmed:
~# crm node status
<nodes>
<node id="174427188" uname="juju-machine-2-lxc-19"/>
<node id="174427191" uname="juju-machine-1-lxc-20"/>
<node id="174427193" uname="juju-machine-0-lxc-22"/>
<node id="1009" uname="juju-machine-1-lxc-20"/>
<node id="1011" uname="juju-machine-2-lxc-19"/>
<node id="1010" uname="juju-machine-0-lxc-22"/>
</nodes>
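A quick way to spot this condition is to look for unames that appear under more than one node id. A minimal sketch: the heredoc below reproduces the node list from this report; on a live cluster you would pipe `crm node status` instead.

```shell
# Duplicate-uname check; the heredoc stands in for live
# `crm node status` output.
nodes_xml=$(cat <<'EOF'
<nodes>
<node id="174427188" uname="juju-machine-2-lxc-19"/>
<node id="174427191" uname="juju-machine-1-lxc-20"/>
<node id="174427193" uname="juju-machine-0-lxc-22"/>
<node id="1009" uname="juju-machine-1-lxc-20"/>
<node id="1011" uname="juju-machine-2-lxc-19"/>
<node id="1010" uname="juju-machine-0-lxc-22"/>
</nodes>
EOF
)
# Each uname printed here is registered under more than one node id.
echo "$nodes_xml" | grep -o 'uname="[^"]*"' | sort | uniq -d
```

A healthy cluster prints nothing; here all three unames come back as duplicates.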
* manually removing the "extra" nodes
(after verifying that the node ids in corosync.conf are the ones in the 1000s, so the high-numbered entries are stale)
~# cibadmin --delete --obj_type nodes --crm_xml '<node id="174427188">'
~# cibadmin --delete --obj_type nodes --crm_xml '<node id="174427191">'
~# cibadmin --delete --obj_type nodes --crm_xml '<node id="174427193">'
=> then got a clean crm status back.
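With more than a handful of stale entries, the deletions above can be generated in a loop. This sketch only prints the commands for review rather than executing them (drop the leading `echo` to run them for real); the ids are the stale ones from this report:

```shell
# Dry-run: print the cibadmin deletion for each stale node id.
# Remove the leading `echo` to actually execute them.
for id in 174427188 174427191 174427193; do
  echo cibadmin --delete --obj_type nodes --crm_xml "<node id=\"$id\">"
done
```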
This was caused by the switch of the default transport from multicast to unicast in the hacluster charm, made for the 16.10 charm release:
https://docs.openstack.org/developer/charm-guide/1610.html
although the release notes at that link do not mention duplicate nodes in corosync.
The charm should probably make some effort to clean up the old node entries when switching between transport modes.
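One possible shape for that cleanup: treat any CIB node id that is absent from the corosync.conf nodelist as stale and delete it. A bash sketch (process substitution needs bash); the id lists here are sample data copied from this report, standing in for what a real implementation would read from `cibadmin --query` and from parsing corosync.conf:

```shell
#!/bin/bash
# Stale-node detection: ids present in the CIB but not in corosync.conf.
# The two variables below are sample data from this report; a real
# implementation would populate them from the live cluster instead.
cib_ids="174427188 174427191 174427193 1009 1010 1011"
corosync_ids="1009 1010 1011"
# comm -23 keeps lines unique to the first input: the stale ids.
stale=$(comm -23 <(printf '%s\n' $cib_ids | sort) \
                 <(printf '%s\n' $corosync_ids | sort))
echo "$stale"
```

The resulting list would then be fed to the same `cibadmin --delete` call used in the manual workaround above.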