duplicated node list after upgrade-charm to stable/17.02

Bug #1676529 reported by JuanJo Ciarlante
This bug affects 1 person
Affects: OpenStack HA Cluster Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

Context:
- OS deploy: cloud:trusty-mitaka
- juju version: 1.25.5
- affected service: swift-proxy (already at stable/17.02)

I've upgraded several of the *-hacluster subordinate
services from lp:charms/trusty/hacluster;revno=56
to stable/17.02. One of them showed the following
state after the upgrade, hinting at a duplicate node
addition: the same uname appears twice with different
node ids (note the ids in the 1000s range):

* note the same nodes listed as both Online and OFFLINE:
~# crm status
Last updated: Mon Mar 27 15:00:18 2017
Last change: Mon Mar 27 14:52:02 2017 via crmd on juju-machine-0-lxc-22
Stack: corosync
Current DC: juju-machine-2-lxc-19 (1011) - partition with quorum
Version: 1.1.10-42f2063
6 Nodes configured
7 Resources configured

Online: [ juju-machine-0-lxc-22 juju-machine-1-lxc-20 juju-machine-2-lxc-19 ]
OFFLINE: [ juju-machine-0-lxc-22 juju-machine-1-lxc-20 juju-machine-2-lxc-19 ]

 Resource Group: grp_swift_vips
     res_swift_eth0_vip (ocf::heartbeat:IPaddr2): Started juju-machine-1-lxc-20
 Clone Set: cl_swift_haproxy [res_swift_haproxy]
     Started: [ juju-machine-0-lxc-22 juju-machine-1-lxc-20 juju-machine-2-lxc-19 ]

* duplicated nodes indeed

~# crm node status
<nodes>
  <node id="174427188" uname="juju-machine-2-lxc-19"/>
  <node id="174427191" uname="juju-machine-1-lxc-20"/>
  <node id="174427193" uname="juju-machine-0-lxc-22"/>
  <node id="1009" uname="juju-machine-1-lxc-20"/>
  <node id="1011" uname="juju-machine-2-lxc-19"/>
  <node id="1010" uname="juju-machine-0-lxc-22"/>
</nodes>

* manually removing the "extra" nodes
 (after verifying that the node ids in corosync.conf are the ones in the 1000s range)
~# cibadmin --delete --obj_type nodes --crm_xml '<node id="174427188">'
~# cibadmin --delete --obj_type nodes --crm_xml '<node id="174427191">'
~# cibadmin --delete --obj_type nodes --crm_xml '<node id="174427193">'

=> then got a clean crm status back.
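
A rough shell sketch of the manual cleanup above, for anyone hitting the
same state: it assumes the node ids declared in /etc/corosync/corosync.conf
are the ones to keep (the 1000s range here) and reuses the same crm/cibadmin
invocations shown in this report; treat it as illustrative, not something
the charm ships.

#!/bin/bash
# node ids actually declared in corosync.conf (the ones to keep)
keep_ids=$(awk '/nodeid:/ {print $2}' /etc/corosync/corosync.conf)

# every node id currently present in the CIB node list
cib_ids=$(crm node status | grep -o 'id="[0-9]*"' | grep -o '[0-9]*')

# delete any CIB node entry whose id is not declared in corosync.conf
for id in $cib_ids; do
    if ! echo "$keep_ids" | grep -qx "$id"; then
        echo "removing stale node id $id"
        cibadmin --delete --obj_type nodes --crm_xml "<node id=\"$id\">"
    fi
done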

James Page (james-page) wrote:

This was due to a switch in the default transport from multicast to unicast in the hacluster charm made for the 16.10 charm release:

  https://docs.openstack.org/developer/charm-guide/1610.html

although the release notes at the link above do not mention the duplicate-node issue in corosync.

The charm should probably make some effort to clean up the old nodes when switching between modes.
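
For reference, the duplicates line up with how corosync assigns node ids
under each transport: with the multicast default the id is derived from the
unit's ring0 IP address (which is what the large 1744271xx ids above look
like), while the unicast setup carries an explicit nodelist with fixed
nodeids (the 1000s-range ids the charm writes into corosync.conf). A rough
corosync.conf sketch, with purely illustrative addresses:

# multicast (old default): no nodelist, node ids auto-derived from the IP
totem {
    version: 2
    transport: udp
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.3.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

# unicast (16.10 onwards): explicit nodelist with fixed node ids
totem {
    version: 2
    transport: udpu
}
nodelist {
    node {
        ring0_addr: 10.0.3.52
        nodeid: 1000
    }
}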

Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 17.05
James Page (james-page)
Changed in charm-hacluster:
milestone: 17.05 → 17.08
James Page (james-page)
Changed in charm-hacluster:
milestone: 17.08 → 17.11
James Page (james-page)
Changed in charm-hacluster:
milestone: 17.11 → 18.02
Ryan Beisner (1chb1n)
Changed in charm-hacluster:
milestone: 18.02 → 18.05
David Ames (thedac)
Changed in charm-hacluster:
milestone: 18.05 → 18.08
James Page (james-page)
Changed in charm-hacluster:
milestone: 18.08 → 18.11
David Ames (thedac)
Changed in charm-hacluster:
milestone: 18.11 → 19.04
David Ames (thedac)
Changed in charm-hacluster:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-hacluster:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-hacluster:
milestone: 19.10 → 20.01
tags: added: charm-upgrade
James Page (james-page)
Changed in charm-hacluster:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-hacluster:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-hacluster:
milestone: 20.08 → none