Nodes removed via the Fuel UI are still shown in 'crm status'

Bug #1403558 reported by Denis Klepikov
Affects            | Status    | Importance | Assigned to               | Milestone
-------------------|-----------|------------|---------------------------|----------
Fuel for OpenStack | Confirmed | High       | Bogdan Dobrelya           |
5.1.x              | Won't Fix | Medium     | Fuel Library (Deprecated) |
6.0.x              | Confirmed | High       | Fuel Library (Deprecated) |

Bug Description

After redeploying the cluster, crm still shows the old nodes.
Fuel 5.1.1
Cluster: CentOS, HA, Neutron + GRE, Sahara

All steps were performed via the Fuel UI:
1. Deploy a cluster: 3 controller+cinder nodes, 2 compute nodes.
2. Remove 1 controller (the last one in the fuel node list).
3. Deploy Changes.
4. Wait for the host to become available as an unallocated node.
5. Add the node back to the environment with the same roles as before (controller+cinder).
6. Deploy Changes - deployment fails with 2 controller nodes in the error state (the first controller and the newly added node).
7. Remove the controllers in the error state (the first controller and the new node), leaving only the surviving controller.
8. Deploy Changes.
9. Wait for the hosts to become available as unallocated nodes.
10. Add the nodes back to the environment with the same roles as before (controller+cinder).
11. Deploy Changes.

Expected result: crm sees only the nodes that actually exist in the cluster.

[root@fuel ~]# fuel nodes
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|--------|------------------|---------|-----------|-------------------|--------------------|---------------|-------
19 | ready | Untitled (bb:97) | 5 | 10.20.0.4 | 0a:e0:35:7d:1c:4b | cinder, controller | | True
22 | ready | Untitled (b8:aa) | 5 | 10.20.0.7 | e2:4a:c6:50:0a:46 | compute | | True
25 | ready | Untitled (96:9b) | 5 | 10.20.0.5 | ba:15:71:69:63:4b | cinder, controller | | True
21 | ready | Untitled (83:7e) | 5 | 10.20.0.6 | 92:95:4e:54:08:41 | compute | | True
24 | ready | Untitled (ca:b5) | 5 | 10.20.0.3 | a2:a8:20:27:7e:46 | cinder, controller | | True

But
# ssh node-19
[root@node-19 ~]# crm_node --list
50374848 node-19.domain.tld member
33597632 node-24.domain.tld member
67152064 node-25.domain.tld member

[root@node-19 ~]# crm configure show
node node-18.domain.tld \
        attributes gtid="7da74cd3-84fe-11e4-8c21-ebba14b5482b:97131"
node node-19.domain.tld \
        attributes gtid="7da74cd3-84fe-11e4-8c21-ebba14b5482b:139298"
node node-20.domain.tld \
        attributes gtid="7da74cd3-84fe-11e4-8c21-ebba14b5482b:79017"
node node-24.domain.tld \
        attributes gtid="7da74cd3-84fe-11e4-8c21-ebba14b5482b:139164"
node node-25.domain.tld \
        attributes gtid="7da74cd3-84fe-11e4-8c21-ebba14b5482b:139128"
primitive p_haproxy ocf:mirantis:ns_haproxy \
        params ns="haproxy" \
        meta migration-threshold="3" failure-timeout="120" \
        op monitor interval="20" timeout="10" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
primitive p_mysql ocf:mirantis:mysql-wss \
        params test_passwd="password" pid="/var/run/mysql/mysqld.pid" test_user="wsrep_sst" socket="/var/lib/mysql/mysql.sock" \
        op monitor interval="120" timeout="115" \
        op start interval="0" timeout="475" \
        op stop interval="0" timeout="175"
primitive p_neutron-dhcp-agent ocf:mirantis:neutron-agent-dhcp \
        params password="g8DhNOtn" tenant="services" username="neutron" os_auth_url="http://192.168.0.1:35357/v2.0" \
        meta resource-stickiness="1" is-managed="true" target-role="Started" \
        op monitor interval="30" timeout="10" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive p_neutron-l3-agent ocf:mirantis:neutron-agent-l3 \
        params password="g8DhNOtn" tenant="services" syslog="true" username="neutron" os_auth_url="http://192.168.0.1:35357/v2.0" debug="false" \
        meta resource-stickiness="1" is-managed="true" target-role="Started" \
        op monitor interval="20" timeout="10" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive p_neutron-metadata-agent ocf:mirantis:neutron-agent-metadata \
        op monitor interval="60" timeout="10" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
primitive p_neutron-openvswitch-agent ocf:mirantis:neutron-agent-ovs \
        op monitor interval="20" timeout="10" \
        op start interval="0" timeout="80" \
        op stop interval="0" timeout="80"
primitive p_openstack-heat-engine ocf:mirantis:openstack-heat-engine \
        meta resource-stickiness="1" \
        op monitor interval="20" timeout="30" \
        op start interval="0" timeout="60" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive p_rabbitmq-server ocf:mirantis:rabbitmq-server \
        params node_port="5673" \
        meta migration-threshold="INFINITY" failure-timeout="60s" \
        op monitor interval="30" timeout="60" \
        op promote interval="0" timeout="120" \
        op notify interval="0" timeout="60" \
        op start interval="0" timeout="120" \
        op demote interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="27" role="Master" timeout="60"
primitive ping_vip__public_old ocf:pacemaker:ping \
        params multiplier="1000" timeout="3s" dampen="30s" host_list="172.16.0.1" \
        op monitor interval="20" timeout="30"
primitive vip__management_old ocf:mirantis:ns_IPaddr2 \
        params iflabel="ka" ip="192.168.0.1" gateway_metric="20" iptables_start_rules="iptables -t mangle -I PREROUTING -i br-mgmt-hapr -j MARK --set-mark 0x2b ; iptables -t nat -I POSTROUTING -m mark --mark 0x2b ! -o br-mgmt -j MASQUERADE" iptables_comment="masquerade-for-management-net" iptables_stop_rules="iptables -t mangle -D PREROUTING -i br-mgmt-hapr -j MARK --set-mark 0x2b ; iptables -t nat -D POSTROUTING -m mark --mark 0x2b ! -o br-mgmt -j MASQUERADE" nic="br-mgmt" ns_veth="hapr-m" base_veth="br-mgmt-hapr" gateway="link" cidr_netmask="24" ns="haproxy" \
        meta resource-stickiness="1" migration-threshold="3" failure-timeout="60" is-managed="true" target-role="Started" \
        op monitor interval="3" timeout="30" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
primitive vip__public_old ocf:mirantis:ns_IPaddr2 \
        params iflabel="ka" ip="172.16.0.2" gateway_metric="10" iptables_start_rules="iptables -t mangle -I PREROUTING -i br-ex-hapr -j MARK --set-mark 0x2a ; iptables -t nat -I POSTROUTING -m mark --mark 0x2a ! -o br-ex -j MASQUERADE" iptables_comment="masquerade-for-public-net" iptables_stop_rules="iptables -t mangle -D PREROUTING -i br-ex-hapr -j MARK --set-mark 0x2a ; iptables -t nat -D POSTROUTING -m mark --mark 0x2a ! -o br-ex -j MASQUERADE" nic="br-ex" ns_veth="hapr-p" base_veth="br-ex-hapr" gateway="link" cidr_netmask="24" ns="haproxy" \
        meta resource-stickiness="1" migration-threshold="3" failure-timeout="60" is-managed="true" target-role="Started" \
        op monitor interval="3" timeout="30" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
ms master_p_rabbitmq-server p_rabbitmq-server \
        meta interleave="true" ordered="false" target-role="Started" master-node-max="1" notify="true" master-max="1" is-managed="true"
clone clone_p_haproxy p_haproxy \
        meta interleave="true" is-managed="true" target-role="Started"
clone clone_p_mysql p_mysql \
        meta is-managed="true" target-role="Started"
clone clone_p_neutron-metadata-agent p_neutron-metadata-agent \
        meta interleave="false" is-managed="true" target-role="Started"
clone clone_p_neutron-openvswitch-agent p_neutron-openvswitch-agent \
        meta interleave="false" is-managed="true" target-role="Started"
clone clone_p_openstack-heat-engine p_openstack-heat-engine \
        meta is-managed="true" target-role="Started"

[root@node-19 ~]# crm status
Last updated: Wed Dec 17 15:47:51 2014
Last change: Wed Dec 17 15:47:48 2014 via crm_attribute on node-19.domain.tld
Stack: classic openais (with plugin)
Current DC: node-19.domain.tld - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
5 Nodes configured, 3 expected votes
39 Resources configured

Online: [ node-19.domain.tld node-24.domain.tld node-25.domain.tld ]
OFFLINE: [ node-18.domain.tld node-20.domain.tld ]

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-19.domain.tld
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-19.domain.tld
 Clone Set: clone_ping_vip__public_old [ping_vip__public_old]
     ping_vip__public_old (ocf::pacemaker:ping): FAILED node-24.domain.tld (unmanaged)
     Started: [ node-19.domain.tld node-25.domain.tld ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-19.domain.tld node-24.domain.tld node-25.domain.tld ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-19.domain.tld ]
     Slaves: [ node-24.domain.tld node-25.domain.tld ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-19.domain.tld node-24.domain.tld node-25.domain.tld ]
 Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-19.domain.tld node-24.domain.tld node-25.domain.tld ]
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-19.domain.tld node-24.domain.tld node-25.domain.tld ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-19.domain.tld node-24.domain.tld node-25.domain.tld ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-19.domain.tld
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-25.domain.tld
location cli-prefer-p_neutron-dhcp-agent p_neutron-dhcp-agent inf: node-19.domain.tld role=Started
location cli-prefer-p_openstack-heat-engine p_openstack-heat-engine inf: node-20.domain.tld role=Started
location clone_p_haproxy_on_node-18.domain.tld clone_p_haproxy 0: node-18.domain.tld
location clone_p_haproxy_on_node-19.domain.tld clone_p_haproxy 0: node-19.domain.tld
location clone_p_haproxy_on_node-20.domain.tld clone_p_haproxy 0: node-20.domain.tld
location clone_p_haproxy_on_node-24.domain.tld clone_p_haproxy 0: node-24.domain.tld
location clone_p_haproxy_on_node-25.domain.tld clone_p_haproxy 0: node-25.domain.tld
location clone_p_mysql_on_node-18.domain.tld clone_p_mysql 0: node-18.domain.tld
location clone_p_mysql_on_node-19.domain.tld clone_p_mysql 0: node-19.domain.tld
location clone_p_mysql_on_node-20.domain.tld clone_p_mysql 0: node-20.domain.tld
location clone_p_mysql_on_node-24.domain.tld clone_p_mysql 0: node-24.domain.tld
location clone_p_mysql_on_node-25.domain.tld clone_p_mysql 0: node-25.domain.tld
location clone_p_neutron-metadata-agent_on_node-18.domain.tld clone_p_neutron-metadata-agent 0: node-18.domain.tld
location clone_p_neutron-metadata-agent_on_node-19.domain.tld clone_p_neutron-metadata-agent 0: node-19.domain.tld
location clone_p_neutron-metadata-agent_on_node-20.domain.tld clone_p_neutron-metadata-agent 0: node-20.domain.tld
location clone_p_neutron-metadata-agent_on_node-24.domain.tld clone_p_neutron-metadata-agent 0: node-24.domain.tld
location clone_p_neutron-metadata-agent_on_node-25.domain.tld clone_p_neutron-metadata-agent 0: node-25.domain.tld
location clone_p_neutron-openvswitch-agent_on_node-18.domain.tld clone_p_neutron-openvswitch-agent 0: node-18.domain.tld
location clone_p_neutron-openvswitch-agent_on_node-19.domain.tld clone_p_neutron-openvswitch-agent 0: node-19.domain.tld
location clone_p_neutron-openvswitch-agent_on_node-20.domain.tld clone_p_neutron-openvswitch-agent 0: node-20.domain.tld
location clone_p_neutron-openvswitch-agent_on_node-24.domain.tld clone_p_neutron-openvswitch-agent 0: node-24.domain.tld
location clone_p_neutron-openvswitch-agent_on_node-25.domain.tld clone_p_neutron-openvswitch-agent 0: node-25.domain.tld
location clone_p_openstack-heat-engine_on_node-18.domain.tld clone_p_openstack-heat-engine 0: node-18.domain.tld
location clone_p_openstack-heat-engine_on_node-19.domain.tld clone_p_openstack-heat-engine 0: node-19.domain.tld
location clone_p_openstack-heat-engine_on_node-20.domain.tld clone_p_openstack-heat-engine 0: node-20.domain.tld
location clone_p_openstack-heat-engine_on_node-24.domain.tld clone_p_openstack-heat-engine 0: node-24.domain.tld
location clone_p_openstack-heat-engine_on_node-25.domain.tld clone_p_openstack-heat-engine 0: node-25.domain.tld
location clone_ping_vip__public_old_on_node-18.domain.tld clone_ping_vip__public_old 0: node-18.domain.tld
location clone_ping_vip__public_old_on_node-19.domain.tld clone_ping_vip__public_old 0: node-19.domain.tld
location clone_ping_vip__public_old_on_node-20.domain.tld clone_ping_vip__public_old 0: node-20.domain.tld
location clone_ping_vip__public_old_on_node-24.domain.tld clone_ping_vip__public_old 0: node-24.domain.tld
location clone_ping_vip__public_old_on_node-25.domain.tld clone_ping_vip__public_old 0: node-25.domain.tld
location loc_ping_vip__public_old vip__public_old \
        rule $id="loc_ping_vip__public_old-rule" -inf: not_defined pingd or pingd lte 0
location master_p_rabbitmq-server_on_node-18.domain.tld master_p_rabbitmq-server 0: node-18.domain.tld
location master_p_rabbitmq-server_on_node-19.domain.tld master_p_rabbitmq-server 0: node-19.domain.tld
location master_p_rabbitmq-server_on_node-20.domain.tld master_p_rabbitmq-server 0: node-20.domain.tld
location master_p_rabbitmq-server_on_node-24.domain.tld master_p_rabbitmq-server 0: node-24.domain.tld
location master_p_rabbitmq-server_on_node-25.domain.tld master_p_rabbitmq-server 0: node-25.domain.tld
location p_neutron-dhcp-agent_on_node-18.domain.tld p_neutron-dhcp-agent 0: node-18.domain.tld
location p_neutron-dhcp-agent_on_node-19.domain.tld p_neutron-dhcp-agent 0: node-19.domain.tld
location p_neutron-dhcp-agent_on_node-20.domain.tld p_neutron-dhcp-agent 0: node-20.domain.tld
location p_neutron-dhcp-agent_on_node-24.domain.tld p_neutron-dhcp-agent 0: node-24.domain.tld
location p_neutron-dhcp-agent_on_node-25.domain.tld p_neutron-dhcp-agent 0: node-25.domain.tld
location p_neutron-l3-agent_on_node-18.domain.tld p_neutron-l3-agent 0: node-18.domain.tld
location p_neutron-l3-agent_on_node-19.domain.tld p_neutron-l3-agent 0: node-19.domain.tld
location p_neutron-l3-agent_on_node-20.domain.tld p_neutron-l3-agent 0: node-20.domain.tld
location p_neutron-l3-agent_on_node-24.domain.tld p_neutron-l3-agent 0: node-24.domain.tld
location p_neutron-l3-agent_on_node-25.domain.tld p_neutron-l3-agent 0: node-25.domain.tld
location vip__management_old_on_node-18.domain.tld vip__management_old 0: node-18.domain.tld
location vip__management_old_on_node-19.domain.tld vip__management_old 0: node-19.domain.tld
location vip__management_old_on_node-20.domain.tld vip__management_old 0: node-20.domain.tld
location vip__management_old_on_node-24.domain.tld vip__management_old 0: node-24.domain.tld
location vip__management_old_on_node-25.domain.tld vip__management_old 0: node-25.domain.tld
location vip__public_old_on_node-18.domain.tld vip__public_old 0: node-18.domain.tld
location vip__public_old_on_node-19.domain.tld vip__public_old 0: node-19.domain.tld
location vip__public_old_on_node-20.domain.tld vip__public_old 0: node-20.domain.tld
location vip__public_old_on_node-24.domain.tld vip__public_old 0: node-24.domain.tld
location vip__public_old_on_node-25.domain.tld vip__public_old 0: node-25.domain.tld
colocation dhcp-with-metadata inf: p_neutron-dhcp-agent clone_p_neutron-metadata-agent
colocation dhcp-with-ovs inf: p_neutron-dhcp-agent clone_p_neutron-openvswitch-agent
colocation dhcp-without-l3 -100: p_neutron-dhcp-agent p_neutron-l3-agent
colocation l3-with-metadata inf: p_neutron-l3-agent clone_p_neutron-metadata-agent
colocation l3-with-ovs inf: p_neutron-l3-agent clone_p_neutron-openvswitch-agent
colocation vip_management-with-haproxy inf: vip__management_old clone_p_haproxy
colocation vip_public-with-haproxy inf: vip__public_old clone_p_haproxy
order dhcp-after-metadata inf: clone_p_neutron-metadata-agent p_neutron-dhcp-agent
order dhcp-after-ovs inf: clone_p_neutron-openvswitch-agent p_neutron-dhcp-agent
order l3-after-metadata inf: clone_p_neutron-metadata-agent p_neutron-l3-agent
order l3-after-ovs inf: clone_p_neutron-openvswitch-agent p_neutron-l3-agent
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6_5.3-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="3" \
        no-quorum-policy="stop" \
        stonith-enabled="false" \
        start-failure-is-fatal="false" \
        symmetric-cluster="false" \
        last-lrm-refresh="1418822829" \
        maintenance-mode="false"

Nodes node-18.domain.tld and node-20.domain.tld no longer exist, yet they still appear in the Pacemaker configuration.

description: updated
description: updated
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 5.1.2
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

As this doesn't break deployment, setting to Medium and targeting 6.1. The 5.1 and 6.0 series should only receive high-priority backports.

Changed in fuel:
milestone: 6.0.1 → 6.1
no longer affects: fuel/6.1.x
Changed in fuel:
status: New → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

To fix it, these nodes should be removed from the corosync cluster manually. Otherwise, there could be issues with quorum evaluation.
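
A minimal sketch of the manual cleanup being described, assuming crmsh and the standard Pacemaker CLI tools are available on a surviving controller (node names are the stale ones from this report; adjust to your environment):

# remove the stale members from the corosync/Pacemaker membership
crm_node --force -R node-18.domain.tld
crm_node --force -R node-20.domain.tld
# drop the leftover node objects from the CIB (the per-node location constraints should be reviewed as well)
crm node delete node-18.domain.tld
crm node delete node-20.domain.tld
# verify that only live members remain
crm_node --list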

tags: added: release-notes
Revision history for this message
Denis Klepikov (dklepikov) wrote :

Isn't that the same as "nodes should be manually removed from corosync"?
I removed them manually and restarted the single surviving controller; the cluster keeps working without quorum.

# crm_node --force -R node-20.domain.tld
# crm_node --force -R node-18.domain.tld

[root@node-19 ~]# pcs status
Cluster name:
Last updated: Thu Dec 18 14:03:32 2014
Last change: Thu Dec 18 14:02:08 2014 via crm_attribute on node-19.domain.tld
Stack: classic openais (with plugin)
Current DC: node-19.domain.tld - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.3-368c726
1 Nodes configured, 3 expected votes
11 Resources configured

Online: [ node-19.domain.tld ]

Full list of resources:

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-19.domain.tld
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-19.domain.tld
 Clone Set: clone_ping_vip__public_old [ping_vip__public_old]
     Started: [ node-19.domain.tld ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-19.domain.tld ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-19.domain.tld ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-19.domain.tld ]
 Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-19.domain.tld ]
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-19.domain.tld ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-19.domain.tld ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-19.domain.tld
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-19.domain.tld

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

It can operate without quorum if no-quorum-policy is set to 'ignore' (check 'crm configure show'). We use this policy while deploying the environment; once the deployment is finished, an Astute post-deploy hook sets the policy to 'stop'.
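
A quick way to check and toggle this property from any controller, sketched with standard crmsh commands ('ignore' should only ever be a temporary setting):

# show the currently configured quorum policy
crm configure show | grep no-quorum-policy
# allow the cluster to keep running resources without quorum (temporary)
crm configure property no-quorum-policy=ignore
# restore the post-deployment default
crm configure property no-quorum-policy=stop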

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

By the way, when scaling the cluster down, make sure you do not reduce the number of voting nodes below what is required to maintain quorum.
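
For reference, a sketch of how the vote count can be inspected and adjusted on this stack (corosync 1.x with the Pacemaker plugin); expected-quorum-votes is the same property already visible in 'crm configure show' above:

# the membership summary reports configured nodes and expected votes
crm status | grep -i votes
# if the cluster has legitimately shrunk, adjust the expected vote count accordingly
crm configure property expected-quorum-votes=3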

Changed in fuel:
importance: Medium → High
assignee: Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Dmitry Ilyin (idv1985)
Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Fuel Library Team (fuel-library)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bartlomiej Piotrowski (bpiotrowski)
Changed in fuel:
assignee: Bartlomiej Piotrowski (bpiotrowski) → Fuel Library Team (fuel-library)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Raised to High for the 6.0.x milestone, as the 6.0 release should properly support horizontal scaling of nodes.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :