[library] public_vip is not recovered if failover happens 2 times

Bug #1311749 reported by Tatyanka
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Sergey Vasilenko

Bug Description

{"build_id": "2014-04-23_03-41-25", "mirantis": "yes", "build_number": "127", "nailgun_sha": "ead54af79fb8d798bdf8b7cfc08ee60103c9292f", "production": "prod", "ostf_sha": "134765fcb5a07dce0cd1bb399b2290c988c3c63b", "fuelmain_sha": "a864e28fff841b40c3b4a53f00e622002f300b19", "astute_sha": "6e8fa4cc12968d7b468fc590b2f06bb59bf74511", "release": "5.0", "fuellib_sha": "646e52b7a34a0027b43c46aa5b525fe7674a8059"}

Steps to reproduce:
1. Deploy HA on CentOS with nova flat and 3 controllers.
2. When the deployment finishes successfully, ssh to a controller and check where the VIPs are running (crm_mon -1).
3. ssh to the node where the public VIP is running and use ip address show to find the eth interface that carries it.
4. Shut down the eth interface with the public VIP.
5. Use crm to check whether the public VIP has recovered.
6. ssh to the node where the VIP was restarted and shut down the eth interface with the public VIP one more time.
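
Steps 2 and 5 come down to reading the one-shot crm_mon output and finding which node hosts vip__public_old. A minimal sketch of that lookup, assuming the output format matches the crm_mon -1 listings quoted in this report (the function name resource_locations is made up for illustration):

```python
import re

# Matches primitive resource lines from `crm_mon -1`, for example:
#  vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-2.test.domain.local
RESOURCE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)\s+(?P<node>\S+)\s*$"
)


def resource_locations(crm_mon_output):
    """Map resource name -> (state, node) for primitive resources only."""
    locations = {}
    for line in crm_mon_output.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            locations[match.group("name")] = (match.group("state"), match.group("node"))
    return locations


# Sample lines copied from the output in this report.
sample_status = """\
 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-4.test.domain.local
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-2.test.domain.local
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]
"""

print(resource_locations(sample_status)["vip__public_old"])
```

On a live controller the text could be obtained with subprocess.run(["crm_mon", "-1"], capture_output=True, text=True).stdout instead of the sample string. Clone sets are deliberately ignored because they do not name a single host.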

Expected:
crm moves the public VIP after the second failover

Actual:
After the first shutdown the VIP was moved, but crm still records that the public VIP had failed:
Online: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-4.test.domain.local
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-2.test.domain.local
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]
 openstack-heat-engine (ocf::mirantis:openstack-heat-engine): Started node-5.test.domain.local

Failed actions:
    vip__public_old_monitor_2000 on node-5.test.domain.local 'not running' (7): call=41, status=complete, last-rc-change='Wed Apr 23 14:42:07 2014', queued=0ms, exec=0ms
    openstack-heat-engine_monitor_20000 on node-4.test.domain.local 'not running' (7): call=39, status=complete, last-rc-change='Wed Apr 23 13:54:00 2014', queued=0ms, exec=0ms
    p_mysql_monitor_60000 on node-2.test.domain.local 'not running' (7): call=45, status=complete, last-rc-change='Wed Apr 23 13:56:07 2014', queued=25042ms, exec=0ms

After the second failover the public VIP does not start:

Online: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-4.test.domain.local
 vip__public_old (ocf::mirantis:ns_IPaddr2): FAILED node-2.test.domain.local
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-2.test.domain.local node-4.test.domain.local node-5.test.domain.local ]
 openstack-heat-engine (ocf::mirantis:openstack-heat-engine): Started node-5.test.domain.local

Failed actions:
    vip__public_old_monitor_2000 on node-5.test.domain.local 'not running' (7): call=41, status=complete, last-rc-change='Wed Apr 23 14:42:07 2014', queued=0ms, exec=0ms
    openstack-heat-engine_monitor_20000 on node-4.test.domain.local 'not running' (7): call=39, status=complete, last-rc-change='Wed Apr 23 13:54:00 2014', queued=0ms, exec=0ms
    vip__public_old_monitor_2000 on node-2.test.domain.local 'not running' (7): call=145, status=complete, last-rc-change='Wed Apr 23 15:09:28 2014', queued=19ms, exec=0ms
    p_mysql_monitor_60000 on node-2.test.domain.local 'not running' (7): call=45, status=complete, last-rc-change='Wed Apr 23 13:56:07 2014', queued=25042ms, exec=0ms

As a result, OpenStack does not work.
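
To see at a glance which nodes have a recorded failure for each resource, the "Failed actions" section can be grouped by resource. This is only a sketch that assumes the line format shown in the output above; the helper name failed_nodes_by_resource is hypothetical:

```python
import re
from collections import defaultdict

# Matches entries from the "Failed actions:" section, for example:
#   vip__public_old_monitor_2000 on node-2.test.domain.local 'not running' (7): ...
FAILED_ACTION = re.compile(
    r"^\s*(?P<resource>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) on (?P<node>\S+) "
    r"'(?P<reason>[^']*)' \((?P<rc>\d+)\)"
)


def failed_nodes_by_resource(crm_mon_output):
    """Map resource name -> list of nodes with a recorded failed action."""
    nodes = defaultdict(list)
    for line in crm_mon_output.splitlines():
        match = FAILED_ACTION.match(line)
        if match:
            nodes[match.group("resource")].append(match.group("node"))
    return dict(nodes)


# Failed actions copied from the output after the second failover.
sample_failures = """\
Failed actions:
    vip__public_old_monitor_2000 on node-5.test.domain.local 'not running' (7): call=41, status=complete, last-rc-change='Wed Apr 23 14:42:07 2014', queued=0ms, exec=0ms
    openstack-heat-engine_monitor_20000 on node-4.test.domain.local 'not running' (7): call=39, status=complete, last-rc-change='Wed Apr 23 13:54:00 2014', queued=0ms, exec=0ms
    vip__public_old_monitor_2000 on node-2.test.domain.local 'not running' (7): call=145, status=complete, last-rc-change='Wed Apr 23 15:09:28 2014', queued=19ms, exec=0ms
    p_mysql_monitor_60000 on node-2.test.domain.local 'not running' (7): call=45, status=complete, last-rc-change='Wed Apr 23 13:56:07 2014', queued=25042ms, exec=0ms
"""

for resource, nodes in failed_nodes_by_resource(sample_failures).items():
    print(resource, nodes)
```

For this report the grouping shows vip__public_old with failed monitor actions on both node-5 and node-2, which are the two nodes where the interface was shut down.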

Tags: ha icehouse
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
tags: added: icehouse
summary: - public_vip is not recovered if failover happents 2 times
+ public_vip is not recovered if failover happens 2 times
Revision history for this message
Vladimir Kuklin (vkuklin) wrote : Re: public_vip is not recovered if failover happens 2 times

Looks like a -100 colocation problem between the public and management VIPs.

Changed in fuel:
importance: High → Medium
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergey Vasilenko (xenolog)
status: Confirmed → In Progress
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

The full pacemaker log shows some services experiencing problems even with monitoring commands. This may be a performance issue in this particular environment. We need a reproducer here.

Revision history for this message
Sergey Vasilenko (xenolog) wrote :

The ns_IPaddr2 resource can't bring up an interface that it does not manage.
ns_IPaddr2 manages only the ethN-hapr and hapr-p interfaces.

After NN tries pacemaker decides that this resource can't work properly on this env.
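
As an illustration of what checking the state of the underlying interface can look like, here is a minimal sketch that reads the kernel's operstate value from sysfs. It is not the code from the ns_IPaddr2 resource agent or from any fix for this bug; the function name interface_is_up and the sysfs_root parameter are assumptions made so the example can run without touching real interfaces:

```python
import tempfile
from pathlib import Path


def interface_is_up(ifname, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports the interface operstate as 'up'."""
    state_file = Path(sysfs_root) / ifname / "operstate"
    try:
        return state_file.read_text().strip() == "up"
    except OSError:
        # A missing interface or unreadable file is treated as "not up".
        return False


# Demonstration against a fake sysfs tree instead of a real host.
with tempfile.TemporaryDirectory() as fake_sysfs:
    for name, state in (("eth1", "down"), ("eth2", "up")):
        iface_dir = Path(fake_sysfs) / name
        iface_dir.mkdir()
        (iface_dir / "operstate").write_text(state + "\n")

    print(interface_is_up("eth1", sysfs_root=fake_sysfs))
    print(interface_is_up("eth2", sysfs_root=fake_sysfs))
    print(interface_is_up("eth9", sysfs_root=fake_sysfs))
```

Some virtual interfaces report "unknown" rather than "up" in operstate, so a real check would need to decide how to treat that value.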

Changed in fuel:
milestone: 5.0 → 5.1
Changed in fuel:
status: In Progress → Confirmed
Dmitry Ilyin (idv1985)
summary: - public_vip is not recovered if failover happens 2 times
+ [library] public_vip is not recovered if failover happens 2 times
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/108439

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/108439
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=40a2e36e37e122cb1f6a55c92d99afca4b001dc1
Submitter: Jenkins
Branch: master

commit 40a2e36e37e122cb1f6a55c92d99afca4b001dc1
Author: Sergey Vasilenko <email address hidden>
Date: Mon Jul 21 21:09:51 2014 +0400

    add checking for interface state to the ns_IPaddr2

    Change-Id: I5bd493ac1b661e6c7bf98b3c5659045de1e97c3e
    Closes-bug: #1311749
    Closes-bug: #1323277

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/109332

Revision history for this message
Serg Melikyan (smelikyan) wrote :

Changed status to In Progress due to commit https://review.openstack.org/109332

Changed in fuel:
status: Fix Committed → In Progress
Changed in fuel:
importance: Medium → High
tags: added: ha
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

The bug was reopened incorrectly due to an error in the commit message of https://review.openstack.org/109332

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Verified on rc3.6.1

Changed in fuel:
status: Fix Committed → Fix Released