Public vip didn't relocate after connection loss on the public nic of primary controller

Bug #1370510 reported by Kirill Omelchenko
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Aleksandr Didenko
5.1.x               Fix Released   High        Aleksandr Didenko
6.0.x               Fix Committed  High        Aleksandr Didenko

Bug Description

Steps to reproduce:
1. Deploy HA with CentOS, Neutron GRE (3x controllers)
2. Run Network verification, OSTF (both pass successfully)
3. Simulate connection loss on the "public" NIC of the primary controller
by removing the corresponding interface from the bridge on the host.
4. Run crm status, try to open the Horizon dashboard.

Expected:
Public vip is moved to another cluster member node, Horizon stays available.

Actual:
crm status shows no change to the VIP or to any other resource.
Horizon becomes unavailable.
(After bringing the mentioned bridge interface back up, Horizon comes back online.)

Changed in fuel:
importance: Undecided → Critical
milestone: none → 5.1
description: updated
description: updated
Changed in fuel:
importance: Critical → High
Dmitry Borodaenko (angdraug) wrote :

Shutting down interfaces with ifup/ifdown is not a valid test of network outage.

Changed in fuel:
status: New → Invalid
Sergii Golovatiuk (sgolovatiuk) wrote :

The proper scenario is to remove the interface from the bridge, not to use ifdown/ifup.

Kirill Omelchenko (komelchenko) wrote :

Did as Sergii suggested -- same result.
(updated scenario)

description: updated
Changed in fuel:
status: Invalid → New
Egor Kotko (ykotko) wrote :

Got the same result on Ubuntu.

Vladimir Kuklin (vkuklin) wrote :

Kirill, please write down what you did

Vladimir Kuklin (vkuklin) wrote :

Kirill, please write down the specific commands that you ran.

Changed in fuel:
status: New → Incomplete
Changed in fuel:
assignee: nobody → Aleksandr Didenko (adidenko)
Bogdan Dobrelya (bogdando) wrote :

3. Simulate connection loss on "public" nic of primary controller
by removing the corresponding interface from the bridge on the host

A similar issue came up when reproducing split-brain scenarios, and the conclusion was that manipulating interfaces in virtual environments is a poor way to simulate real hardware failures (there are no carrier-down notifications). AFAIK, the decision was to bring down the bridge, not an interface, but I'm not sure whether removing the interface from the bridge would work as well. So please try to reproduce the issue by putting the public bridge in the down state or, even better, on real hardware if possible.

Changed in fuel:
status: Incomplete → Triaged
status: Triaged → Won't Fix
Aleksandr Didenko (adidenko) wrote :

If the Public VIP is inaccessible while the interface and link are UP on the affected controller (for example, because of firewall or routing issues on the switch side), the VIP won't be migrated. This won't break your OpenStack cloud, but it will make the Public VIP and the services you expect to reach through it (like Horizon) unavailable.
I believe we won't be able to fix it in 5.1 due to HCF, so we need to add a record about it to the release notes. We'll address this in 6.0.

tags: added: release-notes
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/122587

Changed in fuel:
status: Won't Fix → In Progress
description: updated
Changed in fuel:
status: In Progress → Won't Fix
Changed in fuel:
status: Won't Fix → In Progress
Changed in fuel:
status: In Progress → Won't Fix
Changed in fuel:
status: Won't Fix → In Progress
Changed in fuel:
status: In Progress → Won't Fix
Changed in fuel:
status: Won't Fix → In Progress
Changed in fuel:
status: In Progress → Won't Fix
Changed in fuel:
status: Won't Fix → In Progress
Changed in fuel:
status: In Progress → Won't Fix
Changed in fuel:
milestone: 5.1 → 5.1.1
status: Won't Fix → Triaged
summary: - Public vip didn't relocate after connection loose on the public nic of
+ Public vip didn't relocate after connection loss on the public nic of
primary controller
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/122587
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=04b2df08b0717d7b091a9a3425d4c6af979afbd3
Submitter: Jenkins
Branch: master

commit 04b2df08b0717d7b091a9a3425d4c6af979afbd3
Author: Aleksandr Didenko <email address hidden>
Date: Thu Sep 18 18:05:43 2014 -0700

    Tie pub vip with public gw ping

    Add new pacemaker cloned "ping" primitive and tie it with public
    vip via "location" in order to migrate public VIP in case
    controller can't ping public gateway.

    Closes-bug: 1370510
    Change-Id: Ibf4fedee536ac9eb106206060b58a6039558ef85
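
A rough way to picture the approach in this commit message: each controller runs a clone of the ping resource, which publishes a per-node score, and a location constraint keeps the public VIP off any node whose score is missing or zero. The Python sketch below only models that decision; it is not the fuel-library implementation. The node names, the multiplier value and the "first eligible node" tie-break are assumptions made for illustration, and real Pacemaker placement also depends on other scores and stickiness.

```python
from typing import Dict, List, Optional

MULTIPLIER = 1000  # assumed value for the ping agent's "multiplier" parameter


def pingd_score(reachable: List[bool], multiplier: int = MULTIPLIER) -> int:
    """Score a node would publish: number of reachable hosts times multiplier."""
    return sum(reachable) * multiplier


def vip_allowed(score: Optional[int]) -> bool:
    """Location rule sketch: ban the VIP where the score is undefined or <= 0."""
    return score is not None and score > 0


def place_vip(current: str, scores: Dict[str, Optional[int]]) -> Optional[str]:
    """Keep the VIP in place if allowed; otherwise pick another eligible node."""
    if vip_allowed(scores.get(current)):
        return current
    # Hypothetical tie-break: first eligible node in name order.
    eligible = sorted(n for n, s in scores.items() if vip_allowed(s))
    return eligible[0] if eligible else None


scores = {
    "node-1": pingd_score([False]),  # primary controller cannot ping the gateway
    "node-2": pingd_score([True]),
    "node-3": pingd_score([True]),
}
print(place_vip("node-1", scores))  # node-2
```

In this model the VIP leaves node-1 even though its interface and link are still up, which is the case the earlier comments describe as not being handled before the fix.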

Changed in fuel:
status: Triaged → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/123587

Roman Prykhodchenko (romcheg) wrote :

I noticed that the fix for 5.1 has been in review for about a month. It may be reasonable to explicitly add several reviewers there.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/123587
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=636ceb39174f390c927d06749c355138ed846c84
Submitter: Jenkins
Branch: stable/5.1

commit 636ceb39174f390c927d06749c355138ed846c84
Author: Aleksandr Didenko <email address hidden>
Date: Thu Sep 18 18:05:43 2014 -0700

    Tie pub vip with public gw ping

    Add new pacemaker cloned "ping" primitive and tie it with public
    vip via "location" in order to migrate public VIP in case
    controller can't ping public gateway.

    Partial-bug: 1370510
    Change-Id: Ibf4fedee536ac9eb106206060b58a6039558ef85
    (cherry picked from commit 04b2df08b0717d7b091a9a3425d4c6af979afbd3)

Dmitry Borodaenko (angdraug) wrote :

Why was this bug reset from Fix Committed back to In Progress? What work remains to be done here?

Dmitry Borodaenko (angdraug) wrote :

Target milestone moved to 5.1.2. Resetting status to In Progress in 6.0.x to be consistent with 5.1.x, until it's clarified which one it should be.

Stanislav Makar (smakar) wrote :

Verified

{"build_id": "2014-12-03_01-07-36", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "48", "auth_required": true, "api": "1.0", "nailgun_sha": "500e36d08a45dbb389bf2bd97673d9bff48ee84d", "production": "docker", "fuelmain_sha": "7626c5aeedcde77ad22fc081c25768944697d404", "astute_sha": "ef8aa0fd0e3ce20709612906f1f0551b5682a6ce", "feature_groups": ["mirantis"], "release": "5.1.1", "release_versions": {"2014.1.3-5.1.1": {"VERSION": {"build_id": "2014-12-03_01-07-36", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "48", "api": "1.0", "nailgun_sha": "500e36d08a45dbb389bf2bd97673d9bff48ee84d", "production": "docker", "fuelmain_sha": "7626c5aeedcde77ad22fc081c25768944697d404", "astute_sha": "ef8aa0fd0e3ce20709612906f1f0551b5682a6ce", "feature_groups": ["mirantis"], "release": "5.1.1", "fuellib_sha": "a3043477337b4a0a8fd166dc83d6cd5d504f5da8"}}}, "fuellib_sha": "a3043477337b4a0a8fd166dc83d6cd5d504f5da8"}

Andrew Woodward (xarses)
no longer affects: fuel/6.1.x