HA router should failover if GW address is not reachable

Bug #1365461 reported by Assaf Muller on 2014-09-04
This bug affects 5 people
Affects: neutron
Importance: High
Assigned to: Brian Haley

Bug Description

If an HA router has an external interface defined, it should use a custom keepalived health check to monitor the default gateway. If the gateway isn't reachable after X attempts, the router should lower its own priority and fail over.
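
The behaviour requested above can be sketched as a small model: count consecutive failed gateway checks, and once the count reaches the "X attempts" threshold, subtract a penalty from the router's priority so that a peer wins the election. This is only an illustration of the idea, not keepalived's or neutron's implementation; the class, the field names, and the penalty value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RouterInstance:
    """Toy model of one HA router replica (all names are hypothetical)."""
    name: str
    base_priority: int
    penalty: int = 20       # assumed amount subtracted when the gateway is down
    max_failures: int = 3   # the "X attempts" from the bug description
    failures: int = 0       # consecutive failed gateway checks so far

    def record_check(self, gateway_reachable: bool) -> None:
        # A successful check resets the counter; a failed one increments it.
        self.failures = 0 if gateway_reachable else self.failures + 1

    @property
    def effective_priority(self) -> int:
        # The priority drops only after max_failures consecutive failures.
        if self.failures >= self.max_failures:
            return self.base_priority - self.penalty
        return self.base_priority


def elect_master(instances):
    """Return the instance with the highest effective priority."""
    return max(instances, key=lambda r: r.effective_priority)
```

In this model a router with base priority 60 stays master over a peer at 50 until it has failed three checks in a row, at which point its effective priority of 40 hands the role to the peer.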

Yair Fried (yfried) wrote :

What happens if the GW is reachable but other NICs aren't?
What if the router loses connectivity to the HA network? How can it know it's down then?

Assaf Muller (amuller) wrote :

I'm assuming that the tenant networks and the HA network all live on the same physical network and physical NIC, while the external network may reside on another physical network and NIC. The feature handles an agent disconnecting from the tenant network (another router will become the master), but if the master router loses connectivity to the external network, it will remain the master and drop all packets destined for the external network.

Changed in neutron:
importance: Undecided → Low
Yoni Shafrir (yshafrir) on 2015-01-04
Changed in neutron:
assignee: nobody → Yoni (yshafrir)

Reviewed: https://review.openstack.org/152861
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=90df090945e9800d354c34fd03236385bff82a4e
Submitter: Jenkins
Branch: master

commit 90df090945e9800d354c34fd03236385bff82a4e
Author: Yoni Shafrir <email address hidden>
Date: Wed Feb 4 07:42:13 2015 +0200

    Remove use of keepalived 'vrrp_sync_group' as it is unused

    Currently the keepalived configuration wraps the VRRP instances
    in a 'vrrp_sync_group'. The VRRP sync group functionality is only
    relevant when more than one VR instance is contained in it.
    In that case the VRs in the group will have the same state.
    Our use of keepalived uses a single instance per router.

    This patch simply removes the 'vrrp_sync_group'.
    In this patch VR instances are used on their own and they now
    hold the 'notify_scripts'.

    Note that the same VRRP functionality is preserved with this
    patch.

    Another motivation for this patch, aside from removing
    useless configuration, is to lay the foundation for a future
    patch that will fix the related bug by adding 'track_script'
    entries, which are not supported with 'vrrp_sync_group'.

    Change-Id: I33b81049cd9cf140244bbf121d1a71492161c77c
    Related-Bug: #1365461
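
To illustrate the change this commit describes, here is a minimal sketch of rendering a standalone 'vrrp_instance' block that carries its own notify scripts, with no 'vrrp_sync_group' wrapper. It is not neutron's actual keepalived config generator: the function name, the instance name, the interface name, and the script paths are assumptions made for the example. Only the keepalived keywords (state, interface, virtual_router_id, priority, notify_master, notify_backup, notify_fault) are real.

```python
def render_vrrp_instance(name, interface, vrouter_id, priority, notify_scripts):
    """Render one standalone vrrp_instance block (no vrrp_sync_group wrapper).

    notify_scripts maps a state name ("master", "backup", "fault") to the
    path of the script keepalived should run on that transition.
    """
    lines = [
        f"vrrp_instance {name} {{",
        "    state BACKUP",
        f"    interface {interface}",
        f"    virtual_router_id {vrouter_id}",
        f"    priority {priority}",
    ]
    # The notify scripts now live on the instance itself, not on a sync group.
    for state, path in sorted(notify_scripts.items()):
        lines.append(f'    notify_{state} "{path}"')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(render_vrrp_instance(
        "VR_1", "ha-example", 1, 50,
        {"master": "/tmp/notify_master.sh", "backup": "/tmp/notify_backup.sh"},
    ))
```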

Changed in neutron:
status: New → In Progress
Assaf Muller (amuller) wrote :

This is no longer in progress.

Changed in neutron:
status: In Progress → Confirmed

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/156563
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Assaf Muller (amuller) on 2015-11-16
Changed in neutron:
assignee: Yoni Shafrir (yshafrir) → nobody
Lubosz Kosnik (diltram) on 2015-12-12
Changed in neutron:
assignee: nobody → Lubosz Kosnik (diltram)

Fix proposed to branch: master
Review: https://review.openstack.org/261942

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
milestone: none → mitaka-3
importance: Low → Medium

Change abandoned by Lubosz Kosnik (<email address hidden>) on branch: master
Review: https://review.openstack.org/261942
Reason: Other solution will be prepared

Changed in neutron:
milestone: mitaka-3 → mitaka-rc1

As much as I hate postponing these to N-1, targeting a fix for RC1 is an ambitious goal. That said, I believe in miracles and there's still a chance this can make it into Mitaka.

Changed in neutron:
milestone: mitaka-rc1 → newton-1
importance: Medium → High

The severity of this issue is high: there is no reasonable workaround (as far as I am aware), and not being able to rely on a robust HA solution rather defeats the point of HA.

tags: added: mitaka-rc-potential
tags: removed: mitaka-rc-potential
zkou (finishman1) on 2016-05-18
Changed in neutron:
assignee: Lubosz Kosnik (diltram) → zkou (finishman1)
assignee: zkou (finishman1) → nobody
wujun (wujun) on 2016-05-18
Changed in neutron:
assignee: nobody → wujun (wujun)
wujun (wujun) on 2016-05-18
Changed in neutron:
assignee: wujun (wujun) → nobody
Lubosz Kosnik (diltram) wrote :

Please do not reassign this bug while I am assigned to it. This patch set is in progress; another issue needs to be fixed before this fix can be merged.
Depends-On: https://bugs.launchpad.net/neutron/+bug/1580648

Changed in neutron:
assignee: nobody → Lubosz Kosnik (diltram)
Changed in neutron:
milestone: newton-1 → newton-2
Changed in neutron:
milestone: newton-2 → newton-3
Changed in neutron:
milestone: newton-3 → newton-rc1
Changed in neutron:
milestone: newton-rc1 → ocata-1
Changed in neutron:
milestone: ocata-1 → ocata-2

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/273546
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
assignee: Lubosz Kosnik (diltram) → Artur Korzeniewski (artur-korzeniewski)
Changed in neutron:
milestone: ocata-2 → ocata-3
Changed in neutron:
milestone: ocata-3 → ocata-rc1
Changed in neutron:
assignee: Artur Korzeniewski (artur-korzeniewski) → Brian Haley (brian-haley)
tags: added: ocata-rc-potential

Reviewed: https://review.openstack.org/273546
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=185d6cbc648fd041402a5034b04b818da5c7136e
Submitter: Jenkins
Branch: master

commit 185d6cbc648fd041402a5034b04b818da5c7136e
Author: Lubosz Kosnik <email address hidden>
Date: Thu Jan 28 14:44:00 2016 +0100

    Add support for Keepalived VRRP health check

    Adds functionality to generate bash script which verifies health of current
    keepalived instance by pinging all available and configured GW addresses.
    This functionality supports IPv4 and IPv6 by detecting needed ping version
    using netaddr.

    DocImpact:
    Added a new parameter to 'l3_agent.ini' named
    'ha_vrrp_health_check_interval' which is by default set to 0 (disabled).
    Values > 0 designate health check functionality should be enabled.
    Requires allowed ICMP ECHO_REQUEST because that is disabled by default.

    Co-Authored-By: Artur Korzeniewski <email address hidden>
    Change-Id: Ib4d0691f432830357ea3f113036719645bc59a62
    Closes-Bug: #1365461
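
The approach in this commit message can be sketched roughly as follows: build a bash script that pings each configured gateway address and exits non-zero if a ping fails, choosing the ping binary from the address's IP version. This is a sketch under stated assumptions, not the merged neutron code. It uses the standard-library 'ipaddress' module in place of netaddr, the function names are hypothetical, and it assumes that every gateway must answer and that 'ping6' is available on the host. In the real feature the check only runs when 'ha_vrrp_health_check_interval' in 'l3_agent.ini' is set to a value greater than 0.

```python
import ipaddress


def ping_command(address, count=1, timeout=1):
    """Pick ping or ping6 from the IP version of the gateway address."""
    version = ipaddress.ip_address(address).version
    binary = "ping" if version == 4 else "ping6"
    # -c: number of echo requests, -w: overall deadline in seconds.
    return f"{binary} -c {count} -w {timeout} {address} 1>/dev/null"


def build_health_check_script(gateways):
    """Return a bash script that exits non-zero if any gateway ping fails.

    Assumption: all gateways must be reachable for the check to pass.
    """
    checks = " && ".join(ping_command(gw) for gw in gateways)
    # With no gateways configured there is nothing to check, so succeed.
    body = checks if checks else "exit 0"
    return "#!/bin/bash\n" + body + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, for illustration only.
    print(build_health_check_script(["192.0.2.1", "2001:db8::1"]))
```

The exit status of the generated script is what a keepalived 'track_script' entry would observe on each run.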

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential

This issue was fixed in the openstack/neutron 10.0.0.0rc1 release candidate.

tags: removed: neutron-proactive-backport-potential

Reviewed: https://review.openstack.org/454657
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=01b3733975402525b67906928c069d468ca275d9
Submitter: Jenkins
Branch: stable/newton

commit 01b3733975402525b67906928c069d468ca275d9
Author: Lubosz Kosnik <email address hidden>
Date: Thu Jan 28 14:44:00 2016 +0100

    Add support for Keepalived VRRP health check

    Adds functionality to generate bash script which verifies health of current
    keepalived instance by pinging all available and configured GW addresses.
    This functionality supports IPv4 and IPv6 by detecting needed ping version
    using netaddr.

    DocImpact:
    Added a new parameter to 'l3_agent.ini' named
    'ha_vrrp_health_check_interval' which is by default set to 0 (disabled).
    Values > 0 designate health check functionality should be enabled.
    Requires allowed ICMP ECHO_REQUEST because that is disabled by default.

    Conflicts:
        neutron/conf/agent/l3/ha.py
        neutron/agent/l3/ha.py
        neutron/agent/linux/keepalived.py

    Co-Authored-By: Artur Korzeniewski <email address hidden>
    Change-Id: Ib4d0691f432830357ea3f113036719645bc59a62
    Closes-Bug: #1365461
    (cherry picked from commit 185d6cbc648fd041402a5034b04b818da5c7136e)

tags: added: in-stable-newton

This issue was fixed in the openstack/neutron 9.4.0 release.
