Default gateway can vanish from HA routers, destroying external connectivity for all VMs on that network

Bug #1404945 reported by Assaf Muller
This bug affects 2 people
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   High         Assaf Muller
Juno      Fix Released   Undecided    Unassigned

Bug Description

The default gateway can vanish from the HA router namespace after certain operations.

My setup:
Fedora 20
keepalived-1.2.13-1.fc20.x86_64
Network manager turned off.

I can reproduce this reliably on my system, but not at will on a RHEL 7 system. The issue does manifest on its own there; I just can't trigger it on demand.

How I reproduce on my system:
Create an HA router
Set it as a gateway
Go to the master instance
Observe that the namespace has a default gateway
Add an internal interface (make sure that its IP is 'lower' than the IP of the external interface; this is explained below)
Default gateway will no longer exist

Cause:
keepalived.conf has two sections for VIPs: virtual_ipaddress and virtual_ipaddress_excluded. The difference is that VIPs in the first section are propagated on the wire, while VIPs in the excluded section are not. The traditional keepalived configuration places one VIP in the normal section, henceforth known as the 'primary VIP', and all other VIPs in the excluded section. Currently the keepalived manager does this by sorting the VIPs (internal IPs, the external SNAT IP, and all floating IPs), using the lowest one (by string comparison) as the primary, and placing the rest of the VIPs in the excluded section:
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/keepalived.py#L155
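
To illustrate why string sorting makes the primary VIP unstable, here is a minimal sketch. It is not the neutron code: the function name split_vips and the addresses are made up for the example, and it only mirrors the "lowest string wins" behaviour described above.

```python
def split_vips(vips):
    """Return (primary, excluded) using plain string ordering.

    Illustrative only: mirrors the behaviour described in this report,
    not the actual neutron implementation.
    """
    # Lexicographic comparison, so "10.0.0.1" sorts before "9.0.0.1".
    ordered = sorted(vips)
    return ordered[0], ordered[1:]


# Hypothetical addresses, chosen only to show the effect.
external_vip = "172.24.4.226/28"
internal_vip = "10.0.0.1/24"

# Router with only a gateway: the external IP is the primary VIP.
primary_before, _ = split_vips([external_vip])

# After adding an internal interface with a 'lower' IP, the primary changes.
primary_after, excluded_after = split_vips([external_vip, internal_vip])

print(primary_before)   # 172.24.4.226/28
print(primary_after)    # 10.0.0.1/24
print(excluded_after)   # ['172.24.4.226/28']
```

With these inputs, adding the internal interface moves the external IP from the primary slot into the excluded section, which is the kind of change the reproduction steps rely on.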

That code is run, and keepalived.conf is rebuilt, whenever a router is updated. This means that the primary VIP can change on router updates. As it turns out, after a conversation with a keepalived developer, keepalived assumes that the order does not change (this is possibly a keepalived bug, depending on your view on life, the ordering of the stars when keepalived is executed and the wind speed in the Falkland Islands in the past leap year). On my system, with the currently installed keepalived version, whenever the primary VIP changes, the default gateway (present in the virtual_routes section of keepalived.conf) is violently removed.

Possible solution:
Make sure that the primary VIP never changes. For example: Fabricate an IP per HA router cluster (Derived from the VRID?), add it as a VIP on the HA device, configure it as the primary VIP. I played around with a hacky variation of this solution and I could no longer reproduce the issue.
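
A rough sketch of the idea, under stated assumptions: the 169.254.0.0/24 range, the function names, and the "VRID becomes the host part" rule are all hypothetical choices for illustration, not what neutron or keepalived necessarily does. The sketch accepts VRIDs 1 to 254 only, so that the fabricated address is never the network or broadcast address of the /24.

```python
import ipaddress

# Assumption: a link-local /24 set aside for fabricated primary VIPs.
PRIMARY_VIP_NET = ipaddress.ip_network("169.254.0.0/24")


def fabricate_primary_vip(vrid, network=PRIMARY_VIP_NET):
    """Derive a stable, per-router primary VIP from the VRID (hypothetical)."""
    if not 1 <= vrid <= 254:
        raise ValueError("this sketch only supports VRIDs 1..254")
    # Use the VRID as the host part of the address.
    return "%s/%d" % (network[vrid], network.prefixlen)


def split_vips_stable(vrid, vips):
    """Return (primary, excluded) where the primary never depends on vips."""
    # All real VIPs go to the excluded section; their order can change
    # without touching the primary.
    return fabricate_primary_vip(vrid), sorted(vips)


primary_1, _ = split_vips_stable(7, ["172.24.4.226/28"])
primary_2, _ = split_vips_stable(7, ["172.24.4.226/28", "10.0.0.1/24"])
print(primary_1, primary_2)  # same primary before and after the update
```

The point of the design is that router updates only ever change the excluded list, so keepalived never sees a different primary VIP.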

Assaf Muller (amuller)
Changed in neutron:
assignee: nobody → Assaf Muller (amuller)
Tuomas Juntunen (tuomas-juntunen) wrote :

I am experiencing the same problem. I have found out that whenever I add the default route first, and the subnet port to the router after that, the default route disappears.

-----

When using VRRP routers with multiple network nodes, setting the gateway before adding a port to a subnet makes the default route disappear.

router-gateway-set 707049a1-037f-4db3-8df9-6cf0d4c9d786 f7a48a9c-651d-4dcd-9168-712697b0911d

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 qg-d72a993b-cd
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-d72a993b-cd
169.254.192.0   0.0.0.0         255.255.192.0   U     0      0        0 ha-f613be73-c5

At this point, everything is ok.

After adding the interface to the subnet:

router-interface-add ha_router 020f3181-ac94-4ebe-baeb-8aac27ab691d

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.255.255.224 U     0      0        0 qg-d72a993b-cd
169.254.192.0   0.0.0.0         255.255.192.0   U     0      0        0 ha-f613be73-c5
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 qr-724eb301-69

The default route has disappeared and external connectivity is broken.
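
For anyone who wants to check for this symptom from a script, a small sketch follows. It assumes the routing table was captured as text in the same eight-column layout as the output above; the helper name has_default_route is made up for the example.

```python
def has_default_route(route_output):
    """Return True if routing table text contains a default route."""
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns; the default route has destination
        # 0.0.0.0 and the G (gateway) flag set.
        if len(fields) == 8 and fields[0] == "0.0.0.0" and "G" in fields[3]:
            return True
    return False


# The two tables from this comment, before and after adding the interface.
BEFORE = """\
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.1 0.0.0.0 UG 0 0 0 qg-d72a993b-cd
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 qg-d72a993b-cd
169.254.192.0 0.0.0.0 255.255.192.0 U 0 0 0 ha-f613be73-c5
"""

AFTER = """\
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.0 0.0.0.0 255.255.255.224 U 0 0 0 qg-d72a993b-cd
169.254.192.0 0.0.0.0 255.255.192.0 U 0 0 0 ha-f613be73-c5
192.168.100.0 0.0.0.0 255.255.255.0 U 0 0 0 qr-724eb301-69
"""

print(has_default_route(BEFORE))  # True
print(has_default_route(AFTER))   # False
```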

Tuomas Juntunen (tuomas-juntunen) wrote :

I am using Ubuntu 14.04 and newest Juno updates.

Assaf Muller (amuller)
Changed in neutron:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/143714

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/143714
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ea587e113a838f74efed9a04c78c0ff3d860d04b
Submitter: Jenkins
Branch: master

commit ea587e113a838f74efed9a04c78c0ff3d860d04b
Author: Assaf Muller <email address hidden>
Date: Tue Dec 23 13:52:41 2014 +0200

    Make L3 HA VIPs ordering consistent in keepalived.conf

    Currently the order of VIPs in keepalived.conf is determined
    by sorting the VIPs whenever one is added or removed. As it
    turns out, keepalived doesn't like it when the primary VIP
    changes. One side effect is that virtual routes, in our case
    the router's default route, may be removed.

    This patch fabricates an IP address on the router's HA interface
    and uses it as the primary VIP.

    Closes-Bug: #1404945
    Change-Id: I993daf594a28918de6fafff465f5f40e7b89305e

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/148329

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/148329
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=dbc630ae2f58dd81c0b44208059e4691eb14eaab
Submitter: Jenkins
Branch: stable/juno

commit dbc630ae2f58dd81c0b44208059e4691eb14eaab
Author: Assaf Muller <email address hidden>
Date: Tue Dec 23 13:52:41 2014 +0200

    Make L3 HA VIPs ordering consistent in keepalived.conf

    Currently the order of VIPs in keepalived.conf is determined
    by sorting the VIPs whenever one is added or removed. As it
    turns out, keepalived doesn't like it when the primary VIP
    changes. One side effect is that virtual routes, in our case
    the router's default route, may be removed.

    This patch fabricates an IP address on the router's HA interface
    and uses it as the primary VIP.

    Closes-Bug: #1404945
    Change-Id: I993daf594a28918de6fafff465f5f40e7b89305e
    (cherry picked from commit ea587e113a838f74efed9a04c78c0ff3d860d04b)
    Conflicts:
     neutron/tests/functional/agent/test_l3_agent.py
     neutron/tests/unit/test_l3_agent.py

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0