L3 HA routers have IPv6 link local address on devices, periodically send traffic, moving MACs around and disrupting traffic

Bug #1403860 reported by Assaf Muller
This bug affects 1 person
Affects   Status        Importance  Assigned to   Milestone
neutron   Fix Released  High        Assaf Muller
Juno      Fix Released  High        Assaf Muller

Bug Description

In the HA router case, we place the same Neutron port on all HA router instances, which means they share the same MAC and IP addresses. We configure all IP addresses in keepalived.conf so that keepalived takes care of moving them and configures them only on the master instance. The MAC address, however, is present on the HA router devices on all network nodes, and so is the IPv6 link-local address generated from that MAC. As a result, the same active (IPv6) address lives in multiple places in the network. Any traffic a standby node sends from that address updates the MAC tables of the underlay network, which then concludes that the MAC address has moved from the master instance to that standby. This disrupts traffic to the master.
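
The description hinges on the link-local address being derived from the MAC. Below is a minimal Python sketch of the modified EUI-64 derivation from RFC 4291 (Appendix A), assuming the kernel uses EUI-64 address generation (the Linux default `addr_gen_mode`) rather than stable-privacy addresses. The function name and the MAC `fa:16:3e:12:34:56` are hypothetical; the MAC merely uses OpenStack's default `fa:16:3e` prefix.

```python
import ipaddress


def mac_to_ipv6_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80::/64 link-local address from a 48-bit MAC via modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected a 6-octet MAC, got {mac!r}")
    # Modified EUI-64: flip the universal/local bit of the first octet ...
    octets[0] ^= 0x02
    # ... and insert ff:fe between the upper and lower three octets.
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix (fe80 followed by six zero bytes) + 64-bit interface ID.
    return ipaddress.IPv6Address(bytes([0xFE, 0x80] + [0] * 6 + interface_id))


# Same Neutron port on every HA instance -> same MAC -> same link-local address.
shared_mac = "fa:16:3e:12:34:56"
master_lla = mac_to_ipv6_link_local(shared_mac)
standby_lla = mac_to_ipv6_link_local(shared_mac)
print(master_lla, master_lla == standby_lla)  # fe80::f816:3eff:fe12:3456 True
```

Because master and standbys carry the same port, they all compute the same address, which is exactly the duplicated active address the report describes.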

Severity / reproduction:
Create an HA router on a setup with 3 network nodes; the HA router is created on all of them. Connect it to an internal and an external network. Create an instance, associate a floating IP with it, and ping the floating IP. Every two minutes, we've observed the standby nodes sending an ICMPv6 multicast listener report. From the underlay's perspective, the MAC address of the master router's external interface now moves, so traffic no longer reaches the correct (master) node. After 30 seconds of packet loss, the client re-issues an ARP request for the IPv4 address; the master answers it, moving the MAC back and fixing the issue. This repeats every 2 minutes with 30 seconds of packet loss each time, i.e. 90 of every 120 seconds up, or 75% uptime. Note: I think we can do better than 75%.
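
The 75% figure can be reproduced with a toy MAC-learning model. This is a Python sketch, not a model of any real switch, and it assumes: the underlay learns each source MAC on the port it last arrived from, a standby emits one MLD report at the start of every cycle, and the client's re-ARP (answered by the master) lands exactly `outage_s` seconds later. The class, function, port names, and MAC are hypothetical.

```python
from typing import Dict, Optional


class LearningSwitch:
    """Toy L2 switch: remembers the port each source MAC was last seen on."""

    def __init__(self) -> None:
        self.mac_table: Dict[str, str] = {}

    def receive(self, src_mac: str, ingress_port: str) -> None:
        # Any frame from src_mac (re)binds that MAC to its ingress port.
        self.mac_table[src_mac] = ingress_port

    def forward_port(self, dst_mac: str) -> Optional[str]:
        return self.mac_table.get(dst_mac)


def simulate_uptime(period_s: int = 120, outage_s: int = 30, horizon_s: int = 1200) -> float:
    """Fraction of seconds in which traffic for the shared MAC reaches the master.

    Assumes 0 < outage_s < period_s.
    """
    switch = LearningSwitch()
    shared_mac = "fa:16:3e:12:34:56"  # hypothetical; identical on master and standbys
    delivered = 0
    for t in range(horizon_s):
        phase = t % period_s
        if phase == 0:
            # A standby sends an MLD report from the shared MAC: the MAC "moves".
            switch.receive(shared_mac, "port-to-standby")
        elif phase == outage_s:
            # The client re-ARPs and the master replies: the MAC moves back.
            switch.receive(shared_mac, "port-to-master")
        if switch.forward_port(shared_mac) == "port-to-master":
            delivered += 1
    return delivered / horizon_s


print(simulate_uptime())  # 0.75
```

Each 120-second cycle loses its first 30 seconds, so 900 of 1200 simulated seconds are delivered.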

Solutions:
The sledgehammer solution would be to shut down all NICs on standby routers and bring them up on the master instance using the keepalived notifier scripts. In the spirit of keeping these scripts as lightweight as possible, I'd instead like to solve this issue by handling the IPv6 link-local address the way we handle IPv4 addresses: not configuring it on the device, but adding it as a VIP in keepalived.conf and letting keepalived configure the address on the master node only.
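
The proposed approach can be illustrated with a Python sketch that renders keepalived VIP entries, emitting the IPv6 link-local address with `scope link` next to the router's other addresses. It assumes keepalived's `virtual_ipaddress_excluded` entry syntax (`<IP>/<prefix> dev <iface> [scope <scope>]`); the function name, the interface name `qg-1234abcd-56`, and the addresses are hypothetical, and the configuration Neutron actually generates may differ. It also does not show how the kernel-generated link-local address is kept off the standby devices.

```python
import ipaddress
from typing import Iterable, Tuple


def render_excluded_vips(vips: Iterable[Tuple[str, str]]) -> str:
    """Render (cidr, device) pairs as a keepalived virtual_ipaddress_excluded block."""
    lines = ["virtual_ipaddress_excluded {"]
    for cidr, device in vips:
        iface = ipaddress.ip_interface(cidr)
        entry = f"    {iface.with_prefixlen} dev {device}"
        # An IPv6 link-local VIP is added with link scope, like the address
        # the kernel would otherwise auto-configure on every node.
        if iface.version == 6 and iface.ip.is_link_local:
            entry += " scope link"
        lines.append(entry)
    lines.append("}")
    return "\n".join(lines)


print(render_excluded_vips([
    ("203.0.113.10/24", "qg-1234abcd-56"),               # external IPv4 address
    ("fe80::f816:3eff:fe12:3456/64", "qg-1234abcd-56"),  # derived from the port's MAC
]))
```

Only the link-local entry gets `scope link`; keepalived then configures both addresses only while the instance is master.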

Assaf Muller (amuller)
Changed in neutron:
assignee: nobody → Assaf Muller (amuller)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/142843

Changed in neutron:
status: New → In Progress
Assaf Muller (amuller)
description: updated
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/142843
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c9698ca0f7a6ae5e55b8b95f900e859581046c6b
Submitter: Jenkins
Branch: master

commit c9698ca0f7a6ae5e55b8b95f900e859581046c6b
Author: Assaf Muller <email address hidden>
Date: Thu Dec 18 16:25:54 2014 +0200

    Configure IPv6 LLADDR only on master L3 HA instance

    HA standby routers must never transmit traffic from
    any of their ports. This is because we allocate the same
    port on all agents. For example, for a given external interface,
    we place the same port with the same IP/MAC on every agent
    the HA router is scheduled on. Thus, if a standby router
    transmits data out of that interface, the physical switches
    in the datacenter will re-learn the MAC address of the external
    port, and place it on a port that's looking at a standby and
    not at the master. This causes 100% packet loss for any incoming
    traffic that should be going through the master instance of the
    router.

    Keepalived manages addresses on the router interfaces, and makes
    sure that these addresses only live on the master. However, we
    forgot about IPv6 link local addresses. They are generated
    from the MAC address of the interface, and thus are identical on
    all agents.

    This patch tries to treat IPv6 link local addresses the same
    as IPv4 addresses - define them as VIPs and let keepalived
    move them around.

    Closes-Bug: #1403860
    Change-Id: Ia5071552239c9444c5105a150b268fb0437e4b85

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/154609

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/154609
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a80999ea276ad05f890ea38339b2483aa6bb3fc5
Submitter: Jenkins
Branch: stable/juno

commit a80999ea276ad05f890ea38339b2483aa6bb3fc5
Author: Assaf Muller <email address hidden>
Date: Thu Dec 18 16:25:54 2014 +0200

    Configure IPv6 LLADDR only on master L3 HA instance

    Closes-Bug: #1403860
    Change-Id: Ia5071552239c9444c5105a150b268fb0437e4b85
    (cherry picked from commit c9698ca0f7a6ae5e55b8b95f900e859581046c6b)
    Conflicts:
     neutron/agent/l3/agent.py
     neutron/tests/functional/agent/test_l3_agent.py

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0