linuxbridge : HA routers interact badly with l2pop

Bug #1411752 reported by Mathieu Rohon
This bug report is a duplicate of: Bug #1365476: HA routers interact badly with l2pop.
This bug affects 5 people
Affects   Status          Importance   Assigned to   Milestone
neutron   Fix Committed   Undecided    Unassigned

Bug Description

This bug was split from bug #1365476, which will now be dedicated to the OVS fix.

There is a big difference between OVS and LB when using vxlan tunnels managed by l2pop:

-In OVS, vxlan tunnels are plugged into the br-tun bridge. L2pop messages manage tunnel creation and, if configured in the agent, add an ARP responder entry. If no ARP entry matches, ARP packets are flooded to every vxlan tunnel, and the correct tunnel is learned.

-In LB, when l2pop is activated, vxlan tunnel ports are created in "proxy" mode to activate the ARP responder, which is populated by l2pop. If an ARP packet doesn't match any entry in the ARP responder table of any of the vxlan tunnel ports, it is dropped. There is no fallback mode that would flood the packet to every vxlan tunnel in order to learn the correct tunnel for subsequent packets and populate the ARP table.
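
The difference between the two agents can be illustrated with a toy model. This is not Neutron code: `handle_arp_request` and all of its parameters are hypothetical names chosen for illustration, and the model only captures the hit/miss behaviour described above.

```python
def handle_arp_request(target_ip, arp_responder, tunnels, flood_on_miss):
    """Toy model of an ARP request arriving at the overlay.

    arp_responder: dict mapping IP -> MAC, as populated by l2pop.
    tunnels:       list of remote tunnel endpoint IPs.
    flood_on_miss: True models the OVS behaviour described above,
                   False models LB with vxlan ports in proxy mode.
    """
    if target_ip in arp_responder:
        # A matching entry exists: the responder answers locally.
        return ("reply", arp_responder[target_ip])
    if flood_on_miss:
        # OVS-like fallback: send the request to every tunnel.
        return ("flood", list(tunnels))
    # LB-like behaviour: no entry, no fallback, the request is lost.
    return ("drop", None)
```

With an empty responder table, the same request is flooded when `flood_on_miss=True` and dropped when it is `False`, which is the gap this bug describes.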

A possible way to fix this issue would be to add a multi-bound flag to fdb entries that correspond to a port hosted on several hosts (such as HA routers' ports).
The corresponding fdb message would look like this:

{net_id:
  {port:
    {agent_ip1 :
      {mac1, ip1, multi-bound}
    }
    {agent_ip2 :
      {mac1, ip1, multi-bound}
    }
  }
   network_type:
     vxlan,
   segment_id:
     id
 }

When the LB agent receives this fdb message, it will populate the corresponding ARP responder entry (through the "ip neigh replace" command), but won't populate the fdb entry (through the "bridge fdb add" command).
As a result, packets to HA router ports will be flooded to every vxlan tunnel. Once the first response is received, the vxlan kernel module will learn which vxlan tunnel subsequent packets have to be sent on.
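
A minimal sketch of the proposed agent-side handling follows. It is an illustration only, not the actual LB agent code: the function name `commands_for_fdb_entry`, the `multi_bound` parameter, and the device name used in the example are hypothetical, and the exact arguments passed to `ip neigh replace` and `bridge fdb add` are assumptions. The function only builds command lists; it does not execute anything.

```python
def commands_for_fdb_entry(vxlan_dev, agent_ip, mac, ip, multi_bound=False):
    """Return the commands an agent could run for one fdb entry.

    vxlan_dev:   name of the local vxlan interface (hypothetical).
    agent_ip:    tunnel endpoint IP of the host that owns the port.
    mac, ip:     MAC and IP address of the remote port.
    multi_bound: proposed flag; True when the port lives on several hosts.
    """
    # The ARP responder entry is always populated.
    cmds = [["ip", "neigh", "replace", ip, "lladdr", mac,
             "dev", vxlan_dev, "nud", "permanent"]]
    if not multi_bound:
        # Pin the MAC to a single tunnel only for single-host ports.
        # For multi-bound ports this is skipped, so that the vxlan
        # module learns the tunnel from the first response.
        cmds.append(["bridge", "fdb", "add", mac,
                     "dev", vxlan_dev, "dst", agent_ip])
    return cmds
```

For example, calling it with `multi_bound=True` yields only the `ip neigh replace` command, while the default yields both commands.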

Mike Kolesnik (mkolesni) wrote :

Perhaps 'distributed' would be a better name for this field?

James Denton (james-denton) wrote :

This same issue is also seen when using the allowed-address-pairs functionality with LinuxBridge/VXLAN. Packets from instances using an IP described as an 'allowed address', rather than the fixed IP, get dropped due to the lack of a learning mechanism. Affects Icehouse and Juno.

Phil Hopkins (phil-hopkins-a) wrote : Re: [Bug 1411752] Re: linuxbridge : HA routers interact badly with l2pop

The OVS version of this bug is:
https://bugs.launchpad.net/neutron/+bug/1365476

and it seems that the OVS version is the only one being worked on. Interesting point about the address pairs. You might ping Kyle Meserly about this to try to get some priority on it.

Phil Hopkins RHCA CMDBA
Openstack Instructor
Rackspace Hosting
(210) 312-3584


Mathieu Rohon (mathieu-rohon) wrote :

Hi,

The latest patch from Mike:
https://review.openstack.org/#/c/141114/

It uses the control plane to correctly bind the HA port.
It should also fix the LB bug, but I still have to test it.

James Denton (james-denton) wrote :

Hi Mathieu,

Thanks for the update. The patch looks like it will address the HA router issue, but allowed-address-pairs would still be a problem. Should I open a new bug for that?

Mathieu Rohon (mathieu-rohon) wrote :

Hi James,

This patch is dedicated to issues with running HA routers with l2pop.
If your bug has nothing to do with HA routers, it makes sense to open a dedicated bug.

Please include the exact config (l2pop, LB...) and the API calls used, so that I can reproduce it easily. And please subscribe me to the bug :)

Assaf Muller (amuller)
tags: added: l3-ha
Assaf Muller (amuller)
Changed in neutron:
status: New → Fix Committed