[linuxbridge agent] vm can't communicate with router with l2pop

Bug #1661717 reported by Xiang Wang on 2017-02-03
This bug affects 2 people
Affects: neutron
Importance: High
Assigned to: Unassigned

Bug Description

When both l2pop and arp_responder are enabled for the linuxbridge agent, the vxlan device is created in "proxy" mode. In this mode, ARP entries must be added statically by the linuxbridge agent. Because of [1], the l2pop driver does not send notifications for HA router ports, so the linuxbridge agent cannot add an ARP entry for the router port. With no ARP entry for the router, the vxlan device drops ARP requests from the VM that are destined to the router, leaving the VM unable to communicate with the router.
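The failure mode described above can be illustrated with a toy model. This is only a sketch of the behavior as reported in this bug, not the kernel or agent implementation; the class name `VxlanDevice`, its fields, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VxlanDevice:
    """Toy model of a vxlan device (hypothetical, for illustration only)."""

    proxy: bool
    # Static IP -> MAC entries that the agent would have to install.
    neigh: Dict[str, str] = field(default_factory=dict)

    def handle_arp_request(self, target_ip: str) -> str:
        # Without proxy mode, the request is flooded and normal learning applies.
        if not self.proxy:
            return "flood"
        # In proxy mode, only statically installed entries are answered.
        mac = self.neigh.get(target_ip)
        if mac is None:
            return "drop"
        return f"reply {mac}"


# The agent knows about another VM port, but was never told about the
# HA router port, so there is no entry for the router address.
dev = VxlanDevice(proxy=True, neigh={"10.0.0.5": "fa:16:3e:00:00:05"})
print(dev.handle_arp_request("10.0.0.5"))  # answered from the static entry
print(dev.handle_arp_request("10.0.0.1"))  # router address: no entry, dropped
```

In this model, the ARP request for the router address is dropped only because the entry is missing; with `proxy=False` the same request would be flooded instead.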

This issue affects only the linuxbridge agent, not the ovs agent.
A temporary workaround that lets a VM communicate with an HA router is to disable arp_responder when l2pop is enabled.
If users need both the arp_responder and l2pop features for the linuxbridge agent, we need an implementation that decouples them, i.e. https://bugs.launchpad.net/neutron/+bug/1518392

[1] https://review.openstack.org/#/c/255237/

Assaf Muller (amuller) wrote :

This is intentional. We don't use l2pop to teach agents about internal networks connected to HA routers. The agent is supposed to learn these addresses via normal switching / MAC learning.

Are you seeing any connectivity issues?

Xiang Wang (wangxian) wrote :

Yes, we are seeing connectivity issues. Instances on the internal network cannot reach the external network. We think this is due to the missing ARP entry on the compute host, which leaves instances without enough information to forward the traffic. Once we manually add the missing ARP entry to the ARP table, connectivity starts to work.

We have 'arp_responder = True' configured. How is switching/MAC learning supposed to work when arp_responder is enabled? The VXLAN device would respond to arping locally and the arping would not flood the network.

Xiang Wang (wangxian) wrote :

https://github.com/openstack/neutron/blob/stable/newton/neutron/plugins/ml2/models.py#L86

Quoted from the link above, "Currently DEVICE_OWNER_ROUTER_SNAT(DVR+HA router), DEVICE_OWNER_DVR_INTERFACE, DEVICE_OWNER_HA_REPLICATED_INT are distributed router ports."

So maybe the filtering is correct, but port creation is wrong to assign device_owner=network:ha_router_replicated_interface to a port of a non-distributed HA router.

Can you confirm which is the correct behavior?

Assaf Muller (amuller) wrote :

I believe the missing piece of the puzzle is that this is happening with the Linux Bridge agent.

tags: added: l2-pop l3-ha linuxbridge
Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
description: updated
summary: - L2pop filters out port info for HA router internal interface when
- sending out notification
+ [linuxbridge agent] vm can't communicate with router with l2pop
Changed in neutron:
milestone: none → pike-1
Changed in neutron:
importance: Undecided → Medium
importance: Medium → High
Xiang Wang (wangxian) wrote :

A temporary workaround is to set arp_responder=false to allow ARP learning.
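As a rough sketch, the workaround settings could be rendered like this. The `[vxlan]` section name and the `l2_population` option name are assumptions about the linuxbridge agent configuration file and are not stated in this report; only `arp_responder` is named above. The helper `render_workaround` is hypothetical.

```python
import configparser
import io


def render_workaround() -> str:
    """Render the workaround as an INI fragment (section name is assumed)."""
    cfg = configparser.ConfigParser()
    cfg["vxlan"] = {
        # Keep l2pop enabled (option name assumed).
        "l2_population": "True",
        # Disable the ARP responder so ARP requests are no longer proxied.
        "arp_responder": "False",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_workaround())
```

The fragment would need to be merged into the agent's existing configuration, and the agent restarted, for the change to take effect.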

Changed in neutron:
status: New → In Progress
Changed in neutron:
milestone: pike-1 → pike-2
Changed in neutron:
assignee: venkata anil (anil-venkata) → nobody
status: In Progress → Confirmed