allowed-address-pairs broken with l2pop/arp responder and LinuxBridge/VXLAN

Bug #1445089 reported by James Denton
This bug affects 8 people
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Mark McClain

Bug Description

Problem:

In Icehouse/Juno, when using ML2/LinuxBridge and VXLAN networks, allowed-address-pairs functionality is broken. It appears to be a case where the node drops broadcast traffic (ff:ff:ff:ff:ff:ff), specifically ARP requests, from an instance.

Steps to reproduce:

1. Create two instances in the same VXLAN network on two different hosts
2. Add a secondary IP address to instance #1, and add it to the port using --allowed-address-pairs
3. Ping from instance #1 to instance #2 using the secondary IP address
4. On the compute node hosting instance #2, observe that the ARP request can be seen on the vxlan interface, but not the parent interface

Steps to resolve:

1. Add static ARP entry to instance #2
2. -OR- Add static ARP entry/neighbor entry to compute node hosting instance #2

The resolutions above become problematic when the allowed addresses are networks rather than single IPs, as in the cases where instances are acting as routers or NFV devices of some kind.
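To illustrate why static entries stop being practical once an allowed address is a network, here is a minimal Python sketch. The helper name `static_neigh_commands` is hypothetical; it only builds `ip neigh replace` command strings and does not touch the system.

```python
import ipaddress


def static_neigh_commands(cidr, mac, dev):
    """Return one 'ip neigh replace' command per usable address in cidr.

    Hypothetical helper for illustration only: it builds command strings
    and never executes them.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    if network.num_addresses == 1:
        # A /32 allowed address pair: exactly one entry is needed.
        addresses = [network.network_address]
    else:
        # A wider prefix: one entry per usable host address.
        addresses = list(network.hosts())
    return [
        "ip neigh replace %s lladdr %s dev %s nud permanent" % (addr, mac, dev)
        for addr in addresses
    ]


if __name__ == "__main__":
    single = static_neigh_commands(
        "192.168.100.254/32", "fa:16:3e:bf:b0:a1", "vxlan-2")
    whole_subnet = static_neigh_commands(
        "192.168.100.0/24", "fa:16:3e:bf:b0:a1", "vxlan-2")
    print(len(single), len(whole_subnet))
```

A single /32 needs one entry per remote compute node, while a /24 needs 254, and every entry has to be redone whenever the address moves to another port.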

-------------------

Example:

Create network:
neutron net-create testnet
neutron subnet-create testnet 192.168.100.0/24

Create ports, one for each instance:
neutron port-create 56c413ca-6ef1-45c8-a3e5-6241ad24bb23
neutron port-create 56c413ca-6ef1-45c8-a3e5-6241ad24bb23

Add security group and allowed-address-pairs to each port (IP to be shared)
neutron port-update 6d6796cd-455f-4b48-9e1a-8316bd336aa4 --security-group 378e3851-ae7f-40b3-94e3-c05cad5cb56b --allowed-address-pairs type=dict list=true ip_address=192.168.100.254
neutron port-update 0715121b-4cc8-4437-8840-aa74be619c2e --security-group 378e3851-ae7f-40b3-94e3-c05cad5cb56b --allowed-address-pairs type=dict list=true ip_address=192.168.100.254

Boot instances:
nova boot --flavor 2 --image 0af87835-f50f-4461-abaa-b6f088c64744 --nic port-id=6d6796cd-455f-4b48-9e1a-8316bd336aa4 --key_name rpc_support --availability-zone nova:626976-Compute001 20150331-COMP1-TEST
nova boot --flavor 2 --image 0af87835-f50f-4461-abaa-b6f088c64744 --nic port-id=0715121b-4cc8-4437-8840-aa74be619c2e --key_name rpc_support --availability-zone nova:626977-Compute002 20150331-COMP2-TEST

Observe that the proper iptables rules are in place on the compute nodes:

root@Compute001:~# iptables-save | grep 6d6796cd
-A neutron-linuxbri-s6d6796cd-4 -s 192.168.100.254/32 -m mac --mac-source FA:16:3E:BF:B0:A1 -j RETURN
-A neutron-linuxbri-s6d6796cd-4 -s 192.168.100.5/32 -m mac --mac-source FA:16:3E:BF:B0:A1 -j RETURN
-A neutron-linuxbri-s6d6796cd-4 -j DROP

root@Compute002:~# iptables-save | grep 0715121b
-A neutron-linuxbri-s0715121b-4 -s 192.168.100.254/32 -m mac --mac-source FA:16:3E:1C:9D:55 -j RETURN
-A neutron-linuxbri-s0715121b-4 -s 192.168.100.6/32 -m mac --mac-source FA:16:3E:1C:9D:55 -j RETURN
-A neutron-linuxbri-s0715121b-4 -j DROP

Verify that ARP entries exist on the compute nodes (instances can ping each other at fixed IP as expected):

root@Compute001:~# arp -an | grep 192.168.100
? (192.168.100.4) at fa:16:3e:4d:73:7b [ether] PERM on vxlan-2
? (192.168.100.6) at fa:16:3e:1c:9d:55 [ether] PERM on vxlan-2
? (192.168.100.2) at fa:16:3e:d4:53:75 [ether] PERM on vxlan-2
? (192.168.100.3) at fa:16:3e:a6:a4:03 [ether] PERM on vxlan-2

root@Compute002:~# arp -an | grep 192.168.100
? (192.168.100.3) at fa:16:3e:a6:a4:03 [ether] PERM on vxlan-2
? (192.168.100.4) at fa:16:3e:4d:73:7b [ether] PERM on vxlan-2
? (192.168.100.2) at fa:16:3e:d4:53:75 [ether] PERM on vxlan-2
? (192.168.100.5) at fa:16:3e:bf:b0:a1 [ether] PERM on vxlan-2

!!!!! TEST !!!!!

Test: Configure 192.168.100.254 as a secondary address on INSTANCE#1 and ping out to INSTANCE#2

root@20150331-comp1-test:~# ip a a 192.168.100.254/32 dev eth0

root@20150331-comp1-test:~# ping -I 192.168.100.254 192.168.100.6
PING 192.168.100.6 (192.168.100.6) from 192.168.100.254 : 56(84) bytes of data.
^C
--- 192.168.100.6 ping statistics ---
26 packets transmitted, 0 received, 100% packet loss, time 25200ms

Result: Failure to reach destination

!!!!! TROUBLESHOOT !!!!!

Process:
1. Start ping:

root@20150331-comp1-test:~# ping -I 192.168.100.254 192.168.100.6
PING 192.168.100.6 (192.168.100.6) from 192.168.100.254 : 56(84) bytes of data.

2. Dump on vxlan interface on local compute node:

root@Compute001:~# tcpdump -i vxlan-2 -ne
tcpdump: WARNING: vxlan-2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan-2, link-type EN10MB (Ethernet), capture size 65535 bytes
14:22:06.595700 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 28, length 64
14:22:07.603721 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 29, length 64
14:22:08.611701 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 30, length 64
14:22:09.619712 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 31, length 64

3. Dump on parent interface of local compute node:

root@Compute001:~# tcpdump -i bond1.206 -ne
tcpdump: WARNING: bond1.206: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1.206, link-type EN10MB (Ethernet), capture size 65535 bytes
14:31:15.655396 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 4, length 64
14:31:16.663468 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 5, length 64
14:31:17.671412 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 6, length 64
14:31:18.679443 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 7, length 64
14:31:19.687445 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 8, length 64
^C

NOTE: ICMP requests are being sent to 192.168.100.6 from 192.168.100.254 with no response.

4. Dump on parent interface on remote compute node:

root@Compute002:~# tcpdump -i bond1.206 -ne
tcpdump: WARNING: bond1.206: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1.206, link-type EN10MB (Ethernet), capture size 65535 bytes
14:27:12.889311 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 333, length 64
14:27:13.889318 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 334, length 64
14:27:14.889392 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 335, length 64
14:27:15.889315 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 336, length 64
14:27:16.889357 90:e2:ba:73:71:cd > 90:e2:ba:71:3b:1d, ethertype IPv4 (0x0800), length 148: 172.28.240.20.37449 > 172.28.240.21.8472: OTV, flags [I] (0x08), overlay 0, instance 2
fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 337, length 64

5. Dump on bridge interface on remote compute node:

root@Compute002:~# tcpdump -i brq56c413ca-6e -ne
tcpdump: WARNING: brq56c413ca-6e: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on brq56c413ca-6e, link-type EN10MB (Ethernet), capture size 65535 bytes
14:34:00.950062 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:34:00.969137 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 168, length 64
14:34:01.977167 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 169, length 64
14:34:01.977443 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:34:02.974092 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:34:02.985166 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 170, length 64
14:34:03.974131 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:34:03.993172 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 171, length 64
14:34:05.001197 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 172, length 64
14:34:05.001449 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:34:05.998187 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:34:06.009204 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1527, seq 173, length 64

6. Dump on vxlan interface on remote compute node:

root@Compute002:~# tcpdump -i vxlan-2 -ne
tcpdump: WARNING: vxlan-2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan-2, link-type EN10MB (Ethernet), capture size 65535 bytes
14:23:04.052320 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 85, length 64
14:23:04.052704 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:23:05.049944 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:23:05.060333 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 86, length 64
14:23:06.049961 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:23:06.068312 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 87, length 64
14:23:07.076355 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 88, length 64
14:23:07.076655 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:23:08.074033 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.100.254 tell 192.168.100.6, length 28
14:23:08.084299 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 (0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo request, id 1521, seq 89, length 64

NOTE: The remote instance is sending ARP requests for the source address but is getting no response. The requests are visible on vxlan-2 but appear to be dropped before they reach its parent interface, bond1.206.
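For anyone repeating this troubleshooting, a small Python sketch that scans `tcpdump -ne` output for ARP requests that never get a reply. The function name `unanswered_arp_targets` is hypothetical, and the reply pattern assumes tcpdump's usual "Reply <ip> is-at <mac>" wording, which does not appear in the captures above.

```python
import re

# Patterns for the ARP lines printed by "tcpdump -ne".
ARP_REQUEST = re.compile(r"Request who-has (\S+) tell ")
ARP_REPLY = re.compile(r"Reply (\S+) is-at ")


def unanswered_arp_targets(lines):
    """Return the sorted IPs that were asked for but never answered."""
    requested = set()
    answered = set()
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match:
            requested.add(match.group(1))
            continue
        match = ARP_REPLY.search(line)
        if match:
            answered.add(match.group(1))
    return sorted(requested - answered)


if __name__ == "__main__":
    capture = [
        "14:23:04.052704 fa:16:3e:1c:9d:55 > ff:ff:ff:ff:ff:ff, ethertype ARP "
        "(0x0806), length 42: Request who-has 192.168.100.254 tell "
        "192.168.100.6, length 28",
        "14:23:05.060333 fa:16:3e:bf:b0:a1 > fa:16:3e:1c:9d:55, ethertype IPv4 "
        "(0x0800), length 98: 192.168.100.254 > 192.168.100.6: ICMP echo "
        "request, id 1521, seq 86, length 64",
    ]
    print(unanswered_arp_targets(capture))
```

Run against the vxlan-2 capture from Compute002, this reports 192.168.100.254 as the only unanswered target.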

!!!!! GETTING IT TO WORK !!!!!

1a. Add an ARP entry on instance #2

arp -s 192.168.100.254 fa:16:3e:bf:b0:a1

Result: Success!

root@20150331-comp1-test:~# ping -I 192.168.100.254 192.168.100.6
PING 192.168.100.6 (192.168.100.6) from 192.168.100.254 : 56(84) bytes of data.

64 bytes from 192.168.100.6: icmp_seq=455 ttl=64 time=2014 ms
64 bytes from 192.168.100.6: icmp_seq=456 ttl=64 time=1014 ms
64 bytes from 192.168.100.6: icmp_seq=457 ttl=64 time=14.9 ms
64 bytes from 192.168.100.6: icmp_seq=458 ttl=64 time=0.939 ms

1b. -OR- Add an ARP entry on Compute002 (the compute node hosting instance #2)

arp -s 192.168.100.254 fa:16:3e:bf:b0:a1 -i vxlan-2

Result: Success!

root@20150331-comp1-test:~# ping -I 192.168.100.254 192.168.100.6
PING 192.168.100.6 (192.168.100.6) from 192.168.100.254 : 56(84) bytes of data.
64 bytes from 192.168.100.6: icmp_seq=543 ttl=64 time=1.17 ms
64 bytes from 192.168.100.6: icmp_seq=544 ttl=64 time=0.812 ms
64 bytes from 192.168.100.6: icmp_seq=545 ttl=64 time=0.819 ms
64 bytes from 192.168.100.6: icmp_seq=546 ttl=64 time=0.810 ms
64 bytes from 192.168.100.6: icmp_seq=547 ttl=64 time=0.794 ms
64 bytes from 192.168.100.6: icmp_seq=548 ttl=64 time=0.820 ms

tags: added: l2pop
removed: l2population
tags: added: l2-pop
removed: l2pop
Changed in neutron:
status: New → Confirmed
Darragh O'Reilly (darragh-oreilly) wrote :

Yeah, broadcasts are working okay, ping -b 255.255.255.255 is seen everywhere. But the VxLAN devices are intercepting all ARP requests, and unfortunately they don't pass them on when they don't know the answer. I can't see how to change this behavior in the ip-link man page. The only solution I can see would be to provide a new option that would allow you to disable proxy ARP when l2_population is enabled.

yalei wang (yalei-wang)
Changed in neutron:
assignee: nobody → yalei wang (yalei-wang)
yalei wang (yalei-wang) wrote :

I think it's because the code fails to add a "neighbor" entry for the IPs in address-pairs.

James Denton (james-denton) wrote :

What I would be looking for is more of a learned ARP behavior, not a programmed ARP. Especially if the use case is to add a subnet as an allowed address pair, or if you expect two or more ports to allow a particular IP (such as a floating IP or VIP). Adding a neighbor entry would not solve those use cases. I think Darragh's idea of disabling proxy ARP when creating the vxlan interface using a config option may be the way to go here until some changes are made to the vxlan module, if ever.

yalei wang (yalei-wang) wrote :

Yes, I agree. I have also been thinking about how to implement this when the address pair is a subnet. It is difficult to add all the IPs one by one, and ebtables does not support an extension like ipset either.

If we add a config option to enable or disable the ARP proxy, is that equivalent to enabling/disabling l2-pop dynamically?
Even with that kind of option, we cannot control the packets at a fine grain.

I also think the same problem should exist with OVS/VXLAN, and features like port-security would not work with l2pop either.

yalei wang (yalei-wang) wrote :

It seems 'ip link set' has no option to remove the proxy feature from an existing vxlan-xxx device.

Shaival Chokshi (schokshi) wrote :

I am seeing this bug in the Kilo release. I believe it exists in master as well. Can someone confirm?

Brad Behle (behle) wrote :

We just recreated this in master using devstack and Linux Bridge ML2 plugin. There was a suggestion above that one option was to disable proxy arp on the vxlan interface, however looking at the environment I created, proxy_arp is disabled on all the interfaces on the compute nodes.

In my environment, the ARP request for the address pair (Request who-has 192.168.100.254 tell 192.168.100.6) is actually getting to the tap interface for the vm on the compute node, but not showing up on the vm's eth0 interface.

Mathieu Rohon (mathieu-rohon) wrote :

The proxy mode is just an optimization. Since l2pop is populating the neighbor table with the mapping between IP addresses and MAC addresses, the vxlan module is able to answer ARP requests instead of broadcasting them in the entire network. see [1] for explanations.

Using learning instead of population would be wonderful, but it seems unavailable for unicast VXLan networks, only for multicast vxlan networks [2].

As stated in [1], it currently seems that using the vxlan module for overlay networks requires an "SDN controller" able to populate the neighbor and fdb tables, either proactively (l2pop, BGP E-VPN) or by reacting to l2miss/l3miss netlink messages (IBM Dove).

[1] https://www.youtube.com/watch?feature=player_detailpage&v=leYZPMMleQI#t=805
[2] https://www.youtube.com/watch?feature=player_detailpage&v=leYZPMMleQI#t=457

Mathieu Rohon (mathieu-rohon) wrote :

After a few tests, and a short conversation with François, who dug into the code of the vxlan module, it seems that learning works even with unicast VXLAN networks.

I probably misunderstood the previous video, and/or some improvements have been made to the vxlan module since the video was posted.

As yalei said earlier, the issue comes from the proxy mode. If I disable this mode, the fdb/neighbor tables are correctly populated, thanks to ARP requests.

@brad: can you tell us more about your test setup? Did you remove the "proxy" mode of the vxlan interfaces by modifying the code of the linuxbridge agent? As yalei said, it doesn't seem that this mode can be disabled dynamically.

As already proposed, a possible workaround would be to add a flag to disable the proxy mode, but of course, the consequence would be to flood every tunnel with ARP requests.
A long term alternative would be to have a fallback mode in the vxlan module so that vxlan interfaces with proxy mode set can fallback to classical ARP learning when no entry exists in the neighbor table.
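To make the three behaviors discussed so far concrete, here is a toy Python model of what happens to an ARP request from a local VM. It is a simplification based only on the descriptions in this thread, not on the kernel source; `handle_arp_request` and its `fallback` flag are hypothetical, and the fallback mode is only the proposed long-term alternative.

```python
def handle_arp_request(target_ip, neigh_table, proxy=True, fallback=False):
    """Model a vxlan device handling an ARP request from a local VM.

    Returns (action, mac) where action is 'reply', 'flood' or 'drop'.
    """
    if not proxy:
        # No proxy mode: the request is sent over the overlay as usual.
        return ("flood", None)
    mac = neigh_table.get(target_ip)
    if mac is not None:
        # l2pop has populated the entry, so the device answers locally.
        return ("reply", mac)
    if fallback:
        # Proposed behavior: fall back to classical ARP on a miss.
        return ("flood", None)
    # Behavior observed in this bug: unknown targets go unanswered.
    return ("drop", None)


if __name__ == "__main__":
    # Neighbor entries as seen on Compute002 in the bug description.
    compute002 = {
        "192.168.100.2": "fa:16:3e:d4:53:75",
        "192.168.100.3": "fa:16:3e:a6:a4:03",
        "192.168.100.4": "fa:16:3e:4d:73:7b",
        "192.168.100.5": "fa:16:3e:bf:b0:a1",
    }
    print(handle_arp_request("192.168.100.5", compute002))
    print(handle_arp_request("192.168.100.254", compute002))
    print(handle_arp_request("192.168.100.254", compute002, fallback=True))
```

The allowed address 192.168.100.254 is never in the table because Neutron only programs fixed IPs, which is why the request is lost in proxy mode.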

Brad Behle (behle) wrote :

I'm running this scenario as is with no code modifications. I looked at the /proc/sys/net/ipv4/conf/*/proxy_arp files and all show 0 meaning disabled. But it sounds like you might be talking about a different way that the linuxbridge agent handles arp proxying? If you can describe what I would need to remove or disable I will give it a try.

Brad Behle (behle) wrote :

Okay, I found the code to modify in /opt/stack/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py. I commented out the line that adds the "proxy" parameter to the "ip link add vxlan-XXXX type vxlan ..." command that creates the vxlan device, so ARP proxy is not enabled for newly created vxlan devices. I then created the network, subnet, etc., and created a few VMs as in the recreation steps above, and it did in fact allow pinging the allowed-address-pair address.
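For readers who want to see what that one keyword changes, here is a rough Python sketch of building the device-creation command with and without "proxy". This is not the agent's actual code: `vxlan_add_command` is a hypothetical helper, the argument set is simplified, and the sample values (VNI 2, bond1.206, 172.28.240.20, UDP port 8472) are taken from the captures in the bug description.

```python
def vxlan_add_command(vni, dev, local_ip, proxy=True, dstport=8472):
    """Build an 'ip link add ... type vxlan' argument list.

    Hypothetical helper for illustration; it returns the command and
    does not run it.
    """
    cmd = [
        "ip", "link", "add", "vxlan-%d" % vni, "type", "vxlan",
        "id", str(vni),
        "dev", dev,
        "local", local_ip,
        "dstport", str(dstport),
    ]
    if proxy:
        # With this keyword the device answers ARP from its neighbor
        # table instead of forwarding the request.
        cmd.append("proxy")
    return cmd


if __name__ == "__main__":
    print(" ".join(vxlan_add_command(2, "bond1.206", "172.28.240.20")))
    print(" ".join(
        vxlan_add_command(2, "bond1.206", "172.28.240.20", proxy=False)))
```

Since the keyword is only applied at creation time in this sketch, existing vxlan devices would have to be recreated for a change to take effect, which matches what was observed above.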

Brad Behle (behle) wrote :

I have been unable to find a way to fix this problem, other than simply not enabling arp proxy by hacking the code as mentioned above. And doing that defeats the purpose of having L2Population since the ARP requests then get sent out anyway.

It seems to me that there isn't a good way to fix this bug. It is more of a design gap when L2Population and allowed address pairs are used together.
    a) L2Population requires that all IP addresses are put in the arp tables of the compute nodes, since the vxlan devices on the compute nodes do not forward arp requests
    b) Allowed address pairs enables an IP address that can dynamically move between VMs without Neutron being notified. This means Neutron can not add the required arp table entries for this movable IP.

Both these functions work on their own, but don't work when both used together. The best possible solution I've heard is what Mathieu suggested above: "A long term alternative would be to have a fallback mode in the vxlan module so that vxlan interfaces with proxy mode set can fallback to classical ARP learning when no entry exists in the neighbor table."

Does anyone else have suggestions for possible solutions to this?

Mathieu Rohon (mathieu-rohon) wrote :

@brad I don't think that disabling the ARP proxy feature is "defeating the purpose of having L2Population". L2population does two things :
-1/ Partial-mesh : it creates vxlan tunnels dynamically and efficiently, without having to create full-meshed tunnels between all nodes. For instance, if l2pop is not turned on with the ovs agent, vxlan tunnels will exist on each node, even if a node is not hosting any VM in that vxlan network segment.
-2/ ARP responder : it populates arp tables to avoid flooding of arp requests in the network fabric.

So removing the "proxy" mode for vxlan interfaces will only disable the second feature of l2pop.
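A small Python sketch of the first feature, to show that it is independent of ARP handling. The function names are hypothetical and the third host, Compute003, is invented for the example; it hosts no port in the network.

```python
from itertools import combinations


def full_mesh(hosts):
    """Tunnels needed when every node pairs with every other node."""
    return set(combinations(sorted(set(hosts)), 2))


def partial_mesh(hosts_by_network):
    """Tunnels needed when only nodes sharing a network are paired."""
    tunnels = set()
    for hosts in hosts_by_network.values():
        tunnels |= set(combinations(sorted(set(hosts)), 2))
    return tunnels


if __name__ == "__main__":
    all_hosts = ["Compute001", "Compute002", "Compute003"]
    # Only two of the three hosts have a port in testnet.
    placement = {"testnet": ["Compute001", "Compute002"]}
    print(len(full_mesh(all_hosts)), len(partial_mesh(placement)))
```

With three hosts the full mesh needs three tunnels, while the partial mesh needs only the one between Compute001 and Compute002. Turning off the proxy keyword does not change this result.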

I think a reasonable patch would be to add a config parameter to disable the proxy mode for the linux bridge agent. This config parameter already exists for the ovs agent, since ovs didn't have the ARP responder feature at the time l2pop was implemented. It is called "arp_responder" in the [agent] section of the config file of the ovs agent. It might be a good idea to reuse the same parameter name in this context.

Then, having this config parameter set to "false" or "true" by default depends on which feature we want to prioritize.

I would go for having it set to "false" by default, since being able to implement HA scenario with a VIP when l2pop is in use seems more important than getting rid of ARP broadcasting messages in the network fabric.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/278597

Changed in neutron:
assignee: yalei wang (yalei-wang) → Mark McClain (markmcclain)
status: Confirmed → In Progress
Jeroen Grusewski (t-jeroen) wrote :

I am running liberty, and it seems that I have the same issue as well.

tags: added: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/278597
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bbd881f3a970143e1954cb277e5235dddd26c5d0
Submitter: Jenkins
Branch: master

commit bbd881f3a970143e1954cb277e5235dddd26c5d0
Author: Mark McClain <email address hidden>
Date: Wed Feb 10 13:28:21 2016 -0500

    add arp_responder flag to linuxbridge agent

    When the ARP responder is enabled, secondary IP addresses explicitly
    allowed by via the allowed-address-pairs extensions do not resolve.
    This change adds the ability to enable the local ARP responder similar
    to the feature in the OVS agent. This change disables local ARP
    responses by default, so ARP traffic will be sent over the overlay.

    DocImpact
    UpgradeImpact

    Change-Id: I5da4afa44fc94032880ea59ec574df504470fb4a
    Closes-Bug: 1445089
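
As a rough illustration of how the new flag would gate the proxy keyword, here is a Python sketch using the standard library's configparser. Assumptions to check against the release notes: the option is shown in a `[vxlan]` section next to `l2_population`, which is not stated in this thread, and the real agent reads its configuration through oslo.config rather than configparser. `use_proxy` is a hypothetical helper.

```python
import configparser

# Sample agent configuration. The [vxlan] section placement is an
# assumption; the disabled default follows the commit message.
SAMPLE_CONFIG = """
[vxlan]
enable_vxlan = True
l2_population = True
arp_responder = False
"""


def use_proxy(ini_text):
    """Return True if vxlan devices should be created with 'proxy'."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    l2pop = parser.getboolean("vxlan", "l2_population", fallback=False)
    responder = parser.getboolean("vxlan", "arp_responder", fallback=False)
    # Local ARP responses only make sense when l2pop fills the tables.
    return l2pop and responder


if __name__ == "__main__":
    print(use_proxy(SAMPLE_CONFIG))
```

With the flag left at its default, ARP requests travel over the overlay and allowed-address-pair IPs resolve; deployments that prefer the local responder have to opt in.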

Changed in neutron:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b3

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/288050

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/288050
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c823e8ccb951a4abdf247dd094a794b7741d7cca
Submitter: Jenkins
Branch: stable/liberty

commit c823e8ccb951a4abdf247dd094a794b7741d7cca
Author: Mark McClain <email address hidden>
Date: Wed Feb 10 13:28:21 2016 -0500

    add arp_responder flag to linuxbridge agent

    When the ARP responder is enabled, secondary IP addresses explicitly
    allowed by via the allowed-address-pairs extensions do not resolve.
    This change adds the ability to enable the local ARP responder similar
    to the feature in the OVS agent. This change disables local ARP
    responses by default, so ARP traffic will be sent over the overlay.

    DocImpact
    UpgradeImpact

    Change-Id: I5da4afa44fc94032880ea59ec574df504470fb4a
    Closes-Bug: 1445089
    (cherry picked from commit bbd881f3a970143e1954cb277e5235dddd26c5d0)
    Signed-off-by: Kevin Carter <email address hidden>

tags: added: in-stable-liberty
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

tags: removed: liberty-backport-potential
Shashank Jain (jain-sm) wrote :

We have a similar issue. The difference is that we have two different L3 networks and are using an allowed address pair between them. Does OpenStack enforce some limitation, for security or otherwise, that prevents an allowed address pair from spanning subnets? (We derive the IP from one subnet, and on failover we send a GARP from a VM in the other subnet.)
Thanks
Shashank
