Arp_responder: two VMs share one VIP, but the arp table not updated with GARP.

Bug #1928738 reported by mark zhang
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
neutron   New      Undecided    Unassigned

Bug Description

I have a question about one problem:
I'm not sure whether an enhancement already exists for this scenario.
Please help check how the router can be updated with the new MAC via GARP when arp_responder is enabled. Thanks a lot.

Problem scenario (OpenStack with Neutron):
1. VM A and VM B work in active-standby mode and share one floating IP.
2. While VM A is the active one, the floating IP resides on VM A.
3. If a switchover happens, the floating IP will be added into VM B, and VM B will send out a GARP to tell the router that the IP now resides on the new MAC.
4. But with arp_responder enabled, the GARP cannot reach the router.

[root@overcloud-ovscompute-pl-36-4 ~]# grep -r 'arp_responder' /var/lib/config-data/neutron/
/var/lib/config-data/neutron/etc/neutron/plugins/ml2/openvswitch_agent.ini:arp_responder=True
/var/lib/config-data/neutron/etc/puppet/hieradata/ovscompute_extraconfig.json: "neutron::agents::ml2::ovs::arp_responde ": true
/var/lib/config-data/neutron/etc/puppet/hieradata/service_configs.json: "neutron::agents::ml2::ovs::arp_responder": true,

tcpdump at the VM level when the VM switchover happened (10.0.0.7 is the floating IP, fa:16:3e:b6:d7:5c is the MAC of VM B, fa:16:3e:63:d0:39 is the MAC of VM A):
07:43:14.216913 ARP, Request who-has 10.0.0.7 (Broadcast) tell 10.0.0.7, length 28
07:43:14.216939 ARP, Reply 10.0.0.7 is-at fa:16:3e:b6:d7:5c, length 28
07:43:14.217131 ARP, Reply 10.0.0.7 is-at fa:16:3e:63:d0:39, length 28
07:43:14.217155 ARP, Reply 10.0.0.7 is-at fa:16:3e:63:d0:39, length 28

Thanks,
Mark

Slawek Kaplonski (slaweq) wrote :

I think this is the same issue as https://launchpad.net/bugs/1774459, which has been open for a very long time already.
One question: are you using DVR or centralized routers in your case?

LIU Yulong (dragon889) wrote :

To be precise, a floating IP is bound to a fixed IP of one VM port. The GARP is sent to the outside world with the MAC of the router external gateway (qg-device) or the floating agent gateway (fg-device). Nothing in it is related to the VM port MAC. So, for your "Problem scenario", how could "the floating IP will be added into VM B" work? Does your local system re-bind the floating IP to VM B's port?
If you are using a VIP (allowed_address_pair) on the VM A port and the VM B port, I guess you missed something, such as adding the allowed_address_pair (VIP) to both ports. Please consider redescribing the problem in more detail.

Some clarifications:
1. arp_responder is a (local) proxy that answers ARP requests for VM port MACs.
2. GARP announces the MAC of the device on which the floating IP resides.

Changed in neutron:
status: New → Incomplete
mark zhang (mzhan017) wrote :

Hello Yulong & Kaplonski,

For the question about router type: I'm asking our vlab team to confirm.

According to the OpenStack dashboard, the floating IP (10.0.0.7) is added as a fixed IP for VM-A:
VM-A port with allowed_address_pair (10.0.0.0/24 - fa:16:3e:63:d0:39)
VM-B port with allowed_address_pair (10.0.0.0/24 - fa:16:3e:b6:d7:5c)

When a switchover happens, our application in VM-B opens a raw socket and sends out a GARP for the floating IP (10.0.0.7) with the MAC of VM-B on the interface, to notify the router to update its ARP entry.
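
For reference, a minimal Python sketch of the kind of GARP frame described above. This is not our application's code; the function names are hypothetical, and using the broadcast address as the target MAC is an assumption (some implementations use all zeros). It only shows the frame layout: an ARP request whose sender IP and target IP are both the VIP.

```python
import socket
import struct

BROADCAST = "ff:ff:ff:ff:ff:ff"


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_garp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request: sender IP == target IP."""
    src = mac_to_bytes(mac)
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP)
    eth = mac_to_bytes(BROADCAST) + src + struct.pack("!H", 0x0806)
    # ARP header: htype=Ethernet, ptype=IPv4, hlen=6, plen=4, op=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src + ip_bytes                       # sender MAC and sender IP
    arp += mac_to_bytes(BROADCAST) + ip_bytes   # target MAC (assumed) and target IP
    return eth + arp


def send_garp(ifname: str, mac: str, ip: str) -> None:
    """Send the frame on a Linux interface (requires CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(build_garp(mac, ip))
```

With the values from this report, the call would look like send_garp("eth0", "fa:16:3e:b6:d7:5c", "10.0.0.7") inside VM-B.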

Sorry, I don't understand the statement that the GARP is sent to the outside world with the MAC of the router external gateway (qg-device) or the floating agent gateway (fg-device).

Our system worked before, until the OpenStack upgrade that enabled arp_responder.

Thanks,
Mark

mark zhang (mzhan017) wrote :

I found a network architecture description in our documentation.
The GARP sent by our application should go out to the external network along the path eth0 (VM-B) -> vnet0 -> qbrxxx -> br-int -> br-eth1 -> external switch -> router.

Please help check which component sends back the reply to the GARP, from Neutron's point of view, when arp_responder is enabled.

Thanks,
Mark

Changed in neutron:
status: Incomplete → New
mark zhang (mzhan017) wrote :

Please help check which component sends back the reply to the GARP, from Neutron's point of view, when arp_responder is enabled.
Also, is there any workaround for this scenario when arp_responder is enabled?

Thanks,
Mark

LIU Yulong (dragon889) wrote :

In Neutron, "floating IP" is the name of a specific resource: it is an IP from the external network that can be bound to a VM's port through the floating_ips API. So your case is a VIP (virtual IP) shared between two VM ports.

summary: - Arp_responder: two VMs share one floating IP, but the arp table not
- updated with GARP.
+ Arp_responder: two VMs share one VIP, but the arp table not updated with
+ GARP.
LIU Yulong (dragon889) wrote :

So, as Slawek mentioned, this is a known issue, https://bugs.launchpad.net/neutron/+bug/1774459, with some duplicates:

https://bugs.launchpad.net/neutron/+bug/1821357
https://bugs.launchpad.net/neutron/+bug/1873375
https://bugs.launchpad.net/neutron/+bug/1859638

One more thing: if arp_responder is enabled, ARP packets will not go out to the physical devices when VXLAN or tunneling is enabled.

This could be a complicated issue. You can try "ovs-appctl ofproto/trace" to simulate the GARP packet coming from the VM port and trace where it goes.
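
As a rough sketch of such a trace, the Python helper below only assembles the command line for a GARP entering br-int from the VM port. The helper name is hypothetical and the OpenFlow port number 5 is a placeholder, not taken from this report; look up the real port number of the VM's port on the compute node first. The MAC and IP are the VM B values reported above.

```python
import shlex


def trace_garp_argv(bridge: str, in_port: int, mac: str, ip: str) -> list:
    """Build an 'ovs-appctl ofproto/trace' command for a GARP request."""
    flow = ",".join([
        f"in_port={in_port}",            # OpenFlow port of the VM's port (placeholder)
        "arp",
        f"dl_src={mac}",
        "dl_dst=ff:ff:ff:ff:ff:ff",      # GARP is broadcast at layer 2
        "arp_op=1",                      # ARP request
        f"arp_spa={ip}",                 # sender IP == target IP makes it gratuitous
        f"arp_tpa={ip}",
        f"arp_sha={mac}",
    ])
    return ["ovs-appctl", "ofproto/trace", bridge, flow]


# Print the command so it can be reviewed and then run by hand as root.
print(shlex.join(trace_garp_argv("br-int", 5, "fa:16:3e:b6:d7:5c", "10.0.0.7")))
```

The trace output lists the tables and rules the packet hits, which should show whether the GARP is forwarded toward the physical bridge or answered locally.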

mark zhang (mzhan017) wrote :

Yulong,
Thanks for your help.

Thanks,
Mark

mark zhang (mzhan017) wrote :

By dumping br-tun, we found an ARP rule in table 21:
[root@overcloud-controller-pl-36-1 ~]# ovs-ofctl dump-flows br-tun | grep "fa:16:3e:63:d0:39"
 cookie=0x56431b3cae6a278, duration=1226228.810s, table=21, n_packets=23, n_bytes=966, idle_age=65534, hard_age=65534, priority=1,arp,dl_vlan=35,arp_tpa=10.0.0.7 actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e63d039->NXM_NX_ARP_SHA[],load:0xa000007->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:63:d0:39,IN_PORT

That rule matches the GARP; a detailed explanation is available at this link:
https://wiki.openstack.org/wiki/Ovs-flow-logic#OVS_flows_logic_with_local_ARP_responder
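
To make the effect of that rule concrete, here is a small Python simulation of its actions applied to the GARP from VM B. It is only a model of the flow shown above (packets are plain dicts, the function names are hypothetical), not Neutron or OVS code. Note that the rule matches on arp_tpa only, not on the ARP opcode or the sender.

```python
RESPONDER_MAC = "fa:16:3e:63:d0:39"   # MAC loaded by the rule (VM A)
RESPONDER_IP = "10.0.0.7"             # arp_tpa matched by the rule


def rule_matches(pkt: dict) -> bool:
    """The table 21 rule matches any ARP packet whose target IP is the VIP."""
    return pkt["eth_type"] == 0x0806 and pkt["arp_tpa"] == RESPONDER_IP


def apply_responder(pkt: dict) -> dict:
    """Apply the rule's actions; the result is sent back out of IN_PORT."""
    reply = dict(pkt)
    reply["arp_op"] = 2                    # load:0x2->NXM_OF_ARP_OP[]
    reply["arp_tha"] = pkt["arp_sha"]      # move ARP_SHA -> ARP_THA
    reply["arp_tpa"] = pkt["arp_spa"]      # move ARP_SPA -> ARP_TPA
    reply["arp_sha"] = RESPONDER_MAC       # load VM A MAC -> ARP_SHA
    reply["arp_spa"] = RESPONDER_IP        # load 0xa000007 -> ARP_SPA
    reply["eth_dst"] = pkt["eth_src"]      # move ETH_SRC -> ETH_DST
    reply["eth_src"] = RESPONDER_MAC       # mod_dl_src
    return reply


garp_from_vm_b = {
    "eth_type": 0x0806,
    "eth_src": "fa:16:3e:b6:d7:5c",
    "eth_dst": "ff:ff:ff:ff:ff:ff",
    "arp_op": 1,
    "arp_sha": "fa:16:3e:b6:d7:5c",
    "arp_spa": "10.0.0.7",
    "arp_tha": "ff:ff:ff:ff:ff:ff",
    "arp_tpa": "10.0.0.7",
}

if rule_matches(garp_from_vm_b):
    reply = apply_responder(garp_from_vm_b)
    print(reply["arp_spa"], "is-at", reply["arp_sha"], "->", reply["eth_dst"])
```

In this model the GARP from VM B is turned into a reply carrying VM A's MAC and returned to VM B, which would be consistent with the "is-at fa:16:3e:63:d0:39" replies in the tcpdump above.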

Thanks,
Mark
