Comment 2 for bug 1644821

Andras Olah (andras-olah) wrote : Re: VIP becomes unavailable after its Controller reboot if Zabbix with OVS bridges are used

Hi,

Let me add some info to the analysis. The false-positive arping results occur because the default Linux settings allow any interface to answer ARP requests for any local address, not only its own (see e.g. https://lwn.net/Articles/45373/).

Here are the relevant settings in the haproxy namespace:
root@cic-0-1:~# ip netns exec haproxy sysctl net.ipv4.conf.all | grep arp
net.ipv4.conf.all.arp_accept = 1
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_notify = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.proxy_arp_pvlan = 0
root@cic-0-1:~# ip netns exec haproxy sysctl net.ipv4.conf.b_management | grep arp
net.ipv4.conf.b_management.arp_accept = 1
net.ipv4.conf.b_management.arp_announce = 0
net.ipv4.conf.b_management.arp_filter = 0
net.ipv4.conf.b_management.arp_ignore = 0
net.ipv4.conf.b_management.arp_notify = 0
net.ipv4.conf.b_management.proxy_arp = 0
net.ipv4.conf.b_management.proxy_arp_pvlan = 0

Therefore, the false-positive arping results occur when two VIP addresses managed by ns_IPaddr2 are on the same subnet, and one of them is up while the other is unreachable because of the OVS interface problem shown in the analysis above. The broadcast ARP requests are received and answered by the interface of the "other" VIP, while the ARPed IP is not reachable for normal IP traffic.

In my view, the proper solution would be to extend the service start script so that it checks not only that the port exists in the OVS bridge, but also that the port is operational (e.g., that the ofport of the corresponding interface is not -1).
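
A rough sketch of what such a check could look like is below. It is not taken from the actual start script: the function names are made up for illustration. It assumes only that `ovs-vsctl get Interface <name> ofport` prints the ofport column, which is -1 when OVS could not add the interface and an empty set ("[]") while no port number has been assigned.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the given ofport value denotes a usable
# OpenFlow port, i.e. a positive integer.
ofport_is_valid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, "[]", "-1" or anything non-numeric
        *) [ "$1" -gt 0 ] ;;       # plain digits: must be greater than zero
    esac
}

# Hypothetical helper: look up the ofport of an OVS interface and validate it.
# Returns non-zero if ovs-vsctl fails (e.g. the interface does not exist)
# or if the reported ofport is not usable.
ovs_port_is_operational() {
    ofport=$(ovs-vsctl get Interface "$1" ofport 2>/dev/null) || return 1
    ofport_is_valid "$ofport"
}
```

The start script could then refuse to report success unless `ovs_port_is_operational <interface>` returns 0, where the interface name is whatever port the script has just attached to the bridge.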

As an additional measure, the ARP sysctl settings could be changed so that interfaces respond to ARP requests only for their own IP addresses. In my view, this would make sense on hosts that have multiple interfaces on the same subnet.
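
For illustration, one possible combination is arp_ignore=1 (reply only if the target IP is configured on the incoming interface) together with arp_announce=2 (always use the best local source address for the target). These particular values are my assumption, not something tested on this setup. The sketch below uses a made-up function name and only prints the commands unless DRY_RUN=0 is set. The namespace and interface names are the ones from the output above.

```shell
#!/bin/sh
# Hypothetical helper: restrict ARP handling in a network namespace so that an
# interface answers ARP only for addresses configured on itself.
# By default the commands are only printed; set DRY_RUN=0 to apply them (needs root).
set_strict_arp() {
    ns=$1
    iface=$2
    for scope in all "$iface"; do
        for kv in arp_ignore=1 arp_announce=2; do
            cmd="ip netns exec $ns sysctl -w net.ipv4.conf.$scope.$kv"
            if [ "${DRY_RUN:-1}" = 1 ]; then
                echo "$cmd"
            else
                $cmd
            fi
        done
    done
}
```

Running `set_strict_arp haproxy b_management` prints the four sysctl commands for review. Note that the kernel uses the maximum of the `all` and per-interface values for arp_ignore and arp_announce, so setting `all` alone would already be enough.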

Andras