Open vSwitch with LACP bonding intermittently stops accepting ARP traffic.
Fuel 4.0 (customised in ways not detailed here) on an Ubuntu cluster.
This was reproduced only on a particular customer deployment with dual 10G NICs. The behavior is as follows:
1 - The bond is up, operational and behaving normally.
2 - Several hours (or even days) pass.
3 - Connectivity fails on all bridges attached to this bond. ARP requests go out, but the replies are filtered by OVS.
Example warning from the OVS log:
2014-01-21T23:00:39Z|04956|ofproto_dpif|WARN|in_port(3),eth(src=2e:f3:36:37:29:40,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.119.238.65,tip=10.119.238.34,op=1,sha=2e:f3:36:37:29:40,tha=00:00:00:00:00:00): inconsistency in subfacet (actions were: push_vlan(vid=30,pcp=0),1,13,16,5) (correct actions: push_vlan(vid=30,pcp=0),1,12,16,5)
4 - The issue persists indefinitely.
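The warning in step 3 reports two datapath action lists for the same flow: the actions that were installed for the subfacet and the actions OVS now computes as correct. Comparing them shows exactly what went stale; in the example only one output port differs (13 was installed, 12 is correct). The following is a minimal Python sketch, assuming the log line format shown above; `split_actions` and `parse_subfacet_warning` are hypothetical helpers, not part of OVS, and the comparison is purely positional.

```python
import re
from itertools import zip_longest
from typing import List, NamedTuple, Optional, Tuple

# Captures both action lists from an ofproto_dpif "inconsistency in subfacet" warning.
_ACTIONS_RE = re.compile(
    r"inconsistency in subfacet \(actions were: (?P<cached>.*?)\) "
    r"\(correct actions: (?P<correct>.*)\)\s*$"
)
_IN_PORT_RE = re.compile(r"in_port\((?P<port>\d+)\)")


class SubfacetMismatch(NamedTuple):
    in_port: Optional[int]
    cached: List[str]
    correct: List[str]
    differences: List[Tuple[int, str, str]]  # (position, installed, correct)


def split_actions(actions: str) -> List[str]:
    """Split an action list on top-level commas, keeping push_vlan(...) intact."""
    parts, current, depth = [], [], 0
    for ch in actions:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def parse_subfacet_warning(line: str) -> Optional[SubfacetMismatch]:
    """Return the positional differences, or None if the line is not such a warning."""
    m = _ACTIONS_RE.search(line)
    if m is None:
        return None
    port = _IN_PORT_RE.search(line)
    cached = split_actions(m.group("cached"))
    correct = split_actions(m.group("correct"))
    # zip_longest pads the shorter list with "" so length mismatches also show up.
    diffs = [
        (i, a, b)
        for i, (a, b) in enumerate(zip_longest(cached, correct, fillvalue=""))
        if a != b
    ]
    return SubfacetMismatch(int(port.group("port")) if port else None, cached, correct, diffs)


SAMPLE = (
    "2014-01-21T23:00:39Z|04956|ofproto_dpif|WARN|in_port(3),"
    "eth(src=2e:f3:36:37:29:40,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),"
    "arp(sip=10.119.238.65,tip=10.119.238.34,op=1,sha=2e:f3:36:37:29:40,"
    "tha=00:00:00:00:00:00): inconsistency in subfacet "
    "(actions were: push_vlan(vid=30,pcp=0),1,13,16,5) "
    "(correct actions: push_vlan(vid=30,pcp=0),1,12,16,5)"
)

if __name__ == "__main__":
    result = parse_subfacet_warning(SAMPLE)
    print(result.differences)  # → [(2, '13', '12')]
```

Running every matching WARN line through `parse_subfacet_warning` would show whether the stale entries always involve the same pair of ports; `ovs-dpctl show` lists datapath port numbers and can help map them to interface names.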
Workaround: run ifconfig <phys interface> down, then service openvswitch restart, then ifconfig <phys interface> up.
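The manual workaround can be scripted so the three steps are always applied in the same order. This is a minimal Python sketch, assuming the commands exactly as given in the report; the interface name `eth2` is a placeholder, and on stock Ubuntu packages the service is usually named `openvswitch-switch` rather than `openvswitch`. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List

# Placeholder: the report does not name the physical interface.
IFACE = "eth2"
# Service name as written in the report; stock Ubuntu packages use
# "openvswitch-switch" instead.
OVS_SERVICE = "openvswitch"


def workaround_commands(iface: str, service: str = OVS_SERVICE) -> List[List[str]]:
    """Return the workaround steps, in order, as argv lists."""
    return [
        ["ifconfig", iface, "down"],
        ["service", service, "restart"],
        ["ifconfig", iface, "up"],
    ]


def apply_workaround(iface: str, dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute each step, stopping at the first failure."""
    commands = workaround_commands(iface)
    for argv in commands:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    apply_workaround(IFACE)  # dry run only
```

Because `check=True` stops at the first failing step, a failed service restart would leave the interface down; that is usually preferable to bringing it back up against a half-restarted switch, but it means a failure needs manual follow-up.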
One proposed fix is to deploy a newer version of Open vSwitch and check whether the issue recurs.