[DVR] Recovery from openvswitch restart fails when veth are used for bridges interconnection

Bug #1877977 reported by Slawek Kaplonski
This bug affects 2 people
Affects: neutron
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

In the case of DVR routers, when use_veth_interconnection is set to True and the openvswitch service is restarted, recovery from the restart is not done correctly and FIPs are unreachable until neutron-ovs-agent is restarted.

Everything works fine when patch ports are used for the interconnection.
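
A quick way to confirm which interconnection mode is in use (assuming the usual br-int/br-ex bridge names) is to inspect the ports with ovs-vsctl; with patch ports the interfaces show type "patch" and a peer option, with veths they are plain interfaces:

$ sudo ovs-vsctl show | grep -A3 int-br-ex
# patch-port mode shows:   Interface int-br-ex
#                              type: patch
#                              options: {peer=phy-br-ex}
# veth mode shows no patch type; the endpoints appear in "ip link" instead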

Bence Romsics (bence-romsics) wrote :

I managed to reproduce this and noticed that the reproduction is nondeterministic: sometimes connectivity recovers after the ovs restart, other times it does not. Both outcomes are frequent, so either can easily be caught.

For the record, this is the exact reproduction:

# the default is to not use veth pairs, check that we don't have them at start
$ sudo ip l | egrep phy-br-ex
[nothing]

# change the config to use veth interconnections
$ vim /etc/neutron/plugins/ml2/ml2_conf.ini
[ovs]
use_veth_interconnection = True

$ sudo systemctl restart devstack@neutron-agent

# now we have veth interconnections
$ sudo ip l | egrep phy-br-ex
37: phy-br-ex@int-br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
38: int-br-ex@phy-br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
$ sudo ethtool -S phy-br-ex
NIC statistics:
     peer_ifindex: 38

# boot vm with floating ip
$ openstack server create vm0 --flavor cirros256 --image cirros-0.4.0-x86_64-disk --nic net-id=private --wait
$ openstack floating ip create --port "$( openstack port list --device-id "$( openstack server show vm0 -f value -c id )" -f value -c id | head -1 )" public -f value -c floating_ip_address
172.24.4.211

# start ping and keep it running, while...
$ ping 172.24.4.211

# ... we restart ovs
$ sudo systemctl restart openvswitch-switch

In some cases ping recovers in a few seconds. In other cases it never recovers.
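
The dump files diffed below were presumably captured with ovs-ofctl before and after the restart; a sketch of how to collect them (the .0/.1 file names are assumptions matching the diff commands) is:

$ sudo ovs-ofctl dump-flows br-int > dump-flows.br-int.0   # before the ovs restart
$ sudo ovs-ofctl dump-flows br-ex > dump-flows.br-ex.0
$ sudo ovs-ofctl dump-flows br-int > dump-flows.br-int.1   # after restart, once ping fails to recover
$ sudo ovs-ofctl dump-flows br-ex > dump-flows.br-ex.1
# depending on the protocols configured on the bridge, -O OpenFlow13 may be needed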

flow diff for br-int (.0 is the working state before ovs restart, .1 is when ping did not recover):

# diff -u <( cat dump-flows.br-int.0 | cut -d ' ' -f4,8- | sort ) <( cat dump-flows.br-int.1 | cut -d ' ' -f4,8- | sort )
--- /dev/fd/63 2020-05-18 13:25:50.235895198 +0000
+++ /dev/fd/62 2020-05-18 13:25:50.239895241 +0000
@@ -4,8 +4,12 @@
 table=0, priority=10,icmp6,in_port=18,icmp_type=136 actions=resubmit(,24)
 table=0, priority=2,in_port=23 actions=drop
 table=0, priority=2,in_port=24 actions=drop
-table=0, priority=3,in_port=23,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
-table=0, priority=3,in_port=24,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,60)
+table=0, priority=2,in_port=41 actions=drop
+table=0, priority=2,in_port=42 actions=drop
+table=0, priority=2,in_port=43 actions=drop
+table=0, priority=2,in_port=ANY actions=drop
+table=0, priority=3,in_port=43,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,60)
+table=0, priority=3,in_port=ANY,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
 table=0, priority=5,in_port=23,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,4)
 table=0, priority=5,in_port=24,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,4)
 table=0, priority=5,in_port=3,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,3)
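
To correlate the in_port numbers above with actual interfaces, the current OpenFlow port assignments can be listed with standard OVS tooling (a generic check, not part of the original comment):

$ sudo ovs-ofctl show br-int                      # lists the current ofport of every port
$ sudo ovs-vsctl get Interface int-br-ex ofport
# a value of -1 means the port is not attached, which may explain the in_port=ANY flows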

flow diff for br-ex:

# diff -u <( cat dump-flows.br-ex.0 | cut -d ' ' -f4,8- | sort ) <( cat dump-flows.br-ex.1 | cut -d ' ' -f4,8- | sort )
--- /dev/fd/63 2020-05-18 13:27:07.036710753 +0000
+++ /dev/fd/62 2020-05-18 13:27:07.036710753 +0000
@@ -1,8 +1,10 @@

 table=0, priority=0 actions=NORMAL
 table=0, priority=1 actions=resubmit(,3)
+table=0, priority=2,in_port=14 acti...
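
As noted in the bug description, connectivity only comes back after the OVS agent is restarted; on this devstack setup that is the same service restarted in the reproduction above:

# workaround until recovery works: restart the agent so it resyncs and reinstalls the flows
$ sudo systemctl restart devstack@neutron-agent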

