I managed to reproduce this and also noticed that the reproduction is nondeterministic: sometimes connectivity recovers after the ovs restart, other times it does not. Both outcomes are frequent, so either can be caught easily.
For the record, this is the exact reproduction:
# the default is to not use veth pairs, check that we don't have them at start
$ sudo ip l | egrep phy-br-ex
[nothing]
# change the config to use veth interconnections
$ vim /etc/neutron/plugins/ml2/ml2_conf.ini
[ovs]
use_veth_interconnection = True
$ sudo systemctl restart devstack@neutron-agent
# now we have veth interconnections
$ sudo ip l | egrep phy-br-ex
37: phy-br-ex@int-br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
38: int-br-ex@phy-br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
$ sudo ethtool -S phy-br-ex
NIC statistics:
peer_ifindex: 38
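The `peer_ifindex: 38` above matches the index of `int-br-ex` in the `ip l` output, confirming the two ends of the veth pair. A hypothetical convenience for that check (`veth_peer` is my name, not part of the report; it assumes `ethtool` and `ip` are available):

```shell
# Hypothetical helper: print the name of a veth interface's peer by
# resolving ethtool's peer_ifindex against `ip link` output.
veth_peer () {
    local idx
    # e.g. "     peer_ifindex: 38" -> "38"
    idx=$(sudo ethtool -S "$1" | awk '/peer_ifindex/ {print $2}')
    # e.g. "38: int-br-ex@phy-br-ex: <...>" -> "int-br-ex"
    ip link | awk -F': ' -v i="$idx" '$1 == i {sub(/@.*/, "", $2); print $2}'
}
```

So `veth_peer phy-br-ex` would print `int-br-ex` in the setup above.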
# boot vm with floating ip
$ openstack server create vm0 --flavor cirros256 --image cirros-0.4.0-x86_64-disk --nic net-id=private --wait
$ openstack floating ip create --port "$( openstack port list --device-id "$( openstack server show vm0 -f value -c id )" -f value -c id | head -1 )" public -f value -c floating_ip_address
172.24.4.211
# start ping and keep it running, while...
$ ping 172.24.4.211
# ... we restart ovs
$ sudo systemctl restart openvswitch-switch
In some cases ping recovers in a few seconds. In other cases it never recovers.
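Since the outcome is nondeterministic, it can take several restart attempts to hit the non-recovering case. A rough sketch of a helper for that (`check_recovery` is a hypothetical name, not part of the report; the timeout handling is an assumption):

```shell
# Hypothetical helper: restart ovs, then report whether pings to the
# floating IP start succeeding again within roughly $timeout seconds.
check_recovery () {
    local fip=$1 timeout=${2:-30} i=0
    sudo systemctl restart openvswitch-switch
    while ! ping -c 1 -W 1 "$fip" > /dev/null 2>&1; do
        i=$((i + 1))
        if [ "$i" -ge "$timeout" ]; then
            echo "no recovery after ${timeout} failed pings"
            return 1
        fi
    done
    echo "recovered after ${i} failed pings"
}
```

Called as e.g. `check_recovery 172.24.4.211`, repeatedly, until it reports no recovery.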
flow diff for br-int (.0 is the working state before ovs restart, .1 is when ping did not recover):
# diff -u <( cat dump-flows.br-int.0 | cut -d ' ' -f4,8- | sort ) <( cat dump-flows.br-int.1 | cut -d ' ' -f4,8- | sort )
--- /dev/fd/63 2020-05-18 13:25:50.235895198 +0000
+++ /dev/fd/62 2020-05-18 13:25:50.239895241 +0000
@@ -4,8 +4,12 @@
 table=0, priority=10,icmp6,in_port=18,icmp_type=136 actions=resubmit(,24)
 table=0, priority=2,in_port=23 actions=drop
 table=0, priority=2,in_port=24 actions=drop
-table=0, priority=3,in_port=23,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
-table=0, priority=3,in_port=24,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,60)
+table=0, priority=2,in_port=41 actions=drop
+table=0, priority=2,in_port=42 actions=drop
+table=0, priority=2,in_port=43 actions=drop
+table=0, priority=2,in_port=ANY actions=drop
+table=0, priority=3,in_port=43,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,60)
+table=0, priority=3,in_port=ANY,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
 table=0, priority=5,in_port=23,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,4)
 table=0, priority=5,in_port=24,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,4)
 table=0, priority=5,in_port=3,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,3)
flow diff for br-ex:
# diff -u <( cat dump-flows.br-ex.0 | cut -d ' ' -f4,8- | sort ) <( cat dump-flows.br-ex.1 | cut -d ' ' -f4,8- | sort )
--- /dev/fd/63 2020-05-18 13:27:07.036710753 +0000
+++ /dev/fd/62 2020-05-18 13:27:07.036710753 +0000
@@ -1,8 +1,10 @@
 table=0, priority=0 actions=NORMAL
 table=0, priority=1 actions=resubmit(,3)
+table=0, priority=2,in_port=14 actions=drop
+table=0, priority=2,in_port=15 actions=drop
 table=0, priority=2,in_port=5 actions=resubmit(,1)
-table=0, priority=4,in_port=5,dl_vlan=2 actions=strip_vlan,NORMAL
+table=0, priority=4,in_port=15,dl_vlan=2 actions=strip_vlan,NORMAL
 table=1, priority=0 actions=resubmit(,2)
 table=2, priority=2,in_port=5 actions=drop
 table=3, priority=1 actions=NORMAL
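For reference, the `cut -d ' ' -f4,8-` in the diff commands above strips the per-flow fields that differ between any two dumps (cookie, duration, packet and byte counters, idle_age), leaving only the table, match and actions for comparison. A minimal illustration with a fabricated dump file (the flow contents here are made up):

```shell
# Normalize an ovs-ofctl flow dump for diffing, as in the commands above:
# keep field 4 (table=...) and fields 8+ (priority/match and actions),
# dropping cookie, duration and the volatile counters, then sort.
normalize_flows () {
    cut -d ' ' -f4,8- "$1" | sort
}

# Fabricated stand-in for `sudo ovs-ofctl dump-flows br-ex > dump-flows.br-ex.0`
# (note the leading space ovs-ofctl prints before each flow):
printf '%s\n' \
    ' cookie=0x0, duration=9.7s, table=0, n_packets=4, n_bytes=280, idle_age=1, priority=1 actions=resubmit(,3)' \
    ' cookie=0x0, duration=9.9s, table=0, n_packets=0, n_bytes=0, idle_age=9, priority=0 actions=NORMAL' \
    > dump-flows.br-ex.0

normalize_flows dump-flows.br-ex.0
# -> table=0, priority=0 actions=NORMAL
# -> table=0, priority=1 actions=resubmit(,3)
```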
In the ovs-agent log there is no unexpected error message; the only error is the expected one from the ovsdb connection going away during the restart:
May 18 13:14:35 devstack1 neutron-openvswitch-agent[8674]: ERROR neutron.agent.common.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: ovsdb-client: tcp:127.0.0.1:6640: Open_vSwitch database was removed