I managed to reproduce this and also noticed that the reproduction is nondeterministic: sometimes connectivity recovers after the ovs restart, other times it does not. Both outcomes are frequent, so either can be caught easily.
For the record, this is the exact reproduction:
# the default is to not use veth pairs, check that we don't have them at start
$ sudo ip l | egrep phy-br-ex
[nothing]
# change the config to use veth interconnections
$ vim /etc/neutron/plugins/ml2/ml2_conf.ini
[ovs]
use_veth_interconnection = True
$ sudo systemctl restart devstack@neutron-agent
# now we have veth interconnections
$ sudo ip l | egrep phy-br-ex
37: phy-br-ex@int-br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
38: int-br-ex@phy-br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
$ sudo ethtool -S phy-br-ex
NIC statistics:
peer_ifindex: 38
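The `peer_ifindex: 38` above matches the index of `int-br-ex` in the `ip l` output, confirming the two ends of the veth pair. A hypothetical convenience for that check (`veth_peer` is my name, not part of the report; it assumes `ethtool` and `ip` are available):

```shell
# Hypothetical helper: print the name of a veth interface's peer by
# resolving ethtool's peer_ifindex against `ip link` output.
veth_peer () {
    local idx
    # e.g. "     peer_ifindex: 38" -> "38"
    idx=$(sudo ethtool -S "$1" | awk '/peer_ifindex/ {print $2}')
    # e.g. "38: int-br-ex@phy-br-ex: <...>" -> "int-br-ex"
    ip link | awk -F': ' -v i="$idx" '$1 == i {sub(/@.*/, "", $2); print $2}'
}
```

So `veth_peer phy-br-ex` would print `int-br-ex` in the setup above.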
# boot vm with floating ip
$ openstack server create vm0 --flavor cirros256 --image cirros-0.4.0-x86_64-disk --nic net-id=private --wait
$ openstack floating ip create --port "$( openstack port list --device-id "$( openstack server show vm0 -f value -c id )" -f value -c id | head -1 )" public -f value -c floating_ip_address
172.24.4.211
# start ping and keep it running, while...
$ ping 172.24.4.211
# ... we restart ovs
$ sudo systemctl restart openvswitch-switch
In some cases ping recovers in a few seconds. In other cases it never recovers.
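Since the outcome is nondeterministic, it can take several restart attempts to hit the non-recovering case. A rough sketch of a helper for that (`check_recovery` is a hypothetical name, not part of the report; the timeout handling is an assumption):

```shell
# Hypothetical helper: restart ovs, then report whether pings to the
# floating IP start succeeding again within roughly $timeout seconds.
check_recovery () {
    local fip=$1 timeout=${2:-30} i=0
    sudo systemctl restart openvswitch-switch
    while ! ping -c 1 -W 1 "$fip" > /dev/null 2>&1; do
        i=$((i + 1))
        if [ "$i" -ge "$timeout" ]; then
            echo "no recovery after ${timeout} failed pings"
            return 1
        fi
    done
    echo "recovered after ${i} failed pings"
}
```

Called as e.g. `check_recovery 172.24.4.211`, repeatedly, until it reports no recovery.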
flow diff for br-int (.0 is the working state before ovs restart, .1 is when ping did not recover):
# diff -u <( cat dump-flows.br-int.0 | cut -d ' ' -f4,8- | sort ) <( cat dump-flows.br-int.1 | cut -d ' ' -f4,8- | sort )
--- /dev/fd/63 2020-05-18 13:25:50.235895198 +0000
+++ /dev/fd/62 2020-05-18 13:25:50.239895241 +0000
@@ -4,8 +4,12 @@
 table=0, priority=10,icmp6,in_port=18,icmp_type=136 actions=resubmit(,24)
 table=0, priority=2,in_port=23 actions=drop
 table=0, priority=2,in_port=24 actions=drop
-table=0, priority=3,in_port=23,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
-table=0, priority=3,in_port=24,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,60)
+table=0, priority=2,in_port=41 actions=drop
+table=0, priority=2,in_port=42 actions=drop
+table=0, priority=2,in_port=43 actions=drop
+table=0, priority=2,in_port=ANY actions=drop
+table=0, priority=3,in_port=43,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,60)
+table=0, priority=3,in_port=ANY,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
 table=0, priority=5,in_port=23,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,4)
 table=0, priority=5,in_port=24,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,4)
 table=0, priority=5,in_port=3,dl_dst=fa:16:3f:ca:bf:17 actions=resubmit(,3)
flow diff for br-ex:
# diff -u <( cat dump-flows.br-ex.0 | cut -d ' ' -f4,8- | sort ) <( cat dump-flows.br-ex.1 | cut -d ' ' -f4,8- | sort )
--- /dev/fd/63 2020-05-18 13:27:07.036710753 +0000
+++ /dev/fd/62 2020-05-18 13:27:07.036710753 +0000
@@ -1,8 +1,10 @@
 table=0, priority=0 actions=NORMAL
 table=0, priority=1 actions=resubmit(,3)
+table=0, priority=2,in_port=14 actions=drop
+table=0, priority=2,in_port=15 actions=drop
 table=0, priority=2,in_port=5 actions=resubmit(,1)
-table=0, priority=4,in_port=5,dl_vlan=2 actions=strip_vlan,NORMAL
+table=0, priority=4,in_port=15,dl_vlan=2 actions=strip_vlan,NORMAL
 table=1, priority=0 actions=resubmit(,2)
 table=2, priority=2,in_port=5 actions=drop
 table=3, priority=1 actions=NORMAL
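For reference, the `cut -d ' ' -f4,8-` in the diff commands above strips the per-flow fields that differ between any two dumps (cookie, duration, packet and byte counters, idle_age), leaving only the table, match and actions for comparison. A minimal illustration with a fabricated dump file (the flow contents here are made up):

```shell
# Normalize an ovs-ofctl flow dump for diffing, as in the commands above:
# keep field 4 (table=...) and fields 8+ (priority/match and actions),
# dropping cookie, duration and the volatile counters, then sort.
normalize_flows () {
    cut -d ' ' -f4,8- "$1" | sort
}

# Fabricated stand-in for `sudo ovs-ofctl dump-flows br-ex > dump-flows.br-ex.0`
# (note the leading space ovs-ofctl prints before each flow):
printf '%s\n' \
    ' cookie=0x0, duration=9.7s, table=0, n_packets=4, n_bytes=280, idle_age=1, priority=1 actions=resubmit(,3)' \
    ' cookie=0x0, duration=9.9s, table=0, n_packets=0, n_bytes=0, idle_age=9, priority=0 actions=NORMAL' \
    > dump-flows.br-ex.0

normalize_flows dump-flows.br-ex.0
# -> table=0, priority=0 actions=NORMAL
# -> table=0, priority=1 actions=resubmit(,3)
```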
In the ovs-agent log there is no unexpected error message; the only error is the expected one from the ovsdb connection going away during the restart:
May 18 13:14:35 devstack1 neutron-openvswitch-agent[8674]: ERROR neutron.agent.common.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: ovsdb-client: tcp:127.0.0.1:6640: Open_vSwitch database was removed