flows lost with noop firewall driver at ovs-agent restart while the db is down

Bug #2025341 reported by Bence Romsics
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Bence Romsics

Bug Description

If ovs-agent is restarted while neutron-server is up but the neutron DB is down, and the noop firewall driver is in use, the agent deletes the per-port flows and cannot recover them. Because the affected flows include the mod_vlan_vid flows, this means traffic loss until another agent restart (with the DB up) or a full successful resync.

For example:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.1

# execute these by hand and make sure that each command took effect before moving on to the next
sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/noop-db-stop.2

# diff the flows (for the sake of simplicity this devstack environment has a single vm with a single port, started above)
a=1 ; b=2 ; base=noop-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

--- /dev/fd/63 2023-06-29 08:10:00.142623814 +0000
+++ /dev/fd/62 2023-06-29 08:10:00.142623814 +0000
@@ -1,19 +1,10 @@
 table=0 priority=0 actions=resubmit(,58)
-table=0 priority=10,arp,in_port=12 actions=resubmit(,24)
-table=0 priority=10,icmp6,in_port=12,icmp_type=136 actions=resubmit(,24)
 table=0 priority=200,reg3=0 actions=set_queue:0,load:0x1->NXM_NX_REG3[0],resubmit(,0)
 table=0 priority=2,in_port=1 actions=drop
 table=0 priority=2,in_port=2 actions=drop
-table=0 priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,58)
-table=0 priority=3,in_port=2,dl_vlan=100 actions=mod_vlan_vid:3,resubmit(,58)
 table=0 priority=65535,dl_vlan=4095 actions=drop
-table=0 priority=9,in_port=12 actions=resubmit(,25)
 table=23 priority=0 actions=drop
 table=24 priority=0 actions=drop
-table=24 priority=2,arp,in_port=12,arp_spa=10.0.0.19 actions=resubmit(,25)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fd17:d094:5207:0:f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=24 priority=2,icmp6,in_port=12,icmp_type=136,nd_target=fe80::f816:3eff:fe8e:b23f actions=resubmit(,58)
-table=25 priority=2,in_port=12,dl_src=fa:16:3e:8e:b2:3f actions=resubmit(,30)
 table=30 priority=0 actions=resubmit(,58)
 table=31 priority=0 actions=resubmit(,58)
 table=58 priority=0 actions=resubmit(,60)
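
The one-line diff command above is hard to read and to reuse. A minimal sketch of the same normalization, split into two shell functions, could look like the following. The names normalize_flows and diff_flows are hypothetical; grep -E stands in for egrep, plain diff stands in for colordiff, and temporary files replace process substitution so that the sketch does not depend on bash.

```shell
# Normalize an `ovs-ofctl dump-flows` dump read from stdin so that two dumps
# can be compared: drop the reply header, strip the volatile fields, trim
# leading spaces, collapse ", " separators, and sort the remaining lines.
# (Hypothetical helper name, not part of any OVS or neutron tooling.)
normalize_flows() {
    grep -Ev '^NXST_FLOW' \
        | sed -E \
            -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' \
            -e 's/^ *//' \
            -e 's/, +/ /g' \
        | sort
}

# Diff two saved dumps after normalization.
# Returns diff's exit status: 0 if the flows match, 1 if they differ.
diff_flows() {
    before=$(mktemp) || return 2
    after=$(mktemp) || { rm -f "$before"; return 2; }
    normalize_flows < "$1" > "$before"
    normalize_flows < "$2" > "$after"
    diff -u "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return "$status"
}
```

Under those assumptions, `diff_flows ~/noop-db-stop.1 ~/noop-db-stop.2` should list the same removed flows as the diff above, and it exits with status 0 when no flows were lost.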

The same loss of flows does not happen with the openvswitch firewall driver:

[securitygroup]
firewall_driver = openvswitch

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.1

sudo systemctl stop mysql
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/openvswitch-db-stop.2

a=1 ; b=2 ; base=openvswitch-db-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

[no diff]

Nor does the loss of flows happen if neutron-server is down while ovs-agent restarts:

[securitygroup]
firewall_driver = noop

openstack server delete vm0 --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.2-x86_64-disk --nic net-id=private vm0 --wait

sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.1

sudo systemctl stop devstack@q-svc
sudo systemctl restart devstack@q-agt

sudo ovs-ofctl dump-flows br-int > ~/noop-server-stop.2

a=1 ; b=2 ; base=noop-server-stop. ; colordiff -u <( cat ~/$base$a | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort ) <( cat ~/$base$b | egrep -v ^NXST_FLOW | sed -r -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^ ]+ //g' -e 's/^ *//' -e 's/, +/ /g' | sort )

[no diff]

devstack b10c0602
neutron 0c5d4b8728

I'll push a proposed fix soon.

Tags: ovs
Bence Romsics (bence-romsics) wrote :
Changed in neutron:
status: New → In Progress
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/887257
Committed: https://opendev.org/openstack/neutron/commit/6c513217c225e7eede998e588183a046b0cb03ec
Submitter: "Zuul (22348)"
Branch: master

commit 6c513217c225e7eede998e588183a046b0cb03ec
Author: Bence Romsics <email address hidden>
Date: Tue Jun 27 13:24:43 2023 +0200

    ovs-agent: React to DB down just like to server down

    When neutron-server is down, ovs-agent waits for it to become available
    during agent startup. When neutron-server is up, but it cannot reach the
    DB, it can do nothing pretty much the same way. However ovs-agent
    reacted differently to this failure. With this patch it reacts the same
    way and delays its startup until neutron-server is up together with its
    DB.

    Change-Id: Ia55e82540aedc236e9b016bb58047d0b437eeb99
    Closes-Bug: #2025341
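
The behaviour described in the commit message, delaying startup until a dependency is actually usable, is essentially a poll-and-retry loop. The sketch below only illustrates that pattern in shell. It is not the code from the patch, and wait_until and its arguments are hypothetical names.

```shell
# Poll a readiness probe until it succeeds, or give up after MAX_TRIES.
# usage: wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# (Hypothetical helper; illustrates "wait, then start", not neutron's code.)
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    # Run the probe; on failure either give up or sleep and try again.
    while ! "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 0
}
```

The same helper may also be handy for the manual reproduction steps, where each command has to take effect before the next one runs. For example, `wait_until 30 1 systemctl is-active --quiet devstack@q-agt` would wait up to roughly 30 seconds for the agent unit to become active.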

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.
