Missing flows with ovs dvr after openvswitch restart

Bug #2004041 reported by Jan Horstmann
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: LIU Yulong

Bug Description

Certain flows are missing in a distributed openstack setup after restart of openvswitch.
I have tested this on OpenStack Ussuri deployed with kolla-ansible on Ubuntu Bionic, so there is a chance that this has either already been fixed or is caused by specifics of the deployment.

## Steps to reproduce

There might be a simpler reproducer, but this is what I did:

* Set up a distributed openstack with at least one control node and two compute nodes
* Configure neutron with OVS and DVR
* Configure octavia with amphora driver
* Set up an external network as floating ip pool
* Create an instance with an http server
* Create a loadbalancer with an http listener/pool
* Add the instance as pool member to the loadbalancer
* Attach a floating IP to the loadbalancer's virtual IP
* Make sure that the loadbalancer amphora and the instance are on different compute nodes
* Ensure that you can make an http request, e.g.:

  ```
  # curl -I http://${FLOATING_IP}
  HTTP/1.1 200 OK
  Server: nginx/1.18.0 (Ubuntu)
  Date: Fri, 27 Jan 2023 15:00:00 GMT
  Content-Type: text/html
  Content-Length: 612
  Last-Modified: Fri, 27 Jan 2023 13:45:11 GMT
  ETag: "63d3d567-264"
  Accept-Ranges: bytes

    0 612 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
  ```

* Restart openvswitch

  ```
  # docker restart openvswitch_vswitchd
  openvswitch_vswitchd
  ```

* Observe that the connection fails with, e.g.:

  ```
  # curl -I http://${FLOATING_IP}
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
  curl: (7) Failed to connect to ${FLOATING_IP} port 80: No route to host
  ```

* Connections will re-establish only after restarting neutron-openvswitch-agent

## Flows before and after restart of openvswitch

Looking at the flows on the tunnel bridge of the controller node, one can see that flows are missing after restarting openvswitch:
```
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > before_ovs_restart.log
# docker restart openvswitch_vswitchd
openvswitch_vswitchd
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > after_ovs_restart.log
# awk '{print $3" "$(NF)}' < before_ovs_restart.log > before_ovs_restart_cleaned.log
# awk '{print $3" "$(NF)}' < after_ovs_restart.log > after_ovs_restart_cleaned.log
# diff before_ovs_restart_cleaned.log after_ovs_restart_cleaned.log
3,4d2
< table=0, actions=resubmit(,4)
< table=0, actions=resubmit(,4)
6,7d3
< table=1, actions=drop
< table=1, actions=mod_dl_src:fa:16:3f:56:bb:5a,resubmit(,2)
13d8
< table=4, actions=mod_vlan_vid:53,resubmit(,9)
20,22d14
< table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22
< table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:23
< table=20, actions=load:0->NXM_OF_VLAN_TCI[],load:0x2ed->NXM_NX_TUN_ID[],output:22
24,25d15
< table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163eb4cf96->NXM_NX_ARP_SHA[],load:0xa000165->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:b4:cf:96,IN_PORT
< table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e77e67e->NXM_NX_ARP_SHA[],load:0xa0000a3->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:77:e6:7e,IN_PORT
27,28d16
< table=22, actions=drop
< table=22, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22,output:23
```
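
The `awk` filter above keeps only the table and the actions of each flow, so two flows that differ only in their match fields look identical in the diff. A minimal sketch of an alternative, assuming the usual `ovs-ofctl dump-flows` line layout (`cookie=..., duration=..., table=..., n_packets=..., ... actions=...`): strip only the volatile fields and sort the result, so the match fields survive. The helper name `normalize_flows` is hypothetical.

```shell
# normalize_flows: read `ovs-ofctl dump-flows` output on stdin and print one
# comparable line per flow (hypothetical helper, not part of OVS or neutron).
normalize_flows() {
  # Skip the "... reply (xid=...)" header line, drop counters, timers and the
  # cookie (assumed to be irrelevant for this comparison), trim leading
  # blanks, then sort so the diff does not depend on dump order.
  grep -Ev '^(NXST|OFPST)_FLOW reply' |
    sed -E -e 's/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,]*, ?//g' \
           -e 's/^ +//' |
    sort
}

# Usage sketch with the container and bridge names from the report:
#   docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun | normalize_flows > before.log
#   docker restart openvswitch_vswitchd
#   docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun | normalize_flows > after.log
#   diff before.log after.log
```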

Please let me know if you need more information. I also have a heat stack which automates the openstack resource part of the reproducer, in case this makes things easier.

Tags: ovs
LIU Yulong (dragon889) wrote :

This seems to be a duplicate of this bug:
https://bugs.launchpad.net/neutron/+bug/1978088

Oleg Bondarev (obondarev) wrote :

Not sure this is a duplicate of https://bugs.launchpad.net/neutron/+bug/1978088, since here the openvswitch service is restarted, not neutron-openvswitch-agent.

@janhorstmann is it possible that simply not enough time had passed since the OVS restart for neutron-ovs-agent to detect the restart and recover the OVS flows? Can you please check and attach the ovs-agent logs from the time of the openvswitch service restart?
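
One way to tell slow recovery apart from no recovery at all is to poll the endpoint for a bounded time after the restart. The following is only a sketch: `wait_until` is a hypothetical helper, and the timeout and interval in the usage comment are arbitrary values, not recommendations.

```shell
# wait_until TIMEOUT INTERVAL CMD...: run CMD every INTERVAL seconds until it
# succeeds or TIMEOUT seconds of waiting have accumulated. On success, print
# the seconds waited and return 0; on timeout, return 1.
# (Hypothetical helper; sleep time only, CMD runtime is not counted.)
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "$elapsed"
}

# Usage sketch, with FLOATING_IP as in the reproducer:
#   docker restart openvswitch_vswitchd
#   wait_until 300 5 curl -sf -o /dev/null -m 3 "http://${FLOATING_IP}" \
#     || echo "no recovery within 300s"
```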

tags: added: ovs
Changed in neutron:
status: New → Incomplete
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/872265

Changed in neutron:
status: Incomplete → In Progress
LIU Yulong (dragon889) wrote :

Ordinarily, if openvswitch is restarted, neutron-openvswitch-agent should be restarted as well. For instance, some RPM releases ship systemd services that implicitly restart neutron-openvswitch-agent after openvswitch is restarted. For container-based deployments, I'm not sure whether a similar mechanism exists. It's better to restart the neutron-openvswitch-agent container manually after restarting OVS to avoid missing flows, because an OVS restart means that neutron-openvswitch-agent has to re-process all ports.
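
For a container-based deployment, the manual sequence described here could be wrapped as follows. This is only a sketch: `openvswitch_vswitchd` is the container name used in the report, while `neutron_openvswitch_agent` is an assumed name for the agent container and may differ per deployment. The function name `restart_ovs_and_agent` and the `DOCKER` override are hypothetical.

```shell
# restart_ovs_and_agent [OVS_CONTAINER] [AGENT_CONTAINER]
# Restart OVS first, then the agent, so that the agent re-processes all ports
# against the fresh OVS instance. The agent is only restarted if the OVS
# restart succeeded.
restart_ovs_and_agent() {
  docker_cmd=${DOCKER:-docker}                      # override for dry runs
  ovs_container=${1:-openvswitch_vswitchd}          # name from the report
  agent_container=${2:-neutron_openvswitch_agent}   # assumed name
  "$docker_cmd" restart "$ovs_container" &&
    "$docker_cmd" restart "$agent_container"
}

# Dry run that only prints the commands' arguments:
#   DOCKER=echo restart_ovs_and_agent
```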

Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
importance: Undecided → Medium
Jan Horstmann (janhorstmann) wrote :

Thank you for providing a fix so quickly.
I have tested a cherry-pick of https://review.opendev.org/c/openstack/neutron/+/872265 together with https://review.opendev.org/c/openstack/neutron/+/770058 in our stable/ussuri deployment and can confirm that it resolves the problem.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/872265
Committed: https://opendev.org/openstack/neutron/commit/7573fca58c147eddddbfff6eebc3554fcdd23306
Submitter: "Zuul (22348)"
Branch: master

commit 7573fca58c147eddddbfff6eebc3554fcdd23306
Author: LIU Yulong <email address hidden>
Date: Tue Jan 31 16:08:34 2023 +0800

    Notify neutron-server ovs is restarted

    If openvswitch is restarted, try to notify neutron-server
    that to refresh tunnel flows for every ports.

    Closes-Bug: #2004041
    Change-Id: Iba0ae947e3595674e63b998826daae2582bb7668

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.
