reboot neutron-ovs-agent introduces a short interrupt of vlan traffic
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Queens | Fix Released | Critical | Unassigned |
Rocky | Fix Released | Critical | Unassigned |
Stein | Fix Released | Undecided | Unassigned |
Train | Fix Released | Undecided | Unassigned |
Ussuri | Fix Released | Undecided | Unassigned |
Victoria | Fix Released | Undecided | Unassigned |
neutron | Fix Released | Low | norman shen |
neutron (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Critical | Trent Lloyd |
Focal | Fix Released | Undecided | Unassigned |
Groovy | Fix Released | Undecided | Unassigned |
Hirsute | Fix Released | Undecided | Unassigned |
Bug Description
(SRU template copied from comment 42)
[Impact]
- When there is a RabbitMQ or neutron-api outage, the neutron-
- In the same situation, the neutron-l3-agent can delete the L3 router (Bug #1871850), or may need to refresh the tunnel (Bug #1853613), or may need to update flows or reconfigure bridges (Bug #1864822)
[Test Plan]
(1) Deploy OpenStack Bionic-Queens with DVR and a *VLAN* tenant network (VXLAN or FLAT will not reproduce the issue). With a standard deployment, simply enabling DHCP on the ext_net subnet will allow VMs to be booted directly on the ext_net provider network: run "openstack subnet set --dhcp ext_net", then deploy the VM directly to ext_net.
(2) Deploy a VM to the VLAN network
(3) Start pinging the VM from an external network
(4) Stop all RabbitMQ servers
(5) Restart neutron-
(6) Ping traffic should NOT see interruption
(7) Start all RabbitMQ servers
(8) Ping traffic should still be fine
[Where problems could occur]
These patches are all cherry-picked from the upstream stable branches, where they have existed for many months, including in stable/queens. In Ubuntu, all supported subsequent releases (Stein onwards) have also carried these patches for many months; Queens is the exception.
There is a chance that not installing these drop flows during startup could send traffic somewhere unexpected while the network is only partially set up. This was the case for DVR: in setups where more than one DVR external network port existed, a network loop could be temporarily created. This was already addressed by the included patch for Bug #1869808. I checked and could not locate any other merged changes to this drop_port logic that also need to be backported.
[Other Info]
[original description]
We are using OpenStack Neutron 13.0.6, deployed using OpenStack-helm.
I tested pinging servers in the same VLAN while restarting neutron-ovs-agent. The result shows:
root@mgt01:~# openstack server list
+------
| ID | Name | Status | Networks | Image | Flavor |
+------
| 22d55077-
| 726bc888-
$ ping 172.31.10.4
PING 172.31.10.4 (172.31.10.4): 56 data bytes
......
64 bytes from 172.31.10.4: seq=59 ttl=64 time=0.465 ms
64 bytes from 172.31.10.4: seq=60 ttl=64 time=0.510 ms <--------
64 bytes from 172.31.10.4: seq=61 ttl=64 time=0.446 ms
64 bytes from 172.31.10.4: seq=63 ttl=64 time=0.744 ms
64 bytes from 172.31.10.4: seq=64 ttl=64 time=0.477 ms
64 bytes from 172.31.10.4: seq=65 ttl=64 time=0.441 ms
64 bytes from 172.31.10.4: seq=66 ttl=64 time=0.376 ms
64 bytes from 172.31.10.4: seq=67 ttl=64 time=0.481 ms
As one can see, the packet with seq 62 was lost, I believe while the ovs agent was restarting.
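For longer test runs, spotting a gap like this by eye is error-prone. The following is a minimal sketch, not part of the original report, that scans captured ping output for missing sequence numbers. The function name `missing_sequences` and the `SAMPLE` constant are illustrative; the sample lines are copied from the transcript above.

```python
from __future__ import annotations

import re

# Matches both "seq=61" (busybox ping) and "icmp_seq=61" (iputils ping).
SEQ_RE = re.compile(r"seq=(\d+)")


def missing_sequences(ping_output: str) -> list[int]:
    """Return sequence numbers absent between the first and last reply seen."""
    received = {int(m.group(1)) for m in SEQ_RE.finditer(ping_output)}
    if not received:
        return []
    return [n for n in range(min(received), max(received) + 1)
            if n not in received]


# Lines taken from the ping transcript in this report.
SAMPLE = """\
64 bytes from 172.31.10.4: seq=59 ttl=64 time=0.465 ms
64 bytes from 172.31.10.4: seq=60 ttl=64 time=0.510 ms
64 bytes from 172.31.10.4: seq=61 ttl=64 time=0.446 ms
64 bytes from 172.31.10.4: seq=63 ttl=64 time=0.744 ms
64 bytes from 172.31.10.4: seq=64 ttl=64 time=0.477 ms
64 bytes from 172.31.10.4: seq=65 ttl=64 time=0.441 ms
64 bytes from 172.31.10.4: seq=66 ttl=64 time=0.376 ms
64 bytes from 172.31.10.4: seq=67 ttl=64 time=0.481 ms
"""

print(missing_sequences(SAMPLE))  # prints [62]
```

Note that this only detects gaps between received replies; packets lost before the first or after the last reply in the capture would not be reported.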
Right now, I am suspecting https:/
This is because, when I dump flows on the physical bridge, I can see the duration rewinding to 0, which suggests the flow has been deleted and created again:
""" duration=secs
The time, in seconds, that the entry has been in the table.
secs includes as much precision as the switch provides, possibly
to nanosecond resolution.
"""
root@compute01:~# ovs-ofctl dump-flows br-floating
...
cookie=
priority=
...
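The duration check described above can be automated by comparing two `ovs-ofctl dump-flows` snapshots taken before and after the agent restart. The sketch below is an illustration only: the helper names `parse_flows` and `reinstalled` are hypothetical, and the `BEFORE` and `AFTER` flow lines are made-up sample data in the usual dump-flows layout, not output from the reporter's environment. A flow whose duration is smaller in the later snapshot must have been removed and re-added in between.

```python
from __future__ import annotations

import re

# Pulls duration, table and the match field (the token just before "actions=").
FLOW_RE = re.compile(
    r"duration=(?P<dur>[\d.]+)s,\s*table=(?P<table>\d+),"
    r".*\s(?P<match>\S+)\s+actions="
)


def parse_flows(dump: str) -> dict[tuple[int, str], float]:
    """Map (table, match) to the flow's duration in seconds."""
    flows = {}
    for line in dump.splitlines():
        m = FLOW_RE.search(line)
        if m:  # header lines and "..." are skipped
            key = (int(m.group("table")), m.group("match"))
            flows[key] = float(m.group("dur"))
    return flows


def reinstalled(before: str, after: str) -> list[tuple[int, str]]:
    """Flows present in both dumps whose duration went backwards."""
    old, new = parse_flows(before), parse_flows(after)
    return sorted(k for k in old.keys() & new.keys() if new[k] < old[k])


# Hypothetical sample data, not taken from the report.
BEFORE = """\
 cookie=0x1, duration=120.500s, table=0, n_packets=10, n_bytes=980, priority=4,in_port=2,dl_vlan=1 actions=mod_vlan_vid:100,NORMAL
 cookie=0x1, duration=120.400s, table=0, n_packets=0, n_bytes=0, priority=2,in_port=2 actions=drop
"""
AFTER = """\
 cookie=0x2, duration=0.800s, table=0, n_packets=1, n_bytes=98, priority=4,in_port=2,dl_vlan=1 actions=mod_vlan_vid:100,NORMAL
 cookie=0x1, duration=125.600s, table=0, n_packets=0, n_bytes=0, priority=2,in_port=2 actions=drop
"""

print(reinstalled(BEFORE, AFTER))  # prints [(0, 'priority=4,in_port=2,dl_vlan=1')]
```

Keying on table plus match, rather than on the cookie, is a deliberate choice in this sketch, since a restarted agent may install the same flow under a different cookie.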
IMO, restarting ovs-agent should not affect the data plane.
Changed in neutron: | |
assignee: | nobody → norman shen (jshen28) |
status: | New → In Progress |
tags: | added: l3-dvr-backlog |
Changed in neutron (Ubuntu Hirsute): | |
status: | New → Fix Released |
Changed in neutron (Ubuntu Groovy): | |
status: | New → Fix Released |
Changed in neutron (Ubuntu Focal): | |
status: | New → Fix Released |
description: | updated |
Changed in neutron (Ubuntu Bionic): | |
assignee: | nobody → Trent Lloyd (lathiat) |
importance: | Undecided → Critical |
status: | New → In Progress |
description: | updated |
tags: | removed: verification-needed |
Hi Norman - are you able to reproduce this on a later version of neutron like 15.0.x?
Since this is just causing a small impact and the restart of the ovs-agent isn't a frequent event, I'm going to set the priority to Low.