OVS 2.9+ systemd integration issues
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openvswitch (Ubuntu) | Fix Released | Medium | James Page |
Bionic | In Progress | Medium | James Page |
Disco | Won't Fix | Medium | Unassigned |
Eoan | Fix Released | Medium | James Page |
Bug Description
For a few months now, we have been using OVS 2.9 (or newer) on Ubuntu Xenial in OPNFV, both with and without DPDK.
A while ago, we observed a couple of rare race conditions when multiple Linux interfaces/bridges are mixed with OVS ports/bridges. We also observed races between DPDK binding and openvswitch-switch (actually openvswitch-
We worked around those issues by using a solution derived from the official OVS Debian readme, which recommends avoiding using `auto` for OVS bridges. Instead, we used `auto` for OVS bridges, but omitted the `auto` for the OVS ports in them. That worked almost perfectly for a while.
However, we recently bumped a few unrelated software components (since we migrated from Queens to Rocky in OPNFV) and we started experiencing race conditions again.
So I dug a bit and found a couple of things:
1. Broken dependency between ovsdb-server/
This is probably a copy-pasta error from [1] `Before: network.service` which should probably be `Before: networking.service` on Debian systems.
The consequence is quite serious - on Debian systems, the OVS services start *after* networking.service.
Changing this leads to a service order change, which turns out to be quite the rabbit hole ...
2. Outdated ifupdown scripts
For example /etc/network/
Luckily, this is not critical, as the fallback uses `service openvswitch-switch [...]`, so I'm not sure this should be changed, but I thought it's worth mentioning.
3. Debian OVS does *not* handle OVS bridges without `auto`
The upstream OVS readme recommends omitting `auto` for OVS bridges, as mentioned earlier, to avoid exactly the race conditions we saw.
Although following the recommendation in the upstream readme leads to a working system (`networking.
IMO, networking.service (or some *other* mechanism) should call `/sbin/ifup --allow=ovs -a --read-environment` *after* the initial `/sbin/ifup -a --read-environment` (provided the ordering issue #1 is fixed so that OVS starts first, of course).
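The two-phase bring-up proposed here could be sketched roughly as follows. This is only an illustration, not what networking.service currently does: the function name `bring_up_interfaces` and the `IFUP` override variable are hypothetical, and the sketch assumes the OVS services are already running when it is called.

```shell
# Sketch: bring up regular interfaces first, then the OVS allow class.
# IFUP is a hypothetical override so the sequence can be dry-run with `echo`.
IFUP="${IFUP:-/sbin/ifup}"

bring_up_interfaces() {
    # Phase 1: interfaces marked `auto`.
    $IFUP -a --read-environment || return 1
    # Phase 2: interfaces in the `ovs` allow class (declared without `auto`).
    $IFUP --allow=ovs -a --read-environment
}
```

Setting `IFUP=echo` before calling `bring_up_interfaces` prints the two argument lists instead of touching any interface, which is a cheap way to check the order.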
4. ovsdb-server should never start before DPDK service if DPDK is installed
This should actually be easy to fix and I have to admit I haven't run into it lately, although I remember it being an issue a while ago.
Anyway, a simple `After: dpdk.service` wouldn't hurt.
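As a rough sketch, such an ordering could be added with a drop-in rather than by editing the packaged unit. The helper name `write_dpdk_ordering_dropin`, the file name `10-after-dpdk.conf` and the `DROPIN_ROOT` override are all hypothetical; the unit name is passed in by the caller, so nothing here assumes how the OVS units are named on a given release.

```shell
# Sketch: write a drop-in that orders a unit after dpdk.service.
# DROPIN_ROOT is a hypothetical override (default /etc/systemd/system) so the
# function can be tried against a scratch directory first.
write_dpdk_ordering_dropin() {
    unit="$1"   # unit to order after DPDK, supplied by the caller
    dir="${DROPIN_ROOT:-/etc/systemd/system}/${unit}.d"
    mkdir -p "$dir" || return 1
    # Quoted heredoc delimiter: the body is written out literally.
    cat > "$dir/10-after-dpdk.conf" <<'EOF'
[Unit]
After=dpdk.service
EOF
}
```

After writing the file for real, `systemctl daemon-reload` is needed before systemd picks up the new ordering.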
5. If OVS starts before networking.service, cloud-init causes cyclic dependencies
If we configure OVS services to start first, systemd might decide to randomly remove some units to break the following circular dependency:
ovs-vswitchd --> ovsdb-server -(default dep)-> sysinit.target -->
cloud-
In my tests, I just set 'DefaultDepende
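To check whether systemd actually dropped a unit to break such a cycle, the boot log can be filtered for its cycle messages. A minimal sketch, assuming those messages contain the phrase "ordering cycle" (the exact wording may differ between systemd versions); the function name `find_ordering_cycles` is hypothetical.

```shell
# Sketch: filter a boot log (read from stdin) for ordering-cycle messages.
# The matched phrase "ordering cycle" is an assumption about the log wording.
find_ordering_cycles() {
    grep -i 'ordering cycle'
}
```

Typical use would be `journalctl -b | find_ordering_cycles`; no output suggests no cycle was reported during the current boot.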
On my test systems, I didn't bother handling #2; for the others I use some systemd drop-ins (see below), which so far seem to produce reproducible working environments.
# cat /etc/systemd/
[Unit]
After=dpdk.service
Before=
DefaultDependen
# cat /etc/systemd/
[Service]
ExecStart=
# cat /etc/systemd/
[Unit]
Before=
DefaultDependen
# lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04
# apt-cache policy openvswitch-switch
openvswitch-switch:
Installed: 2.9.0-0ubuntu1~
Candidate: 2.9.0-0ubuntu1~
Version table:
*** 2.9.0-0ubuntu1~
500 http://
100 /var/lib/
[1] https:/
Changed in openvswitch (Ubuntu Eoan):
status: Confirmed → In Progress
assignee: nobody → James Page (james-page)
importance: Undecided → Medium
Changed in openvswitch (Ubuntu Disco):
importance: Undecided → Medium
Changed in openvswitch (Ubuntu Bionic):
importance: Undecided → Medium
Changed in openvswitch (Ubuntu Disco):
status: New → Triaged
Changed in openvswitch (Ubuntu Bionic):
status: New → Triaged
Changed in openvswitch (Ubuntu Disco):
status: Triaged → Won't Fix
Status changed to 'Confirmed' because the bug affects multiple users.