"systemctl stop openvswitch-switch" will remove /var/run/openvswitch

Bug #1910209 reported by Yi Yang
This bug affects 1 person
Affects                 Status        Importance  Assigned to  Milestone
Ubuntu Server Guide     Fix Released  Undecided   Unassigned
dpdk (Ubuntu)           Invalid       Undecided   Unassigned
openvswitch (Ubuntu)    Fix Released  Medium      Unassigned
  Bionic                Confirmed     Undecided   Unassigned
  Focal                 Confirmed     Undecided   Unassigned
  Groovy                Won't Fix     Undecided   Unassigned

Bug Description

[Impact]

 * The current systemd unit (in this form only used in Debian/Ubuntu)
   declares a runtime directory. With the default settings that means
   the runtime dir is removed on service stop or restart.

 * In the past dpdkvhostuser connections used paths under that run dir,
   which was no problem as they were dead on restart anyway. But more modern
   dpdkvhostuserclient connections might (out of habit) use the same path,
   and the dir removal kills those sockets and effectively prevents keeping
   guest networking alive.

 * The fix ensures the directory is kept around via the proper systemd
   statement

[Test Case]

 * start the service and touch any new file in there e.g.
   $ touch /var/run/openvswitch/foo
   After a restart this should still be there
   $ systemctl restart openvswitch-switch
   $ ls -laF /var/run/openvswitch/foo
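
A small helper can make the check above repeatable. This is only a sketch: the function name survives_restart and the marker file name are made up for illustration, and the restart command is passed in by the caller.

```shell
#!/bin/sh
# survives_restart DIR CMD [ARGS...]
# Create a marker file in DIR, run CMD, then report whether the marker
# is still there. Returns 0 if it survived, 1 if it is gone, 2 if the
# marker could not be created. Name and marker file are illustrative only.
survives_restart() {
    dir=$1
    shift
    marker="$dir/lp1910209-marker"
    touch "$marker" || return 2
    "$@"                       # e.g. systemctl restart openvswitch-switch
    if [ -e "$marker" ]; then
        rm -f "$marker"        # clean up after ourselves
        return 0
    fi
    return 1
}
```

Run as root, "survives_restart /var/run/openvswitch systemctl restart openvswitch-switch" would be expected to return non-zero on an affected system and zero once the directory is preserved.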

[Where problems could occur]

 * In our discussions we didn't find a reason that requires cleaning that
   directory. But if there are any setup scenarios we have forgotten that need
   it, then on restart they will have to deal with that "old content".
   Therefore service restart is the place to watch out for regressions.

[Other Info]

 * n/a

---

TL;DR:
- stopping/restarting OVS clears /var/run/openvswitch
- from the "vhostuser" days, a commonly used socket path was under
  /var/run/openvswitch
- if that path is used with "vhostuserclient", the sockets are removed
  on OVS stop/restart
- Since qemu in server mode only creates these sockets once (which makes
  sense given the client/server design), that breaks the guests until
  they are restarted, which is what vhostuserclient was meant to avoid.
+ Workaround: use a different path, e.g.
  "/var/run/vhostuserclient/vhost-user-client-1"
+ Solution: let us think about whether we could keep the path around on stop/restart
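
To illustrate the workaround, here is a sketch of a port definition that keeps the socket outside the OVS runtime directory. The ovs-vsctl syntax follows the upstream vhost-user documentation; the bridge name br0, the port name and the helper function itself are placeholders. The helper only prints the command so it can be reviewed before anything is changed.

```shell
#!/bin/sh
# vhostclient_port_cmd BRIDGE PORT SOCKET
# Print (not run) the ovs-vsctl call that adds a dpdkvhostuserclient port.
# Warns on stderr if SOCKET lives in the directory that is cleared on
# OVS stop/restart. Function name and arguments are illustrative only.
vhostclient_port_cmd() {
    bridge=$1
    port=$2
    sock=$3
    case "$sock" in
        /var/run/openvswitch/*|/run/openvswitch/*)
            echo "warning: $sock lives in the OVS runtime directory" >&2
            ;;
    esac
    printf 'ovs-vsctl add-port %s %s -- set Interface %s type=dpdkvhostuserclient options:vhost-server-path=%s\n' \
        "$bridge" "$port" "$port" "$sock"
}

vhostclient_port_cmd br0 vhost-user-client-1 /var/run/vhostuserclient/vhost-user-client-1
```

qemu has to be started in server mode on the same socket path, and the directory has to be accessible to both OVS and qemu, which may need apparmor adjustments.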

--- vv original report vv ---

My system is Ubuntu 18.04. I installed OVS DPDK via apt-get and use the DPDK version of ovs-vswitchd, but when I stop openvswitch-switch (sudo systemctl stop openvswitch-switch), /var/run/openvswitch is removed, so the existing VMs can't be accessed any more. I don't know why it is removed or what removed it.

Yi Yang (yangyi01) wrote :

# ps aux | grep ovs-vswitchd
root 75975 200 0.3 135321916 222240 ? S<Lsl Jan04 2957:13 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 2101431 0.0 0.0 13136 1004 pts/1 S+ 08:34 0:00 grep --color=auto ovs-vswitchd
# ls /var/run/openvswitch/
br-ex.mgmt br-int.snoop data.mgmt ovs-vswitchd.75975.ctl
br-ex.snoop br-smgt.mgmt data.snoop ovs-vswitchd.pid
br-floating.mgmt br-smgt.snoop db.sock
br-floating.snoop br-tun.mgmt ovsdb-server.75906.ctl
br-int.mgmt br-tun.snoop ovsdb-server.pid
# systemctl stop openvswitch-switch
# ls -l /var/run/openvswitch/
ls: cannot access '/var/run/openvswitch/': No such file or directory
#

Yi Yang (yangyi01) wrote :

# dpkg -l | grep openvswitch
ii openvswitch-common 2.9.5-0ubuntu0.18.04.1 amd64 Open vSwitch common components
ii openvswitch-switch 2.9.5-0ubuntu0.18.04.1 amd64 Open vSwitch switch implementations
ii openvswitch-switch-dpdk 2.9.5-0ubuntu0.18.04.1 amd64 DPDK enabled Open vSwitch switch implementation
ii python-openvswitch 2.9.5-0ubuntu0.18.04.1 all Python bindings for Open vSwitch
# cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
#

Yi Yang (yangyi01) wrote :

By the way, VM's vhost-server-path is

Port "vhu8dcfe028-32"
            tag: 4095
            Interface "vhu8dcfe028-32"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/run/openvswitch/vhostuser/vhu8dcfe028-32"}

It can't be removed.

Christian Ehrhardt  (paelzer) wrote :

Hi Yi Yang,
TBH that isn't surprising.
This is like saying "I have a server here, and after I pulled the power plug no one can connect to it anymore".

The important bit to read up on here, IMHO, is vhost-user vs. vhost-user-client
=> https://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/#vhost-user-vs-vhost-user-client

From there you'll see: " ... if OVS dies, all VMs must be restarted." that is effectively the scenario you are in when you stop the service.

But recent versions fully support vhostuserclient in DPDK and OVS.
Using that allows you to restart OVS without losing the guests.
They might see a short network outage though (which is reasonable and ok).

Now your config already states type: dpdkvhostuserclient which means you started to do the right thing. But you haven't mentioned it at all.

While OVS is down it is ok/expected that nothing works.
But when OVS is back up, then the VMs should re-connect to the sockets and networking should get up again.

Note I'm not 100% certain whether the DPDK/OVS in 18.04 already had all it needed. You might consider using https://launchpad.net/~canonical-server/+archive/ubuntu/server-backports to check whether that version works for you. If it does, consider upgrading to 20.04 for a long-term setup.

If I missed some detail in your report please explain and let me know.
Kind Regards
Christian

Changed in dpdk (Ubuntu):
status: New → Incomplete
Yi Yang (yangyi01) wrote :

Christian, thank you so much for the quick reply. In Ubuntu 16.04, /var/run/openvswitch wasn't removed after "systemctl stop openvswitch-switch", and in Ubuntu 18.04 the OVS daemon doesn't remove /var/run/openvswitch either. Can you tell me who creates /var/run/openvswitch and who removes it? I just want it to work normally.

For the dpdkvhostuserclient case, the unix socket /var/run/openvswitch/vhostuser/vhu* is created by qemu once; OVS DPDK won't create it when it is restarted. That isn't a mistake in OVS DPDK, it is how it should be. Right now the socket is removed by "systemctl stop openvswitch-switch", and this is the root cause of it not working.

Christian Ehrhardt  (paelzer) wrote :

For the 16.04 case there was no dpdkvhostuserclient yet, so there it had to stay around anyway as OVS was the server.

But I agree that it is an issue with regard to "/var/run/openvswitch/vhostuser/vhu* is created by qemu once". That matches what I expect, and in that case clearing the /var/run directory is indeed bad.

For the time being a temporary workaround could be to try using a different path.
I'm unsure if - without further tweaks - OVS and the Qemu processes are allowed to access it (apparmor is in place), but you could try putting the socket paths at e.g. /var/run/ovs-qemu-sockets/ and see if this works as an interim solution.

Christian Ehrhardt  (paelzer) wrote :

FYI:
- systemctl reload works fine (as that calls /usr/share/openvswitch/scripts/ovs-systemd-reload)
- systemctl restart also clears the /var/run/openvswitch directory

Changed in dpdk (Ubuntu):
status: Incomplete → Confirmed
Changed in openvswitch (Ubuntu):
status: New → Confirmed
Changed in dpdk (Ubuntu):
status: Confirmed → Invalid
Christian Ehrhardt  (paelzer) wrote :

While the behavior has "more impact" when running with openvswitch-dpdk as one common path to put the sockets is under /var/run/openvswitch the problem is only in openvswitch.
I updated the bug tasks accordingly.

@Yi: I've checked my setups why I haven't seen this before and indeed I usually use a path like
"/var/run/vhostuserclient/vhost-user-client-1" and it works fine - so using something like that might be more than just a workaround for you.

Nevertheless I happen to know that /var/run/openvswitch was in some guides in the past. I even fixed one stray reference in our own docs [2] now to fully use "/var/run/vhostuserclient/vhost-user-client-1" as shown above (its examples had a mix of paths up to now).
So thanks for the report for that alone !

The upstream OVS docs [1] usually seem to use a path like "/tmp/dpdkvhostclient0".
But even there you can find examples of "/usr/local/var/run/openvswitch/vhost-user-1".
Reading that has made me realize that those paths with the pattern "/var/run/openvswitch" are all from the pre vhostuserclient time, and back then restarting OVS killed the connection anyway.
So it wasn't a problem back then, but if users keep using those paths with vhostuserclient - then it is a problem.

Adding a TL;DR to the bug description.

[1]: https://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/
[2]: https://ubuntu.com/server/docs/openvswitch-dpdk

description: updated
Changed in openvswitch (Ubuntu):
importance: Undecided → Medium
Changed in serverguide:
status: New → Fix Released
Christian Ehrhardt  (paelzer) wrote :

openvswitch-switch consists of three services actually:

openvswitch-switch.service (top level)
  -> ovsdb-server.service
    -> ovs-vswitchd.service

Of those the start/stop/restart of ovsdb-server.service is the one creating/removing the directory of /var/run/openvswitch/. The latter ovs-vswitchd.service then re-uses that directory.

The path is created due to /lib/systemd/system/ovsdb-server.service:
RuntimeDirectory=openvswitch
RuntimeDirectoryMode=0755

According to [1]: "In case of RuntimeDirectory= the innermost subdirectories are removed when the unit is stopped. It is possible to preserve the specified directories in this case if RuntimeDirectoryPreserve= is configured to restart or yes (see below)"

So the behavior atm is exactly as configured.
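
To make the rule from [1] concrete, here is a rough sketch that reads a unit file plus any drop-ins and reports what systemd would do with the runtime directory. It is a grep-level approximation and not a real unit parser; the helper name runtime_dir_policy is made up for this example.

```shell
#!/bin/sh
# runtime_dir_policy FILE [FILE...]
# Approximate reading of unit files/drop-ins, given in the order systemd
# would apply them. Prints what happens to the RuntimeDirectory= based on
# the last RuntimeDirectoryPreserve= assignment found (default: no).
runtime_dir_policy() {
    if ! grep -q '^RuntimeDirectory=.' "$@"; then
        echo "no runtime directory configured"
        return 0
    fi
    preserve=$(sed -n 's/^RuntimeDirectoryPreserve=//p' "$@" | tail -n 1)
    case "${preserve:-no}" in
        yes|true|on|1) echo "kept on stop and restart" ;;
        restart)       echo "kept on restart, removed on stop" ;;
        *)             echo "removed on stop and restart" ;;
    esac
}
```

For the two lines quoted above, with no RuntimeDirectoryPreserve= set anywhere, this reports that the directory is removed on stop and restart, which is the behavior seen in this bug.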

The service files are "ours" (= packaging). Upstream only has rhel service files and those do not dynamically add/remove the path. Instead I've even seen discussions [2] saying about RuntimeDirectoryPreserve "We need to have this either as 'yes', or 'restart' - OVN daemons depend on this directory persisting even when the OVS daemons go away."

We can add a change via:
$ sudo systemctl edit ovsdb-server.service
[Service]
RuntimeDirectoryPreserve=yes

That will keep the directory alive and avoid the issue.
@Yi - for now that could be your solution, as it would avoid you having to reconfigure (potentially many) other places.
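
For hosts where an interactive editor is inconvenient, the same override can be written directly. This is a sketch under the assumption that the drop-in goes where "systemctl edit" stores it (/etc/systemd/system/ovsdb-server.service.d/override.conf); the function name and its optional root argument are made up so it can be tried in a scratch directory first.

```shell
#!/bin/sh
# write_preserve_dropin [ROOT]
# Write the override shown above without opening an editor.
# ROOT defaults to /etc/systemd/system; passing another directory is
# only meant for trying the function out without touching the system.
write_preserve_dropin() {
    root=${1:-/etc/systemd/system}
    dropin_dir="$root/ovsdb-server.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nRuntimeDirectoryPreserve=yes\n' > "$dropin_dir/override.conf"
}
```

When written to the real location (as root), "systemctl daemon-reload" is needed afterwards, and "systemctl cat ovsdb-server.service" should then list the drop-in below the packaged unit.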

Packaging wise that would be as easy as [3]

But I'm unsure if there is intention in removing this directory - e.g. to get the DB state cleaned and re-initialized for sure. I'd leave that up to James/Frode who have looked more at OVS. Based on their decision we can try to make this statement part of the default config (or not).
Subscribing James/Frode here on the bug and on the MP to carry on from here.

[1]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html
[2]: https://github.com/openvswitch/ovs/pull/247/files#r208172722
[3]: https://code.launchpad.net/~paelzer/ubuntu/+source/openvswitch/+git/openvswitch/+merge/395882

Yi Yang (yangyi01) wrote :

Christian, thank you so much, the below solution works for me.

$ sudo systemctl edit ovsdb-server.service
[Service]
RuntimeDirectoryPreserve=yes

BTW, the only way to change the vhostuser path is via the OVS setting other_config:vhost-sock-dir, but that is a subdir under /var/run/openvswitch, so it can't fix this.

https://github.com/openvswitch/ovs/blob/master/lib/dpdk.c#L360.

If you create the dpdkvhostuserclient port yourself and add-port it yourself, you can specify a full path. But for me these are handled by OpenStack, so I can't control this. other_config:vhost-sock-dir is the only way to change it, but it is a relative path and can't point outside /var/run/openvswitch.

Yi Yang (yangyi01) wrote :

BTW, /var/run/openvswitch doesn't hold any ovsdb data, so preserving it is safe. It only holds some unix socket files and pid files, and those are cleaned up by the OVS daemons when the daemons are stopped.

Christian Ehrhardt  (paelzer) wrote :

This is uploaded to Ubuntu 21.04 already.
Just needs some builds and tests to complete.

I'm slightly unsure if this should be SRUed or if, for older releases, we'd ask users to adapt the configuration if affected. I'm slightly leaning towards an SRU but look to James/Frode for a decision.

no longer affects: dpdk (Ubuntu Bionic)
no longer affects: dpdk (Ubuntu Focal)
no longer affects: dpdk (Ubuntu Groovy)
Changed in openvswitch (Ubuntu Bionic):
status: New → Confirmed
Changed in openvswitch (Ubuntu Focal):
status: New → Confirmed
Changed in openvswitch (Ubuntu Groovy):
status: New → Confirmed
Changed in openvswitch (Ubuntu):
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.15.0~git20210104.def6eb1ea-0ubuntu3

---------------
openvswitch (2.15.0~git20210104.def6eb1ea-0ubuntu3) hirsute; urgency=medium

  * d/openvswitch-switch.ovsdb-server.service: avoid removing the state
    dir on restart (LP: #1910209)

 -- Christian Ehrhardt <email address hidden> Thu, 07 Jan 2021 12:14:35 +0000

Changed in openvswitch (Ubuntu):
status: In Progress → Fix Released
Christian Ehrhardt  (paelzer) wrote :
description: updated
Brian Murray (brian-murray) wrote :

The Groovy Gorilla has reached end of life, so this bug will not be fixed for that release

Changed in openvswitch (Ubuntu Groovy):
status: Confirmed → Won't Fix