Error starting domain since update

Bug #1583009 reported by Kev Bowring
This bug affects 7 people
Affects                Status         Importance   Assigned to      Milestone
Ubuntu Cloud Archive   Incomplete     Undecided    Unassigned
libvirt (Ubuntu)       Fix Released   Medium       Chris J Arges

Bug Description

Had no problems yesterday using virt-manager to open existing (or create new) virtual machines.

After receiving updates, I am now unable to open existing machines or create new ones:

Error starting domain: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: libvirt-bin 1.3.4-1ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
ApportVersion: 2.20.1-0ubuntu4
Architecture: amd64
CurrentDesktop: XFCE
Date: Wed May 18 07:18:47 2016
InstallationDate: Installed on 2016-01-11 (127 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160111)
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.init.d.virtlogd: [modified]
modified.conffile..etc.libvirt.libvirtd.conf: [modified]
modified.conffile..etc.libvirt.libxl.conf: [modified]
modified.conffile..etc.libvirt.qemu.conf: [inaccessible: [Errno 13] Permission denied: '/etc/libvirt/qemu.conf']
modified.conffile..etc.libvirt.qemu.networks.default.xml: [inaccessible: [Errno 13] Permission denied: '/etc/libvirt/qemu/networks/default.xml']
mtime.conffile..etc.init.d.virtlogd: 2016-05-01T04:06:42
mtime.conffile..etc.libvirt.libvirtd.conf: 2016-05-14T15:01:59
mtime.conffile..etc.libvirt.libxl.conf: 2016-05-14T15:01:55

Revision history for this message
Kev Bowring (flocculant) wrote :
Chris J Arges (arges)
Changed in libvirt (Ubuntu):
assignee: nobody → Chris J Arges (arges)
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Chris J Arges (arges) wrote :

I can reproduce this bug.

Changed in libvirt (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Chris J Arges (arges) wrote :

Uploaded a fix for Yakkety.

Revision history for this message
Kev Bowring (flocculant) wrote :

Thanks - grabbed it from -proposed :)

Revision history for this message
Kev Bowring (flocculant) wrote :

Got updates to virtinst and virt-manager today - issue is back.

Revision history for this message
Chris J Arges (arges) wrote :

Just did an update again and things work for me. Can you confirm which versions of virt-manager/libvirt you are using? In addition is it the same error message?

Revision history for this message
Kev Bowring (flocculant) wrote :

ii libvirt-bin 1.3.4-1ubuntu2
ii libvirt-clients 1.3.4-1ubuntu2
ii libvirt-daemon 1.3.4-1ubuntu2
ii libvirt-daemon-system 1.3.4-1ubuntu2
ii libvirt-glib-1.0-0:amd64 0.2.2-0.1ubuntu1
ii libvirt0:amd64 1.3.4-1ubuntu2

ii virt-manager 1:1.3.2-3ubuntu2
ii virt-viewer 3.1-1
ii virtinst 1:1.3.2-3ubuntu2

screenshot of error attached.

list of files in /var/run/libvirt:

drwxr-xr-x 2 root root 40 May 20 07:05 hostdevmgr
srwxrwx--- 1 root libvirtd 0 May 20 07:05 libvirt-sock
srwxrwxrwx 1 root libvirtd 0 May 20 07:05 libvirt-sock-ro
drwxr-xr-x 2 root root 40 May 20 07:05 lxc
drwxr-xr-x 2 root root 100 May 20 07:05 network
drwxr-xr-x 2 root root 40 May 20 07:05 qemu
drwxr-xr-x 2 root root 40 May 20 07:05 storage
drwxr-xr-x 2 root root 40 May 20 07:05 uml-guest

Details of error (if that helps)

Error starting domain: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 90, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 126, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1402, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1035, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

Revision history for this message
Kev Bowring (flocculant) wrote :

mmk - so when I check the versions I have against packages.ubuntu.com, then remove them all and reinstall, I get different versions.

Not sure how that happened - I suspect it was grabbing packages from -proposed yesterday.

Long and short of it: it works with what I *now* have installed.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.3.4-1ubuntu4

---------------
libvirt (1.3.4-1ubuntu4) yakkety; urgency=medium

  * Re-enable the upstart job by renaming the file.
  * Include patch by @guessi to continually wait for libvirtd to start when
    using sysvinit or upstart. (LP: #1571209)

 -- Serge Hallyn <email address hidden> Mon, 23 May 2016 13:50:22 -0500

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Kev Bowring (flocculant) wrote :

and broken again

Revision history for this message
Thiago Martins (martinx) wrote :

I'm seeing this problem while trying to launch an Instance on OpenStack Mitaka on Xenial.

Revision history for this message
Chris J Arges (arges) wrote : Re: [Bug 1583009] Re: Error starting domain since update

Hi,
If you are seeing this problem can you give me the versions of libvirt and
qemu you are using?
Thanks,
--chris

On Mon, Jun 13, 2016 at 5:20 PM, Thiago Martins <email address hidden>
wrote:

> I'm seeing this problem while trying to launch an Instance on OpenStack
> Mitaka on Xenial.
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/1583009
>
> Title:
> Error starting domain since update
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1583009/+subscriptions
>

Revision history for this message
Rich Art (riccardo-patane-ch) wrote :

Hi there,

I am also having the same trouble. This is from the controller's /var/log/nova/nova-conductor.log file, after trying to create a new instance as described in http://docs.openstack.org/mitaka/install-guide-ubuntu/launch-instance-provider.html:

2016-06-29 13:27:03.084 2828 ERROR nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] [instance: b5020d53-2e48-453b-bec9-cbca4fac5708] Error from last host: n04 (node n04.mgmt.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance b5020d53-2e48-453b-bec9-cbca4fac5708 was re-scheduled: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory\n"]

2016-06-29 13:27:21.459 2828 ERROR nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] [instance: b5020d53-2e48-453b-bec9-cbca4fac5708] Error from last host: n01 (node n01.mgmt.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance b5020d53-2e48-453b-bec9-cbca4fac5708 was re-scheduled: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory\n"]

2016-06-29 13:27:36.501 2827 ERROR nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] [instance: b5020d53-2e48-453b-bec9-cbca4fac5708] Error from last host: n02 (node n02.mgmt.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance b5020d53-2e48-453b-bec9-cbca4fac5708 was re-scheduled: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory\n"]

2016-06-29 13:27:36.508 2827 WARNING nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance b5020d53-2e48-453b-bec9-cbca4fac5708. Last exception: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

2016-06-29 13:27:36.513 2827 WARNING nova.scheduler.utils [req-756b7528-65c0-4389-95f1...


Revision history for this message
mogliii (mogliii) wrote :

As a workaround I do the following after each restart of the host:

# systemctl start virtlogd
# systemctl start libvirtd

Ubuntu 16.04 with latest packages installed.
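
Before applying this, a quick sanity check that it is the same failure mode can't hurt (a sketch, using the same unit names as above and the socket path from the error message):

$ systemctl status virtlogd libvirtd       # is virtlogd (and libvirtd) actually running?
$ ls -l /var/run/libvirt/virtlogd-sock     # the socket the error message complains about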

Revision history for this message
Rich Art (riccardo-patane-ch) wrote :

Hey mogliii,

Thank you very much. It worked.
I first ran, on the controller node: $ apt-get upgrade

And then on every compute node:

$ sudo service virtlogd restart
 * Restarting libvirt logging daemon /usr/sbin/virtlogd
$ sudo service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 3586

Check:
$ ps -ef | grep libvirt
root 3586 1 0 13:14 ? 00:00:00 /usr/sbin/libvirtd -d
$ ps -ef | grep virtlogd
root 3523 1 0 13:13 ? 00:00:00 /usr/sbin/virtlogd -d

That worked for me and I could create a VM with: $ openstack server create ....

Since this is only a workaround, does anybody have an idea what the permanent solution would be?

Riccardo
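
One candidate for a more permanent fix (an assumption, not something confirmed in this thread) would be making sure virtlogd comes up at boot instead of being started by hand each time; a minimal sketch, assuming the packaging ships a virtlogd.service unit (and possibly a virtlogd.socket unit):

$ sudo systemctl enable virtlogd           # start the logging daemon at boot
$ sudo systemctl enable virtlogd.socket    # if present, also enable socket activation
$ sudo systemctl start virtlogd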

Revision history for this message
mogliii (mogliii) wrote :

This is actually a bit strange: I now have two very similar computers with a similar setup, but this virtlogd issue only appears on one of them. Both have all updates installed.

If anyone can point me to the relevant startup scripts/configuration files, I can post their content from both machines to help debug this issue.

Revision history for this message
mogliii (mogliii) wrote :

I have now found one difference:
On the working installation I have "/etc/init.d/libvirt-bin", while on the non-working installation the file is "/etc/init.d/libvirtd".

Below is a diff of the two files. The most important difference might be the "Provides" line; the other changes are just related to the name.

--- workinglibvirt-bin 2016-08-02 19:43:55.999309028 +0900
+++ notworkinglibvirtd 2016-08-02 19:43:17.588065711 +0900
@@ -6,7 +6,7 @@
 # based on the skeletons that comes with dh_make
 #
 ### BEGIN INIT INFO
-# Provides: libvirtd libvirt-bin
+# Provides: libvirtd
 # Required-Start: $network $local_fs $remote_fs $syslog virtlogd
 # Required-Stop: $local_fs $remote_fs $syslog virtlogd
 # Should-Start: avahi-daemon cgconfig
@@ -31,13 +31,13 @@
 DODTIME=1 # Time to wait for the server to die, in seconds

 # Include libvirtd defaults if available
-if [ -f /etc/default/libvirt-bin ] ; then
- . /etc/default/libvirt-bin
+if [ -f /etc/default/libvirtd ] ; then
+ . /etc/default/libvirtd
 fi

 check_start_libvirtd_option() {
   if [ ! "$start_libvirtd" = "yes" ]; then
- log_warning_msg "Not starting libvirt management daemon libvirtd, disabled via /etc/default/libvirt-bin"
+ log_warning_msg "Not starting libvirt management daemon libvirtd, disabled via /etc/default/libvirtd"
     return 1
   else
     return 0
@@ -135,17 +135,17 @@
 }

 wait_on_sockfile() {
- sockfile=/var/run/libvirt/libvirt-sock
- sockfile_check_retries=5
- while [ ! -S $sockfile ]; do
- echo "Waiting for $sockfile - recheck in 2 sec"
- sleep 2
- if ! sockfile_check_retries=`expr $sockfile_check_retries - 1`; then
- echo "Giving up waiting for $sockfile."
- exit 1
- fi
- done
- return 0
+ sockfile=/var/run/libvirt/libvirt-sock
+ while [ ! -S $sockfile ] ; do
+ if ! running ; then
+ # stop/restart/force-stop event triggered before sockfile is created
+ exit 1
+ fi
+ echo "waiting for $sockfile."
+ sleep 0.5
+ done
+ echo "$sockfile ready."
+ return 0
 }

 case "$1" in
@@ -238,7 +238,7 @@
  fi
  ;;
   *)
- N=/etc/init.d/libvirt-bin
+ N=/etc/init.d/libvirtd
  echo "Usage: $N {start|stop|restart|reload|force-reload|status|force-stop}" >&2
  exit 1
  ;;
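
To see which variant a given machine ended up with, and which package owns the script, something like this should do (a sketch based on the paths in the diff above):

$ ls -l /etc/init.d/libvirt*                                      # libvirt-bin vs. libvirtd
$ dpkg -S /etc/init.d/libvirt-bin /etc/init.d/libvirtd 2>/dev/null
$ md5sum /etc/init.d/libvirt* /etc/default/libvirt* 2>/dev/null   # quick way to compare two machines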

Revision history for this message
Khairul Aizat Kamarudzzaman (fenris) wrote :

Error starting domain: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 90, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 126, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1402, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1035, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

http://paste.ubuntu.com/23160863/

Revision history for this message
Mikhail S Medvedev (msmedved) wrote :

I just had the same symptoms after upgrading an Ubuntu 16.04 OpenStack Mitaka cluster's packages to the latest versions. Some servers had no problems; very few had the "Failed to connect socket" error when trying to run e.g. 'virsh list --all'. Note that libvirtd itself did start and had no apparent errors. Reinstalling the package did not work, and none of the suggestions above worked. I then tried manually running libvirt with

  libvirtd -l

It started fine, and there was no error connecting to the domain this time. After that I stopped it and started the service the usual way; no error this time either. Sorry for not providing more details on package versions etc.; I'm putting this here as another possible workaround for reference.
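
For reference, the sequence Mikhail describes would look roughly like this (a sketch only; the service name and whether -l/--listen is actually needed are assumptions based on his description):

$ sudo service libvirt-bin stop          # or: sudo systemctl stop libvirtd
$ sudo libvirtd -l                       # run once in the foreground; Ctrl-C once 'virsh list' works
$ sudo service libvirt-bin start         # then start the service the usual way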

Revision history for this message
Saverio Proto (zioproto) wrote :

I confirm the workaround in comment #14 also worked for me.

I hit this bug after upgrading from Trusty to Xenial running OpenStack Mitaka.

You can see the problem in nova-compute.log.

For OpenStack this is a very nasty bug, because everything scheduled on that compute node fails to spawn, but the neutron port is not cleaned up. So when the instance is finally spawned on another compute node, it comes up with two neutron ports and two IP addresses.
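
A quick way to find affected compute nodes from the nova-compute.log Saverio mentions (a sketch; the log path matches the Nova traces quoted earlier in this bug):

$ grep -l 'virtlogd-sock' /var/log/nova/nova-compute.log               # run on each compute node
$ grep 'Failed to connect socket' /var/log/nova/nova-compute.log | tail -n 5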

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Saverio for your ping on this.
I wonder why I haven't seen this outside of an OpenStack context yet; it might just be the scale of it, but I still wonder. I run all sorts of libvirt/qemu upgrade tests on a regular cadence, among others Trusty->Xenial, and so far none hit this issue - all guests were fine afterwards and libvirt was reachable.
This might just be some sort of race or even a special config, but since Mikhail in comment #19 reported seeing this on only some of several similarly configured systems, the latter is unlikely.

Since AppArmor was recently involved in a few odd "something failed" issues around libvirt, I wonder if any of you had uncommon AppArmor reports in dmesg. There are always a few depending on the features you use, but if you could attach the log here we could check whether there is an unusual one.
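
To gather those AppArmor reports, something along these lines should be enough (a sketch; DENIED entries are the interesting ones):

$ dmesg | grep -i apparmor
$ journalctl -k | grep -i 'apparmor="DENIED"'    # on systemd systems; please attach the output here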

Bumping the prio on the Cloud Archive task in case they have seen this or reports of it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Well, I can't bump the prio, but I'll assign it just to get a hit in someone's inbox to look at this.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It seems I have no rights whatsoever on UCA bugs, so I pinged on IRC instead.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@zioproto - also, given that the fixes were for Yakkety's dev version 1.3.4 (Yakkety released with 2.1) and you are seeing this on a Trusty (Mitaka?) -> Xenial upgrade (right?), this might be a different bug that we would want to file separately.

Revision history for this message
Jeff Silverman (jeffsilverm) wrote :

I think I am having the same problem. I am running 16.04.3 LTS. I tried apt-get update and apt-get upgrade, but the software revision levels did not change, still at 1.3.1.

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# dpkg -l "*libvirt*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-========================-========================-=================================================================================
ii gir1.2-libvirt-glib-1.0:amd64 0.2.2-0.1ubuntu1 amd64 libvirt glib mainloop integration
ii libvirt-bin 1.3.1-1ubuntu10.12 amd64 programs for the libvirt library
un libvirt-daemon <none> <none> (no description available)
un libvirt-daemon-system <none> <none> (no description available)
ii libvirt-glib-1.0-0:amd64 0.2.2-0.1ubuntu1 amd64 libvirt glib mainloop integration
ii libvirt0:amd64 1.3.1-1ubuntu10.12 amd64 library for interfacing with different virtualization systems
ii python-libvirt 1.3.1-1ubuntu1 amd64 libvirt Python bindings
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13#

However, when I look at the directory I see named sockets, but they don't have the expected names:

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# virsh net-edit ?
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# ls -l /var/run/libvirt/libvirt-sock
ls: cannot access '/var/run/libvirt/libvirt-sock': No such file or directory
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# ls -l /var/run/libvirt
total 0
drwxr-xr-x 2 root root 40 Aug 3 12:58 hostdevmgr
drwxr-xr-x 2 root root 40 Aug 3 12:58 lxc
drwxr-xr-x 2 root root 100 Aug 3 23:02 network
drwxr-xr-x 2 root root 40 Aug 3 12:58 qemu
drwxr-xr-x 2 root root 40 Aug 3 12:58 storage
drwxr-xr-x 2 root root 40 Aug 3 12:58 uml-guest
srw-rw-rw- 1 root root 0 Aug 3 12:58 virtlockd-sock
srw-rw-rw- 1 root root 0 Aug 3 12:58 virtlogd-sock
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13#

What do I do next?

Thank you

Jeff
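
A reasonable next step, in line with the workarounds earlier in this thread (a suggestion only; service names as on 16.04, where the libvirtd service is called libvirt-bin):

$ systemctl status libvirt-bin virtlogd       # check whether the daemons are actually running
$ sudo systemctl restart virtlogd libvirt-bin
$ virsh list --all                            # /var/run/libvirt/libvirt-sock should now be reachable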

Revision history for this message
Jeff Silverman (jeffsilverm) wrote :

There is something else I find confusing: the output I get from lsof doesn't agree with what I see in the file system. At this point, my knowledge of sockets is getting stretched.

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# pgrep virt
21491
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# lsof -p 21491 | fgrep var
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
libvirtd 21491 root 3u unix 0xffff8803a04b3800 0t0 136181 /var/run/libvirt/libvirt-sock type=STREAM
libvirtd 21491 root 4u unix 0xffff8803a04b7c00 0t0 136182 /var/run/libvirt/libvirt-sock-ro type=STREAM
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# ls -l /var/run/libvirt/lib*
ls: cannot access '/var/run/libvirt/lib*': No such file or directory
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13#

Thank you

Jeff

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.5 KiB)

I still think this is a "new" issue and that we should keep the old one as Fix Released and open a new bug, but to keep the discussion here moving, an update:

If the socket is not around, usually the service isn't running.
That would fit the workaround in comment #14, which essentially restarted things.
But your examples show the socket still open in the process, just not available at the path.

I wonder if in your cases some part of the upgrade procedure removed and recreated the directories while leaving the old sockets open.

That might explain why you see no file via "ls" but do see one in lsof.

I tried taking a Trusty system and upgrading it to UCA-Mitaka, as that was mentioned as one of the triggering cases.
Any virsh command should do, but I used net-edit as reported in the comments, to be sure.

Before:
$ service libvirt-bin status
libvirt-bin start/running, process 5305
$ virsh list
 Id Name State
----------------------------------------------------
 2 kvmtest running
$ ls -laF /var/run/libvirt/libvirt-sock
srwxrwx--- 1 root libvirtd 0 Aug 7 09:23 /var/run/libvirt/libvirt-sock=
$ lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 5305 root 12u unix 0xffff880152537c00 0t0 13001500 /var/run/libvirt/libvirt-sock

OK, all seems normal before the upgrade: the service is running and working.
The socket is present in the filesystem and owned by the service's PID.

Now upgrading to UCA-Mitaka - the upgrade worked and lifted the versions to current Mitaka
(at the moment 1.3.1-1ubuntu10.11~cloud0).

$ service libvirt-bin status
libvirt-bin start/running, process 7266
$ virsh list
 Id Name State
----------------------------------------------------
 2 kvmtest running
$ ls -laF /var/run/libvirt/libvirt-sock
srwxrwx--- 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock=
$ lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 7266 root 12u unix 0xffff88001d404c00 0t0 13031405 /var/run/libvirt/libvirt-sock

OK, so a "normal" upgrade does not trigger this in general.
We'd need someone who is in the error state to debug what pushed their system into it.

@Jeff (and others) - you seem to have the socket still open in the process, but not present at the path.
Are there any mounts over that path that might hide it?
Anything in the upgrade output that might indicate a failed restart or something like that?

Issue could be something like:
# ls -l /var/run/libvirt/lib*
srwxrwx--- 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock
srwxrwxrwx 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock-ro
# mkdir /tmp/test
# mount -o bind /tmp/test /var/run/libvirt
# ls -l /var/run/libvirt/lib*
ls: cannot access /var/run/libvirt/lib*: No such file or directory
# virsh list
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Also the following would lead to such a case:
# rm /var/run/libvirt/libvirt-sock
# virsh list
error: Failed to connect socket to '/var/run/...


Changed in cloud-archive:
status: New → Incomplete
Revision history for this message
Marc Gariépy (mgariepy) wrote :

Just had this issue when upgrading from Newton -> Ocata via OpenStack-Ansible.

The issue I noticed is that a reload alone ("systemctl reload virtlogd.service") doesn't seem to be enough.

After installing the update I can list the instances with "virsh list", but when I use nova to interact with the VMs (start, stop, create, ...) virsh just hangs.

Restarting libvirtd makes "virsh list" work again, but nova still cannot act on the VMs.

Restarting virtlogd seems to fix the issue; maybe the package update should restart the service instead of just reloading it?

Here is the update I installed on the server:
2018-01-25 21:16:14 upgrade libvirt-bin:amd64 1.3.1-1ubuntu10.15 2.5.0-3ubuntu5.6~cloud0
2018-01-25 21:16:15 upgrade libvirt0:amd64 1.3.1-1ubuntu10.15 2.5.0-3ubuntu5.6~cloud0
2018-01-25 21:16:20 upgrade python-libvirt:amd64 1.3.1-1ubuntu1.1 3.0.0-2~cloud0

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Newton -> Ocata would be a libvirt 1.3.1 (Xenial) -> 2.5 (Zesty) upgrade.

I tried two updates:
1. X-Newton -> X-Ocata
2. Xenial -> Zesty (EOL, so used Artful with 3.6 here)

The test according to comment #28 was:
1. get a guest on X/X-N
2. check the status of virtlogd
3. upgrade
4. virsh shutdown/start after the upgrade (which is supposed to hang)
5. check the virtlogd service status

The service was still:
  active (running) since Mon 2018-01-29 09:12:42 UTC; 8min ago

But as you can see from the "8min ago", it was still "the old service", still on the same PID.
The reload that is supposed to happen was triggered, though; one could see the new entries in the log:
Jan 29 09:21:21 xenial-uca-N-O-upgrade systemd[1]: Reloading Virtual machine log manager.
Jan 29 09:21:21 xenial-uca-N-O-upgrade systemd[1]: Reloaded Virtual machine log manager.
Jan 29 09:21:21 xenial-uca-N-O-upgrade virtlogd[3965]: libvirt version: 2.5.0, package: 3ubuntu5.6~cloud0 (Openstack U
Jan 29 09:21:21 xenial-uca-N-O-upgrade virtlogd[3965]: hostname: xenial-uca-N-O-upgrade.lxd

Not only list, but also stop/start worked just fine.
Something must be different in your case - any suggestion as to which part of your setup/config might be special?

Revision history for this message
Marc Gariépy (mgariepy) wrote :

Did you use virsh to stop/start the VMs, or did you do it via Nova?

In my case I can list the VMs just fine after a libvirtd restart; it's only when nova tries to stop/start or create a new instance that it hangs.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I used virsh to start/stop them, as that is how comment #28 described it
("virsh just hangs").

With more context: is it that you interact with nova on it, and then while that is kind of "in transaction" virsh hangs?

Revision history for this message
Marc Gariépy (mgariepy) wrote :

Sorry about the confusion.

1- update libvirt
2- virsh list (I see running VMs)
3- try to stop the VM with nova: "openstack server stop uuid"
4- virsh list (now it hangs)

After restarting virtlogd and libvirtd, everything seems to work correctly again.

Anyhow, how can a reload of those processes even work?
If the PID stays the same, it's definitely not the "new version" that is running.
I think it would be better to restart the services instead of only reloading them.

Thanks

Marc

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> Sorry about the confusion.

Never mind, now things are clear - thanks!

> 1- update libvirt
> 2- virsh list (I see running VMs)
> 3- try to stop the VM with nova: "openstack server stop uuid"
> 4- virsh list (now it hangs)
>

I'd need insight from the OpenStack Team on this how nova might interfere here.
This is out of my area of expertise now - but UCA Team is already
subscribed so that should be ok.

>
> Anyhow, how can a reload of those processes even work?
> If the PID stays the same, it's definitely not the "new version" that is running.
> I think it would be better to restart the services instead of only reloading them.

This is intentional and OK.
If you fully restarted the service, it would lose some of the logs.
Instead, see the "SIGNALS" section of [1]: virtlogd is designed to re-exec itself via execve.
This is how it keeps the same PID but runs the new code, while maintaining its open log files.

[1]: http://manpages.ubuntu.com/manpages/bionic/en/man8/virtlogd.8.html
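
As a concrete illustration of that reload path (a sketch; the assumption is that the packaged unit wires "reload" to SIGUSR1, which per the SIGNALS section of [1] makes virtlogd re-exec itself while keeping its open log file handles):

$ sudo systemctl reload virtlogd            # roughly equivalent to: sudo kill -USR1 $(pidof virtlogd)
$ systemctl show -p MainPID virtlogd        # the PID stays the same, but the re-exec'd binary is the new one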
