Error starting domain since update

Bug #1583009 reported by Kev Bowring
This bug affects 7 people
Affects                Status         Importance   Assigned to      Milestone
Ubuntu Cloud Archive   Incomplete     Undecided    Unassigned
libvirt (Ubuntu)       Fix Released   Medium       Chris J Arges

Bug Description

Had no problems yesterday using virt-manager to open existing (or create new) virtual machines.

After receiving updates, I am now unable to open existing machines or create new ones:

Error starting domain: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: libvirt-bin 1.3.4-1ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
ApportVersion: 2.20.1-0ubuntu4
Architecture: amd64
CurrentDesktop: XFCE
Date: Wed May 18 07:18:47 2016
InstallationDate: Installed on 2016-01-11 (127 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160111)
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.init.d.virtlogd: [modified]
modified.conffile..etc.libvirt.libvirtd.conf: [modified]
modified.conffile..etc.libvirt.libxl.conf: [modified]
modified.conffile..etc.libvirt.qemu.conf: [inaccessible: [Errno 13] Permission denied: '/etc/libvirt/qemu.conf']
modified.conffile..etc.libvirt.qemu.networks.default.xml: [inaccessible: [Errno 13] Permission denied: '/etc/libvirt/qemu/networks/default.xml']
mtime.conffile..etc.init.d.virtlogd: 2016-05-01T04:06:42
mtime.conffile..etc.libvirt.libvirtd.conf: 2016-05-14T15:01:59
mtime.conffile..etc.libvirt.libxl.conf: 2016-05-14T15:01:55

Revision history for this message
Kev Bowring (flocculant) wrote :
Chris J Arges (arges)
Changed in libvirt (Ubuntu):
assignee: nobody → Chris J Arges (arges)
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Chris J Arges (arges) wrote :

I can reproduce this bug.

Changed in libvirt (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Chris J Arges (arges) wrote :

Uploaded a fix for Yakkety.

Revision history for this message
Kev Bowring (flocculant) wrote :

Thanks - grabbed it from -proposed :)

Revision history for this message
Kev Bowring (flocculant) wrote :

Got updates to virtinst and virt-manager today - issue is back.

Revision history for this message
Chris J Arges (arges) wrote :

Just did an update again and things work for me. Can you confirm which versions of virt-manager/libvirt you are using? In addition is it the same error message?

Revision history for this message
Kev Bowring (flocculant) wrote :

ii libvirt-bin 1.3.4-1ubuntu2
ii libvirt-clients 1.3.4-1ubuntu2
ii libvirt-daemon 1.3.4-1ubuntu2
ii libvirt-daemon-system 1.3.4-1ubuntu2
ii libvirt-glib-1.0-0:amd64 0.2.2-0.1ubuntu1
ii libvirt0:amd64 1.3.4-1ubuntu2

ii virt-manager 1:1.3.2-3ubuntu2
ii virt-viewer 3.1-1
ii virtinst 1:1.3.2-3ubuntu2

screenshot of error attached.

list of files in /var/run/libvirt:

drwxr-xr-x 2 root root 40 May 20 07:05 hostdevmgr
srwxrwx--- 1 root libvirtd 0 May 20 07:05 libvirt-sock
srwxrwxrwx 1 root libvirtd 0 May 20 07:05 libvirt-sock-ro
drwxr-xr-x 2 root root 40 May 20 07:05 lxc
drwxr-xr-x 2 root root 100 May 20 07:05 network
drwxr-xr-x 2 root root 40 May 20 07:05 qemu
drwxr-xr-x 2 root root 40 May 20 07:05 storage
drwxr-xr-x 2 root root 40 May 20 07:05 uml-guest

Details of error (if that helps)

Error starting domain: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 90, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 126, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1402, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1035, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

Revision history for this message
Kev Bowring (flocculant) wrote :

mmk - so when I check the versions I have against packages.ubuntu.com, then remove them all and reinstall, I get different versions.

Not sure how that happened - I suspect it was grabbing packages from -proposed yesterday.

Long and short of it: it works with what I *now* have installed.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.3.4-1ubuntu4

---------------
libvirt (1.3.4-1ubuntu4) yakkety; urgency=medium

  * Re-enable the upstart job by renaming the file.
  * Include patch by @guessi to continually wait for libvirtd to start when
    using sysvinit or upstart. (LP: #1571209)

 -- Serge Hallyn <email address hidden> Mon, 23 May 2016 13:50:22 -0500

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Kev Bowring (flocculant) wrote :

and broken again

Revision history for this message
Thiago Martins (martinx) wrote :

I'm seeing this problem while trying to launch an Instance on OpenStack Mitaka on Xenial.

Revision history for this message
Chris J Arges (arges) wrote : Re: [Bug 1583009] Re: Error starting domain since update

Hi,
If you are seeing this problem can you give me the versions of libvirt and
qemu you are using?
Thanks,
--chris

On Mon, Jun 13, 2016 at 5:20 PM, Thiago Martins <email address hidden>
wrote:

> I'm seeing this problem while trying to launch an Instance on OpenStack
> Mitaka on Xenial.
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/1583009
>
> Title:
> Error starting domain since update
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1583009/+subscriptions
>

Revision history for this message
Rich Art (riccardo-patane-ch) wrote :

Hi there,

I am also having the same trouble. This is from the controller's /var/log/nova/nova-conductor.log file, after trying to create a new instance as described in http://docs.openstack.org/mitaka/install-guide-ubuntu/launch-instance-provider.html:

2016-06-29 13:27:03.084 2828 ERROR nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] [instance: b5020d53-2e48-453b-bec9-cbca4fac5708] Error from last host: n04 (node n04.mgmt.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance b5020d53-2e48-453b-bec9-cbca4fac5708 was re-scheduled: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory\n"]

2016-06-29 13:27:21.459 2828 ERROR nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] [instance: b5020d53-2e48-453b-bec9-cbca4fac5708] Error from last host: n01 (node n01.mgmt.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance b5020d53-2e48-453b-bec9-cbca4fac5708 was re-scheduled: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory\n"]

2016-06-29 13:27:36.501 2827 ERROR nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] [instance: b5020d53-2e48-453b-bec9-cbca4fac5708] Error from last host: n02 (node n02.mgmt.local): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance b5020d53-2e48-453b-bec9-cbca4fac5708 was re-scheduled: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory\n"]

2016-06-29 13:27:36.508 2827 WARNING nova.scheduler.utils [req-756b7528-65c0-4389-95f1-146ccfd4dabf 048e5bedaf17478da2bb2c52829cf561 e236c9a8172e4f7ca3b0c9cb37891ab8 - - -] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance b5020d53-2e48-453b-bec9-cbca4fac5708. Last exception: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

2016-06-29 13:27:36.513 2827 WARNING nova.scheduler.utils [req-756b7528-65c0-4389-95f1...


Revision history for this message
mogliii (mogliii) wrote :

As a workaround I do the following after each restart of the host:

# systemctl start virtlogd
# systemctl start libvirtd

Ubuntu 16.04 with latest packages installed.
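
Before applying this, a quick sanity check that it is the same failure mode can't hurt (a sketch, using the same unit names as above and the socket path from the error message):

$ systemctl status virtlogd libvirtd       # is virtlogd (and libvirtd) actually running?
$ ls -l /var/run/libvirt/virtlogd-sock     # the socket the error message complains about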

Revision history for this message
Rich Art (riccardo-patane-ch) wrote :

Hey mogliii,

Thank you very much. It worked.
I first ran, on the controller node: $ apt-get upgrade

And then on every compute node:

$ sudo service virtlogd restart
 * Restarting libvirt logging daemon /usr/sbin/virtlogd
$ sudo service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 3586

Check:
$ ps -ef | grep libvirt
root 3586 1 0 13:14 ? 00:00:00 /usr/sbin/libvirtd -d
$ ps -ef | grep virtlogd
root 3523 1 0 13:13 ? 00:00:00 /usr/sbin/virtlogd -d

That worked for me and I could create a VM with: $ openstack server create ....

Since this is only a workaround, does anybody have an idea what the permanent solution would be?

Riccardo
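
One candidate for a more permanent fix (an assumption, not something confirmed in this thread) would be making sure virtlogd comes up at boot instead of being started by hand each time; a minimal sketch, assuming the packaging ships a virtlogd.service unit (and possibly a virtlogd.socket unit):

$ sudo systemctl enable virtlogd           # start the logging daemon at boot
$ sudo systemctl enable virtlogd.socket    # if present, also enable socket activation
$ sudo systemctl start virtlogd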

Revision history for this message
mogliii (mogliii) wrote :

This is actually a bit strange: I now have two very similar computers with a similar setup, but this virtlogd issue only appears on one of them. Both have all updates installed.

If anyone can point me to the relevant startup scripts/configuration files, I can post their content from both machines to help debug this issue.

Revision history for this message
mogliii (mogliii) wrote :

I have now found one difference:
On the working installation I have "/etc/init.d/libvirt-bin", while on the non-working installation the file is "/etc/init.d/libvirtd".

Below is a diff of the two files. The most important difference might be the "Provides" line; the other changes are just related to the name.

--- workinglibvirt-bin 2016-08-02 19:43:55.999309028 +0900
+++ notworkinglibvirtd 2016-08-02 19:43:17.588065711 +0900
@@ -6,7 +6,7 @@
 # based on the skeletons that comes with dh_make
 #
 ### BEGIN INIT INFO
-# Provides: libvirtd libvirt-bin
+# Provides: libvirtd
 # Required-Start: $network $local_fs $remote_fs $syslog virtlogd
 # Required-Stop: $local_fs $remote_fs $syslog virtlogd
 # Should-Start: avahi-daemon cgconfig
@@ -31,13 +31,13 @@
 DODTIME=1 # Time to wait for the server to die, in seconds

 # Include libvirtd defaults if available
-if [ -f /etc/default/libvirt-bin ] ; then
- . /etc/default/libvirt-bin
+if [ -f /etc/default/libvirtd ] ; then
+ . /etc/default/libvirtd
 fi

 check_start_libvirtd_option() {
   if [ ! "$start_libvirtd" = "yes" ]; then
- log_warning_msg "Not starting libvirt management daemon libvirtd, disabled via /etc/default/libvirt-bin"
+ log_warning_msg "Not starting libvirt management daemon libvirtd, disabled via /etc/default/libvirtd"
     return 1
   else
     return 0
@@ -135,17 +135,17 @@
 }

 wait_on_sockfile() {
- sockfile=/var/run/libvirt/libvirt-sock
- sockfile_check_retries=5
- while [ ! -S $sockfile ]; do
- echo "Waiting for $sockfile - recheck in 2 sec"
- sleep 2
- if ! sockfile_check_retries=`expr $sockfile_check_retries - 1`; then
- echo "Giving up waiting for $sockfile."
- exit 1
- fi
- done
- return 0
+ sockfile=/var/run/libvirt/libvirt-sock
+ while [ ! -S $sockfile ] ; do
+ if ! running ; then
+ # stop/restart/force-stop event triggered before sockfile is created
+ exit 1
+ fi
+ echo "waiting for $sockfile."
+ sleep 0.5
+ done
+ echo "$sockfile ready."
+ return 0
 }

 case "$1" in
@@ -238,7 +238,7 @@
  fi
  ;;
   *)
- N=/etc/init.d/libvirt-bin
+ N=/etc/init.d/libvirtd
  echo "Usage: $N {start|stop|restart|reload|force-reload|status|force-stop}" >&2
  exit 1
  ;;
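
To see which variant a given machine ended up with, and which package owns the script, something like this should do (a sketch based on the paths in the diff above):

$ ls -l /etc/init.d/libvirt*                                      # libvirt-bin vs. libvirtd
$ dpkg -S /etc/init.d/libvirt-bin /etc/init.d/libvirtd 2>/dev/null
$ md5sum /etc/init.d/libvirt* /etc/default/libvirt* 2>/dev/null   # quick way to compare two machines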

Revision history for this message
Khairul Aizat Kamarudzzaman (fenris) wrote :

Error starting domain: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 90, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 126, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1402, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1035, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

http://paste.ubuntu.com/23160863/

Revision history for this message
Mikhail S Medvedev (msmedved) wrote :

I just had the same symptoms after upgrading an Ubuntu 16.04 OpenStack Mitaka cluster's packages to the latest versions. Some servers had no problems; very few had the "Failed to connect socket" error when trying to run e.g. 'virsh list --all'. Note that libvirtd itself did start and had no apparent errors. Reinstalling the package did not work, and none of the suggestions above worked. I then tried manually running libvirt with

  libvirtd -l

It started fine, and there was no error connecting to the domain this time. After that I stopped it and started the service the usual way; no error this time either. Sorry for not providing more details on package versions etc.; I'm putting this here as another possible workaround for reference.
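
For reference, the sequence Mikhail describes would look roughly like this (a sketch only; the service name and whether -l/--listen is actually needed are assumptions based on his description):

$ sudo service libvirt-bin stop          # or: sudo systemctl stop libvirtd
$ sudo libvirtd -l                       # run once in the foreground; Ctrl-C once 'virsh list' works
$ sudo service libvirt-bin start         # then start the service the usual way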

Revision history for this message
Saverio Proto (zioproto) wrote :

I confirm the workaround in comment #14 also worked for me.

I hit this bug after upgrading from Trusty to Xenial running OpenStack Mitaka.

You can see the problem in nova-compute.log.

For OpenStack this is a very nasty bug, because everything scheduled on that compute node fails to spawn, but the neutron port is not cleaned up. So when the instance is finally spawned on another compute node, it comes up with two neutron ports and two IP addresses.
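
A quick way to find affected compute nodes from the nova-compute.log Saverio mentions (a sketch; the log path matches the Nova traces quoted earlier in this bug):

$ grep -l 'virtlogd-sock' /var/log/nova/nova-compute.log               # run on each compute node
$ grep 'Failed to connect socket' /var/log/nova/nova-compute.log | tail -n 5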

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Saverio for your ping on this.
I wonder why I haven't seen this outside of an OpenStack context yet; it might just be the scale of it, but I still wonder. I run all sorts of libvirt/qemu upgrade tests on a regular cadence, among others Trusty->Xenial, and so far none hit this issue - all guests were fine afterwards and libvirt was reachable.
This might just be some sort of race or even a special config, but since Mikhail in comment #19 reported seeing this on only some of several similarly configured systems, the latter is unlikely.

Since AppArmor was recently involved in a few odd "something failed" issues around libvirt, I wonder if any of you had uncommon AppArmor reports in dmesg. There are always a few depending on the features you use, but if you could attach the log here we could check whether there is an unusual one.
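
To gather those AppArmor reports, something along these lines should be enough (a sketch; DENIED entries are the interesting ones):

$ dmesg | grep -i apparmor
$ journalctl -k | grep -i 'apparmor="DENIED"'    # on systemd systems; please attach the output here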

Bumping the prio on the Cloud Archive task in case they have seen this or reports of it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Well, I can't bump the prio, but I'll assign it just to get a hit in someone's inbox to look at this.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It seems I have no rights whatsoever on UCA bugs, so I pinged on IRC instead.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@zioproto - also, given that the fixes were for Yakkety's dev version 1.3.4 (Yakkety released with 2.1) and you are seeing this on a Trusty (Mitaka?) -> Xenial upgrade (right?), this might be a different bug that we would want to file separately.

Revision history for this message
Jeff Silverman (jeffsilverm) wrote :

I think I am having the same problem. I am running 16.04.3 LTS. I tried apt-get update and apt-get upgrade, but the software revision levels did not change, still at 1.3.1.

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# dpkg -l "*libvirt*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-========================-========================-=================================================================================
ii gir1.2-libvirt-glib-1.0:amd64 0.2.2-0.1ubuntu1 amd64 libvirt glib mainloop integration
ii libvirt-bin 1.3.1-1ubuntu10.12 amd64 programs for the libvirt library
un libvirt-daemon <none> <none> (no description available)
un libvirt-daemon-system <none> <none> (no description available)
ii libvirt-glib-1.0-0:amd64 0.2.2-0.1ubuntu1 amd64 libvirt glib mainloop integration
ii libvirt0:amd64 1.3.1-1ubuntu10.12 amd64 library for interfacing with different virtualization systems
ii python-libvirt 1.3.1-1ubuntu1 amd64 libvirt Python bindings
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13#

However, when I look at the directory I see named sockets, but they don't have the expected names:

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# virsh net-edit ?
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# ls -l /var/run/libvirt/libvirt-sock
ls: cannot access '/var/run/libvirt/libvirt-sock': No such file or directory
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# ls -l /var/run/libvirt
total 0
drwxr-xr-x 2 root root 40 Aug 3 12:58 hostdevmgr
drwxr-xr-x 2 root root 40 Aug 3 12:58 lxc
drwxr-xr-x 2 root root 100 Aug 3 23:02 network
drwxr-xr-x 2 root root 40 Aug 3 12:58 qemu
drwxr-xr-x 2 root root 40 Aug 3 12:58 storage
drwxr-xr-x 2 root root 40 Aug 3 12:58 uml-guest
srw-rw-rw- 1 root root 0 Aug 3 12:58 virtlockd-sock
srw-rw-rw- 1 root root 0 Aug 3 12:58 virtlogd-sock
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13#

What do I do next?

Thank you

Jeff
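
A reasonable next step, in line with the workarounds earlier in this thread (a suggestion only; service names as on 16.04, where the libvirtd service is called libvirt-bin):

$ systemctl status libvirt-bin virtlogd       # check whether the daemons are actually running
$ sudo systemctl restart virtlogd libvirt-bin
$ virsh list --all                            # /var/run/libvirt/libvirt-sock should now be reachable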

Revision history for this message
Jeff Silverman (jeffsilverm) wrote :

There is something else I find confusing: the output I get from lsof doesn't agree with what I see in the file system. At this point, my knowledge of sockets is getting stretched.

root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# pgrep virt
21491
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# lsof -p 21491 | fgrep var
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
libvirtd 21491 root 3u unix 0xffff8803a04b3800 0t0 136181 /var/run/libvirt/libvirt-sock type=STREAM
libvirtd 21491 root 4u unix 0xffff8803a04b7c00 0t0 136182 /var/run/libvirt/libvirt-sock-ro type=STREAM
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13# ls -l /var/run/libvirt/lib*
ls: cannot access '/var/run/libvirt/lib*': No such file or directory
root@jeff-desktop:/home/jeffs/work/juniper/vmx-17.2R.13#

Thank you

Jeff

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.5 KiB)

I still think this is a "new" issue and that we should keep the old one as Fix Released and open a new bug, but to keep the discussion here moving, an update:

If the socket is not around, usually the service isn't running.
That would fit the workaround in comment #14, which essentially restarted things.
But your examples show the socket still open in the process, just not available at the path.

I wonder if in your cases some part of the upgrade procedure removed and recreated the directories while leaving the old sockets open.

That might explain why you see no file via "ls" but do see one in lsof.

I tried taking a Trusty system and upgrading it to UCA-Mitaka, as that was mentioned as one of the triggering cases.
Any virsh command should do, but I used net-edit as reported in the comments, to be sure.

Before:
$ service libvirt-bin status
libvirt-bin start/running, process 5305
$ virsh list
 Id Name State
----------------------------------------------------
 2 kvmtest running
$ ls -laF /var/run/libvirt/libvirt-sock
srwxrwx--- 1 root libvirtd 0 Aug 7 09:23 /var/run/libvirt/libvirt-sock=
$ lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 5305 root 12u unix 0xffff880152537c00 0t0 13001500 /var/run/libvirt/libvirt-sock

OK, all seems normal before the upgrade: the service is running and working.
The socket is present in the filesystem and owned by the service's PID.

Now upgrading to UCA-Mitaka - the upgrade worked and lifted the versions to current Mitaka
(at the moment 1.3.1-1ubuntu10.11~cloud0).

$ service libvirt-bin status
libvirt-bin start/running, process 7266
$ virsh list
 Id Name State
----------------------------------------------------
 2 kvmtest running
$ ls -laF /var/run/libvirt/libvirt-sock
srwxrwx--- 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock=
$ lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 7266 root 12u unix 0xffff88001d404c00 0t0 13031405 /var/run/libvirt/libvirt-sock

OK, so a "normal" upgrade does not trigger this in general.
We'd need someone who is in the error state to debug what pushed their system into it.

@Jeff (and others) - you seem to have the socket still open in the process, but not present at the path.
Are there any mounts over that path that might hide it?
Anything in the upgrade output that might indicate a failed restart or something like that?

Issue could be something like:
# ls -l /var/run/libvirt/lib*
srwxrwx--- 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock
srwxrwxrwx 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock-ro
# mkdir /tmp/test
# mount -o bind /tmp/test /var/run/libvirt
# ls -l /var/run/libvirt/lib*
ls: cannot access /var/run/libvirt/lib*: No such file or directory
# virsh list
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Also the following would lead to such a case:
# rm /var/run/libvirt/libvirt-sock
# virsh list
error: Failed to connect socket to '/var/run/...


Changed in cloud-archive:
status: New → Incomplete
Revision history for this message
Marc Gariépy (mgariepy) wrote :

Just had this issue when upgrading from Newton -> Ocata via OpenStack-Ansible.

The issue I noticed is that a reload alone ("systemctl reload virtlogd.service") doesn't seem to be enough.

After installing the update I can list the instances with "virsh list", but when I use nova to interact with the VMs (start, stop, create, ...) virsh just hangs.

Restarting libvirtd makes "virsh list" work again, but nova still cannot act on the VMs.

Restarting virtlogd seems to fix the issue; maybe the package update should restart the service instead of just reloading it?

Here is the update I installed on the server:
2018-01-25 21:16:14 upgrade libvirt-bin:amd64 1.3.1-1ubuntu10.15 2.5.0-3ubuntu5.6~cloud0
2018-01-25 21:16:15 upgrade libvirt0:amd64 1.3.1-1ubuntu10.15 2.5.0-3ubuntu5.6~cloud0
2018-01-25 21:16:20 upgrade python-libvirt:amd64 1.3.1-1ubuntu1.1 3.0.0-2~cloud0

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Newton -> Ocata would be a libvirt 1.3.1 (Xenial) -> 2.5 (Zesty) upgrade.

I tried two updates:
1. X-Newton -> X-Ocata
2. Xenial -> Zesty (EOL, so used Artful with 3.6 here)

The test according to comment #28 was:
1. get a guest on X/X-N
2. check the status of virtlogd
3. upgrade
4. virsh shutdown/start after the upgrade (which is supposed to hang)
5. check the virtlogd service status

The service was still:
  active (running) since Mon 2018-01-29 09:12:42 UTC; 8min ago

But as you can see from the "8min ago", it was still "the old service", still on the same PID.
The reload that is supposed to happen was triggered, though; one could see the new entries in the log:
Jan 29 09:21:21 xenial-uca-N-O-upgrade systemd[1]: Reloading Virtual machine log manager.
Jan 29 09:21:21 xenial-uca-N-O-upgrade systemd[1]: Reloaded Virtual machine log manager.
Jan 29 09:21:21 xenial-uca-N-O-upgrade virtlogd[3965]: libvirt version: 2.5.0, package: 3ubuntu5.6~cloud0 (Openstack U
Jan 29 09:21:21 xenial-uca-N-O-upgrade virtlogd[3965]: hostname: xenial-uca-N-O-upgrade.lxd

Not only list, but also stop/start worked just fine.
Something must be different in your case - any suggestion as to which part of your setup/config might be special?

Revision history for this message
Marc Gariépy (mgariepy) wrote :

Did you use virsh to stop/start the VMs, or did you do it via Nova?

In my case I can list the VMs just fine after a libvirtd restart; it's only when nova tries to stop/start or create a new instance that it hangs.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I used virsh to start/stop them, as that is how comment #28 described it
("virsh just hangs").

With more context: is it that you interact with nova on it, and then while that is kind of "in transaction" virsh hangs?

Revision history for this message
Marc Gariépy (mgariepy) wrote :

Sorry about the confusion.

1- update libvirt
2- virsh list (I see running VMs)
3- try to stop the VM with nova: "openstack server stop uuid"
4- virsh list (now it hangs)

After restarting virtlogd and libvirtd, everything seems to work correctly again.

Anyhow, how can a reload of those processes even work?
If the PID stays the same, it's definitely not the "new version" that is running.
I think it would be better to restart the services instead of only reloading them.

Thanks

Marc

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> Sorry about the confusion.

Never mind, now things are clear - thanks!

> 1- update libvirt
> 2- virsh list (I see running VMs)
> 3- try to stop the VM with nova: "openstack server stop uuid"
> 4- virsh list (now it hangs)
>

I'd need insight from the OpenStack Team on this how nova might interfere here.
This is out of my area of expertise now - but UCA Team is already
subscribed so that should be ok.

>
> Anyhow, how can a reload of those processes even work?
> If the PID stays the same, it's definitely not the "new version" that is running.
> I think it would be better to restart the services instead of only reloading them.

This is intentional and OK.
If you fully restarted the service, it would lose some of the logs.
Instead, see the "SIGNALS" section of [1]: virtlogd is designed to re-exec itself via execve.
This is how it keeps the same PID but runs the new code, while maintaining its open log files.

[1]: http://manpages.ubuntu.com/manpages/bionic/en/man8/virtlogd.8.html
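
As a concrete illustration of that reload path (a sketch; the assumption is that the packaged unit wires "reload" to SIGUSR1, which per the SIGNALS section of [1] makes virtlogd re-exec itself while keeping its open log file handles):

$ sudo systemctl reload virtlogd            # roughly equivalent to: sudo kill -USR1 $(pidof virtlogd)
$ systemctl show -p MainPID virtlogd        # the PID stays the same, but the re-exec'd binary is the new one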
