Comment 27 for bug 1583009

Christian Ehrhardt  (paelzer) wrote :

I still think this is a "new" issue and we should keep the old one as Fix Released and open a new bug, but for the sake of discussion here is an update:

If the socket is not around, usually the service isn't running.
That would fit the workaround in comment #14, which essentially restarted things.
But your examples show the socket still held open by the PID, just not available at the path.

I wonder if in your cases some part of the upgrade procedure removed the directories (and recreated them) while leaving the old sockets open.

That might explain why you see no file via "ls" but you see one in lsof.
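To illustrate why a name can vanish from "ls" while the process keeps using the object, here is a minimal sketch. It assumes a regular file as a stand-in for the socket (plain sh cannot create a listening unix socket on its own); the function name and the temporary path are made up for this example.

```shell
#!/bin/sh
# Illustration only: a regular file stands in for libvirt-sock.
# demo_unlinked_but_open is a hypothetical helper, not part of libvirt.
demo_unlinked_but_open() {
    dir=$(mktemp -d) || return 1
    path="$dir/fake-sock"
    echo "still here" > "$path"

    # Open the file on descriptor 9, then remove its name from the directory.
    exec 9< "$path"
    rm "$path"

    # The path is gone, like the socket path in the reports ...
    if [ -e "$path" ]; then echo "path: present"; else echo "path: missing"; fi

    # ... but the already-open descriptor still reaches the data.
    printf 'fd 9: %s\n' "$(cat <&9)"

    exec 9<&-
    rmdir "$dir"
}

demo_unlinked_but_open
```

The same rule applies to a listening socket: removing or hiding its path does not close the descriptor the daemon already holds, so lsof on the PID keeps listing it while new clients can no longer find it by name.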

I took a Trusty system and upgraded it to UCA-Mitaka, which was mentioned as one of the triggering cases.
Any virsh command should do, but I used the net-edit reported in the comments to be sure.

Before:
$ service libvirt-bin status
libvirt-bin start/running, process 5305
$ virsh list
 Id    Name                           State
----------------------------------------------------
 2     kvmtest                        running
$ ls -laF /var/run/libvirt/libvirt-sock
srwxrwx--- 1 root libvirtd 0 Aug 7 09:23 /var/run/libvirt/libvirt-sock=
$ lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 5305 root 12u unix 0xffff880152537c00 0t0 13001500 /var/run/libvirt/libvirt-sock

Ok, all seems normal before the upgrade: the service is running and working.
The socket is present in the file system and held by the service's PID.

Now upgrading to UCA-Mitaka - the upgrade worked and lifted the versions to current Mitaka
(at the moment 1.3.1-1ubuntu10.11~cloud0).

$ service libvirt-bin status
libvirt-bin start/running, process 7266
$ virsh list
 Id    Name                           State
----------------------------------------------------
 2     kvmtest                        running
$ ls -laF /var/run/libvirt/libvirt-sock
srwxrwx--- 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock=
$ lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 7266 root 12u unix 0xffff88001d404c00 0t0 13031405 /var/run/libvirt/libvirt-sock

Ok, so a "normal" upgrade does not trigger this in general.
We'd need someone whose system is in the error state to debug what pushed it there.

@Jeff (and others) - you seem to have the socket still open by the process, but not in the path.
Are there any mounts over that path that might hide them?
Anything in the upgrade output that might indicate a failed restart or something like it?
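One way to check the mount question is to look the directory up in the mount table. This is only a sketch: is_mount_target is a made-up helper, it assumes GNU "readlink -f" is available (needed because /var/run is normally a symlink to /run, so the kernel lists the mount under the resolved path), and it does not handle mount points containing spaces, which the kernel escapes in that table.

```shell
#!/bin/sh
# is_mount_target DIR [MOUNTS_FILE]
# Hypothetical helper: succeeds if DIR (after resolving symlinks) is listed
# as a mount point. MOUNTS_FILE defaults to /proc/self/mounts.
is_mount_target() {
    dir=$(readlink -f -- "$1") || return 2
    mounts=${2:-/proc/self/mounts}
    # Field 2 of each line in the mount table is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}

if is_mount_target /var/run/libvirt; then
    echo "something is mounted over /var/run/libvirt"
else
    echo "no mount found on /var/run/libvirt"
fi
```

If this reports a mount, the original sockets are most likely still in the directory underneath and only hidden from view.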

The issue could be something like:
# ls -l /var/run/libvirt/lib*
srwxrwx--- 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock
srwxrwxrwx 1 root libvirtd 0 Aug 7 09:33 /var/run/libvirt/libvirt-sock-ro
# mkdir /tmp/test
# mount -o bind /tmp/test /var/run/libvirt
# ls -l /var/run/libvirt/lib*
ls: cannot access /var/run/libvirt/lib*: No such file or directory
# virsh list
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Also the following would lead to such a case:
# rm /var/run/libvirt/libvirt-sock
# virsh list
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
# lsof -p 7266 | grep -- libvirt-sock
libvirtd 7266 root 12u unix 0xffff88001d404c00 0t0 13031405 /var/run/libvirt/libvirt-sock
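For anyone in the error state, a small check like the following could help tell the cases apart before restarting anything. It is a sketch only; socket_path_state is a name invented here, and it just reports what is visible at the path right now.

```shell
#!/bin/sh
# socket_path_state PATH
# Hypothetical helper: prints what is currently visible at PATH.
socket_path_state() {
    if [ -S "$1" ]; then
        echo "socket"          # path exists and is a unix socket
    elif [ -e "$1" ] || [ -L "$1" ]; then
        echo "not-a-socket"    # something else sits at that path
    else
        echo "missing"         # removed, or hidden, e.g. by a mount
    fi
}

socket_path_state /var/run/libvirt/libvirt-sock
```

A result of "missing" while lsof on the libvirtd PID still lists the socket would match the rm and bind-mount reproductions shown here.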

So the question is - where would a wild rm or mount come from in your cases?
Setting the Cloud-Archive task to Incomplete until info is provided that allows further debugging.