libvirtd stops responding in oneiric

Reported by Andrew Glen-Young on 2011-12-12
This bug affects 4 people
Affects: libvirt (Fedora)
status: Unknown
importance: Unknown
Affects: libvirt (Ubuntu)
importance: High
assignee: Chuck Short
Affects: Oneiric
importance: Undecided
assignee: Unassigned

Bug Description

I am running libvirtd to control qemu/kvm machines (via openstack nova-compute) and have found that a long-running libvirtd daemon eventually stops responding to commands.

Admittedly, these systems are somewhat loaded; however, I do not expect libvirtd to stop responding entirely. Please let me know if you require any further information.

As a work-around, restarting libvirtd makes it respond again.
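
The work-around could be automated with a small watchdog. What follows is a minimal sketch rather than anything taken from this report: the helper name probe_responsive is made up for illustration, and it assumes the coreutils timeout command is installed.

```shell
# Hypothetical helper (not part of libvirt): run a command under a time
# limit and succeed only if it exits successfully before the limit expires.
probe_responsive() {
    limit="$1"
    shift
    # timeout(1) terminates the command once the limit (in seconds) is
    # reached and then returns a non-zero status.
    timeout "$limit" "$@" >/dev/null 2>&1
}
```

On an affected host this might be run periodically as root, for example `probe_responsive 30 virsh -c qemu:///system list || service libvirt-bin restart`. Note that the probe also fails when virsh exits with an error for an unrelated reason, so a restart triggered this way is a blunt instrument.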

== Server A ==

= Process Information =

$ time sudo virsh -c qemu:///system list
^C
real 0m30.094s
user 0m0.000s
sys 0m0.000s

$ ps axuwwwf | grep libvirtd
root 15606 0.1 0.0 276984 4564 ? Sl Nov22 32:04 /usr/sbin/libvirtd -d
root 16389 0.0 0.0 276984 3552 ? S Dec04 0:00 \_ /usr/sbin/libvirtd -d

$ sudo strace -p 15606
Process 15606 attached - interrupt to quit
futex(0x7f2f0b91b9d0, FUTEX_WAIT, 15607, NULL

$ sudo strace -p 16389
Process 16389 attached - interrupt to quit
futex(0x7f2f0f980e28, FUTEX_WAIT_PRIVATE, 2, NULL

= System Information =

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"

$ dpkg -l | grep libvirt
ii libvirt-bin 0.9.2-4ubuntu15.1 the programs for the libvirt library
ii libvirt0 0.9.2-4ubuntu15.1 library for interfacing with different virtualization systems

== Server B ==

= Process Information =

$ time sudo virsh -c qemu:///system list
^C
real 0m34.162s
user 0m0.000s
sys 0m0.010s

root 6053 0.1 0.0 277132 3344 ? Sl Dec05 12:51 /usr/sbin/libvirtd -d
root 27262 0.0 0.0 277132 2132 ? S Dec10 0:00 \_ /usr/sbin/libvirtd -d

$ sudo strace -p 27262
Process 27262 attached - interrupt to quit
futex(0x7f0238cf6e28, FUTEX_WAIT_PRIVATE, 2, NULL

$ sudo strace -p 6053
Process 6053 attached - interrupt to quit
futex(0x7f0234c919d0, FUTEX_WAIT, 6054, NULL

$ sudo fuser /var/run/libvirt/libvirt-sock
/run/libvirt/libvirt-sock: 6053

$ sudo fuser /var/run/libvirt/libvirt-sock-ro
/run/libvirt/libvirt-sock-ro: 6053

$ sudo lsof /var/run/libvirt/libvirt-sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
libvirtd 6053 root 10u unix 0xffff8805f61b3740 0t0 78644591 /var/run/libvirt/libvirt-sock
libvirtd 6053 root 33u unix 0xffff8805f61b6b40 0t0 78643065 /var/run/libvirt/libvirt-sock

Swap (from /proc/$pid/smaps):

1672 kB - 27262 (libvirtd)
1684 kB - 6053 (libvirtd)
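
Per-process swap figures like these can be obtained by summing the Swap: lines of /proc/$pid/smaps. A minimal sketch, assuming a Linux kernel that reports those lines in kB; the function name swap_kb is illustrative only:

```shell
# Sum the "Swap:" entries (reported in kB) of an smaps-format file.
# Lines such as "SwapPss:" are not counted because the pattern
# requires the colon directly after "Swap".
swap_kb() {
    awk '/^Swap:/ { total += $2 } END { print total + 0 }' "$1"
}
```

For example, `sudo sh -c '. ./swap.sh; swap_kb /proc/6053/smaps'` would print the total for PID 6053, assuming the function was saved in a file called swap.sh.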

= System Information =

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"

$ dpkg -l | grep libvirt
ii libvirt-bin 0.9.2-4ubuntu15.1 the programs for the libvirt library
ii libvirt0 0.9.2-4ubuntu15.1 library for interfacing with different virtualization systems

Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to report this bug.

To get some extra debugging information, could you please edit /etc/libvirt/libvirtd.conf and add the line

log_level = 1

Then restart libvirtd. Then, when this reoccurs, do

strace -f -ovirsh.debug virsh list

then attach both the /var/log/libvirt/libvirtd.log and virsh.debug.
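
The configuration step could be scripted along the following lines. This is only a sketch: the function name set_log_level is made up here, it assumes GNU sed (for -i and -E), and it only looks at uncommented log_level lines.

```shell
# Set log_level in a libvirtd.conf-style file: rewrite an existing
# uncommented "log_level = ..." line, or append one if none exists.
# Commented-out lines (starting with '#') are left untouched.
set_log_level() {
    conf="$1"
    level="$2"
    if grep -Eq '^[[:space:]]*log_level[[:space:]]*=' "$conf"; then
        sed -i -E "s/^[[:space:]]*log_level[[:space:]]*=.*/log_level = $level/" "$conf"
    else
        printf 'log_level = %s\n' "$level" >> "$conf"
    fi
}
```

Run as root, `set_log_level /etc/libvirt/libvirtd.conf 1` followed by a restart of libvirtd would apply the change. Level 1 is the debug level, so the log can grow quickly.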

Changed in libvirt (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Andrew Glen-Young (aglenyoung) wrote :

Attached virsh.debug

Andrew Glen-Young (aglenyoung) wrote :

I had to prune the libvirtd.log as it is ~5GB.

Attached is the libvirtd.log when the above virsh command was run. If you need more I may be able to trim the non-unique log lines.
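
One way to trim the non-unique lines is to ignore the leading timestamp when comparing. A sketch under a stated assumption: each log line is taken to start with a timestamp that ends at the first ": " (colon followed by a space). That format is an assumption about the log, not something shown in this report, and unique_log_lines is an illustrative name.

```shell
# Print only the first occurrence of each log message, comparing lines
# with their leading timestamp (everything up to the first ": ") removed.
# Reads the named files, or standard input when no file is given.
unique_log_lines() {
    awk '{
        key = $0
        pos = index(key, ": ")
        if (pos > 0) key = substr(key, pos + 2)
        if (!(key in seen)) { seen[key] = 1; print }
    }' "$@"
}
```

Usage might look like `unique_log_lines /var/log/libvirt/libvirtd.log > libvirtd.unique.log`. awk keeps every distinct message in memory, which may matter for a log of several GB.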

Serge Hallyn (serge-hallyn) wrote :

Thanks for that info. Is there anything helpful in /var/log/libvirt/qemu/xp1.log?

Are you able to run kvm or qemu by hand?

Changed in libvirt (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Andrew Glen-Young (aglenyoung) wrote :

@Serge:

None of the machines that are experiencing this problem have a /var/log/libvirt/qemu/xp1.log log file.

These are Openstack nova-compute nodes running KVM and (un)fortunately I do not have a node that is currently not responding. Once I do, I will attempt to run KVM manually.

Andrew Glen-Young (aglenyoung) wrote :

I have verified on a machine in this state that I can start a kvm instance by hand while libvirtd is not responding. Is there any further diagnosis that I can do?

Serge Hallyn (serge-hallyn) wrote :

First please attach the output of:

ps -ef
find /var/lib/libvirt
# pidof may print several PIDs, so handle each libvirtd process in turn
for x in `pidof libvirtd`; do
   for y in `/bin/ls /proc/$x/task/`; do
      echo "XXXX $y XXXX"
      cat /proc/$x/task/$y/status
   done
done
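
A more compact view of the same per-thread information, one line per thread, might look like the following. It is a sketch that assumes the Linux /proc/<pid>/task/<tid>/status layout; thread_states is an illustrative name.

```shell
# Print "pid tid state" for every thread of each process given as argument.
# The state letter (R, S, D, ...) comes from the State: line of the
# thread's status file.
thread_states() {
    for pid in "$@"; do
        for dir in /proc/"$pid"/task/*; do
            [ -r "$dir/status" ] || continue
            state=$(awk '/^State:/ { print $2 }' "$dir/status")
            echo "$pid ${dir##*/} $state"
        done
    done
}
```

Called as `thread_states $(pidof libvirtd)`, it copes with more than one libvirtd process being present.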

Now, this is a bit of a big stick, but you could trace all libvirtd processes at once:

for x in `pidof libvirtd`; do
   for y in `/bin/ls /proc/$x/task/`; do strace -f -ostrace.out.$y -p $y & done
done

(done as root) then trace a libvirt action

strace -f -o strace.out.virsh virsh list

and tar up and upload strace.*.

Make sure to sudo killall strace when done.

David Lawson (deej) wrote :

We encountered this again. There appear to be two separate libvirtd processes running on the machine, which I have a feeling makes this a pathological case, but I captured the first part of the information you're looking for. When we see this again, I'll grab the same info plus the various strace bits.

Changed in libvirt (Ubuntu):
status: Incomplete → New
Serge Hallyn (serge-hallyn) wrote :

Interesting. Thanks for the information.

One libvirtd task appears to be on its own, while the other has all the expected threads.

Which pid is reported by 'status libvirt-bin' ?

(I realize you've probably reconfigured this machine by now, and we'll need to wait for the next time this happens. In which case, please provide the above output again.)

Serge Hallyn (serge-hallyn) wrote :

Chuck, is there anything nova could be doing to manually start its own second libvirtd without going through upstart?

Changed in libvirt (Ubuntu):
status: New → Incomplete
LaMont Jones (lamont) wrote :
Serge Hallyn (serge-hallyn) wrote :

The second libvirtd is a child of the first, so I suspect it's actually fine.

User PID PPID
root 30895 1 0 Feb13 ? 00:02:05 /usr/sbin/libvirtd -d
root 20389 30895 0 13:04 ? 00:00:00 /usr/sbin/libvirtd -d
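
The parent/child relationship can also be read directly from /proc. A sketch assuming Linux; parent_pid is an illustrative name.

```shell
# Print the parent PID of a process, taken from the PPid: line of
# /proc/<pid>/status.
parent_pid() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}
```

For instance, `for p in $(pidof libvirtd); do echo "$p <- $(parent_pid "$p")"; done` lists each libvirtd with its parent. In the listing above, 30895 has parent 1 and 20389 has parent 30895.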

Chuck, would it be possible to try reproducing this on precise?

Changed in libvirt (Ubuntu):
assignee: nobody → Chuck Short (zulcss)
James Troup (elmo) on 2012-03-05
tags: added: canonistack
Serge Hallyn (serge-hallyn) wrote :

Marking confirmed based on the fedora bug.

Per comment 21 in the fedora bug, we should cherrypick these four patches to oneiric:

3ec12898
32d3ec74
a8bb75a3
b265beda

Changed in libvirt (Ubuntu):
status: Incomplete → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu Oneiric):
status: New → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Unfortunately those patches don't apply cleanly.

Serge Hallyn (serge-hallyn) wrote :

I've done my best to backport the patches. The gist is to get rid of localtime_r usage. A package with the backported patches should soon be building in ppa:serge-hallyn/virt.

Serge Hallyn (serge-hallyn) wrote :

Sorry, for version numbering reasons I had to put the package in ppa:serge-hallyn/libvirt-mav.

summary: - libvirtd stops responding
+ libvirtd stops responding in oneiric
Christian Wittwer (wittwerch) wrote :

When will that bugfix be available in Oneiric?
