Bug #903212 “libvirtd stops responding in oneiric” : Bugs : libvirt package : Ubuntu

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-26:

#20

Download full text (3.3 KiB)

Description of problem:
Not exactly sure how to reproduce the problem. I am doing quite a bit of heavy development that uses libvirt. The libvirtd.log file prints out:

11:25:21.784: 24375: error : virNetServerDispatchNewClient:220 : Too many active clients (20), dropping connection from 127.0.0.1;0
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.935: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.655: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:33:39.736: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:33:39.736: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error

A second process is spawned by libvirt as well:
root 22049 24375 0 11:49 ? 00:00:00 libvirtd --daemon
root 22353 22671 0 11:58 pts/4 00:00:00 grep --color=auto libvirt
root 24375 1 2 11:00 ? 00:01:40 libvirtd --daemon

kill -9 of the non-init parented libvirt unsticks the system ie:
[root@beast libvirt]# kill -9 22049

Then things start working again.

Version-Release number of selected component (if applicable):
[root@beast libvirt]# rpm -qa | grep libvirt
libvirt-0.9.6-2.fc16.x86_64
libvirt-python-0.9.6-2.fc16.x86_64
libvirt-client-0.9.6-2.fc16.x86_64

How reproducible:
hard to reproduce

Steps to Repr...

Description of problem:
Not exactly sure how to reproduce the problem.  I am doing quite a bit of heavy development that uses libvirt.  The libvirtd.log file prints out:

11:25:21.784: 24375: error : virNetServerDispatchNewClient:220 : Too many active clients (20), dropping connection from 127.0.0.1;0
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:23.935: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.655: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:33:39.736: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error
11:33:39.736: 24375: error : virNetSocketReadWire:912 : End of file while reading data: Input/output error

A second process is spawned by libvirt as well:
root     22049 24375  0 11:49 ?        00:00:00 libvirtd --daemon
root     22353 22671  0 11:58 pts/4    00:00:00 grep --color=auto libvirt
root     24375     1  2 11:00 ?        00:01:40 libvirtd --daemon

kill -9 of the non-init parented libvirt unsticks the system ie:
[root@beast libvirt]# kill -9 22049

Then things start working again.

Version-Release number of selected component (if applicable):
[root@beast libvirt]# rpm -qa | grep libvirt
libvirt-0.9.6-2.fc16.x86_64
libvirt-python-0.9.6-2.fc16.x86_64
libvirt-client-0.9.6-2.fc16.x86_64

How reproducible:
hard to reproduce

Steps to Reproduce:
1. execute vm starts/stops 
2.
3.
  
Actual results:
libvirt launches a second process and wedges

Expected results:
libvirt should not wedge

Additional info:

killing the second process unwedges libvirt

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-28:

#21

Not sure if too many active clients error from log is related to the second process being spawned. They may be separate events. I have changed my program behavior not to use so many sockets, but occasionally the software still wedges.

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-28:

#22

Notice the CPU time on the forked process is quite high. This ps was taken ~1-2 minutes after the system wedged.

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-28:

#23

ignore comment #2, the 2nd process in the list wasn't the forked child.

Revision history for this message

In Red Hat Bugzilla #757382, Laine (laine-redhat-bugs) wrote on 2011-11-29:

#24

When this happens could you try attaching gdb to the 2nd libvirtd process and executing the command "thread apply all bt"? That will give us more of a clue what is (isn't) happening.

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#25

Laine,

Next time it happens, I'll do that for you. I noticed there often appears a dmidecode error when this occurs which is documented in Bug #754909. I am going to try running with dmidecode installed to see if that eliminates the problem. Fedora 16 does not by default install dmidecode.

Regards
-steve

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#26

Download full text (3.1 KiB)

__lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:96
96 2: movl %edx, %eax
(gdb) thread apply all bt
Thread 1 (Thread 0x7f8b9e029700 (LWP 5904)):
#0 __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:96
#1 0x00000031246ad2d4 in _L_lock_2197 () from /lib64/libc.so.6
#2 0x00000031246ad0e7 in __tz_convert (timer=0x31249b0ee8, use_localtime=1,
    tp=0x7f8b9e027910) at tzset.c:619
#3 0x0000003e90c5552f in virLogMessage (
    category=0x3e90de9fea "file.util/command.c", priority=1,
    funcname=0x3e90deac33 "virCommandHook", linenr=1962, flags=0,
    fmt=<optimized out>) at util/logging.c:734
#4 0x0000003e90c45cce in virCommandHook (data=0x7f8b880c3280)
    at util/command.c:1962
#5 0x0000003e90c466cc in virExecWithHook (argv=0x7f8b88071920,
    envp=0x7f8b880c08a0, keepfd=<optimized out>, retpid=<optimized out>,
    infd=27, outfd=0x7f8b880c33e8, errfd=0x7f8b880c33ec, flags=4,
    data=0x7f8b880c3280, pidfile=0x0, hook=0x3e90c45a10 <virCommandHook>)
    at util/command.c:527
#6 0x0000003e90c482cc in virCommandRunAsync (cmd=0x7f8b880c3280, pid=0x0)
    at util/command.c:2046
#7 0x0000003e90c48acc in virCommandRun (cmd=0x7f8b880c3280, exitstatus=0x0)
    at util/command.c:1841
#8 0x0000000000465d92 in qemuCapsExtractVersionInfo (

    qemu=0x7f8b880c0e90 "/usr/bin/qemu", arch=0x4f9db6 "i686", retversion=0x0,
    retflags=0x7f8b9e028448) at qemu/qemu_capabilities.c:1297
#9 0x00000000004665ed in qemuCapsInitGuest (caps=0x7f8b880c1b10,
    old_caps=0x7f8b8c000a70, hostmachine=<optimized out>, info=0x733440, hvm=1)
    at qemu/qemu_capabilities.c:592
#10 0x0000000000466e09 in qemuCapsInit (old_caps=0x7f8b8c000a70)
    at qemu/qemu_capabilities.c:846
#11 0x000000000044c8a0 in qemuCreateCapabilities (oldcaps=<optimized out>,
    driver=0x7f8b940146b0) at qemu/qemu_driver.c:242
#12 0x000000000045fb86 in qemudGetCapabilities (conn=<optimized out>)
    at qemu/qemu_driver.c:1004
#13 0x0000003e90cc835e in virConnectGetCapabilities (conn=0x7f8b84000b60)
    at libvirt.c:5877
#14 0x0000000000434ea3 in remoteDispatchGetCapabilities (ret=0x7f8b880adbf0,
    rerr=0x7f8b9e028c70, hdr=<optimized out>, client=0x1834970,
    server=<optimized out>) at remote_dispatch.h:5555
#15 remoteDispatchGetCapabilitiesHelper (server=<optimized out>,
    client=0x1834970, hdr=<optimized out>, rerr=0x7f8b9e028c70,
    args=<optimized out>, ret=0x7f8b880adbf0) at remote_dispatch.h:5536
#16 0x000000000043f42e in virNetServerProgramDispatchCall (msg=0x1877430,
    client=0x1834970, server=0x182c410, prog=0x182c3e0)
    at rpc/virnetserverprogram.c:401
#17 virNetServerProgramDispatch (prog=0x182c3e0, server=0x182c410,

#18 0x000000000044169c in virNetServerHandleJob (jobOpaque=<optimized out>,
    opaque=0x182c410) at rpc/virnetserver.c:136
#19 0x0000003e90c65eae in virThreadPoolWorker (opaque=<optimized out>)
    at util/threadpool.c:144
#20 0x0000003e90c65962 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:157
#21 0x0000003124a07d90 in start_thread (arg=0x7f8b9e029700)
    at pthread_create.c:309
#22 0x00000031246eed8d in clone ()
    at ../sysdeps/unix/sysv/l...

__lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:96
96	2:	movl	%edx, %eax
(gdb) thread apply all bt
Thread 1 (Thread 0x7f8b9e029700 (LWP 5904)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:96
#1  0x00000031246ad2d4 in _L_lock_2197 () from /lib64/libc.so.6
#2  0x00000031246ad0e7 in __tz_convert (timer=0x31249b0ee8, use_localtime=1, 
    tp=0x7f8b9e027910) at tzset.c:619
#3  0x0000003e90c5552f in virLogMessage (
    category=0x3e90de9fea "file.util/command.c", priority=1, 
    funcname=0x3e90deac33 "virCommandHook", linenr=1962, flags=0, 
    fmt=<optimized out>) at util/logging.c:734
#4  0x0000003e90c45cce in virCommandHook (data=0x7f8b880c3280)
    at util/command.c:1962
#5  0x0000003e90c466cc in virExecWithHook (argv=0x7f8b88071920, 
    envp=0x7f8b880c08a0, keepfd=<optimized out>, retpid=<optimized out>, 
    infd=27, outfd=0x7f8b880c33e8, errfd=0x7f8b880c33ec, flags=4, 
    data=0x7f8b880c3280, pidfile=0x0, hook=0x3e90c45a10 <virCommandHook>)
    at util/command.c:527
#6  0x0000003e90c482cc in virCommandRunAsync (cmd=0x7f8b880c3280, pid=0x0)
    at util/command.c:2046
#7  0x0000003e90c48acc in virCommandRun (cmd=0x7f8b880c3280, exitstatus=0x0)
    at util/command.c:1841
#8  0x0000000000465d92 in qemuCapsExtractVersionInfo (

qemu=0x7f8b880c0e90 "/usr/bin/qemu", arch=0x4f9db6 "i686", retversion=0x0, 
    retflags=0x7f8b9e028448) at qemu/qemu_capabilities.c:1297
#9  0x00000000004665ed in qemuCapsInitGuest (caps=0x7f8b880c1b10, 
    old_caps=0x7f8b8c000a70, hostmachine=<optimized out>, info=0x733440, hvm=1)
    at qemu/qemu_capabilities.c:592
#10 0x0000000000466e09 in qemuCapsInit (old_caps=0x7f8b8c000a70)
    at qemu/qemu_capabilities.c:846
#11 0x000000000044c8a0 in qemuCreateCapabilities (oldcaps=<optimized out>, 
    driver=0x7f8b940146b0) at qemu/qemu_driver.c:242
#12 0x000000000045fb86 in qemudGetCapabilities (conn=<optimized out>)
    at qemu/qemu_driver.c:1004
#13 0x0000003e90cc835e in virConnectGetCapabilities (conn=0x7f8b84000b60)
    at libvirt.c:5877
#14 0x0000000000434ea3 in remoteDispatchGetCapabilities (ret=0x7f8b880adbf0, 
    rerr=0x7f8b9e028c70, hdr=<optimized out>, client=0x1834970, 
    server=<optimized out>) at remote_dispatch.h:5555
#15 remoteDispatchGetCapabilitiesHelper (server=<optimized out>, 
    client=0x1834970, hdr=<optimized out>, rerr=0x7f8b9e028c70, 
    args=<optimized out>, ret=0x7f8b880adbf0) at remote_dispatch.h:5536
#16 0x000000000043f42e in virNetServerProgramDispatchCall (msg=0x1877430, 
    client=0x1834970, server=0x182c410, prog=0x182c3e0)
    at rpc/virnetserverprogram.c:401
#17 virNetServerProgramDispatch (prog=0x182c3e0, server=0x182c410,

#18 0x000000000044169c in virNetServerHandleJob (jobOpaque=<optimized out>, 
    opaque=0x182c410) at rpc/virnetserver.c:136
#19 0x0000003e90c65eae in virThreadPoolWorker (opaque=<optimized out>)
    at util/threadpool.c:144
#20 0x0000003e90c65962 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:157
#21 0x0000003124a07d90 in start_thread (arg=0x7f8b9e029700)
    at pthread_create.c:309
#22 0x00000031246eed8d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#27

FYI, been down this path with the glibc maintainers. Their claim is that localtime_r while reentrant cannot be used in threaded programs, because it is not thread safe according to POSIX.

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#28

See:
http://oss.clusterlabs.org/pipermail/pacemaker/2010-February/004710.html

Which explains the problem in more detail. Perhaps my memory is fuzzy, it was getenv/setenv that may not be used in multithreaded programs at all (you are probably getenv in one program thread, forking, then localtime_r in another), and hence their use (which I see libvirt uses alot of) can and will result in deadlocks.

Regards
-steve

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#29

See Bug #544022 which helped work around part of this problem. I'm not sure if it entirely resolved it - eventually we just reworked our code to remove threads entirely.

Regards
-steve

Revision history for this message

In Red Hat Bugzilla #757382, Daniel (daniel-redhat-bugs) wrote on 2011-11-29:

#30

> 11:25:21.784: 24375: error : virNetServerDispatchNewClient:220 : Too many
> active clients (20), dropping connection from 127.0.0.1;0

This is the interesting error message. When this occurs, libvirtd will immediately close the client socket.

> 11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while
> reading data: Input/output error
> 11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while
> reading data: Input/output error
> 11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while
> reading data: Input/output error

This suggests, after closing the client socket, we have left a I/O watch registered against the socket FD, and are now spinning 100% in poll().

Revision history for this message

In Red Hat Bugzilla #757382, Daniel (daniel-redhat-bugs) wrote on 2011-11-29:

#31

To avoid hitting this problem in the first place you can simply update max_clients in /etc/libvirt/libvirtd.conf.

Of course we still want to fix the resulting bug.

Revision history for this message

In Red Hat Bugzilla #757382, Daniel (daniel-redhat-bugs) wrote on 2011-11-29:

#32

> FYI, been down this path with the glibc maintainers. Their claim is that
> localtime_r while reentrant cannot be used in threaded programs, because it is
> not thread safe according to POSIX.

IIUC the issue here is not one of "thread safety", but rather whether the function is "async signal safe".

If you have a multi-threaded application, and you fork a child process, then you are only allowed to use async-signal safe functions, until you execve(). The reason is as you describe

1. Thread A acquires lock L
2. Thread B forks process process C
3. Child process C attempts to acquire lock L
4. Thread A releases lock L

....child process C can never see results of step 4, since it is no longer in the same memory address space as Thread A.

Unfortunately libvirt is not complying for the async signal safety rules here.

Revision history for this message

In Red Hat Bugzilla #757382, Daniel (daniel-redhat-bugs) wrote on 2011-11-29:

#33

This series upstream improves libvirt's async signal safety by avoiding localtime_r() completely

https://www.redhat.com/archives/libvir-list/2011-November/msg01609.html

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#34

Re Comment #11, I am no longer hittiing the limit on FDs after changing my program. It is some other log message.

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-29:

#35

Daniel,

Thanks for a pointer to the upstream fixes. I'll apply them on the fedora 16 rpm, build a scratch rpm, and see if that fixes the problem.

As I recall, what happened in our case was both the localtime_r problem you describe as well as the following:

getenv/setenv take a lock in the timezone code (I am not sure why it does this, read the glibc code for more details).
another thread forks (with the timezeone thread locked)
getenv/setenv do not have at_fork handlers for getenv/setenv leaving the lock locked in the child process.
getlocaltime_r takes lock and deadlocks (because it uses the same tzget lock)

Regards
-steve

Revision history for this message

In Red Hat Bugzilla #757382, Steven (steven-redhat-bugs) wrote on 2011-11-30:

#36

Daniel,

I applied the patches and rebuilt a libvirt from git. The lockup seems to no longer occur. Not sure if its the patches or a change in libvirt 0.9.7. I couldn't get the older libvirt to build/patches backported in a timely manner. thanks for the quick response.

Regards
-steve

Revision history for this message

In Red Hat Bugzilla #757382, Fedora (fedora-redhat-bugs) wrote on 2011-11-30:

#37

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

Revision history for this message

In Red Hat Bugzilla #757382, Eric (eric-redhat-bugs) wrote on 2011-12-05:

#41

We should backport the following patches to F16:

3ec12898
32d3ec74
a8bb75a3
b265beda

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-12-12:

#1

Thanks for taking the time to report this bug.

To get some extra debugging information, could you please edit /etc/libvirt/libvirtd.conf and add the line

log_level = 1

Then restart libvirtd. Then, when this reoccurs, do

strace -f -ovirsh.debug virsh list

then attach both the /var/log/libvirt/libvirtd.log and virsh.debug.

Changed in libvirt (Ubuntu):
importance:	Undecided → High
status:	New → Incomplete

Revision history for this message

Andrew Glen-Young (aglenyoung) wrote on 2011-12-19:

#2

virsh.debug Edit (27.4 KiB, text/plain)

Attached virsh.debug

Revision history for this message

Andrew Glen-Young (aglenyoung) wrote on 2011-12-19:

#3

libvirtd.log Edit (51.2 KiB, text/plain)

I had to prune the libvirt.log as it is ~5GB.

Attached is the libvirtd.log when the above virsh command was run. If you need more I may be able to trim the non-unique log lines.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2011-12-19:

#4

Thanks for that info. Is there anything helpful in /var/log/libvirt/qemu/xp1.log?

Are you able to run kvm or qemu by hand?

Changed in libvirt (Ubuntu):
status:	Incomplete → New
status:	New → Incomplete

Revision history for this message

Andrew Glen-Young (aglenyoung) wrote on 2011-12-19:

#5

@Serge:

None of the machines that are experiencing this problem have a /var/log/libvirt/qemu/xp1.log log file.

These are Openstack nova-compute nodes running KVM and (un)fortunately I do not have a node that is currently not responding. One I do, I will attempt to run KVM manually.

Revision history for this message

In Red Hat Bugzilla #757382, Fedora (fedora-redhat-bugs) wrote on 2011-12-20:

#42

libvirt-0.9.6-4.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/libvirt-0.9.6-4.fc16

Revision history for this message

In Red Hat Bugzilla #757382, Fedora (fedora-redhat-bugs) wrote on 2011-12-22:

#43

Package libvirt-0.9.6-4.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-0.9.6-4.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-17267/libvirt-0.9.6-4.fc16
then log in and leave karma (feedback).

Revision history for this message

Andrew Glen-Young (aglenyoung) wrote on 2012-01-05:

#6

I have verified on a machine in this state and I can start a kvm instance by hand when libvirtd is not responding. Is there any further diagnosis that I can do?

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-01-05:

#7

First please attach the output of:

ps -ef
find /var/lib/libvirt
x=`pidof libvirtd`
for y in `/bin/ls /proc/$x/task/`; do
echo "XXXX $y XXXX"
cat /proc/$y/status
done

Now, this is a bit of a big stick, but you could trace all libvirtd processes at once:

x=`pidof libvirtd`
for y in `/bin/ls /proc/$x/task/`; do strace -f -ostrace.out.$y -p $y & done

(done as root) then trace a libvirt action

strace -f -o strace.out.virsh virsh list

and tar up and upload strace.*.

Make sure to sudo killall strace when done.

Revision history for this message

In Red Hat Bugzilla #757382, Fedora (fedora-redhat-bugs) wrote on 2012-01-05:

#44

libvirt-0.9.6-4.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message

David Lawson (deej) wrote on 2012-01-19:

#8

virtd-proc-status.txt Edit (17.2 KiB, text/plain)

We encountered this again. There appear to be two separate libvirtd processes running on the machine, which I have a feeling makes this a pathological case, but I captured the first part of the information you're looking for. When we see this again, I'll grab the same info plus the various strace bits.

Serge Hallyn (serge-hallyn) on 2012-01-23

Changed in libvirt (Ubuntu):
status:	Incomplete → New

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-01-23:

#9

Interesting. Thanks for the information.

One libvirtd task appears to be on its own, while the other has all the expected threads.

Which pid is reported by 'status libvirt-bin' ?

(I realize you've probably reconfigured this machine by now, and we'll need to wait for the next time this happens. In whicih case, please provide the above output again.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-01-23:

#10

Chuck, is there anything nova could be doing to manually start its own seconnd libvirtd without going through upstart?

Changed in libvirt (Ubuntu):
status:	New → Incomplete

Revision history for this message

LaMont Jones (lamont) wrote on 2012-02-14:

#11

requested output Edit (48.5 KiB, text/plain)

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-02-14:

#12

The second libvirtd is a child of the first, so I suspect it's actually fine.

User PID PPID
root 30895 1 0 Feb13 ? 00:02:05 /usr/sbin/libvirtd -d
root 20389 30895 0 13:04 ? 00:00:00 /usr/sbin/libvirtd -d

Chuck, would it be possible to try reproducing this on precise?

Changed in libvirt (Ubuntu):
assignee:	nobody → Chuck Short (zulcss)

James Troup (elmo) on 2012-03-05

tags:

added: canonistack

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-03-13:

#13

Marking confirmed based on the fedora bug.

Per comment 21 in the fedora bug, we should cherrypick these four patches to oneiric:

3ec12898
32d3ec74
a8bb75a3
b265beda

Changed in libvirt (Ubuntu):
status:	Incomplete → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2012-03-13:

#14

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu Oneiric):
status:	New → Confirmed

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-03-13:

#15

Unfortunately those patches don't apply cleanly.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-03-13:

#16

I've done by best to backport the patches. The gist is to get rid of localtime_r usage. A package with the backported patches should soon be building in ppa:serge-hallyn/virt.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2012-03-13:

#17

Sorry, for version numbering reasons I had to put the package in ppa:serge-hallyn/libvirt-mav.

Serge Hallyn (serge-hallyn) on 2012-03-21

summary:

- libvirtd stops responding
+ libvirtd stops responding in oneiric

Revision history for this message

Christian Wittwer (wittwerch) wrote on 2012-05-31:

#18

When will that bugfix be available in Oneiric?

Revision history for this message

Rolf Leggewie (r0lf) wrote on 2014-12-03:

#19

oneiric has seen the end of its life and is no longer receiving any updates. Marking the oneiric task for this ticket as "Won't Fix".

Changed in libvirt (Ubuntu Oneiric):
status:	Confirmed → Won't Fix

Bug Watch Updater (bug-watch-updater) on 2017-10-28

Changed in libvirt (Fedora):
importance:	Unknown → Undecided
status:	Unknown → Fix Released

Ubuntu
libvirt package

libvirtd stops responding in oneiric

Bug Description

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

	Status	Importance	Assigned to
libvirt (Fedora)	Fix Released	Undecided	redhat-bugs #757382
libvirt (Ubuntu)	Fix Released	High	Chuck Short
Oneiric	Won't Fix	Undecided	Unassigned

Ubuntulibvirt package

libvirtd stops responding in oneiric

Bug Description

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
libvirt package