LibVirtD crashing after many hours (100+)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Invalid | Undecided | Unassigned | |
| libvirt (Ubuntu) | Expired | High | Unassigned | |
Bug Description
Running devstack, after many hours (100+?) of uptime, libvirt appears to crash (this is not under heavy usage, just long uptime). When I launch instances, nothing is printed in the nova-cpu log, and the instance state is ERROR.
Nova scheduler prints this:
2012-03-12 19:08:10 WARNING nova.scheduler.
2012-03-12 19:08:10 WARNING nova.scheduler.
It looks like n-cpu is hanging here:
2012-03-12 14:48:22 DEBUG nova.manager [-] Running periodic task ComputeManager.
2012-03-12 14:48:22 DEBUG nova.manager [-] Running periodic task ComputeManager.
If I Control-C nova-cpu, the following is printed:
^Clibvir: Remote error : poll on socket failed: Interrupted system call
Attempting to restart nova-cpu, it rapidly hangs here:
2012-03-12 19:11:01 DEBUG nova.compute.
2012-03-12 19:11:01 DEBUG nova.virt.
Again, Control-C gives the same error:
^Clibvir: Remote error : poll on socket failed: Interrupted system call
"virsh list" also fails; it also hangs.
This is with the 'recommended' configuration: DevStack with KVM on Ubuntu Oneiric.
There are two libvirtd processes running (is this normal?).
If I gdb each libvirtd, I can get stack traces:
(gdb) thread apply all bt
Thread 1 (Thread 0x7f579dafe700 (LWP 22512)):
#0 __lll_lock_
#1 0x00007f57a3afa404 in _L_lock_2263 () at tzset.c:616
#2 0x00007f57a3afa217 in __tz_convert (timer=
#3 0x00007f57a54cef47 in virLogMessage (category=
fmt=<optimized out>) at util/logging.c:727
#4 0x00007f57a54bdf0e in virCommandHook (data=0x7f57981
#5 0x00007f57a54bec4e in virExecWithHook (argv=0x7f57980
errfd=
#6 0x00007f57a54c0bea in virCommandRunAsync (cmd=0x7f579810
#7 0x00007f57a54c12e3 in virCommandRun (cmd=0x7f579810
#8 0x0000000000452d62 in qemuCapsExtract
at qemu/qemu_
#9 0x00000000004535f5 in qemuCapsInitGuest (caps=0x7f57981
#10 0x0000000000453dc9 in qemuCapsInit (old_caps=
#11 0x000000000043e9a0 in qemuCreateCapab
#12 0x000000000044ac36 in qemudGetCapabil
#13 0x00007f57a553e77c in virConnectGetCa
#14 0x000000000042ebf3 in remoteDispatchG
args=<optimized out>, ret=0x7f579dafdd00) at remote_
#15 0x000000000043299b in remoteDispatchC
#16 remoteDispatchC
#17 0x000000000041f0bf in qemudWorker (data=0xbacbf0) at libvirtd.c:1619
#18 0x00007f57a3e01efc in start_thread (arg=0x7f579daf
#19 0x00007f57a3b3c89d in clone () at ../sysdeps/
#20 0x0000000000000000 in ?? ()
The other libvirtd process shows this:
(gdb) thread apply all bt
Thread 7 (Thread 0x7f57a0303700 (LWP 15496)):
#0 __lll_lock_wait () at ../nptl/
#1 0x00007f57a3e041e5 in _L_lock_883 () from /lib/x86_
#2 0x00007f57a3e0403a in __pthread_
#3 0x000000000044ad6d in qemudClose (conn=0xc73f50) at qemu/qemu_
#4 0x00007f57a552dd8b in virReleaseConnect (conn=0xc73f50) at datatypes.c:114
#5 0x00007f57a552e0e8 in virUnrefConnect (conn=0xc73f50) at datatypes.c:149
#6 0x00007f57a5535818 in virConnectClose (conn=0xc73f50) at libvirt.c:1363
#7 0x000000000041cee2 in qemudFreeClient (client=
#8 0x000000000041d294 in qemudRunLoop (opaque=0xba42a0) at libvirtd.c:2402
#9 0x00007f57a3e01efc in start_thread (arg=0x7f57a030
#10 0x00007f57a3b3c89d in clone () at ../sysdeps/
#11 0x0000000000000000 in ?? ()
Thread 6 (Thread 0x7f579fb02700 (LWP 15497)):
#0 pthread_
#1 0x00007f57a54dc82a in virCondWait (c=<optimized out>, m=<optimized out>) at util/threads-
#2 0x000000000041f02d in qemudWorker (data=0xbacb90) at libvirtd.c:1598
#3 0x00007f57a3e01efc in start_thread (arg=0x7f579fb0
#4 0x00007f57a3b3c89d in clone () at ../sysdeps/
#5 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7f579f301700 (LWP 15498)):
#0 pthread_
#1 0x00007f57a54dc82a in virCondWait (c=<optimized out>, m=<optimized out>) at util/threads-
#2 0x000000000041f02d in qemudWorker (data=0xbacba8) at libvirtd.c:1598
#3 0x00007f57a3e01efc in start_thread (arg=0x7f579f30
#4 0x00007f57a3b3c89d in clone () at ../sysdeps/
#5 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7f579eb00700 (LWP 15499)):
#0 pthread_
#1 0x00007f57a54dc82a in virCondWait (c=<optimized out>, m=<optimized out>) at util/threads-
#2 0x000000000041f02d in qemudWorker (data=0xbacbc0) at libvirtd.c:1598
#3 0x00007f57a3e01efc in start_thread (arg=0x7f579eb0
#4 0x00007f57a3b3c89d in clone () at ../sysdeps/
#5 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f579e2ff700 (LWP 15500)):
#0 pthread_
#1 0x00007f57a54dc82a in virCondWait (c=<optimized out>, m=<optimized out>) at util/threads-
#2 0x000000000041f02d in qemudWorker (data=0xbacbd8) at libvirtd.c:1598
#3 0x00007f57a3e01efc in start_thread (arg=0x7f579e2f
#4 0x00007f57a3b3c89d in clone () at ../sysdeps/
#5 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f579dafe700 (LWP 15501)):
#0 0x00007f57a3b30773 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/
#1 0x00007f57a54c163b in virCommandProcessIO (cmd=0x7f579810
#2 virCommandRun (cmd=0x7f579810
#3 0x0000000000452d62 in qemuCapsExtract
at qemu/qemu_
#4 0x00000000004535f5 in qemuCapsInitGuest (caps=0x7f57981
#5 0x0000000000453dc9 in qemuCapsInit (old_caps=
#6 0x000000000043e9a0 in qemuCreateCapab
#7 0x000000000044ac36 in qemudGetCapabil
#8 0x00007f57a553e77c in virConnectGetCa
#9 0x000000000042ebf3 in remoteDispatchG
args=<optimized out>, ret=0x7f579dafdd00) at remote_
#10 0x000000000043299b in remoteDispatchC
#11 remoteDispatchC
#12 0x000000000041f0bf in qemudWorker (data=0xbacbf0) at libvirtd.c:1619
#13 0x00007f57a3e01efc in start_thread (arg=0x7f579daf
#14 0x00007f57a3b3c89d in clone () at ../sysdeps/
#15 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f57a5c23800 (LWP 15495)):
#0 0x00007f57a3e031f8 in pthread_join (threadid=
#1 0x000000000041bdb3 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:3418
If I kill the first libvirtd (the single-threaded process), libvirt starts responding again.
This looks similar to this bug:
https:/
Since libvirt seems to have threading issues, perhaps we should wrap every libvirt call in a global lock?
This looks like a libvirt issue -- keeping the nova task open since it may affect minimum dependency versions.