libvirt-bin terminated unexpectedly

Bug #1367702 reported by James Page
This bug affects 4 people
Affects: libvirt (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

2014-09-10 11:21:16.471+0000: 8773: warning : qemuOpenVhostNet:508 : Unable to open vhost-net. Opened so far 0, requested 1
2014-09-10 11:21:20.990+0000: 6702: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:20.990+0000: 6702: error : qemuMonitorOpenUnix:309 : failed to connect to monitor socket: No such process
2014-09-10 11:21:21.520+0000: 6767: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:21.520+0000: 6767: error : qemuMonitorOpenUnix:309 : failed to connect to monitor socket: No such process
2014-09-10 11:21:22.123+0000: 6836: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:22.123+0000: 6836: error : qemuMonitorOpenUnix:309 : failed to connect to monitor socket: No such process
2014-09-10 11:21:22.123+0000: 6841: error : qemuMonitorOpenUnix:309 : failed to connect to monitor socket: No such process
2014-09-10 11:21:23.261+0000: 6914: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:23.261+0000: 6914: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:21:23.687+0000: 6976: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:23.687+0000: 6976: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:21:23.691+0000: 6975: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:21:26.830+0000: 7430: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:26.830+0000: 7430: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:21:27.221+0000: 7492: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:27.221+0000: 7492: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:21:27.955+0000: 7615: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:27.955+0000: 7615: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:21:28.311+0000: 7675: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:21:28.311+0000: 7675: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2014-09-10 11:22:22.886+0000: 8578: info : libvirt version: 1.2.6, package: 1.2.6-0ubuntu6
2014-09-10 11:22:22.886+0000: 8578: warning : cg_detect_placement:523 : Failed to get cgroup path for blkio
2014-09-10 11:22:22.886+0000: 8580: warning : cg_detect_placement:523 : Failed to get cgroup path for cpuset
2014-09-10 11:22:22.887+0000: 8581: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: libvirt-bin 1.2.6-0ubuntu6
ProcVersionSignature: Ubuntu 3.16.0-14.20-generic 3.16.2
Uname: Linux 3.16.0-14-generic x86_64
ApportVersion: 2.14.7-0ubuntu2
Architecture: amd64
Date: Wed Sep 10 11:23:00 2014
Ec2AMI: ami-00000070
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.large
Ec2Kernel: aki-00000002
Ec2Ramdisk: ari-00000002
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.default.libvirt.bin: [modified]
modified.conffile..etc.libvirt.libvirtd.conf: [modified]
modified.conffile..etc.libvirt.qemu.conf: [inaccessible: [Errno 13] Permission denied: '/etc/libvirt/qemu.conf']
mtime.conffile..etc.default.libvirt.bin: 2014-09-10T11:14:07.129073
mtime.conffile..etc.libvirt.libvirtd.conf: 2014-09-10T11:14:02.473073

James Page (james-page) wrote :
Changed in libvirt (Ubuntu):
status: New → Invalid
Don Bowman (donbowman) wrote :

if cgmanager is restarted, this problem occurs.

2014-11-22 22:32:45.543+0000: 18448: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
process 18397: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
Most likely, the application was supposed to call dbus_connection_close(), since this is a private connection.
(null):alloc.c:315: Assertion failed in nih_free: ptr != NULL
Segmentation fault (core dumped)
root@nubo-10:/var/crash/x# cgm ping
<returns ok, cgmanager is up>

 ... again ...
2014-11-22 22:34:47.076+0000: 19701: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Connection is closed
(null):dbus_error.c:69: Unhandled error from nih_dbus_error_raise: Connection is closed
Segmentation fault (core dumped)

[Switching to Thread 0x7fe482ffd700 (LWP 21863)]
0x00007fe49a778d27 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fe49a778d27 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fe49a77a418 in __GI_abort () at abort.c:89
#2 0x00007fe4986fd538 in cgmanager_get_pid_cgroup_sync () from /lib/x86_64-linux-gnu/libcgmanager.so.0
#3 0x00007fe49ad9102b in cgm_controller_exists (controller=0x7fe49af1566b "name=systemd") at /build/buildd/libvirt-1.2.8/./src/util/cgmanager.c:296
#4 0x00007fe49ad88a3b in cg_get_cgroups (group=<optimized out>) at /build/buildd/libvirt-1.2.8/./src/util/vircgroup.c:374
#5 virCgroupDetectMounts (group=0x552e) at /build/buildd/libvirt-1.2.8/./src/util/vircgroup.c:412
#6 0x00007fe49ad89c53 in virCgroupDetect (parent=<optimized out>, path=<optimized out>, controllers=<optimized out>, pid=<optimized out>, group=<optimized out>)
    at /build/buildd/libvirt-1.2.8/./src/util/vircgroup.c:729
#7 virCgroupNew (pid=21806, path=0x7fe49af62dd8 "", parent=0x0, controllers=-1, group=0x7fe482ffd700) at /build/buildd/libvirt-1.2.8/./src/util/vircgroup.c:1263
#8 0x00007fe49ad89ec7 in virCgroupNewDetectMachine (name=0x7fe4842ef770 "instance-0000152c", drivername=0x7fe48b33affb "qemu", pid=21806, partition=0x7fe4841b9400 "/machine", controllers=-2097162496,
    group=0x7fe484309250) at /build/buildd/libvirt-1.2.8/./src/util/vircgroup.c:1782
#9 0x00007fe48b2baec5 in qemuConnectCgroup (driver=<optimized out>, vm=0x7fe4842c75d0) at /build/buildd/libvirt-1.2.8/./src/qemu/qemu_cgroup.c:781
#10 0x00007fe48b2cf447 in qemuProcessReconnect (opaque=0x552e, opaque@entry=0x7fe48431c700) at /build/buildd/libvirt-1.2.8/./src/qemu/qemu_process.c:3322
#11 0x00007fe49adda94e in virThreadHelper (data=<optimized out>) at /build/buildd/libvirt-1.2.8/./src/util/virthread.c:197
#12 0x00007fe49ab0f0a5 in start_thread (arg=0x7fe482ffd700) at pthread_create.c:309
#13 0x00007fe49a83c84d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:11...

Serge Hallyn (serge-hallyn) wrote :

Hi,

Please go ahead and file a new bug, and show exactly the steps you used to reproduce this.

I took an existing utopic container created with lxc and defined a libvirt container for it. I started it, then restarted cgmanager and connected to the container using 'virsh -c lxc:/// console u1'. The container continued to work and libvirt did not restart.

Don Bowman (donbowman) wrote :

filed bug 1397130 as per request.

Don Bowman (donbowman) wrote :

The attached file (utopic.xml) shows how to launch a virtual machine running 14.10 under KVM that will demonstrate the problem with the attached script.
It pretends to have a 2-socket NUMA topology to make the problem occur (in case it does not occur on a single socket, since cgmanager is inter-related somehow).

Don Bowman (donbowman) wrote :

On startup, libvirtd calls cgm_dbus_connect() for each thread (e.g. each instance which is running).
One of these calls fails, not all of them.

Don Bowman (donbowman) wrote :

There may be some threading issues here,
e.g. https://bugs.launchpad.net/ubuntu/+source/libnih/+bug/1294200

Don Bowman (donbowman) wrote :

OK, that is the problem.

In src/util/cgmanager.c, we are calling cgm_dbus_connect() from multiple threads.
It in turn assigns 'cgroup_manager',
but this is a global (static) variable, so the threads fight over it and corrupt it.

static NihDBusProxy *cgroup_manager = NULL;
bool cgm_running = false;

VIR_LOG_INIT("util.cgmanager");

#define CGMANAGER_DBUS_SOCK "unix:path=/sys/fs/cgroup/cgmanager/sock"
bool cgm_dbus_connect(void)
{
    DBusError dbus_error;
    DBusConnection *connection;
    dbus_error_init(&dbus_error);

    connection = dbus_connection_open_private(CGMANAGER_DBUS_SOCK, &dbus_error);
    if (!connection) {
        dbus_error_free(&dbus_error);
        return false;
    }

    dbus_connection_set_exit_on_disconnect(connection, FALSE);
    dbus_error_free(&dbus_error);
    cgroup_manager = nih_dbus_proxy_new(NULL, connection,
                NULL /* p2p */,
                "/org/linuxcontainers/cgmanager", NULL, NULL);
    dbus_connection_unref(connection);
    if (!cgroup_manager) {
        NihError *nerr;
        nerr = nih_error_get();
        VIR_ERROR("cgmanager: Error opening proxy: %s", nerr->message);
        nih_free(nerr);
        return false;
    }

Serge Hallyn (serge-hallyn) wrote :

Thanks, Don. I'll post a package set which takes a lock at cgm_dbus_connect and drops it at disconnect. Which release are you testing on? Utopic, like James was?

(Odd that I haven't run into that one here!)
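
For illustration, the locking approach described above (take a lock at connect, drop it at disconnect) might look roughly like the following. This is a minimal sketch, not the actual libvirt patch: fake_proxy, cgm_lock, cgm_connect_locked(), cgm_disconnect_locked(), worker() and run_workers() are hypothetical names, and a malloc'd struct stands in for the NihDBusProxy and the real D-Bus connection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the NihDBusProxy stored in the global. */
typedef struct { int owner; } fake_proxy;

static fake_proxy *cgroup_manager = NULL;  /* shared by all threads */
static pthread_mutex_t cgm_lock = PTHREAD_MUTEX_INITIALIZER;

static int holders = 0;      /* threads currently between connect and disconnect */
static int max_holders = 0;  /* highest value 'holders' ever reached */

/* Take the lock, then set up the shared proxy.
 * On failure the lock is released before returning. */
static bool cgm_connect_locked(int owner)
{
    pthread_mutex_lock(&cgm_lock);
    cgroup_manager = malloc(sizeof(*cgroup_manager));
    if (!cgroup_manager) {
        pthread_mutex_unlock(&cgm_lock);
        return false;
    }
    cgroup_manager->owner = owner;
    holders++;
    if (holders > max_holders)
        max_holders = holders;
    return true;
}

/* Tear down the shared proxy, then release the lock taken at connect. */
static void cgm_disconnect_locked(void)
{
    free(cgroup_manager);
    cgroup_manager = NULL;
    holders--;
    pthread_mutex_unlock(&cgm_lock);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < 1000; i++) {
        if (!cgm_connect_locked(id))
            continue;
        /* While the lock is held, no other thread may have replaced the proxy. */
        assert(cgroup_manager->owner == id);
        cgm_disconnect_locked();
    }
    return NULL;
}

/* Run up to 16 workers; return the highest number of simultaneous holders. */
static int run_workers(int n)
{
    pthread_t threads[16];
    int ids[16];
    if (n > 16)
        n = 16;
    for (int i = 0; i < n; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return max_holders;
}
```

Because the mutex is held from connect to disconnect, only one thread at a time can own 'cgroup_manager', so run_workers(4) should never observe more than one holder. The trade-off is that all cgmanager calls are serialized.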

Changed in libvirt (Ubuntu):
importance: Undecided → High
status: Invalid → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.2.8-0ubuntu18

---------------
libvirt (1.2.8-0ubuntu18) vivid; urgency=medium

  * mutex cgmanager actions (Thanks to Don Bowman for finding the cause)
    (LP: #1397130) (LP: #1367702)
 -- Serge Hallyn <email address hidden> Thu, 18 Dec 2014 13:28:03 -0600

Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Chris J Arges (arges) wrote : Please test proposed package

Hello James, or anyone else affected,

Accepted libvirt into utopic-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/libvirt/1.2.8-0ubuntu11.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Chris J Arges (arges) wrote :

Marked as verification-done because the duplicate bug was marked the same way.

tags: added: verification-done
removed: verification-needed
Scott Kitterman (kitterman) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
