SIGSEGV on memory hotplug

Bug #1756915 reported by Christian Ehrhardt 
This bug affects 3 people
Affects: libvirt (Ubuntu)
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Following the instructions in bug 1755153 with the most recent version, 4.0.0-1ubuntu5, libvirtd segfaults on memory hotplug.
The assumption is that one of the stable patches introduced this regression.

Thread 5 "libvirtd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd6881c7700 (LWP 32564)]
virDomainDeviceGetInfo (device=device@entry=0x0) at ../../../src/conf/domain_conf.c:3546
warning: Source file is more recent than executable.
3546 switch ((virDomainDeviceType) device->type) {
(gdb) bt
#0 virDomainDeviceGetInfo (device=device@entry=0x0) at ../../../src/conf/domain_conf.c:3546
#1 0x00007fd69dadb205 in virDomainDefCompatibleDevice (def=0x7fd63015c8a0, dev=dev@entry=0x7fd670006be0,
    oldDev=oldDev@entry=0x0) at ../../../src/conf/domain_conf.c:26939
#2 0x00007fd67cc272fd in qemuDomainAttachDeviceLiveAndConfig (flags=<optimized out>, xml=<optimized out>,
    driver=0x7fd63010d380, vm=0x7fd63015c1e0, conn=0x7fd680000be0) at ../../../src/qemu/qemu_driver.c:8496
#3 qemuDomainAttachDeviceFlags (dom=<optimized out>, xml=<optimized out>, flags=<optimized out>)
    at ../../../src/qemu/qemu_driver.c:8555
#4 0x00007fd69db5fe97 in virDomainAttachDeviceFlags (domain=domain@entry=0x7fd670000d20,
    xml=0x7fd670000c80 "<memory model='dimm'>\n <target>\n", ' ' <repeats 16 times>, "<size unit='KiB'>524288</size>\n", ' ' <repeats 16 times>, "<node>0</node>\n </target>\n</memory>\n", flags=1)
    at ../../../src/libvirt-domain.c:8156
#5 0x000055cf6d358b73 in remoteDispatchDomainAttachDeviceFlags (server=0x55cf6e4c1bb0, msg=0x55cf6e508b80,
    args=0x7fd670000c20, rerr=0x7fd6881c6ba0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:3514
#6 remoteDispatchDomainAttachDeviceFlagsHelper (server=0x55cf6e4c1bb0, client=<optimized out>, msg=0x55cf6e508b80,
    rerr=0x7fd6881c6ba0, args=0x7fd670000c20, ret=0x7fd670000c60) at ../../../daemon/remote_dispatch.h:3490
#7 0x00007fd69dbc3cdc in virNetServerProgramDispatchCall (msg=0x55cf6e508b80, client=0x55cf6e4f6bc0,
    server=0x55cf6e4c1bb0, prog=0x55cf6e50e0d0) at ../../../src/rpc/virnetserverprogram.c:436
#8 virNetServerProgramDispatch (prog=0x55cf6e50e0d0, server=server@entry=0x55cf6e4c1bb0, client=0x55cf6e4f6bc0,
    msg=0x55cf6e508b80) at ../../../src/rpc/virnetserverprogram.c:307
#9 0x000055cf6d376688 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>,
    srv=0x55cf6e4c1bb0) at ../../../src/rpc/virnetserver.c:148
#10 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55cf6e4c1bb0) at ../../../src/rpc/virnetserver.c:169
#11 0x00007fd69da9e6d1 in virThreadPoolWorker (opaque=opaque@entry=0x55cf6e4aff00)
    at ../../../src/util/virthreadpool.c:167
#12 0x00007fd69da9da48 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#13 0x00007fd69d1476db in start_thread (arg=0x7fd6881c7700) at pthread_create.c:463
#14 0x00007fd69ce7088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Christian Ehrhardt  (paelzer) wrote :

qemuDomainAttachDeviceLiveAndConfig
  if (virDomainDefCompatibleDevice(vm->def, dev_copy, NULL) < 0)

  -> virDomainDefCompatibleDevice
     # oldDev is the NULL above
     .oldInfo = virDomainDeviceGetInfo(oldDev),

     -> virDomainDeviceGetInfo
        # device is the NULL pointer
        switch ((virDomainDeviceType) device->type) {
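
The call chain above can be reduced to a small sketch. This is not the upstream patch, which is not quoted in this report; the types and the names device_get_info and device_compatible are hypothetical, simplified stand-ins for virDomainDeviceGetInfo and virDomainDefCompatibleDevice. It only illustrates the general shape of such a fix: look up the old device's info only when an old device was actually passed in. The boot index rule is invented purely to give the function something to compare.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libvirt's device types. */
typedef enum { DEVICE_TYPE_MEMORY, DEVICE_TYPE_RNG } device_type;

typedef struct {
    int boot_index;
} device_info;

typedef struct {
    device_type type;
    device_info info;
} device;

/* Same shape as the crashing function: dereferences dev unconditionally,
 * so calling it with NULL is a segfault. */
device_info *device_get_info(device *dev)
{
    switch (dev->type) {
    case DEVICE_TYPE_MEMORY:
    case DEVICE_TYPE_RNG:
        return &dev->info;
    }
    return NULL;
}

/* Guarded caller: old_dev is NULL on hot add (there is nothing to update),
 * so its info is only looked up when it exists.
 * Returns 0 if compatible, -1 otherwise. The rule used here (an update must
 * not change the boot index) is an assumption made for illustration. */
int device_compatible(device *dev, device *old_dev)
{
    device_info *new_info;
    device_info *old_info = NULL;

    assert(dev != NULL);
    new_info = device_get_info(dev);
    if (old_dev != NULL)
        old_info = device_get_info(old_dev);

    if (new_info != NULL && old_info != NULL &&
        new_info->boot_index != old_info->boot_index)
        return -1;
    return 0;
}
```

With this guard, a hot add that passes NULL as the old device (as qemuDomainAttachDeviceLiveAndConfig does above) no longer reaches the dereference.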

Very likely this is either already fixed upstream, or the stable fixes are missing a change they depend on.

Checking upstream vs stable branch ...

Changed in libvirt (Ubuntu):
status: New → In Progress
Christian Ehrhardt  (paelzer) wrote :

qemuDomainAttachDeviceLiveAndConfig is touched by stable patches 9, 20, 24 and 27.
virDomainDefCompatibleDevice is touched by stable patches 8, 11, 25, 27 and 28.

...
Found that 0028-qemu-Fix-updating-device-with-boot-order.patch is the culprit.
There was a follow-up fix upstream that is needed as well:
5535856f "conf: Fix crash in virDomainDefCompatibleDevice"

Christian Ehrhardt  (paelzer) wrote :

That fix landed upstream after the changes we have pulled so far, so this was simply an upstream issue that needs fixing.
It is not an artifact of the backports.

Christian Ehrhardt  (paelzer) wrote :

On the same topic:
b4fad8ec qemu: Fix comparison assignment in qemuDomainUpdateDeviceLive
4b1ec66c qemu: Fix memory leak in qemuConnectGetAllDomainStats error path

Maybe (if applicable):
33c6eb96 libvirtd: fix potential deadlock when reloading
c8935705 qemu: Use correct bus type for input devices
e02d102b qemu: hostdev: Fix the error on VM start with an mdev when IOMMU is off

Christian Ehrhardt  (paelzer) wrote :

All of these fixes are applicable (and needed), and they apply cleanly.
A new test build with them is in my ppa, ready for the next upload.

Andreas Hasenack (ahasenack) wrote :

The libvirt-bin package from the ci-train ppa didn't fix my other problem (bug #1756981). libvirt-bin again changed pids after the virt-manager "crash".

grep libvirt /var/log/syslog shows when the version from the ppa segfaulted:

Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.302+0000: 30272: info : libvirt version: 4.0.0, package: 1ubuntu6~ppa2 (Christian Ehrhardt <email address hidden> Mon, 19 Mar 2018 14:57:08 +0100)
Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.302+0000: 30272: info : hostname: nsnx
Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.302+0000: 30272: error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/lib/libvirt/qemu/capabilities.pidfile': Resource temporarily unavailable
Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.909+0000: 30272: error : virDirOpenInternal:2840 : cannot open directory '/ds216/downloads/isos/ubuntu/14.04': No such file or directory
Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.909+0000: 30272: error : storageDriverAutostartCallback:210 : internal error: Failed to autostart storage pool '14.04': cannot open directory '/ds216/downloads/isos/ubuntu/14.04': No such file or directory
Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.910+0000: 30272: error : virDirOpenInternal:2840 : cannot open directory '/ds216/downloads/isos/ubuntu/16.04': No such file or directory
Mar 19 16:58:31 nsnx libvirtd[30256]: 2018-03-19 19:58:31.910+0000: 30272: error : storageDriverAutostartCallback:210 : internal error: Failed to autostart storage pool '16.04': cannot open directory '/ds216/downloads/isos/ubuntu/16.04': No such file or directory
Mar 19 17:04:48 nsnx kernel: [24991.495464] libvirtd[30258]: segfault at 0 ip 00007f3cee849020 sp 00007f3ce2016968 error 4 in libvirt.so.0.4000.0[7f3cee737000+3be000]
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Failed with result 'signal'.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Service hold-off time over, scheduling restart.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 1.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Found left-over process 3027 (dnsmasq) in control group while starting unit. Ignoring.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Found left-over process 3158 (dnsmasq) in control group while starting unit. Ignoring.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Found left-over process 3160 (dnsmasq) in control group while starting unit. Ignoring.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Found left-over process 3269 (dnsmasq) in control group while starting unit. Ignoring.
Mar 19 17:04:48 nsnx systemd[1]: libvirtd.service: Found left-over process 12043 (qemu-system-x86) in control group while starting unit. Ignoring.
Mar 19 17:04:49 nsnx dnsmasq[3158]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Mar 19 17:04:49 nsnx dnsmasq-dhcp[3158]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Mar 19 17:04:49 nsnx dnsmasq[3269]: read /var/lib/libvirt/dnsmasq/pxe5.addnhosts - 0 ad...
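
The kernel segfault line in this log gives the faulting instruction pointer and the library mapping as [base+size]. A minimal sketch of turning that into an offset inside libvirt.so, which a symbol lookup tool such as addr2line could then resolve: the helper name module_offset is hypothetical, and it assumes the reported mapping starts at the beginning of the library file.

```c
#include <stdint.h>

/* Compute the offset of a faulting instruction pointer inside a mapping
 * reported by the kernel as "lib[base+size]".
 * Returns 0 and stores ip - base in *offset when ip lies inside
 * [base, base + size); returns -1 when it lies outside the mapping. */
int module_offset(uint64_t ip, uint64_t base, uint64_t size, uint64_t *offset)
{
    if (ip < base || ip - base >= size)
        return -1;
    *offset = ip - base;
    return 0;
}
```

For the values in the log line (ip 00007f3cee849020, mapping 7f3cee737000+3be000) this yields offset 0x112020.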


Christian Ehrhardt  (paelzer) wrote :

It was preliminary after all, thanks for the check Andreas.

Changed in libvirt (Ubuntu):
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

Here are some devices to hot add (without needing special hardware) to test this; some relate more to bug 1755153 but were tested here as well.

Testing nvdimms is a bit odd, as there is no "enable" option.
So the guest has to be started with an nvdimm for hotplug to work.
That in turn needs support in virt-aa-helper.
Ignoring that, it would work as follows:

$ sudo fallocate /var/lib/libvirt/qemu/nvdimm-test -l 512m
$ sudo chown libvirt-qemu:kvm /var/lib/libvirt/qemu/nvdimm-test
$ sudo chmod 777 /var/lib/libvirt/qemu/nvdimm-test
$ sudo fallocate /var/lib/libvirt/qemu/nvdimm-base -l 512m
$ sudo chown libvirt-qemu:kvm /var/lib/libvirt/qemu/nvdimm-base
$ sudo chmod 777 /var/lib/libvirt/qemu/nvdimm-base

To enable it at start, add an nvdimm to the base guest XML:
    <memory model='nvdimm'>
      <source>
        <path>/var/lib/libvirt/qemu/nvdimm-base</path>
      </source>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
    </memory>

For the hot add, one can use the following file:
$ sudo virsh attach-device <guest> hotplug-mem512-nvdimm.xml --live

$ cat hotplug-mem512-nvdimm.xml
<memory model='nvdimm'>
        <source>
                <path>/var/lib/libvirt/qemu/nvdimm-test</path>
        </source>
        <target>
                <size unit='KiB'>524288</size>
                <node>0</node>
        </target>
</memory>

The hot add adds the rule:
  "/var/lib/libvirt/qemu/nvdimm-test" rwk,
as it should.

It still needs the rule at guest start though (from virt-aa-helper).

Will retest once that is implemented.
First, a look at the other devices.

"Normal" memory - working
$ cat hotplug-mem512.xml
<memory model='dimm'>
        <target>
                <size unit='KiB'>524288</size>
                <node>0</node>
        </target>
</memory>

RNG Device - working
$ cat hotplug-rng.xml
<rng model='virtio'>
        <rate period="2000" bytes="1234"/>
        <backend model='random'>/dev/random</backend>
</rng>

Trivial input dev - working
$ cat hotplug-input.xml
<input type='tablet' bus='virtio'/>

Passthrough input dev - working
ubuntu@node-horsea:~$ cat hotplug-input-evdev.xml
<input type='passthrough' bus='virtio'>
        <source evdev='/dev/input/event0' />
</input>
Got rule:
    "/dev/input/event0" rwk,

But the latter also fails if it is added to the initial domain XML (the label callbacks work, but virt-aa-helper needs to learn about it).

So overall it seems we are all good, except for specifying nvdimm memory / input evdev in the initial domain XML.
Filing a separate bug for those two (it is essentially a new issue).

Andreas Hasenack (ahasenack) wrote :

Confirmed fixed for my rng case where it was crashing before. I used:

 *** 4.0.0-1ubuntu6~ppa6 500
        500 http://ppa.launchpad.net/ci-train-ppa-service/3200/ubuntu bionic/main amd64 Packages

Changed in libvirt (Ubuntu):
importance: High → Critical
Christian Ehrhardt  (paelzer) wrote :

The further patches I want to upstream need another round of polishing and re-submission before they can be accepted.
Since this one is critical and hits so many people, and the changes are backports only (nothing that becomes a maintenance burden), let's unblock this issue by pushing an interim version through regression tests and uploading it once it is good.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 4.0.0-1ubuntu6

---------------
libvirt (4.0.0-1ubuntu6) bionic; urgency=medium

  * Backport from recent upstream to stabilize libvirt (LP: #1756915)
    - d/p/stable/0033-qemu-Fix-comparison-assignment-in-qemuDomainUpdateDe.patch
    - d/p/stable/0034-qemu-Fix-memory-leak-in-qemuConnectGetAllDomainStats.patch
    - d/p/stable/0035-libvirtd-fix-potential-deadlock-when-reloading.patch
    - d/p/stable/0036-qemu-Use-correct-bus-type-for-input-devices.patch
    - d/p/stable/0037-qemu-hostdev-Fix-the-error-on-VM-start-with-an-mdev-.patch
    - d/p/stable/0038-conf-Fix-crash-in-virDomainDefCompatibleDevice.patch
  * d/p/ubuntu/lp1688508-tools-fix-variable-scope-in-in-check_guests_shutdown:
    avoid issues shutting down more guests than configured for parallel
    shutdown (LP: #1688508)
  * d/p/ubuntu-aa/lp1756394-virt-aa-helper-resolve-file-symlinks.patch: fix
    using devices that are symlinks (LP: #1756394)

 -- Christian Ehrhardt <email address hidden> Mon, 19 Mar 2018 14:57:08 +0100

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released