Activity log for bug #1731051

Date Who What changed Old value New value Message
2017-11-08 21:19:30 Sean Feole bug added bug
2017-11-08 21:19:30 Sean Feole attachment added qemu-patch.txt https://bugs.launchpad.net/bugs/1731051/+attachment/5006070/+files/qemu-patch.txt
2017-11-09 04:25:57 Ubuntu Foundations Team Bug Bot tags arm64 arm64 patch
2017-11-09 04:26:04 Ubuntu Foundations Team Bug Bot bug added subscriber Ubuntu Review Team
2017-11-09 09:22:22 Christian Ehrhardt qemu (Ubuntu): status New Incomplete
2017-11-09 09:22:51 Christian Ehrhardt bug added subscriber dann frazier
2017-11-10 07:43:22 Christian Ehrhardt qemu (Ubuntu): status Incomplete Triaged
2017-11-13 13:09:03 Christian Ehrhardt qemu (Ubuntu): status Triaged Incomplete
2017-11-14 07:28:03 Christian Ehrhardt qemu (Ubuntu): status Incomplete Triaged
2017-11-14 07:28:11 Christian Ehrhardt nominated for series Ubuntu Artful
2017-11-14 07:28:11 Christian Ehrhardt bug task added qemu (Ubuntu Artful)
2017-11-14 07:28:16 Christian Ehrhardt qemu (Ubuntu Artful): status New Triaged
2017-11-14 07:28:19 Christian Ehrhardt qemu (Ubuntu): status Triaged In Progress
2017-11-14 22:45:58 Launchpad Janitor qemu (Ubuntu): status In Progress Fix Released
2017-11-15 07:04:28 Christian Ehrhardt description

Old value:

The Pike cloud archive has a regression, compared to Ocata, wherein rebooting a VM via virsh causes the VM to power down, and then exit. The VM does not automatically power back up, but can be restarted.

Repro:
Install 16.04.3 on an ARM64 host
Fully update the install
add-apt-repository cloud-archive:pike
apt-get update
apt-get install qemu-efi virt-manager libvirt-bin qemu-guest-agent qemu-system-aarch64
wget http://cdimage.ubuntu.com/ubuntu/releases/17.10/release/ubuntu-17.10-server-arm64.iso
create a new session via ssh (session B)
In session B:
virt-install --accelerate --cdrom ubuntu-17.10-server-arm64.iso --disk size=10 --name ubuntu1710 --os-type linux --ram 1024
Once the install completes and the guest is at the login prompt, in session A:
virsh reboot ubuntu1710 --mode acpi

Observed result:
The guest will power down as expected (from logs on session B), and then session B will be dumped back to the host shell. "virsh list" will not show the ubuntu1710 domain.

Expected result:
The guest powers back on, and boots back to the login prompt.

Analysis:
We observe these errors in various logs:
Nov 1 13:29:16 ubuntu libvirtd[2441]: 2017-11-01 20:29:16.882+0000: 2441: error : qemuMonitorIORead:595 : Unable to read from monitor: Connection reset by peer
Nov 1 13:29:16 ubuntu libvirtd[2441]: 2017-11-01 20:29:16.882+0000: 3101: error : qemuMonitorJSONCommandWithFd:309 : internal error: Missing monitor reply object
2017-11-01T20:29:16.538762Z qemu-system-aarch64: KVM_SET_DEVICE_ATTR failed: Group 4 attr 0x0000000000000001: No such device or address

We debugged this to an issue in the QEMU in Pike being incompatible with the 4.10 kernel of 16.04.3. The QEMU in this version attempts to use the ITS migration functionality during reboot. 4.10 does not support this. When the IOCTL fails, QEMU calls abort(), thus killing the VM.

We believe QEMU should not attempt to use this functionality if the host kernel does not support it. We suggest the attached patch to resolve the issue.

New value:

[Impact]

 * Newer qemu crashes on older kernels (on arm) for using a feature that was not supported by these older kernels.
 * Backport of a fix - also the detection code itself already exists in qemu - this just makes sure that if the feature is not available, the related function is not queued, which prevents the crash.

[Test Case]

 * (on arm64 for the actual case - is a no-change everywhere else)
 1. create a virtual machine that runs fine
 2. suspend it
    $ sudo virsh dompmsuspend ubuntu1710 --target mem
 3. wake it up
    $ sudo virsh dompmwakeup ubuntu1710
 => Before the fix this sequence crashed qemu as outlined in the initial report below

[Regression Potential]

 * This is only affecting arm (and thereby limiting regression to others) as well as being a backport and no "change from scratch" (limiting risk again). Then furthermore "all it does" is stop adding the ITS action, which was a feature only added in Artful's qemu. That said, if there were a case where the detection is non-perfect, even then the user would just fall back to how it worked in zesty. That is a lot of IFs (=unlikely) and even if so impact would hopefully be minimal. So I think the regression risk is very low for this change.

[Other Info]

 * Even more important for backports of this like Ubuntu Cloud Archive

---

The Pike cloud archive has a regression, compared to Ocata, wherein rebooting a VM via virsh causes the VM to power down, and then exit. The VM does not automatically power back up, but can be restarted.

Repro:
Install 16.04.3 on an ARM64 host
Fully update the install
add-apt-repository cloud-archive:pike
apt-get update
apt-get install qemu-efi virt-manager libvirt-bin qemu-guest-agent qemu-system-aarch64
wget http://cdimage.ubuntu.com/ubuntu/releases/17.10/release/ubuntu-17.10-server-arm64.iso
create a new session via ssh (session B)
In session B:
virt-install --accelerate --cdrom ubuntu-17.10-server-arm64.iso --disk size=10 --name ubuntu1710 --os-type linux --ram 1024
Once the install completes and the guest is at the login prompt, in session A:
virsh reboot ubuntu1710 --mode acpi

Observed result:
The guest will power down as expected (from logs on session B), and then session B will be dumped back to the host shell. "virsh list" will not show the ubuntu1710 domain.

Expected result:
The guest powers back on, and boots back to the login prompt.

Analysis:
We observe these errors in various logs:
Nov 1 13:29:16 ubuntu libvirtd[2441]: 2017-11-01 20:29:16.882+0000: 2441: error : qemuMonitorIORead:595 : Unable to read from monitor: Connection reset by peer
Nov 1 13:29:16 ubuntu libvirtd[2441]: 2017-11-01 20:29:16.882+0000: 3101: error : qemuMonitorJSONCommandWithFd:309 : internal error: Missing monitor reply object
2017-11-01T20:29:16.538762Z qemu-system-aarch64: KVM_SET_DEVICE_ATTR failed: Group 4 attr 0x0000000000000001: No such device or address

We debugged this to an issue in the QEMU in Pike being incompatible with the 4.10 kernel of 16.04.3. The QEMU in this version attempts to use the ITS migration functionality during reboot. 4.10 does not support this. When the IOCTL fails, QEMU calls abort(), thus killing the VM.

We believe QEMU should not attempt to use this functionality if the host kernel does not support it. We suggest the attached patch to resolve the issue.

2017-11-15 07:05:20 Christian Ehrhardt qemu (Ubuntu Artful): status Triaged In Progress
2017-11-15 07:05:45 Christian Ehrhardt bug task added cloud-archive
2017-11-16 23:19:38 Brian Murray qemu (Ubuntu Artful): status In Progress Fix Committed
2017-11-16 23:19:39 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2017-11-16 23:19:40 Brian Murray bug added subscriber SRU Verification
2017-11-16 23:19:45 Brian Murray tags arm64 patch arm64 patch verification-needed verification-needed-artful
2017-11-20 09:38:16 Christian Ehrhardt tags arm64 patch verification-needed verification-needed-artful arm64 patch verification-done verification-done-artful
2017-11-22 14:29:01 Corey Bryant nominated for series cloud-archive/pike
2017-11-22 14:29:01 Corey Bryant bug task added cloud-archive/pike
2017-11-22 14:29:25 Corey Bryant cloud-archive: status New Invalid
2017-11-22 14:29:31 Corey Bryant cloud-archive/pike: status New Triaged
2017-11-22 14:29:52 Corey Bryant cloud-archive: status Invalid Fix Released
2017-11-29 19:59:28 Corey Bryant cloud-archive/pike: status Triaged Fix Committed
2017-11-29 19:59:29 Corey Bryant tags arm64 patch verification-done verification-done-artful arm64 patch verification-done verification-done-artful verification-pike-needed
2017-12-04 19:21:22 Sean Feole tags arm64 patch verification-done verification-done-artful verification-pike-needed arm64 patch verification-done verification-done-artful verification-done-pike
2017-12-07 15:46:35 Launchpad Janitor qemu (Ubuntu Artful): status Fix Committed Fix Released
2017-12-07 15:46:41 Robie Basak removed subscriber Ubuntu Stable Release Updates Team
2017-12-10 13:54:21 Corey Bryant cloud-archive/pike: status Fix Committed Fix Released
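
The suspend/wake test case recorded in the 2017-11-15 description change could be scripted roughly as follows. This is only a sketch, not part of the bug record: the function names, the settle delay, and the final `virsh domstate` check are assumptions added for illustration, and it assumes the script runs with enough privileges to call `virsh` (the log uses `sudo`). The delay between steps may need tuning, since the guest can take a moment to reach the suspended state.

```python
import subprocess
import time


def run_virsh(args):
    """Run `virsh` with the given arguments and return the completed process."""
    return subprocess.run(["virsh", *args], capture_output=True, text=True)


def suspend_resume_survives(domain, run=run_virsh, settle_seconds=5.0):
    """Suspend the guest to memory, wake it, and report whether it still runs.

    `run` is injectable so the sequence can be exercised without a real host.
    """
    steps = (
        ["dompmsuspend", domain, "--target", "mem"],  # step 2 of the test case
        ["dompmwakeup", domain],                      # step 3 of the test case
    )
    for args in steps:
        if run(args).returncode != 0:
            return False
        # Hypothetical pause so the guest can finish the state transition.
        time.sleep(settle_seconds)
    # Assumed final check: a crashed qemu would leave the domain not running.
    state = run(["domstate", domain])
    return state.returncode == 0 and state.stdout.strip() == "running"
```

Calling `suspend_resume_survives("ubuntu1710")` on an affected host would be expected to return False before the fix, since the description reports that this sequence crashed qemu.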