support GICv3 ITS save/restore & migration

Bug #1710019 reported by dann frazier
This bug affects 1 person
Affects               Status        Importance  Assigned to   Milestone
Ubuntu Cloud Archive  Invalid       Undecided   Unassigned
  Ocata               Triaged       Medium      Unassigned
  Pike                Fix Released  Undecided   Unassigned
linux (Ubuntu)        Fix Released  Undecided   dann frazier
  Xenial              Won't Fix     Undecided   Unassigned
  Zesty               In Progress   Undecided   dann frazier
  Artful              Fix Released  Medium      dann frazier
qemu (Ubuntu)         Fix Released  Undecided   Unassigned
  Xenial              Won't Fix     Undecided   Unassigned
  Zesty               Triaged       Undecided   dann frazier
  Artful              Fix Released  Undecided   Unassigned

Bug Description

[Impact]
Virtual machines on GICv3-based ARM systems cannot be saved/restored or migrated.
This feature was added in QEMU 2.10.

[Test Case]
ubuntu@grotrian:~$ sudo virsh save 7936-0 7936-0.sav

Domain 7936-0 saved to 7936-0.sav

ubuntu@grotrian:~$ sudo virsh restore 7396-0.sav
error: Failed to restore domain from 7396-0.sav
error: operation failed: job: unexpectedly failed

ubuntu@grotrian:~$ sudo tail -3 /var/log/libvirt/qemu/7936-0.log
2017-08-10T21:26:38.217427Z qemu-system-aarch64: State blocked by non-migratable device 'arm_gicv3_its'
2017-08-10T21:26:38.217565Z qemu-system-aarch64: load of migration failed: Invalid argument
2017-08-10 21:26:38.217+0000: shutting down, reason=failed
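
The failure above is only visible in the per-domain QEMU log. As a rough sketch (not part of the original report), the following Python helper scans log text for the "non-migratable device" message quoted above and returns the device names. The function names are hypothetical; the pattern is copied from the log line in this test case and may not match other QEMU versions.

```python
import re

# Pattern copied from the QEMU log line quoted in the test case;
# other QEMU versions may word this message differently.
BLOCKER_RE = re.compile(r"State blocked by non-migratable device '([^']+)'")


def find_migration_blockers(log_text):
    """Return the device names reported as non-migratable, in first-seen order."""
    seen = []
    for match in BLOCKER_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def read_log_tail(path, max_lines=50):
    """Read the last max_lines lines of a log file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return "".join(handle.readlines()[-max_lines:])
```

For example, find_migration_blockers(read_log_tail("/var/log/libvirt/qemu/7936-0.log")) would be expected to report arm_gicv3_its on an affected host.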

[Regression Risk]
The kernel changes are restricted to ARM, minimizing the regression risk on other architectures. Other than one minor offset adjustment, all patches are clean cherry-picks from upstream. Tested on Cavium ThunderX and Qualcomm Centriq.

Christian Ehrhardt  (paelzer) wrote :

Hi Dann,
you already added half an SRU template, so are you aware of or working on a fix already?

dann frazier (dannf) wrote :

Indeed! I've updated the bug status to reflect that. Currently I do have a backport to 2.8 that seems to be working as well as latest upstream. That is, I can save and restore a UEFI guest, prior to starting Linux. I'm working on diagnosing an issue where a guest that has been saved after Linux has booted fails to restore with:

Unexpected error in kvm_device_access() at /home/ubuntu/qemu/accel/kvm/kvm-all.c:2229:
2017-08-11T15:12:42.922614Z qemu-system-aarch64: KVM_SET_DEVICE_ATTR failed: Group 4 attr 0x0000000000000002: Invalid argument
2017-08-11 15:12:43.683+0000: shutting down, reason=crashed

Changed in qemu (Ubuntu Zesty):
assignee: nobody → dann frazier (dannf)
status: New → In Progress
Changed in qemu (Ubuntu):
assignee: nobody → dann frazier (dannf)
status: New → In Progress
Changed in qemu (Ubuntu Xenial):
status: New → Triaged
dann frazier (dannf) wrote :

Here's the test tool I'm using to verify this. It spins up a zesty guest and gives it time to get past the bootloader and into Linux, then tries to save and restore it.

tar xvfz vm-save-restore.tar.gz
cd vm-save-restore
./setup.sh
./test.sh

I've tested this on an artful host, running the 4.12.0-10.11 kernel from ppa:canonical-kernel-team/unstable and latest upstream QEMU[*] (2.10-rc @ commit 9db6ffc7667673) on both a Cavium ThunderX CRB1S and a Qualcomm QDF2400. Both fail with:

+ sudo virsh define /tmp/tmp.Cx3wyNXijG
Domain 20066-0 defined from /tmp/tmp.Cx3wyNXijG

+ sudo virsh start 20066-0
Domain 20066-0 started

+ sleep 60
+ sudo virsh save 20066-0 /tmp/20066-0.sav

Domain 20066-0 saved to /tmp/20066-0.sav

+ sudo virsh restore /tmp/20066-0.sav
error: Failed to restore domain from /tmp/20066-0.sav
error: operation failed: domain is not running

$ sudo tail -3 /var/log/libvirt/qemu/20066-0.log
Unexpected error in kvm_device_access() at /home/ubuntu/qemu/accel/kvm/kvm-all.c:2229:
2017-08-11T21:33:42.819591Z qemu-system-aarch64: KVM_SET_DEVICE_ATTR failed: Group 4 attr 0x0000000000000002: Invalid argument
2017-08-11 21:33:43.024+0000: shutting down, reason=crashed

[*] Note: you currently need to boot the Ubuntu kernel w/ apparmor=0 to work with latest upstream QEMU
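
The save/restore cycle that the test tool performs can be sketched in Python as below. This is an illustrative reimplementation, not the contents of test.sh: the function names, the cycle count and the settle delay are assumptions. It relies only on the `virsh save <domain> <file>` and `virsh restore <file>` commands shown in the transcript, and it would need to run with sufficient privileges (the transcript uses sudo).

```python
import subprocess
import time


def save_restore_commands(domain, save_path):
    """Build the two virsh invocations for one save/restore round."""
    return [
        ["virsh", "save", domain, save_path],
        ["virsh", "restore", save_path],
    ]


def run_cycles(domain, save_path, cycles=3, settle_seconds=5, runner=subprocess.run):
    """Run several save/restore rounds; raises if any virsh call fails.

    runner is injectable so the logic can be exercised without libvirt.
    """
    for cycle in range(cycles):
        for command in save_restore_commands(domain, save_path):
            # check=True turns a non-zero virsh exit status into an exception.
            runner(command, check=True)
        if cycle + 1 < cycles:
            # Give the restored guest a moment before saving it again.
            time.sleep(settle_seconds)
    return cycles
```

A caller would wait for the guest to boot into Linux first (the tool sleeps 60 seconds), since the failure discussed here only appears once Linux is running in the guest.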

Christian Ehrhardt  (paelzer) wrote :

OK, thanks for the clarification; good to know this is in your hands.

About:
[*] Note: you currently need to boot the Ubuntu kernel w/ apparmor=0 to work with latest upstream QEMU

I haven't seen anything like that when testing 2.10-rc1/rc2. Is that ARM-specific (AppArmor usually fails the same way everywhere)?
Maybe it depends on the host kernel, as most of my tests use a Xenial kernel + LXD<newrelease>.
Since there will soon be a 2.10-based upload to artful, please let me know if there is a bug or details about this somewhere else.

dann frazier (dannf) wrote : Re: [Bug 1710019] Re: support GICv3 ITS save/restore & migration

On Mon, Aug 14, 2017 at 2:24 AM, ChristianEhrhardt
<email address hidden> wrote:
> OK, thanks for the clarification; good to know this is in your hands.
>
> About:
> [*] Note: you currently need to boot the Ubuntu kernel w/ apparmor=0 to work with latest upstream QEMU
>
> I haven't seen anything like that when testing 2.10-rc1/rc2. Is that ARM-specific (AppArmor usually fails the same way everywhere)?
> Maybe it depends on the host kernel, as most of my tests use a Xenial kernel + LXD<newrelease>.
> Since there will soon be a 2.10-based upload to artful, please let me know if there is a bug or details about this somewhere else.

It doesn't *look* to be arch-specific, but I haven't tested others. It
may just be the way my test works.

Starting at:
  244a56681 file-posix: Add image locking to perm operations

My test case (attached to this bug) began to fail with:

error: Failed to start domain 7936-0
error: internal error: process exited while connecting to monitor:
2017-08-14T23:43:10.255604Z qemu-system-aarch64: -drive
file=/home/ubuntu/vm-start-stop/vms/7936-0_CODE.fd,if=pflash,format=raw,unit=0,readonly=on:
Failed to unlock byte 100
2017-08-14T23:43:10.255750Z qemu-system-aarch64: -drive
file=/home/ubuntu/vm-start-stop/vms/7936-0_CODE.fd,if=pflash,format=raw,unit=0,readonly=on:
Failed to unlock byte 100
2017-08-14T23:43:10.255936Z qemu-system-aarch64: -drive
file=/home/ubuntu/vm-start-stop/vms/7936-0_CODE.fd,if=pflash,format=raw,unit=0,readonly=on:
Failed to lock byte 100

Christian Ehrhardt  (paelzer) wrote :

Hi Dann, I was already on that error:
"2017-08-14T23:43:10.255936Z qemu-system-aarch64: -drive
file=/home/ubuntu/vm-start-stop/vms/7936-0_CODE.fd,if=pflash,format=raw,unit=0,readonly=on:
Failed to lock byte 100"

Actually, it should be fixed by the latest libvirt, which adds AppArmor rules to allow that.
I did so for the image files, but you are likely missing rules for the pflash.

Can you please report:
1. the guest xml that you use
2. the dmesg likely including apparmor denials

With both I should hopefully be able to fix custom rules for those as well.
I wonder what rules the pflash files have in general, but I'll see when you supply the XML.
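
For readers unfamiliar with the "Failed to lock byte 100" message: it comes from QEMU taking byte-range locks on the image file, and AppArmor has to grant the lock permission on that path for the call to succeed. The sketch below shows what such a byte-range lock looks like from Python. It is only an illustration under stated assumptions: the function name is hypothetical, it uses classic POSIX locks via fcntl.lockf (QEMU may use a different lock flavour), and running it unconfined does not tell you what the confined QEMU process is allowed to do.

```python
import fcntl
import os

# Assumption: offset 100 is used only because it is the byte named in the
# error message above.
LOCK_OFFSET = 100


def can_lock_byte(path, offset=LOCK_OFFSET):
    """Try to take and release a shared, non-blocking lock on one byte."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Lock exactly one byte starting at `offset` from the file start.
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
        return True
    except OSError:
        # Permission denied or a conflicting lock held by another process.
        return False
    finally:
        os.close(fd)
```

The AppArmor denials requested above (from dmesg) remain the authoritative evidence; this probe only makes the failing operation concrete.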

dann frazier (dannf) wrote :

(AppArmor/pflash issue has been forked off into LP: #1710960)

Vijaya Kumar Kilari (vkilari) wrote :

I was able to reproduce the issue with 4.13 + latest QEMU using the scripts provided.
The issue seems to be that restoring the ITS tables fails. Investigating further.
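
The "Group 4 attr 0x2" in the QEMU error is consistent with this diagnosis. As a hedged sketch, the Python snippet below decodes that error line into symbolic names. The numeric values are recalled from the arm64 KVM UAPI header (arch/arm64/include/uapi/asm/kvm.h) and should be treated as assumptions to verify against the headers of the kernel in use; the function name is hypothetical.

```python
import re

# Assumption: values recalled from the arm64 KVM UAPI header; verify locally.
VGIC_GROUPS = {
    0: "KVM_DEV_ARM_VGIC_GRP_ADDR",
    1: "KVM_DEV_ARM_VGIC_GRP_DIST_REGS",
    2: "KVM_DEV_ARM_VGIC_GRP_CPU_REGS",
    3: "KVM_DEV_ARM_VGIC_GRP_NR_IRQS",
    4: "KVM_DEV_ARM_VGIC_GRP_CTRL",
    5: "KVM_DEV_ARM_VGIC_GRP_REDIST_REGS",
    6: "KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS",
    7: "KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO",
    8: "KVM_DEV_ARM_VGIC_GRP_ITS_REGS",
}

# Attributes that are meaningful within the CTRL group (group 4).
VGIC_CTRL_ATTRS = {
    0: "KVM_DEV_ARM_VGIC_CTRL_INIT",
    1: "KVM_DEV_ARM_ITS_SAVE_TABLES",
    2: "KVM_DEV_ARM_ITS_RESTORE_TABLES",
    3: "KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES",
}

ERROR_RE = re.compile(
    r"KVM_SET_DEVICE_ATTR failed: Group (\d+) attr (0x[0-9a-fA-F]+)"
)


def decode_device_attr_error(line):
    """Return (group_name, attr_name) for a QEMU error line, or None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    group = int(match.group(1))
    attr = int(match.group(2), 16)
    group_name = VGIC_GROUPS.get(group, "unknown group %d" % group)
    if group == 4:
        attr_name = VGIC_CTRL_ATTRS.get(attr, "unknown attr %#x" % attr)
    else:
        attr_name = "attr %#x" % attr
    return group_name, attr_name
```

Under those assumptions, the error reported earlier in this bug decodes to the CTRL group with the ITS "restore tables" attribute, which matches the observation that the table restore is what fails.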

Vijaya Kumar Kilari (vkilari) wrote :

4.13-rc6

Vijaya Kumar Kilari (vkilari) wrote :

With the patch below, the issue seems resolved. Please check and let me know.

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index aa6b68d..63f8ac3 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -2142,7 +2142,7 @@ static int vgic_its_restore_device_tables(struct vgic_its *its)
                                     vgic_its_restore_dte, NULL);
        }

- if (ret > 0)
+ if (ret <= 0)
                ret = -EINVAL;

        return ret;

Christian Ehrhardt  (paelzer) wrote :

That is a kernel patch.
@Vijaya/Dannf - should we add a kernel task for them to be aware of?

dann frazier (dannf) wrote :

@Vijaya: Yes, that patch does resolve the issue for me, thanks! Do you plan to submit this upstream?

@Christian: Yep - I'll open one.

Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu Zesty):
status: New → Confirmed
status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
status: New → Won't Fix
dann frazier (dannf)
Changed in qemu (Ubuntu):
assignee: dann frazier (dannf) → nobody
dann frazier (dannf) wrote :

Here's a backport for zesty's QEMU, tested with the 4.13.0-7.8 kernel from ppa:canonical-kernel-team/unstable w/ the patch from Comment #10 applied.

Changed in qemu (Ubuntu):
status: In Progress → Fix Released
Vijaya Kumar Kilari (vkilari) wrote :

Hi Dann,

Yes, I will upstream the patch.

dann frazier (dannf) wrote :
tags: added: patch
Christian Ehrhardt  (paelzer) wrote :

Hi Dann,
Sorry to ask; I'm really not disregarding all the work you are doing here.
But it is a huge set of changes (17 files changed, 929 insertions(+), 46 deletions(-), and not all ARM-only), and I wonder about the benefit, as Zesty will never have the 4.13 kernel (HWE is for LTS only).

If it only works with the newer kernel, what is the point of adding it to Zesty's QEMU?
If it is for LTS + Cloud Archive only, could we add it only there?

dann frazier (dannf) wrote :

On Wed, Aug 30, 2017 at 12:35 AM, ChristianEhrhardt
<email address hidden> wrote:
> Hi Dann,
> Sorry to ask; I'm really not disregarding all the work you are doing here.
> But it is a huge set of changes (17 files changed, 929 insertions(+), 46 deletions(-), and not all ARM-only), and I wonder about the benefit, as Zesty will never have the 4.13 kernel (HWE is for LTS only).

No offense taken! It's a good question. I also have a tested backport
of the necessary changes to zesty's 4.10 kernel, I'm just awaiting
upstream acceptance of Vijay's patch before doing a final round of
testing and sending a PR to the kernel team.

WRT the patches impacting !arm, the only one I see here is
kvm-all-Pass-an-error-object-to-kvm_device_access.patch, which impacts
the error handling of kvm_device_access(). I'd be happy to coordinate
appropriate testing on other architectures to mitigate the risk there.

  -dann

Christian Ehrhardt  (paelzer) wrote :

Dann, I don't want to miss the activity here. So the next step you are expecting is for me to evaluate the diff in detail and prepare a QEMU SRU?
Alongside that, you will do regression tests on ARM and I could do x86/ppc/s390x?

If that is correct, I'd create a PPA for both of us to do the checks. Or is something missing that I should wait on?

dann frazier (dannf) wrote :

On Wed, Sep 6, 2017 at 2:50 AM, ChristianEhrhardt
<email address hidden> wrote:
> Dann, I don't want to miss the activity here. So the next step you are expecting is for me to evaluate the diff in detail and prepare a QEMU SRU?
> Alongside that, you will do regression tests on ARM and I could do x86/ppc/s390x?
>
> If that is correct, I'd create a PPA for both of us to do the checks. Or
> is something missing that I should wait on?

Christian,
  Personally, I've pinned my activity here until the issue in Comment
#2 is resolved upstream. Vijaya has submitted her patch upstream and
it is under discussion:

  http://www.spinics.net/lists/arm-kernel/msg605138.html

Until that concludes, I can't be 100% sure that QEMU will require no
further patches. But once that is resolved, then I think your proposed
plan above sounds good.

   -dann

Christian Ehrhardt  (paelzer) wrote :

On Wed, Sep 6, 2017 at 7:20 PM, dann frazier <email address hidden>
wrote:

>
> Until that concludes, I can't be 100% sure that QEMU will require no
> further patches. But once that is resolved, then I think your proposed
> plan above sounds good.
>

Thanks for the clarification - sounds good. Please give an update here once that
has concluded.

dann frazier (dannf)
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Zesty):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Zesty):
status: Triaged → In Progress
dann frazier (dannf) wrote :

@Christian: The blocking issue was with the kernel. It has now been fixed upstream and I've begun submitting SRU patches for the kernel. +1 from me on proceeding with the QEMU portion.

Christian Ehrhardt  (paelzer) wrote :

I assume this is:
commit b92382620e33c9f1bcbcd7c169262b9bf0525871
    KVM: arm/arm64: vgic-its: Fix return value for device table restore
And some more patches around it.

OK, so this gets into 4.14, which will only be available for Bionic and as HWE for Xenial.
For QEMU, as you stated, the changes are in 2.10.

So that means that:
- The zesty Bug Task becomes Bionic?
- Then the tasks for Bionic are
  - Kernel: waiting on 4.14, then "Fix released"
  - Qemu: already on 2.10 => so "Fix Released" now
- The tasks for Xenial will be
  - KERNEL: a) waiting for HWE >=4.14 to arrive on Xenial
  - KERNEL: b) Backport changes into 4.4.x kernel
  - Qemu: a) you backport your patch in c #15 to Xenial and we both test for SRU?
  - Qemu: b) Only meant to work with UCA-Pike (if unreasonable backport size/risk), so
    "won't fix" on that task

That is a massive change in status/targets, so please ack/comment before I do so.
Or do you plan to get the kernel changes into more Ubuntu releases?

dann frazier (dannf) wrote :

I plan to submit my backport to zesty's (4.10) kernel - I haven't yet investigated the feasibility of backporting this feature to xenial's virt stack. With respect to QEMU, at this time I'm only requesting that the backport for zesty be considered.

dann frazier (dannf) wrote :

libvirt's apparmor policy doesn't allow for restoring instances from /tmp, so I've updated my reproducer tool:
  https://code.launchpad.net/~dannf/+git/vm-save-restore

Christian Ehrhardt  (paelzer) wrote :

Ok, Dann - thanks for the clarification.
So, per the SRU rule not to regress on upgrades, the affected releases would then be Zesty, Artful and Bionic.
QEMU has been fine since Artful, and the kernel needs your backports in Zesty (4.10) and Artful (4.13).

I'm setting up the tasks accordingly.
Do you want me to evaluate the QEMU changes in zesty right now (based on your debdiff), or do you want to prepare the kernel changes first so that you have verified them for the actual feature?

Changed in linux (Ubuntu Artful):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Changed in qemu (Ubuntu Artful):
status: New → Fix Released
Changed in qemu (Ubuntu Xenial):
status: Triaged → Won't Fix
dann frazier (dannf) wrote :

I have a staging PPA that I'm using for testing at ppa:dannf/lp1710019. The QEMU there is the attached debdiff (trivially) forward-ported to the latest qemu in updates, and the kernel has the patches I plan to submit to the kernel team. I've verified that this all works together on a zesty system.

I also took a look at LP: #1731051 to see if we should include that fix in this backport as well. I don't think we need to - the reboot command in question doesn't work at all in zesty (w/ or w/o these patches). I assume that was a feature added to QEMU post-2.8.

So yeah, from my point of view this is all good to go. I'll proceed with submitting the kernel-side backport.

dann frazier (dannf)
description: updated
Christian Ehrhardt  (paelzer) wrote :

I have kicked off a verification run on the PPA to check the QEMU in there.
It passed all stages just fine, which means you are likely fine to go once you have the kernel ready.
Thanks also for clarifying whether we need to backport the #1731051 fix in relation to this change.

Ack - Feel free to upload the content in your ppa to zesty-unapproved or let me know if you want me to do so.

One more check - do you need changes in libvirt in regard to comment #24 as well?

Christian Ehrhardt  (paelzer) wrote :

Note: tested version was 1:2.8+dfsg-3ubuntu2.7+lp1710019.1

dann frazier (dannf) wrote :

Uploaded qemu_2.8+dfsg-3ubuntu2.8 to unapproved.

Brian Murray (brian-murray) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted qemu into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.8+dfsg-3ubuntu2.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-zesty
dann frazier (dannf) wrote :

I've verified this for zesty, using my patched zesty kernel. Demonstration of that follows. However, the kernel patches have not yet been approved for zesty, and may never be. I'll hold-off on updating the tags here until a) we have a final answer on the kernel patches or b) we decide there is value in updating QEMU w/o the kernel side.

= Verification log =

ubuntu@dawes:~$ uname -a
Linux dawes 4.10.0-40-generic #44+gicv3sr.1-Ubuntu SMP Thu Nov 9 22:53:54 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@dawes:~$ cd vm-save-restore/
ubuntu@dawes:~/vm-save-restore$ ./setup.sh
+ set -e
+ cloudrel=zesty
+ cloudimg=zesty-server-cloudimg-arm64.img
+ sudo apt-get install -y cloud-image-utils qemu-kvm qemu-utils qemu-efi libvirt-bin screen uuid-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
cloud-image-utils is already the newest version (0.30-0ubuntu2).
screen is already the newest version (4.5.0-5ubuntu1).
qemu-efi is already the newest version (0~20161202.7bbe0b3e-1).
uuid-runtime is already the newest version (2.29-1ubuntu2.1).
libvirt-bin is already the newest version (2.5.0-3ubuntu5.6).
qemu-kvm is already the newest version (1:2.8+dfsg-3ubuntu2.8).
qemu-utils is already the newest version (1:2.8+dfsg-3ubuntu2.8).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
+ test -f zesty-server-cloudimg-arm64.img
+ echo #!/bin/sh
+ sudo tee /etc/qemu-ifup
+ echo
+ sudo tee -a /etc/qemu-ifup
+ echo set -e
+ sudo tee -a /etc/qemu-ifup
+ echo
+ sudo tee -a /etc/qemu-ifup
+ echo ip link set "$1" up
+ sudo tee -a /etc/qemu-ifup
+ echo ip link set "$1" master virbr0
+ sudo tee -a /etc/qemu-ifup
ubuntu@dawes:~/vm-save-restore$ ./test.sh
+ i=0
+ [ -f /var/log/libvirt/qemu/4273-0.log ]
+ name=4273-0
+ ./randmac.py
+ mac=00:16:3e:1c:57:55
+ uuidgen
+ uuid=1578301c-e2fe-4981-87b5-bee2c56fa14c
+ mktemp
+ xml=/tmp/tmp.D9DXzEvOHn
+ cp template.xml /tmp/tmp.D9DXzEvOHn
+ trap cleanup EXIT
+ mkdir -p vms
+ dd if=/dev/zero of=./vms/4273-0_CODE.fd bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.166913 s, 402 MB/s
+ dd if=/usr/share/qemu-efi/QEMU_EFI.fd of=./vms/4273-0_CODE.fd conv=notrunc
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0602123 s, 34.8 MB/s
+ dd if=/dev/zero of=./vms/4273-0_VARS.fd bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.16786 s, 400 MB/s
+ cat
+ cloud-localds vms/4273-0_seed.img vms/4273-0_user-data
+ pwd
+ qemu-img create -f qcow2 -o backing_file=/home/ubuntu/vm-save-restore/zesty-server-cloudimg-arm64.img ./vms/4273-0.img
Formatting './vms/4273-0.img', fmt=qcow2 size=2361393152 backing_file=/home/ubuntu/vm-save-restore/zesty-server-cloudimg-arm64.img encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+ sed -i s/\#VM\#/4273-0/ /tmp/tmp.D9DXzEvOHn
+ pwd
+ sed -i s,\#DIR\#,/home/ubuntu/vm-save-restore, /tmp/tmp.D9DXzEvOHn
+ sed -i s/\#UUID\#/1578301c-e2fe-4981-87b5-bee2c56fa14c/ /tmp/tmp.D9DXzEvOHn
+ sed -i s/\#MAC\#/00:16:3e:1c:57:55/ /tmp/tmp.D9DXzEvOHn
+ sudo virsh define /tmp/tmp.D9DXzEvOHn
Domain 4273-0 defined from /tmp/tmp.D9DXzEvOHn

+ ...


Stefan Bader (smb)
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
status: In Progress → Fix Committed
Changed in cloud-archive:
status: New → Invalid
Corey Bryant (corey.bryant) wrote :

Hello dann, or anyone else affected,

Accepted qemu into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ocata-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ocata-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ocata-needed
Christian Ehrhardt  (paelzer) wrote :

Hi Dann,
maybe it was good to hold off on this.
It seems it causes a regression on arm, see bug 1734326.
Also by holding for an arbitrary amount of time it might block the SRU queue for something else.

If you agree I'd ask you to let the SRU team cancel the upload from proposed.
And you can then take time to:
a) continue with the kernel Team discussions
b) analyze how bug 1734326 would be related and how to fix it

If you do so I'd immediately do a revert in the qemu git so that the next SRU doesn't push the same by accident.

@Coreycb - if done that way you likely want to cancel ocata proposed as well.

Khaled El Mously (kmously) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
dann frazier (dannf) wrote : Re: [Bug 1710019] Re: support GICv3 ITS save/restore & migration

On Fri, Nov 24, 2017 at 4:51 AM, ChristianEhrhardt
<email address hidden> wrote:
> Hi Dann,
> maybe it was good to hold off on this.
> It seems it causes a regression on arm, see bug 1734326.
> Also by holding for an arbitrary amount of time it might block the SRU queue for something else.

Sorry for the late reply - holidays, and behind on e-mail :(

> If you agree I'd ask you to let the SRU team cancel the upload from proposed.

I agree. I think it is best to scrap the zesty backport at this point.
I will request the reject.

> And you can then take time to:
> a) continue with the kernel Team discussions
> b) analyze how bug 1734326 would be related and how to fix it
>
> If you do so I'd immediately do a revert in the qemu git so that the
> next SRU doesn't push the same by accident.
>
> @Coreycb - if done that way you likely want to cancel ocata proposed as
> well.

+1 cancelling the current ocata-proposed QEMU.

I'll make a todo to investigate LP: #1734326 to see if it can be
easily resolved and, if so, possibly request that we carry these
patches as a diff in the ocata cloud archive.

tags: added: verification-failed-zesty verification-ocata-failed
removed: verification-needed-zesty verification-ocata-needed
Andy Whitcroft (apw) wrote :

Removed qemu from zesty-proposed based on this discussion.

Christian Ehrhardt  (paelzer) wrote :

Thanks Andy,
I pushed a revert to qemu's packaging git to ensure there is no accidental upload of the same content on the next zesty SRU.

@Dannf - once you have had time to sort out acceptance for the zesty kernel as well as this regression, and want to push this again, let us know.
I have set it back to Triaged for now on zesty.

Changed in qemu (Ubuntu Zesty):
status: Fix Committed → Triaged
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hi @dannf,

Could you please verify the artful fix with the latest kernel on -proposed?

Thank you.

Christian Ehrhardt  (paelzer) wrote :

@Kleber - would that mean we also want to ping Po-Hsu for a bug 1734326 retest?
He would need a PPA for that - @Dannf, do you still have that in a PPA for him to test against the kernel in proposed?

dann frazier (dannf) wrote :

artful verification:
ubuntu@seyfert:~/vm-save-restore$ ./test.sh
+ i=0
+ [ -f /var/log/libvirt/qemu/4172-0.log ]
+ name=4172-0
+ ./randmac.py
+ mac=00:16:3e:7b:3b:f1
+ uuidgen
+ uuid=33815e19-0f44-4e6f-8209-ada375484ba3
+ mktemp
+ xml=/tmp/tmp.Mc2sJDaknd
+ cp template.xml /tmp/tmp.Mc2sJDaknd
+ trap cleanup EXIT
+ mkdir -p vms
+ dd if=/dev/zero of=./vms/4172-0_CODE.fd bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.173587 s, 387 MB/s
+ dd if=/usr/share/qemu-efi/QEMU_EFI.fd of=./vms/4172-0_CODE.fd conv=notrunc
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0523696 s, 40.0 MB/s
+ dd if=/dev/zero of=./vms/4172-0_VARS.fd bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.173042 s, 388 MB/s
+ cat
+ cloud-localds vms/4172-0_seed.img vms/4172-0_user-data
+ pwd
+ qemu-img create -f qcow2 -o backing_file=/home/ubuntu/vm-save-restore/zesty-server-cloudimg-arm64.img ./vms/4172-0.img
Formatting './vms/4172-0.img', fmt=qcow2 size=2361393152 backing_file=/home/ubuntu/vm-save-restore/zesty-server-cloudimg-arm64.img cluster_size=65536 lazy_refcounts=off refcount_bits=16
+ sed -i s/\#VM\#/4172-0/ /tmp/tmp.Mc2sJDaknd
+ pwd
+ sed -i s,\#DIR\#,/home/ubuntu/vm-save-restore, /tmp/tmp.Mc2sJDaknd
+ sed -i s/\#UUID\#/33815e19-0f44-4e6f-8209-ada375484ba3/ /tmp/tmp.Mc2sJDaknd
+ sed -i s/\#MAC\#/00:16:3e:7b:3b:f1/ /tmp/tmp.Mc2sJDaknd
+ sudo virsh define /tmp/tmp.Mc2sJDaknd
Domain 4172-0 defined from /tmp/tmp.Mc2sJDaknd

+ sudo virsh start 4172-0
Domain 4172-0 started

+ sleep 60
+ sudo virsh save 4172-0 ./vms/4172-0.sav

Domain 4172-0 saved to ./vms/4172-0.sav

+ sudo virsh restore ./vms/4172-0.sav
Domain restored from ./vms/4172-0.sav

+ sleep 5
+ sudo virsh save 4172-0 ./vms/4172-0.sav

Domain 4172-0 saved to ./vms/4172-0.sav

+ sudo virsh restore ./vms/4172-0.sav
Domain restored from ./vms/4172-0.sav

+ sudo virsh save 4172-0 ./vms/4172-0.sav

Domain 4172-0 saved to ./vms/4172-0.sav

+ sudo virsh restore ./vms/4172-0.sav
Domain restored from ./vms/4172-0.sav

+ cleanup
+ sudo virsh destroy 4172-0
Domain 4172-0 destroyed

+ sudo virsh undefine 4172-0 --nvram
Domain 4172-0 has been undefined

+ rm -f ./vms/4172-0.img ./vms/4172-0_CODE.fd ./vms/4172-0_VARS.fd /tmp/tmp.Mc2sJDaknd ./vms/4172-0.sav vms/4172-0_user-data vms/4172-0_seed.img
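
The trace above shows the tool generating a MAC address and filling #VM#, #DIR#, #UUID# and #MAC# placeholders in template.xml with sed. A minimal Python sketch of those two steps follows. It is not the tool's actual code: the function names are hypothetical, the 00:16:3e prefix is inferred from the MAC values in the trace, and str.replace substitutes every occurrence, whereas the sed commands in the trace (no g flag) replace only the first match per line.

```python
import random


def random_mac(prefix="00:16:3e", rng=random):
    """Return a MAC address with a fixed 3-byte prefix and a random tail."""
    tail = [rng.randrange(256) for _ in range(3)]
    return prefix + "".join(":%02x" % byte for byte in tail)


def fill_template(xml_text, values):
    """Replace #KEY# placeholders with the given values."""
    for key, value in values.items():
        xml_text = xml_text.replace("#%s#" % key, value)
    return xml_text
```

For example, fill_template(template, {"VM": "4172-0", "MAC": random_mac()}) yields domain XML suitable for `virsh define`, assuming the template uses exactly these placeholder names.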

tags: added: verification-done-artful
removed: verification-needed-artful
dann frazier (dannf) wrote :

@Christian: I don't know that we need Po-Hsu to retest bug 1734326 - the accepted fix was for the artful kernel; there's no artful QEMU counterpart needed.

Christian Ehrhardt  (paelzer) wrote :

Oh, I see that was artful; thanks dannf for the clarification.
So there is no reason to act on bug 1734326 unless you come back having analyzed it.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-19.22

---------------
linux (4.13.0-19.22) artful; urgency=low

  * linux: 4.13.0-19.22 -proposed tracker (LP: #1736118)

  * CVE-2017-1000405
    - mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

linux (4.13.0-18.21) artful; urgency=low

  * linux: 4.13.0-18.21 -proposed tracker (LP: #1733530)

  * NVMe timeout is too short (LP: #1729119)
    - nvme: update timeout module parameter type

  * CPU call trace on AMD Raven Ridge after S3 (LP: #1732894)
    - x86/mce/AMD: Allow any CPU to initialize the smca_banks array

  * Set PANIC_TIMEOUT=10 on Power Systems (LP: #1730660)
    - [Config]: Set PANIC_TIMEOUT=10 on ppc64el

  * Cannot pair BLE remote devices when using combo BT SoC (LP: #1731467)
    - Bluetooth: increase timeout for le auto connections

  * enable CONFIG_SND_SOC_INTEL_BYT_CHT_NOCODEC_MACH easily confuse users
    (LP: #1732627)
    - [Config] CONFIG_SND_SOC_INTEL_BYT_CHT_NOCODEC_MACH=n

  * Plantronics P610 does not support sample rate reading (LP: #1719853)
    - ALSA: usb-audio: Add sample rate quirk for Plantronics P610

  * Allow drivers to use Relaxed Ordering on capable root ports (LP: #1721365)
    - Revert commit 1a8b6d76dc5b ("net:add one common config...")
    - net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

  * support GICv3 ITS save/restore & migration (LP: #1710019)
    - KVM: arm/arm64: vgic-its: Fix return value for device table restore

  * Device hotplugging with MPT SAS cannot work for VMWare ESXi (LP: #1730852)
    - scsi: mptsas: Fixup device hotplug for VMWare ESXi

  * Artful update to 4.13.13 stable release (LP: #1732726)
    - netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to
      rhashtable"
    - netfilter: nft_set_hash: disable fast_ops for 2-len keys
    - workqueue: Fix NULL pointer dereference
    - crypto: ccm - preserve the IV buffer
    - crypto: x86/sha1-mb - fix panic due to unaligned access
    - crypto: x86/sha256-mb - fix panic due to unaligned access
    - KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]
    - ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360
    - ARM: 8720/1: ensure dump_instr() checks addr_limit
    - ALSA: timer: Limit max instances per timer
    - ALSA: usb-audio: support new Amanero Combo384 firmware version
    - ALSA: hda - fix headset mic problem for Dell machines with alc274
    - ALSA: seq: Fix OSS sysex delivery in OSS emulation
    - ALSA: seq: Avoid invalid lockdep class warning
    - MIPS: Fix CM region target definitions
    - MIPS: BMIPS: Fix missing cbr address
    - MIPS: AR7: Defer registration of GPIO
    - MIPS: AR7: Ensure that serial ports are properly set up
    - KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT
      updates
    - Input: elan_i2c - add ELAN060C to the ACPI table
    - rbd: use GFP_NOIO for parent stat and data requests
    - drm/vmwgfx: Fix Ubuntu 17.10 Wayland black screen issue
    - Revert "x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo"
    - can: sun4i: handle overrun in RX FIFO
    - can: peak: Add support for new PCIe/M2 CAN FD interfaces
    - can: ifi: Fix transmitter del...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package qemu - 1:2.8+dfsg-3ubuntu2.8~cloud0
---------------

 qemu (1:2.8+dfsg-3ubuntu2.8~cloud0) xenial-ocata; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 qemu (1:2.8+dfsg-3ubuntu2.8) zesty; urgency=medium
 .
   * Backport support for GICv3/vITS save/restore (LP: #1710019).

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This fix has been reverted in qemu 1:2.8+dfsg-3ubuntu2.9~cloud1 to align with Zesty. qemu 1:2.8+dfsg-3ubuntu2.9~cloud1 has now been released to ocata-updates.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

qemu (1:2.8+dfsg-3ubuntu2.9~cloud1) xenial-ocata; urgency=medium

  * reverted "Backport support for GICv3/vITS save/restore (LP 1710019)."
    as an ARM regression was found in zesty-proposed (cancelled before
    SRU release).

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-25.29

---------------
linux (4.13.0-25.29) artful; urgency=low

  * linux: 4.13.0-25.29 -proposed tracker (LP: #1741955)

  * CVE-2017-5754
    - Revert "UBUNTU: [Config] updateconfigs to enable PTI"
    - [Config] Enable PTI with UNWINDER_FRAME_POINTER

linux (4.13.0-24.28) artful; urgency=low

  * linux: 4.13.0-24.28 -proposed tracker (LP: #1741745)

  * CVE-2017-5754
    - x86/cpu, x86/pti: Do not enable PTI on AMD processors

linux (4.13.0-23.27) artful; urgency=low

  * linux: 4.13.0-23.27 -proposed tracker (LP: #1741556)

  [ Kleber Sacilotto de Souza ]
  * CVE-2017-5754
    - x86/mm: Add the 'nopcid' boot option to turn off PCID
    - x86/mm: Enable CR4.PCIDE on supported systems
    - x86/mm: Document how CR4.PCIDE restore works
    - x86/entry/64: Refactor IRQ stacks and make them NMI-safe
    - x86/entry/64: Initialize the top of the IRQ stack before switching stacks
    - x86/entry/64: Add unwind hint annotations
    - xen/x86: Remove SME feature in PV guests
    - x86/xen/64: Rearrange the SYSCALL entries
    - irq: Make the irqentry text section unconditional
    - x86/xen/64: Fix the reported SS and CS in SYSCALL
    - x86/paravirt/xen: Remove xen_patch()
    - x86/traps: Simplify pagefault tracing logic
    - x86/idt: Unify gate_struct handling for 32/64-bit kernels
    - x86/asm: Replace access to desc_struct:a/b fields
    - x86/xen: Get rid of paravirt op adjust_exception_frame
    - x86/paravirt: Remove no longer used paravirt functions
    - x86/entry: Fix idtentry unwind hint
    - x86/mm/64: Initialize CR4.PCIDE early
    - objtool: Add ORC unwind table generation
    - objtool, x86: Add facility for asm code to provide unwind hints
    - x86/unwind: Add the ORC unwinder
    - x86/kconfig: Consolidate unwinders into multiple choice selection
    - objtool: Upgrade libelf-devel warning to error for CONFIG_ORC_UNWINDER
    - x86/ldt/64: Refresh DS and ES when modify_ldt changes an entry
    - x86/mm: Give each mm TLB flush generation a unique ID
    - x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
    - x86/mm: Rework lazy TLB mode and TLB freshness tracking
    - x86/mm: Implement PCID based optimization: try to preserve old TLB entries
      using PCID
    - x86/mm: Factor out CR3-building code
    - x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
    - x86/mm: Flush more aggressively in lazy TLB mode
    - Revert "x86/mm: Stop calling leave_mm() in idle code"
    - kprobes/x86: Set up frame pointer in kprobe trampoline
    - x86/tracing: Introduce a static key for exception tracing
    - x86/boot: Add early cmdline parsing for options with arguments
    - mm, x86/mm: Fix performance regression in get_user_pages_fast()
    - x86/asm: Remove unnecessary \n\t in front of CC_SET() from asm templates
    - objtool: Don't report end of section error after an empty unwind hint
    - x86/head: Remove confusing comment
    - x86/head: Remove unused 'bad_address' code
    - x86/head: Fix head ELF function annotations
    - x86/boot: Annotate verify_cpu() as a callable function
    - x86/xen: Fix xen head ELF annotations
    - x86/xen: Add unwind hint anno...

Changed in linux (Ubuntu):
status: In Progress → Fix Released