QEMU might fail to start on AMD CPUs when 'host-passthrough' is used

Bug #1828288 reported by Rafael David Tinoco on 2019-05-08
Affects          Importance   Assigned to
qemu (Ubuntu)    Undecided    Rafael David Tinoco
Xenial           Undecided    Rafael David Tinoco

Bug Description

[Impact]

 * QEMU does not work on some AMD hardware when host-passthrough is used as the CPU mode (usually to allow nested KVM to work).

[Test Case]

 * Use Xenial qemu (1:2.5+dfsg-5ubuntu10.36 or 1:2.5+dfsg-5ubuntu10.37)
 * Use the following XML file: https://paste.ubuntu.com/p/BSyFY7ksR5/
 * Use a host with an AMD FX(tm)-8350 Eight-Core Processor CPU or similar
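
A quick way to confirm that a guest definition actually exercises this code path is to check its CPU mode. The following is a minimal sketch, assuming a standard libvirt domain XML with a top-level <cpu mode='...'> element; the helper name and the example XML are made up for illustration and are not the contents of the paste linked above.

```python
import xml.etree.ElementTree as ET


def uses_host_passthrough(domain_xml: str) -> bool:
    """Return True if the libvirt domain XML asks for cpu mode 'host-passthrough'."""
    root = ET.fromstring(domain_xml.strip())
    cpu = root.find("cpu")  # direct child of <domain>
    return cpu is not None and cpu.get("mode") == "host-passthrough"


# Hypothetical, heavily trimmed domain definition used only for illustration.
EXAMPLE_XML = """
<domain type='kvm'>
  <name>kguest</name>
  <cpu mode='host-passthrough'/>
</domain>
"""

if __name__ == "__main__":
    print(uses_host_passthrough(EXAMPLE_XML))
```

With libvirt, this mode ends up as "-cpu host" on the QEMU command line, which is what the startup logs in this report show.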

[Regression Potential]

 * The patch touches early QEMU CPU initialization code, so a regression could prevent guests, including guests on other architectures, from starting.
 * The suggested patch is simple: it only moves existing code to a different position.
 * The patch comes from upstream, where it identifies the issue and is reported to fix it.

[Other Info]

 * INITIAL CASE DESCRIPTION:

When using the latest QEMU (-proposed) in Xenial, you might encounter the following problem when trying to start your guests:

----

(c)inaddy@qemubug:~$ apt-cache policy qemu-system-x86
qemu-system-x86:
  Installed: 1:2.5+dfsg-5ubuntu10.37
  Candidate: 1:2.5+dfsg-5ubuntu10.37
  Version table:
 *** 1:2.5+dfsg-5ubuntu10.37 500
        500 http://ubuntu.c3sl.ufpr.br//ubuntu xenial-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.5+dfsg-5ubuntu10.36 500
        500 http://ubuntu.c3sl.ufpr.br//ubuntu xenial-updates/main amd64 Packages
     1:2.5+dfsg-5ubuntu10 500
        500 http://ubuntu.c3sl.ufpr.br/ubuntu xenial/main amd64 Packages

----

(c)inaddy@qemubug:~$ virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     kdebian                        shut off
 -     kguest                         shut off

(c)inaddy@qemubug:~$ virsh start --console kguest
error: Failed to start domain kguest
error: internal error: process exited while connecting to monitor: warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 1]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 2]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 4]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 9]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13]
warning: host doesn't support requested feature: CPU

----

This happens because x86_cpu_get_migratable_flags() does not support CPUID_EXT2_AMD_ALIASES. Cherry-picking upstream patch 9997cf7bdac056aeed246613639675c5a9f8fdc2, which moves the CPUID_EXT2_AMD_ALIASES code to after x86_cpu_filter_features(), fixes the problem. Other QEMU versions already contain the fix and don't face this issue.
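
To illustrate why the ordering matters, here is a toy model in Python. It is not QEMU code: the function names are invented, and the "supported" mask simply assumes, as described above, that the filtering step knows nothing about the AMD alias bits. The mask value corresponds to bits 0-9, 12-17, 23 and 24 of CPUID.80000001H:EDX, which AMD defines as copies of the same bits in CPUID.01H:EDX.

```python
# Alias bits of CPUID.80000001H:EDX (bits 0-9, 12-17, 23, 24).
CPUID_EXT2_AMD_ALIASES = 0x0183F3FF


def filter_features(requested, supported, label):
    """Drop bits the host does not report; return (kept_bits, warnings)."""
    warnings = []
    missing = requested & ~supported
    for bit in range(32):
        if missing & (1 << bit):
            warnings.append(
                "warning: host doesn't support requested feature: "
                f"{label} [bit {bit}]"
            )
    return requested & supported, warnings


def build_ext2_edx(feat_1_edx, ext2_edx, ext2_supported, aliases_first):
    """Toy model of the two orderings (hypothetical helper, not a QEMU function)."""
    label = "CPUID.80000001H:EDX"
    aliases = feat_1_edx & CPUID_EXT2_AMD_ALIASES
    if aliases_first:
        # Old order: copy the alias bits, then filter. The filter does not
        # know the alias bits, so it warns about each one and strips it.
        requested = (ext2_edx & ~CPUID_EXT2_AMD_ALIASES) | aliases
        return filter_features(requested, ext2_supported, label)
    # Fixed order: filter first, then copy the alias bits.
    kept, warnings = filter_features(ext2_edx, ext2_supported, label)
    return (kept & ~CPUID_EXT2_AMD_ALIASES) | aliases, warnings


if __name__ == "__main__":
    feat_1_edx = 0x078BFBFF   # illustrative CPUID.01H:EDX value
    ext2_edx = 0x20100800     # illustrative non-alias bits (11, 20, 29)
    supported = 0x20100800    # assumption: the filter ignores alias bits
    for first in (True, False):
        kept, warns = build_ext2_edx(feat_1_edx, ext2_edx, supported, first)
        print(f"aliases_first={first}: kept={kept:#010x}, warnings={len(warns)}")
```

In this model the old order produces one warning per alias bit, starting with bit 0 as in the log above, while the fixed order produces none.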

Cherry-picking the commit and rebuilding the package makes it work:
----

(c)inaddy@qemubug:~$ virsh start --console kguest
Domain kguest started
Connected to domain kguest
Escape character is ^]
[ 0.000000] Linux version 4.19.0-4-amd64 (<email address hidden>) (gcc version 8.3.0 (Debian 8.3.0-2)) #1 SMP Debian 4.19.28-2 (2019-03-15)
[ 0.000000] Command line: root=/dev/vda noresume console=tty0 console=ttyS0,38400n8 apparmor=0 net.ifnames=0 crashkernel=256M
[ 0.000000] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with crng_init=0
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
...

Changed in qemu (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
status: New → Incomplete
status: Incomplete → In Progress

Upstream discussion regarding the topic can be found here:

http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02597.html

It is also well documented in the .patch file inside the debdiff.

Thank you for considering this fix.

Best,
Rafael

The attachment "qemu_2.5+dfsg-5ubuntu10.38.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch

Thanks Rafael for identifying the issue and already providing a patch.
A few other things need to clear the SRU pipeline first; then we can take the change. It LGTM.

In the meantime, could you find the time to add a full SRU template [1] for this to the description?

[1]: https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

Hey Christian,

Sorry for missing the SRU template; I was in a hurry to create a package for myself and some local needs. I'll do it right now. Feel free to SRU it whenever it suits you, alongside the other fixes I know you usually take care of.

Thanks a lot!

description: updated
description: updated
Changed in qemu (Ubuntu Xenial):
status: New → In Progress
Changed in qemu (Ubuntu):
status: In Progress → Fix Released
Changed in qemu (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

I received an e-mail ~2 hours ago telling me:

"""
from: shahul hameed <email address hidden> via canonical.com

** Patch removed: "qemu_2.5+dfsg-5ubuntu10.38.debdiff"
   https://bugs.launchpad.net/ubuntu/xenial/+source/qemu/+bug/1828288/+attachment/5262436/+files/qemu_2.5+dfsg-5ubuntu10.38.debdiff
"""

This is weird. I'm not sure why my patch was removed from this bug (or with which permissions in Launchpad), so I'm re-attaching the debdiff.


PPA looks good:

2.5+dfsg-5ubuntu10.39:

2019-05-28 13:23:48.644+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10.26 (Marc Deslauriers <email address hidden> Tue, 14 May 2019 15:13:18 -0400), qemu version: 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.39), hostname: qemubug
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name kguest -S -machine pc-i440fx-2.5,accel=kvm,usb=off -cpu host -m 1024 -realtime mlock=off -smp 3,sockets=3,cores=1,threads=1 -uuid 7e55c71a-558f-412c-8445-db0e95fc549f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-kguest/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -kernel /var/lib/libvirt/images/kguest/vmlinuz -initrd /var/lib/libvirt/images/kguest/initrd.img -append 'root=/dev/vda noresume console=tty0 console=ttyS0,38400n8 apparmor=0 net.ifnames=0 crashkernel=256M' -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/kguest/disk01.ext4.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -fsdev local,security_model=passthrough,id=fsdev-fs0,path=/home/inaddy -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=inaddy,bus=pci.0,addr=0x5 -fsdev local,security_model=passthrough,id=fsdev-fs1,path=/root -device virtio-9p-pci,id=fs1,fsdev=fsdev-fs1,mount_tag=root,bus=pci.0,addr=0x6 -netdev tap,fd=26,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b4:78:29,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
Domain id=2 is tainted: high-privileges
Domain id=2 is tainted: host-cpu
char device redirected to /dev/pts/5 (label charserial0)
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 1]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 2]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 4]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 9]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 14]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 15]
warning: host doesn't support...


There are 2 things in this bug:

- I had switched the kernel to hwe-edge in Xenial during my tests and hadn't realized that nested KVM would only work with the hwe-edge kernel (regardless of the requested-feature CPUID warnings).

- The warnings are indeed fixed by the proposed SRU patch, but they are only cosmetic. Feel free to drop the SRU proposal for this particular bug. I might have to bisect the kernel for the actual failure and then mark the Ubuntu kernel as affected.

Thanks and sorry for the burden.
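
For anyone trying to tell these cosmetic warnings apart from real missing features, a small script can sort them. This is a rough sketch: the regular expression is based only on the warning lines quoted in this report (plus an optional ".name" suffix after the register, which is an assumption), and the function name is made up for illustration.

```python
import re

# Alias bits of CPUID.80000001H:EDX (bits 0-9, 12-17, 23, 24).
CPUID_EXT2_AMD_ALIASES = 0x0183F3FF

# Matches e.g. "... requested feature: CPUID.80000001H:EDX [bit 3]".
WARNING_RE = re.compile(
    r"doesn't support requested feature: "
    r"(CPUID\.[0-9A-Fa-f]+H:E[A-D]X)(?:\.\S+)? \[bit (\d+)\]"
)


def classify_warnings(log_text):
    """Split warnings into AMD alias bits and everything else.

    Returns two lists of (leaf, bit) tuples: (alias_warnings, other_warnings).
    """
    alias, other = [], []
    for match in WARNING_RE.finditer(log_text):
        leaf, bit = match.group(1), int(match.group(2))
        if leaf == "CPUID.80000001H:EDX" and CPUID_EXT2_AMD_ALIASES & (1 << bit):
            alias.append((leaf, bit))
        else:
            other.append((leaf, bit))
    return alias, other


if __name__ == "__main__":
    sample = (
        "warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]\n"
        "warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13]\n"
    )
    alias, other = classify_warnings(sample)
    print(len(alias), "alias-bit warnings,", len(other), "other warnings")
```

If the second list is empty, the log contains only the alias-bit warnings discussed here; anything in it deserves a separate look.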

While the issue fixed here is purely cosmetic, these warnings have led users astray in the past.
Since we are doing an SRU anyway, it should be fine to upload this fix with it.

I also want to mention that my regression testing has not turned up any issue with this change, but I had no AMD host, which is what the change targets. So I would appreciate some extra testing of the package in -proposed on AMD hosts where available.

Hello Rafael, or anyone else affected,

Accepted qemu into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.5+dfsg-5ubuntu10.40 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial

Verified. All the spurious missing-CPU-flag warnings are gone:

2019-05-28T12:00:13.095808Z qemu-system-x86_64: terminating on signal 15 from pid 289
2019-06-19 18:02:13.008+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10.26 (Marc Deslauriers <email address hidden> Tue, 14 May 2019 15:13:18 -0400), qemu version: 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.40), hostname: qemuxenial
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name kguest -S -machine pc-i440fx-2.5,accel=kvm,usb=off -cpu host -m 1024 -realtime mlock=off -smp 3,sockets=3,cores=1,threads=1 -uuid 7e55c71a-558f-412c-8445-db0e95fc549f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-kguest/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -kernel /var/lib/libvirt/images/kguest/vmlinuz -initrd /var/lib/libvirt/images/kguest/initrd.img -append 'root=/dev/vda noresume console=tty0 console=ttyS0,38400n8 apparmor=0 net.ifnames=0 crashkernel=256M' -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/kguest/disk01.ext4.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -fsdev local,security_model=passthrough,id=fsdev-fs0,path=/home/inaddy -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=inaddy,bus=pci.0,addr=0x5 -fsdev local,security_model=passthrough,id=fsdev-fs1,path=/root -device virtio-9p-pci,id=fs1,fsdev=fsdev-fs1,mount_tag=root,bus=pci.0,addr=0x6 -netdev tap,fd=26,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b4:78:29,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
Domain id=1 is tainted: high-privileges
Domain id=1 is tainted: host-cpu
char device redirected to /dev/pts/5 (label charserial0)
2019-06-19T18:02:31.880128Z qemu-system-x86_64: terminating on signal 15 from pid 358

Thank you very much!

tags: added: verification-done verification-done-xenial
removed: amd64 patch verification-needed verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu10.40

---------------
qemu (1:2.5+dfsg-5ubuntu10.40) xenial; urgency=medium

  * Restore patches that caused regression
    - d/p/lp1823458/add-VirtIONet-vhost_stopped-flag-to-prevent-multiple.patch
    - d/p/lp1823458/do-not-call-vhost_net_cleanup-on-running-net-from-ch.patch
  * Fix regression introduced by above patches (LP: #1829380)
    - d/p/lp1829380.patch

  [ Rafael David Tinoco ]
  * d/p/lp1828288/target-i386-Set-AMD-alias-bits-after-filtering-CPUID.patch
    - Fix issues with CPUID_EXT2_AMD_ALIASES allowing guests using
      cpu passthrough to boot. (LP: #1828288)

 -- Dan Streetman <email address hidden> Thu, 16 May 2019 14:29:56 -0400

Changed in qemu (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

For the original QEMU hang in Bionic (with the Bionic kernel, not the HWE one), I have opened the following bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1834522 (FYI).
