Virtual machines not working anymore on 2.10

Bug #1714331 reported by Saverio Miroddi on 2017-08-31
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Unassigned

Bug Description

Using 2.10, my virtual machine(s) don't work anymore. This happens 100% of the times.

-----

I use QEMU compiling it from source, on Ubuntu 16.04 amd64. This is the configure command:

    configure --target-list=x86_64-softmmu --enable-debug --enable-gtk --enable-spice --audio-drv-list=pa

I have one virtual disk, with a Windows 10 64-bit, which I launch in two different ways; both work perfectly on 2.9 (and used to do on 2.8, but I haven't used it for a long time).

This is the first way:

    qemu-system-x86_64
      -drive if=pflash,format=raw,readonly,file=/path/to/OVMF_CODE.fd
      -drive if=pflash,format=raw,file=/tmp/OVMF_VARS.fd.tmp
      -enable-kvm
      -machine q35,accel=kvm,mem-merge=off
      -cpu host,kvm=off,hv_vendor_id=vgaptrocks,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time
      -smp 4,cores=4,sockets=1,threads=1
      -m 4096
      -display gtk
      -vga qxl
      -rtc base=localtime
      -serial none
      -parallel none
      -usb
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device virtio-scsi-pci,id=scsi
      -drive file=/path/to/image-diff.img,id=hdd1,format=qcow2,if=none,cache=writeback
      -device scsi-hd,drive=hdd1
      -net nic,model=virtio
      -net user

On QEMU 2.10, I get the `Recovery - Your PC/Device needs to be repaired` windows screen; on 2.9, it boots regularly.

This is the second way:

    qemu-system-x86_64
      -drive if=pflash,format=raw,readonly,file=/path/to/OVMF_CODE.fd
      -drive if=pflash,format=raw,file=/tmp/OVMF_VARS.fd.tmp
      -enable-kvm
      -machine q35,accel=kvm,mem-merge=off
      -cpu host,kvm=off,hv_vendor_id=vgaptrocks,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time
      -smp 4,cores=4,sockets=1,threads=1
      -m 10240
      -vga none
      -rtc base=localtime
      -serial none
      -parallel none
      -usb
      -device vfio-pci,host=01:00.0,multifunction=on
      -device vfio-pci,host=01:00.1
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device usb-host,vendorid=0xNNNN,productid=0xNNNN
      -device virtio-scsi-pci,id=scsi
      -drive file=/path/to/image-diff.img,id=hdd1,format=qcow2,if=none,cache=writeback
      -device scsi-hd,drive=hdd1
      -net nic,model=virtio
      -net user

On QEMU 2.10, I get the debug window on the linux monitor, and blank screen on VFIO one (no BIOS screen at all); after 10/20 seconds, QEMU crashes without any message.
On 2.9, this works perfectly.

-----

I am able to perform a git bisect, if that helps, but if this is the case, I'd need this issue to be reviewed, since bisecting is going to take me a lot of time.

Stefan Hajnoczi (stefanha) wrote :

Please try: -machine pc-q35-2.9,accel=kvm,mem-merge=off

I'm not sure if it will make a difference in this case but it presents exactly the same onboard hardware as QEMU 2.9. The onboard hardware configuration may change unless you specify a precise version in the machine type, so I suggest using it instead of "q35".

Did you build QEMU 2.9 from source with the same ./configure options?

> Please try: -machine pc-q35-2.9,accel=kvm,mem-merge=off

I've tried the first configuration (windowed) with the change you proposed, and I get the same behavior (windows recovery); I didnt' try the second configuration (VFIO).

> Did you build QEMU 2.9 from source with the same ./configure options?

Yes, you can see the command at the beginning of the bug report.

Chris Unsworth (chrisu) wrote :

I am also unable to boot Windows 10 64 bit since updating to 2.10.0. I was previously using 2.8.0 which worked perfectly. Windows gets stuck at the big Windows logo with the spinning dots underneath and QEMU uses 100% CPU of 1 core. I am using kernel 4.11.9 and the following command line:

qemu-system-x86_64
-serial none
-parallel none
-balloon none
-enable-kvm
-name Windows10
-display none
-M q35,accel=kvm,mem-merge=off
-m 10240
-cpu host,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_vendor_id=whatever
-smp 4,sockets=1,cores=4,threads=1
-realtime mlock=on
-vga none
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1
-device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on
-device vfio-pci,host=01:00.1,bus=root.1,addr=00.1
-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/OVMF_CODE-pure-efi.fd
-drive if=pflash,format=raw,file=/usr/share/ovmf/OVMF_VARS-pure-efi.fd
-drive if=none,id=drive0,cache=directsync,aio=native,format=raw,file=/dev/mint-vg/windows
-object iothread,id=iothread0
-device virtio-blk-pci,drive=drive0,scsi=off,iothread=iothread0
-drive if=none,id=drive1,cache=directsync,aio=native,format=raw,file=/dev/evo-vg/windows2
-object iothread,id=iothread1 -device virtio-blk-pci,drive=drive1,scsi=off,iothread=iothread1
-net nic,model=virtio
-net bridge,br=br0
-rtc base=localtime,clock=host
-usb
-device usb-host,bus=usb-bus.0,vendorid=0x1b1c,productid=0x1b22
-device usb-host,bus=usb-bus.0,vendorid=0x1b1c,productid=0x1b15

I have now rolled back to 2.8.0 and it is working again.

Stefan Hajnoczi (stefanha) wrote :

Mary: If you have time to run a git bisect that would be very helpful, thanks!

  $ git bisect v2.10.0 v2.9.0

OK, I'll do that. It will take me a bit of time - I'll timeframe a week.

Note that I can't know if both VFIO and non-VFIO context suffer the same bug. For simplicity, I'll peform the bisect using the non-VFIO version, then, once we have more information, we can decide how to proceed.

Chris Unsworth (chrisu) wrote :
Download full text (3.9 KiB)

I think my issue looks like the same. Sometimes I just get spinning dots, and sometimes there is the message about doing an automatic repair below the spinning dots before it stops and uses 100% cpu. I just did a git bisect:

# git bisect log
# bad: [1ab5eb4efb91a3d4569b0df6e824cc08ab4bd8ec] Update version for v2.10.0 release
# good: [6c02258e143700314ebf268dae47eb23db17d1cf] Update version for v2.9.0 release
git bisect start 'v2.10.0' 'v2.9.0'
# bad: [269c20b2bbd2aa8531e0cdc741fb166f290d7a2b] tests/qdict: check more get_try_int() cases
git bisect bad 269c20b2bbd2aa8531e0cdc741fb166f290d7a2b
# bad: [eba0161990af8509608332450ee7e338273cf5df] Merge remote-tracking branch 'rth/tags/pull-s390-20170512' into staging
git bisect bad eba0161990af8509608332450ee7e338273cf5df
# good: [9ea5ada76f34a0ef048b131c3a166d8564199bdb] audio: Use ARRAY_SIZE from qemu/osdep.h
git bisect good 9ea5ada76f34a0ef048b131c3a166d8564199bdb
# bad: [1effe6ad5eac1b2e50a077695ac801d172891d6a] Merge remote-tracking branch 'danpb/tags/pull-qcrypto-2017-05-09-1' into staging
git bisect bad 1effe6ad5eac1b2e50a077695ac801d172891d6a
# good: [f03f9f0c10dcfadee5811d43240f0a6af230f1ce] Merge remote-tracking branch 'cohuck/tags/s390x-3270-20170504' into staging
git bisect good f03f9f0c10dcfadee5811d43240f0a6af230f1ce
# good: [6c02258e143700314ebf268dae47eb23db17d1cf] qobject-input-visitor: Document full_name_nth()
git bisect good 6c02258e143700314ebf268dae47eb23db17d1cf
# bad: [95615ce5a1beffff1a5dd3597d8cb6ba83f0010e] vhost-scsi: create a vhost-scsi-common abstraction
git bisect bad 95615ce5a1beffff1a5dd3597d8cb6ba83f0010e
# bad: [31f5a726b59bda5580e2f9413867893501dd7d93] trace: add qemu mutex lock and unlock trace events
git bisect bad 31f5a726b59bda5580e2f9413867893501dd7d93
# bad: [49e00a18708e27c815828d9440d5c9300d19547c] use _Static_assert in QEMU_BUILD_BUG_ON
git bisect bad 49e00a18708e27c815828d9440d5c9300d19547c
# bad: [6103451aeb749e92bf7d730429985189c6921c32] hw/i386: Build-time assertion on pc/q35 reset register being identical.
git bisect bad 6103451aeb749e92bf7d730429985189c6921c32
# bad: [77af8a2b95b79699de650965d5228772743efe84] hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.
git bisect bad 77af8a2b95b79699de650965d5228772743efe84
# first bad commit: [77af8a2b95b79699de650965d5228772743efe84] hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.

77af8a2b95b79699de650965d5228772743efe84 is the first bad commit
commit 77af8a2b95b79699de650965d5228772743efe84
Author: Phil Dennis-Jordan <email address hidden>
Date: Wed Mar 15 19:20:26 2017 +1300

    hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.

    This updates the FADT generated for x86/64 machine types from Revision 1 to 3. (Based on ACPI standard 2.0 instead of 1.0) The intention is to expose the reset register information to guest operating systems which require it, specifically OS X/macOS. Revision 1 FADTs do not contain the fields relating to the reset register.

    The new layout and contents remains backwards-compatible with operating systems which only support ACPI 1.0, as the existing fields are not modified by this c...

Read more...

Download full text (7.8 KiB)

On Tue, Sep 05, 2017 at 06:42:06PM -0000, Chris Unsworth wrote:
> I think my issue looks like the same. Sometimes I just get spinning
> dots, and sometimes there is the message about doing an automatic repair
> below the spinning dots before it stops and uses 100% cpu. I just did a
> git bisect:

Perfect, thanks Chris!

I have CCed Phil and Paolo regarding the commit you identified.

>
> # git bisect log
> # bad: [1ab5eb4efb91a3d4569b0df6e824cc08ab4bd8ec] Update version for v2.10.0 release
> # good: [6c02258e143700314ebf268dae47eb23db17d1cf] Update version for v2.9.0 release
> git bisect start 'v2.10.0' 'v2.9.0'
> # bad: [269c20b2bbd2aa8531e0cdc741fb166f290d7a2b] tests/qdict: check more get_try_int() cases
> git bisect bad 269c20b2bbd2aa8531e0cdc741fb166f290d7a2b
> # bad: [eba0161990af8509608332450ee7e338273cf5df] Merge remote-tracking branch 'rth/tags/pull-s390-20170512' into staging
> git bisect bad eba0161990af8509608332450ee7e338273cf5df
> # good: [9ea5ada76f34a0ef048b131c3a166d8564199bdb] audio: Use ARRAY_SIZE from qemu/osdep.h
> git bisect good 9ea5ada76f34a0ef048b131c3a166d8564199bdb
> # bad: [1effe6ad5eac1b2e50a077695ac801d172891d6a] Merge remote-tracking branch 'danpb/tags/pull-qcrypto-2017-05-09-1' into staging
> git bisect bad 1effe6ad5eac1b2e50a077695ac801d172891d6a
> # good: [f03f9f0c10dcfadee5811d43240f0a6af230f1ce] Merge remote-tracking branch 'cohuck/tags/s390x-3270-20170504' into staging
> git bisect good f03f9f0c10dcfadee5811d43240f0a6af230f1ce
> # good: [6c02258e143700314ebf268dae47eb23db17d1cf] qobject-input-visitor: Document full_name_nth()
> git bisect good 6c02258e143700314ebf268dae47eb23db17d1cf
> # bad: [95615ce5a1beffff1a5dd3597d8cb6ba83f0010e] vhost-scsi: create a vhost-scsi-common abstraction
> git bisect bad 95615ce5a1beffff1a5dd3597d8cb6ba83f0010e
> # bad: [31f5a726b59bda5580e2f9413867893501dd7d93] trace: add qemu mutex lock and unlock trace events
> git bisect bad 31f5a726b59bda5580e2f9413867893501dd7d93
> # bad: [49e00a18708e27c815828d9440d5c9300d19547c] use _Static_assert in QEMU_BUILD_BUG_ON
> git bisect bad 49e00a18708e27c815828d9440d5c9300d19547c
> # bad: [6103451aeb749e92bf7d730429985189c6921c32] hw/i386: Build-time assertion on pc/q35 reset register being identical.
> git bisect bad 6103451aeb749e92bf7d730429985189c6921c32
> # bad: [77af8a2b95b79699de650965d5228772743efe84] hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.
> git bisect bad 77af8a2b95b79699de650965d5228772743efe84
> # first bad commit: [77af8a2b95b79699de650965d5228772743efe84] hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.
>
>
> 77af8a2b95b79699de650965d5228772743efe84 is the first bad commit
> commit 77af8a2b95b79699de650965d5228772743efe84
> Author: Phil Dennis-Jordan <email address hidden>
> Date: Wed Mar 15 19:20:26 2017 +1300
>
> hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest
> OS support.
>
> This updates the FADT generated for x86/64 machine types from
> Revision 1 to 3. (Based on ACPI standard 2.0 instead of 1.0) The
> intention is to expose the reset register information to guest operating
> systems which require it, specifically OS X/...

Read more...

Phil Dennis-Jordan (pmdj) wrote :
Download full text (8.5 KiB)

Hi Stefan,

What version of OVMF are you using? There were a bunch of bugs related to
FADT in OVMF which were fixed back in March. Specifically, commits
072060a, 78807f6, and 198a46d. If your version of OVMF doesn't include
these fixes, that's a likely source for your problems.

Hope that helps.

Phil

On 6 September 2017 at 16:34, Stefan Hajnoczi <email address hidden> wrote:

> On Tue, Sep 05, 2017 at 06:42:06PM -0000, Chris Unsworth wrote:
> > I think my issue looks like the same. Sometimes I just get spinning
> > dots, and sometimes there is the message about doing an automatic repair
> > below the spinning dots before it stops and uses 100% cpu. I just did a
> > git bisect:
>
> Perfect, thanks Chris!
>
> I have CCed Phil and Paolo regarding the commit you identified.
>
> >
> > # git bisect log
> > # bad: [1ab5eb4efb91a3d4569b0df6e824cc08ab4bd8ec] Update version for
> v2.10.0 release
> > # good: [6c02258e143700314ebf268dae47eb23db17d1cf] Update version for
> v2.9.0 release
> > git bisect start 'v2.10.0' 'v2.9.0'
> > # bad: [269c20b2bbd2aa8531e0cdc741fb166f290d7a2b] tests/qdict: check
> more get_try_int() cases
> > git bisect bad 269c20b2bbd2aa8531e0cdc741fb166f290d7a2b
> > # bad: [eba0161990af8509608332450ee7e338273cf5df] Merge remote-tracking
> branch 'rth/tags/pull-s390-20170512' into staging
> > git bisect bad eba0161990af8509608332450ee7e338273cf5df
> > # good: [9ea5ada76f34a0ef048b131c3a166d8564199bdb] audio: Use
> ARRAY_SIZE from qemu/osdep.h
> > git bisect good 9ea5ada76f34a0ef048b131c3a166d8564199bdb
> > # bad: [1effe6ad5eac1b2e50a077695ac801d172891d6a] Merge remote-tracking
> branch 'danpb/tags/pull-qcrypto-2017-05-09-1' into staging
> > git bisect bad 1effe6ad5eac1b2e50a077695ac801d172891d6a
> > # good: [f03f9f0c10dcfadee5811d43240f0a6af230f1ce] Merge
> remote-tracking branch 'cohuck/tags/s390x-3270-20170504' into staging
> > git bisect good f03f9f0c10dcfadee5811d43240f0a6af230f1ce
> > # good: [6c02258e143700314ebf268dae47eb23db17d1cf]
> qobject-input-visitor: Document full_name_nth()
> > git bisect good 6c02258e143700314ebf268dae47eb23db17d1cf
> > # bad: [95615ce5a1beffff1a5dd3597d8cb6ba83f0010e] vhost-scsi: create a
> vhost-scsi-common abstraction
> > git bisect bad 95615ce5a1beffff1a5dd3597d8cb6ba83f0010e
> > # bad: [31f5a726b59bda5580e2f9413867893501dd7d93] trace: add qemu mutex
> lock and unlock trace events
> > git bisect bad 31f5a726b59bda5580e2f9413867893501dd7d93
> > # bad: [49e00a18708e27c815828d9440d5c9300d19547c] use _Static_assert in
> QEMU_BUILD_BUG_ON
> > git bisect bad 49e00a18708e27c815828d9440d5c9300d19547c
> > # bad: [6103451aeb749e92bf7d730429985189c6921c32] hw/i386: Build-time
> assertion on pc/q35 reset register being identical.
> > git bisect bad 6103451aeb749e92bf7d730429985189c6921c32
> > # bad: [77af8a2b95b79699de650965d5228772743efe84] hw/i386: Use Rev3
> FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.
> > git bisect bad 77af8a2b95b79699de650965d5228772743efe84
> > # first bad commit: [77af8a2b95b79699de650965d5228772743efe84] hw/i386:
> Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.
> >
> >
> > 77af8a2b95b79699de650965d5228772743efe84 is the first bad commit
> > co...

Read more...

I'm using the Ubuntu artful version (0~20161202.7bbe0b3e-1), so I'll need to build a recent OVMF package to verify the solution.

Chris Unsworth (chrisu) wrote :

I was using a version of OVMF from about a year ago which works fine with QEMU 2.8 and 2.9 but doesn't work with 2.10. I just updated to the latest OVMF from master and everything's working now with QEMU 2.10 and WIndows 10 64 bit, so the old OVMF was the issue.

Thanks for your help,
Chris

Building the OVMF from the `edk2` current master (89796c69d9) fixes both (non/vfio) issues.

Should this be opened as a bug on the `ovmf` Ubuntu package (xenial/zesty/artful)? I'm very familiar with the Ubuntu version policies.

See also LP#1725560.

> Should this be opened as a bug on the `ovmf` Ubuntu package (xenial/zesty/artful)? I'm very familiar with the Ubuntu version policies.

I've just noticed that I intended to write "I'm *not* very familiar with the Ubuntu version policies".

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers