QEMU crash using virtio-scsi with iothread

Bug #1885419 reported by Viktor Mihajlovski on 2020-06-28
This bug affects 2 people
Affects         Status         Importance   Assigned to          Milestone
qemu (Ubuntu)   Fix Released   Undecided    Unassigned
  Bionic        Fix Released   Critical     Christian Ehrhardt
  Eoan          Fix Released   Undecided    Unassigned
  Focal         Fix Released   Undecided    Unassigned

Bug Description

[Impact]

 * Despite extensive regression testing, explicit testing by different
   parties, and extra time in -proposed, the fix for bug 1805256 caused a
   regression for other configurations.

 * We will upload version .28, which is essentially a revert of .27. This
   gives us time to revisit the fix for bug 1805256 without being forced
   to rush the cleanup.

[Test Case]

 * Ensure that the revert really avoids the regression to iothread
   handling. Start a guest with virtio-scsi + iothreads.

<domain type='kvm'>
...
  <iothreads>1</iothreads>
...
   <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
   </controller>
...
</domain>

[Regression Potential]

 * The regression already happened despite all measures. This is just the
   revert. So other than any issues creeping in at build time (e.g.
   toolchain changes, of which I saw none) this should eventually match
   the former .27
   https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.27

[Other Info]

 * The formerly fixed bug 1805256 will re-open due to this and be worked on
   again.
 * When checking the delta, I recommend comparing against
   1:2.11+dfsg-1ubuntu7.26, as that will show that everything except the
   changelog entries is gone and nothing else changed.

---

After a recent upgrade I can no longer start a Windows 10 VM; QEMU crashes with the error message:

error: Failed to start domain win10
error: internal error: qemu unexpectedly closed the monitor: qemu-system-x86_64: /build/qemu-v_zvmu/qemu-2.11+dfsg/util/aio-posix.c:592: aio_poll: Assertion `in_aio_context_home_thread(ctx)' failed.

I was able to resurrect the VM by removing the iothread-related elements from the domain definition:

<domain type='kvm'>
...
  <iothreads>1</iothreads>
...
   <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
   </controller>
...
</domain>

The domain XML is attached.
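
For anyone unsure whether their own guests are configured this way, a quick check is to search the domain definition for iothread settings. The snippet below is only a sketch: it writes a hypothetical sample file (a stand-in for the output of `virsh dumpxml <domain>`, modelled on the fragment above) and counts the lines that mention iothreads.

```shell
# Hypothetical sample standing in for `virsh dumpxml <domain>` output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<domain type='kvm'>
  <iothreads>1</iothreads>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
    </controller>
  </devices>
</domain>
EOF

# Count the lines that mention iothreads. A non-zero count means the
# domain uses iothreads and may hit the assertion on 1:2.11+dfsg-1ubuntu7.27.
hits=$(grep -c 'iothread' "$sample")
echo "iothread-related lines: $hits"

rm -f "$sample"
```

On a real host the sample file would be replaced by the actual definition, for example by piping `virsh dumpxml win10` into `grep -n iothread`.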

I'm using virtio-scsi rather than virtio-blk because I use trimming to keep my QCOW image small(er). It would be great if I could continue to use this with iothreads enabled.

$ lsb_release -rd
Description: Ubuntu 18.04.4 LTS
Release: 18.04

$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.11+dfsg-1ubuntu7.27
  Candidate: 1:2.11+dfsg-1ubuntu7.27
  Version table:
 *** 1:2.11+dfsg-1ubuntu7.27 500
        500 http://de.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.11+dfsg-1ubuntu7.26 500
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     1:2.11+dfsg-1ubuntu7 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: qemu-kvm 1:2.11+dfsg-1ubuntu7.27
ProcVersionSignature: Ubuntu 4.15.0-108.109-generic 4.15.18
Uname: Linux 4.15.0-108-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7.15
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Sun Jun 28 13:05:17 2020
InstallationDate: Installed on 2019-10-03 (268 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-108-generic root=UUID=d40a86d5-61ae-486e-8ddf-9581c538d64e ro quiet splash vt.handoff=1
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/13/2011
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.60
dmi.board.name: Z68 Pro3
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.60:bd07/13/2011:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnZ68Pro3:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Bill Acker (acker9) wrote :

I too get this after an upgrade, and I can add only a couple of things to Viktor's report. I've tried both aio=native and aio=threads with the same result. Unlike Viktor, I'm using virtio-blk-pci with raw images.

With both aio=native and aio=threads:
qemu-system-x86_64: /build/qemu-v_zvmu/qemu-2.11+dfsg/util/aio-posix.c:592: aio_poll: Assertion `in_aio_context_home_thread(ctx)' failed.

$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.11+dfsg-1ubuntu7.27
  Candidate: 1:2.11+dfsg-1ubuntu7.27

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status: New → Confirmed
Bill Acker (acker9) wrote :

On the same system and with the same qemu command line options, I just tried my own upstream build from git://git.qemu.org/qemu.git at commit master:e7651153a8801dad6805d450ea8bef9b46c1adf5. I do not encounter the error with this upstream build.

$ ~/qemu/bin/qemu-system-x86_64 --version
QEMU emulator version 5.0.50 (v5.0.0-1824-ge7651153a8-dirty)

Thanks for the feedback.

It looks like this happens because of:

    /*
     * There cannot be two concurrent aio_poll calls for the same AioContext (or
     * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
     * We rely on this below to avoid slow locked accesses to ctx->notify_me.
     */
    assert(in_aio_context_home_thread(ctx));

added by the last update:

qemu (1:2.11+dfsg-1ubuntu7.27) bionic; urgency=medium

  * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts
    - aio: rename aio_context_in_iothread() to in_aio_context_home_thread()
    - aio: Do aio_notify_accept only during blocking aio_poll
    - aio-posix: Assert that aio_poll() is always called in home thread
    - async: use explicit memory barriers (LP: #1805256)
    - aio-wait: delegate polling of main AioContext if BQL not held
    - aio-posix: Don't count ctx->notifier as progress when polling

 -- Rafael David Tinoco <email address hidden> Tue, 26 May 2020 17:39:21 +0000

Is there a way you can test Eoan's or Focal's version?

Thank you for reporting this bug.

BTW, as a workaround, one should keep using version:

- 1:2.11+dfsg-1ubuntu7.26

and, perhaps, mark it as "hold" if needed (as the 7.27 version focuses on an issue that has only been seen on aarch64 so far).
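
As a rough sketch of that workaround: the package list below is an assumption taken from the upgrade log later in this report and should be adjusted to what is actually installed. The commands are only printed, not executed, so they can be reviewed first.

```shell
# Last known-good version according to this report.
v=1:2.11+dfsg-1ubuntu7.26
# Assumed package set (from the upgrade log in this report); adjust as needed.
pkgs="qemu-system-x86 qemu-kvm qemu-block-extra qemu-system-common qemu-utils"

# Build the "package=version" arguments for apt.
args=""
for p in $pkgs; do
    args="$args $p=$v"
done

# Print the commands instead of running them; remove "echo" to apply.
echo sudo apt install $args
echo sudo apt-mark hold $pkgs
```

Once a fixed package is available, `sudo apt-mark unhold` with the same package names releases the hold again.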

I was able to reproduce it:

$ virsh start tsting
error: Failed to start domain tsting
error: internal error: process exited while connecting to monitor: qemu-system-x86_64: /build/qemu-v_zvmu/qemu-2.11+dfsg/util/aio-posix.c:592: aio_poll: Assertion `in_aio_context_home_thread(ctx)' failed.

following your description. Checking other versions for now...

Alright, looks like it only affects bionic:

(c)rafaeldtinoco@qemueoan:~$ virsh start tsting
Domain tsting started

as Eoan worked fine with the same XML definition.

Changed in qemu (Ubuntu Focal):
status: New → Fix Released
Changed in qemu (Ubuntu Eoan):
status: New → Fix Released
Changed in qemu (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → High
Changed in qemu (Ubuntu):
status: Confirmed → Fix Released

I have uploaded a source package that supersedes the existing Bionic version at:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1885419

So anyone facing this issue can use the PPA to get a mitigated version for now.

tags: added: regression-update
Changed in qemu (Ubuntu Bionic):
importance: High → Critical
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

I have tagged it as an update regression and bumped the priority.
Rafael is working on a fix already (thanks!).
Furthermore, we reset the phasing of [1] to reduce the number of people getting the update.

And thanks for the report, Viktor and Bill.

P.S. I have added a todo to the regression tests to add such an iothread config to catch this earlier next time.

[1]: https://launchpad.net/ubuntu/bionic/amd64/qemu/1:2.11+dfsg-1ubuntu7.27

After an initial analysis by Rafael we decided to upload a revert of version .27 so that we have more time to re-evaluate this.

The backported fixes added the assert, but chances are that the underlying issue was already present in Bionic before; the backport did not cause it but merely detected it. To get everyone functional again I'll prepare an SRU to revert the change currently in -updates.
Once it lands we will re-open the Bionic task of bug 1805256 and work on an improved fix for it there.

description: updated
Changed in qemu (Ubuntu Bionic):
status: Confirmed → In Progress
assignee: Rafael David Tinoco (rafaeldtinoco) → Christian Ehrhardt (paelzer)

I reopened bug 1805256 and assigned Rafael to it to re-evaluate this.
I've taken over this bug to push a revert to Bionic asap.
SRU Template added to the bug here.

Since it is a plain revert with no other changes in between (other than the changelog), this matches the former 1:2.11+dfsg-1ubuntu7.26. Therefore I'll skip a review cycle in the Team and directly upload it to -unapproved for the SRU Team to look at it.

description: updated

To ssh://git.launchpad.net/~usd-import-team/ubuntu/+source/qemu
 * [new tag] upload/1%2.11+dfsg-1ubuntu7.28 -> upload/1%2.11+dfsg-1ubuntu7.28

Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading qemu_2.11+dfsg-1ubuntu7.28.dsc: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.28.debian.tar.xz: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.28_source.buildinfo: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.28_source.changes: done.
Successfully uploaded packages.

Hello Viktor, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.28 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic

Thank you for the quick response.
[1] has built now and I have verified that it fixes the reported regression in Bionic.
Luckily 1:2.11+dfsg-1ubuntu7.26 is still in bionic-security, so we can make a three-way comparison.

For tests I used a default uvtool created VM and added:
<domain type='kvm'>
...
  <iothreads>1</iothreads>
...
   <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
   </controller>
Note: without any disk attached to the controller nothing happened, so I also set one of my disks to:
   <target dev='sda' bus='scsi'/>

Downgrade to former version (still in -security pocket):
1:2.11+dfsg-1ubuntu7.26:
Worked and started as expected

1:2.11+dfsg-1ubuntu7.27:
root@b:~# virsh start iothreadfail
error: Failed to start domain iothreadfail
qemu-system-x86_64: /build/qemu-v_zvmu/qemu-2.11+dfsg/util/aio-posix.c:592: aio_poll: Assertion `in_aio_context_home_thread(ctx)' failed.

Upgrade to the version in proposed
1:2.11+dfsg-1ubuntu7.28:
Worked and started as expected

Log of the upgrade:
root@b:~# v=1:2.11+dfsg-1ubuntu7.28; apt install qemu-system-x86=$v qemu-kvm=$v qemu-block-extra=$v qemu-system-common=$v qemu-utils=$v
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  samba vde2 sgabios debootstrap
The following packages will be upgraded:
  qemu-block-extra qemu-kvm qemu-system-common qemu-system-x86 qemu-utils
5 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 6783 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-utils amd64 1:2.11+dfsg-1ubuntu7.28 [870 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-system-common amd64 1:2.11+dfsg-1ubuntu7.28 [672 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-block-extra amd64 1:2.11+dfsg-1ubuntu7.28 [39.6 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-kvm amd64 1:2.11+dfsg-1ubuntu7.28 [13.2 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-system-x86 amd64 1:2.11+dfsg-1ubuntu7.28 [5188 kB]
Fetched 6783 kB in 2s (3931 kB/s)
(Reading database ... 43502 files and directories currently installed.)
Preparing to unpack .../qemu-utils_1%3a2.11+dfsg-1ubuntu7.28_amd64.deb ...
Unpacking qemu-utils (1:2.11+dfsg-1ubuntu7.28) over (1:2.11+dfsg-1ubuntu7.26) ...
Preparing to unpack .../qemu-system-common_1%3a2.11+dfsg-1ubuntu7.28_amd64.deb ...
Unpacking qemu-system-common (1:2.11+dfsg-1ubuntu7.28) over (1:2.11+dfsg-1ubuntu7.26) ...
Preparing to unpack .../qemu-block-extra_1%3a2.11+dfsg-1ubuntu7.28_amd64.deb ...
Unpacking qemu-block-extra:amd64 (1:2.11+dfsg-1ubuntu7.28) over (1:2.11+dfsg-1ubuntu7.26) ...
Preparing to unpack .../qemu-kvm_1%3a2.11+dfsg-1ubuntu7.28_amd64.deb ...
Unpacking qemu-kvm (1:2.11+dfsg-1ubuntu7.28) over (1:2.11+dfsg-1ubuntu7.26) ...
Preparing to unpack .../qemu-system-x86_1%3a2.11+dfsg-1ubuntu7.28_amd64.deb ...
Unpacking qemu-system-x86 (1:2.11+dfsg-1ubuntu7.28) over (1:2.11+dfsg-1ubuntu7.26) ...
Setting up qemu-block-extra:amd64 (1:2.11+dfsg-1ubuntu7.28) ...
Set...


tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic

All autopkgtests for the newly accepted qemu (1:2.11+dfsg-1ubuntu7.28) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

vagrant-mutate/1.2.0-3 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Robie Basak (racb) wrote :

Christian, your request to skip the ageing period seems fine, and thank you for responding to this bug so quickly. Your verification steps seem to relate only to the regression though. Are you satisfied that qemu is working generally OK from bionic-proposed - not just the specific crash being fixed?

Hi Robie,
I didn't do hours-long tests, but spawned three different kinds of guests on two architectures just as a sanity check. I didn't do more because the code is literally the same as .26 was; the only difference would be the toolchain at compile time. But even that differs by only a few weeks, as qemu gets updates frequently.

Robie Basak (racb) wrote :

Thanks. The vagrant-mutate failure seems unrelated (looking at the history it fails often with various versions of qemu), and given that this is an exact revert (I checked) I believe this is fine to release then.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.28

---------------
qemu (1:2.11+dfsg-1ubuntu7.28) bionic; urgency=medium

  * Revert the fixes in 1:2.11+dfsg-1ubuntu7.27 for LP: 1805256 as they
    were causing regressions for some iothread use cases (LP: #1885419)

 -- Christian Ehrhardt <email address hidden> Tue, 30 Jun 2020 08:57:18 +0200

Changed in qemu (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Thanks a lot @racb for the quick SRU and @paelzer for taking care of the rollback after I was gone.

Bill Acker (acker9) wrote :

I noticed the PPA update today. I installed it and I can confirm that 1:2.11+dfsg-1ubuntu7.28 is running for me without issue.
Thanks all. Quicker response than I was expecting!
