vring_get_region_caches: Assertion `caches != NULL' failed.

Bug #1859527 reported by dann frazier
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
qemu (Ubuntu)    Fix Released   Undecided    Unassigned
  Bionic         Fix Released   Undecided    Unassigned
  Disco          Won't Fix      Undecided    Unassigned
  Eoan           Fix Released   Undecided    Unassigned
  Focal          Fix Released   Undecided    Unassigned

Bug Description

[Impact]
QEMU crashes when passing through 8 GPU devices on an AMD Rome-based system which is configured (via BIOS) as a single NUMA domain.

[Test Case]

uvt-kvm create test
uvt-kvm wait test
uvt-kvm ssh test sudo poweroff

virsh edit test

# change:
# <driver name='qemu' type='qcow2'/>
# to:
# <driver name='qemu' type='qcow2' queues='128'/>

virsh start test
virsh console test

# QEMU will crash before booting into the kernel

[Fix]

The loop counter used to walk the batch_notify_vqs bitmap is incremented by BITS_PER_LONG on each iteration, but the unsigned long bitmap array was then incorrectly indexed by that full counter value. While the number of vqs did not exceed BITS_PER_LONG, the index was always 0 and the bug stayed hidden. Once the number of vqs went over BITS_PER_LONG, the bitmap array was indexed with, for example, bitmap[64] instead of bitmap[1] (with BITS_PER_LONG == 64). The fix divides the index counter by BITS_PER_LONG to get the correct bitmap array index.
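
The indexing can be illustrated with a minimal standalone sketch. This is not the QEMU source: collect_pending_vqs and lowest_set_bit are hypothetical names made up for this example, and the two macros are local stand-ins for the QEMU macros of the same name. It only shows the word-index arithmetic, assuming that bits at or above nvqs are never set.

```c
#include <limits.h>
#include <stddef.h>

/* Local stand-ins for the QEMU macros of the same name. */
#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical helper: index of the lowest set bit; bits must be non-zero. */
static unsigned lowest_set_bit(unsigned long bits)
{
    unsigned n = 0;

    while ((bits & 1UL) == 0) {
        bits >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: stores the number of every queue whose bit is set in
 * bitmap into out and returns how many were found. out must have room for
 * nvqs entries, and bits at or above nvqs are assumed to be clear.
 */
static size_t collect_pending_vqs(const unsigned long *bitmap, size_t nvqs,
                                  unsigned *out)
{
    size_t found = 0;

    for (size_t j = 0; j < nvqs; j += BITS_PER_LONG) {
        /* j counts bits, so the word index is j / BITS_PER_LONG.
         * Reading bitmap[j] instead would access bitmap[64] when j == 64,
         * far beyond the BITS_TO_LONGS(nvqs) words that exist. */
        unsigned long bits = bitmap[j / BITS_PER_LONG];

        while (bits != 0) {
            out[found++] = (unsigned)j + lowest_set_bit(bits);
            bits &= bits - 1; /* clear the lowest set bit */
        }
    }
    return found;
}
```

With nvqs = 128 and only the bits for queues 3 and 127 set, the sketch reports exactly those two queues. Indexing by j directly would read past the end of the array on the second iteration on a system with 64-bit longs.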

[Regression Risk]

As this fixes the index into an array, any regression would most likely show up in notifications to the guest about virtio-blk device updates, or as out-of-bounds accesses in QEMU itself that cause a crash.

[Scope]

This is needed in Bionic, Eoan and Focal (b/e/f).

This bug was introduced by commit e21737ab150c2742dd94089017db96c472dd4b87, which first appeared in version 2.7.0, so this bug does not exist in Xenial or earlier.

This is fixed by commit 725fe5d10dbd4259b1853b7d253cef83a3c0d22a which is not yet in focal, but per comment 3 is included in the pending MR for focal.

[Other Info]

I added 'block-proposed' tags for b/e to prevent their release until after the patch has been released in Focal.

Revision history for this message
dann frazier (dannf) wrote :

Thread 1 (Thread 0x7f2a00963640 (LWP 15030)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f29fabdd801 in __GI_abort () at abort.c:79
#2 0x00007f29fabcd39a in __assert_fail_base (fmt=0x7f29fad547d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5624a8e2419e "caches != NULL", file=file@entry=0x5624a8e23b80 "/build/qemu-XrmZRw/qemu-2.11+dfsg/hw/virtio/virtio.c", line=line@entry=211, function=function@entry=0x5624a8e249f0 <__PRETTY_FUNCTION__.30338> "vring_get_region_caches") at assert.c:92
#3 0x00007f29fabcd412 in __GI___assert_fail (assertion=assertion@entry=0x5624a8e2419e "caches != NULL", file=file@entry=0x5624a8e23b80 "/build/qemu-XrmZRw/qemu-2.11+dfsg/hw/virtio/virtio.c", line=line@entry=211, function=function@entry=0x5624a8e249f0 <__PRETTY_FUNCTION__.30338> "vring_get_region_caches") at assert.c:101
#4 0x00005624a8a0bcbc in vring_get_region_caches (vq=<optimized out>) at ./hw/virtio/virtio.c:211
#5 0x00005624a8aa88d7 in vring_get_region_caches (vq=<optimized out>) at ./hw/virtio/virtio.c:1628
#6 vring_avail_flags (vq=<optimized out>) at ./hw/virtio/virtio.c:217
#7 virtio_should_notify (vdev=<optimized out>, vq=<optimized out>) at ./hw/virtio/virtio.c:1632
#8 0x00005624a8aaa0b5 in virtio_notify_irqfd (vdev=0x5624af9342b0, vq=0x7f26a0655110) at ./hw/virtio/virtio.c:1646
#9 0x00005624a8a7c05f in notify_guest_bh (opaque=0x5624af93f420) at ./hw/block/dataplane/virtio-blk.c:71
#10 0x00005624a8ded30e in aio_bh_call (bh=0x5624af93ebd0) at ./util/async.c:90
#11 aio_bh_poll (ctx=ctx@entry=0x5624aa267fb0) at ./util/async.c:118
#12 0x00005624a8df0200 in aio_dispatch (ctx=0x5624aa267fb0) at ./util/aio-posix.c:436
#13 0x00005624a8ded1ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ./util/async.c:261
#14 0x00007f29fb9b7417 in g_main_context_dispatch () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#15 0x00005624a8def457 in glib_pollfds_poll () at ./util/main-loop.c:214
#16 os_host_main_loop_wait (timeout=<optimized out>) at ./util/main-loop.c:261
#17 main_loop_wait (nonblocking=<optimized out>) at ./util/main-loop.c:515
#18 0x00005624a8a12ef6 in main_loop () at ./vl.c:1995
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ./vl.c:4944

dann frazier (dannf)
Changed in qemu (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Revision history for this message
Dan Streetman (ddstreet) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI Added the change to the git branch of the upcoming qemu 4.2 for Focal

Revision history for this message
Dan Streetman (ddstreet) wrote :

disco is dead (tomorrow), marking wontfix for disco.

description: updated
description: updated
Changed in qemu (Ubuntu Disco):
status: New → Won't Fix
Revision history for this message
Dan Streetman (ddstreet) wrote :

qemu uploaded to b/e queues.

@dannf can you add the [test case] section content of the sru template?

Dan Streetman (ddstreet)
tags: added: block-proposed-bionic block-proposed-eoan sts
description: updated
dann frazier (dannf)
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Ok, normally we're waiting for the same changes to be made available in the development series, but since it's already staged in the 4.2 merge in-progress, I'll accept the SRUs. Just be sure to finalize the merge ASAP, before Feature Freeze (so before end of February).

Changed in qemu (Ubuntu Eoan):
status: New → Fix Committed
tags: added: verification-needed verification-needed-eoan
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted qemu into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.0+dfsg-0ubuntu9.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello dann, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.22 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.22)

All autopkgtests for the newly accepted qemu (1:2.11+dfsg-1ubuntu7.22) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/237-3ubuntu10.33 (i386, s390x)
vagrant-mutate/1.2.0-3 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:4.0+dfsg-0ubuntu9.3)

All autopkgtests for the newly accepted qemu (1:4.0+dfsg-0ubuntu9.3) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/242-7ubuntu3.2 (i386)
edk2/0~20190606.20d2e5a1-2ubuntu1 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1859527] Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.22)

On Mon, Jan 27, 2020 at 1:25 PM Ubuntu SRU Bot
<email address hidden> wrote:
>
> All autopkgtests for the newly accepted qemu (1:2.11+dfsg-1ubuntu7.22) for bionic have finished running.
> The following regressions have been reported in tests triggered by the package:
>
> systemd/237-3ubuntu10.33 (i386, s390x)

Since this failure, a subsequent systemd/s390x test ran and passed
suggesting this is a flaky test. It looks like this bug:
  https://github.com/systemd/systemd/issues/8880

The i386 failure looks like:
  https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1776654
I've therefore added a bionic task for it (as well as a theory on the
root cause). I did not retry this test.

> vagrant-mutate/1.2.0-3 (ppc64el)

I was unable to reproduce this on a ppc64el instance. Since this test
depends on data from a remote source, I suspect something was out of
sync at the time. I retried the test and it passed.

  -dann

Revision history for this message
dann frazier (dannf) wrote :

Verified, using test case described in Description.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
dann frazier (dannf) wrote :

qemu/eoan (1:4.0+dfsg-0ubuntu9.3) does not seem to be impacted by this issue - at least, my reproducer doesn't trigger it. However, I've tested the package from -proposed, and it continues to pass the test, so I'll mark eoan verified.

tags: added: verification-done verification-done-eoan
removed: verification-needed verification-needed-eoan
Revision history for this message
dann frazier (dannf) wrote :

Correction for the previous comment: version should be 1:4.0+dfsg-0ubuntu9.2. To be clear, neither the current QEMU in eoan-updates (.2), nor the one in eoan-proposed (.3) fails my reproducer.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-1ubuntu1

---------------
qemu (1:4.2-1ubuntu1) focal; urgency=medium

  * Merge with Debian testing, Among many other things this fixes LP Bugs:
    LP: #1847806 - add mff* instructions to not break on ppc64 with newer glibc
    LP: #1812822 - avoid crashes on detaching vhost_net interfaces
    LP: #1852744 - Crypto Passthrough Interrupt Support
    LP: #1853316 - CCW IPL Support
    Remaining changes:
    - qemu-kvm to systemd unit
      - d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
        hugepages and architecture specifics
      - d/qemu-system-common.qemu-kvm.service: systemd unit to call
        qemu-kvm-init
      - d/qemu-system-common.install: install helper script
      - d/qemu-system-common.maintscript: clean old sysv and upstart scripts
      - d/qemu-system-common.qemu-kvm.default: defaults for
        /etc/default/qemu-kvm
      - d/rules: call dh_installinit and dh_installsystemd for qemu-kvm
    - Distribution specific machine type (LP: 1304107 1621042)
      - d/p/ubuntu/define-ubuntu-machine-types.patch: define distro machine
        types
      - d/qemu-system-x86.NEWS Info on fixed machine type definitions
        for host-phys-bits=true (LP: 1776189)
      - add an info about -hpb machine type in debian/qemu-system-x86.NEWS
      - provide pseries-bionic-2.11-sxxm type as convenience with all
        meltdown/spectre workarounds enabled by default. (LP: 1761372).
    - Enable nesting by default
      - d/p/ubuntu/expose-vmx_qemu64cpu.patch: expose nested kvm by default
        in qemu64 cpu type.
      - d/p/ubuntu/enable-svm-by-default.patch: Enable nested svm by default
        in qemu64 on amd
        [ No more strictly needed, but required for backward compatibility ]
    - improved dependencies
      - Make qemu-system-common depend on qemu-block-extra
      - Make qemu-utils depend on qemu-block-extra
      - let qemu-utils recommend sharutils
    - s390x support
      - Create qemu-system-s390x package
      - Enable numa support for s390x
      - d/rules: build s390-ccw.img with upstream Makefile
      - d/rules: build s390-netboot.img with upstream Makefile
    - arch aware kvm wrappers
    - d/control: update VCS links
    - tolerate ipxe size change on migrations to >=18.04 (LP: 1713490)
      - d/p/ubuntu/pre-bionic-256k-ipxe-efi-roms.patch: old machine types
        reference 256k path
      - d/control-in: depend on ipxe-qemu-256k-compat-efi-roms to be able to
        handle incoming migrations from former releases.
    - d/control-in: Disable capstone disassembler library support (universe)
    - d/control: disable bluetooth being deprecated
    - d/not-installed: ignore new interop docs and extra icons for now
    - d/not-installed: do not install elf2dmp until namespaced
    - d/qemu-utils.install: install new tools qemu-edid and qemu-keymap
    - d/control-in: promote qemu-efi/ovmf in Ubuntu (LP 1570617)
    - d/binfmt-update-in: fix binfmt being called in some containers
      (LP 1840956)
  - Dropped changes (in Debian)
    - qemu-guest-agent: freeze-hook fixes (LP: 1484990)
      - d/qemu-guest-agent.install: provide /etc/qemu/fsfree...

Changed in qemu (Ubuntu Focal):
status: New → Fix Released
Revision history for this message
dann frazier (dannf) wrote :

Dropping block-proposed-{bionic,eoan} tags as fixes have now landed in focal.

tags: removed: block-proposed-bionic block-proposed-eoan
Changed in qemu (Ubuntu Bionic):
assignee: dann frazier (dannf) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.0+dfsg-0ubuntu9.3

---------------
qemu (1:4.0+dfsg-0ubuntu9.3) eoan; urgency=medium

  * d/p/lp1859527-virtio-blk-fix-out-of-bounds-access-to-bitmap-in-not.patch:
    fix bitmap index to prevent OOB access when # of vqs > 64 (LP: #1859527)

 -- Dan Streetman <email address hidden> Wed, 22 Jan 2020 08:50:56 -0500

Changed in qemu (Ubuntu Eoan):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.22

---------------
qemu (1:2.11+dfsg-1ubuntu7.22) bionic; urgency=medium

  * d/p/lp1859527-virtio-blk-fix-out-of-bounds-access-to-bitmap-in-not.patch:
    fix bitmap index to prevent OOB access when # of vqs > 64 (LP: #1859527)

 -- Dan Streetman <email address hidden> Wed, 22 Jan 2020 08:55:45 -0500

Changed in qemu (Ubuntu Bionic):
status: Fix Committed → Fix Released