Fix for CVE-2016-5403 causes crash on migration if memory stats are enabled

Bug #1612089 reported by Gaudenz Steinlin on 2016-08-11
This bug affects 6 people
Affects                 Importance  Assigned to
Ubuntu Cloud Archive    High        Corey Bryant
  Icehouse              Undecided   Corey Bryant
  Kilo                  Undecided   Corey Bryant
  Liberty               Undecided   Corey Bryant
  Mitaka                Undecided   Corey Bryant
qemu (Ubuntu)           High        Marc Deslauriers
  Trusty                High        Marc Deslauriers
  Xenial                High        Marc Deslauriers
  Yakkety               High        Marc Deslauriers
qemu-kvm (Ubuntu)
  Precise               High        Marc Deslauriers

Bug Description

If memory statistics are enabled for the memory balloon device in libvirt like this:

<memballoon model='virtio'>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
   <stats period='10'/>
</memballoon>

Then qemu exits with "qemu-system-x86_64: Virtqueue size exceeded" after the VM is migrated or when starting the VM again after a managedsave.

This bug has been present since 2.0.0+dfsg-2ubuntu1.26 and was not present in 2.0.0+dfsg-2ubuntu1.24. It is most probably caused by the fix for CVE-2016-5403.
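
To check whether a given domain is exposed, one can look for a memballoon element with a non-zero stats period in the domain XML (for example in the output of virsh dumpxml). The following is a minimal sketch using only the Python standard library. The function name balloon_stats_enabled is made up for illustration, and treating a period of 0 as "disabled" is an assumption.

```python
import xml.etree.ElementTree as ET


def balloon_stats_enabled(domain_xml: str) -> bool:
    """Return True if any <memballoon> has <stats period='N'/> with N > 0."""
    root = ET.fromstring(domain_xml)
    # iter() also matches the root element itself, so a bare <memballoon>
    # fragment works as well as a full <domain> document.
    for balloon in root.iter("memballoon"):
        for stats in balloon.findall("stats"):
            try:
                if int(stats.get("period", "0")) > 0:
                    return True
            except ValueError:
                # Ignore a malformed period attribute.
                continue
    return False


# The fragment from the bug description above.
FRAGMENT = """
<memballoon model='virtio'>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
   <stats period='10'/>
</memballoon>
"""

print(balloon_stats_enabled(FRAGMENT))
```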

Steps to reproduce:
1. Create a VM with libvirt which contains the above memory balloon device
2. Start the VM and let the Linux kernel boot (the bug does not appear if the kernel has not booted yet, e.g. while in the PXE boot phase)
3. Issue a managedsave
4. Start the VM again
5. The VM is restored and "crashes" right after it starts running again.
6. You can find the qemu output "qemu-system-x86_64: Virtqueue size exceeded" in the log at /var/log/libvirt/vmname.log
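
The steps above can be sketched as a small helper. This is only an illustration, not a tested reproducer: the function names are hypothetical, the virsh subcommands are the standard start and managedsave, and the wait for the guest kernel to boot between steps 2 and 3 is left to the caller. The log check simply looks for the message quoted in step 6.

```python
from typing import List

# Message quoted in the bug description.
ERROR_TEXT = "Virtqueue size exceeded"


def repro_commands(domain: str) -> List[List[str]]:
    """Return the virsh invocations for steps 2 to 4, in order.

    The caller has to wait for the guest kernel to finish booting
    after the first command before issuing the managedsave.
    """
    return [
        ["virsh", "start", domain],        # step 2: start the VM
        ["virsh", "managedsave", domain],  # step 3: save state to disk
        ["virsh", "start", domain],        # step 4: restore from the save
    ]


def log_shows_crash(log_text: str) -> bool:
    """Return True if the qemu log contains the virtqueue error."""
    return any(ERROR_TEXT in line for line in log_text.splitlines())


sample = "qemu-system-x86_64: Virtqueue size exceeded\n"
print(log_shows_crash(sample))
```

The commands are returned as argument lists so they can be passed to subprocess.run one at a time, with whatever boot-wait logic fits the environment in between.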

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: qemu-system-x86 2.0.0+dfsg-2ubuntu1.26
ProcVersionSignature: Ubuntu 3.13.0-93.140-generic 3.13.11-ckt39
Uname: Linux 3.13.0-93-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.21
Architecture: amd64
Date: Thu Aug 11 08:39:33 2016
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install)


Robie Basak (racb) on 2016-08-11
tags: added: regression-update
Changed in qemu (Ubuntu):
importance: Undecided → High

I also posted the same report on the qemu-devel mailing list. Maybe they will have some comments.
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02270.html

Marc Deslauriers (mdeslaur) wrote :

Thanks for reporting this issue, I can reproduce it.

Changed in qemu (Ubuntu):
assignee: nobody → Marc Deslauriers (mdeslaur)
status: New → Confirmed
Changed in qemu (Ubuntu Precise):
status: New → Confirmed
Changed in qemu (Ubuntu Trusty):
status: New → Confirmed
Changed in qemu (Ubuntu Xenial):
status: New → Confirmed
Changed in qemu (Ubuntu Precise):
assignee: nobody → Marc Deslauriers (mdeslaur)
Changed in qemu (Ubuntu Trusty):
assignee: nobody → Marc Deslauriers (mdeslaur)
Changed in qemu (Ubuntu Xenial):
assignee: nobody → Marc Deslauriers (mdeslaur)
Marc Deslauriers (mdeslaur) wrote :

I can't reproduce this in Yakkety with qemu 2.6, which means it's a bad backport to earlier releases.

Changed in qemu (Ubuntu Yakkety):
status: Confirmed → Fix Released
Changed in qemu (Ubuntu Precise):
importance: Undecided → High
Changed in qemu (Ubuntu Trusty):
importance: Undecided → High
Changed in qemu (Ubuntu Xenial):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.27

---------------
qemu (2.0.0+dfsg-2ubuntu1.27) trusty-security; urgency=medium

  * SECURITY REGRESSION: crash on migration with memory stats enabled
    (LP: #1612089)
    - debian/patches/CVE-2016-5403.patch: disable for now pending
      investigation.

 -- Marc Deslauriers <email address hidden> Fri, 12 Aug 2016 08:48:20 -0400

Changed in qemu (Ubuntu Trusty):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu10.4

---------------
qemu (1:2.5+dfsg-5ubuntu10.4) xenial-security; urgency=medium

  * SECURITY REGRESSION: crash on migration with memory stats enabled
    (LP: #1612089)
    - debian/patches/CVE-2016-5403.patch: disable for now pending
      investigation.

 -- Marc Deslauriers <email address hidden> Fri, 12 Aug 2016 08:46:19 -0400

Changed in qemu (Ubuntu Xenial):
status: Confirmed → Fix Released
Changed in qemu (Ubuntu Precise):
status: Confirmed → Invalid
Changed in qemu-kvm (Ubuntu Trusty):
status: New → Invalid
Changed in qemu-kvm (Ubuntu Xenial):
status: New → Invalid
Changed in qemu-kvm (Ubuntu Yakkety):
status: New → Invalid
Changed in qemu-kvm (Ubuntu Precise):
assignee: nobody → Marc Deslauriers (mdeslaur)
importance: Undecided → High
status: New → Fix Released
no longer affects: qemu (Ubuntu Precise)
no longer affects: qemu-kvm (Ubuntu Trusty)
no longer affects: qemu-kvm (Ubuntu Xenial)
no longer affects: qemu-kvm (Ubuntu Yakkety)
no longer affects: qemu-kvm (Ubuntu)
Mehdi Abaakouk (sileht) wrote :

I run the Cloud Archive version and hit the issue:

qemu-kvm: 1:2.3+dfsg-5ubuntu9.4~cloud1 (trusty)

Hello Gaudenz, or anyone else affected,

Accepted qemu into kilo-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:kilo-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-kilo-needed to verification-kilo-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-kilo-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-kilo-needed
Ryan Beisner (1chb1n) wrote :

Hello Gaudenz, or anyone else affected,

Accepted qemu into liberty-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:liberty-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-liberty-needed to verification-liberty-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-liberty-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-liberty-needed
no longer affects: cloud-archive/newton
Ryan Beisner (1chb1n) wrote :

Hello Gaudenz, or anyone else affected,

Accepted qemu into icehouse-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:icehouse-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-icehouse-needed to verification-icehouse-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-icehouse-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-icehouse-needed
Linh Vu (linhgb) wrote :

We encountered this problem with qemu-system 1:2.3+dfsg-5ubuntu9.4~cloud1 in trusty-updates liberty

qemu-system 1:2.3+dfsg-5ubuntu9.4~cloud2 in trusty-proposed liberty has fixed it for us. :)

James Page (james-page) on 2016-09-02
tags: added: verification-liberty-done
removed: verification-liberty-needed
tags: added: verification-icehouse-done
removed: verification-icehouse-needed
Simon Déziel (sdeziel) wrote :

@mdeslaur, do you have an ETA for the package that will address the CVE without regressing live migrations? I'd be interested to know for Trusty and Xenial, please. Or maybe there is another bug I should track for that? Thanks

James Page (james-page) on 2016-09-08
Changed in cloud-archive:
status: New → Invalid
importance: Undecided → High
Simon Leinen (simon-leinen) wrote :

Folks, thanks for working on this!

Today we looked at OSSN-0069 (https://bugs.launchpad.net/ossn/+bug/1534652), which we could reproduce on older instances. Since we have Liberty, which already has the fix, we could fix this by simply live-migrating all concerned instances.

But when testing this, most instances (the older ones) shut themselves off after live-migration *shriek*.

I found that when I install the new Qemu packages from liberty-proposed on a nova-compute node, then live migrations toward that node no longer lead to these crashes. Yay!

So I'm looking forward to these packages appearing in the regular Ubuntu Cloud Archive.

tags: added: verification-kilo-done
removed: verification-kilo-needed

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu10.4~cloud0
---------------

 qemu (1:2.5+dfsg-5ubuntu10.4~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 qemu (1:2.5+dfsg-5ubuntu10.4) xenial-security; urgency=medium
 .
   * SECURITY REGRESSION: crash on migration with memory stats enabled
     (LP: #1612089)
     - debian/patches/CVE-2016-5403.patch: disable for now pending
       investigation.

James Page (james-page) wrote :

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package qemu - 1:2.3+dfsg-5ubuntu9.4~cloud2
---------------

 qemu (1:2.3+dfsg-5ubuntu9.4~cloud2) trusty-liberty; urgency=medium
 .
   * SECURITY REGRESSION: crash on migration with memory stats enabled
     (LP: #1612089)
     - debian/patches/CVE-2016-5403.patch: disable for now pending
       investigation.

James Page (james-page) wrote :

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package qemu - 1:2.2+dfsg-5expubuntu9.7~cloud7
---------------

 qemu (1:2.2+dfsg-5expubuntu9.7~cloud7) trusty-kilo; urgency=medium
 .
   * SECURITY REGRESSION: crash on migration with memory stats enabled
     (LP: #1612089)
     - debian/patches/CVE-2016-5403.patch: disable for now pending
       investigation.

Michael Roth (mdroth) wrote :

If it is of any help, Stefan Hajnoczi has been working with me to help fix the regressions introduced by the CVE-2016-5403 fix (upstream QEMU commit afd9096, which is in 2.6.1 stable release) in a follow-up 2.6.2 release.

So far the following patches have been identified as being needed in order to correct the behavior introduced with the CVE fix. The upstream QEMU commit IDs are:

commit bccdef6b1a204db0f41ffb6e24ce373e4d7890d4
Author: Stefan Hajnoczi <email address hidden>
Date: Mon Aug 15 13:54:15 2016 +0100

    virtio: recalculate vq->inuse after migration

commit 58a83c61496eeb0d31571a07a51bc1947e3379ac
Author: Stefan Hajnoczi <email address hidden>
Date: Mon Aug 15 13:54:16 2016 +0100

    virtio: decrement vq->inuse in virtqueue_discard()

commit 4b7f91ed0270a371e1933efa21ba600b6da23ab9
Author: Stefan Hajnoczi <email address hidden>
Date: Wed Sep 7 11:51:25 2016 -0400

    virtio: zero vq->inuse in virtio_reset()

commit 104e70cae78bd4afd95d948c6aff188f10508a9c
Author: Ladi Prosek <email address hidden>
Date: Wed Sep 7 17:20:47 2016 +0200

    virtio-balloon: discard virtqueue element on reset

I believe it is the last of these which addresses the issue reported in this bug.
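
One plausible reading of the first commit title can be illustrated with a toy model. This is not QEMU code. It assumes that the CVE fix counts in-flight elements in an unsigned counter, that the counter starts at 0 on the migration destination, and that the device returns an element it popped before migration. All names are hypothetical.

```python
class ToyVirtQueue:
    """Toy model of a virtqueue with an in-flight counter (not QEMU code)."""

    def __init__(self, size: int, inuse: int = 0):
        self.size = size
        self.inuse = inuse  # elements popped by the device, not yet returned

    def pop(self) -> None:
        # Refuse to hand out more elements than the ring can hold.
        if self.inuse >= self.size:
            raise RuntimeError("Virtqueue size exceeded")
        self.inuse += 1

    def push(self) -> None:
        # Assumption: unsigned 32-bit counter, so decrementing 0 wraps.
        self.inuse = (self.inuse - 1) & 0xFFFFFFFF


def recalculated_inuse(last_avail_idx: int, used_idx: int) -> int:
    """In-flight count derived from the 16-bit ring indices."""
    return (last_avail_idx - used_idx) & 0xFFFF


def run_destination(initial_inuse: int) -> str:
    vq = ToyVirtQueue(size=128, inuse=initial_inuse)
    vq.push()  # return the element that was popped before migration
    try:
        vq.pop()  # fetch the next buffer
    except RuntimeError as exc:
        return str(exc)
    return "ok"


# Counter left at 0 after migration: the push wraps it and the pop fails.
print(run_destination(0))
# Counter recalculated from the ring indices: one element in flight.
print(run_destination(recalculated_inuse(last_avail_idx=1, used_idx=0)))
```

In this model the first call reports the error from the bug title and the second one succeeds, which is the behaviour a recalculation after load would be expected to restore.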

Marc Deslauriers (mdeslaur) wrote :

Thanks, Michael, for working on this and listing the commits. That definitely helps!

Simon Déziel (sdeziel) wrote :

The live migration crash was also noticed on RHEL [1]. Michael S. Tsirkin found that backporting these two patches from 2.7 fixes the issue:

 virtio: decrement vq->inuse in virtqueue_discard()
 virtio: recalculate vq->inuse after migration

This confirms the findings of Michael Roth in comment 20, so it would be nice to have them SRU'ed.

1: https://bugzilla.redhat.com/show_bug.cgi?id=1372763

Marc Deslauriers (mdeslaur) wrote :

They will be included with the next round of security updates, possibly next week.

Simon Déziel (sdeziel) wrote :

Thanks for the timeline update Marc.

Marc Deslauriers (mdeslaur) wrote :

Updates for this have now been released:
https://www.ubuntu.com/usn/usn-3125-1/

s10 (vlad-esten) wrote :

With this series of patches fixing CVE-2016-5403, some live migrations crash with the following error:

9006: error : qemuProcessReportLogError:1813 : internal error: early end of file from monitor, possible problem: 2016-11-15T12:55:51.353085Z qemu-system-x86_64: VQ 2 size 0x80 < last_avail_idx 0x1 - used_idx 0x2
2016-11-15T12:55:51.353122Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
2016-11-15T12:55:51.353265Z qemu-system-x86_64: load of migration failed: Operation not permitted
Inconsistency detected by ld.so: dl-close.c: 762: _dl_close: Assertion `map->l_init_called' failed!
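
The first message reads like a sanity check on the loaded ring state: the number of in-flight buffers, last_avail_idx minus used_idx, must not exceed the queue size. Virtio ring indices are 16-bit counters, so if the subtraction is done modulo 2^16 (an assumption about the check, not taken from the QEMU source), the values in the log give a very large count. A short worked example, with a hypothetical helper name:

```python
def ring_distance(last_avail_idx: int, used_idx: int) -> int:
    """Difference of two 16-bit ring indices, modulo 2**16."""
    return (last_avail_idx - used_idx) & 0xFFFF


# Values taken from the log line above.
size = 0x80
in_flight = ring_distance(0x1, 0x2)

print(hex(in_flight), in_flight > size)  # prints: 0xffff True
```

With used_idx ahead of last_avail_idx by one, the difference wraps to 0xffff, far more than the 0x80 entries of the queue.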

s10 (vlad-esten) wrote :

Version of the affected qemu is 1:2.5+dfsg-5ubuntu10.6.

Marc Deslauriers (mdeslaur) wrote :

s10: please file a new bug so this new regression can be tracked properly. Thanks.

Maik Zumstrull (m-zumstrull) wrote :

Looks like s10 never did file a separate bug for the regression, but we got hit by the same issue, so I filed https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1647389.
