qemu core dumps when unable to allocate ram for new virtual machine

Bug #1650067 reported by Dave Chiluk
This bug affects 1 person
Affects                  Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive     Fix Released  Critical    Unassigned
  Kilo                   Fix Released  Critical    Unassigned
  Liberty                Fix Released  Critical    Unassigned
  Mitaka                 Fix Released  Critical    Unassigned

Bug Description

The qemu in the kilo-staging cloud archive aborts, generating a core dump, if there is not enough memory available to create a new virtual machine.

This becomes more critical when it happens repeatedly in highly resource-constrained environments, because the core dumps start filling up the disk.

This is resolved by upstream commit f8ed85ac992c48814d916d5df4d44f9a971c5de4.

I'm opening this case while I decide whether it's worthwhile to re-spin qemu to fix this issue.

Dave Chiluk (chiluk) wrote :
Dave Chiluk (chiluk)
description: updated
Billy Olsen (billy-olsen) wrote :

The commit in question was included in the QEMU 2.5 stream, so I've nominated this bug for Mitaka (to mark it Fix Released), Liberty, and Kilo. We still need to determine whether this is an issue in the Trusty packages. If so, we need to raise an Ubuntu task to get it fixed there before it can be included in the precise-icehouse cloud archive.

Changed in cloud-archive:
importance: Undecided → Critical
Dave Chiluk (chiluk) wrote :

The fix is already included in 2.5 and newer, so xenial is good.

Dave Chiluk (chiluk) wrote :

This issue appears to have been introduced by commits 49946538d and ef701d7b, which landed in v2.2.0-rc5. From code inspection, earlier versions do not appear to be affected.

Changed in cloud-archive:
status: New → Fix Released
Dave Chiluk (chiluk) wrote :

a29a37b994ca3c5a1d39fa0e8934f7e0f2cf57ef is also necessary for the fix.

Dave Chiluk (chiluk) wrote :

Here's my proposed solution for trusty-kilo. This ended up being more involved than I originally expected.

I started by attempting a clean cherry-pick, but the list of patches quickly grew unmanageable:

f8ed85ac99
a29a37b994
1e9b65bb1b
edf6f3b335
4463dcb85c

Together these patches totalled roughly 2k lines of diff and touched large areas of the code base. I eventually settled on just f8ed85ac99 and backported the spirit of a29a37b994 (it doesn't come close to applying, but its primary function is to enable the logic for using error_fatal).
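Why error_fatal matters here can be illustrated with a small Python model. In QEMU's C error API, callers pass an `Error **` destination, and two special destinations, `&error_abort` and `&error_fatal`, make a reported error end the process: the first via abort() (SIGABRT, hence the core dump), the second by reporting the message and calling exit(1). The sketch assumes, as a simplification, that the backport boils down to the RAM-allocation path using the second behaviour instead of the first. Every name in it (`ERROR_ABORT`, `ERROR_FATAL`, `set_error`, `allocate_guest_ram`) is hypothetical, and abort()/exit(1) are modelled as Python exceptions so it can run safely.

```python
import sys

# Hypothetical sentinels standing in for QEMU's &error_abort and &error_fatal.
ERROR_ABORT = object()
ERROR_FATAL = object()


class Aborted(Exception):
    """Models abort(): the real process dies on SIGABRT and may dump core."""


class ErrorSink:
    """Models an ordinary Error * that the caller inspects afterwards."""

    def __init__(self):
        self.err = None


def set_error(errp, msg):
    """Hypothetical stand-in for QEMU's error-reporting path (not its API)."""
    if errp is ERROR_ABORT:
        # Error treated as a programming bug: report it, then abort().
        print(f"Unexpected error: {msg}", file=sys.stderr)
        raise Aborted(msg)
    if errp is ERROR_FATAL:
        # Error treated as fatal but expected: report it, then exit(1).
        # A normal exit raises no signal, so no core file is produced.
        print(msg, file=sys.stderr)
        raise SystemExit(1)
    if errp is not None:
        # Ordinary destination: hand the error back to the caller.
        # A None destination discards it, like passing NULL in QEMU.
        errp.err = msg


def allocate_guest_ram(size_gb, available_gb, errp):
    """Hypothetical allocator that fails when the guest asks for too much RAM."""
    if size_gb > available_gb:
        set_error(errp, "cannot set up guest memory 'pc.ram': Cannot allocate memory")
        return False
    return True
```

Passing `ERROR_ABORT` reproduces the pre-fix behaviour (an `Aborted` exception standing in for the core dump); passing `ERROR_FATAL` prints the same "cannot set up guest memory" message and exits with status 1 instead.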

Hopefully this applies to the liberty archive as well.

Dave Chiluk (chiluk) wrote :

Here's the proposed debdiff for liberty. It is very similar, with some minor differences, specifically in error.c.

Dave Chiluk (chiluk) wrote :

I have already tested the kilo version of this patch. I verified that booting a machine with too much RAM allocated does not cause a core dump.

I will work through getting liberty tested shortly.

Dave Chiluk (chiluk) wrote :

Liberty tested. It appears to work fine.

Dave Chiluk (chiluk) wrote :

Updated the kilo patch to include the changes to error_set, per review from coreycb.

Dave Chiluk (chiluk) wrote :
Dave Chiluk (chiluk) wrote :

To be explicit about what I tested: I set up a machine with the corresponding cloud archive, launched a VM, and made sure it functioned. I then launched a VM with a massive memory over-commit and made sure it core dumped. Next I upgraded to my proposed package, made sure I could still launch a VM, and then made sure that launching a VM with memory over-commit did not create a core.
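The test plan above hinges on how the qemu process ends: abort() kills it with SIGABRT, whose default action includes writing a core file, whereas an error path that calls exit(1) terminates normally with a non-zero status and never dumps core. A minimal Python sketch of that check, assuming a POSIX host: tiny Python child processes stand in for qemu, the helper names (`disable_core_files`, `classify_exit`, `run_child`) are hypothetical, and RLIMIT_CORE is lowered to 0 in the child so the demonstration itself leaves no core file behind (for a plain-file core_pattern).

```python
import resource
import signal
import subprocess
import sys


def disable_core_files():
    # Runs in the child between fork and exec: a zero RLIMIT_CORE tells the
    # kernel not to write a core file for this process.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def classify_exit(returncode):
    """Map a subprocess return code to the outcome the test plan cares about."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports "killed by signal N" as a return code of -N.
        if -returncode == signal.SIGABRT:
            return "aborted (SIGABRT: a core dump may be written)"
        return f"killed by signal {-returncode}"
    return f"error exit {returncode} (no signal, so no core dump)"


def run_child(code):
    """Run a short Python child process and classify how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=disable_core_files,
        capture_output=True,
    )
    return classify_exit(proc.returncode)
```

For example, `run_child("import os; os.abort()")` plays the role of the unpatched qemu, while `run_child("import sys; sys.exit(1)")` plays the role of the patched one.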

I did notice that liberty required significantly more memory overcommit by default before qemu would fail; for example, I was able to launch a 20G VM on a 16G host using the qemu from liberty. I ended up raising the VM to 48G to force the failure (intermediate values were not tested). It might be worth figuring out what allows this additional overcommit in liberty (qemu 2.3+).

Corey Bryant (corey.bryant) wrote :

Thanks Dave. These have been uploaded to the staging PPAs:
https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/kilo-staging
https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/liberty-staging

Once they build successfully, we can get these promoted to the -proposed pockets for testing.

Dave Chiluk (chiluk) wrote :

Verified liberty-staging. I was able to boot a VM with 20G allocated to it on a 16GB machine. Attempting to boot with 22GB yielded an error, but no core dump. I suspect this behavior is due to the default 1.5x memory over-commit.

Will do kilo shortly.

tags: added: verification-done-liberty
Dave Chiluk (chiluk) wrote :

Verified kilo-staging as well. I was able to boot with 20G allocated; booting with 21G correctly yielded an error message, the same as liberty:

ubuntu@nuc3:/etc/apt/sources.list.d$ virsh start demo
error: Failed to start domain demo
error: internal error: process exited while connecting to monitor: 2017-02-08T17:47:50.013119Z qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

No core dump was created, so everything looks good.

tags: added: verification-done verification-done-kilo
Ryan Beisner (1chb1n) wrote : Please test proposed package

Hello Dave, or anyone else affected,

Accepted qemu into liberty-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:liberty-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-liberty-needed to verification-liberty-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-liberty-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-liberty-needed
Ryan Beisner (1chb1n) wrote :

Hello Dave, or anyone else affected,

Accepted qemu into kilo-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:kilo-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-kilo-needed to verification-kilo-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-kilo-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-kilo-needed
Dave Chiluk (chiluk) wrote :

Marking verification done, as I already tested while the package was in the cloud-archive staging PPA. See above for the testing done.

tags: added: verification-kilo-done verification-liberty-done
removed: verification-done-kilo verification-done-liberty verification-kilo-needed verification-liberty-needed
James Page (james-page) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package qemu - 1:2.2+dfsg-5expubuntu9.7~cloud8
---------------

 qemu (1:2.2+dfsg-5expubuntu9.7~cloud8) trusty-kilo; urgency=medium
 .
   * Fix core dump when unable to allocate ram for new virtual
     machine. Exits with error and log message instead. (LP: #1650067)

Dave Chiluk (chiluk) wrote :

@coreycb @james-page

Can one of you please promote 1:2.3+dfsg-5ubuntu9.4~cloud3 to cloud-archive main? It looks like 1:2.3+dfsg-5ubuntu9.4~cloud2 is still there.

Thanks.

Dave Chiluk (chiluk) wrote :

Sorry, to be more explicit: I mean the liberty cloud-archive main.

Corey Bryant (corey.bryant) wrote :

Sorry, that got promoted incorrectly. 1:2.3+dfsg-5ubuntu9.4~cloud3 was supposed to be promoted to updates and 1:2.3+dfsg-5ubuntu9.4~cloud4 to proposed. Anyway, it's now blocked until cloud4 is verified. I think cpaelzer is verifying that version.
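The ~cloudN suffixes in these package versions are ordered by Debian version rules: compare the epoch, then the upstream version, then the Debian revision; within each part, non-digit runs are compared character by character (letters before other symbols, and `~` before everything, even the end of the string) and digit runs are compared numerically. A minimal Python re-implementation of that ordering is sketched below. It is illustrative only, does no input validation, and the function names are hypothetical; `dpkg --compare-versions` remains the authoritative check.

```python
import string

_DIGITS = set(string.digits)
_LETTERS = set(string.ascii_letters)


def _order(ch):
    """Sort weight of one character under dpkg's rules."""
    if ch in _DIGITS:
        return 0
    if ch in _LETTERS:
        return ord(ch)
    if ch == "~":
        return -1  # tilde sorts before everything, including end of string
    return ord(ch) + 256  # other symbols sort after all letters


def _verrevcmp(a, b):
    """Compare two upstream-version or revision strings; returns <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare weights; end of string weighs 0.
        while (i < len(a) and a[i] not in _DIGITS) or (j < len(b) and b[j] not in _DIGITS):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically (an empty run counts as 0).
        si, sj = i, j
        while i < len(a) and a[i] in _DIGITS:
            i += 1
        while j < len(b) and b[j] in _DIGITS:
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split [epoch:]upstream[-revision] into its three parts."""
    epoch = 0
    if ":" in version:
        head, version = version.split(":", 1)
        epoch = int(head)
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(v1, v2):
    """Return -1, 0 or 1 as v1 sorts before, equal to, or after v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    result = _verrevcmp(u1, u2) or _verrevcmp(r1, r2)
    return (result > 0) - (result < 0)
```

Under these rules ~cloud2 < ~cloud3 < ~cloud4, and any ~cloudN build sorts below the same version without the suffix.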

Ryan Beisner (1chb1n)
tags: added: s390x