Pods Cannot allocate memory, even with memory_over_commit_ratio set 10.0

Bug #1831134 reported by John George
This bug affects 1 person
Affects         Status      Importance   Assigned to   Milestone
MAAS            Won't Fix   Low          Unassigned
maas-ui         Triaged     Low          Unassigned
qemu (Ubuntu)   Invalid     Undecided    Unassigned

Bug Description

Please see comment #11 for what needs to be done

--- initial description below ---

Expected to compose a pod VM with 4GB of memory on a host with 6GB of total physical memory when memory_over_commit_ratio is set to 10.

ubuntu@infra1:~$ maas root pod compose 10 hostname=juju-1 cores=1 memory=4096 storage=20
Unable to compose machine because: Failed talking to pod: Unable to compose juju-1: error: Failed to start domain juju-1
error: internal error: qemu unexpectedly closed the monitor: 2019-05-30T17:08:05.182033Z qemu-system-x86_64: -drive file=/var/lib/virt/images/13f6f8e3-681c-4de0-9981-a2a3856d56cc,format=raw,if=none,id=drive-virtio-disk0,serial=13f6f8e3-681c-4de0-9981-a2a3856d56cc: 'serial' is deprecated, please use the corresponding option of '-device' instead
2019-05-30T17:08:05.183561Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
2019-05-30T17:08:05.186659Z qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

ubuntu@infra1:~$ lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04

ii maas-cli 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS client and command-line interface
ii maas-common 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS server common files
ii maas-dhcp 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS DHCP server
ii maas-proxy 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all Rack Controller for MAAS
ii maas-region-api 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all Region Controller for MAAS
ii python3-django-maas 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.5.3-7533-g65952b418-0ubuntu1~18.04.1 all MAAS server provisioning libraries (Python 3)

Tags: cdo-qa docs
John George (jog) wrote :
description: updated
Andres Rodriguez (andreserl) wrote :

The error message below comes from QEMU

2019-05-30T17:08:05.186659Z qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

As such, what I see here is that MAAS is allowing you to create the VM, but qemu is not, so I think you should investigate why qemu doesn't allow it.

Changed in maas:
status: New → Incomplete
Christian Ehrhardt  (paelzer) wrote :

Hi John,
your system has (at the time of the logs) ~2.5G free and ~1.5G that it could free up.
There is no swap set up.
A 4GB guest is very likely to fail to spawn under those conditions, and qemu is just telling you so.
There is nothing that qemu does "wrong" if it just can't get as much memory as it needs.
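
For anyone who wants to repeat this kind of check on their own host, here is a rough Python sketch. It assumes the standard `MemAvailable` and `SwapFree` fields of /proc/meminfo; the helper names and the sample numbers are made up for illustration and are not taken from the logs attached to this bug. It is only an estimate of whether a guest of a given size has room, not a description of how qemu or the kernel actually decides.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def guest_fits(meminfo, guest_mib):
    """Rough check: does guest_mib fit into MemAvailable plus SwapFree?"""
    available_kib = meminfo.get("MemAvailable", 0) + meminfo.get("SwapFree", 0)
    return guest_mib * 1024 <= available_kib


# Illustrative sample only (hypothetical values, no swap configured).
SAMPLE = """MemTotal:        6110000 kB
MemAvailable:    4000000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(SAMPLE)
print(guest_fits(info, 4096))  # a 4096 MiB guest against the sample values
```

On a real host the input would come from `open("/proc/meminfo").read()` instead of the sample string.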

I had a quick look at [1], where this MAAS tunable is defined.
Setting it to 10 only allows you to try to define guests up to that size, but sooner or later those guests need memory to live in. Overcommit only works until the memory is actually used, and a machine should always be sized with at least the required amount as swap; otherwise, even if you are lucky and can start things, you might later crash hard on an OOM, which is even worse.

@Maas team - do you ensure that (memory*ratio)+baseConsumption (e.g. 15%) is <= memory+swap of the system (as an upper limit)? If not such issues as seen here will happen and I don't yet see any component doing wrong.
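
To make the proposed check concrete, a minimal Python sketch follows. It is not MAAS code; the function name is hypothetical, and it assumes that baseConsumption means 15% of the host's physical memory, which is one reading of the "e.g. 15%" above.

```python
def overcommit_fits(phys_mib, swap_mib, ratio, base_fraction=0.15):
    """Check (memory * ratio) + baseConsumption <= memory + swap.

    Assumption: baseConsumption is base_fraction of physical memory.
    """
    base_consumption = phys_mib * base_fraction
    return phys_mib * ratio + base_consumption <= phys_mib + swap_mib


# Host from this report: about 6 GiB of RAM, no swap, ratio set to 10.
print(overcommit_fits(6144, 0, 10.0))  # False
```

With these numbers the left-hand side is roughly ten times the right-hand side, so the check fails long before qemu is ever asked to allocate anything.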

[1]: https://docs.maas.io/2.4/en/nodes-comp-hw#configuration

Changed in maas:
status: Incomplete → New
Changed in qemu (Ubuntu):
status: New → Invalid
Blake Rouse (blake-rouse) wrote :

No, we do not ensure that. We leave it up to the user to define a proper over-commit ratio for their systems.

Christian Ehrhardt  (paelzer) wrote :

Good to know Blake, thanks.

IMHO you should consider this a low prio, but very nice to have item.
Without it, this seems like asking for trouble later on.
Cases are:
a) Like John here unable to start guests they expected to be able to
b) (more critical) people that also changed vm.overcommit_memory to allow the initial alloc breaking hard later on.

In a perfect world I'd envision a slider like this:

Overcommit ratio:
1.0        *                10
|----------|----------------|

* The recommended upper limit at (PhysMem-15%)+Swap
That way it would still be under the full control of the user, but this bug here would not grow into a common issue.
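
As a sketch of where the `*` marker could sit, the memory limit (PhysMem-15%)+Swap can be converted into a ratio by dividing by physical memory. That conversion is an assumption on top of the suggestion above, and the function name is hypothetical.

```python
def recommended_max_ratio(phys_mib, swap_mib, reserve_fraction=0.15):
    """Return ((PhysMem - 15%) + Swap) / PhysMem as an overcommit ratio."""
    if phys_mib <= 0:
        raise ValueError("phys_mib must be positive")
    limit_mib = phys_mib * (1 - reserve_fraction) + swap_mib
    return limit_mib / phys_mib


# Host from this report: about 6 GiB of RAM and no swap.
print(round(recommended_max_ratio(6144, 0), 2))
# Same host with a hypothetical 6 GiB of swap added.
print(round(recommended_max_ratio(6144, 6144), 2))
```

Note that on a host without swap the marker lands below 1.0, which matches the point that overcommit needs swap behind it to be safe.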

Until code that adds something like a "recommended upper bound" is written, maybe you could just make things clearer with a comment next to the slider?

Therefore I'd be happy if you would consider this as a feature request.

@John - from your POV are you fine for now with the explanations or is there something else that was missed but is needed for you?

Lee Trager (ltrager) wrote :

It appears your system is overcommitted too much. qemu needs a minimal amount of resources to be able to start the VM. Leaving this open, as our docs and the UI should explain this better.

tags: added: docs ui
Changed in maas:
status: New → Triaged
importance: Undecided → Low
Changed in maas-ui:
importance: Undecided → Unknown
Adam Collard (adam-collard) wrote :

Tracking a doc task and a UI task here, nothing to do for MAAS core itself

Changed in maas:
status: Triaged → Won't Fix
Changed in maas-ui:
importance: Unknown → Undecided
Thorsten Merten (thorsten-merten) wrote :

This could be done similar to the helptext in commissioning -> minimum kernel. To implement the suggested solution the frontend would need to have info about the physical mem and the swap of the VM host.

Changed in maas-ui:
milestone: none → 3.4.x
status: New → Triaged
no longer affects: maas-ui/3.4
Thorsten Merten (thorsten-merten) wrote :

To recap:

The suggested solution is to display a warning if somebody sets the overcommit ratio so that it would exceed (PhysMem-15%)+Swap.
This makes sense because you cannot run more machines than you have memory and swap for. It should still be under the control of the user, as they may know what they are doing and only start some of those machines at the same time.

Changed in maas-ui:
importance: Undecided → Low
milestone: 3.4.x → 3.5.0
description: updated
tags: removed: ui