[2.3] Unhandled error when invalid storage pod resources, doesn't surface to the UI

Bug #1772099 reported by Michael Iatrou on 2018-05-18
Affects    Importance    Assigned to
MAAS       Undecided     Mike Pontillo
2.3        Undecided     Unassigned
2.4        Undecided     Unassigned

Bug Description

This is a regression in the following package versions:
maas 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-cli 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-common 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-dhcp 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-dns 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-proxy 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-rack-controller 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-region-api 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
maas-region-controller 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
python3-django-maas 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
python3-maas-client 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1
python3-maas-provisioningserver 2.3.3-6498-ge4db91d-0ubuntu1~16.04.1

Composing a new VM on a KVM pod no longer allows local storage to be overcommitted.
The action fails silently in the UI, and the logs report:

==> /var/log/maas/regiond.log <==
2018-05-18 14:10:35 twisted.internet.defer: [critical] Unhandled error in Deferred:
2018-05-18 14:10:35 twisted.internet.defer: [critical]

Traceback (most recent call last):
Failure: provisioningserver.rpc.exceptions.PodInvalidResources: not enough space for disk 1.

description: updated
Andres Rodriguez (andreserl) wrote :

Hi Michael,

I /believe/ that MAAS was never meant to overcommit storage (although it does overcommit CPU/RAM). As such, the fact that it is no longer possible is an actual bug fix, made in response to issues in the field. That said, could you please let me know:

1. From which version did you upgrade?
2. What was the behavior before you upgraded?

Changed in maas:
status: New → Incomplete
milestone: none → 2.4.1
summary: - Cannot overcommit storage for KVM pods, silent failure
+ [2.3] Unhandled error when invalid storage pod resources
summary: - [2.3] Unhandled error when invalid storage pod resources
+ [2.3] Unhandled error when invalid storage pod resources, doesn't
+ surface to the UI
Michael Iatrou (michael.iatrou) wrote :

1. Upgraded from 2.3.2.

2. Although the pod page showed "246.8 GiB used -30.4 GiB available" under "Local storage", I was still able to compose more machines, since the volumes are thin-provisioned.
df shows that there is plenty of space available:
/dev/sdb2 217G 133G 83G 62% /

Being able to overcommit storage is super-useful!
I understand that if it's not intentional it can cause issues.
I propose to make it configurable, perhaps on a per pod basis.
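
To make the proposal concrete, here is a minimal sketch of what a per-pod storage overcommit setting could look like. It is not the MAAS implementation: `StoragePool`, `overcommit_ratio`, and `check_disk_fits` are hypothetical names, the exception class is a local stand-in for `provisioningserver.rpc.exceptions.PodInvalidResources`, and the 216.4 GiB capacity is simply inferred from the "246.8 GiB used -30.4 GiB available" figures quoted above.

```python
from dataclasses import dataclass


class PodInvalidResources(Exception):
    """Local stand-in for provisioningserver.rpc.exceptions.PodInvalidResources."""


@dataclass
class StoragePool:
    """Hypothetical model of a pod's local storage pool (not a MAAS class)."""

    capacity_gib: float            # physical size of the pool
    requested_gib: float           # sum of the virtual sizes already handed out
    overcommit_ratio: float = 1.0  # assumed per-pod setting; 1.0 means no overcommit

    def available_gib(self) -> float:
        # Space that may still be promised to new disks under the ratio.
        return self.capacity_gib * self.overcommit_ratio - self.requested_gib


def check_disk_fits(pool: StoragePool, disk_gib: float, disk_index: int = 1) -> None:
    """Raise PodInvalidResources if the requested disk exceeds the allowance."""
    if disk_gib > pool.available_gib():
        raise PodInvalidResources("not enough space for disk %d." % disk_index)


# With no overcommit, the pool from this report is already over its limit.
strict = StoragePool(capacity_gib=216.4, requested_gib=246.8)
print(round(strict.available_gib(), 1))

# With an assumed 2x ratio, a 20 GiB disk would be accepted.
relaxed = StoragePool(capacity_gib=216.4, requested_gib=246.8, overcommit_ratio=2.0)
check_disk_fits(relaxed, 20)
```

A ratio of 1.0 keeps the strict behavior described in the first reply, so a setting like this would be opt-in per pod.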

Michael Iatrou (michael.iatrou) wrote :

Fiddling with the capacity calculation gives me back the previous behavior, but it would be awesome if overcommit for storage was configurable.

https://github.com/maas/maas/blame/2.3/src/provisioningserver/drivers/pod/virsh.py

Changed in maas:
milestone: 2.4.1 → 2.4.2
Changed in maas:
milestone: 2.4.2 → 2.4.3
Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
milestone: 2.4.3 → 2.5.0
milestone: 2.5.0 → 2.5.0beta1
Mike Pontillo (mpontillo) wrote :

I'm not entirely clear on what the issue is here. In MAAS 2.5 I see the following if I try to compose a machine with more storage than is available on disk:

    Pod unable to compose machine: Not enough space in default storage pool: maas

... and it's surfaced in the UI.

Beyond the feedback that it would be nice to allow storage to be overcommitted, is there behavior here that still needs fixing?

Changed in maas:
status: Incomplete → Invalid
Mike Pontillo (mpontillo) wrote :

Another thought on this: I wonder if the calculations are thrown off if MAAS is using a hypervisor as a pod that has a large number of shared copy-on-write images?
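
To illustrate how the numbers could diverge in that case, here is a small sketch comparing two ways of accounting for pool usage: by the virtual size of each volume, and by the space each volume actually consumes on the host. The volume names and sizes are invented for the example, and `Volume` and `pool_usage` are hypothetical helpers, not MAAS or libvirt code; the field names only borrow libvirt's "capacity" and "allocation" terminology.

```python
from typing import List, NamedTuple, Tuple


class Volume(NamedTuple):
    """Hypothetical record for one storage volume in a pool."""

    name: str
    capacity_gib: float    # virtual size presented to the guest
    allocation_gib: float  # space actually consumed on the host


def pool_usage(volumes: List[Volume]) -> Tuple[float, float]:
    """Return (sum of virtual sizes, sum of actual allocations)."""
    virtual = sum(v.capacity_gib for v in volumes)
    allocated = sum(v.allocation_gib for v in volumes)
    return virtual, allocated


# Invented example: one base image plus three thin copy-on-write clones.
volumes = [
    Volume("base", 40.0, 10.0),
    Volume("clone-1", 40.0, 2.0),
    Volume("clone-2", 40.0, 2.0),
    Volume("clone-3", 40.0, 2.0),
]
pool_size_gib = 100.0

virtual, allocated = pool_usage(volumes)
print("available by virtual size:", pool_size_gib - virtual)
print("available by allocation:  ", pool_size_gib - allocated)
```

Under virtual-size accounting this pool looks overfull, while allocation-based accounting still shows free space, which would be consistent with a negative "available" figure on the pod page alongside free space in `df`.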

Changed in maas:
milestone: 2.5.0beta1 → 2.5.0beta2
Changed in maas:
milestone: 2.5.0beta2 → none