resource_tracker prevents oversubscription

Bug #1048842 reported by Joe Gordon
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Critical
Assigned to: Brian Elliott

Bug Description

The resource_tracker should only trigger retries if there is a race condition (two schedulers) or the nova db is too out of date.

It appears that the resource_tracker doesn't know about memory oversubscription or disk oversubscription.

From devstack, when trying to launch an m1.tiny on a 3 GB VM: http://pastie.org/4698435

Tags: folsom-rc1
Joe Gordon (jogo)
tags: added: folsom-rc1
Joe Gordon (jogo)
description: updated
Changed in nova:
importance: Undecided → Critical
assignee: nobody → Alex Meade (alex-meade)
status: New → Triaged
milestone: none → folsom-rc1
Alex Meade (alex-meade)
Changed in nova:
assignee: Alex Meade (alex-meade) → nobody
Brian Elliott (belliott)
Changed in nova:
assignee: nobody → Brian Elliott (belliott)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12990

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12990
Committed: http://github.com/openstack/nova/commit/63fdfcbdcd8a0dd8422ff83f8ee2b9603a4a6c94
Submitter: Jenkins
Branch: master

commit 63fdfcbdcd8a0dd8422ff83f8ee2b9603a4a6c94
Author: Brian Elliott <email address hidden>
Date: Thu Sep 13 21:50:14 2012 +0000

    Correct typo in memory_mb_limit filter property

    Fixes bug blocking memory oversubscription in builds

    bug 1048842

    Change-Id: I932d0a7248f231127965331886664bfd9092dad0

Changed in nova:
status: In Progress → Fix Committed
Joe Gordon (jogo) wrote :

What about disk oversubscription? or CPU?

Changed in nova:
status: Fix Committed → In Progress
Brian Elliott (belliott) wrote :

Joe, right now you can oversubscribe disk by creating a sparse image on whatever kind of virt layer you use. On XenServer at least, they call this thin provisioning. You can create a 100GB disk, but the space is not pre-allocated; it is allocated only when it is actually needed...

So there is no parameter exposing a disk oversubscription ratio in Nova because it's assuming for the moment that you handle this by configuring your disk appropriately at the hypervisor layer if you want oversubscription. The resource tracker only looks at the actual disk usage right now, so it interacts fine with this.
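
The thin provisioning described above can be illustrated with an ordinary sparse file. This is a minimal sketch, not Nova or XenServer code: the helper name `sparse_file_sizes` is hypothetical, and it assumes a POSIX system whose filesystem supports sparse files (`st_blocks` is not available on Windows).

```python
import os
import tempfile


def sparse_file_sizes(apparent_bytes):
    """Create a sparse file and return (apparent size, bytes allocated on disk)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        # truncate() extends the file without writing data; on filesystems
        # that support sparse files, no blocks are allocated for the hole.
        f.truncate(apparent_bytes)
    try:
        st = os.stat(path)
        # st_blocks is counted in 512-byte units on POSIX systems.
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)


apparent, allocated = sparse_file_sizes(100 * 1024 * 1024)
print("apparent size: %d bytes, allocated: %d bytes" % (apparent, allocated))
```

On a filesystem with sparse-file support, the allocated figure is typically far smaller than the apparent size, which is the gap a tracker based on actual disk usage would see.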

Also, for CPUs, no real resource tracking is currently being done.

My preference is to not introduce new code for either disk or cpu oversubscription at this stage of Folsom and risk breakage, but I am looking at refactoring some of this for Grizzly.

Changed in nova:
status: In Progress → Fix Committed
Joe Gordon (jogo) wrote :

Brian,

I created an m1.small in a Devstack VM with 4GB space, and the resource_tracker attempts to claim:

Attempting claim: memory 2048 MB, disk 20 GB, mem limit 3748.5

And when I disable the GB check in resource_tracker, the VM starts up and:

$ du -sh /home/vagrant/_base/
767M /home/vagrant/_base/

So the resource_tracker is not looking at actual disk usage, and is blocking disk oversubscription.
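
The numbers in the claim log line above can be checked with a small sketch. This is not the resource_tracker code; `claim_fits` is a hypothetical helper, existing usage is assumed to be zero, and the 2499 MB memory total is an assumption derived from the 3748.5 MB limit under a 1.5 allocation ratio.

```python
def claim_fits(requested, used, total, limit=None):
    """Return True if `requested` fits on top of `used`.

    `limit` is an optional oversubscription ceiling; when it is None the
    physical `total` is the ceiling, so no oversubscription is allowed.
    """
    ceiling = total if limit is None else limit
    return used + requested <= ceiling


# Memory: 2048 MB requested against the 3748.5 MB limit from the log line.
# The 2499 MB total is an assumption (3748.5 / 1.5); usage is assumed zero.
memory_ok = claim_fits(requested=2048, used=0, total=2499, limit=3748.5)

# Disk: 20 GB requested on a 4 GB disk with no limit passed.
disk_ok = claim_fits(requested=20, used=0, total=4)

print(memory_ok, disk_ok)  # True False
```

Under these assumptions the memory claim passes because a limit was supplied, while the disk claim fails because the flavor's full 20 GB is compared against the physical 4 GB.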

Changed in nova:
status: Fix Committed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/13182

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/13336

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/13182
Committed: http://github.com/openstack/nova/commit/8e851409f3a8a345ec954a880c81232fbf9e27b4
Submitter: Jenkins
Branch: master

commit 8e851409f3a8a345ec954a880c81232fbf9e27b4
Author: Brian Elliott <email address hidden>
Date: Fri Sep 14 15:17:07 2012 +0000

    Fix bugs in resource tracker and cleanup

    Fixes bugs in resource tracker:
    * Handle disk oversubscription
    * Handle suspended/powered off instances

    The usage model is changed to the old style that is
    based on actual instance usage on a compute host.
    (Not the current point in time of the hypervisor's
     reported host stats)

    There is now a 'limits' filter property that can be passed from
    the scheduler to the compute node to indicate that
    oversubscription of resources is desired:

    The 'limits' filter property is a dict with the following possible
    keys:

    * memory_mb - Specifies the memory ceiling for the compute node.
    * disk_gb - Specifies the disk space ceiling for the compute node.
    * vcpu - Specifies the max number of vcpus for the compute node.

    There is also some general cleanup and additional unit tests in
    an attempt to simplify down this function.

    bug 1048842
    bug 1052157

    Change-Id: I6ee851b8c03234a78a64d9f5c494dfc7059cdda4
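
As a rough illustration of how a 'limits' dict with the keys listed in the commit message might be applied, here is a minimal sketch. It is not the code from the commit: `failed_resources` is a hypothetical function, the host and flavor values are invented, and `used` stands in for the usage summed from instances on the host.

```python
def failed_resources(requested, used, totals, limits=None):
    """Return the resource keys whose claim would exceed the ceiling.

    All arguments are dicts keyed by 'memory_mb', 'disk_gb' and 'vcpu'.
    A key missing from `limits` falls back to the physical total, which
    means no oversubscription for that resource.
    """
    limits = limits or {}
    failed = []
    for key in ("memory_mb", "disk_gb", "vcpu"):
        ceiling = limits.get(key, totals[key])
        if used.get(key, 0) + requested.get(key, 0) > ceiling:
            failed.append(key)
    return failed


# Hypothetical host and flavor values, for illustration only.
totals = {"memory_mb": 2499, "disk_gb": 4, "vcpu": 2}
used = {"memory_mb": 512, "disk_gb": 1, "vcpu": 1}
requested = {"memory_mb": 2048, "disk_gb": 20, "vcpu": 1}

# Without limits, memory and disk both exceed the physical totals.
print(failed_resources(requested, used, totals))

# With ceilings passed for memory and disk, the same claim fits.
print(failed_resources(requested, used, totals,
                       limits={"memory_mb": 3748.5, "disk_gb": 40}))
```

Treating a missing key as "no oversubscription" keeps the default conservative: a resource is only oversubscribed when a ceiling for it is passed explicitly.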

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (milestone-proposed)

Reviewed: https://review.openstack.org/13336
Committed: http://github.com/openstack/nova/commit/9d8fce85b10dc6436754040769c779b35453f4cb
Submitter: Jenkins
Branch: milestone-proposed

commit 9d8fce85b10dc6436754040769c779b35453f4cb
Author: Brian Elliott <email address hidden>
Date: Fri Sep 14 15:17:07 2012 +0000

    Fix bugs in resource tracker and cleanup

    Change-Id: I6ee851b8c03234a78a64d9f5c494dfc7059cdda4
    (cherry picked from commit 8e851409f3a8a345ec954a880c81232fbf9e27b4)

Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2