Host manager uses a different value for free disk than compute manager

Bug #1253599 reported by Phil Day
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: moorryan

Bug Description

There are two metrics in the system which describe how much disc space is
available on a compute host (both stored in compute_nodes):

free_gb (stored as free_disk_gb, the name used below) is calculated from the
maximum available space in the filesystem minus the amount of disc space
defined by the instance type of each instance on the host.

disk_available_least is calculated from the actual free space in the filesystem minus the disk space that is committed but not yet used by all instances that the hypervisor knows about (so if an instance has a 10GB disc and is currently using 2GB, an additional 8GB will be taken away from the actual free space).

Under normal conditions disk_available_least should therefore always be less than free_disk_gb (since it takes into account space in the filesystem that is consumed by things other than disks).

However, where an instance exists in the DB but not on the host, which can happen for some error conditions, free_disk_gb may be less than disk_available_least (since the instance which is only in the DB is not factored into disk_available_least).
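
To make the arithmetic concrete, here is a rough sketch of the two calculations as described above. It is not nova code: the Instance class, both function names and all the numbers are made up for illustration, and the first metric is the one called free_disk_gb later in this report.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    flavor_disk_gb: int  # disk size defined by the instance type
    used_gb: int         # space the instance's disk currently occupies


def calc_free_disk_gb(total_gb, db_instances):
    # Filesystem capacity minus the full flavor size of every instance in the DB.
    return total_gb - sum(i.flavor_disk_gb for i in db_instances)


def calc_disk_available_least(actual_free_gb, hypervisor_instances):
    # Actual free space minus space that is committed but not yet used,
    # counting only instances the hypervisor knows about.
    committed_unused = sum(i.flavor_disk_gb - i.used_gb
                           for i in hypervisor_instances)
    return actual_free_gb - committed_unused


# Hypothetical host: 100GB filesystem, 5GB used by other files,
# two 10GB instances each using 2GB, so 91GB is actually free.
a, b = Instance(10, 2), Instance(10, 2)
normal = (calc_free_disk_gb(100, [a, b]),
          calc_disk_available_least(91, [a, b]))

# Error case: a third 10GB instance exists only in the DB.
ghost = Instance(10, 0)
broken = (calc_free_disk_gb(100, [a, b, ghost]),
          calc_disk_available_least(91, [a, b]))

print(normal)  # disk_available_least is below free_disk_gb
print(broken)  # free_disk_gb has dropped below disk_available_least
```

With these numbers the normal case gives 80 and 75, and the error case gives 70 and 75, which is the inversion described above.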

Currently the scheduler (host manager) builds its view of the amount of free disk space from disk_available_least (if defined), using free_disk_gb only as a fallback if disk_available_least is None.

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L158

The compute manager's resource tracker, on the other hand, always uses free_disk_gb when deciding whether an instance fits.

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L387

In the case where disk_available_least > free_disk_gb, this leads to the scheduler sending requests to hosts which then reject them.
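
The mismatch can be sketched as follows. This is a simplified illustration rather than the actual host_manager.py or resource_tracker.py code; the function names and the 72GB request are hypothetical.

```python
def scheduler_view_gb(free_disk_gb, disk_available_least):
    # Scheduler side: prefer disk_available_least, fall back to free_disk_gb.
    if disk_available_least is not None:
        return disk_available_least
    return free_disk_gb


def compute_accepts(free_disk_gb, requested_gb):
    # Compute side: the claim is always checked against free_disk_gb.
    return requested_gb <= free_disk_gb


# Hypothetical host where an instance exists in the DB but not on the host.
free_disk_gb, disk_available_least = 70, 75
requested_gb = 72

scheduler_passes = requested_gb <= scheduler_view_gb(free_disk_gb,
                                                     disk_available_least)
host_accepts = compute_accepts(free_disk_gb, requested_gb)
print(scheduler_passes, host_accepts)
```

Here the scheduler considers the host suitable for a 72GB request, but the host rejects it because only 70GB is free by its own accounting.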

Clearly using two different metrics in this way is not healthy.

At a minimum the scheduler should use the minimum of the two values (since the "missing" VM may come back, it's not safe to just ignore it).

It would probably be better if the compute manager also did the same thing.

moorryan (moorryan)
Changed in nova:
assignee: nobody → moorryan (moorryan)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/57708

Phil Day (philip-day)
Changed in nova:
importance: Undecided → Medium
Phil Day (philip-day)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/57708
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=56688707138d2343a65e25f8c27586ffd44875cf
Submitter: Jenkins
Branch: master

commit 56688707138d2343a65e25f8c27586ffd44875cf
Author: Ryan Moore <email address hidden>
Date: Thu Nov 21 15:38:56 2013 +0000

    Correct host managers free disk calculation

    Take lower of disk_least_available/free_disk_gb if they exist.

    When an instance exists in the database but not on the hypervisor
    (which can happen in some error cases), the 'missing' instance
    is included in the calculation of 'free_gb' but not included in
    'disk_available_least'.
    Need to take the value that represents the minimum amount of free
    disk space available.

    Testing:
    COMPUTE_NODES in fakes.py includes the 4 possible test cases:
        disk_least_available or free_disk_gb = free_disk_mb
            None or 512 = 524288
            1024 or None = 1048576
            3333 or 3072 = 3145728
            8192 or 8888 = 8388608
       test: test_host_manager.py
               test_get_all_host_states:
                 Already has existing tests looking for these
                 cases - I've modifed the test data (in fakes.py)
                 rather than creating new test data.
                 Added check for Warning message for node 3 where
                 physical disk (3333) is greater than database (3072)
             No impact on other tests.

    Change-Id: I4a18bf023a64d0cb198f77aab9daecb0786e93ff
    Closes-Bug: 1253599
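
The selection rule in the commit message above can be sketched roughly as follows. This is an illustration only, not the merged change: the function name and the warning text are hypothetical, and disk_available_least is the column that the commit message writes as disk_least_available. The four input pairs and the expected free_disk_mb values are taken from the commit message.

```python
import logging

LOG = logging.getLogger(__name__)


def scheduler_free_disk_mb(disk_available_least, free_disk_gb):
    # Use whichever values are defined and take the lower one.
    known = [v for v in (disk_available_least, free_disk_gb) if v is not None]
    if not known:
        return None
    if len(known) == 2 and disk_available_least > free_disk_gb:
        # The hypervisor reports more free space than the database accounts
        # for, e.g. an instance that exists in the DB but not on the host.
        LOG.warning("disk_available_least (%s) exceeds free_disk_gb (%s)",
                    disk_available_least, free_disk_gb)
    return min(known) * 1024  # GB to MB


# The four cases listed in the commit message.
cases = [(None, 512), (1024, None), (3333, 3072), (8192, 8888)]
results = [scheduler_free_disk_mb(least, free) for least, free in cases]
print(results)
```

Only the third pair (3333 and 3072) takes the warning path, which matches the node 3 check mentioned in the testing notes.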

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-3
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1