AllocationCandidates.get_by_filters ignores shared RPs when the RC exists in both places

Bug #1724613 reported by Eric Fried
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Tetsuro Nakamura

Bug Description

When both the compute node resource provider and the shared resource provider have inventory in the same resource class, AllocationCandidates.get_by_filters will not return an AllocationRequest including the shared resource provider.

Example:

 cnrp { VCPU: 24,
        MEMORY_MB: 2048,
        DISK_GB: 16 }
 ssrp { DISK_GB: 32 }

 AllocationCandidates.get_by_filters(
     resources={ VCPU: 1,
                 MEMORY_MB: 512,
                 DISK_GB: 2 } )

Expected:

 allocation_requests: [
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512,
               DISK_GB: 2 } },
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512 },
       ssrp: { DISK_GB: 2 } },
 ]

Actual:

 allocation_requests: [
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512,
               DISK_GB: 2 } }
 ]

I will post a review shortly that demonstrates this.
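The expected result follows from treating each requested resource class separately: VCPU and MEMORY_MB can only come from cnrp, while DISK_GB could come from either cnrp or ssrp, so two combinations satisfy the request. A minimal Python sketch of that per-class view, assuming inventories are reduced to plain capacities (reserved amounts, allocation ratios and existing usage are ignored) and that ssrp is associated with cnrp through an aggregate; providers_per_class() is a hypothetical helper, not a nova function:

```python
# Inventory from the example above, reduced to plain capacities
# (assumption: reserved, allocation_ratio and current usage are ignored).
providers = {
    "cnrp": {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16},
    "ssrp": {"DISK_GB": 32},
}

request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}


def providers_per_class(providers, request):
    """Hypothetical helper: map each requested resource class to the
    providers that have enough capacity of it."""
    return {
        rc: sorted(name for name, inv in providers.items() if inv.get(rc, 0) >= amount)
        for rc, amount in request.items()
    }


print(providers_per_class(providers, request))
# {'VCPU': ['cnrp'], 'MEMORY_MB': ['cnrp'], 'DISK_GB': ['cnrp', 'ssrp']}
```

Two providers can serve DISK_GB, which is why two allocation requests are expected rather than one.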

Tags: placement
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/513149

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Jay Pipes (jaypipes) wrote :

This is by design. Non-sharing providers that have all the resources needed in the request are used as-is, and there is no attempt to create permutations of *some* of the non-sharing provider's resources with those of a sharing provider.

If you had, though, a second resource provider that only had VCPU and MEMORY_MB but no disk, and associated that second provider with the shared storage provider via an aggregate, you would see two allocation requests: one with all resources coming from the first compute node resource provider, and the other with VCPU and MEMORY_MB from the second compute node resource provider and DISK_GB from the shared storage provider.
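The "by design" behaviour described above can be modelled roughly as follows. This is a simplified, hypothetical Python sketch, not nova's actual code: candidates_with_shortcut() and the provider name cnrp2 are made up for illustration, every compute node is assumed to share an aggregate with every sharing provider, and inventories are plain capacities.

```python
import itertools


def candidates_with_shortcut(compute_nodes, sharing, request):
    """Hypothetical model of the pre-fix behaviour (not nova's code)."""
    def fits(inv, rc):
        return inv.get(rc, 0) >= request[rc]

    results = []
    for cn, inv in compute_nodes.items():
        if all(fits(inv, rc) for rc in request):
            # Shortcut: the node satisfies the whole request by itself, so it
            # is used as-is and no sharing provider is considered for it.
            results.append({cn: dict(request)})
            continue
        # Otherwise the node supplies what it can, and each missing class
        # must come from some sharing provider with enough capacity.
        missing = [rc for rc in request if not fits(inv, rc)]
        options = [[sp for sp, s_inv in sharing.items() if fits(s_inv, rc)]
                   for rc in missing]
        local = {rc: amt for rc, amt in request.items() if rc not in missing}
        for choice in itertools.product(*options):
            alloc = {cn: dict(local)} if local else {}
            for rc, sp in zip(missing, choice):
                alloc.setdefault(sp, {})[rc] = request[rc]
            results.append(alloc)
    return results


request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}
sharing = {"ssrp": {"DISK_GB": 32}}
compute_nodes = {
    "cnrp": {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16},
    "cnrp2": {"VCPU": 24, "MEMORY_MB": 2048},  # hypothetical disk-less node
}
for candidate in candidates_with_shortcut(compute_nodes, sharing, request):
    print(candidate)
# {'cnrp': {'VCPU': 1, 'MEMORY_MB': 512, 'DISK_GB': 2}}
# {'cnrp2': {'VCPU': 1, 'MEMORY_MB': 512}, 'ssrp': {'DISK_GB': 2}}
```

cnrp is returned alone because of the shortcut, and only the disk-less cnrp2 gets paired with ssrp, which gives the two allocation requests described in the comment.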

Changed in nova:
status: Confirmed → Invalid
Eric Fried (efried) wrote :

So if

a) the non-sharing RP's inventory in the common RC is exhausted or otherwise unsuitable for the request;

and/or

b) the sharing RP has a required trait that the non-sharing RP doesn't have

...then we would get a (single) candidate that gets the common resource from the sharing RP?
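Case (a) can be checked with a simplified model that subtracts existing usage from cnrp's capacity: once its DISK_GB is used up, the shortcut no longer applies and the single remaining candidate takes DISK_GB from ssrp. This is a hypothetical Python sketch: free() and candidates() are illustrative names, the usage figure is invented for the example, all missing resource classes are assumed to come from one sharing provider, and case (b), traits, is not modelled.

```python
def free(inv, used):
    """Remaining capacity per resource class (reserved/allocation ratio ignored)."""
    return {rc: total - used.get(rc, 0) for rc, total in inv.items()}


def candidates(cn, cn_free, shared_free, request):
    """Hypothetical single-node model of the pre-fix shortcut."""
    if all(cn_free.get(rc, 0) >= amt for rc, amt in request.items()):
        return [{cn: dict(request)}]  # shortcut: the node is used as-is
    local = {rc: amt for rc, amt in request.items() if cn_free.get(rc, 0) >= amt}
    missing = {rc: amt for rc, amt in request.items() if rc not in local}
    result = []
    for sp, inv in shared_free.items():
        # Assumption: every missing class comes from the same sharing provider.
        if all(inv.get(rc, 0) >= amt for rc, amt in missing.items()):
            result.append({cn: dict(local), sp: dict(missing)})
    return result


cnrp = {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16}
shared = {"ssrp": {"DISK_GB": 32}}
request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}

# Hypothetical usage: the local disk is fully consumed.
cn_free = free(cnrp, {"DISK_GB": 16})
print(candidates("cnrp", cn_free, shared, request))
# [{'cnrp': {'VCPU': 1, 'MEMORY_MB': 512}, 'ssrp': {'DISK_GB': 2}}]
```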

Eric Fried (efried) wrote :

Per the hangout, we decided this bug is valid: we would like to get extra candidates involving shared RPs when those satisfy the request.

Changed in nova:
status: Invalid → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/513149
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d4398f715f098b9edbc0be8612b6c079a8e607af
Submitter: Zuul
Branch: master

commit d4398f715f098b9edbc0be8612b6c079a8e607af
Author: Eric Fried <email address hidden>
Date: Tue Nov 7 09:30:21 2017 -0600

    Test alloc candidates with same RC in cn & shared

    This change set adds a couple of failing test cases that demonstrate
    holes in the design of GET /allocation_candidates when inventory from
    the same resource class is present on both the compute node (the "main"
    resource provider) and a shared resource provider.

    The example being used is where the compute node has some local disk,
    and is also associated with a shared storage pool. Both the compute
    node RP and the shared storage RP will provide inventory of DISK_GB.

    Test case test_common_rc demonstrates bug #1724613: when I ask for
    DISK_GB in this setup, the shared storage pool is ignored. I expect to
    get two candidates back: one with the storage from the compute node; the
    other with the storage from the shared storage pool. But I actually
    only get the former candidate back.

    Test case test_common_rc_traits_split shows bug #1724633: that placement
    can't tell which traits are supposed to apply to which resources. In
    the above scenario, if the local storage is SSD and the shared storage
    is RAID, and I ask for SSD + RAID, I "expect" to get back no hits. But
    I would in fact get back a candidate with the storage from the shared
    storage pool, because the cumulative set of traits would satisfy my
    requested SSD + RAID.

    Note that the two tests are functionally identical (traits are ignored
    entirely) until https://review.openstack.org/#/c/479766/ lands. At that
    point, depending on how we decide to implement the code that would deal
    with this scenario, the test may fail *differently* until bug #1724613
    is fixed.

    Related-Bug: #1724613
    Related-Bug: #1724633

    Change-Id: I42edf102379cf329aa2252ab779a9f945f5fc155
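The trait problem in test_common_rc_traits_split comes from checking required traits against the union of the traits of all providers in a candidate. A hypothetical Python sketch contrasting that cumulative check with a per-resource-class check; the per-class rule is swapped in purely for illustration and is not what placement implemented, and SSD and RAID are the informal labels from the commit message, not real os-traits constants.

```python
traits = {"cnrp": {"SSD"}, "ssrp": {"RAID"}}
required = {"SSD", "RAID"}
# Candidate with compute resources from cnrp and storage from the shared pool.
candidate = {"cnrp": {"VCPU": 1, "MEMORY_MB": 512}, "ssrp": {"DISK_GB": 2}}


def cumulative_match(candidate, traits, required):
    """Required traits checked against the union of every provider's traits."""
    combined = set().union(*(traits[p] for p in candidate))
    return required <= combined


def per_class_match(candidate, traits, required, rc):
    """Hypothetical rule: every provider supplying `rc` must carry all
    required traits itself."""
    suppliers = [p for p, alloc in candidate.items() if rc in alloc]
    return bool(suppliers) and all(required <= traits[p] for p in suppliers)


print(cumulative_match(candidate, traits, required))            # True
print(per_class_match(candidate, traits, required, "DISK_GB"))  # False
```

The cumulative check accepts the candidate even though no single disk provider is both SSD and RAID, which is the unexpected hit the test describes.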

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/533396

Changed in nova:
assignee: nobody → Tetsuro Nakamura (tetsuro0907)
status: Confirmed → In Progress
Changed in nova:
assignee: Tetsuro Nakamura (tetsuro0907) → Chris Dent (cdent)
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/553122

Changed in nova:
assignee: Chris Dent (cdent) → Tetsuro Nakamura (tetsuro0907)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/553122
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ecb09b29b743808a97751383dc6c7a0eefae2aa9
Submitter: Zuul
Branch: master

commit ecb09b29b743808a97751383dc6c7a0eefae2aa9
Author: Tetsuro Nakamura <email address hidden>
Date: Mon Mar 12 08:38:24 2018 +0900

    remove unnecessary short cut in placement

    When both the compute node resource provider and the shared
    resource provider have inventory in the same resource class,
    AllocationCandidates.get_by_filters didn't return an
    AllocationRequest including the shared resource provider.

    To fix the bug, we first remove the shortcut in the logic so that
    sharing providers are considered even if the non-sharing provider
    can satisfy the resource request by itself.

    Change-Id: Ibd509c5a59407da1db46c6c12b82f8707f655466
    Partial-Bug: #1724613
    Related-Bug: #1731072

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/533396
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=02e357e7c2d2ccbab0a1f6b5a807d11f1ef72d46
Submitter: Zuul
Branch: master

commit 02e357e7c2d2ccbab0a1f6b5a807d11f1ef72d46
Author: Tetsuro Nakamura <email address hidden>
Date: Sun Jan 14 17:09:05 2018 +0900

    Fix allocation_candidates not to ignore shared RPs

    When both the compute node resource provider and the shared
    resource provider have inventory in the same resource class,
    AllocationCandidates.get_by_filters didn't return an
    AllocationRequest including the shared resource provider.

    To fix the bug, this patch changes _alloc_candidates_with_shared()
    to consider resources from non-sharing providers and resources from
    sharing providers at the same time.

    Change-Id: Iaf23f35f2f9a5d27a814ef5b94abed1a8b365bc3
    Closes-Bug: #1724613
    Related-Bug: #1731072
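Considering non-sharing and sharing providers "at the same time" can be sketched as choosing a provider independently for each requested resource class and taking the Cartesian product of those choices. This Python sketch is a model under stated assumptions, not the code from the patch: alloc_candidates() is a hypothetical name, inventories are plain capacities, and the sharing providers are assumed to be associated with the compute node via an aggregate.

```python
import itertools


def alloc_candidates(compute_node, cn_inv, sharing, request):
    """Hypothetical sketch: pick a provider per resource class independently."""
    pool = {compute_node: cn_inv, **sharing}
    # For each requested class, every provider in the pool with enough capacity.
    options = [
        [p for p, inv in pool.items() if inv.get(rc, 0) >= amt]
        for rc, amt in request.items()
    ]
    results = []
    for choice in itertools.product(*options):
        alloc = {}
        for (rc, amt), provider in zip(request.items(), choice):
            alloc.setdefault(provider, {})[rc] = amt
        results.append(alloc)
    return results


for c in alloc_candidates(
    "cnrp",
    {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16},
    {"ssrp": {"DISK_GB": 32}},
    {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2},
):
    print(c)
# {'cnrp': {'VCPU': 1, 'MEMORY_MB': 512, 'DISK_GB': 2}}
# {'cnrp': {'VCPU': 1, 'MEMORY_MB': 512}, 'ssrp': {'DISK_GB': 2}}
```

Both candidates from the bug description's "Expected" list appear. Because the product grows with the number of sharing providers and resource classes, a real implementation likely needs more care than this sketch.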

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
