OpenStack Compute (nova)

AllocationCandidates.get_by_filters ignores shared RPs when the RC exists in both places

Bug #1724613 reported by Eric Fried on 2017-10-18

This bug affects 1 person

Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Medium      Tetsuro Nakamura

Bug Description

When both the compute node resource provider and the shared resource provider have inventory in the same resource class, AllocationCandidates.get_by_filters will not return an AllocationRequest including the shared resource provider.

Example:

cnrp { VCPU: 24,
       MEMORY_MB: 2048,
       DISK_GB: 16 }
ssrp { DISK_GB: 32 }

AllocationCandidates.get_by_filters(
     resources={ VCPU: 1,
                 MEMORY_MB: 512,
                 DISK_GB: 2 } )

Expected:

allocation_requests: [
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512,
               DISK_GB: 2 } },
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512 },
       ssrp: { DISK_GB: 2 } },
]

Actual:

allocation_requests: [
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512,
               DISK_GB: 2 } }
]

I will post a review shortly that demonstrates this.

OpenStack Infra (hudson-openstack) wrote on 2017-10-18: Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/513149

Sylvain Bauza (sylvain-bauza) on 2017-10-27

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Medium

Jay Pipes (jaypipes) wrote on 2017-11-09:

This is by design. Non-sharing providers that have all the resources needed in the request are used as-is, and there is no attempt to create permutations of *some* of the non-sharing provider's resources with those of a sharing provider.

If, however, you had a second resource provider with only VCPU and MEMORY_MB and no disk, and associated that second provider with the shared storage provider via an aggregate, you would see two allocation requests. One would have all resources coming from the first compute node resource provider. The other would have VCPU and MEMORY_MB from the second compute node resource provider and DISK_GB from the shared storage provider.
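
The following Python sketch models the "by design" shortcut described above. It is a toy model and not placement's SQL-based implementation. The helper names (`can_satisfy`, `shortcut_candidates`) are hypothetical, and inventories are plain totals with no reserved amount, allocation ratio or existing usage. Every compute node is assumed to share an aggregate with every sharing provider.

```python
def can_satisfy(inventory, request):
    """True if the inventory covers every requested amount (totals only)."""
    return all(inventory.get(rc, 0) >= amount for rc, amount in request.items())


def shortcut_candidates(compute_nodes, sharing_providers, request):
    """Hypothetical model of the pre-fix ("by design") candidate generation."""
    candidates = []
    for cn_name, cn_inv in compute_nodes.items():
        if can_satisfy(cn_inv, request):
            # Shortcut: the compute node covers the whole request on its own,
            # so no sharing provider is ever combined with it.
            candidates.append({cn_name: dict(request)})
            continue
        # Otherwise take what the compute node can supply locally and use the
        # first sharing provider that covers all of the remaining classes.
        local = {rc: amt for rc, amt in request.items()
                 if cn_inv.get(rc, 0) >= amt}
        missing = {rc: amt for rc, amt in request.items() if rc not in local}
        for sp_name, sp_inv in sharing_providers.items():
            if can_satisfy(sp_inv, missing):
                candidates.append({cn_name: local, sp_name: missing})
                break
    return candidates


# The scenario above: cn1 has local disk, cn2 has none; both share ssrp.
compute_nodes = {
    "cn1": {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16},
    "cn2": {"VCPU": 24, "MEMORY_MB": 2048},
}
sharing_providers = {"ssrp": {"DISK_GB": 32}}
request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}

for candidate in shortcut_candidates(compute_nodes, sharing_providers, request):
    print(candidate)
```

Under this model, cn1 yields only its all-local candidate even though ssrp could also supply its disk. That missing cn1 + ssrp combination is what this bug asks for.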

Changed in nova:
status:	Confirmed → Invalid

Eric Fried (efried) wrote on 2017-11-09:

So if

a) the non-sharing RP's inventory in the common RC is exhausted or otherwise unsuitable for the request;

and/or

b) the sharing RP has a required trait that the non-sharing RP doesn't have

...then we would get a (single) candidate that gets the common resource from the sharing RP?
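
Case (a) can be illustrated with a per-provider capacity check. The sketch assumes placement's usual capacity rule, `(total - reserved) * allocation_ratio - used`, and ignores min_unit, max_unit and step_size. The usage numbers are invented: the compute node has already handed out 15 of its 16 GB of local disk. The helpers `has_room` and `suppliers` are hypothetical.

```python
# Hypothetical inventory records; "used" stands in for existing allocations.
providers = {
    "cnrp": {
        "VCPU": {"total": 24, "used": 0},
        "MEMORY_MB": {"total": 2048, "used": 0},
        "DISK_GB": {"total": 16, "used": 15},  # local disk nearly exhausted
    },
    "ssrp": {"DISK_GB": {"total": 32, "used": 0}},
}
request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}


def has_room(record, amount):
    """Capacity check ignoring min_unit/max_unit/step_size."""
    capacity = (record["total"] - record.get("reserved", 0)) * record.get(
        "allocation_ratio", 1.0)
    return capacity - record["used"] >= amount


def suppliers(rc, amount):
    """Names of providers with room for `amount` of resource class `rc`."""
    return [name for name, inv in providers.items()
            if rc in inv and has_room(inv[rc], amount)]


for rc, amount in request.items():
    print(rc, suppliers(rc, amount))
# VCPU ['cnrp']
# MEMORY_MB ['cnrp']
# DISK_GB ['ssrp']
```

Each class has exactly one supplier here. The only candidate is therefore VCPU and MEMORY_MB from cnrp plus DISK_GB from ssrp, which is the single candidate the question anticipates.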

Eric Fried (efried) wrote on 2017-11-09:

Per hangout, we decided this bug is valid: we would like to get extra candidates involving shared RPs when those satisfy the request.

Changed in nova:
status:	Invalid → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2017-11-13: Related fix merged to nova (master)

Reviewed: https://review.openstack.org/513149
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d4398f715f098b9edbc0be8612b6c079a8e607af
Submitter: Zuul
Branch: master

commit d4398f715f098b9edbc0be8612b6c079a8e607af
Author: Eric Fried <email address hidden>
Date: Tue Nov 7 09:30:21 2017 -0600

Test alloc candidates with same RC in cn & shared

    This change set adds a couple of failing test cases that demonstrate
    holes in the design of GET /allocation_candidates when inventory from
    the same resource class is present on both the compute node (the "main"
    resource provider) and a shared resource provider.

    The example being used is where the compute node has some local disk,
    and is also associated with a shared storage pool. Both the compute
    node RP and the shared storage RP will provide inventory of DISK_GB.

    Test case test_common_rc demonstrates bug #1724613: when I ask for
    DISK_GB in this setup, the shared storage pool is ignored. I expect to
    get two candidates back: one with the storage from the compute node; the
    other with the storage from the shared storage pool. But I actually
    only get the former candidate back.

    Test case test_common_rc_traits_split shows bug #1724633: that placement
    can't tell which traits are supposed to apply to which resources. In
    the above scenario, if the local storage is SSD and the shared storage
    is RAID, and I ask for SSD + RAID, I "expect" to get back no hits. But
    I would in fact get back a candidate with the storage from the shared
    storage pool, because the cumulative set of traits would satisfy my
    requested SSD + RAID.

    Note that the two tests are functionally identical (traits are ignored
    entirely) until https://review.openstack.org/#/c/479766/ lands. At that
    point, depending on how we decide to implement the code that would deal
    with this scenario, the test may fail *differently* until bug #1724613
    is fixed.

Related-Bug: #1724613
Related-Bug: #1724633

Change-Id: I42edf102379cf329aa2252ab779a9f945f5fc155
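
The trait problem behind test_common_rc_traits_split comes down to where the trait check is applied. In the sketch below the trait names are assumptions: `STORAGE_DISK_SSD` is an os-traits name, while `CUSTOM_RAID` is an illustrative custom trait. Both check functions are hypothetical. Checking the union of traits across a candidate's providers accepts the mixed candidate. Checking only the provider that actually supplies DISK_GB rejects it.

```python
# Traits per provider, following the SSD/RAID scenario (names are assumptions).
traits = {
    "cnrp": {"STORAGE_DISK_SSD"},  # local disk is SSD
    "ssrp": {"CUSTOM_RAID"},       # shared pool is RAID
}
required = {"STORAGE_DISK_SSD", "CUSTOM_RAID"}

# Candidate that takes its disk from the shared pool.
candidate = {"cnrp": {"VCPU": 1, "MEMORY_MB": 512}, "ssrp": {"DISK_GB": 2}}


def cumulative_check(candidate):
    """Accept if the union of all providers' traits covers `required`."""
    union = set().union(*(traits[rp] for rp in candidate))
    return required <= union


def per_resource_check(candidate, rc):
    """Accept only if every provider supplying `rc` has all required traits."""
    return all(required <= traits[rp]
               for rp, resources in candidate.items() if rc in resources)


print(cumulative_check(candidate))               # True: SSD + RAID "satisfied"
print(per_resource_check(candidate, "DISK_GB"))  # False: ssrp is not SSD
```

Neither provider carries both traits, so a per-resource check yields the "no hits" outcome the commit message expects. The cumulative check reports a spurious match.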

OpenStack Infra (hudson-openstack) wrote on 2018-01-14: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/533396

Changed in nova:
assignee:	nobody → Tetsuro Nakamura (tetsuro0907)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) on 2018-03-14

Changed in nova:
assignee:	Tetsuro Nakamura (tetsuro0907) → Chris Dent (cdent)

OpenStack Infra (hudson-openstack) wrote on 2018-03-15:

Fix proposed to branch: master
Review: https://review.openstack.org/553122

Changed in nova:
assignee:	Chris Dent (cdent) → Tetsuro Nakamura (tetsuro0907)

OpenStack Infra (hudson-openstack) wrote on 2018-03-30: Fix merged to nova (master)

Reviewed: https://review.openstack.org/553122
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ecb09b29b743808a97751383dc6c7a0eefae2aa9
Submitter: Zuul
Branch: master

commit ecb09b29b743808a97751383dc6c7a0eefae2aa9
Author: Tetsuro Nakamura <email address hidden>
Date: Mon Mar 12 08:38:24 2018 +0900

remove unnecessary short cut in placement

    When both the compute node resource provider and the shared
    resource provider have inventory in the same resource class,
    AllocationCandidates.get_by_filters didn't return an
    AllocationRequest including the shared resource provider.

    To fix the bug, we first remove the shortcut in the logic so that
    sharing providers are considered even if the non-sharing provider can
    satisfy the resource request by itself.

    Change-Id: Ibd509c5a59407da1db46c6c12b82f8707f655466
    Partial-Bug: #1724613
    Related-Bug: #1731072

OpenStack Infra (hudson-openstack) wrote on 2018-03-31:

Reviewed: https://review.openstack.org/533396
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=02e357e7c2d2ccbab0a1f6b5a807d11f1ef72d46
Submitter: Zuul
Branch: master

commit 02e357e7c2d2ccbab0a1f6b5a807d11f1ef72d46
Author: Tetsuro Nakamura <email address hidden>
Date: Sun Jan 14 17:09:05 2018 +0900

Fix allocation_candidates not to ignore shared RPs

    To fix the bug, this patch changes the function of
    _alloc_candidates_with_shared() to consider resources from
    non-sharing providers and resources from sharing providers
    at the same time.

    Change-Id: Iaf23f35f2f9a5d27a814ef5b94abed1a8b365bc3
    Closes-Bug: #1724613
    Related-Bug: #1731072
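
The Python sketch below shows what "considering resources from non-sharing and sharing providers at the same time" means for the example in the bug description. It is not the body of `_alloc_candidates_with_shared()`, which works on database query results. The function `enumerate_candidates` and the plain-dict inventories are hypothetical, and capacity is compared against totals only. The compute node is assumed to be in an aggregate with the sharing provider. With no shortcut, each requested resource class may come from any provider able to supply it, and the combinations are taken with `itertools.product`.

```python
import itertools

# Inventories from the bug description (totals only).
providers = {
    "cnrp": {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16},
    "ssrp": {"DISK_GB": 32},
}
sharing = {"ssrp"}
request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}


def enumerate_candidates(compute, providers, sharing, request):
    """Hypothetical candidate enumeration without the non-sharing shortcut."""
    per_rc_options = []
    for rc, amount in request.items():
        # The compute node plus every sharing provider with enough inventory.
        options = [name for name in [compute, *sorted(sharing)]
                   if providers[name].get(rc, 0) >= amount]
        per_rc_options.append([(rc, name) for name in options])
    candidates = []
    # One candidate per combination of (resource class -> provider) choices;
    # if any class has no supplier, product() yields nothing.
    for combo in itertools.product(*per_rc_options):
        allocation = {}
        for rc, name in combo:
            allocation.setdefault(name, {})[rc] = request[rc]
        candidates.append(allocation)
    return candidates


for candidate in enumerate_candidates("cnrp", providers, sharing, request):
    print(candidate)
```

This reproduces the two "Expected" allocation requests from the bug description: all resources from cnrp, and VCPU/MEMORY_MB from cnrp with DISK_GB from ssrp. A real implementation also has to handle reserved amounts, allocation ratios, existing usage and aggregate membership, which this sketch leaves out.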