AllocationCandidates.get_by_filters hits incorrectly when traits are split across the main RP and aggregates

Bug #1724633 reported by Eric Fried on 2017-10-18
Affects: OpenStack Compute (nova)

Bug Description

When requesting multiple resources with multiple traits, placement doesn't know that a particular trait needs to be associated with a particular resource. As currently conceived, it will return allocation candidates from the main RP plus shared RPs such that all traits are satisfied cumulatively. This is bad, particularly when the main RP and shared RPs provide inventory of the same resource class.

For example, consider a compute node that has local SSD storage and is also associated with a shared storage RP backed by a RAID5 array:

 cnrp { VCPU: 24,
        MEMORY_MB: 2048,
        DISK_GB: 16,
        traits: [HW_CPU_X86_SSE,
                 STORAGE_DISK_SSD] }
 ssrp { DISK_GB: 32,
        traits: [STORAGE_DISK_RAID5] }

A request for SSD + RAID5 disk should *not* return any results from the above setup, because there's not actually any disk with both of those characteristics.

     resources={ VCPU: 1,
                 MEMORY_MB: 512,
                 DISK_GB: 2 },
     traits= [HW_CPU_X86_SSE,
              STORAGE_DISK_SSD,
              STORAGE_DISK_RAID5]

Expected result:

 allocation_requests: []

Actual result:

 allocation_requests: [
     { cnrp: { VCPU: 1,
               MEMORY_MB: 512 },
       ssrp: { DISK_GB: 2 } },
 ]
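
To make the difference concrete, here is a minimal Python sketch of the two ways a candidate could be checked against the requested traits. It is not placement's actual code: the PROVIDERS dict, the function names, and especially the TRAIT_RC mapping from trait to resource class are assumptions made for illustration (placement has no such mapping today, which is the point of this bug).

```python
# Hypothetical model of the example in this report; not placement's real
# data structures.
PROVIDERS = {
    "cnrp": {"traits": {"HW_CPU_X86_SSE", "STORAGE_DISK_SSD"}},
    "ssrp": {"traits": {"STORAGE_DISK_RAID5"}},
}

REQUIRED = {"HW_CPU_X86_SSE", "STORAGE_DISK_SSD", "STORAGE_DISK_RAID5"}

# Assumption: we somehow know which resource class each trait describes.
TRAIT_RC = {
    "HW_CPU_X86_SSE": "VCPU",
    "STORAGE_DISK_SSD": "DISK_GB",
    "STORAGE_DISK_RAID5": "DISK_GB",
}

# Candidate that takes disk from the shared provider.
SPLIT = {"cnrp": {"VCPU": 1, "MEMORY_MB": 512}, "ssrp": {"DISK_GB": 2}}
# Candidate that takes everything from the compute node.
LOCAL = {"cnrp": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}}


def cumulative_match(candidate, required):
    """Behavior described in this bug: pool the traits of every provider
    in the candidate and compare against the request as a whole."""
    pooled = set()
    for rp in candidate:
        pooled |= PROVIDERS[rp]["traits"]
    return required <= pooled


def per_resource_match(candidate, trait_rc, required):
    """Stricter check: each trait must be present on the provider that
    supplies the resource class the trait describes."""
    for trait in required:
        rc = trait_rc[trait]
        provider = next(rp for rp, res in candidate.items() if rc in res)
        if trait not in PROVIDERS[provider]["traits"]:
            return False
    return True
```

With these definitions, cumulative_match(SPLIT, REQUIRED) is true even though no single disk is both SSD and RAID5, while per_resource_match(SPLIT, TRAIT_RC, REQUIRED) rejects the candidate. Both checks reject LOCAL, since the compute node alone has no RAID5 trait.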

I will post a review shortly with a test case that demonstrates this. Note, however, that the test will spuriously pass until is fixed.

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium

Submitter: Zuul
Branch: master

commit d4398f715f098b9edbc0be8612b6c079a8e607af
Author: Eric Fried <email address hidden>
Date: Tue Nov 7 09:30:21 2017 -0600

    Test alloc candidates with same RC in cn & shared

    This change set adds a couple of failing test cases that demonstrate
    holes in the design of GET /allocation_candidates when inventory from
    the same resource class is present on both the compute node (the "main"
    resource provider) and a shared resource provider.

    The example being used is where the compute node has some local disk,
    and is also associated with a shared storage pool. Both the compute
    node RP and the shared storage RP will provide inventory of DISK_GB.

    Test case test_common_rc demonstrates bug #1724613: when I ask for
    DISK_GB in this setup, the shared storage pool is ignored. I expect to
    get two candidates back: one with the storage from the compute node; the
    other with the storage from the shared storage pool. But I actually
    only get the former candidate back.

    Test case test_common_rc_traits_split shows bug #1724633: that placement
    can't tell which traits are supposed to apply to which resources. In
    the above scenario, if the local storage is SSD and the shared storage
    is RAID, and I ask for SSD + RAID, I "expect" to get back no hits. But
    I would in fact get back a candidate with the storage from the shared
    storage pool, because the cumulative set of traits would satisfy my
    requested SSD + RAID.

    Note that the two tests are functionally identical (traits are ignored
    entirely) until lands. At that
    point, depending on how we decide to implement the code that would deal
    with this scenario, the test may fail *differently* until bug #1724613
    is fixed.

    Related-Bug: #1724613
    Related-Bug: #1724633

    Change-Id: I42edf102379cf329aa2252ab779a9f945f5fc155
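
For reference, the expectation that test_common_rc encodes for bug #1724613 can be sketched in Python as below. This is only an illustration under stated assumptions: the provider names and inventory amounts are borrowed from the example in the bug description, the candidates() helper is hypothetical, and aggregate association between the compute node and the shared pool is taken for granted rather than modeled.

```python
from itertools import product

# Hypothetical inventories; amounts reuse the example in the description.
INVENTORY = {
    "cnrp": {"VCPU": 24, "MEMORY_MB": 2048, "DISK_GB": 16},
    "ssrp": {"DISK_GB": 32},
}

REQUEST = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 2}


def candidates(request, inventory):
    """Return every way to pick one provider per requested resource class.

    Traits are ignored here, as they are in test_common_rc.
    """
    classes = sorted(request)
    # For each resource class, list the providers with enough inventory.
    options = [
        [rp for rp, inv in sorted(inventory.items())
         if inv.get(rc, 0) >= request[rc]]
        for rc in classes
    ]
    result = []
    for choice in product(*options):
        alloc = {}
        for rc, rp in zip(classes, choice):
            alloc.setdefault(rp, {})[rc] = request[rc]
        result.append(alloc)
    return result


for alloc in candidates(REQUEST, INVENTORY):
    print(alloc)
```

Under these assumptions the helper yields two candidates: one with DISK_GB taken from cnrp and one with DISK_GB taken from ssrp. The commit message reports that only the first of these is actually returned.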
