AllocationCandidates.get_by_filters returns garbage with only sharing providers

Bug #1730730 reported by Eric Fried
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Balazs Gibizer

Bug Description

If my placement database is set up with only sharing providers (no "compute nodes"), the results are broken.

Steps to reproduce
==================
Here's one example:

SS1 has inventory in IPV4_ADDRESS, SRIOV_NET_VF, and DISK_GB.
SS2 has inventory in just DISK_GB.

Both are associated with the same aggregate; both have the MISC_SHARES_VIA_AGGREGATE trait.

I make a request for resources in all three classes (in amounts that can be satisfied by those inventories).

Expected result
===============
It is unclear what the expected result is. There is a school of thought that we are only dealing with compute hosts right now, so we should never get back a candidate that doesn't include a compute host. In that case, this scenario should yield *zero* candidates.

On the other hand, in the long-term vision of placement, there should be no reason not to support scenarios where allocations are made *only* against sharing providers (as long as they're in the same aggregate for a given candidate). In that case, this scenario should yield two candidates:

One that gets all its resources from SS1;
One that gets DISK_GB from SS2, and IPV4_ADDRESS and SRIOV_NET_VF from SS1.

Actual result
=============
The actual result is three candidates:

One that gets all its resources from SS1 (cool);
One that gets DISK_GB from SS2 and IPV4_ADDRESS from SS1 (not cool - SRIOV_NET_VF isn't in here!)
One that gets DISK_GB from SS2 and SRIOV_NET_VF from SS1 (not cool - IPV4_ADDRESS isn't in here!)

I will post a functional test to demonstrate this.

Tags: placement
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/518382

Eric Fried (efried)
tags: added: placement
Revision history for this message
Eric Fried (efried) wrote :

Update: after discussing with Jay, we decided there's no reason to restrict candidates coming from only shared providers; so the official "Expected result" is that second chunk. That is, this scenario should yield two candidates:

One that gets all its resources from SS1;
One that gets DISK_GB from SS2, and IPV4_ADDRESS and SRIOV_NET_VF from SS1.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/519380

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/518382
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b80e885e0e9685757dc8f9364729a6fbf9be32d2
Submitter: Zuul
Branch: master

commit b80e885e0e9685757dc8f9364729a6fbf9be32d2
Author: Eric Fried <email address hidden>
Date: Tue Nov 7 12:14:33 2017 -0600

    Test allocation_candidates with only sharing RPs

    This change set contains a functional test demonstrating broken-ness
    in AllocationCandidates.get_by_filters when all the providers in the
    database are sharing providers.

    Change-Id: Ifc5f0b11e54d937d960d5dd9102e6287ee7320d8
    Related-Bug: #1730730

Revision history for this message
Eric Fried (efried) wrote :

gibi proposed:
https://review.openstack.org/#/c/519380/

(not sure why the bot didn't pick it up)

Note that the behavior changes when tested against the series starting at https://review.openstack.org/#/c/516778/ -- now we don't get the partial candidates anymore (the ones missing some resources); but we do still get duplicate entries.

Revision history for this message
Chris Dent (cdent) wrote :

Changing the status to triaged since there are tests that exercise it.

Changed in nova:
status: New → Triaged
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
status: Triaged → In Progress
Revision history for this message
Eric Fried (efried) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/516782
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ea63b3b40f0abddce22a02f79ff9275d55d37407
Submitter: Zuul
Branch: master

commit ea63b3b40f0abddce22a02f79ff9275d55d37407
Author: Jay Pipes <email address hidden>
Date: Tue Oct 31 16:03:33 2017 -0400

    finish refactor AllocCandidates._get_by_filters()

    This patch completes the refactoring of the
    AllocationCandidates._get_by_filters() mega-method by splitting out the
    remaining pieces of code that handle the construction of the
    part-shared, part-non-shared allocation requests for when there are
    sharing providers in a deployment.

    The split-out _alloc_candidates_with_shared() is still a long, complex
    method and I'll continue trying to reduce the complexity there and break
    functions out of it further. I did uncomplicate a part of the function
    with use of the itertools.product() function to replace a faulty use of
    zip() against a list of allocation request resource lists.

    Closes-Bug: #1730730
    Change-Id: Ibcfd1a28f36d3c7ccd26fcc6386207e4a25300d4

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b2

This issue was fixed in the openstack/nova 17.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/519380
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fe17f9d15a8eed9104e470e49ba2c19ad6d6f7f8
Submitter: Zuul
Branch: master

commit fe17f9d15a8eed9104e470e49ba2c19ad6d6f7f8
Author: Balazs Gibizer <email address hidden>
Date: Mon Nov 13 15:16:25 2017 +0100

    Test alloc_cands with non overlapping sharing RPs

    This change set contains an additional functional test demonstrating
    broken-ness in AllocationCandidates.get_by_filters when all the
    providers in the database are sharing providers but all of them provides
    resources from unique resource classes.

    Change-Id: Ibb4b147d79a3757ae163bdf3f790bbcee59ecbda
    Related-Bug: #1730730

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/528591

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/528591
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=449f3402dbe86917fedb02b9b176ec532e6f6008
Submitter: Zuul
Branch: master

commit 449f3402dbe86917fedb02b9b176ec532e6f6008
Author: Tetsuro Nakamura <email address hidden>
Date: Sun Dec 17 13:01:40 2017 +0900

    Fix duplicate allocation candidates

    Some combinations of resource providers were duplicated in
    allocation candidates.

    This can happen when multiple sharing RPs are in the same aggregates,
    because both sharing RPs visit each other during the algorithm to
    see if it can be used as an allocation candidate's mate.

    This patch adds to _alloc_candidates_with_shared() function a check
    if we already have that combination in allocation candidates by
    adding `alloc_prov_ids`, which is a list of sets of allocation
    candidates's rp_ids.

    Change-Id: Iefd3137ec68029d27081a743e52e327625585081
    Closes-Bug:#1730730

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b3

This issue was fixed in the openstack/nova 17.0.0.0b3 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.