Report client placement cache consistency is broken

Bug #1746075 reported by Eric Fried
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Eric Fried

Bug Description

Today the report client makes assumptions about how resource provider generation is calculated by the placement service. Specifically, that it starts at zero [1], and that it increases by 1 when the provider's inventory is deleted [2].

While these assumptions happen to be true today [3], they are not a documented part of the placement API. Which either means we need to document this behavior; or clients should not be relying on it.

[1] https://github.com/openstack/nova/blob/b214dfc41928d9e05199263301f8e5b23555c170/nova/scheduler/client/report.py#L552
[2] https://github.com/openstack/nova/blob/b214dfc41928d9e05199263301f8e5b23555c170/nova/scheduler/client/report.py#L927
[3] The latter more broadly stated as "increases by 1 when anything about the provider changes" - except we have a known hole for aggregates (see https://blueprints.launchpad.net/nova/+spec/placement-aggregate-generation)

Tags: placement
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/539324

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/539712

Andrey Volkov (avolkov)
Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/539712
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26c593c91f008caab92ed52156dfe2d898955d3f
Submitter: Zuul
Branch: master

commit 26c593c91f008caab92ed52156dfe2d898955d3f
Author: Eric Fried <email address hidden>
Date: Tue Jan 30 21:53:52 2018 -0600

    Avoid inventory DELETE API (no conflict detection)

    SchedulerReportClient._delete_inventory used the DELETE
    /resource_providers/{u}/inventories API, which does not provide a way to
    send the generation down (see bug 1746373), and is therefore subject to
    concurrency errors.

    This change set removes the _delete_inventory method and uses PUT
    /resource_providers/{u}/inventories with an empty 'inventories' dict
    instead, because that API *does* take the generation in the payload.

    Doing this makes moot part of bug 1746075 by removing one of the code
    paths that exploits undocumented knowledge of how placement uses the
    generation counter.

    Change-Id: Ia8131b32debd56b28315ef430ee4e8888b6f56e7
    Closes-Bug: #1746374
    Related-Bug: #1746373
    Related-Bug: #1746075

Revision history for this message
Eric Fried (efried) wrote :

#agreed at the PTG that the generation *should* be opaque.

The fix in comment #3 takes care of case [2] from the description.

In order to fix case [1], we're going to need a way to get the generation from a freshly created provider. We could follow POST /resource_providers with GET $LOCATION, but we'd rather just be able to glean the generation from the POST response. Blueprint generation-from-create-provider [1] will make that happen. Then we'll consume it from the report client for case [2] to close this bug.

[1] https://blueprints.launchpad.net/nova/+spec/generation-from-create-provider

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/548975

Changed in nova:
assignee: nobody → Eric Fried (efried)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/548975
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=60c62f7267e578365720f85e4fd068b8d9b290df
Submitter: Zuul
Branch: master

commit 60c62f7267e578365720f85e4fd068b8d9b290df
Author: Eric Fried <email address hidden>
Date: Thu Mar 1 15:41:07 2018 +0000

    Stop assuming initial provider generation is 0

    SchedulerReportClient._create_resource_provider used to assume the
    generation of a freshly-created resource provider is zero, and store
    that value in the provider tree cache. This violates the opacity of the
    generation in the API.

    The fix implemented herein is two-pronged. We exploit the
    implementation of the related blueprint, returning the payload of the
    POST /resource_providers response if available. Otherwise (if the
    available placement microversion is older), rather than assuming
    anything about the new provider, we perform a GET
    /resource_providers/{uuid} and return the result (which contains the
    generation).

    Change-Id: I8e0a37100b0876916db50b08f95f427cb0456616
    Closes-Bug: #1746075
    Related-Blueprint: generation-from-create-provider

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/539324
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7866a016f604c527e97662484c61856e9804dbfa
Submitter: Zuul
Branch: master

commit 7866a016f604c527e97662484c61856e9804dbfa
Author: Eric Fried <email address hidden>
Date: Tue Jan 30 15:10:03 2018 -0600

    Make generation optional in ProviderTree

    Since ProviderTree methods are expected to be used in
    ComputeDriver.update_provider_tree [1], wherein the generation should
    not be specified (it will be ignored by
    SchedulerReportClient.update_from_provider_tree [2] anyway), this change
    makes the generation argument optional for ProviderTree methods new_root
    and update_inventory. (It was already optional for all the other
    methods.)

    TODOs are added to deal with generation properly from some places in the
    report client - see the related bug.

    [1] https://review.openstack.org/#/c/521187/30/nova/virt/driver.py
    [2] https://review.openstack.org/#/c/533821/15/nova/scheduler/client/report.py@1454

    Change-Id: I0d979802865f22195f53d461ab0dd1df6334a066
    Related-Bug: #1746075
    blueprint: update-provider-tree

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.