Report client doesn't handle RP create conflict (409) properly

Bug #1735430 reported by Eric Fried
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Eric Fried
Ocata
Fix Committed
Medium
Eric Fried
Pike
Fix Committed
Medium
Eric Fried

Bug Description

POST /resource_providers can fail with conflict (HTTP status 409) for (at least) two reasons: A provider with the specified UUID exists; *or* a provider with the specified *name* already exists.

In SchedulerReportClient, _ensure_resource_provider uses helper method _create_resource_provider, whose logic goes like this:

 POST /resource_provider { 'uuid': <uuid>, 'name': <name> }
 if 201:
     cool, return the result
 if 409:
     LOG("Another thread created a provider with this *UUID*")
     GET /resource_provider/<uuid>
     if 200:
         cool, return the result
     if 404 or any other error:
         return None
 if any other error:
     return None

PROBLEM: If a provider exists with the desired *name* (but a different UUID), this code will always return None (via that 404 path).

PROBLEM: Nobody up the stack is checking the return for None.

What this effectively means is that _ensure_resource_provider... doesn't.

IMO we should raise an exception in these error paths, forcing consuming code to handle them explicitly. But at the very least, any code consumuing _ensure_resource_provider needs to validate that it succeeds.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/524263

Changed in nova:
assignee: nobody → Eric Fried (efried)
status: New → In Progress
Eric Fried (efried)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/524618

Chris Dent (cdent)
tags: added: placement resource-tracker
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/524263
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=112cd9cd1f31e091920e2f55fb213f78152dfd37
Submitter: Zuul
Branch: master

commit 112cd9cd1f31e091920e2f55fb213f78152dfd37
Author: Eric Fried <email address hidden>
Date: Thu Nov 30 11:39:11 2017 -0600

    Proper error handling by _ensure_resource_provider

    Previously, if _ensure_resource_provider encountered any error from the
    placement REST API, it would (sometimes log a message and) return None.

    Furthermore, a name conflict while creating the provider was treated the
    same as a UUID conflict, which would actually result in None being
    returned.

    With this change set, the error paths that previously returned None now
    raise one of the new ResourceProviderRetrievalFailed or
    ResourceProviderCreationFailed exceptions; and the name conflict path is
    detected and treated as an error condition.

    Note: This change set only touches the SchedulerReportClient side of
    these error conditions - it makes no attempt to add error handling to
    its callers. Case in point, the API samples tests needed fixing because
    they were previously running into the name conflict error condition, but
    not noticing. As currently implemented, the new exceptions will
    percolate up to ComputeManager.update_available_resource_for_node like
    any others coming from SchedulerReportClient, where they will be logged
    and ignored.

    Change-Id: I0c4ca6a81f213277fe7219cb905a805712f81e36
    Closes-Bug: #1735430

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/525309

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b2

This issue was fixed in the openstack/nova 17.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/524618
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=20531dbf89c77b6c63ad43285ccde20f176b6b23
Submitter: Zuul
Branch: stable/pike

commit 20531dbf89c77b6c63ad43285ccde20f176b6b23
Author: Eric Fried <email address hidden>
Date: Thu Nov 30 11:39:11 2017 -0600

    Proper error handling by _ensure_resource_provider

    Previously, if _ensure_resource_provider encountered any error from the
    placement REST API, it would (sometimes log a message and) return None.

    Furthermore, a name conflict while creating the provider was treated the
    same as a UUID conflict, which would actually result in None being
    returned.

    With this change set, the error paths that previously returned None now
    raise one of the new ResourceProviderRetrievalFailed or
    ResourceProviderCreationFailed exceptions; and the name conflict path is
    detected and treated as an error condition.

    Note: This change set only touches the SchedulerReportClient side of
    these error conditions - it makes no attempt to add error handling to
    its callers. Case in point, the API samples tests needed fixing because
    they were previously running into the name conflict error condition, but
    not noticing. As currently implemented, the new exceptions will
    percolate up to ComputeManager.update_available_resource_for_node like
    any others coming from SchedulerReportClient, where they will be logged
    and ignored.

    Closes-Bug: #1735430
    (cherry picked from commit 112cd9cd1f31e091920e2f55fb213f78152dfd37)

    Conflicts:
     nova/scheduler/client/report.py
            - Log translation removal
            - ProviderTree integration
     nova/tests/unit/scheduler/client/test_report.py
            - ksa Adapter conversion
     nova/tests/functional/test_list_servers_ip_filter.py
            - Placement fixture needs to start before compute service

    Change-Id: I0c4ca6a81f213277fe7219cb905a805712f81e36

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/525309
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44912b5fefd94d3cbff1e308fecb0589e7928932
Submitter: Zuul
Branch: stable/ocata

commit 44912b5fefd94d3cbff1e308fecb0589e7928932
Author: Eric Fried <email address hidden>
Date: Thu Nov 30 11:39:11 2017 -0600

    Proper error handling by _ensure_resource_provider

    Previously, if _ensure_resource_provider encountered any error from the
    placement REST API, it would (sometimes log a message and) return None.

    Furthermore, a name conflict while creating the provider was treated the
    same as a UUID conflict, which would actually result in None being
    returned.

    With this change set, the error paths that previously returned None now
    raise one of the new ResourceProviderRetrievalFailed or
    ResourceProviderCreationFailed exceptions; and the name conflict path is
    detected and treated as an error condition.

    Note: This change set only touches the SchedulerReportClient side of
    these error conditions - it makes no attempt to add error handling to
    its callers. Case in point, the API samples tests needed fixing because
    they were previously running into the name conflict error condition, but
    not noticing. As currently implemented, the new exceptions will
    percolate up to ComputeManager.update_available_resource_for_node like
    any others coming from SchedulerReportClient, where they will be logged
    and ignored.

    Closes-Bug: #1735430
    (cherry picked from commit 112cd9cd1f31e091920e2f55fb213f78152dfd37)

    Conflicts:
     nova/scheduler/client/report.py
            - Log translation removal
            - ProviderTree integration
     nova/tests/unit/scheduler/client/test_report.py
            - ksa Adapter conversion
     nova/tests/functional/test_list_servers_ip_filter.py
            - Placement fixture needs to start before compute service

    (cherry picked from commit 20531dbf89c77b6c63ad43285ccde20f176b6b23)

    Conflicts:
      No 2.53 microversion in Ocata:
     doc/api_samples/os-hypervisors/v2.53/hypervisors-detail-resp.json
     nova/tests/functional/api_sample_tests/api_samples/os-hypervisors/v2.53/hypervisors-detail-resp.json.tpl
     nova/tests/functional/test_list_servers_ip_filter.py
     nova/tests/functional/api_sample_tests/test_hypervisors.py
      Return type changed, placement request ID not yet in logs:
     nova/scheduler/client/report.py
     nova/tests/unit/scheduler/client/test_report.py

    Change-Id: I0c4ca6a81f213277fe7219cb905a805712f81e36

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.1

This issue was fixed in the openstack/nova 15.1.1 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.