Resize test fails in conductor during migration/instance allocation swap: "Unable to replace resource claim on source host"

Bug #1728722 reported by Matt Riedemann
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Dan Smith

Bug Description

Resize tests are intermittently failing in the gate:

http://logs.openstack.org/96/516396/1/check/legacy-tempest-dsvm-py35/ecb9db4/logs/screen-n-super-cond.txt.gz?level=TRACE#_Oct_30_18_01_18_003148

Oct 30 18:01:18.003148 ubuntu-xenial-inap-mtl01-0000586035 nova-conductor[22452]: ERROR nova.conductor.tasks.migrate [None req-2818e7b7-6881-4cfb-ae79-1816cb948748 tempest-ListImageFiltersTestJSON-1403553182 tempest-ListImageFiltersTestJSON-1403553182] [instance: f5aec132-8a62-47a5-a967-8e5d18a9c6f8] Unable to replace resource claim on source host ubuntu-xenial-inap-mtl01-0000586035 node ubuntu-xenial-inap-mtl01-0000586035 for instance

The request in the placement logs starts here:

http://logs.openstack.org/96/516396/1/check/legacy-tempest-dsvm-py35/ecb9db4/logs/screen-placement-api.txt.gz#_Oct_30_18_01_16_940644

Oct 30 18:01:17.993287 ubuntu-xenial-inap-mtl01-0000586035 <email address hidden>[15936]: DEBUG nova.api.openstack.placement.wsgi_wrapper [None req-7eec8dd2-f65c-43fa-b3df-cdf7a236aa03 service placement] Placement API returning an error response: Inventory changed while attempting to allocate: Another thread concurrently updated the data. Please retry your update {{(pid=15938) call_func /opt/stack/new/nova/nova/api/openstack/placement/wsgi_wrapper.py:31}}
Oct 30 18:01:17.994558 ubuntu-xenial-inap-mtl01-0000586035 <email address hidden>[15936]: INFO nova.api.openstack.placement.requestlog [None req-7eec8dd2-f65c-43fa-b3df-cdf7a236aa03 service placement] 198.72.124.85 "PUT /placement/allocations/52b215a6-0d60-4fcc-8389-2645ffb22562" status: 409 len: 305 microversion: 1.8

The error from placement is a bit misleading. It's probably not that inventory changed, but that allocations changed in the meantime: since this is a single-node environment, capacity changed under the request, and conductor needs to retry, just like the scheduler does.
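
For illustration, the retry the conductor needs would look roughly like this. This is a minimal sketch, assuming a keystoneauth-style session whose put() returns a response with status_code and text; the substring check mirrors the 409 body quoted above and is an assumption, not the actual report client code:

    def swap_allocations(session, consumer_uuid, payload, max_attempts=3):
        """PUT the swapped allocations, retrying placement's
        concurrent-update 409 the way the scheduler already does."""
        url = '/allocations/%s' % consumer_uuid
        for _ in range(max_attempts):
            resp = session.put(url, json=payload)
            if resp.status_code != 409:
                return resp
            if 'concurrently updated' not in resp.text:
                # A different kind of conflict; retrying won't help.
                return resp
            # Allocations changed underneath us; the target amounts are
            # unchanged, so retry the same payload against fresh state.
        return resp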

Matt Riedemann (mriedem)
Changed in nova:
status: New → Triaged
Revision history for this message
Matt Riedemann (mriedem) wrote :

We basically want something like this in the migration code that's swapping the allocations:

https://github.com/openstack/nova/blob/965f56d7d2ca1f668f70d24d4dcc20e418bb5b9c/nova/scheduler/client/report.py#L1013

Or we could implement this TODO in the placement server code, but that would require a microversion so the client can know whether the server-side retry is there to rely on, or whether the client has to perform the retries itself:

https://github.com/openstack/nova/blob/965f56d7d2ca1f668f70d24d4dcc20e418bb5b9c/nova/objects/resource_provider.py#L1887
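
If the retry lived in the placement server instead, it would amount to a compare-and-swap loop on the resource provider generation. A rough sketch under stated assumptions: write_allocations here is a hypothetical callable standing in for the transactional write, and the microversion signalling mentioned above is omitted:

    class ConcurrentUpdateDetected(Exception):
        """The provider generation changed between read and write."""

    def set_allocations_with_retry(write_allocations, allocations,
                                   max_retries=3):
        for _ in range(max_retries):
            try:
                return write_allocations(allocations)
            except ConcurrentUpdateDetected:
                # Another thread bumped the generation; re-read usage
                # and try the write again before giving up.
                continue
        # Out of retries: surface the conflict as the 409 clients see today.
        raise ConcurrentUpdateDetected()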

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/516708

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/516708
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=62d35009755a45c39a0b1cdc4c69791f469e469e
Submitter: Zuul
Branch: master

commit 62d35009755a45c39a0b1cdc4c69791f469e469e
Author: Dan Smith <email address hidden>
Date: Tue Oct 31 07:46:36 2017 -0700

    Make put_allocations() retry on concurrent update

    This adds a retries decorator to the scheduler report client
    and modifies put_allocations() so that it will detect a concurrent
    update, raising the Retry exception to trigger the decorator.

    This should be usable by other methods in the client easily, but
    this patch only modifies put_allocations() to fix the bug.

    Change-Id: Ic32a54678dd413668f02e77d5e6c4195664ac24c
    Closes-Bug: #1728722
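
The shape of the fix, per the commit message above, is a retries decorator triggered by a Retry exception. A sketch of that pattern (the Retry and retries names follow the commit message; the retry count, sleep, and HTTP plumbing are illustrative, not the merged code):

    import functools
    import time

    class Retry(Exception):
        """Signal that the call hit a concurrent update and should re-run."""

    def retries(func):
        """Re-invoke the wrapped method when it raises Retry."""
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            for _ in range(3):
                try:
                    return func(self, *args, **kwargs)
                except Retry:
                    time.sleep(0.1)  # brief pause before trying again
            # Final attempt; a Retry here propagates to the caller.
            return func(self, *args, **kwargs)
        return wrapper

    class ReportClient(object):
        def __init__(self, put):
            self._put = put  # hypothetical HTTP PUT callable

        @retries
        def put_allocations(self, url, payload):
            resp = self._put(url, payload)
            if resp.status_code == 409 and 'concurrently updated' in resp.text:
                # Placement rejected the write; raising Retry triggers
                # the decorator to re-run this method.
                raise Retry('concurrent update on %s' % url)
            return resp.status_code in (200, 204)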

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b2

This issue was fixed in the openstack/nova 17.0.0.0b2 development milestone.
