nova-compute cannot register resource_provider when nova-compute starts before the placement endpoint is created

Bug #1697825 reported by zhang zhenzhong
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Chris Dent

Bug Description

1. When nova-compute starts but the placement endpoint is not ready, it gets a keystone session without the placement endpoint, which causes nova-compute to fail to register its resource_provider in nova_api.
2. The placement endpoint is then created, but nova-compute cannot register the resource_provider automatically because it still holds the old keystone session. In this case, nova-compute must be restarted to register the resource_provider.

I think the keystone client session should be reloaded when registering the resource_provider fails.
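
The failure mode described above can be sketched in isolation. This is only an illustration under stated assumptions, not nova's code: `FakeKeystone`, `CachingSession`, and `StaleReportClient` are hypothetical names, the local `EndpointNotFound` class stands in for the keystoneauth1 exception of the same name, and the session is simplified to snapshot the catalog once at creation rather than cache it for a limited time.

```python
class EndpointNotFound(Exception):
    """Local stand-in for keystoneauth1.exceptions.EndpointNotFound."""


class FakeKeystone:
    """Hypothetical in-memory service catalog shared by all sessions."""

    def __init__(self):
        self.catalog = {}


class CachingSession:
    """Hypothetical session that snapshots the catalog once, at creation."""

    def __init__(self, keystone):
        self._catalog = dict(keystone.catalog)

    def get_endpoint(self, service_type):
        if service_type not in self._catalog:
            raise EndpointNotFound(service_type)
        return self._catalog[service_type]


class StaleReportClient:
    """Keeps the session created at init for its whole lifetime."""

    def __init__(self, keystone):
        self._session = CachingSession(keystone)

    def register_provider(self, name):
        # Returns True only if the placement endpoint can be resolved.
        try:
            self._session.get_endpoint("placement")
        except EndpointNotFound:
            return False
        return True


keystone = FakeKeystone()
client = StaleReportClient(keystone)       # started before the endpoint exists
before = client.register_provider("compute-1")
keystone.catalog["placement"] = "http://placement.example/"
after = client.register_provider("compute-1")   # same old session
restarted = StaleReportClient(keystone).register_provider("compute-1")
```

In this sketch the long-lived client keeps failing after the endpoint is added, and only a newly constructed client (the equivalent of restarting nova-compute) succeeds.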

Changed in nova:
assignee: nobody → zhang zhenzhong (zzzhang0118)
tags: added: placement
summary: - nova-compute and not regist resource_provider when nova-compute start
- before placement endpoint create
+ nova-compute can not regist resource_provider when nova-compute start
+ before placement endpoint created
Matt Riedemann (mriedem) wrote :

Which release are you testing on? Newton or Ocata? Or master (pike)? Because we do continue to retry in the compute service to connect to placement even if the first attempt fails:

https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L192

Or is the problem with the ksa session created here?

https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L190

Changed in nova:
status: New → In Progress
zhang zhenzhong (zzzhang0118) wrote :

We found it in the Ocata release.
If the report client cannot get the endpoint at init:
https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L190

it always reaches line 412 on every update_available_resource run in the compute service:
https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L412

zhang zhenzhong (zzzhang0118) wrote :
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: zhang zhenzhong (zzzhang0118) → nobody
Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
tags: added: compute
Changed in nova:
assignee: nobody → zhang zhenzhong (zzzhang0118)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/482222

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ocata)

Change abandoned by zhang zhenzhong (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/482222

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: zhang zhenzhong (zzzhang0118) → nobody
Chris Dent (cdent) wrote :

Turning back on the in progress and assignment. This got lost, probably when the change was split out from an unrelated change:

https://review.openstack.org/#/c/483460/

Changed in nova:
status: Confirmed → In Progress
assignee: nobody → zhang zhenzhong (zzzhang0118)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/493536

Changed in nova:
assignee: zhang zhenzhong (zzzhang0118) → Chris Dent (cdent)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/493536
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c6b0d8ff5a80bf87f49124d8d9e4621d157c51e1
Submitter: Jenkins
Branch: master

commit c6b0d8ff5a80bf87f49124d8d9e4621d157c51e1
Author: Chris Dent <email address hidden>
Date: Mon Aug 14 13:47:56 2017 +0100

    Reset client session when placement endpoint not found

    If the report client is able to access keystone to get a service
    catalog, but that catalog does not include a placement service (because
    it hasn't been added to the catalog yet), creating resource providers
    and other placement entities will fail. This is expected.

    What's not expected is that creating entities will continue to fail for
    quite some time, even if placement is added to the catalog. This is
    because the keystone session caches the service catalog for some amount
    of time.

    Therefore we need to create a new client session when EndpointNotFound
    happens. This has been added in this change by extracting creation of
    the report client's _client to its own method that we can call from the
    exception handler. The resource provider and aggregate maps are wiped at
    this time, to make sure we are starting from a clean slate. While this
    isn't likely to cause a problem in real life scenarios, in the manual
    testing I was doing it created issues.

    I've made the _client method synchronized so that in the unlikely event
    that the resource tracker is trying to do its update job while some
    other thing is happening, we won't waste the client. This may not be
    necessary, but probably doesn't harm anything.

    Change-Id: I02ac615dc26a4a0d1fd28a638f622777e41d14e4
    Co-Authored-By: zhangzhenzhong <email address hidden>
    Closes-Bug: #1697825
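
The approach in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the merged change: `ReportClient`, `_create_client`, `ensure_resource_provider`, `_resource_providers`, and `_aggregate_map` are names chosen for this example, `CachingSession` is a hypothetical session that snapshots the catalog at creation, the local `EndpointNotFound` class stands in for the keystoneauth1 exception, and a plain `threading.Lock` stands in for whatever synchronization nova uses.

```python
import threading


class EndpointNotFound(Exception):
    """Local stand-in for keystoneauth1.exceptions.EndpointNotFound."""


class CachingSession:
    """Hypothetical session that snapshots the catalog when created."""

    def __init__(self, catalog):
        self._catalog = dict(catalog)

    def get_endpoint(self, service_type):
        if service_type not in self._catalog:
            raise EndpointNotFound(service_type)
        return self._catalog[service_type]


class ReportClient:
    """Illustrative report client; names are not taken from nova."""

    def __init__(self, catalog):
        self._live_catalog = catalog  # shared dict standing in for keystone
        self._lock = threading.Lock()
        self._resource_providers = {}
        self._aggregate_map = {}
        self._client = self._create_client()

    def _create_client(self):
        # Serialized so concurrent callers do not reset the client at once.
        with self._lock:
            # Wipe cached state so the new session starts from a clean slate.
            self._resource_providers.clear()
            self._aggregate_map.clear()
            return CachingSession(self._live_catalog)

    def ensure_resource_provider(self, uuid):
        try:
            endpoint = self._client.get_endpoint("placement")
        except EndpointNotFound:
            # Replace the stale session; the next attempt sees a fresh catalog.
            self._client = self._create_client()
            return False
        self._resource_providers[uuid] = endpoint
        return True


catalog = {}
report_client = ReportClient(catalog)
first = report_client.ensure_resource_provider("rp-1")   # no endpoint yet
catalog["placement"] = "http://placement.example/"
second = report_client.ensure_resource_provider("rp-1")  # stale, triggers reset
third = report_client.ensure_resource_provider("rp-1")   # fresh session
```

In this sketch the attempt made right after the endpoint appears still fails, because it runs against the session created earlier; that failure is what replaces the session, so the following attempt succeeds without restarting the process.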

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/483460
Reason: This was fixed at https://review.opendev.org/#/c/493536/
