c-api times out devstack waiting to start

Bug #1722326 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
Medium
Matt Riedemann

Bug Description

We're seeing CI failures where c-api takes >60 seconds to start:

http://logs.openstack.org/50/498950/18/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/e4982c0/logs/devstacklog.txt.gz#_2017-10-09_15_10_12_484

The devstack timeout is 60 seconds, and in this case it took 62 seconds for c-api to start serving requests:

http://logs.openstack.org/50/498950/18/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/e4982c0/logs/screen-c-api.txt.gz#_Oct_09_15_10_14_032382

Oct 09 15:10:14.061521 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[21361]: WSGI app 0 (mountpoint='') ready in 62 seconds on interpreter 0x1eedac0 pid: 21363 (default app)

Compare to n-api which takes 8 seconds to start:

Oct 09 15:05:30.335892 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[11763]: WSGI app 0 (mountpoint='') ready in 8 seconds on interpreter 0xd23ae0 pid: 11766 (default app)

So what is cinder-api doing on startup that takes so long? Can any of it be run in the background?

Revision history for this message
Matt Riedemann (mriedem) wrote :

I see "Initializing extension manager." logged 6 times on the same process.

There are two API workers:

osapi_volume_workers = 2

So that's loading extensions for v1, v2 and v3 on 2 workers, so that's why extensions are loaded 6 times.

--

I'm seeing this in the logs:

Oct 09 15:10:13.364929 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[21361]: result['domain'] = self.domain
Oct 09 15:10:13.529425 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[21361]: INFO cinder.rpc [None req-2447e5c0-eea2-42c6-9b11-958e3dbc9fa2 None None] Automatically selected cinder-scheduler objects version 1.27 as minimum service version.
Oct 09 15:10:13.533670 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[21361]: INFO cinder.rpc [None req-2447e5c0-eea2-42c6-9b11-958e3dbc9fa2 None None] Automatically selected cinder-scheduler RPC version 3.7 as minimum service version.
Oct 09 15:10:13.538091 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[21361]: INFO cinder.rpc [None req-2447e5c0-eea2-42c6-9b11-958e3dbc9fa2 None None] Automatically selected cinder-volume objects version 1.27 as minimum service version.
Oct 09 15:10:13.542528 ubuntu-xenial-inap-mtl01-11290241 <email address hidden>[21361]: INFO cinder.rpc [None req-2447e5c0-eea2-42c6-9b11-958e3dbc9fa2 None None] Automatically selected cinder-volume RPC version 3.15 as minimum service version.

Which looks related to a "work_cleanup" blocking RPC call from the API to the scheduler, which is also going to hit the database. That looks bad.

Changed in cinder:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/510615

Revision history for this message
Matt Riedemann (mriedem) wrote :

Note that the service version lookup that the scheduler is doing is not cached per worker, so it's a fresh lookup to the database each time.

Nova caches the service version per worker:

https://review.openstack.org/#/c/265326/

^ is pretty old, as nova also caches in the api and scheduler workers now too since those services also have to lookup the version.

Matt Riedemann (mriedem)
tags: added: api performance
Revision history for this message
Matt Riedemann (mriedem) wrote :

"Initializing extension manager." is actually coming from a few different places:

1. There are two API worker processes.
2. There are the v2 and v3 API routes. So that's 4 occurrences in the logs.
3. The other is the Version router which runs on both workers, but doesn't actually require
   extensions, so it's logging that and setting up extension routes but doesn't use them.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/510653

Changed in cinder:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Changed in cinder:
assignee: Matt Riedemann (mriedem) → Sean McGinnis (sean-mcginnis)
Changed in cinder:
assignee: Sean McGinnis (sean-mcginnis) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in cinder:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/510615
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=6c630c5e5383ea1cbd5475aad9826bbe2becf364
Submitter: Jenkins
Branch: master

commit 6c630c5e5383ea1cbd5475aad9826bbe2becf364
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 9 12:01:08 2017 -0400

    Remove newton-compat upgrade code on service startup

    This upgrade compat check code was marked for removal
    in Ocata. Since we're now in Queens, and this is burning
    time on every service start, let's remove it.

    Change-Id: I8528752630276bd6e2348f27721238115de504cd
    Related-Bug: #1722326

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/510653
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=abeb81029026dd623f4b132fe6870af40cd97ed1
Submitter: Jenkins
Branch: master

commit abeb81029026dd623f4b132fe6870af40cd97ed1
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 9 14:33:32 2017 -0400

    Do not load extendable routes for the Versions router

    The Versions router is implicitly spending time loading
    up extension routes on startup which is unnecessary since
    the Versions router only cares about handling requests to
    "/" on the base endpoint.

    Change-Id: I7452bb2952459dca8cf0baf39fa76e0e39ad2ffd
    Partial-Bug: #1722326

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/510753

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like https://review.openstack.org/#/c/510615/ fixed the startup time issue in the API service. It drops the startup time from 62 seconds to 4 seconds.

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/510753
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d0dc23fb77b8a7b0496313881434267b0091fb3c
Submitter: Jenkins
Branch: stable/pike

commit d0dc23fb77b8a7b0496313881434267b0091fb3c
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 9 12:01:08 2017 -0400

    Remove newton-compat upgrade code on service startup

    This upgrade compat check code was marked for removal
    in Ocata. Since we're now in Queens, and this is burning
    time on every service start, let's remove it.

    Change-Id: I8528752630276bd6e2348f27721238115de504cd
    Related-Bug: #1722326
    (cherry picked from commit 6c630c5e5383ea1cbd5475aad9826bbe2becf364)

tags: added: in-stable-pike
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.