InstanceNotFound due to missing osapi_compute service version when running nova-api under wsgi

Bug #1661360 reported by Alfredo Moralejo on 2017-02-02
This bug affects 6 people
Affects: OpenStack Compute (nova) | Importance: Undecided | Assigned to: Chris Dent
Affects: tripleo | Importance: Critical | Assigned to: Unassigned

Bug Description

Running OpenStack services from master, the tempest test tempest.scenario.test_server_basic_ops.TestServerBasicOps.test_server_basic_ops (among others) always fails with the message "u'message': u'Instance bf33af04-6b55-4835-bb17-02484c196f13 could not be found.'". The full log is at http://logs.openstack.org/15/424915/8/check/gate-puppet-openstack-integration-4-scenario001-tempest-centos-7/b29f35b/console.html

According to the sequence in the log, this is what happens:

1. tempest creates an instance:

http://logs.openstack.org/15/424915/8/check/gate-puppet-openstack-integration-4-scenario001-tempest-centos-7/b29f35b/console.html#_2017-02-02_13_04_48_291997

2. nova server returns instance bf33af04-6b55-4835-bb17-02484c196f13 so it seems it has been properly created:

http://logs.openstack.org/15/424915/8/check/gate-puppet-openstack-integration-4-scenario001-tempest-centos-7/b29f35b/console.html#_2017-02-02_13_04_48_292483

3. tempest tries to get the status of the instance right after creating it, and the nova server returns 404, instance not found:

http://logs.openstack.org/15/424915/8/check/gate-puppet-openstack-integration-4-scenario001-tempest-centos-7/b29f35b/console.html#_2017-02-02_13_04_48_292565

http://logs.openstack.org/15/424915/8/check/gate-puppet-openstack-integration-4-scenario001-tempest-centos-7/b29f35b/console.html#_2017-02-02_13_04_48_292845

At that time the following messages are found in the nova log:

2017-02-02 12:58:10.823 7439 DEBUG nova.compute.api [req-eec92d3e-9f78-4915-b3b9-ca6858f8dd6a - - - - -] [instance: bf33af04-6b55-4835-bb17-02484c196f13] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2312
2017-02-02 12:58:10.879 7439 INFO nova.api.openstack.wsgi [req-eec92d3e-9f78-4915-b3b9-ca6858f8dd6a - - - - -] HTTP exception thrown: Instance bf33af04-6b55-4835-bb17-02484c196f13 could not be found.
2017-02-02 12:58:10.880 7439 DEBUG nova.api.openstack.wsgi [req-eec92d3e-9f78-4915-b3b9-ca6858f8dd6a - - - - -] Returning 404 to user: Instance bf33af04-6b55-4835-bb17-02484c196f13 could not be found. __call__ /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:1039

http://logs.openstack.org/15/424915/8/check/gate-puppet-openstack-integration-4-scenario001-tempest-centos-7/b29f35b/logs/nova/nova-api.txt.gz#_2017-02-02_12_58_10_879

4. tempest then starts cleaning up the environment, deleting the security group, etc.

We are hitting this with nova from commit f40467b0eb2b58a369d24a0e832df1ace6c400c3


Emilien Macchi (emilienm) wrote :

It affects Puppet OpenStack CI but also TripleO. We can't spawn a VM anymore.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → ocata-rc1
Matt Riedemann (mriedem) on 2017-02-02
tags: added: ocata-rc-potential

Related fix proposed to branch: master
Review: https://review.openstack.org/428404

Current thinking is that we have a cached service version of 0, which makes us hit this code path before the instance is created:

https://github.com/openstack/nova/blob/ed55dcad83d5db2fa7e43fc3d5465df1550b554c/nova/compute/api.py#L2269

Alfredo Moralejo (amoralej) wrote :

It seems that the problem may be related to running nova-api under Apache using wsgi.

Nova tries to discover the version of nova-api in https://github.com/openstack/nova/blame/master/nova/compute/api.py#L2264 by using the registered services. When nova-api runs under Apache, the api service is not registered in the services table, so service_version is set to 0 and the lookup behaves incorrectly.
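
As a rough illustration of the failure mode described above (this is not nova's actual code), the sketch below models the services table as a list of dicts. The function name `minimum_service_version`, the record layout, and the version number 16 are assumptions made up for the example; the only behavior taken from this report is that a missing nova-osapi_compute record results in a service version of 0.

```python
# Hypothetical in-memory stand-in for the "services" table in the nova database.
def minimum_service_version(services, binary):
    """Return the lowest version recorded for `binary`, or 0 if no record exists."""
    versions = [s["version"] for s in services if s["binary"] == binary]
    return min(versions) if versions else 0


# Standalone nova-api: the start-up code has written a service record.
# The version number 16 is an arbitrary placeholder.
standalone = [{"binary": "nova-osapi_compute", "version": 16}]

# nova-api under Apache/mod_wsgi: nothing created the record.
under_apache = []

print(minimum_service_version(standalone, "nova-osapi_compute"))
print(minimum_service_version(under_apache, "nova-osapi_compute"))
```

With no record present, the caller cannot tell "very old deployment" apart from "never registered", which is why the API falls back to the old lookup behavior.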

Matt Riedemann (mriedem) wrote :

The issue, it turns out, is that TripleO runs nova-api with Apache which doesn't get the nova-osapi_compute service record created in the database, which is what this code would do (but doesn't with Apache):

https://github.com/openstack/nova/blob/ed55dcad83d5db2fa7e43fc3d5465df1550b554c/nova/service.py#L139

As noted here:

https://github.com/openstack/nova/blob/ed55dcad83d5db2fa7e43fc3d5465df1550b554c/nova/wsgi/nova-api.py#L15

Running nova-api under Apache is experimental and we don't test it in our gating jobs.

So I think we should just provide a hack workaround for this in Ocata (since today is rc1) and then work on getting one of our gating jobs to run nova-api under Apache in Pike, probably the -placement- job.

Matt Riedemann (mriedem) on 2017-02-02
Changed in nova:
assignee: nobody → Dan Smith (danms)
status: New → In Progress

Looking at a debug patch with a tripleo run, the instance that fails the heat stack create with the 404 fails to schedule:

http://logs.openstack.org/60/428360/2/experimental/gate-tripleo-ci-centos-7-nonha-multinode/7a6e5bc/logs/subnode-2/var/log/nova/nova-conductor.txt.gz#_2017-02-03_05_17_35_826

2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager [req-5e120dea-89b0-4cf7-a7cc-d7893234aea2 - - - - -] Failed to schedule instances
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager Traceback (most recent call last):
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 866, in schedule_and_build_instances
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager request_specs[0].to_legacy_filter_properties_dict())
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 597, in _schedule_instances
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager hosts = self.scheduler_client.select_destinations(context, spec_obj)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 371, in wrapped
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager return func(*args, **kwargs)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager return self.queryclient.select_destinations(context, spec_obj)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager return getattr(self.instance, __name)(*args, **kwargs)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 32, in select_destinations
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager return self.scheduler_rpcapi.select_destinations(context, spec_obj)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 129, in select_destinations
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager retry=self.retry)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager timeout=timeout, retry=retry)
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in send
2017-02-03 05:17:35.826 43546 ERROR nova.conductor.manager retr...


Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/428360
Reason: We don't need this anymore, it's the service version being 0 that's the problem:

https://review.openstack.org/#/c/428415/

http://logs.openstack.org/15/428415/1/experimental/gate-tripleo-ci-centos-7-nonha-multinode/4a76477/

OK, so the issue is that the instance fails to build (I'm not sure why, but it's probably a separate bug) and is then put into the cell0 database. The instance GET code in the compute API doesn't look up the instance in the cell0 database because the service version is 0: TripleO runs nova-api under Apache, which doesn't run our code to create a service record in the nova database for the nova-osapi_compute service.
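
The interaction described above can be sketched as follows. This is a deliberately simplified model, not the code in nova/compute/api.py: the function `get_instance`, the dict-based databases, and the threshold constant `CELL0_AWARE_VERSION` are all hypothetical names chosen for the example.

```python
class InstanceNotFound(Exception):
    """Stand-in for the error that the API turns into a 404."""


# Hypothetical threshold: the service version from which the API
# is allowed to consult cell0. The value is a placeholder.
CELL0_AWARE_VERSION = 15


def get_instance(uuid, service_version, main_db, cell0_db):
    """Look up an instance, consulting cell0 only if the service is new enough."""
    if uuid in main_db:
        return main_db[uuid]
    if service_version >= CELL0_AWARE_VERSION and uuid in cell0_db:
        return cell0_db[uuid]
    raise InstanceNotFound("Instance %s could not be found." % uuid)


uuid = "bf33af04-6b55-4835-bb17-02484c196f13"
# The failed build left the instance in cell0 only.
cell0 = {uuid: {"uuid": uuid, "vm_state": "error"}}

# With a registered service version, the instance is found in cell0.
print(get_instance(uuid, 16, {}, cell0))

# With a service version of 0, the cell0 lookup is skipped.
try:
    get_instance(uuid, 0, {}, cell0)
except InstanceNotFound as exc:
    print(exc)
```

In this model the instance exists the whole time; only the version gate decides whether the API can see it, which matches the 404 that tempest receives right after a successful create.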

So the recommendation is to not run nova-api under Apache in Ocata and the nova team will work on fixing this properly in Pike and start gating on that configuration.

Change abandoned by Dan Smith (<email address hidden>) on branch: master
Review: https://review.openstack.org/428415
Reason: We're not going to be able to work around this in the short term, as seen from this patch. We'll fix it for real in pike.

Reviewed: https://review.openstack.org/428785
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=4d58e7704a8a422c308ac55f083d879addf70e12
Submitter: Jenkins
Branch: master

commit 4d58e7704a8a422c308ac55f083d879addf70e12
Author: Emilien Macchi <email address hidden>
Date: Fri Feb 3 10:54:39 2017 -0500

    Stop deploying Nova API in WSGI with Apache

    It was suggested by Nova team to not deploying Nova API in WSGI with
    Apache in production.
    It's causing some issues that we didn't catch until now (see in the bug
    report). Until we figure out what was wrong, let's disable it so we can
    move forward in the upgrade process.

    Note: once it's supported by Nova, we'll revert this patch.
    Change-Id: I2712ca0b9626771cec1f3d98b04cc8c18eb1cb15
    Related-Bug: 1661360

Reviewed: https://review.openstack.org/428783
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=22c5d341776c02dfafab1f58f68a852da34f3692
Submitter: Jenkins
Branch: master

commit 22c5d341776c02dfafab1f58f68a852da34f3692
Author: Emilien Macchi <email address hidden>
Date: Fri Feb 3 10:40:41 2017 -0500

    Stop deploying Nova API in WSGI with Apache

    It was suggested by Nova team to not deploying Nova API in WSGI with
    Apache in production.
    It's causing some issues that we didn't catch until now (see in the bug
    report). Until we figure out what was wrong, let's disable it so we can
    move forward in the upgrade process.

    Related-Bug: 1661360

    Co-Authored-By: Juan Antonio Osorio Robles <email address hidden>
    Change-Id: Ia87b5bdea79e500ed41c30beb9aa9d6be302e3ac

Reviewed: https://review.openstack.org/428778
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9f48b91ce77c57d45b916b865ab2f2e075299543
Submitter: Jenkins
Branch: master

commit 9f48b91ce77c57d45b916b865ab2f2e075299543
Author: Emilien Macchi <email address hidden>
Date: Fri Feb 3 10:30:59 2017 -0500

    Stop deploying Nova API in WSGI with Apache

    It was suggested by Nova team to not deploying Nova API in WSGI with
    Apache in production.
    It's causing some issues that we didn't catch until now (see in the bug
    report). Until we figure out what was wrong, let's disable it so we can
    move forward in the upgrade process.

    Change-Id: I09b73476762593642a0e011f83f0233de68f2c33
    Related-Bug: 1661360

Matt Riedemann (mriedem) on 2017-02-09
tags: added: apache api
removed: ocata-rc-potential

Looks like the tripleo changes have all landed.

Changed in tripleo:
status: Triaged → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released
Matt Riedemann (mriedem) on 2017-04-07
summary: - tempest test fails with "Instance not found" error
+ InstanceNotFound due to missing osapi_compute service version when
+ running nova-api under wsgi

Fix proposed to branch: master
Review: https://review.openstack.org/457283

Changed in nova:
assignee: Dan Smith (danms) → Chris Dent (cdent)

Reviewed: https://review.openstack.org/457283
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d3c084f23448d1890bfda4a06de246f2be3c1279
Submitter: Jenkins
Branch: master

commit d3c084f23448d1890bfda4a06de246f2be3c1279
Author: Chris Dent <email address hidden>
Date: Mon Apr 17 16:38:49 2017 +0000

    Register osapi_compute when nova-api is wsgi

    When the nova-api services starts from its own standalone binary it
    registers itself in the services table. The original wsgi script in
    nova/wsgi/nova-api.py did not, leading to the bug referenced below.

    The new wsgi script at nova.api.openstack.compute.wsgi, modelled on
    a similar thing used for the placement API, provides the necessary
    service registration.

    If a ServiceTooOld exception happens while trying to register the
    service then a very simple (currently very stubby) application is
    loaded instead of the compute api. This application returns a 500
    and a message.

    Some caveats/todos:

    * wsgi apps managed under mod-wsgi (and presumably other containers)
      are not imported/compiled/run until the first request is made. In
      this case that means the service handling does not happen until
      that first request, somewhat defeating the purpose if the api is a
      bit idle.

    Change-Id: I7c4acfaa6c50ac0e4d6de69eb62ec5bbad72ff85
    Closes-Bug: #1661360
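
A minimal sketch of the pattern the commit message describes, assuming nothing about the real module beyond what is stated there: register the service when the WSGI application is built, and fall back to a stub application that returns a 500 if registration raises ServiceTooOld. The names `init_application`, `error_application`, `register_service`, and `build_api` are hypothetical, and the `ServiceTooOld` class here is a local stand-in rather than nova's exception.

```python
class ServiceTooOld(Exception):
    """Local stand-in for the exception named in the commit message."""


def error_application(exc):
    """Build a stub WSGI app that answers every request with a 500."""
    body = ("Service unavailable: %s" % exc).encode("utf-8")

    def application(environ, start_response):
        start_response(
            "500 Internal Server Error",
            [("Content-Type", "text/plain"),
             ("Content-Length", str(len(body)))],
        )
        return [body]

    return application


def init_application(register_service, build_api):
    """Register the service record, then return the WSGI app to serve.

    `register_service` and `build_api` are hypothetical callables that
    stand in for the service registration and the real API application.
    """
    try:
        register_service()
    except ServiceTooOld as exc:
        return error_application(exc)
    return build_api()
```

Because a WSGI container may not import the script until the first request arrives, `init_application` in this sketch would also run only at that point, which is the caveat noted in the commit message.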

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/461289
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4b897557845c733b851b914065ad687a3eb3f85e
Submitter: Jenkins
Branch: master

commit 4b897557845c733b851b914065ad687a3eb3f85e
Author: Chris Dent <email address hidden>
Date: Sun Apr 30 15:30:55 2017 +0000

    devref and reno for nova-{api,metadata}-wsgi scripts

    This provides a brief explanation of the new nova-api-wsgi [1] and
    nova-metadata-wsgi [2] scripts in the Architecture section of the devref
    with links to the new doc added to the man pages for the eventlet
    scripts.

    The nova-api.rst mentioned ec2 so figured best to fix that now
    rather than forget about it, despite not being entirely germane.

    There is also a reno note that indicates the availability of the new
    scripts.

    There is a devstack change which is testing the new wsgi scripts as
    well as forcing grenade to not use them at
    If2d7e363a6541854f2e30c03171bef7a41aff745

    [1] I7c4acfaa6c50ac0e4d6de69eb62ec5bbad72ff85
    [2] Icb35fe2b94ab02c0ba8ba8129ae18aae0f794756

    Change-Id: I351b2af3b256d3031bd2a65feba0495e815f8427
    Related-Bug: #1661360

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/428404
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Reviewed: https://review.openstack.org/511503
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4d7acf3a0caa111f407ea7aac5dde95ff6ca49b6
Submitter: Zuul
Branch: stable/ocata

commit 4d7acf3a0caa111f407ea7aac5dde95ff6ca49b6
Author: Matt Riedemann <email address hidden>
Date: Thu Oct 12 10:28:11 2017 -0400

    Add release note for running nova-api under wsgi in Ocata

    Commit e846c32ce3c3aee2cd83dec7561dc14cb3f0ada8 added a warning
    about running nova-api under wsgi in Ocata but that's not very
    discoverable for people that are upgrading and already running
    in this mode.

    This change adds a release note to make that more obvious and
    link to the known bug.

    Related-Bug: #1661360
    Related-Bug: #1682423

    Change-Id: Id9034f795de1b55ac8190feaa37a370e9afd2d8d

tags: added: in-stable-ocata