novaclient.tests.functional.v2.test_servers.TestServersListNovaClient.test_list_all_servers broken since at least 12/2

Bug #1647464 reported by Matt Riedemann
Affects                    Status         Importance  Assigned to       Milestone
OpenStack Compute (nova)   Fix Released   High        Matt Riedemann    -
nova (Newton)              Fix Committed  Medium      Matt Riedemann    -

Bug Description

We're now consistently getting a 400 on the marker value here:

http://logs.openstack.org/59/406359/1/check/gate-novaclient-dsvm-functional-neutron/30f5c67/console.html#_2016-12-05_18_00_49_690292

2016-12-05 18:00:49.690292 | 2016-12-05 18:00:49.689 | novaclient.tests.functional.v2.test_servers.TestServersListNovaClient.test_list_all_servers
2016-12-05 18:00:49.691677 | 2016-12-05 18:00:49.691 | -------------------------------------------------------------------------------------------
2016-12-05 18:00:49.694401 | 2016-12-05 18:00:49.694 |
2016-12-05 18:00:49.695830 | 2016-12-05 18:00:49.695 | Captured traceback:
2016-12-05 18:00:49.697230 | 2016-12-05 18:00:49.696 | ~~~~~~~~~~~~~~~~~~~
2016-12-05 18:00:49.698889 | 2016-12-05 18:00:49.698 | Traceback (most recent call last):
2016-12-05 18:00:49.700319 | 2016-12-05 18:00:49.699 | File "novaclient/tests/functional/v2/legacy/test_servers.py", line 104, in test_list_all_servers
2016-12-05 18:00:49.701907 | 2016-12-05 18:00:49.701 | output = self.nova("list", params="--limit -1 --name %s" % name)
2016-12-05 18:00:49.703240 | 2016-12-05 18:00:49.702 | File "novaclient/tests/functional/base.py", line 316, in nova
2016-12-05 18:00:49.704505 | 2016-12-05 18:00:49.704 | endpoint_type, merge_stderr)
2016-12-05 18:00:49.706426 | 2016-12-05 18:00:49.706 | File "/opt/stack/new/python-novaclient/.tox/functional/local/lib/python2.7/site-packages/tempest/lib/cli/base.py", line 124, in nova
2016-12-05 18:00:49.707668 | 2016-12-05 18:00:49.707 | 'nova', action, flags, params, fail_ok, merge_stderr)
2016-12-05 18:00:49.709199 | 2016-12-05 18:00:49.708 | File "/opt/stack/new/python-novaclient/.tox/functional/local/lib/python2.7/site-packages/tempest/lib/cli/base.py", line 368, in cmd_with_auth
2016-12-05 18:00:49.710930 | 2016-12-05 18:00:49.710 | self.cli_dir)
2016-12-05 18:00:49.712387 | 2016-12-05 18:00:49.712 | File "/opt/stack/new/python-novaclient/.tox/functional/local/lib/python2.7/site-packages/tempest/lib/cli/base.py", line 68, in execute
2016-12-05 18:00:49.714028 | 2016-12-05 18:00:49.713 | result_err)
2016-12-05 18:00:49.715601 | 2016-12-05 18:00:49.715 | tempest.lib.exceptions.CommandFailed: Command '['/opt/stack/new/python-novaclient/.tox/functional/bin/nova', '--os-username', 'admin', '--os-tenant-name', 'admin', '--os-password', 'secretadmin', '--os-auth-url', 'http://10.13.96.44/identity_admin', '--os-compute-api-version', '2.latest', '--os-endpoint-type', 'publicURL', 'list', '--limit', '-1', '--name', '6a31a7c8-189d-4a63-88d5-7ee1f63f6810']' returned non-zero exit status 1.
2016-12-05 18:00:49.717098 | 2016-12-05 18:00:49.716 | stdout:
2016-12-05 18:00:49.718661 | 2016-12-05 18:00:49.718 |
2016-12-05 18:00:49.720061 | 2016-12-05 18:00:49.719 | stderr:
2016-12-05 18:00:49.721559 | 2016-12-05 18:00:49.721 | ERROR (BadRequest): marker [282483d5-433b-4c34-8a5d-894e40db705d] not found (HTTP 400) (Request-ID: req-d0b88399-b0d6-4f0c-881c-442e88944350)

There isn't anything obvious in the nova and novaclient changes around 12/2 that would cause this.
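
For context, "nova list --limit -1" makes the client page through every matching server, passing the last uuid of each page as the marker for the next request, which is why a marker shows up at all; the 400 means the API rejected one of those marker values. A rough, illustrative sketch of that pagination pattern (simplified Python, not novaclient's actual code; api_get is a hypothetical helper that returns one page of server dicts):

def list_all_servers(api_get, name_filter, page_size=1000):
    # Illustrative marker-based pagination loop, not novaclient's real code.
    servers = []
    marker = None
    while True:
        params = {'limit': page_size, 'name': name_filter}
        if marker:
            params['marker'] = marker
        page = api_get('/servers/detail', params=params)
        if not page:
            break
        servers.extend(page)
        # The last uuid on this page becomes the marker for the next request;
        # this is the value the API is rejecting with HTTP 400 above.
        marker = page[-1]['id']
    return servers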

Revision history for this message
Matt Riedemann (mriedem) wrote :

There could have been a change in devstack around 12/2 that broke things too, but nothing is obvious from the git history.

Revision history for this message
Matt Riedemann (mriedem) wrote :

It seems to blow up when listing ports for the instance:

http://logs.openstack.org/59/406359/1/check/gate-novaclient-dsvm-functional-neutron/30f5c67/logs/screen-n-api.txt.gz#_2016-12-05_17_15_15_511

2016-12-05 17:15:15.511 28673 DEBUG neutronclient.v2_0.client [req-679305ef-b796-4b7d-8576-719620dd4512 admin admin] GET call to neutron for http://10.13.96.44:9696/v2.0/ports.json?device_id=95ec3dcd-70e5-4c40-8fca-40bff547f1f8&device_id=3ed887f1-2995-42a4-81ce-10065f00b17c&device_id=cc0f6946-b1f5-4c5e-8f5d-8bec2d908b8b used request id req-8a3d595f-2b92-4199-842c-12171886a45a _append_request_id /usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py:127
..

2016-12-05 17:15:15.678 28673 INFO nova.api.openstack.wsgi [req-226e3e6f-f1af-4e3b-bdda-c4c6c4870d86 admin admin] HTTP exception thrown: marker [cc0f6946-b1f5-4c5e-8f5d-8bec2d908b8b] not found

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is failing with python-neutronclient==6.0.0, which was released on 9/1, and it looks like we've been using it in upper-constraints since then too:

https://github.com/openstack/requirements/commit/9fe53336f874e242dc59cf8aaf3a74cb0000e8fe

So it doesn't seem like that's the problem.

I wonder if it's related to this:

https://github.com/openstack/neutron/commit/7e9986411c48cc4302910beaec159ec9bb6aa558

Revision history for this message
Matt Riedemann (mriedem) wrote :

The neutron thing is probably a red herring, as it looks like that marker not found message is coming from nova:

http://git.openstack.org/cgit/openstack/nova/tree/nova/api/openstack/compute/servers.py#n349
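
For context, the handler around that line is roughly the following shape (a paraphrased sketch, not the exact nova source): a MarkerNotFound raised by the lower layers is translated into the HTTP 400 "marker [...] not found" that the functional test sees.

import webob.exc

class MarkerNotFound(Exception):
    """Stand-in for nova.exception.MarkerNotFound (illustrative only)."""

def list_servers(compute_api_get_all, context, marker=None, limit=None):
    # Paraphrased: the servers controller catches MarkerNotFound from the
    # compute API and re-raises it as a 400 with the message seen in the logs.
    try:
        return compute_api_get_all(context, marker=marker, limit=limit)
    except MarkerNotFound:
        msg = "marker [%s] not found" % marker
        raise webob.exc.HTTPBadRequest(explanation=msg)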

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-novaclient (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/407205

Revision history for this message
Matt Riedemann (mriedem) wrote :

I've pushed up a couple of test patches to get more logging when we hit the test failure so we can maybe trace this better.

Revision history for this message
Diana Clarke (diana-clarke) wrote :

Could this be the culprit?

    Always setup cells v2
    openstack-dev/devstack
    https://review.openstack.org/#/c/392289/
    Merged: Dec 2, 2016

I've spent a few hours on this and nothing else is jumping out at me.

Revision history for this message
Diana Clarke (diana-clarke) wrote :

Yup, if I revert the devstack change then python-novaclient is green again. For example (with a Depends-On):

https://review.openstack.org/#/c/406360/

Revision history for this message
Matt Riedemann (mriedem) wrote :

As Diana pointed out, this started failing after we started running the job with cells v2.

Through my debugging patches, I think I've figured out that we're querying the cell0 database for the instance by uuid (the marker), which raises MarkerNotFound; we don't handle that in the compute API code, so we fail.
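
Roughly what the lookup looks like before the fix (a simplified sketch with hypothetical helper names, not the actual compute API code): the cell0 query raises MarkerNotFound because the marker instance lives in the real cell, and nothing catches it, so the whole listing blows up with a 400.

def get_all(build_requests, cell0_db, cell_db, marker=None):
    results = list(build_requests)                      # 1. build requests
    results += cell0_db.list_instances(marker=marker)   # 2. cell0: raises
                                                        #    MarkerNotFound here
    results += cell_db.list_instances(marker=marker)    # 3. never reached
    return results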

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → Confirmed
importance: Undecided → High
Changed in python-novaclient:
status: New → Invalid
Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah the problem is cell0:

http://logs.openstack.org/05/407205/1/check/gate-novaclient-dsvm-functional-identity-v3-only-ubuntu-xenial-nv/61e9a05/logs/screen-n-api.txt.gz#_2016-12-06_20_58_54_417

2016-12-06 20:58:54.417 27543 ERROR nova.compute.api [req-de260414-aacf-47c4-949b-0b501efa5e69 admin admin] Failed to find instance by marker fe05d5ed-e97a-48c3-a57f-fb8a05f43d88 in cell0; ignoring
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api Traceback (most recent call last):
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/compute/api.py", line 2404, in get_all
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api sort_dirs=sort_dirs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/compute/api.py", line 2483, in _get_instances_by_filters
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api expected_attrs=fields, sort_keys=sort_keys, sort_dirs=sort_dirs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api result = fn(cls, context, *args, **kwargs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/objects/instance.py", line 1219, in get_by_filters
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api use_slave=use_slave, sort_keys=sort_keys, sort_dirs=sort_dirs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 226, in wrapper
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api return f(*args, **kwargs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/objects/instance.py", line 1203, in _get_by_filters_impl
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api sort_keys=sort_keys, sort_dirs=sort_dirs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/db/api.py", line 763, in instance_get_all_by_filters_sort
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api sort_dirs=sort_dirs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 170, in wrapper
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api return f(*args, **kwargs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 271, in wrapped
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api return f(context, *args, **kwargs)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 2243, in instance_get_all_by_filters_sort
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api raise exception.MarkerNotFound(marker=marker)
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api MarkerNotFound: Marker fe05d5ed-e97a-48c3-a57f-fb8a05f43d88 could not be found.
2016-12-06 20:58:54.417 27543 ERROR nova.compute.api

So we just need to handle that MarkerNotFound and ignore it when listing instances.
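
A minimal sketch of that fix, reusing the MarkerNotFound stand-in and hypothetical helpers from the sketches above (simplified; the actual patch is the merged change referenced below):

def get_all_fixed(build_requests, cell0_db, cell_db, marker=None):
    results = list(build_requests)
    try:
        results += cell0_db.list_instances(marker=marker)
    except MarkerNotFound:
        # The marker instance simply isn't in cell0; ignore it and keep going
        # so the real cell database below can resolve the marker.
        pass
    results += cell_db.list_instances(marker=marker)
    return results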

no longer affects: python-novaclient
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/407204
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=72d28ccd6e8ce8cf700891788f06578313b03c38
Submitter: Jenkins
Branch: master

commit 72d28ccd6e8ce8cf700891788f06578313b03c38
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 5 16:24:05 2016 -0500

    Handle MarkerNotFound from cell0 database

    When listing instances in the cellv2 world we look them up
    from three locations:

    1. Build requests which exist before the instances are created
       in the cell database (after the scheduler picks a host to
       build the instance). Currently instances and build requests
       are both created before casting to conductor, but that's going
       away in Ocata with the support for multiple cellsv2 cells.
    2. The cell0 database for instances which failed to get scheduled
       to a compute host (and therefore a cell).
    3. The actual cell database that the instance lives in. Currently
       that's only a single traditional nova database, but could be one
       of multiple cellsv2 cells when we add that support in Ocata.

    If a marker is passed in when listing instances, if the instance
    lives in an actual cell database, we'll get a MarkerNotFound failure
    from cell0 because the instance doesn't exist in cell0, but we check
    cell0 before we check the cell database. This makes the instance
    listing short-circuit and fail with a 400 from the REST API.

    This patch simply handles the MarkerNotFound when listing instances
    from the cell0 database and ignores it so we can continue onto the
    cell database.

    Closes-Bug: #1647464

    Change-Id: I977497be262fb7f2333e32fb7313b29624323422

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/407791

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/407791
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0be839bdfbf15e551b4e6a2788b20502db1933c4
Submitter: Jenkins
Branch: stable/newton

commit 0be839bdfbf15e551b4e6a2788b20502db1933c4
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 5 16:24:05 2016 -0500

    Handle MarkerNotFound from cell0 database

    When listing instances in the cellv2 world we look them up
    from three locations:

    1. Build requests which exist before the instances are created
       in the cell database (after the scheduler picks a host to
       build the instance). Currently instances and build requests
       are both created before casting to conductor, but that's going
       away in Ocata with the support for multiple cellsv2 cells.
    2. The cell0 database for instances which failed to get scheduled
       to a compute host (and therefore a cell).
    3. The actual cell database that the instance lives in. Currently
       that's only a single traditional nova database, but could be one
       of multiple cellsv2 cells when we add that support in Ocata.

    If a marker is passed in when listing instances, if the instance
    lives in an actual cell database, we'll get a MarkerNotFound failure
    from cell0 because the instance doesn't exist in cell0, but we check
    cell0 before we check the cell database. This makes the instance
    listing short-circuit and fail with a 400 from the REST API.

    This patch simply handles the MarkerNotFound when listing instances
    from the cell0 database and ignores it so we can continue onto the
    cell database.

    Closes-Bug: #1647464

    Change-Id: I977497be262fb7f2333e32fb7313b29624323422
    (cherry picked from commit 72d28ccd6e8ce8cf700891788f06578313b03c38)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b2

This issue was fixed in the openstack/nova 15.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.3

This issue was fixed in the openstack/nova 14.0.3 release.
