pre-cells_v2 nova-osapi_compute service in database breaks instance lookup

Bug #1759316 reported by Sam Yaple
This bug affects 1 person
Affects                     Status          Importance   Assigned to      Milestone
OpenStack Compute (nova)    Fix Released    Medium       Matt Riedemann
  Ocata                     Confirmed       Medium       Unassigned
  Pike                      Confirmed       Medium       Unassigned
  Queens                    Fix Committed   Medium       Matt Riedemann

Bug Description

This was encountered on Ocata after an upgrade from Newton, but to the best of my knowledge it also affects master.

During our upgrade from Newton to Ocata, after we finished the cells_v2 migration and mapped instances accordingly, `nova show $uuid` no longer worked and returned the error:

{"itemNotFound": {"message": "Instance 0e1e6038-bc69-4a85-b4cc-779e3b1d367a could not be found.", "code": 404}}

After much probing, and with no logs or warnings to go on, I discovered that the 'nova-osapi_compute' service was reporting a different 'host', and that there were duplicate entries for the same box (one using the IP address, the other using the box's hostname). The older entries still had version < 15. [0]

With a minimum version below 15 and cells_v2, the instance lookup will not work, because it never reaches the code path that consults the cells_v2 instance mappings. [1]
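The failure mode can be modeled in a few lines. This is a hypothetical, simplified sketch, not nova's actual `compute/api.py` code: `Deployment`, `get_instance`, and `MIN_CELLS_V2_API_VERSION` are illustrative names, and the only detail taken from the report is the "version < 15" cutoff. It shows how one stale row lowers the minimum service version and routes the lookup away from the instance mappings.

```python
from dataclasses import dataclass, field

# Hypothetical constant standing in for the "version < 15" cutoff from the report.
MIN_CELLS_V2_API_VERSION = 15


class InstanceNotFound(Exception):
    """Stand-in for the 404 that `nova show` returned."""


@dataclass
class Deployment:
    # Versions of every nova-osapi_compute service row visible to the API.
    api_service_versions: list
    # Instances in the database that [database]/connection points at.
    default_db: dict = field(default_factory=dict)
    # Simplified instance mappings: instance uuid -> cell name.
    instance_mappings: dict = field(default_factory=dict)
    # cell name -> {instance uuid: instance record}
    cells: dict = field(default_factory=dict)


def get_instance(dep, uuid, cells_v1_enabled=False):
    min_version = min(dep.api_service_versions)
    if not cells_v1_enabled and min_version >= MIN_CELLS_V2_API_VERSION:
        # cells v2 path: resolve the instance mapping, then read from that cell.
        cell = dep.instance_mappings.get(uuid)
        if cell is not None and uuid in dep.cells.get(cell, {}):
            return dep.cells[cell][uuid]
        raise InstanceNotFound(uuid)
    # Legacy path: a single stale pre-Ocata row drags the minimum below the
    # cutoff, so only the default database is searched and mapped instances
    # living in a cell are never found.
    if uuid in dep.default_db:
        return dep.default_db[uuid]
    raise InstanceNotFound(uuid)
```

With `api_service_versions=[9, 22]` (hypothetical values) a mapped instance raises `InstanceNotFound`; after the stale row is removed (`[22]`), the same lookup succeeds.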

The solution was to delete the old service entries with `service delete`.

My suggestion moving forward is to do one or more of the following:
 * place a WARN in the linked nova code [1]
 * add a check to `nova-status upgrade check` to look for old service entries
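To make the second suggestion concrete, here is a minimal sketch of finding the stale rows that would need deleting. `ServiceRow` and `find_stale_api_services` are hypothetical, and the row contents (an IP-keyed row at version 9 next to a hostname-keyed row at version 22) are made-up values that mirror the duplicate entries described above; this does not query a real nova database.

```python
from typing import List, NamedTuple

# Hypothetical cutoff matching the "version < 15" threshold in the report.
MIN_CELLS_V2_API_VERSION = 15


class ServiceRow(NamedTuple):
    """Simplified, hypothetical view of one row in a cell's services table."""
    id: int
    binary: str
    host: str
    version: int


def find_stale_api_services(rows: List[ServiceRow]) -> List[ServiceRow]:
    """Return nova-osapi_compute rows old enough to force the legacy lookup."""
    return [row for row in rows
            if row.binary == "nova-osapi_compute"
            and row.version < MIN_CELLS_V2_API_VERSION]


rows = [
    ServiceRow(1, "nova-osapi_compute", "10.0.0.5", 9),  # stale, keyed by IP
    ServiceRow(2, "nova-osapi_compute", "api01", 22),    # current, keyed by hostname
    ServiceRow(3, "nova-compute", "compute01", 22),      # other binaries are ignored
]
for row in find_stale_api_services(rows):
    print("stale:", row.id, row.host, row.version)
```

Only row 1 is reported; its id is the one an operator would then remove, for example with `nova service-delete`.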

[0] http://paste.openstack.org/show/715421/
[1] https://github.com/openstack/nova/blob/ed55dcad83d5db2fa7e43fc3d5465df1550b554c/nova/compute/api.py#L2263-L2270

Tags: api cells upgrade
Matt Riedemann (mriedem) wrote :

The code in [1] was added in Newton. I think we'd be OK to add a warning in that code, at least as a breadcrumb, when you're not using cells v1 and the osapi_compute minimum version is < 15, and we could backport that through to queens, pike and ocata.
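The proposed breadcrumb amounts to a guarded log call. The sketch below is a hypothetical illustration: the function name, constant, and message wording are assumptions, not the text of the warning nova actually logs. It fires only when cells v1 is off and the minimum version is below the cutoff, and returns the message so the condition is easy to check.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical constant; the report gives 15 as the cutoff for cells v2 lookups.
MIN_CELLS_V2_API_VERSION = 15


def warn_if_old_api_services(min_api_version, cells_v1_enabled):
    """Log a breadcrumb when the legacy (pre-cells v2) lookup path is taken.

    Returns the warning text, or None when no warning applies.
    """
    if cells_v1_enabled or min_api_version >= MIN_CELLS_V2_API_VERSION:
        return None
    msg = ("The minimum nova-osapi_compute service version is %d, which is "
           "older than %d, so instance lookups will not use cells v2 "
           "instance mappings. Check for stale nova-osapi_compute service "
           "records." % (min_api_version, MIN_CELLS_V2_API_VERSION))
    LOG.warning(msg)
    return msg
```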

For nova-status, we'd likely add a check that queries the minimum nova-osapi_compute service version across all cells (API services should really only be in one cell, though) and emits a warning if it is < 15. The catch with the nova-status check is this: if you had older nova-osapi_compute services in your nova (cell) database from before upgrading to Ocata, where cells v2 became required, and you then reconfigured the API to point [database]/connection at the nova_cell0 database and created a new 'current' service version, the cross-cell minimum version check would warn about a cell table entry you don't actually care about. I think the resolution would just be to delete that entry, though. Alternatively, we could skip looking across cells in nova-status and just rely on [database]/connection being set (or at least look in cell0).
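The cross-cell check described above can be sketched as follows, assuming the per-cell nova-osapi_compute versions have already been fetched. `Code` and `check_api_service_versions` are hypothetical names, not the real nova-status implementation. Reporting the offending cell by name speaks to the caveat: a stale row left in the old cell database shows up as a warning even when the API now points at cell0, and the deployer decides whether to delete it.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

# Hypothetical cutoff matching the "version < 15" threshold discussed above.
MIN_CELLS_V2_API_VERSION = 15


class Code(Enum):
    # Hypothetical result codes in the spirit of nova-status output.
    SUCCESS = "Success"
    WARNING = "Warning"


def check_api_service_versions(
        versions_by_cell: Dict[str, List[int]]) -> Tuple[Code, Optional[str]]:
    """Warn (not fail) if any cell holds a pre-cells-v2 nova-osapi_compute row.

    versions_by_cell maps a cell name to the nova-osapi_compute service
    versions recorded in that cell's database; an empty list means none.
    """
    offenders = {cell: min(versions)
                 for cell, versions in versions_by_cell.items()
                 if versions and min(versions) < MIN_CELLS_V2_API_VERSION}
    if not offenders:
        return Code.SUCCESS, None
    details = ", ".join("%s (minimum version %d)" % (cell, version)
                        for cell, version in sorted(offenders.items()))
    return Code.WARNING, (
        "Found nova-osapi_compute service records older than version %d in: "
        "%s. If these are stale, they may need to be deleted."
        % (MIN_CELLS_V2_API_VERSION, details))
```

Returning a warning rather than a failure matches the reasoning that the tool cannot know how the API service is actually configured.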

tags: added: api cells upgrade
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/557506

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/557506
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eaf6340847c35ace3b4b681a95b8a79a7a3f2491
Submitter: Zuul
Branch: master

commit eaf6340847c35ace3b4b681a95b8a79a7a3f2491
Author: Matt Riedemann <email address hidden>
Date: Wed Mar 28 16:26:48 2018 -0400

    Log a warning and add nova-status check for old API service versions

    Change Ib984c30543acb3ca9cb95fb53d44d9ded0f5a5c8, which was added
    in Newton when cells v2 was optional, added some transitional code
    to the API for looking up an instance, which didn't rely on instance
    mappings in a cell to find the instance if the minimum nova-osapi_compute
    service version was from before Ocata.

    People have reported this being a source of confusion when upgrading
    from before Ocata, when cells v2 wasn't required, to Ocata+ where cells
    v2 along with the mapping setup is required. That's because they might
    have older nova-osapi_compute service version records in their 'nova'
    (cell) database which makes the API think the code is older than it
    actually is, and results in an InstanceNotFound error.

    This change does two things:

    1. Adds a warning to the compute API code in this scenario to serve
       as a breadcrumb if a deployment hits this issue.

    2. A nova-status check to look for minimum nova-osapi_compute service
       versions across all cells and report the issue as a warning. It's
       not an upgrade failure since we don't know how the nova-api service
       is configured, but leave that investigation up to the deployer.

    This is also written in such a way that we should be able to backport
    this through to stable/ocata.

    Change-Id: Ie2bc4616439352850cf29a9de7d33a06c8f7c2b8
    Closes-Bug: #1759316

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/563251

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/563251
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aaa259d9d34bab4fd168111b7393c000f7b82077
Submitter: Zuul
Branch: stable/queens

commit aaa259d9d34bab4fd168111b7393c000f7b82077
Author: Matt Riedemann <email address hidden>
Date: Wed Mar 28 16:26:48 2018 -0400

    Log a warning and add nova-status check for old API service versions

    Change Ib984c30543acb3ca9cb95fb53d44d9ded0f5a5c8, which was added
    in Newton when cells v2 was optional, added some transitional code
    to the API for looking up an instance, which didn't rely on instance
    mappings in a cell to find the instance if the minimum nova-osapi_compute
    service version was from before Ocata.

    People have reported this being a source of confusion when upgrading
    from before Ocata, when cells v2 wasn't required, to Ocata+ where cells
    v2 along with the mapping setup is required. That's because they might
    have older nova-osapi_compute service version records in their 'nova'
    (cell) database which makes the API think the code is older than it
    actually is, and results in an InstanceNotFound error.

    This change does two things:

    1. Adds a warning to the compute API code in this scenario to serve
       as a breadcrumb if a deployment hits this issue.

    2. A nova-status check to look for minimum nova-osapi_compute service
       versions across all cells and report the issue as a warning. It's
       not an upgrade failure since we don't know how the nova-api service
       is configured, but leave that investigation up to the deployer.

    This is also written in such a way that we should be able to backport
    this through to stable/ocata.

    Conflicts:
          doc/source/cli/nova-status.rst

    NOTE(mriedem): The conflict is because the Rocky section
    in the man page does not exist in Queens. The note about
    the new check is added to the Queens section and mentions
    it was backported from Rocky.

    Change-Id: Ie2bc4616439352850cf29a9de7d33a06c8f7c2b8
    Closes-Bug: #1759316
    (cherry picked from commit eaf6340847c35ace3b4b681a95b8a79a7a3f2491)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.5

This issue was fixed in the openstack/nova 17.0.5 release.
