Can't delete instance with numa_topology after upgrading from kilo

Bug #1596119 reported by Chris Stone
This bug affects 3 people
Affects                     Status         Importance   Assigned to     Milestone
OpenStack Compute (nova)    Fix Released   High         Hans Lindgren
  Declined for Liberty by Matt Riedemann
  Mitaka                    Fix Released   High         Lee Yarwood

Bug Description

Using the RDO Kilo version of Nova (2015.1.0-3) with dedicated CPU pinning populates the numa_topology database field with data at "nova_object.version": "1.1". After upgrading to Liberty, new instances are created with object version 1.2, but instances created under Kilo remain at 1.1. Attempting many actions on these instances (start, delete) fails with the following error: RemoteError: Remote error: InvalidTargetVersion Invalid target version 1.2.

Updating from Kilo to Mitaka produces the same problem.
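To check which version a stored numa_topology value carries, the serialized object can be decoded and its "nova_object.version" key read. The following is only a minimal sketch: the payload is a trimmed, hypothetical stand-in for a real row (real rows carry more fields), and stored_version is an illustrative helper, not part of Nova.

```python
import json

# Hypothetical, trimmed-down payload; only the "nova_object.version" key
# and its 1.1 value come from the report, the rest is illustrative.
serialized = json.dumps({
    "nova_object.name": "InstanceNUMATopology",
    "nova_object.namespace": "nova",
    "nova_object.version": "1.1",
    "nova_object.data": {},
})


def stored_version(blob):
    """Return the (major, minor) version of a serialized object."""
    primitive = json.loads(blob)
    major, minor = primitive["nova_object.version"].split(".")
    return int(major), int(minor)


print(stored_version(serialized))
```

Instances created before the upgrade would report the older minor version here, while instances created afterwards would report the newer one.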

Chris Stone (cstone-0) wrote :

Attaching log info from nova compute during an attempted start of a stopped instance

tags: added: upgrades
tags: added: unified-objects
Markus Zoeller (markus_z) (mzoeller) wrote :

The project "oslo.versionedobjects" introduced this check with commit [1] in the Liberty timeframe.

References:
[1] https://github.com/openstack/oslo.versionedobjects/commit/028aad5b44470a7f57e0f5c1a6fbf9594a9c6947

Dan Smith (danms) wrote :

Can you give more detail about your service arrangement after the upgrade to Liberty? Are _all_ services upgraded at once (offline)? From your log snippet, it looks like maybe you upgraded the compute nodes but the conductor nodes are old?

Changed in nova:
status: New → Incomplete
Hans Lindgren (hanlind) wrote :

I believe the problem is that instance extras are stored in the db but never get migrated during upgrades. In this case, when the instance is initially created in Kilo, its numa_topology is stored in the db as a serialized NumaTopology object at version 1.1. After upgrading to Liberty, when the instance is fetched from the db, the same serialized 1.1 object is returned, but the conductor doesn't know how to forward-port it to the current/requested 1.2 version before returning it to the caller, hence the raised exception.

Changed in nova:
status: Incomplete → Confirmed
Dan Smith (danms) wrote :

Hans, I don't understand how that could be the case. Why would the conductor need to forward port it? All the nodes that the object should be sent to (or loaded in) should be happy to see a 1.1 object if they support up to 1.2.

On the other hand, if conductors were older than other nodes, a new object could be stored in the database which would confuse the conductor when it went to deserialize it.

In short, there should be no problem storing older objects in the database as long as we can still support their major version. It should be identical to receiving that older object over RPC from an older node.

Hans Lindgren (hanlind) wrote :

If I read the logs correctly, lazy-loading instance.numa_topology ends up calling the remotable class method InstanceNUMATopology.get_by_instance_uuid, and the conductor method object_class_action_versions gets the 1.1 object back. But before returning to the client, it tries to honor the version the client is asking for. Here the version manifest has 1.2, and we blow up.
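The failure path described here can be modelled with a small standalone sketch. Everything below is a simplified assumption for illustration: ToyObject, to_primitive and conductor_class_action are hypothetical names, and the version check only imitates the kind of check referenced earlier in this thread, not the actual oslo.versionedobjects or Nova code.

```python
class InvalidTargetVersion(Exception):
    """Stand-in for the exception named in the report."""


def parse(version):
    """Turn a version string such as '1.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


class ToyObject:
    """Hypothetical object that only knows its own stored version."""

    def __init__(self, version):
        self.version = version

    def to_primitive(self, target_version=None):
        # Converting to a version newer than the object itself is refused.
        if target_version and parse(target_version) > parse(self.version):
            raise InvalidTargetVersion(
                "Invalid target version %s" % target_version)
        return {"nova_object.version": target_version or self.version}


def conductor_class_action(stored, requested_version):
    # Pre-fix behaviour (as described above): always convert the result
    # to whatever version the client asked for.
    return stored.to_primitive(target_version=requested_version)


try:
    conductor_class_action(ToyObject("1.1"), "1.2")
except InvalidTargetVersion as exc:
    print(exc)
```

In this model, a 1.1 object loaded from the database and a client manifest asking for 1.2 is enough to trigger the error, even though every service runs the same release.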

Chris Stone (cstone-0) wrote :

Dan, in this case the conductors were definitely updated to the newer version prior to updating compute nodes, which were the last ones. Both nova-conductor and nova-compute are reporting the same --version across all hosts.

Dan Smith (danms) wrote :

Ah, okay, makes sense Hans. Are you cooking up a patch for this?

Hans Lindgren (hanlind) wrote :

Ok, I'll look into it.

Changed in nova:
assignee: nobody → Hans Lindgren (hanlind)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/335629

Hans Lindgren (hanlind)
tags: added: liberty-backport-potential mitaka-backport-potential
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/335629
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1e3e7309997f90fbd0291c05cc859dd9ac0ef161
Submitter: Jenkins
Branch: master

commit 1e3e7309997f90fbd0291c05cc859dd9ac0ef161
Author: Hans Lindgren <email address hidden>
Date: Wed Jun 29 20:29:47 2016 +0200

    Do not try to backport when db has older object version

    Instance extras are stored as serialized objects in the database and
    because of this it is possible to get a version back that is older
    than what the client requested. This is in itself not a problem, but
    the way we do things right now in conductor we end up trying to
    backport to a newer version, which raises InvalidTargetVersion.

    This change adds a check to make sure we skip backporting if the
    requested version is newer than the actual db version as long as the
    major version is the same.

    Change-Id: I34ac0abd016b72d585f83ae2ce34790751082180
    Closes-Bug: #1596119
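
The check described in the commit message can be sketched as a small predicate. This is an illustrative approximation under stated assumptions, not the merged patch: needs_conversion and to_tuple are hypothetical helper names, and versions are assumed to be plain "major.minor" strings.

```python
def to_tuple(version):
    """Turn a version string such as '1.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def needs_conversion(requested, actual):
    """Decide whether an object should be converted before it is returned.

    Conversion is attempted only when the client asks for an older
    version (a real backport) or for a different major version. When the
    requested version is newer but shares the major version, the object
    is returned as stored.
    """
    requested_t, actual_t = to_tuple(requested), to_tuple(actual)
    do_backport = requested_t < actual_t
    other_major = requested_t[0] != actual_t[0]
    return do_backport or other_major


for requested, actual in [("1.2", "1.1"), ("1.1", "1.2"), ("2.0", "1.1")]:
    print(requested, actual, needs_conversion(requested, actual))
```

The first pair is the case from this bug: the client asks for 1.2, the database holds 1.1, and the object is passed through unchanged instead of triggering InvalidTargetVersion.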

Changed in nova:
status: In Progress → Fix Released
Erik Olof Gunnar Andersson (eandersson) wrote :

Can we get this patch back-ported to Mitaka as well?

Bjoern (bjoern-t) wrote :

This is critical enough to be backported to all possible releases.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/387214

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/387249

Tony Breeds (o-tony)
tags: removed: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/387249
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7b15ecc22fee0aa4d628a6e0d75850a3275bdaf6
Submitter: Jenkins
Branch: stable/newton

commit 7b15ecc22fee0aa4d628a6e0d75850a3275bdaf6
Author: Hans Lindgren <email address hidden>
Date: Wed Jun 29 20:29:47 2016 +0200

    Do not try to backport when db has older object version

    Instance extras are stored as serialized objects in the database and
    because of this it is possible to get a version back that is older
    than what the client requested. This is in itself not a problem, but
    the way we do things right now in conductor we end up trying to
    backport to a newer version, which raises InvalidTargetVersion.

    This change adds a check to make sure we skip backporting if the
    requested version is newer than the actual db version as long as the
    major version is the same.

    Change-Id: I34ac0abd016b72d585f83ae2ce34790751082180
    Closes-Bug: #1596119
    (cherry picked from commit 1e3e7309997f90fbd0291c05cc859dd9ac0ef161)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/387214
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a54c4f51e753fda5c6aeb103181d815719a4b532
Submitter: Jenkins
Branch: stable/mitaka

commit a54c4f51e753fda5c6aeb103181d815719a4b532
Author: Hans Lindgren <email address hidden>
Date: Wed Jun 29 20:29:47 2016 +0200

    Do not try to backport when db has older object version

    Instance extras are stored as serialized objects in the database and
    because of this it is possible to get a version back that is older
    than what the client requested. This is in itself not a problem, but
    the way we do things right now in conductor we end up trying to
    backport to a newer version, which raises InvalidTargetVersion.

    This change adds a check to make sure we skip backporting if the
    requested version is newer than the actual db version as long as the
    major version is the same.

    Change-Id: I34ac0abd016b72d585f83ae2ce34790751082180
    Closes-Bug: #1596119
    (cherry picked from commit 1e3e7309997f90fbd0291c05cc859dd9ac0ef161)

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.2

This issue was fixed in the openstack/nova 14.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b1

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 13.1.3

This issue was fixed in the openstack/nova 13.1.3 release.
