Can't delete instance with numa_topology after upgrading from kilo

Bug #1596119 reported by Chris Stone on 2016-06-25
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: High
Assigned to: Hans Lindgren
Nominated for Liberty by Hans Lindgren
Nominated for Mitaka by Hans Lindgren

Bug Description

Using the RDO Kilo version of Nova (2015.1.0-3) with dedicated CPU pinning populates the numa_topology database field with data at "nova_object.version": "1.1". After upgrading to Liberty, new instances are created with object version 1.2, but existing instances created under Kilo remain at 1.1. Attempting many actions on these instances (start, delete) fails with the following error: RemoteError: Remote error: InvalidTargetVersion Invalid target version 1.2.

Updating from Kilo to Mitaka produces the same problem.
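
To check which object version a stored numa_topology value carries, the serialized JSON can be inspected directly. The following is a minimal sketch: the blob is a made-up illustration rather than a real database row, and the helper name stored_object_version is hypothetical. Only the "nova_object.*" key names reflect the serialized object format.

```python
import json

# Illustrative blob only; not copied from a real database row.
stored = json.dumps({
    "nova_object.name": "InstanceNUMATopology",
    "nova_object.namespace": "nova",
    "nova_object.version": "1.1",
    "nova_object.data": {},
})


def stored_object_version(blob):
    """Return the version string recorded in a serialized object (hypothetical helper)."""
    return json.loads(blob)["nova_object.version"]


print(stored_object_version(stored))
```

An instance created under Kilo would report 1.1 here, while one created after the upgrade would report 1.2.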

Chris Stone (cstone-0) wrote :

Attaching log info from nova compute during an attempted start of a stopped instance

tags: added: upgrades
tags: added: unified-objects

The project "oslo.versionedobjects" introduced this check with commit [1] in the Liberty timeframe.

References:
[1] https://github.com/openstack/oslo.versionedobjects/commit/028aad5b44470a7f57e0f5c1a6fbf9594a9c6947

Dan Smith (danms) wrote :

Can you give more detail about your service arrangement after the upgrade to Liberty? Are _all_ services upgraded at once (offline)? From your log snippet, it looks like maybe you upgraded the compute nodes but the conductor nodes are old?

Changed in nova:
status: New → Incomplete
Hans Lindgren (hanlind) wrote :

I believe the problem is that instance extras are stored in the db but never get migrated during upgrades. In this case, when the instance is initially created in Kilo, its numa_topology is stored in the db as a serialized NumaTopology object with version 1.1. After upgrading to Liberty, when the instance is fetched from the db, the same serialized 1.1 object is returned, but now the conductor doesn't know how to forward port it to the current/requested 1.2 version before returning it to the caller, hence the raised exception.

Changed in nova:
status: Incomplete → Confirmed
Dan Smith (danms) wrote :

Hans, I don't understand how that could be the case. Why would the conductor need to forward port it? All the nodes that the object should be sent to (or loaded in) should be happy to see a 1.1 object if they support up to 1.2.

On the other hand, if conductors were older than other nodes, a new object could be stored in the database which would confuse the conductor when it went to deserialize it.

In short, there should be no problem storing older objects in the database as long as we can still support their major version. It should be identical to receiving that older object over RPC from an older node.

Hans Lindgren (hanlind) wrote :

If I read the logs correctly, when trying to lazy-load instance.numa_topology it ends up calling the remotable class method InstanceNUMATopology.get_by_instance_uuid and in conductor method object_class_action_versions it gets the 1.1 object. But then before returning to the client it tries to honor what the client is asking for. Here the version manifest has 1.2 and we blow up.
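
The failure path described above can be imitated with a toy model. This is only a sketch under assumptions: the backport function and the exception class below are stand-ins written for illustration, not the actual nova or oslo.versionedobjects code.

```python
class InvalidTargetVersion(Exception):
    """Stand-in for the real exception of the same name."""


def version_tuple(version):
    """Turn a 'major.minor' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def backport(actual_version, target_version):
    """Toy backport: refuse a target that is newer than the object itself."""
    if version_tuple(target_version) > version_tuple(actual_version):
        raise InvalidTargetVersion(
            "Invalid target version %s" % target_version)
    return target_version


# The db returns a 1.1 object while the client's manifest asks for 1.2.
try:
    backport("1.1", "1.2")
except InvalidTargetVersion as exc:
    print(exc)
```

Asking for a target at or below the object's own version (for example backport("1.2", "1.1")) passes the check; only the "backport to something newer" case raises.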

Chris Stone (cstone-0) wrote :

Dan, in this case the conductors were definitely updated to the newer version prior to updating compute nodes, which were the last ones. Both nova-conductor and nova-compute are reporting the same --version across all hosts.

Dan Smith (danms) wrote :

Ah, okay, makes sense Hans. Are you cooking up a patch for this?

Hans Lindgren (hanlind) wrote :

Ok, I'll look into it.

Changed in nova:
assignee: nobody → Hans Lindgren (hanlind)
status: Confirmed → In Progress
Hans Lindgren (hanlind) on 2016-06-30
tags: added: liberty-backport-potential mitaka-backport-potential
Changed in nova:
importance: Undecided → High

Reviewed: https://review.openstack.org/335629
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1e3e7309997f90fbd0291c05cc859dd9ac0ef161
Submitter: Jenkins
Branch: master

commit 1e3e7309997f90fbd0291c05cc859dd9ac0ef161
Author: Hans Lindgren <email address hidden>
Date: Wed Jun 29 20:29:47 2016 +0200

    Do not try to backport when db has older object version

    Instance extras are stored as serialized objects in the database and
    because of this it is possible to get a version back that is older
    than what the client requested. This is in itself not a problem, but
    the way we do things right now in conductor we end up trying to
    backport to a newer version, which raises InvalidTargetVersion.

    This change adds a check to make sure we skip backporting if the
    requested version is newer than the actual db version as long as the
    major version is the same.

    Change-Id: I34ac0abd016b72d585f83ae2ce34790751082180
    Closes-Bug: #1596119
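
The check described in the commit message can be sketched roughly as follows. This is an illustration of the stated rule only, not the merged patch; the function names parse_version and should_convert are hypothetical.

```python
def parse_version(version):
    """Turn a 'major.minor' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def should_convert(requested, actual):
    """Decide whether the conductor should convert the object before returning it.

    Convert when the client asked for something older than what we have
    (a real backport) or when the major versions differ. Skip conversion
    when the db object is merely older than requested within the same
    major version.
    """
    req = parse_version(requested)
    act = parse_version(actual)
    is_backport = req < act
    other_major = req[0] != act[0]
    return is_backport or other_major


# Client asks for 1.2, db holds 1.1: return the object as-is.
print(should_convert("1.2", "1.1"))
# Client asks for 1.1, db holds 1.2: a genuine backport is needed.
print(should_convert("1.1", "1.2"))
```

With this rule, the Kilo-era 1.1 object is handed back unchanged to a client that supports up to 1.2, which matches the expectation that nodes supporting 1.2 can also accept 1.1.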

Changed in nova:
status: In Progress → Fix Released

Can we get this patch back-ported to Mitaka as well?

Bjoern Teipel (bjoern-teipel) wrote :

This is critical enough to get back ported into all possible releases

Tony Breeds (o-tony) on 2016-10-18
tags: removed: liberty-backport-potential

Reviewed: https://review.openstack.org/387249
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7b15ecc22fee0aa4d628a6e0d75850a3275bdaf6
Submitter: Jenkins
Branch: stable/newton

commit 7b15ecc22fee0aa4d628a6e0d75850a3275bdaf6
Author: Hans Lindgren <email address hidden>
Date: Wed Jun 29 20:29:47 2016 +0200

    Do not try to backport when db has older object version

    Instance extras are stored as serialized objects in the database and
    because of this it is possible to get a version back that is older
    than what the client requested. This is in itself not a problem, but
    the way we do things right now in conductor we end up trying to
    backport to a newer version, which raises InvalidTargetVersion.

    This change adds a check to make sure we skip backporting if the
    requested version is newer than the actual db version as long as the
    major version is the same.

    Change-Id: I34ac0abd016b72d585f83ae2ce34790751082180
    Closes-Bug: #1596119
    (cherry picked from commit 1e3e7309997f90fbd0291c05cc859dd9ac0ef161)

tags: added: in-stable-newton

Reviewed: https://review.openstack.org/387214
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a54c4f51e753fda5c6aeb103181d815719a4b532
Submitter: Jenkins
Branch: stable/mitaka

commit a54c4f51e753fda5c6aeb103181d815719a4b532
Author: Hans Lindgren <email address hidden>
Date: Wed Jun 29 20:29:47 2016 +0200

    Do not try to backport when db has older object version

    Instance extras are stored as serialized objects in the database and
    because of this it is possible to get a version back that is older
    than what the client requested. This is in itself not a problem, but
    the way we do things right now in conductor we end up trying to
    backport to a newer version, which raises InvalidTargetVersion.

    This change adds a check to make sure we skip backporting if the
    requested version is newer than the actual db version as long as the
    major version is the same.

    Change-Id: I34ac0abd016b72d585f83ae2ce34790751082180
    Closes-Bug: #1596119
    (cherry picked from commit 1e3e7309997f90fbd0291c05cc859dd9ac0ef161)

tags: added: in-stable-mitaka

This issue was fixed in the openstack/nova 14.0.2 release.

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

This issue was fixed in the openstack/nova 13.1.3 release.
