Booting an instance failed with a Kilo stable compute node and a Liberty controller

Bug #1500289 reported by Alex Xu
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Critical
Assigned to: Alex Xu

Bug Description

When booting an instance, the Kilo compute node fails with the following error:

 2015-09-27 06:31:13.528 ERROR nova.compute.manager [req-aa4fc705-3d6e-4913-9b6b-49b0371d1b83 None None] [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4] Instance failed to spawn
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4] Traceback (most recent call last):
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/compute/manager.py", line 2442, in _build_resources
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     yield resources
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/compute/manager.py", line 2314, in _build_and_run_instance
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     block_device_info=block_device_info)
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2351, in spawn
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     write_to_disk=True)
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4172, in _get_guest_xml
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     context)
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3982, in _get_guest_config
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     pci_devs = pci_manager.get_instance_pci_devs(instance, 'all')
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/pci/manager.py", line 279, in get_instance_pci_devs
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     pci_devices = inst.pci_devices
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/objects/base.py", line 72, in getter
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     self.obj_load_attr(name)
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/objects/instance.py", line 1000, in obj_load_attr
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     self._load_generic(attrname)
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]   File "/opt/stack/nova/nova/objects/instance.py", line 890, in _load_generic
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4]     reason='loading %s requires recursion' % attrname)
 2015-09-27 06:31:13.528 TRACE nova.compute.manager [instance: 712e49b9-baef-4e7c-9ece-63a6f9f074e4] ObjectActionError: Object action obj_load_attr failed because: loading pci_devices requires recursion

This is due to https://review.openstack.org/#/c/202616/

The Kilo node cannot understand version 1.2 of PCIDeviceList.
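
To make the version mismatch concrete, here is a toy Python model of the general versioned-object idea: code that knows the newer version downgrades (backports) a serialized object to the version an older peer reports, and the older peer rejects any version it has never seen. Everything here (FIELDS_BY_VERSION, new_field, and the helper functions) is hypothetical; it is not nova or oslo.versionedobjects code and is not necessarily how this bug was resolved.

```python
# Toy model of versioned-object backporting. All names are hypothetical;
# this is not nova or oslo.versionedobjects code.

# Fields carried by each (hypothetical) version of a list object.
FIELDS_BY_VERSION = {
    "1.1": {"objects"},
    "1.2": {"objects", "new_field"},  # pretend 1.2 added one field
}


def to_primitive(fields, version):
    """Serialize an object as a plain dict tagged with its version."""
    return {"version": version, "data": dict(fields)}


def backport(primitive, target_version):
    """Drop any fields that the target version does not know about."""
    allowed = FIELDS_BY_VERSION[target_version]
    data = {k: v for k, v in primitive["data"].items() if k in allowed}
    return {"version": target_version, "data": data}


def older_peer_load(primitive, known_versions=("1.0", "1.1")):
    """An older node refuses any version it was not built with."""
    if primitive["version"] not in known_versions:
        raise ValueError("unsupported version %s" % primitive["version"])
    return primitive["data"]


# The older peer advertises the versions it understands (a "manifest").
manifest = {"PCIDeviceList": "1.1"}
prim = to_primitive({"objects": [], "new_field": 1}, "1.2")
compat = backport(prim, manifest["PCIDeviceList"])
print(older_peer_load(compat))  # {'objects': []}
```

The downgrade is done by the side that knows the newer version, since an older node cannot know which fields were added after it was released.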

Then, when the Kilo compute node sends the retry build request to the Liberty conductor, the conductor fails with the following error:

 2015-09-27 02:21:23.340 ERROR oslo_messaging.rpc.dispatcher [req-f5e20f63-bdaf-417d-9843-b96c6ef3fae6 admin admin] Exception during message handling: 'instance_type_memory_mb'
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/conductor/manager.py", line 715, in build_instances
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher     instances)
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/scheduler/utils.py", line 63, in build_request_spec
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher     instance_type = flavors.extract_flavor(instance)
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher   File "/opt/stack/nova/nova/compute/flavors.py", line 290, in extract_flavor
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher     setattr(flavor, key, sys_meta[type_key])
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher KeyError: 'instance_type_memory_mb'
 2015-09-27 02:21:23.340 TRACE oslo_messaging.rpc.dispatcher

This is because the check at https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L59 no longer returns True: the Kilo node sends an InstanceV1, not an InstanceV2.
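
The failing check can be reproduced with a small self-contained Python sketch. The classes and helpers below (_BaseInstance, extract_flavor_sketch, get_flavor_sketch) are hypothetical simplifications of what the trace shows, not nova's code. The assumption that the Kilo InstanceV1 carries its flavor as an attribute but lacks the legacy instance_type_* system_metadata keys is inferred from the KeyError above.

```python
# Hypothetical stand-ins for nova's Instance objects; not nova code.
class _BaseInstance(object):
    def __init__(self, flavor, system_metadata=None):
        self.flavor = flavor
        self.system_metadata = system_metadata or {}


class InstanceV1(_BaseInstance):
    """What a Kilo compute node sends."""


class InstanceV2(_BaseInstance):
    """What Liberty code expects."""


# On the Liberty side, the public name is an alias for the new class.
Instance = InstanceV2


def extract_flavor_sketch(instance, prefix="instance_type_"):
    """Legacy fallback: rebuild the flavor from prefixed system_metadata keys."""
    sys_meta = instance.system_metadata
    return {"memory_mb": sys_meta[prefix + "memory_mb"]}


def get_flavor_sketch(instance):
    if isinstance(instance, Instance):  # True only for InstanceV2
        return instance.flavor
    # An InstanceV1 falls through to the legacy path, and its
    # system_metadata (by assumption) has no instance_type_* keys.
    return extract_flavor_sketch(instance)


print(get_flavor_sketch(InstanceV2({"memory_mb": 512})))  # {'memory_mb': 512}
try:
    get_flavor_sketch(InstanceV1({"memory_mb": 512}))
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'instance_type_memory_mb'
```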

Alex Xu (xuhj)
Changed in nova:
assignee: nobody → Alex Xu (xuhj)
importance: Undecided → Critical
status: New → In Progress
tags: added: liberty-rc-potential
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/228299

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/228304

Changed in nova:
assignee: Alex Xu (xuhj) → John Garbutt (johngarbutt)
Changed in nova:
assignee: John Garbutt (johngarbutt) → Alex Xu (xuhj)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Alex Xu (<email address hidden>) on branch: master
Review: https://review.openstack.org/228299

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/228304
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=093272489ceb13d4df26916c7ed8b66e39f7c1a4
Submitter: Jenkins
Branch: master

commit 093272489ceb13d4df26916c7ed8b66e39f7c1a4
Author: He Jie Xu <email address hidden>
Date: Mon Sep 28 13:24:06 2015 +0800

    Correct Instance type check to work with InstanceV1

    When a Liberty controller runs with a Kilo compute node, the Kilo
    compute node sends an InstanceV1 to the Liberty controller. Then
    none of the checks like "isinstance(instance, objects.Instance)"
    on the Liberty controller work, because objects.Instance now
    points to objects.InstanceV2.

    This patch corrects all of the Instance type checks to make sure
    the upgrade works.

    Change-Id: Ib431c08dc6934631a71ef55c564fbbf4bde22642
    Closes-bug: #1500289
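
A minimal sketch of the kind of change the commit message describes: widen the type check so that an InstanceV1 from a Kilo node is accepted alongside InstanceV2. The commit message does not show exactly how the checks were rewritten; checking a shared base class (called _BaseInstance here, a hypothetical name) is one option, and an explicit tuple such as isinstance(obj, (InstanceV1, InstanceV2)) would be another.

```python
# Hypothetical classes; the real patch's approach may differ.
class _BaseInstance(object):
    pass


class InstanceV1(_BaseInstance):
    pass


class InstanceV2(_BaseInstance):
    pass


Instance = InstanceV2  # Liberty-style alias to the newest class


def is_instance_narrow(obj):
    # Rejects an InstanceV1 coming from a Kilo node.
    return isinstance(obj, Instance)


def is_instance_upgrade_safe(obj):
    # Accepts either wire version by checking the shared base class.
    return isinstance(obj, _BaseInstance)


old = InstanceV1()
print(is_instance_narrow(old), is_instance_upgrade_safe(old))  # False True
```

Checking a base class keeps callers working if another version is added later, at the cost of accepting anything that subclasses it.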

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/228774

John Garbutt (johngarbutt) wrote :

So the impact here is mostly on retrying a build, if you use the latest stable/kilo?

John Garbutt (johngarbutt) wrote :

oops, that was not a question, it was an update

Changed in nova:
milestone: none → liberty-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/228774
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3be67646c46f7ca58b570770074c1f269c187a30
Submitter: Jenkins
Branch: stable/liberty

commit 3be67646c46f7ca58b570770074c1f269c187a30
Author: He Jie Xu <email address hidden>
Date: Mon Sep 28 13:24:06 2015 +0800

    Correct Instance type check to work with InstanceV1

    When a Liberty controller runs with a Kilo compute node, the Kilo
    compute node sends an InstanceV1 to the Liberty controller. Then
    none of the checks like "isinstance(instance, objects.Instance)"
    on the Liberty controller work, because objects.Instance now
    points to objects.InstanceV2.

    This patch corrects all of the Instance type checks to make sure
    the upgrade works.

    Change-Id: Ib431c08dc6934631a71ef55c564fbbf4bde22642
    Closes-bug: #1500289
    (cherry picked from commit 093272489ceb13d4df26916c7ed8b66e39f7c1a4)

tags: added: in-stable-liberty
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-rc2 → 12.0.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/235181

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/235181
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d8a3b9d6408627ce9a293d1e62d003757af655a1
Submitter: Jenkins
Branch: master

commit 6df6ad3ff32f2b1fe2978df1032002548ad8eb66
Author: Davanum Srinivas <email address hidden>
Date: Wed Oct 7 08:11:35 2015 -0700

    Omnibus stable/liberty fix

    There are currently 3 different blocking issues in stable/liberty due
    to library releases: webob 1.5, oslo.db 3.0.0, and
    oslo.versionedobjects 0.11.0. This is a squashed fix for all of them
    as none can land without the others.

    Issue #1 - oslo.db

    Add testresources used by oslo.db fixture

    If we use oslo.db fixtures, we'll need the package or
    the next version of oslo.db release will break us.

    (Cherry-picked from 4bcc26487837b7ece7797f88622dea1b6d09bd94)

    Closes-Bug: #1503501

    Issue #2 - oslo.versionedobjects

    Drop unused obj_to_primitive() override

    This was a band-aid override until o.vo gained the obj_relationships fix
    that this method overrides. That has been in place since o.vo 0.8.0, which
    means this is long since no longer necessary (and is actually blocking our
    ability to absorb bug fixes to this code in o.vo). Further, we no longer
    use this directly because we're doing backports based on version manifests,
    which means we no longer consult child_versions _or_ obj_relationships.

    (cherry picked from commit 142f1d9cc4ace90956c665c40b1f78795f9f7e29)

    Issue #3 - webob

    Default ConvertedException code to 500

    webob 1.5.0 released on 10/11 has change f6c749011 which
    strictly enforces status codes in exceptions, and 0 is not
    a valid status code so tests fail.

    Change the default to 500 to match the default in the parent
    class in webob.

    Closes-Bug: #1505153
    (cherry picked from commit 10438c0fc34bd088e018e1a5e8ec57b396528792)

    Change-Id: I1e06e77308a7dd23209124f0807d61fb52470188
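
Issue #3 can be illustrated with a short Python sketch. StrictHTTPException below is a hypothetical stand-in for a base exception that validates its status code (assumed here to mean the range 100 to 599); it is not webob's implementation, and ConvertedExceptionSketch only mirrors the idea of changing the default code from 0 to 500.

```python
# Hypothetical stand-ins; this is not webob's or nova's actual code.
class StrictHTTPException(Exception):
    """Rejects status codes outside the valid HTTP range."""

    def __init__(self, code):
        if not 100 <= code <= 599:
            raise ValueError("invalid HTTP status code: %r" % (code,))
        super(StrictHTTPException, self).__init__(code)
        self.code = code


class ConvertedExceptionSketch(StrictHTTPException):
    # A default of 0 would fail the check above; 500 is a valid status.
    def __init__(self, code=500, title="", explanation=""):
        self.title = title
        self.explanation = explanation
        super(ConvertedExceptionSketch, self).__init__(code)


print(ConvertedExceptionSketch().code)  # 500
try:
    ConvertedExceptionSketch(code=0)
except ValueError as exc:
    print(exc)  # invalid HTTP status code: 0
```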

commit 606204354b5ed96852240020769c81acda9f9fc8
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 5 20:32:58 2015 +0000

    Revert "[libvirt] Move cleanup of imported files to imagebackend"

    This reverts commit 9ba70756de326ffaa8be43acfde12cad04ed0af2

    The change introduced an UnboundLocalError if we fail to
    create the config_drive_image variable. Also, the original
    change didn't have any unit tests and came late in the
    liberty release so I don't really want to mess with fixing
    this given we need the fix in liberty-rc2.

    Change-Id: Ia7b70aa139b67cf58b5c0f9fbcd2a4deb465914e
    Closes-Bug: #1502961
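
The failure mode named in the revert, an UnboundLocalError when a variable is never assigned, typically comes from cleanup code that refers to a name bound inside a try block. The sketch below is hypothetical (build_config_drive_buggy, build_config_drive_fixed, and the cleaned_up list are invented for illustration) and does not reproduce the reverted nova change.

```python
cleaned_up = []  # records what the (hypothetical) cleanup step handled


def build_config_drive_buggy(make_image):
    try:
        config_drive_image = make_image()  # may raise before binding the name
        return config_drive_image
    finally:
        # If make_image() raised, this line raises UnboundLocalError instead.
        cleaned_up.append(config_drive_image)


def build_config_drive_fixed(make_image):
    config_drive_image = None  # bind the name up front
    try:
        config_drive_image = make_image()
        return config_drive_image
    finally:
        if config_drive_image is not None:
            cleaned_up.append(config_drive_image)


def failing_make_image():
    raise RuntimeError("could not create config drive")


for build in (build_config_drive_buggy, build_config_drive_fixed):
    try:
        build(failing_make_image)
    except Exception as exc:
        print(build.__name__, type(exc).__name__)
# build_config_drive_buggy UnboundLocalError
# build_config_drive_fixed RuntimeError
```

The buggy version masks the real error with an UnboundLocalError; binding the name before the try block lets the original RuntimeError surface.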

commit ef655379445693443146f8a3ed31cabb011d9937
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Oct 8 06:41:06 2015 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: Idcac653033ab9808e06451a0dd690db4736834b2

commit eda3029aa74932f421d2992ac24f5ac3c92f347c
Author: Dan Smith <email address hidden>
Date: Tue Oct 6 10:58:18 2...
