Instance without "extra" data crashes nova-compute

Bug #1446082 reported by Christoph Dwertmann
This bug affects 5 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Dan Smith
Kilo                      Fix Released  High        Dan Smith

Bug Description

I'm upgrading from Icehouse to Kilo. I have a single instance that was created in Icehouse. After the upgrade, nova-compute crashes because it's looking for instance "extra" data that is not there.

To fix this, we need to check if there is any "extra" data for the instance before trying to read properties such as "numa_topology".
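The traceback below ends in db_inst.get('extra').get('numa_topology'), which fails when 'extra' is None. A minimal sketch of the proposed check follows. It is not the actual nova code: the helper name load_numa_topology and the sample dictionaries are hypothetical, and a plain dict stands in for the database row.

```python
def load_numa_topology(db_inst):
    """Return the stored 'numa_topology' value, or None if the row has no 'extra' data.

    db_inst is assumed to behave like a dict, as in the traceback's
    db_inst.get('extra') call. This helper is illustrative only.
    """
    extra = db_inst.get('extra')
    if extra is None:
        # Instance created before "extra" data existed: nothing to load.
        return None
    return extra.get('numa_topology')


# Hypothetical rows: one without "extra" data, one with it.
old_instance = {'uuid': 'hypothetical-old', 'extra': None}
new_instance = {'uuid': 'hypothetical-new',
                'extra': {'numa_topology': '{"cells": []}'}}

old_value = load_numa_topology(old_instance)
new_value = load_numa_topology(new_instance)
```

With this guard, a row without "extra" data yields None instead of raising AttributeError.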

# dpkg -l | grep nova
ii nova-common 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python-nova 1:2015.1~rc1-0ubuntu1~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 1:2.22.0-0ubuntu1~cloud0 all client library for OpenStack Compute API

nova-compute.log:
2015-04-20 17:35:09.214 15508 DEBUG oslo_concurrency.lockutils [req-43d3110a-cac7-425e-842c-f725bda91c10 - - - - -] Lock "compute_resources" acquired by "_update_available_resource" :: waited 0.000s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:444
2015-04-20 17:35:09.299 15508 DEBUG oslo_concurrency.lockutils [req-43d3110a-cac7-425e-842c-f725bda91c10 - - - - -] Lock "compute_resources" released by "_update_available_resource" :: held 0.085s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:456
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 457, in fire_timers
    timer()
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 497, in run_service
    service.start()
  File "/usr/lib/python2.7/dist-packages/nova/service.py", line 183, in start
    self.manager.pre_start_hook()
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1287, in pre_start_hook
    self.update_available_resource(nova.context.get_admin_context())
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6236, in update_available_resource
    rt.update_available_resource(context)
  File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 402, in update_available_resource
    self._update_available_resource(context, resources)
  File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 445, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 436, in _update_available_resource
    'numa_topology'])
  File "/usr/lib/python2.7/dist-packages/nova/objects/base.py", line 163, in wrapper
    result = fn(cls, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 1152, in get_by_host_and_node
    expected_attrs)
  File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 1068, in _make_instance_list
    expected_attrs=expected_attrs)
  File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 501, in _from_db_object
    db_inst.get('extra').get('numa_topology'))
AttributeError: 'NoneType' object has no attribute 'get'
2015-04-20 17:35:09.301 15508 ERROR nova.openstack.common.threadgroup [req-12483464-12a6-4b74-a671-bc6bb943b265 - - - - -] 'NoneType' object has no attribute 'get'
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 145, in wait
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup x.wait()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 47, in wait
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 175, in wait
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 121, in wait
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 497, in run_service
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup service.start()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/service.py", line 183, in start
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup self.manager.pre_start_hook()
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1287, in pre_start_hook
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup self.update_available_resource(nova.context.get_admin_context())
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6236, in update_available_resource
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup rt.update_available_resource(context)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 402, in update_available_resource
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup self._update_available_resource(context, resources)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 445, in inner
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup return f(*args, **kwargs)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 436, in _update_available_resource
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup 'numa_topology'])
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/objects/base.py", line 163, in wrapper
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup result = fn(cls, context, *args, **kwargs)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 1152, in get_by_host_and_node
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup expected_attrs)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 1068, in _make_instance_list
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup expected_attrs=expected_attrs)
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 501, in _from_db_object
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup db_inst.get('extra').get('numa_topology'))
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup AttributeError: 'NoneType' object has no attribute 'get'
2015-04-20 17:35:09.301 15508 TRACE nova.openstack.common.threadgroup

Tags: ops
Changed in nova:
assignee: nobody → Christoph Dwertmann (cdwertmann)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/175298

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Christoph Dwertmann (<email address hidden>) on branch: master
Review: https://review.openstack.org/175288

Christoph Dwertmann (cdwertmann) wrote :

I had many problems during the nova DB migration from Icehouse to Kilo. Even though I ran the migration tool, instances were missing their "extra" data, which contains the flavor information. Most of this could be fixed by running the "migrate_flavor_data" migration; however, for some instances I had to manually create the "extra" data in the DB.

The patch I submitted above seemed to work fine at first, but when deleting VMs later on I ran into this issue in nova-compute:

2015-04-23 13:50:12.213 38580 TRACE oslo_messaging.rpc.dispatcher DetachedInstanceError: Parent instance <InstanceExtra at 0x7f3a4c05e290> is not bound to a Session; deferred load operation of attribute 'pci_requests' cannot proceed

Therefore I'm withdrawing the patch.

I also had a similar issue where instances were missing the "info_cache" and the code in nova/objects/instance.py just assumes that it's there and crashes if the key is not in the dict. I think nova/objects/instance.py needs to be more resilient when it comes to missing dictionary items.
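One way to picture that kind of resilience is a lookup helper that tolerates missing or None entries at any level. This is only a sketch under assumptions: get_nested is a hypothetical name, not a nova function, the 'network_info' key is used purely as an example, and the code is written for Python 3 (keyword-only argument) rather than the Python 2.7 shown in the logs.

```python
def get_nested(record, *keys, default=None):
    """Walk nested dicts by key, returning `default` if any level is missing or None."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


# Hypothetical rows illustrating the three cases.
missing_key = {}
null_cache = {'info_cache': None}
full_row = {'info_cache': {'network_info': '[]'}}

a = get_nested(missing_key, 'info_cache', 'network_info')
b = get_nested(null_cache, 'info_cache', 'network_info', default='[]')
c = get_nested(full_row, 'info_cache', 'network_info')
```

Whether a silent default is appropriate depends on the attribute; for some fields a clear error may still be preferable to continuing with None.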

cds (craigshannonx) wrote :

Currently hitting this upgrading from Juno->Kilo.

Is there still a fix in progress?

Matt Riedemann (mriedem) wrote :

Did you run nova-manage migrate-flavor-data?

https://wiki.openstack.org/wiki/ReleaseNotes/Kilo#Upgrade_Notes_2

"After fully upgrading to kilo (i.e. all nodes are running kilo code), you should start a background migration of flavor information from its old home to its new home. Kilo conductor nodes will do this on the fly when necessary, but the rest of the idle data needs to be migrated in the background. This is critical to complete before the Liberty release, where support for the old location will be dropped. Use "nova-manage migrate-flavor-data" to perform this transition."

cds (craigshannonx) wrote :

nova-manage db migrate_flavor_data 50
No handlers could be found for logger "oslo_config.cfg"
%(total)i instances matched query, %(done)i completed {'total': 1, 'done': 1}

Restart nova-compute on compute node and same error:
 File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 501, in _from_db_object
2015-05-13 11:24:27.109 14189 TRACE nova.openstack.common.threadgroup db_inst.get('extra').get('numa_topology'))

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/182787

Changed in nova:
assignee: Christoph Dwertmann (cdwertmann) → Dan Smith (danms)
Matt Riedemann (mriedem)
tags: added: kilo-backport-potential
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/183127

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/182787
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1bfb65d5ac8f7180edf3c4ecea09c8e15982bf27
Submitter: Jenkins
Branch: master

commit 1bfb65d5ac8f7180edf3c4ecea09c8e15982bf27
Author: Dan Smith <email address hidden>
Date: Wed May 13 11:52:10 2015 -0700

    Fix loading things in instance_extra for old instances

    If we don't have db_inst['extra'] then we can't load those things. We
    should just set them to None instead of exploding.

    Change-Id: I2ace62158faaa0b6a7df3c32f2cb9f235d178013
    Closes-Bug: #1446082

Changed in nova:
status: In Progress → Fix Committed
Alan Pevec (apevec)
tags: removed: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/183127
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8ba0159a0e29d84f8b02a9e037f8fc4585670252
Submitter: Jenkins
Branch: stable/kilo

commit 8ba0159a0e29d84f8b02a9e037f8fc4585670252
Author: Dan Smith <email address hidden>
Date: Wed May 13 11:52:10 2015 -0700

    Fix loading things in instance_extra for old instances

    If we don't have db_inst['extra'] then we can't load those things. We
    should just set them to None instead of exploding.

    Conflicts:
     nova/objects/instance.py

    Change-Id: I2ace62158faaa0b6a7df3c32f2cb9f235d178013
    Closes-Bug: #1446082
    (cherry picked from commit 1bfb65d5ac8f7180edf3c4ecea09c8e15982bf27)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Kevin L. Mitchell (<email address hidden>) on branch: master
Review: https://review.openstack.org/175298
Reason: Uncorrected unit test failures, idle since end of April. Please feel free to re-open if you get time to fix up the change.

Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
Tom Fifield (fifieldt)
tags: added: ops