Errors in scheduler do not set instance fault

Bug #1110808 reported by Vish Ishaya
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Dan Smith

Bug Description

When scheduling fails, no instance fault is recorded, and nova-scheduler prints a traceback like the following:

2013-01-30 13:39:24.407 WARNING nova.scheduler.driver [req-be42dcd9-3b2f-4a96-b20c-b8c04251f534 demo demo] [instance: 85354e08-e9e8-4f93-82c2-0e372271f4f8] Setting instance to ERROR state.
2013-01-30 13:39:24.453 WARNING nova.scheduler.manager [req-be42dcd9-3b2f-4a96-b20c-b8c04251f534 demo demo] Failed to schedule_run_instance: sequence index must be integer, not 'str'
2013-01-30 13:39:24.454 ERROR nova.openstack.common.rpc.amqp [req-be42dcd9-3b2f-4a96-b20c-b8c04251f534 demo demo] Exception during message handling
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 276, in _process_data
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/scheduler/manager.py", line 121, in run_instance
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     context, ex, request_spec)
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/scheduler/manager.py", line 109, in run_instance
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     requested_networks, is_first_time, filter_properties)
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/scheduler/filter_scheduler.py", line 88, in schedule_run_instance
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     request_spec)
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/scheduler/driver.py", line 69, in handle_schedule_error
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     new_ref, ex, sys.exc_info())
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp   File "/opt/stack/nova/nova/compute/utils.py", line 58, in add_instance_fault_from_exc
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp     'instance_uuid': instance['uuid'],
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp TypeError: sequence index must be integer, not 'str'
2013-01-30 13:39:24.454 TRACE nova.openstack.common.rpc.amqp

This is due to a recent change (https://review.openstack.org/#/c/19950/) that made compute/utils.py:add_instance_fault_from_exc take a mandatory "conductor" parameter, but two calls in the scheduler were not updated:

https://github.com/openstack/nova/blob/master/nova/scheduler/manager.py#L192
https://github.com/openstack/nova/blob/master/nova/scheduler/driver.py#L68
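
A rough sketch of why a stale call can end in a TypeError on instance['uuid'] rather than a complaint about a missing argument. The signature below is an assumption (conductor inserted as the second positional parameter), and both functions are simplified stand-ins, not the nova code. Under that assumption every argument after the context shifts one slot, so the exception object ends up where the instance is expected. The exact error message differs between Python versions.

```python
import sys


def add_instance_fault_from_exc(context, conductor, instance, fault,
                                exc_info=None):
    """Stand-in using the assumed new signature (conductor comes second)."""
    # Mirrors the line shown in the traceback; the rest of the body is omitted.
    return {'instance_uuid': instance['uuid']}


def old_style_call(context, instance, ex):
    """Calls the helper the way a caller written for the old signature would."""
    # Each argument after `context` lands one slot early: `instance` is taken
    # as the conductor and the exception `ex` is taken as the instance.
    return add_instance_fault_from_exc(context, instance, ex, sys.exc_info())


shift_error = None
try:
    try:
        raise ValueError("scheduling failed")
    except ValueError as ex:
        old_style_call(None, {'uuid': 'fake-uuid'}, ex)
except TypeError as err:
    # Indexing the exception object with a string key fails with TypeError.
    shift_error = err
```

Because the stale call still supplies an acceptable number of positional arguments, nothing fails at call time; the error only shows up once the body touches the misplaced value.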

This should probably be fixed by making conductor an optional kwarg and falling back to db if it is not specified. Alternatively, a new method that doesn't use the conductor could be added.
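
A minimal sketch of the optional-kwarg idea, assuming both backends expose a method named instance_fault_create. FakeBackend, the module-level db object, and the shape of the values dict are all hypothetical stand-ins for illustration. Note that the fix merged for this bug took a different route and passes a conductor LocalAPI from the scheduler instead.

```python
class FakeBackend:
    """Hypothetical stand-in for either the db module or a conductor API."""

    def __init__(self, name):
        self.name = name
        self.faults = []

    def instance_fault_create(self, context, values):
        # Record the fault and report which backend handled it.
        self.faults.append(values)
        return self.name


# Assumption: plays the role of a module-level `db` import.
db = FakeBackend('db')


def add_instance_fault_from_exc(context, instance, fault, exc_info=None,
                                conductor=None):
    """Record a fault via the conductor if given, otherwise via the db."""
    values = {
        'instance_uuid': instance['uuid'],
        'message': str(fault),
    }
    backend = conductor if conductor is not None else db
    return backend.instance_fault_create(context, values)


instance = {'uuid': 'fake-uuid'}
# Existing callers that know nothing about the conductor keep working.
via_db = add_instance_fault_from_exc(None, instance, ValueError('boom'))
# Callers that have a conductor pass it explicitly by keyword.
via_conductor = add_instance_fault_from_exc(
    None, instance, ValueError('boom'), conductor=FakeBackend('conductor'))
```

Putting conductor last and keyword-only in practice keeps the positional order of the older signature, so callers that were never updated do not have their arguments shifted.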

Changed in nova:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Dan Smith (danms)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/20843

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/20843
Committed: http://github.com/openstack/nova/commit/57154884c47143bfd6101d4b758d5e8d45966622
Submitter: Jenkins
Branch: master

commit 57154884c47143bfd6101d4b758d5e8d45966622
Author: Dan Smith <email address hidden>
Date: Wed Jan 30 18:22:45 2013 -0500

    Make scheduler modules pass conductor to add_instance_fault

    The add_instance_fault_from_exc() method was recently changed to
    take a conductor to avoid direct database access. The scheduler was
    not updated for this, and thus was not passing it in a couple of
    cases.

    This makes those calls pass a conductor LocalAPI, resulting in direct
    database access (which is desired from the scheduler). The tests that
    one might have thought would catch this didn't because they mock out
    the method itself. This fixes those and adds two tests that exercise
    the add_instance_fault path all the way down to the DB API, which
    would have caught it in the first place.

    Fixes bug 1110808

    Change-Id: If1c2988487d408a39fdf4080541f58f6bdac216c
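
To illustrate the point in the commit message about tests that mock out the method itself, here is a small sketch using unittest.mock. The compute_utils namespace, the helper signature, and handle_schedule_error are simplified, hypothetical stand-ins rather than the nova code. The first check replaces the helper with a mock and passes despite the stale call; the second leaves the helper real and surfaces the TypeError.

```python
import sys
import types
from unittest import mock


def _add_instance_fault_from_exc(context, conductor, instance, fault,
                                 exc_info=None):
    # Assumed signature; the real body is reduced to the failing lookup.
    values = {'instance_uuid': instance['uuid'], 'message': str(fault)}
    return conductor.instance_fault_create(context, values)


# Hypothetical stand-in for the module that owns the helper.
compute_utils = types.SimpleNamespace(
    add_instance_fault_from_exc=_add_instance_fault_from_exc)


def handle_schedule_error(context, ex, instance):
    """Caller that was never updated to pass a conductor."""
    compute_utils.add_instance_fault_from_exc(
        context, instance, ex, sys.exc_info())


def run_with_helper_mocked():
    # The mock accepts any arguments, so the stale call goes unnoticed.
    with mock.patch.object(compute_utils,
                           'add_instance_fault_from_exc') as fake:
        handle_schedule_error('ctxt', ValueError('boom'), {'uuid': 'fake'})
    return fake.call_count


def run_with_real_helper():
    # With the real helper in place the misplaced arguments blow up.
    try:
        handle_schedule_error('ctxt', ValueError('boom'), {'uuid': 'fake'})
    except TypeError:
        return True
    return False
```

Under the assumed signature, autospec=True would not have caught this either, since the stale call still passes a valid number of positional arguments. Only a test that runs the real helper body notices that the values are in the wrong slots.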

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1