OpenStack Compute (nova)

nova scheduler overwrites instance fault and masks underlying compute driver problems

Bug #1165034 reported by Guangya Liu on 2013-04-05

This bug report is a duplicate of: Bug #1161661: Rescheduling loses reasons.

This bug affects 3 people

Affects                    Status     Importance  Assigned to            Milestone
OpenStack Compute (nova)   Confirmed  Undecided   Guangya Liu (Jay Lau)

Bug Description

Exceptions thrown from the compute driver are not surfaced in the instance fault. For example, when a failure occurs and you run [nova show <vm-name>], any later error in the nova scheduler means you always see the scheduler's exception ("No valid host." etc.) instead of the underlying driver problem. This matters most when the last available compute node is the one that fails. The nova scheduler should not update the instance fault when the driver was already called and spawn threw an error. It should only do so when it blocks a request before the driver is ever invoked.

From schedule_run_instance() in filter_scheduler.py, we can see that either the IndexError path (re-raised as NoValidHost) or an exception from _provision_resource() in the nova scheduler can overwrite the exception recorded by nova compute.

The nova scheduler should not overwrite instance exceptions from nova compute; they are very important for customer debugging.

  for num, instance_uuid in enumerate(instance_uuids):
      request_spec['instance_properties']['launch_index'] = num

      try:
          try:
              weighed_host = weighed_hosts.pop(0)
          except IndexError:
              raise exception.NoValidHost(reason="")  # <<<<<<<<<

          self._provision_resource(context, weighed_host,  # <<<<<<<<<<
                                   request_spec,
                                   filter_properties,
                                   requested_networks,
                                   injected_files, admin_password,
                                   is_first_time,
                                   instance_uuid=instance_uuid)
      except Exception as ex:
          # NOTE(vish): we don't reraise the exception here to make sure
          # that all instances in the request get set to
          # error properly
          driver.handle_schedule_error(context, ex, instance_uuid,
                                       request_spec)
      # scrub retry host list in case we're scheduling multiple
      # instances:
      retry = filter_properties.get('retry', {})
      retry['hosts'] = []
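
The behaviour requested above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not nova code and not necessarily how the bug was fixed: the names record_fault, simulate, the preserve_compute_fault flag, and the local NoValidHost class are all hypothetical stand-ins. It shows a compute-side fault being masked by a later scheduler error, and one possible rule that keeps the compute fault instead.

```python
# Toy model of the fault-masking problem. Every name here is hypothetical;
# nothing in this block is a real nova API.


class NoValidHost(Exception):
    """Stand-in for the scheduler's 'No valid host' error."""


def record_fault(faults, instance_uuid, source, message, preserve_compute_fault):
    """Store (source, message) as the instance fault.

    When preserve_compute_fault is True, a scheduler error is not allowed
    to replace a fault that the compute side already recorded.
    """
    existing = faults.get(instance_uuid)
    if (preserve_compute_fault
            and source == "scheduler"
            and existing is not None
            and existing[0] == "compute"):
        return  # keep the more specific driver error
    faults[instance_uuid] = (source, message)


def simulate(hosts, preserve_compute_fault):
    """Fail spawn on every host, then let the scheduler run out of hosts."""
    faults = {}
    instance_uuid = "instance-1"

    # Compute side: the driver's spawn fails on each host that is tried.
    for host in hosts:
        record_fault(faults, instance_uuid, "compute",
                     "spawn failed on %s" % host, preserve_compute_fault)

    # Scheduler side: no hosts remain, so the request ends in NoValidHost.
    try:
        raise NoValidHost("No valid host was found.")
    except NoValidHost as ex:
        record_fault(faults, instance_uuid, "scheduler", str(ex),
                     preserve_compute_fault)

    return faults[instance_uuid]


if __name__ == "__main__":
    print(simulate(["node1"], preserve_compute_fault=False))
    print(simulate(["node1"], preserve_compute_fault=True))
```

With the flag off, the final fault is the scheduler's generic message, which is the behaviour reported here. With the flag on, the driver error survives, and the scheduler's message is still recorded when no compute node was ever tried.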