nova scheduler overwrites instance fault and masks underlying compute driver problems

Bug #1165034 reported by Guangya Liu on 2013-04-05
This bug report is a duplicate of: Bug #1161661: Rescheduling loses reasons.
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Guangya Liu (Jay Lau)

Bug Description

Exceptions thrown from the driver are not surfaced in the instance faults. For example, when a failure occurs and you run [nova show <vm-name>], if an error also occurred in the nova scheduler, you always see the scheduler exception (e.g. "No valid host") instead of the underlying driver problem. This matters most when the last compute node tried happens to fail. The nova scheduler should not update the instance fault when the driver was called and spawn failed with an error; it should do so only when it blocks a request before the driver is invoked at all.

In schedule_run_instance() in filter_scheduler.py, both the IndexError path and an exception raised while provisioning in the nova scheduler can overwrite the exception recorded by nova compute.

The nova scheduler should not overwrite the instance faults recorded by nova compute; they are essential for customer debugging.
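The rule described above (the scheduler records a fault only if the driver has not already reported one) can be sketched as follows. This is a minimal sketch, not nova's code: `Fault`, `FaultStore`, and the `"compute"`/`"scheduler"` source labels are hypothetical stand-ins for nova's instance fault records.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Fault:
    # Hypothetical fault record; nova's real instance fault has more fields.
    source: str   # illustrative label: "compute" or "scheduler"
    message: str


@dataclass
class FaultStore:
    # Hypothetical in-memory stand-in for per-instance fault storage.
    faults: Dict[str, Fault] = field(default_factory=dict)

    def record_compute_fault(self, instance_uuid: str, message: str) -> None:
        # Errors raised by the driver always replace whatever is stored.
        self.faults[instance_uuid] = Fault("compute", message)

    def record_scheduler_fault(self, instance_uuid: str, message: str) -> bool:
        """Store a scheduler fault unless compute already reported one.

        Returns True if the fault was stored, False if an existing
        compute fault was preserved.
        """
        existing = self.faults.get(instance_uuid)
        if existing is not None and existing.source == "compute":
            # The driver was invoked and failed: keep its error visible.
            return False
        self.faults[instance_uuid] = Fault("scheduler", message)
        return True

    def show(self, instance_uuid: str) -> Optional[str]:
        # Roughly what a user would see as the fault message.
        fault = self.faults.get(instance_uuid)
        return None if fault is None else fault.message
```

With this policy, recording a compute fault for an instance and then a "No valid host" scheduler fault leaves the compute message in place, while an instance that never reached a compute node still gets the scheduler fault.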

for num, instance_uuid in enumerate(instance_uuids):
    request_spec['instance_properties']['launch_index'] = num

    try:
        try:
            weighed_host = weighed_hosts.pop(0)
        except IndexError:
            # <<< replaces any fault already recorded by nova compute
            raise exception.NoValidHost(reason="")

        # <<< an exception raised here is also reported as a scheduler fault
        self._provision_resource(context, weighed_host,
                                 request_spec,
                                 filter_properties,
                                 requested_networks,
                                 injected_files, admin_password,
                                 is_first_time,
                                 instance_uuid=instance_uuid)
    except Exception as ex:
        # NOTE(vish): we don't reraise the exception here to make sure
        # that all instances in the request get set to
        # error properly
        driver.handle_schedule_error(context, ex, instance_uuid,
                                     request_spec)
    # scrub retry host list in case we're scheduling multiple
    # instances:
    retry = filter_properties.get('retry', {})
    retry['hosts'] = []
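One way to keep the loop above from masking the driver error is to carry the last compute-side error into the NoValidHost reason. The sketch below is a simplified, self-contained model of that loop, not nova's actual fix: `NoValidHost` is a local stand-in for nova's exception, and `provision` and `last_compute_error` are hypothetical parameters (the latter stands in for whatever the retry information would carry between attempts).

```python
from typing import Callable, Dict, List


class NoValidHost(Exception):
    """Local stand-in for nova's exception.NoValidHost (illustrative only)."""


def schedule_run_instances(
    instance_uuids: List[str],
    weighed_hosts: List[str],
    provision: Callable[[str, str], None],
    last_compute_error: Dict[str, str],
) -> Dict[str, str]:
    """Return a mapping of instance uuid -> error reason for failed instances."""
    errors: Dict[str, str] = {}
    hosts = list(weighed_hosts)  # copy so the caller's list is not mutated
    for instance_uuid in instance_uuids:
        try:
            if not hosts:
                reason = "No valid host was found."
                previous = last_compute_error.get(instance_uuid)
                if previous:
                    # Surface the driver problem instead of masking it.
                    reason += f" Last compute error: {previous}"
                raise NoValidHost(reason)
            host = hosts.pop(0)
            provision(host, instance_uuid)
        except Exception as ex:
            # As in the original loop, do not re-raise, so every
            # instance in the request is marked as failed.
            errors[instance_uuid] = str(ex)
    return errors
```

For example, scheduling two instances onto a single host, where the second had previously failed on a compute node with "spawn failed", leaves the first instance without an error and gives the second the reason "No valid host was found. Last compute error: spawn failed".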

Andrew Laski (alaski) on 2013-04-09
Changed in nova:
status: New → Confirmed
Tiantian Gao (gtt116) wrote:

Yes, this is a duplicate of bug 1161661.

Changed in nova:
assignee: nobody → Jay Lau (jay-lau-513)