OpenStack Compute (nova)

Nova show will not display NoValidHost with right exception traces

Bug #1369818 reported by zhu zhu on 2014-09-16

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Invalid	Medium	Unassigned

Bug Description

The nova scheduler supports multiple scheduling attempts: if a deployment attempt on one host fails and raises a detailed exception, the scheduler chooses another host and retries.

But after all attempts have been tried, it raises a generic NoValidHost exception without a proper message. As a result, nova show <instance> does not display any information that is directly useful to end users.

So it is suggested to wrap the NoValidHost exception message with the exception detail from the last failed attempt.

For example, when the nova VMware driver is used to spawn a VM with a disk larger than the datastore's upper limit, it raises a DatastoreNotFound exception with details, but after the scheduler retries, that detail is lost from nova show. It would be friendlier to let operators view such errors directly in nova show instead of digging into the scheduler log.

filter_scheduler.py

schedule_run_instance

        for num, instance_uuid in enumerate(instance_uuids):
            request_spec['instance_properties']['launch_index'] = num

            try:
                try:
                    weighed_host = weighed_hosts.pop(0)
                    LOG.info(_("Choosing host %(weighed_host)s "
                                "for instance %(instance_uuid)s"),
                              {'weighed_host': weighed_host,
                               'instance_uuid': instance_uuid})
                except IndexError:
                    raise exception.NoValidHost(reason="")
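
A minimal sketch of the suggestion, not nova code: NoValidHost here is a simplified stand-in, and schedule_with_retries, spawn and failing_spawn are hypothetical names. The retry loop is reduced to a plain list of hosts. It shows the last per-host failure being carried into the final NoValidHost reason instead of an empty string.

```python
class NoValidHost(Exception):
    """Simplified stand-in for nova's exception of the same name."""

    def __init__(self, reason=""):
        super().__init__("No valid host was found. %s" % reason)
        self.reason = reason


def schedule_with_retries(hosts, spawn):
    """Try each host in order and remember the most recent failure.

    `spawn` is a hypothetical callable that builds the instance on a
    host and raises on failure.
    """
    last_error = None
    for host in hosts:
        try:
            return spawn(host)
        except Exception as exc:  # broad on purpose: illustration only
            last_error = "%s on %s: %s" % (type(exc).__name__, host, exc)
    # All hosts exhausted: surface the last failure instead of reason="".
    raise NoValidHost(reason=last_error or "")


def failing_spawn(host):
    """Hypothetical spawn that always fails, to exercise the retry path."""
    raise RuntimeError("disk larger than datastore capacity")


try:
    schedule_with_retries(["host-a", "host-b"], failing_spawn)
except NoValidHost as err:
    print(err.reason)
```

Only the last failure is kept here, which matches the wording of the suggestion; whether earlier failures should also be reported is a separate design question.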

zhu zhu (zhuzhubj) on 2014-09-16

summary:	- Nova show will not display NoValidHost with detail exception traces
	+ Nova show will not display NoValidHost with right exception traces
tags:	added: scheduler

OpenStack Infra (hudson-openstack) wrote on 2014-09-16: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/121739

Changed in nova:
assignee:	nobody → zhu zhu (zhuzhubj)
status:	New → In Progress

Davanum Srinivas (DIMS) (dims-v) on 2014-09-16

Changed in nova:
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2015-01-23: Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/121739
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Davanum Srinivas (DIMS) (dims-v) wrote on 2015-02-12:

Removing "In Progress" status and assignee as change is abandoned.

Changed in nova:
status:	In Progress → Confirmed
assignee:	zhu zhu (zhuzhubj) → nobody

Sudipta Biswas (sbiswas7) wrote on 2015-05-06:

I was scrubbing through the list of Nova bugs, and hoping to work on a few. Looks like this one hasn't been touched upon for a while. I am assigning it to myself to work on it further.

Changed in nova:
assignee:	nobody → Sudipta Biswas (sbiswas7)

Sudipta Biswas (sbiswas7) wrote on 2015-05-13:

I feel there are a lot of moving parts to this problem.
Currently, I see a discrepancy in the way the NoValidHost exception is handled/generated.
The amount of information we would like to provide in this exception differs philosophically between different parts of the code.

In nova/scheduler/utils.py, it appears that when the retries exceed max_attempts, we put a lot of information into the NoValidHost exception:

https://github.com/openstack/nova/blob/master/nova/scheduler/utils.py#L165

However, at the same time, in https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L79 ,
we seem to say that we shouldn't be putting out too much information.

After chatting with bauzas on IRC, it sounds like we need a discussion about this at the summit.

Zhe Jing (jingzhe) wrote on 2015-07-01:

I am a maintenance engineer with a strong interest in this. NoValidHost is not a clear error message for me or for customers, so I have to read through lots of logs to find the cause, especially when there are many compute nodes.

Could we use instance_event or a subaction to record the fault reason for every compute node?
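
A rough illustration of the per-node idea, with hypothetical names throughout: HostFault and format_faults are not nova APIs, nova's real instance action/event records are structured differently, and the host names and reasons are made-up sample data.

```python
from collections import namedtuple

# Hypothetical record: one entry per compute node that rejected or
# failed the request.
HostFault = namedtuple("HostFault", ["host", "reason"])


def format_faults(faults):
    """Render one line per compute node so the cause is visible at a glance."""
    return "\n".join("%s: %s" % (f.host, f.reason) for f in faults)


# Made-up sample data for illustration.
faults = [
    HostFault("compute-1", "DatastoreNotFound"),
    HostFault("compute-2", "DatastoreNotFound"),
]
print(format_faults(faults))
```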

Sudipta Biswas (sbiswas7) wrote on 2015-07-07:

You can take a look at this spec: https://review.openstack.org/#/c/194204/
Additionally you can provide your feedback to Ed Leafe.

Ed Leafe (ed-leafe) wrote on 2015-07-07:

I've added a more specific exception in this patch: https://review.openstack.org/#/c/194780/

If you have suggestions for improving this, please comment on that patch.

Davanum Srinivas (DIMS) (dims-v) on 2016-03-04

Changed in nova:
assignee:	Sudipta Biswas (sbiswas7) → nobody

Manjunath Ranganathaiah (manjunath-ranganathaiah) on 2016-03-07

Changed in nova:
assignee:	nobody → Manjunath Ranganathaiah (manjunath-ranganathaiah)

Manjunath Ranganathaiah (manjunath-ranganathaiah) on 2016-03-07

Changed in nova:
assignee:	Manjunath Ranganathaiah (manjunath-ranganathaiah) → nobody

Chris Dent (cdent) wrote on 2016-03-15:

Ed's change merged 8 months ago, and there's been no additional input since. Let's kill this in favour of a new bug that is more in tune with the current state of affairs and more specific about the problems that need to be solved.

Changed in nova:
status:	Confirmed → Invalid
