Unfriendly user experience when the nova scheduler finds no valid host

Bug #1281014 reported by Haifeng, Song
This bug affects 9 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Yathiraj Udupi
Milestone: (none)

Bug Description

nova version: 2.15.0

If no compute node has enough resources available, a command such as 'nova resize instancevm 100' exits silently without any error or warning message.
Users are left confused, not knowing what went wrong or what to do next.
Although a warning like the following is written to /var/log/conductor.log, few users will find it easily:
2014-02-17 03:43:29.000 6320 WARNING nova.scheduler.utils [req-c0d5f130-c5a9-41b7-8fe4-4d08be4cc774 9ed1534f040c43e98293f6bc6b632e96 bd5848810607480d968b6d1ca9a36637] Failed to compute_task_migrate_server: No valid host was found.
Traceback (most recent call last):

  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/common.py", line 420, in catch_client_exception
    return func(*args, **kwargs)

  File "/usr/lib/python2.6/site-packages/nova/scheduler/manager.py", line 298, in select_destinations
    filter_properties)

  File "/usr/lib/python2.6/site-packages/nova/scheduler/filter_scheduler.py", line 148, in select_destinations
    raise exception.NoValidHost(reason='')

NoValidHost: No valid host was found.

It would be better to report an error or warning message to the user when this happens.

Yathiraj Udupi (yudupi)
Changed in nova:
assignee: nobody → Yathiraj Udupi (yudupi)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/75307

Wang Bo (chestack) wrote :

1. Yes, I also ran into this issue. The worse problem is that the response code is still 202 even when a NoValidHost exception is raised. It can be reproduced by running "nova --debug resize". I would expect the response code for NoValidHost to be 500, so I'm curious why the result is 202.

2. I created another flavor to test what happens when the number of vCPUs is too large:
"flavor": {
    "vcpus": 1000,
    "ram": 1024,
    "disk": 20
}
Then I ran "nova resize" with the above flavor ID. I found that the "get_filtered_objects" method in /nova/filters.py does not check the vCPU resource: even when far too many vCPUs are requested, it does not raise the NoValidHost exception. The filter classes in use were:
(Pdb) pp filter_classes
[<class 'nova.scheduler.filters.retry_filter.RetryFilter'>,
 <class 'nova.scheduler.filters.availability_zone_filter.AvailabilityZoneFilter'>,
 <class 'nova.scheduler.filters.ram_filter.RamFilter'>,
 <class 'nova.scheduler.filters.compute_filter.ComputeFilter'>,
 <class 'nova.scheduler.filters.compute_capabilities_filter.ComputeCapabilitiesFilter'>,
 <class 'nova.scheduler.filters.image_props_filter.ImagePropertiesFilter'>]

Could someone help answer my question 2? Thanks a lot!

Wang Bo (chestack) wrote :

Regarding comment #2: the response code is also 202 when resizing to the flavor with vcpus: 1000. The only error reported is a libvirt error in compute.log.

My environment: x86, Ubuntu 12.04, devstack, all-in-one installation.

Wang Bo (chestack) wrote :

Regarding comment #2:
I found that the vCPU resource is not checked because CoreFilter is not included in the default 'scheduler_default_filters' in /nova/scheduler/manager.py.

So the remaining problem is why the response is 202 when a NoValidHost exception is raised.
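To make the effect of the filter list concrete, here is a minimal Python sketch of how a filter pipeline narrows the candidate hosts and raises NoValidHost when none survive. It is a simplification, not nova's code: HostState, FlavorRequest, ram_filter, core_filter and select_hosts are hypothetical names, and the real RamFilter and CoreFilter also apply allocation ratios that this sketch ignores.

```python
from dataclasses import dataclass


class NoValidHost(Exception):
    """Hypothetical stand-in for nova's NoValidHost exception."""


@dataclass
class HostState:
    name: str
    free_ram_mb: int
    free_vcpus: int


@dataclass
class FlavorRequest:
    ram: int    # requested RAM in MB
    vcpus: int  # requested vCPU count


def ram_filter(host, flavor):
    # Simplified RAM check; ignores any RAM oversubscription ratio.
    return host.free_ram_mb >= flavor.ram


def core_filter(host, flavor):
    # Simplified vCPU check; ignores any CPU oversubscription ratio.
    return host.free_vcpus >= flavor.vcpus


def select_hosts(hosts, flavor, filters):
    # Each filter narrows the candidate list; an empty result means no valid host.
    candidates = list(hosts)
    for host_filter in filters:
        candidates = [h for h in candidates if host_filter(h, flavor)]
    if not candidates:
        raise NoValidHost("No valid host was found.")
    return candidates


hosts = [HostState("compute1", free_ram_mb=4096, free_vcpus=8)]
big_flavor = FlavorRequest(ram=1024, vcpus=1000)

# Without a vCPU filter, the oversized flavor still matches a host.
print([h.name for h in select_hosts(hosts, big_flavor, [ram_filter])])  # prints ['compute1']

try:
    select_hosts(hosts, big_flavor, [ram_filter, core_filter])
except NoValidHost as exc:
    print("rejected:", exc)  # prints rejected: No valid host was found.
```

With only a RAM check, the 1000-vCPU flavor from comment #2 still matches a host; adding a vCPU check is what turns the request into a NoValidHost failure.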

Wang Bo (chestack) wrote :

I found that the function "_cold_migrate" in nova/conductor/manager.py loses the NoValidHost exception, which causes the problem. I will try to fix it.

Christopher Yeoh (cyeoh-0) wrote :

The resize call (like many others) is asynchronous. A 202 just means that the request has been accepted, not that it has succeeded. The API returns to the client before the resize is attempted, so it cannot know whether the resize will succeed.

To determine whether the resize succeeded, you need to poll, either through the instance actions interface or with the upcoming tasks API.
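A minimal Python sketch of the accept-then-poll pattern described above. FakeComputeAPI, ActionRecord and wait_for_action are hypothetical stand-ins, not python-novaclient or the real instance actions API; the sketch only assumes that a finished action eventually reports a result such as "Success" or "Error".

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionRecord:
    """Hypothetical, simplified view of one instance action."""
    action: str
    result: Optional[str] = None  # None while running, then "Success" or "Error"
    message: str = ""


class FakeComputeAPI:
    """Hypothetical in-memory stand-in for the compute API (not python-novaclient)."""

    def __init__(self, polls_until_done, final_result, message=""):
        self._polls_left = polls_until_done
        self._final_result = final_result
        self._message = message

    def resize(self, server_id, flavor_id):
        # Accepted for asynchronous processing; says nothing about success.
        return 202

    def get_action(self, server_id, action):
        # Reports "still running" until the configured number of polls has passed.
        if self._polls_left > 0:
            self._polls_left -= 1
            return ActionRecord(action)
        return ActionRecord(action, self._final_result, self._message)


def wait_for_action(api, server_id, action, interval=1.0, max_polls=60):
    """Poll until the action finishes; raise if it failed or never finished."""
    for _ in range(max_polls):
        record = api.get_action(server_id, action)
        if record.result is not None:
            if record.result != "Success":
                raise RuntimeError(f"{action} failed: {record.message}")
            return record
        time.sleep(interval)
    raise TimeoutError(f"{action} on {server_id} did not finish after {max_polls} polls")


api = FakeComputeAPI(polls_until_done=2, final_result="Error",
                     message="No valid host was found.")
print(api.resize("instancevm", "100"))  # prints 202: accepted, not yet attempted
try:
    wait_for_action(api, "instancevm", "resize", interval=0.0)
except RuntimeError as exc:
    print(exc)  # prints resize failed: No valid host was found.
```

The 202 from resize() carries no information about the outcome; only the later poll surfaces the scheduler failure, which is why the report was closed as Invalid.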

Changed in nova:
status: In Progress → Invalid