Default scheduler should be smarter than ChanceScheduler

Bug #821252 reported by Antony Messerli
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Wishlist    Joe Gordon

Bug Description

The chance scheduler currently doesn't check whether a server has enough space before placing an instance on it. In this case, the server was full: other instances had taken up all of its memory. The instance went into the error state, and I found this traceback. I'm running rev 1377.

2011-08-05 02:29:10,487 ERROR nova.virt.xenapi.vmops [-] instance instance-00003203: not enough free memory
(nova.virt.xenapi.vmops): TRACE: None
(nova.virt.xenapi.vmops): TRACE:
2011-08-05 02:29:10,555 DEBUG nova.virt.xenapi.vmops [-] Starting VM None... from (pid=7402) _spawn /usr/lib/pymodules/python2.6/nova/virt/xenapi/vmops.py:265
2011-08-05 02:29:10,595 ERROR nova.compute.manager [-] Instance '3203' failed to spawn. Is virtualization enabled in the BIOS? Details: Attempted to power on non-existent instance bad instance id 3203
(nova.compute.manager): TRACE: Traceback (most recent call last):
(nova.compute.manager): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 357, in _run_instance
(nova.compute.manager): TRACE: self.driver.spawn(context, instance, network_info, bd_mapping)
(nova.compute.manager): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/xenapi_conn.py", line 190, in spawn
(nova.compute.manager): TRACE: self._vmops.spawn(context, instance, network_info)
(nova.compute.manager): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/xenapi/vmops.py", line 150, in spawn
(nova.compute.manager): TRACE: self._spawn(instance, vm_ref)
(nova.compute.manager): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/xenapi/vmops.py", line 266, in _spawn
(nova.compute.manager): TRACE: self._start(instance, vm_ref)
(nova.compute.manager): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/xenapi/vmops.py", line 133, in _start
(nova.compute.manager): TRACE: ' bad instance id %s') % instance.id)
(nova.compute.manager): TRACE: Exception: Attempted to power on non-existent instance bad instance id 3203
(nova.compute.manager): TRACE:

Ed Leafe (ed-leafe) wrote :

Is this a bug? ChanceScheduler is just that: a host selected by chance. There is not supposed to be any checking of suitability in this class; for that matter, it doesn't check if a host is capable of running the image OS, or is even the correct type of hypervisor.
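
To illustrate the behaviour described here, a chance-style scheduler can be sketched as nothing more than a random pick over the known hosts. This is a minimal sketch and not nova's actual code; the function name `pick_host_by_chance` and the host names are hypothetical.

```python
import random


def pick_host_by_chance(hosts, rng=random):
    """Return one host at random; no capacity or capability checks are made."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("no hosts available")
    return rng.choice(hosts)


# Hypothetical host names. A full host is as likely to be chosen as an empty one.
hosts = ["compute1", "compute2", "compute3"]
chosen = pick_host_by_chance(hosts)
```

Nothing in the selection looks at memory, hypervisor type, or image requirements, which is why a full server can still be picked.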

Vish Ishaya (vishvananda) wrote : Re: [Bug 821252] Re: Chance scheduler doesn't check if server has enough space for instance

I think the bug should be: change the default scheduler to something that works in more situations :)

Todd Willey (xtoddx) wrote : Re: Chance scheduler doesn't check if server has enough space for instance

Then we should note this in comments and emit a LOG.warn about using a scheduler that will mask errors. I think it is reasonable to expect NoValidHost to be raised by _any_ scheduler when a host can't accept the instance. We should be very vocal if we're not going to blindly trust compute hosts.
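
The expectation above, that a scheduler raises NoValidHost when no host can take the instance, could look roughly like the following. This is a hedged sketch under stated assumptions: the `NoValidHost` class here is a local stand-in rather than nova's own exception, and `pick_host_with_capacity` and the host-to-free-memory dictionary are hypothetical.

```python
import random


class NoValidHost(Exception):
    """Local stand-in for the scheduler error named in this thread."""


def pick_host_with_capacity(free_memory_mb, required_mb, rng=random):
    """Pick a random host among those reporting enough free memory.

    free_memory_mb maps host name -> free memory in MB (assumed data shape).
    """
    candidates = sorted(
        host for host, free in free_memory_mb.items() if free >= required_mb
    )
    if not candidates:
        raise NoValidHost("no host has %d MB free" % required_mb)
    return rng.choice(candidates)


# Hypothetical capacity report: only compute2 can fit a 2048 MB instance.
free_memory_mb = {"compute1": 0, "compute2": 4096, "compute3": 512}
chosen = pick_host_with_capacity(free_memory_mb, 2048)
```

With this shape the failure surfaces at scheduling time, instead of later as a spawn error on a full compute host.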

Jay Pipes (jaypipes) wrote : Re: [Bug 821252] Re: Chance scheduler doesn't check if server has enough space for instance

On Fri, Aug 5, 2011 at 1:45 PM, Vish Ishaya <email address hidden> wrote:
> I think the bug should be: change the default scheduler to something that
> works in more situations :)

++

There was a tweet a few weeks ago saying something to the effect of
"Nova doesn't even check to see if a machine has capacity before it
tries to put a VM on it... FAIL." Can't remember the tweet link, and I
searched for it, but that basically paraphrases it well...

-jay

Brian Waldon (bcwaldon) wrote : Re: Chance scheduler doesn't check if server has enough space for instance

Definitely agree, Vish and Jay. Should we just file a new bug to change the default scheduler?

Jay Pipes (jaypipes) wrote :

Not sure... could use this bug I suppose, just changing the description/summary...

Thierry Carrez (ttx) wrote :

Here you go.

summary: - Chance scheduler doesn't check if server has enough space for instance
+ Default scheduler should be smarter than ChanceScheduler
Changed in nova:
importance: Undecided → Wishlist
status: New → Confirmed
Joe Gordon (jogo) wrote :

Any ideas of what scheduler to switch over to?

What about leaving MultiScheduler as the default?

flags.DEFINE_string('scheduler_driver',
                    'nova.scheduler.multi.MultiScheduler',
                    'Default driver to use for the scheduler')

And DistributedScheduler for compute calls, and VsaScheduler for volume placement?

flags.DEFINE_string('compute_scheduler_driver',
                    'nova.scheduler.distributed_scheduler.DistributedScheduler',
                    'Driver to use for scheduling compute calls')
flags.DEFINE_string('volume_scheduler_driver',
                    'nova.scheduler.vsa.VsaScheduler',
                    'Driver to use for scheduling volume calls')
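
For readers unfamiliar with the idea behind these flags, a top-level scheduler that forwards each request to a per-topic driver might be sketched as below. This is an illustration only and not nova's MultiScheduler implementation; `MultiSchedulerSketch`, the topic names, and the two driver functions are hypothetical.

```python
class MultiSchedulerSketch:
    """Route each scheduling request to the driver registered for its topic."""

    def __init__(self, drivers):
        # drivers maps a topic name (e.g. "compute", "volume") to a callable.
        self.drivers = dict(drivers)

    def schedule(self, topic, request):
        try:
            driver = self.drivers[topic]
        except KeyError:
            raise ValueError("no scheduler driver configured for %r" % topic)
        return driver(request)


def compute_driver(request):
    # Stand-in for a compute-aware scheduler driver.
    return "compute driver handled %s" % request


def volume_driver(request):
    # Stand-in for a volume placement driver.
    return "volume driver handled %s" % request


multi = MultiSchedulerSketch({"compute": compute_driver, "volume": volume_driver})
result = multi.schedule("compute", "instance-1")
```

The point of the split is that the compute and volume drivers can be swapped independently while the top-level driver stays the same.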

Joe Gordon (jogo)
Changed in nova:
assignee: nobody → Joe Gordon (joe-gordon0)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/4766

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/4766
Committed: http://github.com/openstack/nova/commit/7428cf5bc53c7630510644fee4ff20bb392f1331
Submitter: Jenkins
Branch: master

commit 7428cf5bc53c7630510644fee4ff20bb392f1331
Author: Joe Gordon <email address hidden>
Date: Thu Mar 1 12:59:54 2012 -0800

    fix for bug 821252. Smarter default scheduler

    compute_scheduler_driver = DistributedScheduler

    Change-Id: I8123a120afd80c2b088a387eaab8f5a99a877fe0

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → essex-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-rc1 → 2012.1