Creating large numbers of instances can lead to a timeout while waiting for a response from the scheduler

Bug #1036911 reported by Vish Ishaya
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Vish Ishaya

Bug Description

Using a high max_count can cause the scheduler to take more than 60 seconds to make the required casts, which causes an RPC timeout in the compute API.

This needs to be fixed either by optimizing the scheduler path or by creating everything locally and casting to the scheduler (as we already do for a single instance).
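To illustrate the difference between the two paths, here is a minimal sketch using only the Python standard library. It is not nova's RPC code: `ToyRpc`, `run_demo`, the request shape, and the shortened timings (0.05 s standing in for the 60-second RPC timeout) are all hypothetical, chosen only to show why a blocking call can time out on a large batch while a cast returns immediately.

```python
import queue
import threading
import time


class ToyRpc:
    """Toy in-process stand-in for an RPC layer (hypothetical, not nova's API)."""

    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Single worker: handle one message at a time, like a busy scheduler.
        while True:
            payload, reply = self._inbox.get()
            result = self._handler(payload)
            if reply is not None:
                reply.put(result)
            self._inbox.task_done()

    def call(self, payload, timeout):
        """Send a message and block until a reply arrives or the timeout expires."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((payload, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no reply within %.2fs" % timeout) from None

    def cast(self, payload):
        """Send a message and return immediately without waiting for a reply."""
        self._inbox.put((payload, None))

    def wait_idle(self):
        """Block until every queued message has been handled."""
        self._inbox.join()


def run_demo(max_count=20, per_instance=0.01, timeout=0.05):
    scheduled = []

    def schedule(request):
        # Pretend each instance costs the scheduler a fixed amount of time.
        for name in request["instances"]:
            time.sleep(per_instance)
            scheduled.append(name)
        return list(request["instances"])

    rpc = ToyRpc(schedule)
    names = ["inst-%d" % i for i in range(max_count)]

    # Path described in the bug: the API blocks on the scheduler's reply.
    try:
        rpc.call({"instances": names}, timeout)
        call_timed_out = False
    except TimeoutError:
        call_timed_out = True
    rpc.wait_idle()      # the scheduler keeps working after the caller gave up
    scheduled.clear()

    # Proposed path: records are created locally, then the work is cast.
    local_records = list(names)
    started = time.monotonic()
    rpc.cast({"instances": local_records})
    cast_elapsed = time.monotonic() - started
    rpc.wait_idle()
    return call_timed_out, cast_elapsed, list(scheduled)


if __name__ == "__main__":
    timed_out, elapsed, done = run_demo()
    print("call timed out:", timed_out)
    print("cast returned in %.4fs, scheduled %d instances" % (elapsed, len(done)))
```

In this sketch the scheduler still does the same amount of work in both cases; the cast only removes the caller's dependence on that work finishing before the timeout.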

Changed in nova:
importance: Undecided → High
status: New → In Progress
assignee: nobody → Vish Ishaya (vishvananda)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/11379

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/11379
Committed: http://github.com/openstack/nova/commit/8718f8e47d7d0504724495538eb320be3e209180
Submitter: Jenkins
Branch: master

commit 8718f8e47d7d0504724495538eb320be3e209180
Author: Vishvananda Ishaya <email address hidden>
Date: Tue Aug 14 17:59:06 2012 -0700

    Always create the run_instance records locally

    Currently a request for multiple instances is sent to the scheduler,
    where it is written to the database. It appears that this was done so
    that more advanced schedulers could handle the request as one
    batch, but the result is the scheduler is sometimes slow enough
    that the call will time out.

    Instead this converts to always creating the instance records
    locally and making run_instance into a cast instead of a call.

    This made a small change to the rpc api for run instance, so the
    version was bumped. Legacy messages are still handled properly.

    Fixes bug 1036911

    Co-authored-by: Chris Behrens <email address hidden>

    Change-Id: I63bbc98c285faebec53f8e62857c01548807db68

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2