Live migration should throttle itself

Bug #1478108 reported by Dan Smith on 2015-07-24
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Low
Assigned to: Dan Smith

Bug Description

Nova will accept an unbounded number of live migrations for a single host, which results in timeouts and failures (at least for libvirt). Since live migrations are seriously I/O intensive, leaving this unlimited is just never going to be the right thing to do, especially when our own client has functions to live migrate all instances to other hosts (nova host-evacuate-live).

We recently added a build semaphore to allow capping the number of parallel builds being attempted on a compute host for a similar reason. This should be the same sort of thing for live migration.

Dan Smith (danms) on 2015-07-24
Changed in nova:
importance: Undecided → Low
tags: added: live-migration

In case anyone wonders which build semaphore is meant, review [1] introduced the config option "max_concurrent_builds".

[1] https://review.openstack.org/#/c/153004/
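
For readers unfamiliar with the pattern, here is a minimal sketch of capping parallel work with a semaphore. It is not nova's code: `MAX_CONCURRENT`, `do_work`, and `run_many` are hypothetical names, and the sleep stands in for the real I/O-heavy operation.

```python
import threading
import time

MAX_CONCURRENT = 2  # hypothetical limit; stands in for a config value

_slots = threading.BoundedSemaphore(MAX_CONCURRENT)
_lock = threading.Lock()
_running = 0
peak_running = 0


def do_work(duration=0.05):
    """Run one unit of heavy work while holding a slot."""
    global _running, peak_running
    with _slots:  # blocks until one of the MAX_CONCURRENT slots is free
        with _lock:
            _running += 1
            peak_running = max(peak_running, _running)
        time.sleep(duration)  # placeholder for the real I/O-heavy operation
        with _lock:
            _running -= 1


def run_many(count):
    """Start `count` workers at once and wait for all of them to finish."""
    threads = [threading.Thread(target=do_work) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak_running
```

However many workers are started, no more than `MAX_CONCURRENT` of them are inside the guarded section at the same time; the rest wait for a slot.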

Fix proposed to branch: master
Review: https://review.openstack.org/212065

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: New → In Progress

Reviewed: https://review.openstack.org/212065
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2c0a306632351fd5bf35ff0ec3f0a133fbe8f1ac
Submitter: Jenkins
Branch: master

commit 2c0a306632351fd5bf35ff0ec3f0a133fbe8f1ac
Author: Dan Smith <email address hidden>
Date: Tue Aug 11 12:30:14 2015 -0700

    Limit parallel live migrations in progress

    This patch extends the previous one[1], which allowed limiting the total number
    of parallel builds that nova-compute will attempt, to cover live migrations as
    well. Since we can now block immediately on the semaphore, this also implements
    the behavior we have in build, which spawns a new thread for the process so that
    we don't starve our RPC workers waiting on the semaphore. In reality, live
    migrations take a long time, so this was something we should have already had.

    Further, as soon as we receive the request to do the live migration, we mark the
    migration object as status='queued' to indicate that it's waiting for its turn
    on the compute node. Once we're given a slot to run, the normal status='preparing'
    will be set. This will allow an operator to monitor the status of queued and
    running migrations.

    This includes a change to the libvirt driver to avoid spawning another thread
    for the live migrations process. That makes it synchronous from the perspective
    of compute manager, and in line with all the other drivers that support the
    operation. Since compute manager now spawns the thread, libvirt is unaffected
    and the other drivers avoid potentially starving the RPC worker pool as well.

    [1] Commit 5a542e770648469b0fbb638f6ba53f95424252ec

    DocImpact: Adds a new configuration variable to limit parallel live migrations.
               Zero means "unlimited" and nonzero means "this many in parallel".

    Closes-Bug: #1478108
    Change-Id: Ia8a796372746b7fc75485dc2e663f270dbd5893a
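
The flow described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual nova implementation: `FakeMigration`, `ComputeSketch`, and `max_parallel` are hypothetical names, plain threads are used for simplicity, and only the 'queued' and 'preparing' statuses named above are modeled.

```python
import contextlib
import threading


class FakeMigration:
    """Hypothetical stand-in for a migration record; only tracks status."""

    def __init__(self):
        self.status = None
        self.history = []

    def set_status(self, status):
        self.status = status
        self.history.append(status)


class ComputeSketch:
    """Hypothetical compute manager that limits parallel live migrations."""

    def __init__(self, max_parallel):
        # Zero means "unlimited"; nonzero means "this many in parallel".
        if max_parallel == 0:
            self._slots = contextlib.nullcontext()
        else:
            self._slots = threading.Semaphore(max_parallel)

    def live_migrate(self, migration, driver_call):
        """Request handler: mark as queued, hand off to a thread, return."""
        migration.set_status('queued')
        worker = threading.Thread(
            target=self._run, args=(migration, driver_call))
        worker.start()  # the caller is not blocked while waiting for a slot
        return worker

    def _run(self, migration, driver_call):
        with self._slots:  # wait here for a free slot, if there is a limit
            migration.set_status('preparing')
            driver_call(migration)  # synchronous call into the driver
```

Because the wait on the semaphore happens in the spawned thread, the handler returns right away, and a migration that has not yet been given a slot stays visible as 'queued'.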

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2015-09-03
Changed in nova:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-10-15
Changed in nova:
milestone: liberty-3 → 12.0.0