Launching a new VM fails on the source host after live migration

Bug #1267862 reported by lirenke
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: —

Bug Description

Nova version: Havana

We have two compute nodes, Host A and Host B. Each has, for example, 100 GB of disk, 4 CPUs and 2 GB of memory.

First, launch an image-backed instance named vm-1 on Host A using a flavor of 60 GB disk, 2 CPUs and 1 GB of memory; this succeeds. The free resources on Host A are then obviously 40 GB of disk, 2 CPUs and 1 GB of memory.

Second, do a live migration with the block migration flag from Host A to Host B. It succeeds, and afterwards no active instance exists on Host A.

But the problem is that the free resources on Host A are still reported as 40 GB of disk, 2 CPUs and 1 GB of memory; the usage recorded in the compute_nodes table is not given back.

Then we try to add another instance named vm-2 to Host A using the same flavor as vm-1, and we are told that resources on Host A are insufficient (40 GB < 60 GB of disk, so the request is denied).

Notice that the data becomes correct after the next periodic run of update_available_resource. Within that interval, though, we cannot deploy another instance, even though in fact there is room for one.

I think the resources should be recalculated immediately on Host A; otherwise it may affect VM deployment.

Revision history for this message
lirenke (lvhancy) wrote :

I think calling the update_available_resource function explicitly in _post_live_migration, after the VM's host has been changed to the destination in the DB, is the way to solve this bug.
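
A minimal sketch of that idea, as a fragment of ComputeManager._post_live_migration() (the exact placement and the surrounding code are assumptions, not an actual patch):

    def _post_live_migration(self, ctxt, instance, dest, block_migration=False):
        # ... existing source-side teardown: networking, volumes, firewall rules ...

        # Hand the instance over to the destination (an async RPC cast in Havana).
        self.compute_rpcapi.post_live_migration_at_destination(
            ctxt, instance, block_migration, dest)

        # Proposed addition: recalculate this node's free disk/CPU/RAM right away
        # instead of waiting for the periodic update_available_resource task.
        self.update_available_resource(ctxt)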

Rohit Karajgi (rohitk)
tags: added: compute
Revision history for this message
Rohit Karajgi (rohitk) wrote :

Post live migration does not seem to update the available resources after completing the request. I think this should happen after every successful live migration request and not only during periodic task iterations.

Changed in nova:
status: New → Confirmed
Revision history for this message
lirenke (lvhancy) wrote :

Yes, I also think this should happen after every successful live migration request. _post_live_migration is called after a successful migration, so calling update_available_resource explicitly in _post_live_migration is an easy way to fix it.
Meanwhile, we can use the same approach on the destination host to sync its resources immediately (within the RPC method run on the destination from _post_live_migration); a sketch follows below.
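
Roughly what that destination-side counterpart could look like (the method name follows nova's compute manager, but the added call is only the proposal above, not existing code):

    def post_live_migration_at_destination(self, ctxt, instance, block_migration=False):
        # ... existing logic: point instance.host at this node, set up
        # networking, update the migration record, etc. ...

        # Proposed addition: account for the instance's disk/CPU/RAM on this
        # host now, rather than waiting for the periodic resource audit.
        self.update_available_resource(ctxt)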

Changed in nova:
assignee: nobody → Verónica Musso (veronica-a-musso)
lirenke (lvhancy)
Changed in nova:
assignee: Verónica Musso (veronica-a-musso) → lirenke (lvhancy)
Changed in nova:
assignee: lirenke (lvhancy) → Tiago Rodrigues de Mello (timello)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/70150
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ab1e48f4683315db631be3f0995be6258edf6997
Submitter: Jenkins
Branch: master

commit ab1e48f4683315db631be3f0995be6258edf6997
Author: Tiago Mello <email address hidden>
Date: Thu Jan 30 13:47:34 2014 -0200

    Updates available resources after live migration

    A new VM can't be deployed using resources of the VM that has been migrated
    to a new host before the periodic task is called after a live migration.

    This change calls update_available_resource() after live
    migration is completed even before the next periodic task takes place.

    Change-Id: I4c879dcfb1e76cfdc95612b6d8010c1081ac45b9
    Closes-Bug: #1267862

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-3 → 2014.2
Revision history for this message
Bart Wensley (bartwensley) wrote :

This fix didn't address the problem. The instance will still be located on the source node at the time update_available_resource is called (in _post_live_migration), so update_available_resource still counts the resources consumed by this instance against the source host. The post_live_migration_at_destination RPC is what updates the instance's location, but it is an async RPC and will not have completed when update_available_resource runs.
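
A tiny standalone simulation of that ordering (all names here are illustrative stand-ins, not nova's real API) shows the effect:

    # Toy illustration of the race described above: the "cast" runs the
    # destination-side handler on another thread and returns immediately,
    # so the source audits its resources while instance["host"] still says
    # "source".
    import threading
    import time

    instance = {"host": "source", "disk_gb": 60}

    def post_live_migration_at_destination():
        time.sleep(0.1)                  # RPC delivery/processing delay
        instance["host"] = "dest"        # destination finally takes ownership

    def update_available_resource(host):
        # The audit counts every instance whose host matches this node.
        used = instance["disk_gb"] if instance["host"] == host else 0
        print(f"{host}: {used} GB still accounted as used")

    # Source side, as merged: cast, then audit immediately.
    threading.Thread(target=post_live_migration_at_destination).start()
    update_available_resource("source")   # prints: source: 60 GB still accounted as used

    time.sleep(0.2)                        # once the cast has landed...
    update_available_resource("source")    # prints: source: 0 GB still accounted as used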

Can someone re-open this bug or do I need to file a new one?

Revision history for this message
Bart Wensley (bartwensley) wrote :

I reset the status of this bug to confirmed since I have verified that the problem still exists in kilo (and I assume also in juno).

Changed in nova:
status: Fix Released → Confirmed
tags: added: live-migrate
Revision history for this message
Chris Friesen (cbf123) wrote :

This is not a duplicate bug, since it's talking about freeing up the resources on the source host after a migration, not claiming resources on the destination.

Revision history for this message
Chris Friesen (cbf123) wrote :

I'm pretty sure we still have a problem in current master (Newton). In nova.compute.manager.ComputeManager._post_live_migration() we call self.compute_rpcapi.post_live_migration_at_destination(), which is an RPC cast so we have no idea when it'll actually run. (That routine is what actually updates instance.host to point to the new destination.)

Then we call self.update_available_resource(). If post_live_migration_at_destination() hasn't run yet, then instance.host will still be the source host and we'll account for its resources in nova.compute.resource_tracker.ResourceTracker._update_available_resource() when we call self._update_usage_from_instances(). This basically means that we called self.update_available_resource() for nothing.

The only sure fix is to make the call to post_live_migration_at_destination() an RPC call (i.e., synchronous) instead of a cast. Barring that, we could put in a delay before calling self.update_available_resource(), or maybe move it down to the bottom of the function to increase the likelihood that post_live_migration_at_destination() has run by the time we get to it. A sketch of the synchronous variant is below.
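
Roughly what that synchronous variant could look like (in oslo.messaging terms, a client.call() instead of client.cast(); the method names follow nova's compute manager, but the blocking behaviour shown here is hypothetical):

    def _post_live_migration(self, ctxt, instance, dest, block_migration=False):
        # ... source-side cleanup ...

        # Hypothetical: assumed to block until the destination has set
        # instance.host = dest, so the audit below no longer counts the
        # instance against this (source) node.
        self.compute_rpcapi.post_live_migration_at_destination(
            ctxt, instance, block_migration, dest)

        self.update_available_resource(ctxt)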

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
assignee: Tiago Mello (timello) → nobody
status: Confirmed → Expired