Destination host resources not updated post live-migration

Bug #2033434 reported by Raul Moldovan
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Unassigned

Bug Description

Currently host resources are updated in the DB on the following occasions:
- when the periodic task runs `update_available_resource`
- when a VM is spawned on a host (update based on flavor)
- before and after a live migration, on the source host only

As we can see, the resources of the destination host are never updated
between two runs of the periodic task, whose interval is set by the
configuration value `update_resources_interval`. This means that even after
several consecutive live-migrations to the same host, important resources
like used vCPUs, memory and disk will still appear unused in the DB.

As the scheduler makes decisions based on the weights assigned to these metrics,
the same hosts are chosen over and over. The hosts will remain unbalanced
until the periodic update task next executes.

Reproduce bug:

With the following values configured:
host_subset_size = 3 (on the head node)
update_resources_interval = 3600 (on every compute node)

Over an hour of migrations we can see that only the 3 hosts initially
considered 'the best' are targeted, until the next periodic task
runs the `update_available_resource` function.
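
The skew described above can be illustrated with a small simulation. This is only a sketch and not nova code: the host names, RAM sizes, the 2048 MB flavor and the helpers `pick_host` and `simulate` are hypothetical, and the scheduler is reduced to "rank by free RAM recorded in the DB, then pick randomly among the top `subset_size` hosts".

```python
import random


def pick_host(db_free_ram, subset_size, rng):
    """Rank hosts by the free RAM recorded in the DB, pick one of the top N at random."""
    ranked = sorted(db_free_ram, key=db_free_ram.get, reverse=True)
    return rng.choice(ranked[:subset_size])


def simulate(num_hosts=6, migrations=30, subset_size=3,
             refresh_destination=False, seed=0):
    """Return how many migrations landed on each (hypothetical) host."""
    rng = random.Random(seed)
    # Real free RAM in MB; host0 starts out slightly 'better' than host1, etc.
    actual_free = {f"host{i}": 64000 - i * 100 for i in range(num_hosts)}
    # What the scheduler sees; only refreshed when told to.
    db_free = dict(actual_free)
    placed = dict.fromkeys(actual_free, 0)
    for _ in range(migrations):
        dest = pick_host(db_free, subset_size, rng)
        actual_free[dest] -= 2048  # assumed flavor size, for illustration only
        placed[dest] += 1
        if refresh_destination:
            # Stand-in for updating the destination's resources in the DB.
            db_free[dest] = actual_free[dest]
    return placed


if __name__ == "__main__":
    print("stale DB view:    ", simulate(refresh_destination=False))
    print("refreshed DB view:", simulate(refresh_destination=True))
```

With the stale view the ranking never changes, so every migration lands on the same 3 hosts. With the refresh enabled, a destination drops in the ranking as soon as it receives an instance and the load spreads over the other hosts.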

The function is already called on the source host, in the following call
path:

live_migration
_do_live_migration
_post_live_migration_update_host
_post_live_migration
update_available_resource

It should also be executed at some point on the following path, which reaches the destination host:

live_migration
_do_live_migration
_post_live_migration_update_host
_post_live_migration
compute_rpcapi.post_live_migration_at_destination
post_live_migration_at_destination
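
The proposed change can be sketched with a toy model. It is not the nova implementation: `ToyHost`, `live_migrate` and the `refresh_destination` flag are hypothetical names, and only the method name `update_available_resource` is borrowed from the call paths above. The assumption is simply that the destination recomputes its usage once the instance has arrived, the same way the source already does.

```python
class ToyHost:
    """Hypothetical stand-in for a compute host and its DB record."""

    def __init__(self, name):
        self.name = name
        self.instances = {}     # instance id -> vCPUs actually running here
        self.db_vcpus_used = 0  # value the scheduler would read from the DB

    def update_available_resource(self):
        # Recompute usage from what is really on the host and store it.
        self.db_vcpus_used = sum(self.instances.values())


def live_migrate(instance_id, source, dest, refresh_destination):
    """Move an instance and refresh resources as in the call paths above."""
    vcpus = source.instances.pop(instance_id)
    dest.instances[instance_id] = vcpus
    # Current behaviour: only the source refreshes after the migration.
    source.update_available_resource()
    if refresh_destination:
        # Proposed behaviour (assumption): the destination refreshes too.
        dest.update_available_resource()


if __name__ == "__main__":
    src, dst = ToyHost("src"), ToyHost("dst")
    src.instances["vm-1"] = 4
    src.update_available_resource()
    live_migrate("vm-1", src, dst, refresh_destination=True)
    print(src.db_vcpus_used, dst.db_vcpus_used)
```

Calling `live_migrate` with `refresh_destination=False` reproduces the reported state: the source shows the freed vCPUs immediately, while the destination's DB record stays unchanged until something else refreshes it.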

Changed in nova:
status: New → In Progress
sean mooney (sean-k-mooney) wrote:

triaging this as medium as it does not directly break anything, but it will result in a skew that degrades the ability of the scheduler to
spread instances when configured to do so, and that may have a negative impact on overall workload performance with low to moderate cloud utilisation.

this is really just a performance/scale hardening opportunity rather than a bug, but i don't think it qualifies as a feature, so I'm ok to proceed with it as a bug

Changed in nova:
importance: Undecided → Medium
tags: added: live-migration