live_migration_permit_post_copy and PAUSED vm fails
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Confirmed | Undecided | Unassigned | | |
Bug Description
Description
===========
The combination of allowing post-copy and migrating a VM in PAUSED state fails.
Steps to reproduce
==================
Suppose you have this in /etc/nova/
live_migration_
and you try to live-migrate a qemu VM which is in PAUSED state, it fails:
openstack server migrate <UUID> --host <DEST HOST> --live-migration --os-compute-
Expected result
===============
Migration starts and succeeds
Actual result
=============
Migration fails: from /var/log/
2021-10-12 12:15:24.847 15732 ERROR nova.virt.
which is rather annoying, because post-copy is very likely not needed when migrating a VM which is not doing anything.
On the other hand, post-copy is pretty much essential to get VMs migrated that change their memory faster than it can be migrated.
Environment
===========
I am seeing this problem in Queens and in Ussuri.
# dpkg -l | grep nova
ii nova-common 2:17.0.
ii nova-compute 2:17.0.
ii nova-compute-kvm 2:17.0.
ii nova-compute-
ii python-nova 2:17.0.
ii python-novaclient 2:9.1.1-0ubuntu1 all client library for OpenStack Compute API - Python 2.7
ii python3-novaclient 2:9.1.1-0ubuntu1 all client library for OpenStack Compute API - 3.x
We are using QEMU + KVM + libvirt:
ii libvirt-clients 4.0.0-1ubuntu8.19 amd64 Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.19 amd64 Virtualization daemon
ii libvirt-
ii libvirt0:amd64 4.0.0-1ubuntu8.19 amd64 library for interfacing with different virtualization systems
ii python-libvirt 4.0.0-1 amd64 libvirt Python bindings
ii python3-libvirt 4.0.0-1 amd64 libvirt Python 3 bindings
ii qemu-kvm 1:2.11+
We use shared storage (Quobyte), but that should not be relevant here.
We use Queens with Midonet and Ussuri with OVS, which should not matter here either.
(In fact, Queens has a much smarter strategy to switch to post-copy than Ussuri...)
A workaround in Nova could be that it doesn't ask for VIR_MIGRATE_
(It would also be nice to be able to migrate SUSPENDED VMs...)
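The workaround suggested above could look roughly like the following sketch. It is not Nova's actual code: the function name, the power-state string and the flag constants are hypothetical stand-ins, and it assumes the flag meant in the truncated sentence is the libvirt post-copy migration flag (VIR_MIGRATE_POSTCOPY). The numeric values are illustrative only.

```python
# Hypothetical stand-ins for libvirt migration flag bits (illustrative values,
# not read from the libvirt bindings).
FLAG_LIVE = 1 << 0
FLAG_POSTCOPY = 1 << 15  # assumed to play the role of VIR_MIGRATE_POSTCOPY


def effective_migration_flags(base_flags: int, power_state: str) -> int:
    """Return the flags to request for this migration.

    If the guest is paused, its memory is not changing, so post-copy is
    not needed; clear that bit and leave every other flag untouched.
    """
    if power_state == "PAUSED":
        return base_flags & ~FLAG_POSTCOPY
    return base_flags


if __name__ == "__main__":
    configured = FLAG_LIVE | FLAG_POSTCOPY
    print(effective_migration_flags(configured, "PAUSED") & FLAG_POSTCOPY)   # post-copy bit cleared
    print(effective_migration_flags(configured, "RUNNING") & FLAG_POSTCOPY)  # post-copy bit kept
```

The point of masking out a single bit rather than rebuilding the flag set is that all other configured migration options stay exactly as the operator set them.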
Before enabling post-copy, we sometimes had VMs which did not manage to get migrated in half an hour. The memory transfer statistics showed that by that time, often around two orders of magnitude more memory had been transferred than the VM's memory size. So the migration had been flooding the network for half an hour at high speed, wasting lots of bandwidth and making it unavailable for other VMs.
So having post-copy available is quite important. That's why I'm disappointed that Ussuri can only use it after a fixed timeout. Queens automatically starts post-copy if the normal memory migration does not make enough progress. Sometimes that already kicks in after 30 seconds. On Ussuri you have to wait (and waste bandwidth) for several minutes (and the default timeout is way too long; we set it to live_migration_completion_timeout = 10, down from 800).
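To illustrate the difference between a fixed timeout and a progress-based switch, here is a minimal sketch of the latter. It is not the algorithm Queens actually uses: the function name, the sampling of "remaining bytes" and the 10% threshold are all assumptions made for illustration.

```python
from typing import Sequence


def should_start_post_copy(remaining_samples: Sequence[int],
                           min_progress: float = 0.10) -> bool:
    """Decide whether pre-copy is making too little progress.

    remaining_samples: remaining memory (bytes) observed at successive
    polling intervals, oldest first (hypothetical input).
    min_progress: required fractional reduction over the window
    (hypothetical threshold).
    """
    if len(remaining_samples) < 2:
        return False  # not enough data to judge progress yet
    first, last = remaining_samples[0], remaining_samples[-1]
    if first <= 0:
        return False  # nothing left to transfer
    progress = (first - last) / first
    return progress < min_progress


if __name__ == "__main__":
    # Guest dirties memory about as fast as it is copied: switch.
    print(should_start_post_copy([1000, 990, 995]))
    # Remaining memory halves over the window: keep pre-copying.
    print(should_start_post_copy([1000, 700, 500]))
```

With a check like this, a VM that dirties memory faster than it can be copied is detected after a few polling intervals instead of after a fixed number of seconds.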