live-migration fails for volume-backed instances with config-drive type vfat

Bug #1589457 reported by Timofey Durakov
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

Description
===========

Volume-backed instances fail to live-migrate when a config drive is enabled (even with vfat).
Migration fails with exception.InvalidSharedStorage during execution of the check_can_live_migrate_source method: https://github.com/openstack/nova/blob/545d8d8666389f33601b0b003dec844004694919/nova/virt/libvirt/driver.py#L5388

The root cause:
https://github.com/openstack/nova/blob/545d8d8666389f33601b0b003dec844004694919/nova/virt/libvirt/driver.py#L5344 - the migration flags are calculated incorrectly there.
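
For context, here is a minimal standalone model of the source-side pre-check the links above point at; it paraphrases the logic around driver.py#L5388, and the function and flag names are simplified, not the driver's exact API:

# Minimal model of the source-side live-migration pre-check; a sketch
# paraphrasing the logic around the linked driver.py#L5388, not the
# driver's exact code.

class InvalidSharedStorage(Exception):
    pass

def check_can_live_migrate_source(block_migration,
                                  is_shared_block_storage,
                                  is_shared_instance_path):
    if block_migration:
        # Block migration path: local disks get copied to the destination.
        return "block live migration"
    if not (is_shared_block_storage or is_shared_instance_path):
        # The branch this report hits: the vfat config drive makes the
        # shared-block-storage flag come out False (see the discussion
        # in the comments below), so a plain live migration is refused.
        raise InvalidSharedStorage(
            "Live migration without shared storage is only supported "
            "for volume-backed instances with no local disks.")
    return "shared-storage live migration"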

Steps to reproduce
==================
1. use vfat as the config drive format, with no shared storage such as NFS (see the configuration snippet after these steps);
2. boot an instance from a volume;
3. try to live-migrate the instance.
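
For step 1, the relevant nova.conf options on the compute nodes would look roughly like the snippet below; config_drive_format and force_config_drive are real nova options, but the exact values here are illustrative (forcing the config drive is just one way to make sure the instance actually gets one):

[DEFAULT]
# Use vfat instead of the default iso9660 for the config drive.
config_drive_format = vfat
# Illustrative: force a config drive so every instance is affected.
force_config_drive = True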

Expected result
===============
The instance is migrated successfully.

Actual result
=============
Live migration does not even start:
root@node-1:~# nova live-migration server00 node-4.test.domain.local
ERROR (BadRequest): Migration pre-check error: Cannot block migrate instance f477e6da-4a04-492b-b7a6-e57b7823d301 with mapped volumes. Selective block device migration feature requires libvirt version 1.2.17 (HTTP 400) (Request-ID: req-4e0fce45-8b7c-43c0-90e7-cc929d2d60a1)

Environment
===========

Multi-node environment, without file-based shared storage such as NFS.
Driver: libvirt/KVM.
OpenStack branch: stable/mitaka; should also be valid for master.

summary: - volume-backed instances with config-drive type vfat live-migration
- fails
+ live-migration fails for volume-backed instances with config-drive type
+ vfat
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Setting to Medium, as only instances with a config drive are affected, and we currently don't force instances to have config drives by default.

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Changed in nova:
assignee: nobody → Timofey Durakov (tdurakov)
Revision history for this message
Timofey Durakov (tdurakov) wrote :

A libvirt version prior to 1.2.17 was used, so the feature to block-migrate an instance booted from volumes is not available.
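
For reference, the pre-check behind the BadRequest in the report can be modeled like this; the (1, 2, 17) threshold comes straight from the error message, while the function and constant names are illustrative rather than the driver's exact ones:

# Model of the pre-check behind the BadRequest shown in the report;
# names are illustrative, the (1, 2, 17) threshold is from the error.

class MigrationPreCheckError(Exception):
    pass

MIN_BLOCK_LM_WITH_VOLUMES_VERSION = (1, 2, 17)

def check_block_migration_with_volumes(libvirt_version, block_migration,
                                       has_mapped_volumes):
    if (block_migration and has_mapped_volumes
            and libvirt_version < MIN_BLOCK_LM_WITH_VOLUMES_VERSION):
        raise MigrationPreCheckError(
            "Cannot block migrate instance with mapped volumes. "
            "Selective block device migration feature requires "
            "libvirt version 1.2.17")

# With e.g. libvirt 1.2.12 this raises, matching the CLI output above:
# check_block_migration_with_volumes((1, 2, 12), True, True)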

Changed in nova:
status: Confirmed → Invalid
Revision history for this message
Pawel Koniszewski (pawel-koniszewski) wrote :

I believe that this bug is valid and that we might corrupt volume-backed VMs when the libvirt version is < 1.2.17.

The bug starts here: https://github.com/openstack/nova/blob/660ecaee66ccab895b282c2ed45c95c809ad6833/nova/virt/libvirt/driver.py#L5592 - for volume-backed VMs dest_check_data.is_volume_backed will be True, but "not bool(jsonutils.loads(self.get_instance_disk_info(instance, block_device_info)))" will return False, and as a result the whole method reports that block storage is not shared.
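
To make that concrete, here is a standalone model of the quoted condition (a sketch, not the driver's code), assuming get_instance_disk_info() returns an empty list for a purely volume-backed VM but includes the local config drive when one is attached:

# Model of the condition quoted above; a sketch, not the driver's code.

def is_shared_block_storage(is_volume_backed, instance_disk_info):
    # instance_disk_info models the decoded result of
    # get_instance_disk_info(); local disks such as a vfat config
    # drive show up in this list.
    return is_volume_backed and not bool(instance_disk_info)

# Purely volume-backed VM: no local disks, correctly treated as shared.
assert is_shared_block_storage(True, []) is True
# Same VM with a vfat config drive: one local disk, so the method
# reports "not shared" and nova falls into the block-migration path.
assert is_shared_block_storage(True, [{"path": "disk.config"}]) is False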

Now we have 3 cases:

* Libvirt version is >= 1.2.17 and tunnelling is OFF. This causes block live migration of a volume-backed VM with a config drive attached. It works perfectly fine, because we have implemented support for selective disk migration, so nova will exclude the volume from the list of devices that need to be migrated to the destination (see the sketch after this comment). This is because the volume is shared and there is really no need to migrate it:
https://github.com/openstack/nova/blob/660ecaee66ccab895b282c2ed45c95c809ad6833/nova/virt/libvirt/driver.py#L6059
and
https://github.com/openstack/nova/blob/660ecaee66ccab895b282c2ed45c95c809ad6833/nova/virt/libvirt/driver.py#L6068
This even helps with live migration of volume-backed VMs with a local config drive, because it finally works: libvirt takes care of copying the config drive to the destination... but it works by mistake.

* Libvirt version is >= 1.2.17 and tunnelling is ON. This again causes block live migration of a volume-backed VM with a config drive attached. Because libvirt does not support selective disk migration with tunnelling, the migration will be refused on the grounds that this feature is unsupported, not because live migration with a local disk is unsupported.

* Libvirt version is < 1.2.17. This causes volumes to be copied onto themselves during live migration. Nova again calculates the live migration type incorrectly and fires off a block live migration of the volume-backed VM. Unfortunately, the condition to exclude volumes from the list of devices that should be migrated to the destination is not met:
https://github.com/openstack/nova/blob/660ecaee66ccab895b282c2ed45c95c809ad6833/nova/virt/libvirt/driver.py#L6048
Because of this, volumes are not skipped during live migration, and therefore we hit this bug again: https://bugs.launchpad.net/nova/+bug/1398999

Please correct me if I'm wrong, but I believe we are hitting #1398999 once again due to an incorrect calculation of the migration type.
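
All three cases hinge on that volume-exclusion step. Here is a standalone sketch of it; the helper name and arguments are illustrative, not the driver's API:

# Sketch of the selective disk migration discussed in cases #1 and #3;
# the helper name and arguments are illustrative, not the driver's API.

def disks_to_copy(disk_devices, volume_devices, selective_supported):
    # Case #3 (libvirt < 1.2.17): no selective migration, so volumes
    # would be copied onto themselves (the bug #1398999 scenario).
    if not selective_supported:
        return list(disk_devices)
    # Case #1 (libvirt >= 1.2.17, no tunnelling): volumes are excluded,
    # so only local disks such as the vfat config drive are copied.
    return [dev for dev in disk_devices if dev not in volume_devices]

# "vda" is the mapped volume, "vdb" the local config drive:
assert disks_to_copy(["vda", "vdb"], {"vda"}, True) == ["vdb"]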

Changed in nova:
status: Invalid → Won't Fix
status: Won't Fix → Triaged
assignee: Timofey Durakov (tdurakov) → nobody
Revision history for this message
Pawel Koniszewski (pawel-koniszewski) wrote :

Just realized that case #3 is not valid due to the check at https://github.com/openstack/nova/blob/660ecaee66ccab895b282c2ed45c95c809ad6833/nova/virt/libvirt/driver.py#L5523

However, case #1 is still valid. For volume-backed VMs with a config drive, nova chooses block live migration. I'm not sure how we should treat this; it definitely helps to live-migrate volume-backed VMs with local disks.

Revision history for this message
Pawel Koniszewski (pawel-koniszewski) wrote :

Talked with danpb on IRC, and it looks like we can use block live migration in such a case, so #1 and #2 are invalid too.

Changed in nova:
status: Triaged → Invalid