XenAPI: volume VM live migration failed with VDI_NOT_IN_MAP

Bug #1704071 reported by Jianghua Wang
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Brooks Kaminski

Bug Description

When a VM is booted with a volume attached and a live migration is then performed, the migration fails on the destination compute with the following VDI_NOT_IN_MAP error.

Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: File "/opt/stack/nova/nova/compute/manager.py", line 198, in decorated_function
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: return function(self, context, *args, **kwargs)
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: File "/opt/stack/nova/nova/compute/manager.py", line 5343, in check_can_live_migrate_source
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: block_device_info)
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 491, in check_can_live_migrate_source
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: dest_check_data)
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2334, in check_can_live_migrate_source
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: raise exception.MigrationPreCheckError(reason=msg)
Jul 13 05:06:55 ComputeNode-1 nova-compute[7780]: MigrationPreCheckError: Migration pre-check error: assert_can_migrate failed because: VDI_NOT_IN_MAP

XenServer version is 7.1

Tags: xenserver
Sean Dague (sdague)
tags: added: xenserver
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
huan (huan-xie) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/538415

Changed in nova:
assignee: nobody → Brooks Kaminski (bhkaminski)
status: Confirmed → In Progress
Jianghua Wang (wjh-fresh) wrote :

All,

I checked the change history. Before this commit, "VM.assert_can_migrate" was not limited to block device migration:
https://review.openstack.org/#/c/247853/23/nova/virt/xenapi/vmops.py@2194

I don't know why it was changed that way.
In any case, I think it's acceptable to skip VM.assert_can_migrate for >XS7.0 (platform version 2.1), given that it is only invoked for block live migration.

Bob's suggestion of "only looking at VDI_MAP" is good if we can do it, but unfortunately it seems we can't. The reason is that since XS7.0, assert_can_migrate requires vif_map and vdi_map to be present in the input parameters. In fact, the code used to empty vdi_map and vif_map before invoking "assert_can_migrate":
https://review.openstack.org/#/c/9879/10/nova/virt/xenapi/vmops.py@1550

Also see the errors of "VIF_NOT_IN_MAP" in https://bugs.launchpad.net/nova/+bug/1658877

Regards,
Jianghua

Brooks Kaminski (brooks-kaminski) wrote :

There has been some disagreement here on how best to handle this with this temporary patch. I have submitted a new revision of the commit that swallows the exception in the case of VDI_NOT_IN_MAP, which is what I believe Bob meant by "looking for".

This commit looks for VDI_NOT_IN_MAP within the exception and then verifies that it came from an XCP 2.1.0+ instance before returning without raising. I have explained in the commit message why I chose this route over skipping the assertion completely, but I will include the reasons here as well for posterity.

1. --block-migration can be called without regard for whether an iSCSI volume is attached, and we still want to ensure that VIF, CPU and other factors are checked, and not just skip all checks entirely.

2. Currently the Assert only exists within the --block-migration code base but this needs to change. A future commit will remove this logic to ensure that the commit runs without this flag. Once that is done we want to be able to continue to use this Exception swallow logic rather than continuing to skip the assert for all XCP2.1.0+ even without volumes.

-Brooks Kaminski
irc: Spazmotic
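
The approach described in the comment above (swallow VDI_NOT_IN_MAP only when the platform is 2.1.0 or later, and keep failing on every other pre-check error) can be illustrated with a minimal, self-contained sketch. This is not the merged nova code: the Failure and MigrationPreCheckError classes are local stand-ins, and the helper names parse_platform_version and run_migration_precheck are hypothetical, introduced only for illustration.

```python
# Illustrative sketch only; not the code merged in review 538415.
# All names below are hypothetical stand-ins.


class Failure(Exception):
    """Stand-in for a XenAPI failure carrying a list of detail strings."""

    def __init__(self, details):
        super().__init__(str(details))
        self.details = details


class MigrationPreCheckError(Exception):
    """Stand-in for nova's migration pre-check exception."""


def parse_platform_version(text):
    """Turn '2.1' or '2.1.0' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in text.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def run_migration_precheck(assert_call, platform_version):
    """Run the assertion, swallowing VDI_NOT_IN_MAP on platform 2.1.0+.

    assert_call is any zero-argument callable that performs the
    assert_can_migrate check and raises Failure when it does not pass.
    """
    try:
        assert_call()
    except Failure as exc:
        reason = exc.details[0] if exc.details else ""
        is_new_platform = parse_platform_version(platform_version) >= (2, 1, 0)
        if "VDI_NOT_IN_MAP" in reason and is_new_platform:
            # The SR for the attached volume is not configured yet this
            # early in the migration, so this one failure is ignored.
            return
        # Every other failure (VIF, CPU, ...) still aborts the migration.
        raise MigrationPreCheckError(
            "assert_can_migrate failed because: %s" % reason) from exc
```

Keeping the assertion call in place and filtering on the single error code is what lets the remaining checks keep running, which is the first reason given in the list above.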

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/538415
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0e9cd6c4d66ca4afb95bb60edb412af9e96c546e
Submitter: Zuul
Branch: master

commit 0e9cd6c4d66ca4afb95bb60edb412af9e96c546e
Author: Brooks Kaminski <email address hidden>
Date: Sat Jan 27 01:35:07 2018 -0600

    XenAPI: XCP2.1+ Swallow VDI_NOT_IN_MAP Exception

    Changes within XenAPI have enforced a more strict policy when checking
    assert_can_migrate. In particular when checking the source_vdi:dest_sr
    mapping it insists that the SR actually exist. This is not a problem for
    local disks, however this assertation is called extremely early in the
    live migration process (check_can_migrate_source) which is called from
    conductor, which makes a problem for attached volumes.

    This early in the process the host has just barely been chosen and no SR
    information has been configured yet for these volumes or their initiators.
    Additionally we cannot prepare this SR any earlier as BDM information is
    not set up until the pre_live_migration method. With the options to either
    skip this assertion completely or swallow the exception, I have chosen to
    swallow the exception. My reasons for this are two-fold:

    1. --block-migration can be called without regard for whether an iSCSI
    volume is attached, and we still want to ensure that VIF, CPU and other
    factors are checked, and not just skip all checks entirely.
    2. Currently the Assert only exists within the --block-migration code
    base but this needs to change. A future commit will remove this logic
    to ensure that the commit runs without this flag. Once that is done we
    want to be able to continue to use this Exception swallow logic rather
    than continuing to skip the assert for all XCP2.1.0+ even without volumes.

    This decision should help us handle less work in a future commit and does not
    seem to align with the goals of that commit, where it does align properly here.
    This commit still changes very little of the current codebase and puts us in
    a good position to refactor the way this is handled at a later date, while
    adding a TODO note to correct VM.assert_can_migrate only running during a
    block migration.

    Additionally there seems to be some confusion that the mapping data that is
    generated during this initial trip through _call_live_migrate_command is needed
    to continue along the code, however this data appears to be purely used to send
    the mapping information through the assertation call, and is then discarded.
    The only data returned from these methods is the original dest_data which
    is carried into the live_migration method. The _call_live_migration method is
    called again during the live_migration() method, and during this time it does
    need that mapping to send along to XenAPI for the actual migration, but not
    yet. Because this codebase is so confusing, I am providing a little bit of
    context on the movement of these variables with some psuedocode:

    ---CONDUCTOR.TASKS.LIVE_MIGRATE---
    LiveMigrationTask.Execute()
     ...


Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
