live migration failed with XenServer as hypervisor

Bug #1658877 reported by huan
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        Bob Ball
Ocata                     Fix Released  High        Matt Riedemann

Bug Description

I used devstack to deploy a multi-compute-node test environment with XenServer.
I then ran the command "nova live-migration --block-migrate admin-vm5 ComputeNode3"
and got the errors below:

===============================================
2017-01-23 07:18:11.243 ERROR nova.virt.xenapi.vmops [req-6e4f8d0b-ea2f-4a69-bcd8-98d5f94e8ab0 admin admin] Migrate Send failed
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops Traceback (most recent call last):
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2396, in live_migrate
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops "VM.migrate_send", vm_ref, migrate_data)
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2361, in _call_live_migrate_command
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops vdi_map, vif_map, options)
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops File "/usr/local/lib/python2.7/dist-packages/os_xenapi/client/session.py", line 200, in call_xenapi
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops return session.xenapi_request(method, args)
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops File "/usr/local/lib/python2.7/dist-packages/os_xenapi/client/XenAPI.py", line 130, in xenapi_request
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops result = _parse_result(getattr(self, methodname)(*full_params))
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops File "/usr/local/lib/python2.7/dist-packages/os_xenapi/client/XenAPI.py", line 212, in _parse_result
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops raise Failure(result['ErrorDescription'])
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops Failure: ['VIF_NOT_IN_MAP', 'OpaqueRef:b0636c87-539f-59f6-8fef-8c15c6d58665']
2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops

================================================
2017-01-23 07:18:11.355 ERROR nova.compute.manager [req-6e4f8d0b-ea2f-4a69-bcd8-98d5f94e8ab0 admin admin] [instance: b539c9fd-6f29-472b-908c-5c0146c31917] Live migration failed.
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] Traceback (most recent call last):
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] File "/opt/stack/nova/nova/compute/manager.py", line 5368, in _do_live_migration
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] block_migration, migrate_data)
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 520, in live_migration
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] recover_method, block_migration, migrate_data)
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2414, in live_migrate
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] recover_method(context, instance, destination_hostname)
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] self.force_reraise()
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] six.reraise(self.type_, self.value, self.tb)
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 2400, in live_migrate
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] reason=_('Migrate Send failed'))
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917] MigrationError: Migration error: Migrate Send failed
2017-01-23 07:18:11.355 TRACE nova.compute.manager [instance: b539c9fd-6f29-472b-908c-5c0146c31917]
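
When triaging logs like the ones above, the useful part of the first traceback is the final "Failure:" line: a list whose first element is the xapi error code and whose remaining elements are its parameters (here, presumably the opaque ref of the VIF that had no entry in the map). A minimal sketch for pulling those out of a log line follows; the helper name `parse_xapi_failure` is hypothetical, and it assumes the list is printed as a Python literal at the end of the line, as it is in this report.

```python
import ast
import re

# Matches the trailing "Failure: [...]" part of a nova TRACE line.
FAILURE_RE = re.compile(r"Failure: (\[.*\])\s*$")


def parse_xapi_failure(line):
    """Return (error_code, params) from a log line, or None if absent."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    # The list is printed as a Python literal, so literal_eval can read it.
    details = ast.literal_eval(match.group(1))
    return details[0], details[1:]


line = ("2017-01-23 07:18:11.243 TRACE nova.virt.xenapi.vmops "
        "Failure: ['VIF_NOT_IN_MAP', "
        "'OpaqueRef:b0636c87-539f-59f6-8fef-8c15c6d58665']")
code, params = parse_xapi_failure(line)
```

For the line quoted in this report, `code` is the error name and `params` holds the single opaque ref.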

huan (huan-xie)
Changed in nova:
assignee: nobody → huan (huan-xie)
huan (huan-xie)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/424428

Changed in nova:
status: New → In Progress
Bob Ball (bob-ball)
Changed in nova:
importance: Undecided → High
milestone: none → ocata-3
Bob Ball (bob-ball)
tags: added: ocata-rc-potential
Matt Riedemann (mriedem)
tags: added: live-migration xenserver
Matt Riedemann (mriedem) wrote :

Is this a regression introduced in Ocata? Or was this a latent issue?

Bob Ball (bob-ball) wrote :

This is a regression in Nova when using XenServer 6.5, but it was also regressed by an upgrade to XenServer 7.0.

That is, if you used XenServer 6.5 with Newton it worked, but if you upgrade either the hypervisor or Nova it no longer works, due to 1) the change in network name (the XenServer 6.5 logic can't figure out what to do) and 2) the requirement to specify the network anyway.

Changed in nova:
assignee: huan (huan-xie) → Bob Ball (bob-ball)
Matt Riedemann (mriedem) wrote :

@Bob, so to be clear, there wasn't a change in Nova in the Ocata release that regressed this, was there? If it's upgrading xenserver to a new major version, then that's an issue that Nova needs to deal with, but not a regression in Nova per se, correct?

Matt Riedemann (mriedem) wrote :

What does "1) the change in network name" mean? What change?

Bob Ball (bob-ball) wrote :

The change in network name was actually introduced in Newton - see http://git.openstack.org/cgit/openstack/nova/log/nova/virt/xenapi/vif.py?h=stable/newton

I would personally still very much like to see this fixed in Ocata, as backports will be tricky due to the changes in object version.

Bob Ball (bob-ball) wrote :

Sorry for the incorrect assertion that it worked in Newton; I currently believe live migration with XenServer did not work in vanilla Newton.

Changed in nova:
assignee: Bob Ball (bob-ball) → huan (huan-xie)
Changed in nova:
assignee: huan (huan-xie) → Bob Ball (bob-ball)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/435049

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/424428
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4cd32645fb26d39a900433c4c1dfecaac1767522
Submitter: Jenkins
Branch: master

commit 4cd32645fb26d39a900433c4c1dfecaac1767522
Author: Huan Xie <email address hidden>
Date: Sun Jan 22 03:08:40 2017 -0800

    Fix live migrate with XenServer

    Live migration with XenServer as hypervisor failed with xapi
    errors "VIF_NOT_IN_MAP". There are two reasons for this
    problem:

    (1) Before XS7.0, it supports VM live migration without
    setting vif_ref and network_ref explicitly if the destination
    host has same network, but since XS7.0, it doesn't support
    this way, we must give vif_ref and network_ref mapping.

    (2) In nova, XenServer has introduced interim network for
    fixing ovs updating wrong port in neutron, see bug 1268955
    and also interim network can assist support neutron security
    group (linux bridge) as we cannot make VIF connected to
    linux bridge directly via XAPI

    To achieve this, we will add {src_vif_ref: dest_network_ref}
    mapping information, in pre_live_migration, we first create
    interim network in destination host and store
    {neutron_vif_uuid: dest_network_ref} in migrate_data, then in
    source host, before live_migration, we will calculate the
    {src_vif_ref: dest_network_ref} and set it as parameters to
    xapi when calling VM.migrate_send. Also, we will handle the
    case where the destination host is running older code that
    doesn't have this new src_vif_ref mapping, like live migrating
    from an Ocata compute node to a Newton compute node.

    Closes-bug: 1658877

    Change-Id: If0fb5d764011521916fbbe15224f524a220052f3
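
The mapping step described in the commit message can be sketched as follows. This is a minimal illustration, not the code from the merged change: the function name `build_vif_map`, the shape of its inputs, and the choice to return an empty map when the destination supplies no mapping (taken here to mean it runs older code) are all assumptions.

```python
def build_vif_map(src_vifs, dest_network_by_vif_uuid):
    """Compose {src_vif_ref: dest_network_ref} for VM.migrate_send.

    src_vifs: {src_vif_ref: neutron_vif_uuid}, gathered on the source
        host (hypothetical input shape).
    dest_network_by_vif_uuid: {neutron_vif_uuid: dest_network_ref}, as
        stored in migrate_data by the destination's pre_live_migration;
        None or empty if the destination sent nothing.
    """
    if not dest_network_by_vif_uuid:
        # Assumption: no mapping means the destination runs older code,
        # so no explicit VIF map is passed.
        return {}

    vif_map = {}
    unmapped = []
    for vif_ref, vif_uuid in src_vifs.items():
        network_ref = dest_network_by_vif_uuid.get(vif_uuid)
        if network_ref is None:
            unmapped.append(vif_ref)
        else:
            vif_map[vif_ref] = network_ref

    if unmapped:
        # Fail early instead of letting xapi reject the call with
        # VIF_NOT_IN_MAP.
        raise ValueError(
            "no destination network for VIFs: %s" % ", ".join(sorted(unmapped)))
    return vif_map


# Made-up refs and UUIDs, for illustration only.
src = {"OpaqueRef:vif-1": "uuid-a", "OpaqueRef:vif-2": "uuid-b"}
dest = {"uuid-a": "OpaqueRef:net-1", "uuid-b": "OpaqueRef:net-2"}
vif_map = build_vif_map(src, dest)
```

Keying the intermediate map by the neutron VIF UUID is what lets the two hosts cooperate: opaque refs are only meaningful on the host that issued them, while the VIF UUID is known to both sides.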

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/435049
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ef0c67934c6ccc0ef6685697b190c2c9324ee321
Submitter: Jenkins
Branch: stable/ocata

commit ef0c67934c6ccc0ef6685697b190c2c9324ee321
Author: Huan Xie <email address hidden>
Date: Sun Jan 22 03:08:40 2017 -0800

    Fix live migrate with XenServer

    Closes-bug: 1658877

    Change-Id: If0fb5d764011521916fbbe15224f524a220052f3
    (cherry picked from commit 4cd32645fb26d39a900433c4c1dfecaac1767522)

tags: added: in-stable-ocata
Matt Riedemann (mriedem)
tags: removed: ocata-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0rc2

This issue was fixed in the openstack/nova 15.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b1

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.
