live migration fails with NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'

Bug #1783917 reported by melanie witt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Matt Riedemann
neutron
Invalid
High
Unassigned

Bug Description

Seen in the gate starting today:

http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Unsupported%20VIF%20type%20unbound%20convert%20'_nova_to_osvif_vif_unbound'%5C%22%20AND%20tags:screen-n-cpu.txt&from=7d

My guess is that it's related to https://review.openstack.org/522537 which began using the new neutron port binding API.

Trace looks like this:

http://logs.openstack.org/52/564452/30/check/nova-live-migration/d3172a4/logs/screen-n-cpu.txt.gz#_Jul_26_22_44_44_739885

Jul 26 22:44:44.739885 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: WARNING nova.virt.libvirt.driver [None req-7c0f6655-5920-4894-b3d5-062ba4f8270e service nova] [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] Error monitoring migration: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound': NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'
Jul 26 22:44:44.740334 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] Traceback (most recent call last):
Jul 26 22:44:44.740578 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7557, in _live_migration
Jul 26 22:44:44.740839 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] finish_event, disk_paths)
Jul 26 22:44:44.741086 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7421, in _live_migration_monitor
Jul 26 22:44:44.741328 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] migrate_data)
Jul 26 22:44:44.741562 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/exception_wrapper.py", line 79, in wrapped
Jul 26 22:44:44.741785 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] function_name, call_dict, binary, tb)
Jul 26 22:44:44.742008 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jul 26 22:44:44.742246 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] self.force_reraise()
Jul 26 22:44:44.742485 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jul 26 22:44:44.742791 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] six.reraise(self.type_, self.value, self.tb)
Jul 26 22:44:44.743117 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/exception_wrapper.py", line 69, in wrapped
Jul 26 22:44:44.743332 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] return f(self, context, *args, **kw)
Jul 26 22:44:44.743544 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/compute/manager.py", line 216, in decorated_function
Jul 26 22:44:44.743758 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] kwargs['instance'], e, sys.exc_info())
Jul 26 22:44:44.745586 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jul 26 22:44:44.747347 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] self.force_reraise()
Jul 26 22:44:44.747790 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jul 26 22:44:44.748166 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] six.reraise(self.type_, self.value, self.tb)
Jul 26 22:44:44.748507 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/compute/manager.py", line 204, in decorated_function
Jul 26 22:44:44.748859 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] return function(self, context, *args, **kwargs)
Jul 26 22:44:44.749207 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/compute/manager.py", line 6582, in _post_live_migration
Jul 26 22:44:44.749626 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] network_info)
Jul 26 22:44:44.750459 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7971, in post_live_migration_at_source
Jul 26 22:44:44.751201 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] self.unplug_vifs(instance, network_info)
Jul 26 22:44:44.751576 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 829, in unplug_vifs
Jul 26 22:44:44.752054 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] self._unplug_vifs(instance, network_info, False)
Jul 26 22:44:44.752517 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 823, in _unplug_vifs
Jul 26 22:44:44.752885 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] self.vif_driver.unplug(instance, vif)
Jul 26 22:44:44.753247 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/virt/libvirt/vif.py", line 928, in unplug
Jul 26 22:44:44.753613 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] vif_obj = os_vif_util.nova_to_osvif_vif(vif)
Jul 26 22:44:44.753983 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] File "/opt/stack/new/nova/nova/network/os_vif_util.py", line 505, in nova_to_osvif_vif
Jul 26 22:44:44.754411 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] {'type': vif['type'], 'func': funcname})
Jul 26 22:44:44.754752 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2] NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'
Jul 26 22:44:44.755092 ubuntu-xenial-rax-dfw-0000999143 nova-compute[2823]: ERROR nova.virt.libvirt.driver [instance: 582afae2-510c-4b73-b41d-1edc93e902a2]

Revision history for this message
melanie witt (melwitt) wrote :

Added neutron to the bug -- we're not yet sure whether the bug is in nova or neutron.

summary: - live migration fails with NovaException: Unsupported VIF type unbound
+ live migration raises NovaException: Unsupported VIF type unbound
convert '_nova_to_osvif_vif_unbound'
Revision history for this message
Matt Riedemann (mriedem) wrote : Re: live migration raises NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'
Changed in nova:
status: New → Confirmed
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
tags: added: live-migration rocky-rc-potential
Changed in nova:
importance: Undecided → High
melanie witt (melwitt)
summary: - live migration raises NovaException: Unsupported VIF type unbound
+ live migration fails with NovaException: Unsupported VIF type unbound
convert '_nova_to_osvif_vif_unbound'
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/586391

Revision history for this message
melanie witt (melwitt) wrote :

[19:07:37] <sean-k-mooney> http://logs.openstack.org/63/585163/1/check/nova-live-migration/1b2aebb/logs/screen-n-cpu.txt#_Jul_27_01_44_01_083831 looks like its happening because we are calling unplug on the source node after we have activated the binding on the dest

Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (3.8 KiB)

(9:12:17 PM) mriedem: yeah the error is from unplugging vifs in _post_live_migration which happens on the source,
(9:12:58 PM) mriedem: https://github.com/openstack/nova/blob/2afc5fed1f60077e7ff0b9e81b64cff4e4dbabfc/nova/compute/manager.py#L6581
(9:13:10 PM) mriedem: right before that,
(9:13:11 PM) mriedem: https://github.com/openstack/nova/blob/2afc5fed1f60077e7ff0b9e81b64cff4e4dbabfc/nova/compute/manager.py#L6572
(9:13:17 PM) mriedem: we activate the port bindings for the dest host
(9:14:03 PM) melwitt: ah, I see
(9:14:12 PM) sean-k-mooney: mriedem: yep that will deactivaate all other port bindings for that port meaning it will be in the unbound state on the sorce host
(9:14:16 PM) melwitt: so just flip that?
(9:14:18 PM) mriedem: https://github.com/openstack/nova/blob/2afc5fed1f60077e7ff0b9e81b64cff4e4dbabfc/nova/network/neutronv2/api.py#L2534
(9:14:42 PM) mriedem: i didn't know we couldn't unplug a deactivated port...
(9:14:49 PM) melwitt: I wonder how it doesn't fail 100% of the time
(9:14:53 PM) mriedem: melwitt: race
(9:14:55 PM) mriedem: apparently
(9:14:56 PM) melwitt: ah
(9:15:16 PM) melwitt: yeah, what luck that the actual change *didn't* fail
(9:15:21 PM) sean-k-mooney: mriedem: your raising with the notification neutron send for the port status change
(9:15:32 PM) mriedem: hmm http://logs.openstack.org/63/585163/1/check/nova-live-migration/1b2aebb/logs/screen-n-cpu.txt#_Jul_27_01_44_00_974248
(9:15:56 PM) mriedem: melwitt: i had seen this once in the series and mlavalle debugged it and couldn't find anything wrong
(9:16:07 PM) mriedem: Jul 27 01:44:00.974248 ubuntu-xenial-rax-dfw-0001002000 nova-compute[2629]: DEBUG nova.network.neutronv2.api [None req-33283139-ba55-4106-b76c-8751a025f153 service nova] [instance: 6b72a721-0995-446e-848f-f407b788c7f4] Port 21095ff0-6bcd-414b-9d6f-b63e03aacb23 binding to destination host ubuntu-xenial-rax-dfw-0001002004 is already ACTIVE. {{(pid=2629) migrate_instance_start /opt/stack/new/nova/nova/network/neutronv2/api.py:2546}}
(9:16:13 PM) melwitt: ah, okay
(9:16:18 PM) mriedem: oh i know why it's already active,
(9:16:26 PM) mriedem: because we activate the dest host port binding during post-copy
(9:16:42 PM) mriedem: which is the whole point of the blueprint - to shorten the window of time that you don't have networking on the dest host
(9:17:24 PM) melwitt: right
(9:17:38 PM) melwitt: shorten the window
(9:18:22 PM) mriedem: this is the unplug event http://logs.openstack.org/63/585163/1/check/nova-live-migration/1b2aebb/logs/screen-n-cpu.txt#_Jul_27_01_44_00_069526
(9:18:57 PM) mriedem: this is where we activate the ports on the dest host during post-copy http://logs.openstack.org/63/585163/1/check/nova-live-migration/1b2aebb/logs/screen-n-cpu.txt#_Jul_27_01_43_58_561391
(9:20:10 PM) mriedem: we could have the live migration method wait for the unplug event before starting with post live migration, but (1) i'm not sure that helps anything and (2) it might not work that way for all virt drivers - only libvirt + post-copy has this
(9:20:35 PM) melwitt: yeah, events are sketch depending on which networking backend too, right
(9:20:47 PM) melwitt: like ovs vs other
(9:20:47 PM) openstackgerrit: Y...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

To summarize, during post-copy on the source host, nova activates the port binding on the dest host. During post live migration on the source host, nova refreshes the ports in the instance info cache and then calls unplug_vifs in the virt driver, but the vif type now shows up as 'unbound' - and I'm not sure if that's a problem in neutron or not; why would the vif type be unbound if we activated the dest host port binding? Is that a bug in neutron? We can workaround this in nova by getting the vif type off the migrate_data.vifs and using that during the unplug call, but it seems like we shouldn't need to do this, and the unbound vif type might be the race (from neutron?).

Changed in neutron:
status: Confirmed → Invalid
Revision history for this message
Matt Riedemann (mriedem) wrote :

(9:40:13 PM) mriedem: http://logs.openstack.org/63/585163/1/check/nova-live-migration/1b2aebb/logs/screen-n-cpu.txt#_Jul_27_01_44_00_726935
(9:40:20 PM) mriedem: that's where we refresh the info cache in _post_live_migration
(9:40:26 PM) mriedem: after activating the dest host port binding
(9:40:39 PM) mriedem: [{"profile": {"migrating_to": "ubuntu-xenial-rax-dfw-0001002004"}, "ovs_interfaceid": null, "preserve_on_delete": false, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.1.0.10"}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": "10.1.0.0/28", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.1.0.1"}}], "meta": {"injected": false, "tenant_id": "7dbeedd7076e472091193779ebbcf887", "mtu": 1400}, "id": "1d8de970-331e-46b5-8c7b-574821e891e5", "label": "tempest-LiveMigrationTest-411356071-network"}, "devname": "tap21095ff0-6b", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {}, "address": "fa:16:3e:34:c9:90", "active": false, "type": "unbound", "id": "21095ff0-6bcd-414b-9d6f-b63e03aacb23", "qbg_params": null}]
(9:41:06 PM) mriedem: yeah...that's wrong
(9:41:11 PM) mriedem: it should be bound to the dest host

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/586402

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/586391

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/586568

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/586402
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e17c3784ec2a394f01eb6e6a242a921076cc5dc
Submitter: Zuul
Branch: master

commit 8e17c3784ec2a394f01eb6e6a242a921076cc5dc
Author: Matt Riedemann <email address hidden>
Date: Thu Jul 26 23:10:58 2018 -0400

    Use source vifs when unplugging on source during post live migrate

    With the changes in blueprint neutron-new-port-binding-api, we now
    manage port bindings on both the source and destination host and
    those can be different vif types to support live migrating between
    different network backends, e.g. linuxbridge-ovs, ovs->ovn, etc.

    The dest host port bindings are created during pre_live_migration
    and then activated on the source host during post-copy (for the
    libvirt driver) or in _post_live_migration when we call the
    migrate_instance_start() method in the network API. When the dest
    host port bindings are activated, the source host port bindings
    are automatically deactivated (it's an atomic operation in neutron).

    Depending on when we refresh the nw info cache on the instance,
    we can end up calling driver.post_live_migration_at_source() with
    information about the unbound source port binding, or the vif type
    for the destination host port binding. Because of that, unplug_vifs
    on the source can fail since it's not using the proper vif type.

    To resolve that, we use the stashed VIFMigrateData.source_vif
    when unplugging vifs on the source host. Those source vifs are
    an exact copy of the vifs on the instance before live migration
    started on the source host, so they are appropriate for unplugging
    on the source during _post_live_migration.

    Change-Id: Ie6439499cc7c6f05eec692dda677caf649dd7c94
    Closes-Bug: #1783917

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/586568
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=64b030ffaf578dee925a6ac2d47264346843ba2d
Submitter: Zuul
Branch: master

commit 64b030ffaf578dee925a6ac2d47264346843ba2d
Author: Matt Riedemann <email address hidden>
Date: Fri Jul 27 09:58:47 2018 -0400

    Pass source vifs to driver.cleanup in _post_live_migration

    This is a follow up to Ie6439499cc7c6f05eec692dda677caf649dd7c94
    where we use the VIFMigrateData.source_vif objects, if available,
    to unplug vifs on the source host in _post_live_migration. That
    change missed that if driver.post_live_migration_at_source()
    fails, the destroy_vifs flag tells driver.cleanup() (if called)
    to unplug vifs. Note that the libvirt driver heeds the destroy_vifs
    flag but some drivers do not, like the hyper-v driver, and
    unconditionally unplug vifs when driver.cleanup() is called.

    This change simply passes the same unplug_nw_info variable to
    driver.cleanup() as gets passed to post_live_migration_at_source()
    and the unit test is updated accordingly.

    Change-Id: I7a7cd5ba708433c29757d393efaa8b5ef8948da1
    Related-Bug: #1783917

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b3

This issue was fixed in the openstack/nova 18.0.0.0b3 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.