Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound' on compute restart

Bug #1809136 reported by Stephen Finucane on 2018-12-19
| Affects | Status | Importance | Assigned to | Milestone |
| OpenStack Compute (nova) | | Medium | Stephen Finucane | |
| Ocata | | Medium | Stephen Finucane | |
| Pike | | Medium | Stephen Finucane | |
| Queens | | Medium | Stephen Finucane | |
| Rocky | | Medium | melanie witt | |

Bug Description

This is a variant of an existing bug:

- https://bugs.launchpad.net/nova/+bug/1738373 tracks a similar exception ('_nova_to_osvif_vif_binding_failed') on compute startup.

There are also two other closely related bugs:

- https://bugs.launchpad.net/nova/+bug/1783917 tracks this same exception ('_nova_to_osvif_vif_unbound') but during live migration
- https://bugs.launchpad.net/nova/+bug/1784579 tracks a similar exception ('_nova_to_osvif_vif_binding_failed') but during live migration

In addition, the following bug is likely the root cause of all of the above issues (and this one) in the first place:

- https://bugs.launchpad.net/nova/+bug/1751923

In this instance, as with bug 1738373, we are unable to start the nova-compute service on a compute node because of an error raised while converting a VIF for os-vif.

nova-compute.log on the compute node shows:

2018-05-12 16:42:47.323 305978 INFO os_vif [req-0a72cdea-843a-4932-b8a0-bc24c2f21d9f - - - - -] Successfully plugged vif VIFBridge(active=True,address=fa:16:3e:41:a9:2c,bridge_name='qbr8d027ff4-23',has_traffic_filtering=True,id=8d027ff4-2328-47df-9f9a-2c1a9914a83b,network=Network(9a98b244-b1d2-46b3-ab0e-be8456e3a984),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=False,vif_name='tap8d027ff4-23')
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service [req-0a72cdea-843a-4932-b8a0-bc24c2f21d9f - - - - -] Error starting thread.
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service Traceback (most recent call last):
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 708, in run_service
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service service.start()
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/service.py", line 117, in start
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service self.manager.init_host()
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1154, in init_host
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service self._init_instance(context, instance)
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 957, in _init_instance
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service self.driver.plug_vifs(instance, net_info)
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 703, in plug_vifs
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service self.vif_driver.plug(instance, vif)
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 771, in plug
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service vif_obj = os_vif_util.nova_to_osvif_vif(vif)
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/network/os_vif_util.py", line 408, in nova_to_osvif_vif
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service {'type': vif['type'], 'func': funcname})
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'
2018-05-12 16:42:47.369 305978 ERROR oslo_service.service

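Inspecting the ports in neutron shows that the port does exist and is bound (binding:vif_type is 'ovs' and status is ACTIVE), so the 'unbound' VIF type seen by nova-compute looks like a caching issue.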

[stack@director:~]$ neutron port-list | grep fa:16:3e:41:a9:2c
| 8d027ff4-2328-47df-9f9a-2c1a9914a83b | | fa:16:3e:41:a9:2c | {"subnet_id": "1f5ed9bc-aa7d-49bd-ac48-23b430fc0eb4", "ip_address": "172.19.9.17"} |
[stack@director:~]$ neutron port-show 8d027ff4-2328-47df-9f9a-2c1a9914a83b
+-----------------------+------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | overcloud-compute-7.localdomain |
| binding:profile | {} |
| binding:vif_details | {"port_filter": true, "ovs_hybrid_plug": true} |
| binding:vif_type | ovs |
| binding:vnic_type | normal |
| created_at | 2017-10-31T12:31:45Z |
| description | |
| device_id | b4ef4d0b-9e39-4741-a2dd-7fd7c066d13b |
| device_owner | compute:nova |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "1f5ed9bc-aa7d-49bd-ac48-23b430fc0eb4", "ip_address": "172.19.9.17"} |
| id | 8d027ff4-2328-47df-9f9a-2c1a9914a83b |
| mac_address | fa:16:3e:41:a9:2c |
| name | |
| network_id | 9a98b244-b1d2-46b3-ab0e-be8456e3a984 |
| port_security_enabled | True |
| project_id | 3b2049626c954cdc9147beee2d34b441 |
| qos_policy_id | |
| revision_number | 184 |
| security_groups | 97aa0764-c0b5-47d1-88b2-285673d46a31 |
| | c7addc13-5a77-4322-953a-9d89d42468e6 |
| | cecdad42-7c78-45e7-9ec2-fef1086dbb7e |
| | de0a6da8-c44e-475f-90fd-1fb625840c52 |
| status | ACTIVE |
| tenant_id | 3b2049626c954cdc9147beee2d34b441 |
| updated_at | 2018-05-12T15:37:46Z |
+-----------------------+------------------------------------------------------------------------------------+

We should figure out why the invalid cache is getting saved, but we're going to track that effort separately. For now, we should focus on letting the service start and putting instances that hit errors like this into the ERROR state.
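
As a rough illustration of that short-term goal, the sketch below shows a startup loop that tolerates a per-instance plug failure instead of aborting. It is not nova's code: `VifPlugError`, `init_instances` and `fake_plug_vifs` are hypothetical names, and instances are modelled as plain dicts.

```python
class VifPlugError(Exception):
    """Hypothetical stand-in for a per-instance VIF plug failure."""


def init_instances(instances, plug_vifs):
    """Initialise every instance; a plug failure affects only that instance."""
    for instance in instances:
        try:
            plug_vifs(instance)
        except VifPlugError:
            # Record the failure on the instance and carry on with the rest,
            # so one bad network cache entry cannot stop the whole service.
            instance["vm_state"] = "error"
    return instances


def fake_plug_vifs(instance):
    """Toy plug function: rejects any VIF whose cached type is 'unbound'."""
    for vif in instance["vifs"]:
        if vif["type"] == "unbound":
            raise VifPlugError(vif["id"])


instances = [
    {"uuid": "a", "vm_state": "active", "vifs": [{"id": "p1", "type": "ovs"}]},
    {"uuid": "b", "vm_state": "active", "vifs": [{"id": "p2", "type": "unbound"}]},
]
init_instances(instances, fake_plug_vifs)
```

The point of the design is that the exception raised for a bad VIF has to be one the per-instance handler already catches; an unrelated exception type would still propagate and stop startup, which is what the traceback above shows.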

This was originally reported at https://bugzilla.redhat.com/show_bug.cgi?id=1578028

Fix proposed to branch: master
Review: https://review.openstack.org/626228

Changed in nova:
assignee: nobody → Stephen Finucane (stephenfinucane)
status: New → In Progress

Reviewed: https://review.openstack.org/626228
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1def76a1c49032d93ab6c7ee61dbbfe8e29cafca
Submitter: Zuul
Branch: master

commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this is
    being cached and stop that happening but for now we simply need to allow
    the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound which
    will make the plug_vifs code raise VirtualInterfacePlugException which
    is what the _init_instance code in ComputeManager is already handling.

    This has the same caveats as that change, namely that there may be
    smarter ways to do this that we should explore. However, that change
    also included a note which goes some way toward explaining this.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
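
To make the approach in the commit message more concrete, here is a minimal, self-contained sketch of name-based dispatch with a dedicated handler for 'unbound'. The traceback in the bug description suggests the converter is looked up by a name built from the VIF type. Everything else here is an assumption made for illustration and not the actual nova implementation: the `_convert_*` functions, both exception classes, and the use of NotImplementedError plus a None return to signal "cannot convert".

```python
class UnsupportedVifType(Exception):
    """Hypothetical stand-in for the generic error seen in the traceback."""


class PlugFailed(Exception):
    """Hypothetical stand-in for a plug error that startup already handles."""


def _convert_ovs(vif):
    # Toy converter for a supported type.
    return {"plugin": "ovs", "id": vif["id"]}


def _convert_unbound(vif):
    # Dedicated handler for 'unbound': it exists so that the name lookup
    # succeeds, but it signals that no conversion is possible (assumption).
    raise NotImplementedError()


def convert(vif):
    """Look up a converter by name, derived from the VIF type."""
    funcname = "_convert_" + vif["type"]
    func = globals().get(funcname)
    if func is None:
        # No handler at all: this is the unhandled path from the bug report.
        raise UnsupportedVifType(
            "Unsupported VIF type %s convert '%s'" % (vif["type"], funcname))
    try:
        return func(vif)
    except NotImplementedError:
        return None


def plug(vif):
    """Plug a VIF, turning 'cannot convert' into a handled plug failure."""
    vif_obj = convert(vif)
    if vif_obj is None:
        raise PlugFailed(
            "cannot plug VIF %s of type %s" % (vif["id"], vif["type"]))
    return vif_obj
```

With a handler registered for 'unbound', `plug({"id": "p2", "type": "unbound"})` raises `PlugFailed` rather than `UnsupportedVifType`, so a caller that already catches plug failures can mark the instance as errored and continue.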

Changed in nova:
status: In Progress → Fix Released
melanie witt (melwitt) on 2018-12-19
Changed in nova:
importance: Undecided → Medium

Reviewed: https://review.openstack.org/626410
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bc0a5d0355311641daa87b46e311ae101f1817ad
Submitter: Zuul
Branch: stable/rocky

commit bc0a5d0355311641daa87b46e311ae101f1817ad
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)

Reviewed: https://review.openstack.org/626550
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=79a90d37027b7ca131218e16eaee70d6d5152206
Submitter: Zuul
Branch: stable/queens

commit 79a90d37027b7ca131218e16eaee70d6d5152206
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)

Reviewed: https://review.openstack.org/626554
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7b4f5725f821ef89176ef69f036471eaaf8a6201
Submitter: Zuul
Branch: stable/pike

commit 7b4f5725f821ef89176ef69f036471eaaf8a6201
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)
    (cherry picked from commit 79a90d37027b7ca131218e16eaee70d6d5152206)

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

This issue was fixed in the openstack/nova 17.0.10 release.

This issue was fixed in the openstack/nova 18.2.0 release.

Reviewed: https://review.openstack.org/626556
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e61b1d7d72470a95068470d67779e08ececdb2e5
Submitter: Zuul
Branch: stable/ocata

commit e61b1d7d72470a95068470d67779e08ececdb2e5
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    Conflicts:
     nova/compute/manager.py
     nova/tests/unit/network/test_os_vif_util.py

    NOTE(sfinucan): As with the 'stable/ocata' backport of change
    Ia963a093a1b26d90b4de2e8fc623031cf175aece, the compute manager conflicts
    are due to change I2740ea14e0c4ecee0d91c7f3e401b2c29498d097 in Queens.
    The _LE() marker has to be left intact for pep8 checks in Ocata. The
    test_os_vif_util conflicts are due to not having change
    Ic23effc05c901575f608f2b4c5ccd2b1fb3c2d5a nor change
    I3f38954bc5cf7b1690182dc8af45078eea275aa4 in Ocata

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)
    (cherry picked from commit 79a90d37027b7ca131218e16eaee70d6d5152206)
    (cherry picked from commit 7b4f5725f821ef89176ef69f036471eaaf8a6201)

This issue was fixed in the openstack/nova 16.1.8 release.
