Plugging VFs no longer works without a readable phys_switch_id

Bug #1713590 reported by Brent Eagles
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   Medium       Moshe Levi

Bug Description

Attempting to plug a VF fails with the following stack trace in the nova compute logs:

2017-08-28 17:50:34.716 2843 ERROR os_vif [req-9fe05e3e-f7ae-4b2d-be27-90d81fe0b9fd 66e36d5620c24020ac6fa6fb8e580b6c df21f729c47347b299783a4c1f83e774 - default default] Failed to plug vif VIFHostDevice(active=False,address=fa:16:3e:de:b2:7d,dev_address=0000:0b:11.0,dev_type='ethernet',has_traffic_filtering=True,id=b5858ca0-c315-4b2a-b1a9-82a5b508bf2f,network=Network(19c75cc1-a553-4d3d-9a1a-9ad010102e31),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True): PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif Traceback (most recent call last):
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/os_vif/__init__.py", line 77, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif plugin.plug(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 191, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif self._plug_vf_passthrough(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 163, in _plug_vf_passthrough
2017-08-28 17:50:34.716 2843 ERROR os_vif pci_slot, pf_interface=True, switchdev=True)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/linux_net.py", line 373, in get_ifname_by_pci_address
2017-08-28 17:50:34.716 2843 ERROR os_vif raise exception.PciDeviceNotFoundById(id=pci_addr)
2017-08-28 17:50:34.716 2843 ERROR os_vif PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif

It appears that patch https://review.openstack.org/#/c/484051/ altered get_ifname_by_pci_address() to always run a new helper function, _is_switchdev() (the code seems to assume that switchdev is always True). This causes plugging VFs to fail on systems whose drivers do not support a readable phys_switch_id.

I ran the code interactively on the host system to determine the actual exception:

>>> f = open('/sys/class/net/enp11s17/phys_switch_id', 'r')
>>> print f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 95] Operation not supported

From what I can tell, this should also cause plugging to fail on systems that have no phys_switch_id file at all.
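Both failure modes described above (an unreadable phys_switch_id and a missing one) could be tolerated with a guarded read. The following is only a minimal sketch of that idea, not the os-vif code: the helper names read_phys_switch_id and is_switchdev, and the sysfs_net parameter, are assumptions made for illustration.

```python
import errno
import os


def read_phys_switch_id(ifname, sysfs_net="/sys/class/net"):
    """Return the phys_switch_id of ifname, or None if it is unavailable.

    Hypothetical helper for illustration; not the os-vif implementation.
    """
    path = os.path.join(sysfs_net, ifname, "phys_switch_id")
    try:
        with open(path, "r") as f:
            value = f.readline().strip()
    except (IOError, OSError) as exc:
        # ENOENT: the attribute file does not exist at all.
        # EOPNOTSUPP (errno 95 on Linux): the file exists but the driver
        # cannot report a switch ID, as in the interactive session above.
        if exc.errno in (errno.ENOENT, errno.EOPNOTSUPP):
            return None
        raise
    # An empty value is treated the same as an unavailable one.
    return value or None


def is_switchdev(ifname, sysfs_net="/sys/class/net"):
    """Treat an interface as switchdev only if a switch ID can be read."""
    return read_phys_switch_id(ifname, sysfs_net) is not None
```

With a helper like this, a caller could fall back to the non-switchdev path instead of raising PciDeviceNotFoundById when the ID cannot be read.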

sean mooney (sean-k-mooney) wrote :

This is caused by trying to use SR-IOV passthrough on a host that does not support hardware offload of OVS.

The workaround is to list the sriovnicswitch mechanism driver before openvswitch in the ML2 configuration.

E.g. change /etc/neutron/plugins/ml2/ml2_conf.ini from
[ml2]
...
mechanism_drivers = openvswitch,sriovnicswitch

to

[ml2]
...
mechanism_drivers = sriovnicswitch,openvswitch
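The ordering in the workaround can be checked mechanically. This is a rough sketch using Python's standard configparser rather than neutron's own configuration loader; the function name sriov_listed_before_ovs and the SAMPLE text are made up for illustration.

```python
import configparser

# Illustrative configuration text; a real ml2_conf.ini holds more options.
SAMPLE = """
[ml2]
mechanism_drivers = sriovnicswitch,openvswitch
"""


def sriov_listed_before_ovs(ini_text):
    """Return True if sriovnicswitch precedes openvswitch in [ml2]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get("ml2", "mechanism_drivers", fallback="")
    drivers = [name.strip() for name in raw.split(",") if name.strip()]
    # Both drivers must be present for the ordering to be meaningful.
    if "sriovnicswitch" not in drivers or "openvswitch" not in drivers:
        return False
    return drivers.index("sriovnicswitch") < drivers.index("openvswitch")


print(sriov_listed_before_ovs(SAMPLE))
```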

You might also want to make sure that supported_pci_vendor_devs
in the ml2_sriov section does not contain the vendor ID and product ID of
the VF used for OVS offload. This ensures that the sriovnic agent
only manages interfaces that do not require OVS configuration.

If you had a NIC that supported OVS offload and it was enabled, then doing a PCI
passthrough of the device without os-vif plugging the NIC would result in a broken
dataplane, hence the reason for removing them from supported_pci_vendor_devs.

There is still a bug in os-vif here: we should first check that the file exists before trying to use it, so we should still harden the code. Let's keep this open to track that.

Changed in os-vif:
importance: Undecided → Medium
status: New → Triaged
Brent Eagles (beagles) wrote :

AFAIK, supported_pci_vendor_devs is not a valid neutron configuration any longer. When it did exist, not having the device information in there would cause bindings to fail for direct SR-IOV ports as well so would not have been an option either.

Is there a property that you can set on a port so that it *doesn't* get treated as an OVS SR-IOV offload port? In the case where a bind attempt for the OVS offload fails, shouldn't it attempt the bind with the next mechanism driver, or is that not an option because this is happening in the os-vif library and not in a neutron agent?

Brent Eagles (beagles)
affects: os-vif → neutron
Changed in neutron:
assignee: nobody → Moshe Levi (moshele)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/499203
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b184558ab6a61571160346818dcf220d925c5b30
Submitter: Jenkins
Branch: master

commit b184558ab6a61571160346818dcf220d925c5b30
Author: Moshe Levi <email address hidden>
Date: Wed Aug 30 18:35:48 2017 +0300

    ovs mech: bind only if user request switchdev

    In I77650be5f04775a72e2bdf694f93988825a84b72 we added
    vnic_type direct to the ovs mechanism drivers supported
    vnic_types. This cause problems when working with ovs and sriovnicswitch
    mechanism drivers in that order. In this case the ovs will bind
    the direct port instead of the sriovnicswitch.

    This change make ovs mech driver to bind the direct port only
    if user requested --binding-profile '{"capabilities": ["switchdev"]}'
    in the direct port if a user don't request this capability the SR-IOV
    legacy NIC mode is used.

    When enable-sriov-nic-features will be implemented in nova and
    libvirt will expose the switchdev capability then nova will be
    able to select a host which supports SR-IOV nic with switchdev
    mode.

    [1] - https://review.openstack.org/#/c/435954/11/specs/pike/approved/enable-sriov-nic-features.rst
    [2] - https://www.redhat.com/archives/libvir-list/2017-August/msg00583.html

    Closes-Bug: #1713590

    Change-Id: I0b5f062bcbf02381bdf4f694fc039f9bb17a2db5
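
The binding rule described in the commit message above can be sketched roughly as follows. This is not the neutron code: the function names are hypothetical, and the sketch assumes that "normal" ports continue to be bound by the ovs mechanism driver as before.

```python
def requests_switchdev(binding_profile):
    """Return True if the port's binding profile asks for switchdev."""
    capabilities = (binding_profile or {}).get("capabilities", [])
    return "switchdev" in capabilities


def ovs_may_bind(vnic_type, binding_profile):
    """Decide whether the ovs mechanism driver should bind the port.

    Hypothetical helper: direct ports are bound only when the user
    requested the switchdev capability; otherwise the port is left for
    the next mechanism driver (e.g. sriovnicswitch) to bind.
    """
    if vnic_type == "direct":
        return requests_switchdev(binding_profile)
    # Assumption: only "normal" ports are otherwise handled by ovs.
    return vnic_type == "normal"


# A direct port created with
# --binding-profile '{"capabilities": ["switchdev"]}' is bound by ovs.
print(ovs_may_bind("direct", {"capabilities": ["switchdev"]}))
# A direct port without the capability falls through to legacy SR-IOV.
print(ovs_may_bind("direct", {}))
```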

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/504427

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/504427
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f6d956c1f5daa6272145955e52986de12ba86bc4
Submitter: Jenkins
Branch: stable/pike

commit f6d956c1f5daa6272145955e52986de12ba86bc4
Author: Moshe Levi <email address hidden>
Date: Wed Aug 30 18:35:48 2017 +0300

    ovs mech: bind only if user request switchdev

    In I77650be5f04775a72e2bdf694f93988825a84b72 we added
    vnic_type direct to the ovs mechanism drivers supported
    vnic_types. This cause problems when working with ovs and sriovnicswitch
    mechanism drivers in that order. In this case the ovs will bind
    the direct port instead of the sriovnicswitch.

    This change make ovs mech driver to bind the direct port only
    if user requested --binding-profile '{"capabilities": ["switchdev"]}'
    in the direct port if a user don't request this capability the SR-IOV
    legacy NIC mode is used.

    When enable-sriov-nic-features will be implemented in nova and
    libvirt will expose the switchdev capability then nova will be
    able to select a host which supports SR-IOV nic with switchdev
    mode.

    [1] - https://review.openstack.org/#/c/435954/11/specs/pike/approved/enable-sriov-nic-features.rst
    [2] - https://www.redhat.com/archives/libvir-list/2017-August/msg00583.html

    Closes-Bug: #1713590

    Change-Id: I0b5f062bcbf02381bdf4f694fc039f9bb17a2db5
    (cherry picked from commit b184558ab6a61571160346818dcf220d925c5b30)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.1

This issue was fixed in the openstack/neutron 11.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.0.0b1

This issue was fixed in the openstack/neutron 12.0.0.0b1 development milestone.
