wrong check for physical function in pci utils

Bug #1499204 reported by Moshe Levi
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Moshe Levi
Milestone: (none listed)

Bug Description

In pci utils, the is_physical_function function decides whether a device is a PF based on the existence of virtfn* symbolic links. The check is incorrect:
if the PF has not enabled SR-IOV (sriov_numvfs is set to zero), there are no virtfn* links, and nova-compute recognizes it as a VF.

see:
root@r-ufm160:/opt/stack/logs# ls /sys/bus/pci/devices/0000\:03\:00.0/
broken_parity_status d3cold_allowed enable iommu_group modalias pools reset sriov_numvfs uevent
class device infiniband irq msi_bus power resource sriov_totalvfs vendor
commands_cache dma_mask_bits infiniband_cm local_cpulist msi_irqs real_miss resource0 subsystem vpd
config driver infiniband_mad local_cpus net remove resource0_wc subsystem_device
consistent_dma_mask_bits driver_override infiniband_verbs mlx5_num_vfs numa_node rescan sriov subsystem_vendor
root@r-ufm160:/opt/stack/logs# cat /sys/bus/pci/devices/0000\:03\:00.0/sriov_numvfs
0

root@r-ufm160:/opt/stack/logs# echo 4 > /sys/bus/pci/devices/0000\:03\:00.0/sriov_numvfs
root@r-ufm160:/opt/stack/logs# ls /sys/bus/pci/devices/0000\:03\:00.0/
broken_parity_status d3cold_allowed enable iommu_group modalias pools reset sriov_numvfs uevent virtfn3
class device infiniband irq msi_bus power resource sriov_totalvfs vendor vpd
commands_cache dma_mask_bits infiniband_cm local_cpulist msi_irqs real_miss resource0 subsystem virtfn0
config driver infiniband_mad local_cpus net remove resource0_wc subsystem_device virtfn1
consistent_dma_mask_bits driver_override infiniband_verbs mlx5_num_vfs numa_node rescan sriov subsystem_vendor virtfn2
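
The listings above can be reproduced without SR-IOV hardware. The following is a minimal sketch of a virtfn*-based check, run against a temporary directory that stands in for the sysfs device directory. The helper names has_virtfn_links and demo are hypothetical and this is not the nova code; the mock also uses plain directories where sysfs has symbolic links.

```python
import os
import re
import tempfile

# Matches the virtfn0, virtfn1, ... entries seen in the second listing.
_VIRTFN_RE = re.compile(r"virtfn\d+")


def has_virtfn_links(dev_path):
    """Return True if the device directory contains any virtfn<N> entry."""
    if not os.path.isdir(dev_path):
        return False
    return any(_VIRTFN_RE.match(name) for name in os.listdir(dev_path))


def demo():
    """Mimic the PF before and after 'echo 4 > sriov_numvfs'."""
    with tempfile.TemporaryDirectory() as dev_path:
        # PF with SR-IOV capability but no VFs enabled yet.
        with open(os.path.join(dev_path, "sriov_numvfs"), "w") as f:
            f.write("0\n")
        before = has_virtfn_links(dev_path)

        # Stand-ins for the virtfn0..virtfn3 links the kernel creates.
        for i in range(4):
            os.mkdir(os.path.join(dev_path, "virtfn%d" % i))
        after = has_virtfn_links(dev_path)
    return before, after


if __name__ == "__main__":
    print(demo())
```

The same device yields a different answer depending only on whether VFs are currently enabled, which is the misclassification described in this report.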

Moshe Levi (moshele)
tags: added: pci-passthogth
tags: added: passthrough pci
removed: pci-passthogth
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/227160

Changed in nova:
assignee: nobody → Moshe Levi (moshele)
status: New → In Progress
Moshe Levi (moshele) wrote :

I added the n-cpu.log and q-srv.log.
The VM failed with a VIF binding error because nova-compute created the port with pci_vendor 15b3:1013, which is a PF and not a VF:
015-11-08 14:04:11.477 DEBUG neutron.plugins.ml2.drivers.mech_sriov.mech_driver.mech_driver [req-8dc2102e-df03-454a-a0e9-87ad58a7db56 neutron service] Unsupported pci_vendor 15b3:1013 _check_supported_pci_vendor_device /opt/stack/neutron/neutron/plugins/ml2/drivers/mech_sriov/mech_driver/mech_driver.py:171

see:
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.2 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
03:00.3 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
03:00.4 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
03:00.5 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]

03:00.0 0200: 15b3:1013
03:00.1 0200: 15b3:1013
03:00.2 0200: 15b3:1014
03:00.3 0200: 15b3:1014
03:00.4 0200: 15b3:1014
03:00.5 0200: 15b3:1014

Moshe Levi (moshele) wrote :

This is the pci_passthrough_whitelist in nova:
pci_passthrough_whitelist = {"address":"*:03:00.*","physical_network":"physnet1"}
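
To illustrate what this address pattern covers, here is a rough sketch that applies it to the six functions from the lspci output above. It uses Python's fnmatch as a stand-in for the whitelist address matching, which is an assumption and not how nova implements it. The 0000 domain prefix and the helper names are also assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Vendor:product ID reported for the two PFs in the lspci -n output.
PF_ID = "15b3:1013"

# Addresses and IDs from the lspci output; the 0000 domain is assumed.
DEVICES = {
    "0000:03:00.0": "15b3:1013",
    "0000:03:00.1": "15b3:1013",
    "0000:03:00.2": "15b3:1014",
    "0000:03:00.3": "15b3:1014",
    "0000:03:00.4": "15b3:1014",
    "0000:03:00.5": "15b3:1014",
}


def whitelisted(pattern):
    """Addresses matched by a glob-style pattern (fnmatch stand-in)."""
    return sorted(addr for addr in DEVICES if fnmatchcase(addr, pattern))


def whitelisted_pfs(pattern):
    """Matched addresses whose ID is the PF ID."""
    return [addr for addr in whitelisted(pattern) if DEVICES[addr] == PF_ID]


if __name__ == "__main__":
    pattern = "*:03:00.*"
    print(whitelisted(pattern))
    print(whitelisted_pfs(pattern))
```

Under this glob interpretation the pattern matches all six functions, including the two with the PF ID 15b3:1013.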

Baodong (Robert) Li (baoli) wrote :

I think that the whitelist is not defined properly. If the address "*:03:00.*" includes some PFs, then it shouldn't use "*" in the address.

is_physical_function() won't be called for the above whitelist entry since the entry contains "*".

Check https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L115
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4806

According to the whitelist entry, the PF will be covered by the entry, but the PF shouldn't be added by the caller in the above two references.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/227160
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2ba4644f91aa523c2a14e32a168b853cf0b8c4e1
Submitter: Jenkins
Branch: master

commit 2ba4644f91aa523c2a14e32a168b853cf0b8c4e1
Author: Moshe Levi <email address hidden>
Date: Wed Sep 23 02:49:28 2015 +0300

    libvirt: report pci Type-PF type even when VFs are disabled

    libvirt < 1.3 reports virt_functions capability only when pf has
    VFs enabled. This workaround patch updates the is_physical_function
    function to read the sriov_totalvfs if exists and check it is
    greater than 0. The sriov_totalvfs is the number for the
    maximum possible VF for this PF. _get_pcidev_info in libvirt driver
    is updated to get the correct pci device type using this function.

    Closes-Bug: #1499204
    Change-Id: I8990c36fb1d6c66093a465930ff3f0948dd64986
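
The check described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged nova code: the function name looks_like_physical_function and its dev_path parameter are hypothetical, and the demo uses a temporary directory in place of /sys/bus/pci/devices.

```python
import os
import tempfile


def looks_like_physical_function(dev_path):
    """Treat a device as a PF if sriov_totalvfs exists and is > 0."""
    try:
        with open(os.path.join(dev_path, "sriov_totalvfs")) as f:
            return int(f.read().strip()) > 0
    except (OSError, ValueError):
        # File missing or unreadable: not an SR-IOV capable PF.
        return False


def demo():
    """Compare a PF with VFs disabled against a device with no SR-IOV files."""
    with tempfile.TemporaryDirectory() as root:
        pf = os.path.join(root, "pf")
        vf = os.path.join(root, "vf")
        os.mkdir(pf)
        os.mkdir(vf)

        # PF: supports up to 4 VFs, none enabled, so no virtfn* links exist.
        with open(os.path.join(pf, "sriov_totalvfs"), "w") as f:
            f.write("4\n")
        with open(os.path.join(pf, "sriov_numvfs"), "w") as f:
            f.write("0\n")

        return looks_like_physical_function(pf), looks_like_physical_function(vf)


if __name__ == "__main__":
    print(demo())
```

Because sriov_totalvfs is the maximum number of VFs the PF can expose, the result does not change when sriov_numvfs is zero.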

Changed in nova:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b2

This issue was fixed in the openstack/nova 13.0.0.0b2 development milestone.
