Whitelisted PFs aren't being recognized

Bug #1613434 reported by Brent Eagles on 2016-08-15
This bug affects 3 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Ludovic Beliveau
Newton                     Medium       Sahid Orentino

Bug Description

Note: This is with libvirt < 1.3.0, so it may be specific to earlier versions. It has also been verified only on mitaka so far; I haven't had a chance to try newton/master. It may have something to do with the fact that PFs don't have a parent_addr.

Whether the whitelist entry uses devname or address/vendor/product IDs, and even when device_type is specified as type-PF, physical functions are not included in the PCI stats information. If VFs are present, they are included, but the PF itself is not.

I checked this by running code similar to the following in an interactive Python session:

import nova.pci.whitelist
from oslo_serialization import jsonutils

filter = nova.pci.whitelist.Whitelist(['[{"address":"0000:05:00.1", "product_id":"154d", "vendor_id":"8086", "physical_network":"physnet", "device_type":"type-PF"}]'])

# the following was extracted from debug logs on compute node where we are seeing the issue
dev_dict = jsonutils.loads('{"dev_id": "pci_0000_05_00_1", "product_id": "154d", "dev_type": "type-PF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_154d", "address": "0000:05:00.1"}')

print filter.specs[0].address.match(dev_dict['address'], dev_dict.get('parent_addr'))

# returns False
# for laughs
print filter.specs[0].address.match(None, dev_dict.get('address'))
# returns True

Tags: pci
tags: added: pci
Maciej Szankin (mszankin) wrote :

Can you describe this in more detail? Please follow the format proposed for bug reports. It would be great if you could include logs from the node.

Changed in nova:
status: New → Incomplete
Brent Eagles (beagles) wrote :

I'll see about the logs. In the meantime, I'll clarify. I should have indicated that the calls mentioned in the bug report are simulations of what actually gets called in the environment. That is:

print filter.specs[0].address.match(dev_dict['address'], dev_dict.get('parent_addr'))

is the address match part of the PciDeviceSpec::match() method in nova/pci/devspec.py

dev_dict = jsonutils.loads('{"dev_id": "pci_0000_05_00_1", "product_id": "154d", "dev_type": "type-PF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_154d", "address": "0000:05:00.1"}')

is just initializing a device dict from an extract of the resource_tracker debug logs, taken when it queries the hypervisor for the list of devices.

filter = nova.pci.whitelist.Whitelist(['[{"address":"0000:05:00.1", "product_id":"154d", "vendor_id":"8086", "physical_network":"physnet", "device_type":"type-PF"}]'])

is simply taking the whitelist from the configuration file and directly instantiating the whitelist object with it.

All of this simulates, with essentially real data from the environment in question, what happens when ResourceTracker::_update_available_resource() instantiates its pci_tracker member with PciDevTracker. After instantiating it, the resource tracker calls PciDevTracker::update_devices_from_hypervisor_resources() with the complete list of serialized devices obtained by calling libvirt (I included just the relevant one). This in turn calls Whitelist::device_assignable() for each device. Whitelist::device_assignable() goes through its list of entries (I include just one, the one that should match); if there is a match, the PciDevTracker adds the device to its list of devices.

The example calls and data I provided indicate that the PCI address matching doesn't make sense for PFs. In PciAddress::match(), if the second parameter, which represents the parent's physical PCI address, is None and the address is for a physical function (which it is here), then False is returned and the device is skipped.
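
The check described above can be sketched without a nova installation. This is only a simplified stand-in under stated assumptions, not the code in devspec.py: parse_addr, pre_fix_match and assignable are hypothetical names, the PF flag is assumed to belong to the whitelist entry, wildcards are ignored, and the VF entry (address 0000:05:10.1) is made up for illustration. Only the PF record comes from the debug logs quoted earlier.

```python
# Simplified stand-in for the address check described in this comment.
# NOT nova's code: all function names here are hypothetical.

def parse_addr(addr):
    """Split 'dddd:bb:ss.f' into (domain, bus, slot, func) strings."""
    domain, bus, slot_func = addr.split(":")
    slot, func = slot_func.split(".")
    return domain, bus, slot, func


def pre_fix_match(spec_addr, spec_is_pf, pci_addr, parent_addr):
    """Mimic the reported behaviour.

    When the whitelist address is a PF, only the parent address is
    compared, so a device reported without one can never match.
    """
    if spec_is_pf:
        if not parent_addr:
            return False
        return parse_addr(spec_addr) == parse_addr(parent_addr)
    return parse_addr(spec_addr) == parse_addr(pci_addr)


def assignable(spec_addr, spec_is_pf, devices):
    """Return the addresses of the devices that pass the check."""
    return [
        dev["address"]
        for dev in devices
        if pre_fix_match(spec_addr, spec_is_pf,
                         dev["address"], dev.get("parent_addr"))
    ]


devices = [
    # PF as reported in the debug logs: no 'parent_addr' key.
    {"address": "0000:05:00.1", "dev_type": "type-PF"},
    # Hypothetical VF of that PF, added only for illustration.
    {"address": "0000:05:10.1", "dev_type": "type-VF",
     "parent_addr": "0000:05:00.1"},
]

print(assignable("0000:05:00.1", True, devices))  # -> ['0000:05:10.1']
```

Under these assumptions the VF passes because its parent address equals the whitelisted address, while the PF is dropped because it has no parent address to compare against. That matches the symptom in the description: VFs show up in the PCI stats, the PF does not.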

Ultimately, the problem might lie elsewhere, but from an examination of the sample data and the code in question, there does seem to be something wrong with how physical functions are being handled by the code in devspec.py.

Changed in nova:
status: Incomplete → Confirmed
Brent Eagles (beagles) wrote :

Duplicate of https://bugs.launchpad.net/nova/+bug/1618984 - patch(es) in progress against the other BZ

Jay Pipes (jaypipes) on 2016-11-02
Changed in nova:
importance: Undecided → High
Changed in nova:
assignee: nobody → Ludovic Beliveau (ludovic-beliveau)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/363884
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d38d5767d15b24df455b1844dfe53ada2ebf9751
Submitter: Jenkins
Branch: master

commit d38d5767d15b24df455b1844dfe53ada2ebf9751
Author: Ludovic Beliveau <email address hidden>
Date: Wed Aug 31 14:27:43 2016 -0400

    PCI: Fix PCI with fully qualified address

    Specifying a PF passthrough device in the pci_passthrough_whitelist using its
    fully qualified PCI address (no wildcard) causes the device to not be
    properly loaded. The PCI device is then not available to be assigned to any
    guest.

    In this case, the hypervisor reports the PF device without a 'parent_addr'.
    But in the PciAddress, match() is using it when doing the comparison to its
    own address.

    This commit changes the logic of the address matching method in PciDevSpec to
    only try to match the address with a physical function device when a
    'parent_addr' is reported by the hypervisor.

    Change-Id: I5255240871d8ad5c216500f39520339efe46e84b
    Closes-Bug: #1613434
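
A minimal sketch of the logic this commit message describes, not the actual patch: parse_addr and post_fix_match are hypothetical names, wildcards are not handled, and falling back to the device's own address when no parent matches is an assumption inferred from the message rather than taken from the change itself.

```python
# Illustrative sketch only; NOT the code from the commit.

def parse_addr(addr):
    """Split 'dddd:bb:ss.f' into (domain, bus, slot, func) strings."""
    domain, bus, slot_func = addr.split(":")
    slot, func = slot_func.split(".")
    return domain, bus, slot, func


def post_fix_match(spec_addr, spec_is_pf, pci_addr, parent_addr):
    """Compare against the PF only when a parent address was reported."""
    if spec_is_pf and parent_addr:
        if parse_addr(spec_addr) == parse_addr(parent_addr):
            return True
    # Assumed fallback: compare the device's own address.
    return parse_addr(spec_addr) == parse_addr(pci_addr)


# PF from the bug description, reported without 'parent_addr'.
print(post_fix_match("0000:05:00.1", True, "0000:05:00.1", None))  # -> True
```

With this ordering a fully qualified PF address in the whitelist matches the PF itself, and a device that does report that PF as its parent still matches through the first branch.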

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/393752
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=30deffaca4c0307170bf46cea439dea2c11a8ed9
Submitter: Jenkins
Branch: stable/newton

commit 30deffaca4c0307170bf46cea439dea2c11a8ed9
Author: Ludovic Beliveau <email address hidden>
Date: Wed Aug 31 14:27:43 2016 -0400

    PCI: Fix PCI with fully qualified address

    Specifying a PF passthrough device in the pci_passthrough_whitelist using its
    fully qualified PCI address (no wildcard) causes the device to not be
    properly loaded. The PCI device is then not available to be assigned to any
    guest.

    In this case, the hypervisor reports the PF device without a 'parent_addr'.
    But in the PciAddress, match() is using it when doing the comparison to its
    own address.

    This commit changes the logic of the address matching method in PciDevSpec to
    only try to match the address with a physical function device when a
    'parent_addr' is reported by the hypervisor.

    Change-Id: I5255240871d8ad5c216500f39520339efe46e84b
    Closes-Bug: #1613434
    (cherry picked from commit d38d5767d15b24df455b1844dfe53ada2ebf9751)

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

This issue was fixed in the openstack/nova 14.0.3 release.
