Brief Description
-----------------
A host cannot be enabled as a nova hypervisor if it has QAT devices (C62x or DH895XCC) configured, because nova-compute fails with a "PCI device not found" error.
Severity
--------
Major
Steps to Reproduce
------------------
- Install and configure a system where the worker nodes have QAT devices configured, e.g.:
[wrsroot@controller-0 ~(keystone_admin)]$ system host-device-list compute-0
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name             | address      | class id | vendor id | device id | class name                | vendor name                     | device name                            | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_09_00_0 | 0000:09:00.0 | 0b4000   | 8086      | 0435      | Co-processor              | Intel Corporation               | DH895XCC Series QAT                    | 0         | True    |
| pci_0000_0c_00_0 | 0000:0c:00.0 | 030000   | 102b      | 0522      | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0         | True    |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
compute-0:~$ lspci | grep QAT
09:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
09:01.0 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
09:01.1 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
...
- Run nova hypervisor-list and check whether the worker nodes are listed as hypervisors.
Expected Behavior
------------------
- The worker nodes are listed as nova hypervisors.
Actual Behavior
----------------
[wrsroot@controller-0 ~(keystone_admin)]$ nova hypervisor-list
+----+---------------------+-------+--------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------+-------+--------+
+----+---------------------+-------+--------+
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Any system type with QAT devices configured on a worker node
Branch/Pull Time/Commit
-----------------------
master as of 2019-03-18
Last Pass
--------------
On the f/stein branch in early February
Timestamp/Logs
--------------
# The nova-compute pods keep logging the errors below, so they cannot register themselves as hypervisors:
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager [req-4f652d4c-da7e-4516-9baa-915265c3fdda - - - - -] Error updating resources for node compute-0.: PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager Traceback (most recent call last):
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 7956, in _update_available_resource_for_node
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager startup=startup)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 727, in update_available_resource
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7098, in get_available_resource
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager self._get_pci_passthrough_devices()
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6102, in _get_pci_passthrough_devices
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_info.append(self._get_pcidev_info(name))
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6062, in _get_pcidev_info
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager device.update(_get_device_type(cfgdev, address))
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6021, in _get_device_type
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_address, pf_interface=True),
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=pci_addr)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager
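
The traceback ends in get_ifname_by_pci_address, called with pf_interface=True for device 0000:09:02.3 (presumably one of the QAT virtual functions). One plausible reading, which this report does not confirm, is that the device is treated like a network VF and its parent PF is expected to expose a network interface in sysfs, which a co-processor does not have. The following is a minimal Python sketch of that lookup pattern, not nova's actual code: the function names, the sysfs_root parameter, the local exception class and the assumed sysfs layout (devices/<addr>/physfn/net) are all illustrative assumptions.

```python
import os


class PciDeviceNotFoundById(Exception):
    """Local stand-in for the nova exception seen in the log (hypothetical copy)."""

    def __init__(self, pci_addr):
        super().__init__("PCI device %s not found" % pci_addr)
        self.pci_addr = pci_addr


def netdev_dir(pci_addr, pf_interface=False, sysfs_root="/sys"):
    """Return the sysfs directory expected to hold the interface name.

    Assumed layout: <sysfs_root>/bus/pci/devices/<addr>/net for the device
    itself, or .../<addr>/physfn/net for the parent PF of a VF.
    """
    base = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr)
    if pf_interface:
        return os.path.join(base, "physfn", "net")
    return os.path.join(base, "net")


def ifname_by_pci_address(pci_addr, pf_interface=False, sysfs_root="/sys"):
    """Look up a network interface name for a PCI address.

    Raises PciDeviceNotFoundById when no net directory (or no entry in it)
    exists, which is what happens for a device that is not a NIC.
    """
    path = netdev_dir(pci_addr, pf_interface, sysfs_root)
    try:
        names = os.listdir(path)
    except OSError:
        raise PciDeviceNotFoundById(pci_addr)
    if not names:
        raise PciDeviceNotFoundById(pci_addr)
    return names[0]
```

Under these assumptions, a NIC VF whose PF has a net/ directory resolves to an interface name, while a QAT VF raises the same "PCI device ... not found" error as in the log, even though the device is present in lspci. On an affected worker node, checking whether /sys/bus/pci/devices/0000:09:02.3/physfn/net exists would help confirm or rule out this reading.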