[sriov] Modifying or removing pci_passthrough_whitelist may result in inconsistent VF availability
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Invalid | Undecided | Unassigned | | |
Bug Description
OpenStack Version: v14 (Newton)
NIC: Mellanox ConnectX-3 Pro
While testing an SR-IOV implementation, we found that pci_passthrough
In the following table, you can see the original implementation that includes an allocated VF. During testing, we commented out the pci_passthrough
MariaDB [nova]> select * from pci_devices;
+------
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr |
+------
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 | 72 | 72 | 6 | 0000:07:00.0 | 1007 | 15b3 | type-PF | pci_0000_07_00_0 | label_15b3_1007 | unavailable | {} | NULL | NULL | 0 | NULL |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:43:23 | 75 | 75 | 6 | 0000:07:00.1 | 1004 | 15b3 | type-VF | pci_0000_07_00_1 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 | 78 | 78 | 6 | 0000:07:00.2 | 1004 | 15b3 | type-VF | pci_0000_07_00_2 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:44:25 | 81 | 81 | 6 | 0000:07:00.3 | 1004 | 15b3 | type-VF | pci_0000_07_00_3 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 | 84 | 84 | 6 | 0000:07:00.4 | 1004 | 15b3 | type-VF | pci_0000_07_00_4 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:43:23 | 87 | 87 | 6 | 0000:07:00.5 | 1004 | 15b3 | type-VF | pci_0000_07_00_5 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 | 90 | 90 | 6 | 0000:07:00.6 | 1004 | 15b3 | type-VF | pci_0000_07_00_6 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:44:51 | 93 | 93 | 6 | 0000:07:00.7 | 1004 | 15b3 | type-VF | pci_0000_07_00_7 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2016-12-29 15:23:36 | 2016-12-29 17:40:25 | 2016-12-29 20:42:26 | 96 | 96 | 6 | 0000:07:01.0 | 1004 | 15b3 | type-VF | pci_0000_07_01_0 | label_15b3_1004 | allocated | {} | 178c733b-
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 231 | 6 | 0000:07:00.0 | 1007 | 15b3 | type-PF | pci_0000_07_00_0 | label_15b3_1007 | available | {} | NULL | NULL | 0 | NULL |
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 234 | 6 | 0000:07:00.1 | 1004 | 15b3 | type-VF | pci_0000_07_00_1 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 237 | 6 | 0000:07:00.2 | 1004 | 15b3 | type-VF | pci_0000_07_00_2 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 240 | 6 | 0000:07:00.3 | 1004 | 15b3 | type-VF | pci_0000_07_00_3 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 243 | 6 | 0000:07:00.4 | 1004 | 15b3 | type-VF | pci_0000_07_00_4 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 246 | 6 | 0000:07:00.5 | 1004 | 15b3 | type-VF | pci_0000_07_00_5 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2017-01-03 22:23:37 | NULL | NULL | 0 | 249 | 6 | 0000:07:00.6 | 1004 | 15b3 | type-VF | pci_0000_07_00_6 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
| 2017-01-03 22:23:38 | NULL | NULL | 0 | 252 | 6 | 0000:07:01.0 | 1004 | 15b3 | type-VF | pci_0000_07_01_0 | label_15b3_1004 | available | {} | NULL | NULL | 0 | 0000:07:00.0 |
+------
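The dump above mixes old and new rows, which makes the mismatch easy to miss. The following is a minimal sketch, not nova code, that reduces each row to (id, deleted, address, status) and reports addresses where an old row was 'allocated' but the current row is 'available'. It assumes that a non-zero `deleted` value marks a soft-deleted row, which matches the dump (old rows have `deleted` equal to `id`, new rows have 0). The helper name `stale_allocations` is hypothetical.

```python
# Rows copied from the pci_devices dump above as (id, deleted, address, status).
# Assumption: deleted != 0 means the row has been soft-deleted.
ROWS = [
    (72, 72, "0000:07:00.0", "unavailable"),
    (75, 75, "0000:07:00.1", "available"),
    (78, 78, "0000:07:00.2", "available"),
    (81, 81, "0000:07:00.3", "available"),
    (84, 84, "0000:07:00.4", "available"),
    (87, 87, "0000:07:00.5", "available"),
    (90, 90, "0000:07:00.6", "available"),
    (93, 93, "0000:07:00.7", "available"),
    (96, 96, "0000:07:01.0", "allocated"),
    (231, 0, "0000:07:00.0", "available"),
    (234, 0, "0000:07:00.1", "available"),
    (237, 0, "0000:07:00.2", "available"),
    (240, 0, "0000:07:00.3", "available"),
    (243, 0, "0000:07:00.4", "available"),
    (246, 0, "0000:07:00.5", "available"),
    (249, 0, "0000:07:00.6", "available"),
    (252, 0, "0000:07:01.0", "available"),
]


def stale_allocations(rows):
    """Return addresses whose soft-deleted row was 'allocated'
    while the live (deleted == 0) row for the same address says 'available'."""
    live = {addr: status for (_id, deleted, addr, status) in rows if deleted == 0}
    stale = []
    for _id, deleted, addr, status in rows:
        if deleted != 0 and status == "allocated" and live.get(addr) == "available":
            stale.append(addr)
    return stale


if __name__ == "__main__":
    print(stale_allocations(ROWS))
```

With the rows shown, the only address reported is 0000:07:01.0: the allocation exists only on the soft-deleted row (id 96), while the live row (id 252) advertises the same device as available.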
A new instance was then booted using a new 'direct' port. The instance ended up in the ERROR state with the following error:
2017-01-03 16:10:10.513 12103 ERROR nova.compute.
Instance instance-0000007e corresponds to the instance UUID in the DB, 178c733b-
root@compute01:# ip link show ens1d1
22: ens1d1: <BROADCAST,
link/ether e4:1d:2d:5b:62:13 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
vf 7 MAC fa:16:3e:27:bd:90, vlan 50, spoof checking on, link-state enable
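To pick out which VFs the host still considers configured, the `ip link show` output above can be filtered for VFs with a non-zero MAC. This is a rough sketch only: the regular expression matches the line format shown in this report, and other iproute2 versions may print VF lines differently. The names `VF_RE` and `configured_vfs` are made up for illustration.

```python
import re

# VF lines copied from the `ip link show ens1d1` output above.
IP_LINK_VF_LINES = """\
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC fa:16:3e:27:bd:90, vlan 50, spoof checking on, link-state enable
"""

# Assumption: each VF line starts with "vf <n> MAC <mac>, vlan <id>" as shown above.
VF_RE = re.compile(r"vf (\d+) MAC ([0-9a-f:]{17}), vlan (\d+)")

UNSET_MAC = "00:00:00:00:00:00"


def configured_vfs(text):
    """Return (vf_index, mac, vlan) for every VF whose MAC is not all zeros."""
    found = []
    for match in VF_RE.finditer(text):
        index, mac, vlan = int(match.group(1)), match.group(2), int(match.group(3))
        if mac != UNSET_MAC:
            found.append((index, mac, vlan))
    return found


if __name__ == "__main__":
    for index, mac, vlan in configured_vfs(IP_LINK_VF_LINES):
        print(f"vf {index} still configured: MAC {mac}, vlan {vlan}")
```

For the output in this report, only vf 7 is returned, with MAC fa:16:3e:27:bd:90 and vlan 50, so the host still carries the earlier instance's VF configuration even though the live pci_devices rows list every VF as available.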
No attempt was made to provision a different VF, or to re-populate the entries in pci_devices based on the existing VF allocation on the host. I'm not sure what the expected behavior is in this circumstance, if any.
A similar bug was reported at: https:/
Please let me know if you need any additional info.
Changed in nova:
assignee: Andrey Volkov (avolkov) → Maciej Kucia (maciejkucia)
Changed in nova:
assignee: Maciej Kucia (maciejkucia) → nobody
Changed in nova:
status: In Progress → Triaged
status: Triaged → Invalid
Will need Moshe Levi (moshele in IRC) to take a look at this.