Unable to create vm with GPU/Crypto passthrough devices

Bug #1824831 reported by Yang Liu
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: zhipeng liu

Bug Description

Brief Description
-----------------
Launching a VM with a GPU or Crypto PF fails because of an invalid class_id field in the PCI alias in the nova conf file.

Severity
--------
Major

Steps to Reproduce
------------------
Ensure the computes have GPU or Crypto QAT PCI devices; these devices can be found via system host-device-list $hostname
1. Check system helm-charts-show nova openstack to ensure PCI aliases for the supported GPU and crypto devices are added
2. Check /etc/nova/nova.conf in the nova-compute pods
3. Create a flavor with a crypto or GPU alias via the pci_passthrough:alias extra spec
4. Attempt to launch a VM with the above flavor

Expected Behavior
------------------
1. class_id should not be included in passthrough_whitelist for helm overrides and nova.conf
2. Based on the upstream documentation, I'd expect nova.conf to contain something like this:
[pci]
alias = {"vendor_id":"8086", "product_id":"0435", "device_type":"type-PF", "name":"qat-dh895xcc-pf" }
...
passthrough_whitelist = { "vendor_id": "8086", "product_id": "0435" }
https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
4. VM is launched successfully

Actual Behavior
----------------
1. PCI passthrough alias/whitelist entries were already added, with a class_id field:
| | nova:
...

| | pci:
| | alias:
| | type: multistring
| | values:
| | - '{"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"}'
| | - '{"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"}'
| | - '{"vendor_id": "8086", "product_id": "37c8", "name": "qat-c62x-pf"}'
| | - '{"vendor_id": "8086", "product_id": "37c9", "name": "qat-c62x-vf"}'
| | - '{"class_id": "030000", "name": "gpu"}'
...

| | overrides:
| | nova_compute:
| | hosts:
| | - conf:
| | nova:
...

| | pci:
| | passthrough_whitelist:
| | type: multistring
| | values:
| | - '{"class_id": "0b4000", "address": "0000:08:00.0"}'
| | - '{"class_id": "030000", "address": "0000:0c:00.0"}'
| | vnc:
| | vncserver_listen: 0.0.0.0
| | vncserver_proxyclient_address: 192.168.206.169
| | name: compute-1

2. /etc/nova/nova.conf in a nova-compute container:
[pci]
alias = {"vendor_id": "8086", "product_id":"0435", "device_type": "type-PF", "name": "qat-dh895xcc-pf"}
alias = {"vendor_id": "8086", "product_id":"0443", "device_type": "type-VF", "name": "qat-dh895xcc-vf"}
alias = {"vendor_id": "8086", "product_id":"0522", "device_type": "type-PF", "name": "gpu"}
passthrough_whitelist = {"class_id": "0b4000", "address": "0000:08:00.0"}
passthrough_whitelist = {"class_id": "030000", "address": "0000:0c:00.0"}

4. I tried to launch the VM from Horizon, which returned the following error message:
Invalid PCI alias definition: Additional properties are not allowed (u'class_id' was unexpected) (HTTP 400) (Request-ID: req-47595675-2f86-49af-991a-6dae911b387a)
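
The rejection above can be reproduced offline with a small check. This is a minimal sketch, not nova's own validation code: the set of allowed keys is an assumption built only from the fields that appear in the valid alias examples in this report, and the function name is hypothetical. Check the alias schema of your nova release for the authoritative list.

```python
import json

# Assumption: only the keys seen in the valid alias examples of this report.
# The real schema in nova may accept additional keys.
ALLOWED_ALIAS_KEYS = {"name", "vendor_id", "product_id", "device_type"}


def unexpected_alias_keys(alias_json):
    """Return the sorted list of keys in a PCI alias that are not allowed."""
    alias = json.loads(alias_json)
    return sorted(set(alias) - ALLOWED_ALIAS_KEYS)


# The gpu alias from the helm overrides carries class_id and is flagged.
bad = unexpected_alias_keys('{"class_id": "030000", "name": "gpu"}')

# An alias in the form shown in the upstream documentation passes.
good = unexpected_alias_keys(
    '{"vendor_id": "8086", "product_id": "0435", '
    '"device_type": "type-PF", "name": "qat-dh895xcc-pf"}'
)

print(bad, good)
```

Running such a check against every alias value in the helm overrides before applying them would catch the class_id entry without having to launch a VM.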

Reproducibility
---------------
Reproducible

System Configurations
--------------------
Any system with Crypto or GPU passthrough devices
Lab-name: wcp15-22

Branch/Pull Time/Commit
-----------------------
stx master as of "20190410T013000Z"

Last Pass
---------
non-containerized load

Timestamp/Logs
--------------
VM launch attempt was around Mon Apr 15 14:51:36 UTC 2019

Error message:
Invalid PCI alias definition: Additional properties are not allowed (u'class_id' was unexpected) (HTTP 400) (Request-ID: req-47595675-2f86-49af-991a-6dae911b387a)

Test Activity
-------------
Regression Testing

Revision history for this message
Yang Liu (yliu12) wrote :
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; functionality broken as part of the move to containerized openstack.

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.2.0 stx.containers
Changed in starlingx:
status: New → Triaged
assignee: nobody → Cindy Xie (xxie1)
tags: added: stx.retestneeded
Cindy Xie (xxie1)
Changed in starlingx:
assignee: Cindy Xie (xxie1) → zhipeng liu (zhipengs)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
Don Penney (dpenney) wrote :

Review posted for proposed fix:
https://review.opendev.org/657535

Revision history for this message
zhipeng liu (zhipengs) wrote :

Hi Yang,

Could you help retest with my patch? Thanks!

Zhipeng

Revision history for this message
chendongqi (chen-dq) wrote :

Able to create a VM with QAT passthrough devices.

ISO:
base:6.13 code
patch:
https://review.opendev.org/657535
starlingx/config / sysinv/sysinv/sysinv/sysinv/helm/nova.py
starlingx/config / sysinv/sysinv/sysinv/sysinv/puppet/nova.py

helm-charts:
http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20190530T152953Z/outputs/helm-charts/
stx-openstack-1.0-13-centos-stable-latest.tgz

image:cirros

Device:QAT passthrough device
As we don't have GPU/Crypto passthrough devices, we tested with a QAT passthrough device instead.

result:
Able to create a VM with QAT passthrough devices.

Revision history for this message
yong hu (yhu6) wrote :

Going to push for more +2 reviews from core reviewers in order to merge the patch: https://review.opendev.org/#/c/657535

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/657535
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=05b7f6d4450ce1afc8e3f8b2d41f24ea647176a7
Submitter: Zuul
Branch: master

commit 05b7f6d4450ce1afc8e3f8b2d41f24ea647176a7
Author: zhipengl <email address hidden>
Date: Tue May 7 23:52:00 2019 +0800

    Fix unable to create vm with GPU/Crypto passthrough devices

    class_id should not be included passthrough_whitelist for helm
    overrides and nova.conf

    Verified both QAT and GPU passthrough VM created successfully

    Closes-Bug: #1824831
    Change-Id: Ie045e4dfb3ffde58dedfc99311a1073fb3b8dee3
    Signed-off-by: zhipengl <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Yang Liu (yliu12) wrote :

Retest failed with stx2.0 0813 load.

Looking at the nova.conf file, passthrough_whitelist looks fine, but the alias seems to be missing device_type.

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl exec -n openstack nova-compute-compute-1-532206f8-kkqwn -it -- grep alias /etc/nova/nova.conf
alias = {"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"}
alias = {"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"}
alias = {"vendor_id": "8086", "product_id": "37c8", "name": "qat-c62x-pf"}
alias = {"vendor_id": "8086", "product_id": "37c9", "name": "qat-c62x-vf"}
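
The alias lines printed above can also be inspected programmatically. The following is a minimal sketch under the assumption that each option appears on its own line as `key = <JSON object>`, as in the output above; the function name `parse_pci_lines` is hypothetical. It lists which aliases omit device_type.

```python
import json


def parse_pci_lines(text):
    """Collect repeated alias / passthrough_whitelist options as dicts."""
    entries = {"alias": [], "passthrough_whitelist": []}
    for raw in text.splitlines():
        key, sep, value = raw.partition("=")
        key = key.strip()
        # Only keep the two multi-valued PCI options; skip everything else.
        if sep and key in entries:
            entries[key].append(json.loads(value))
    return entries


GREP_OUTPUT = """\
alias = {"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"}
alias = {"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"}
alias = {"vendor_id": "8086", "product_id": "37c8", "name": "qat-c62x-pf"}
alias = {"vendor_id": "8086", "product_id": "37c9", "name": "qat-c62x-vf"}
"""

entries = parse_pci_lines(GREP_OUTPUT)
missing_device_type = [
    a["name"] for a in entries["alias"] if "device_type" not in a
]
print(missing_device_type)
```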

Time stamp:
[2019-08-14 20:51:14,903] 301 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server show 5668f02e-086b-49d2-8bad-e4a191cee0fe'

| created | 2019-08-14T20:51:09Z ...

Revision history for this message
Yang Liu (yliu12) wrote :
Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
zhipeng liu (zhipengs) wrote :

Hi Yang,
device_type is not mandatory; our tests pass without it.
My patch only removes class_id.
In your latest log, did you test only QAT? I do not see gpu in the alias list.

Thanks!
Zhipeng

Revision history for this message
Yang Liu (yliu12) wrote :

I revisited the automated testcase based on Zhipeng's comments. It failed in the retest because it was trying to launch a VM with an SR-IOV vif as well as a QAT VF, and the system I used does not have an Ethernet device with SR-IOV.
After removing the SR-IOV vif requirement from the test, the VM with the crypto QAT VF was successfully launched.

# Within nova instance:
localhost:~# lspci -nn | grep --color=never QAT
00:05.0 Co-processor [0b40]: Intel Corporation DH895XCC Series QAT Virtual Function [8086:0443]
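
A check like the one above can be automated by parsing the `lspci -nn` line. This is a minimal sketch; the regular expression assumes the `slot class [class]: device [vendor:product]` layout shown in the output above, and the helper name is hypothetical.

```python
import re

# Matches one line of `lspci -nn` output in the layout shown above.
LSPCI_NN = re.compile(
    r"^(?P<slot>\S+) (?P<class_name>.+?) \[(?P<class_id>[0-9a-f]{4})\]: "
    r"(?P<device_name>.+) \[(?P<vendor_id>[0-9a-f]{4}):(?P<product_id>[0-9a-f]{4})\]"
)


def parse_lspci_nn(line):
    """Return the fields of one `lspci -nn` line, or None if it does not match."""
    match = LSPCI_NN.match(line.strip())
    return match.groupdict() if match else None


line = (
    "00:05.0 Co-processor [0b40]: Intel Corporation DH895XCC Series "
    "QAT Virtual Function [8086:0443]"
)
info = parse_lspci_nn(line)
print(info["vendor_id"], info["product_id"])
```

Comparing the parsed vendor and product ids with the alias requested in the flavor shows whether the guest received the intended device.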

I will find another system to verify GPU passthrough.

Changed in starlingx:
status: Confirmed → Fix Released
Revision history for this message
Yang Liu (yliu12) wrote :

The GPU alias is missing "vendor_id" and "product_id", so the nova instance ended up with an arbitrary passthrough device. In my case, on a compute host with both Crypto and GPU devices, the Crypto VF got used even though the nova flavor specifies "pci_passthrough:alias"="gpu:1".

My expectation is that the whole GPU PF should have been passed to the nova instance.

# Here are the devices on the compute:
[sysadmin@controller-0 ~(keystone_admin)]$ system host-device-list compute-1
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_0b_00_0 | 0000:0b:00.0 | 0b4000 | 8086 | 0435 | Co-processor | Intel Corporation | DH895XCC Series QAT | 0 | True |
| pci_0000_0f_00_0 | 0000:0f:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
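
The vendor and device ids needed for a GPU alias can be pulled out of a table like the one above. This is a minimal sketch that assumes the pipe-delimited layout shown here; `parse_cli_table` is a hypothetical helper, not part of the system CLI.

```python
def parse_cli_table(text):
    """Turn a pipe-delimited CLI table into a list of row dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


TABLE = """\
+------------------+--------------+----------+-----------+-----------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+
| pci_0000_0b_00_0 | 0000:0b:00.0 | 0b4000 | 8086 | 0435 | Co-processor | Intel Corporation | DH895XCC Series QAT | 0 | True |
| pci_0000_0f_00_0 | 0000:0f:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
+------------------+--------------+----------+-----------+-----------+
"""

devices = parse_cli_table(TABLE)
# Class id 030000 is the VGA compatible controller in the table above.
gpus = [d for d in devices if d["class id"] == "030000"]
print(gpus[0]["vendor id"], gpus[0]["device id"])
```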

# flavor extra specs:
| extra_specs | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "2048", "pci_passthrough:alias": "gpu:1"} |

# nova conf for passthrough devices:
controller-0:~$ kubectl exec -n openstack nova-compute-compute-1-532206f8-lvn77 -it -- grep -E "whitelist|alias" /etc/nova/nova.conf
alias = {"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"}
alias = {"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"}
alias = {"vendor_id": "8086", "product_id": "37c8", "name": "qat-c62x-pf"}
alias = {"vendor_id": "8086", "product_id": "37c9", "name": "qat-c62x-vf"}
alias = {"name": "gpu"}
passthrough_whitelist = {"address": "0000:0b:00.0"}
passthrough_whitelist = {"address": "0000:0f:00.0"}

# from VM:
localhost:~# lspci -nn | grep -E "0443|0522"
00:05.0 Co-processor [0b40]: Intel Corporation DH895XCC Series QAT Virtual Function [8086:0443]

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
zhipeng liu (zhipengs) wrote :

Hi Yang,

For QAT, we hardcode the aliases, as you can see in the list in your comment above.
For GPU, we also hardcode class_id and name, without vendor_id and product_id, as GPUs may come from different vendors.
In our current design, GPU info is not collected and added to the alias list.
So you need to specify an alias manually with a system service parameter to get a GPU passthrough device, such as
system service-parameter-add nova pci_alias gpu-pf="device_id=0522,vendor_id=8086,name=mygpu"

Zhipeng
Thanks!

Revision history for this message
Yang Liu (yliu12) wrote :

The following step is no longer valid:
system service-parameter-add nova pci_alias gpu-pf="device_id=0522,vendor_id=102b,name=gpu"
Invalid service name nova.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Raising the priority of this bug to High as this is basic VM pass-through functionality. If this is indeed broken and requires additional code changes, these will need to be cherrypicked to the stx.2.0 release.

Changed in starlingx:
importance: Medium → High
Revision history for this message
Yang Liu (yliu12) wrote :

I think the possible steps for GPU passthrough are as follows:
1. Find valid GPU devices via system host-device-list --a, and enable them if not already enabled.
2. Find all the existing PCI aliases, minus the invalid alias without vendor id and product id, via system helm-override-show stx-openstack nova openstack
3. Compose a helm override yaml file containing the PCI aliases found in step 2 plus a valid one for the GPU, using the vendor id and device id found in step 1,
such as below:
conf:
  nova:
    pci:
      alias:
        type: multistring
        values:
        - '{"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"}'
        - '{"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"}'
        - '{"vendor_id": "8086", "product_id": "37c8", "name": "qat-c62x-pf"}'
        - '{"vendor_id": "8086", "product_id": "37c9", "name": "qat-c62x-vf"}'
        - '{"vendor_id": "102b", "product_id": "0522", "device_type": "type-PF", "name": "gpu"}'
4. Override the nova helm chart with the above yaml file and reapply stx-openstack
5. Create a flavor with pci_passthrough:alias=gpu:1
6. Launch a VM using the above flavor
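
Steps 2 and 3 above can be sketched in a few lines. This is an illustration under stated assumptions rather than a supported tool: `build_alias_override` is a hypothetical helper, the existing aliases are typed in by hand instead of being read from the system CLI, and the YAML is emitted as plain text so that no YAML library is required.

```python
import json


def build_alias_override(existing_aliases, gpu_alias):
    """Drop aliases lacking vendor/product ids, append the GPU alias, emit YAML text."""
    usable = [
        a for a in existing_aliases if "vendor_id" in a and "product_id" in a
    ]
    lines = [
        "conf:",
        "  nova:",
        "    pci:",
        "      alias:",
        "        type: multistring",
        "        values:",
    ]
    # json.dumps output has no single quotes, so single-quoted YAML scalars are safe.
    lines += ["        - '%s'" % json.dumps(a) for a in usable + [gpu_alias]]
    return "\n".join(lines) + "\n"


existing = [
    {"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"},
    {"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"},
    {"name": "gpu"},  # the invalid alias without vendor id and product id
]
gpu = {
    "vendor_id": "102b",
    "product_id": "0522",
    "device_type": "type-PF",
    "name": "gpu",
}

override = build_alias_override(existing, gpu)
print(override)
```

The resulting text has the same shape as the yaml file shown in step 3 and could be saved to a file for step 4.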

Revision history for this message
Yang Liu (yliu12) wrote :

However, with stx2.0 and master "2019-08-20_20-59-00" load, even the crypto QAT VF seems to be broken (it was working with stx2.0 2019-08-16_20-59-00 load).
# nova instance launches into error state right away
# compute host does not have virtual QAT devices listed via lspci (both C62x and dh895xcc)

[sysadmin@controller-1 ~(keystone_admin)]$ system host-device-list controller-1
+------------------+--------------+----------+-----------+-----------+---------------------------+-------------------------+-------------------------------------+-----------+---------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+-------------------------+-------------------------------------+-----------+---------+
| pci_0000_02_00_0 | 0000:02:00.0 | 030000 | 1a03 | 2000 | VGA compatible controller | ASPEED Technology, Inc. | ASPEED Graphics Family | 0 | True |
| pci_0000_3d_00_0 | 0000:3d:00.0 | 0b4000 | 8086 | 37c8 | Co-processor | Intel Corporation | C62x Chipset QuickAssist Technology | 0 | True |
| pci_0000_3f_00_0 | 0000:3f:00.0 | 0b4000 | 8086 | 37c8 | Co-processor | Intel Corporation | C62x Chipset QuickAssist Technology | 0 | True |
| pci_0000_da_00_0 | 0000:da:00.0 | 0b4000 | 8086 | 37c8 | Co-processor | Intel Corporation | C62x Chipset QuickAssist Technology | 1 | True |
+------------------+--------------+----------+-----------+-----------+---------------------------+-------------------------+-------------------------------------+-----------+---------+
[sysadmin@controller-1 ~(keystone_admin)]$ lspci | grep -i c62x
3d:00.0 Co-processor: Intel Corporation C62x Chipset QuickAssist Technology (rev 04)
3f:00.0 Co-processor: Intel Corporation C62x Chipset QuickAssist Technology (rev 04)
da:00.0 Co-processor: Intel Corporation C62x Chipset QuickAssist Technology (rev 04)

[sysadmin@controller-0 ~(keystone_admin)]$ system host-device-list compute-1
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_0b_00_0 | 0000:0b:00.0 | 0b4000 | 8086 | 0435 | Co-processor | Intel Corporation | DH895XCC Series QAT | 0 | True |
| pci_0000_0f_00_0 | 0000:0f:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] Ser...


Revision history for this message
Yang Liu (yliu12) wrote :

controller-0 and compute-1 logs from the stx2.0 0821 load are attached. compute-1 is available, and the QAT device is inventoried and can be seen via system host-device-list, but no virtual PCI devices are listed in lspci on compute-1.

Revision history for this message
zhipeng liu (zhipengs) wrote :

Hi Yang,

I'm glad to see the test solution you mentioned in comment #16.
It should be able to fix the remaining issue in this ticket.

For comments #17/#18, are you sure you used the same test setup for both versions?
Please double-check; it may be caused by a different test host. After that, if you think
it is really a bug, I propose that you submit a new ticket to track this new issue.

Thanks!
Zhipeng

Revision history for this message
zhipeng liu (zhipengs) wrote :

Hi Yang,

Could you double-check it, as I mentioned in my last comment?
Since the test solution for this bug is clear, do you agree to close this one now?

Thanks!
Zhipeng

Revision history for this message
Yang Liu (yliu12) wrote :

Yes, agreed. Closing. Also, the issue mentioned in #17 is not seen on the Aug25 load.

tags: removed: stx.retestneeded
zhipeng liu (zhipengs)
Changed in starlingx:
status: Confirmed → Fix Released