When creating instance with pci-passthrough port getting error

Bug #1836682 reported by sathish subramanian
This bug affects 3 people
Affects    Status         Importance   Assigned to   Milestone
StarlingX  Fix Released   High         ChenjieXu

Bug Description

Brief Description
-----------------
Creating an instance with a pci-passthrough port fails with an error.

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
 1. Configure pci-passthrough on your interface (eno1)
  DATA0IF=eno1
  PHYSNET2='physnet2'
  COMPUTE=compute-0
  system host-lock ${COMPUTE}
  system datanetwork-add physnet2 vlan
  SRIOVIFUUID=`system host-if-list -a ${COMPUTE}|grep $DATA0IF|awk '{print $2}'`
  system host-if-modify -m 1500 -n sriov -c pci-passthrough -N 5 ${COMPUTE} ${SRIOVIFUUID}
  system interface-datanetwork-assign ${COMPUTE} sriov ${PHYSNET2}
  system host-unlock ${COMPUTE}

 2. Wait until stx-openstack has been re-applied successfully
  system application-list
 3. Create instance on pci-passthrough port
  NET=test-sriov
  PHYSNET2='physnet2'
  openstack network segment range create physnet2-a --network-type vlan --physical-network $PHYSNET2 --minimum 400 --maximum 499 --shared
  openstack network create ${NET}-net --mtu 1500 --provider-network-type vlan --provider-physical-network $PHYSNET2
  openstack subnet create --network ${NET}-net --subnet-range 192.168.15.0/24 --ip-version 4 --dhcp ${NET}-subnet
  openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
  openstack flavor set --property "pci_passthrough:alias"="qat-c62x-pf:1" m1.medium.pci_passthrough
  neutron port-create ${NET}-net --name ${NET}-port-0 --binding:vnic_type direct
  openstack image create --file cirros-0.4.0-x86_64-disk.img --disk-format qcow2 --public ${NET}-image
  openstack server create --flavor m1.medium.pci_passthrough --image ${NET}-image --nic port-id=${NET}-port-0 ${NET}-vm0

 4. After creation, the instance goes into the ERROR state

Note: Similar to the bug for instances with SR-IOV (https://bugs.launchpad.net/starlingx/+bug/1835318)
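
Step 2 above checks system application-list by hand until the re-apply finishes. A minimal sketch of automating that wait is shown below. The wait_until helper is hypothetical (it is not a StarlingX command), and the commented usage line assumes the status column reads "applied" once the apply has finished; check both against your release before relying on it.

```shell
# wait_until: hypothetical helper, not a StarlingX command.
# Retries a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed usage (the status text "applied" is an assumption):
# wait_until 60 30 sh -c 'system application-list | grep stx-openstack | grep -q applied'
```

The helper returns non-zero when the attempts are exhausted, so a script can stop before trying to create the instance.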

Expected Behavior
------------------
Instance created

Actual Behavior
----------------
Instance failed: Status ERROR
{u'message': u'No valid host was found. There are not enough hosts available.

Reproducibility
---------------
Reproducible/100%

System Configuration
--------------------
Multi-node system, Dedicated storage

Timestamp/Logs
--------------
Attached
compute-0:~$ lspci | grep -i eth
18:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
18:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
3d:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
3d:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)

Test Activity
-------------
Feature Testing, Regression Testing

Revision history for this message
sathish subramanian (sathis5x) wrote :

Log for helm override(nova)
system helm-override-show stx-openstack nova openstack
 passthrough_whitelist:
   type: multistring
   values:
   - '{"physical_network": "physnet2", "address": "0000:3d:00.0"}'
   - '{"address": "0000:02:00.0"}'
 name: compute-0

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → ChenjieXu (midone)
importance: Undecided → High
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Chenjie to review and check if there is something missing with the steps for the pci-pt case, similar to the sr-iov config (https://bugs.launchpad.net/starlingx/+bug/1835318)

Perhaps this is a case where a data network was not created (which is a known limitation)
https://bugs.launchpad.net/starlingx/+bug/1836313

It would be great to document the steps at: https://wiki.openstack.org/wiki/StarlingX/Networking#Useful_Networking_Commands

Changed in starlingx:
status: New → Triaged
tags: added: stx.2.0
Revision history for this message
sathish subramanian (sathis5x) wrote :

Hi Chenjie,
Any suggestions for the PCI-passthrough interface configuration and VM creation steps?

Revision history for this message
sathish subramanian (sathis5x) wrote :

Observation: I haven't connected a PCIe card to my host. Could that be why the instance is failing? That would explain why the pci_passthrough:alias set on the flavor is not behaving properly.

Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

There are some mistakes in the steps, as follows:
1. -N is used to configure the number of SR-IOV VFs and should not be used here:
  system host-if-modify -m 1500 -n sriov -c pci-passthrough -N 5 ${COMPUTE} ${SRIOVIFUUID}
  The command should be:
  system host-if-modify -m 1500 -n pcipass -c pci-passthrough ${COMPUTE} ${SRIOVIFUUID}
2. The following commands are used for SR-IOV:
   neutron port-create ${NET}-net --name ${NET}-port-0 --binding:vnic_type direct
   openstack server create --flavor m1.medium.pci_passthrough --image ${NET}-image --nic port-id=${NET}-port-0 ${NET}-vm0
   Use the command below instead:
   openstack server create --flavor m1.medium.pci_passthrough --image cirros --wait test-pci

Revision history for this message
ChenjieXu (midone) wrote :

Hi all,

The PCI alias is not configured by StarlingX. I will submit a patch to fix this. For now you can work around this with a helm override like the following:
cat > nova-overrides.yaml <<EOF
conf:
 nova:
  DEFAULT:
    debug: True
  pci:
    alias:
        type: multistring
        values:
        - '{"vendor_id": "8086", "product_id": "1572","device_type":"type-PCI","name": "h210-1"}'
        - '{"vendor_id": "8086", "product_id": "1572","device_type":"type-PCI","name": "h210-2"}'
EOF
system helm-override-update stx-openstack nova openstack --values nova-overrides.yaml
system application-apply stx-openstack
system application-list
nova_scheduler_pod_id=$(kubectl get pods -n openstack | grep nova-scheduler | awk '{print $1}' | head -n 1)
kubectl exec -it $nova_scheduler_pod_id -n openstack -- bash
(in container): cat /etc/nova/nova.conf | less

You can retrieve the vendor_id, product_id from the directory: /sys/bus/pci/devices/$PCI_ADDRESS
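
The IDs in /sys/bus/pci/devices/$PCI_ADDRESS live in the vendor and device files and carry a 0x prefix, while the alias entries in this thread use the bare hex digits. A minimal sketch of turning one device into an alias entry follows. The pci_alias_entry helper and its optional sysfs-root argument are hypothetical and exist only for illustration; the device type (type-PCI, type-PF or type-VF) still has to be chosen by hand.

```shell
# pci_alias_entry: hypothetical helper that prints one nova PCI alias entry.
# Usage: pci_alias_entry <pci-address> <device-type> <alias-name> [sysfs-root]
pci_alias_entry() {
    addr=$1
    dev_type=$2
    name=$3
    root=${4:-/sys/bus/pci/devices}
    # sysfs stores the IDs as e.g. "0x8086"; strip the "0x" prefix.
    vendor=$(sed 's/^0x//' "$root/$addr/vendor") || return 1
    product=$(sed 's/^0x//' "$root/$addr/device") || return 1
    printf '{"vendor_id": "%s", "product_id": "%s", "device_type": "%s", "name": "%s"}\n' \
        "$vendor" "$product" "$dev_type" "$name"
}

# Assumed usage on the compute host:
# pci_alias_entry 0000:3d:00.0 type-PF intel-X722-pf
```

The printed line can be pasted as one item under values: in nova-overrides.yaml.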

Revision history for this message
Matt Peters (mpeters-wrs) wrote :

The pci alias fields are populated by the sysinv helm plugin (see sysinv/helm/nova.py:NovaHelm:_get_pci_alias).

The alias information is based on the default device list/types for QAT and GPU. Currently Ethernet devices are not populated in the alias entry, and I'm not sure it is required. The flavor pci_passthrough:alias should not be required for launching a port/sriov interface.

Can you please confirm the configuration and expected behaviour?

Revision history for this message
sathish subramanian (sathis5x) wrote :

Thanks for your valuable comments. Please find my observations below.

1. PCI-passthrough interface configuration
 DATA0IF=eno1
 PHYSNET2='physnet2'
 system datanetwork-add $PHYSNET2 vlan
 system host-if-modify -m 1500 -n pcipass -c pci-passthrough ${COMPUTE} ${DATA0IFUUID}
 system interface-datanetwork-assign ${COMPUTE} pcipass ${PHYSNET2}

2. Updated nova.conf
 cat > nova-overrides.yaml <<EOF
 conf:
  nova:
   DEFAULT:
     debug: True
   pci:
     alias:
         type: multistring
         values:
         - '{"vendor_id": "8086", "product_id": "37cd","device_type":"type-PCI","name": "intel-X722-pf"}'
         - '{"vendor_id": "8086", "product_id": "37cd","device_type":"type-PCI","name": "intel-X722-vf"}'
 EOF
 system helm-override-update stx-openstack nova openstack --values nova-overrides.yaml
 system application-apply stx-openstack
 system application-list

3. Finding the PCI device and vendor IDs
 [sysadmin@controller-0 script(keystone_admin)]$ lspci -nn | grep Eth
 18:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
 18:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
 3d:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 10GBASE-T [8086:37d2] (rev 09)
 3d:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 10GBASE-T [8086:37d2] (rev 09)
 3d:02.0 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.1 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.2 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.3 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.4 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.5 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.6 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 3d:02.7 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:37cd] (rev 09)
 af:00.0 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]
 af:00.1 Ethernet controller [0200]: Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]

4. Check the interface status (UP); the interface contains virtual functions
 [sysadmin@controller-0 ~(keystone_admin)]$ ip link show eno1
 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
  link/ether a4:bf:01:54:82:cd brd ff:ff:ff:ff:ff:ff
  vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  vf 4 MAC 00:00:00:00:00:00, spoof che...


Revision history for this message
sathish subramanian (sathis5x) wrote :

Checking the PCI device and vendor IDs:
controller-0:~$ cat /sys/bus/pci/devices/0000:3d:02.1/device
0x37cd
controller-0:~$ cat /sys/bus/pci/devices/0000:3d:02.1/vendor
0x8086

But after attaching pci_passthrough:alias to the flavor, the instance still goes into the ERROR state.

>> openstack flavor set --property "pci_passthrough:alias"="intel-X722-vf:1" m1.tiny
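
The vendor and device IDs needed for an alias also appear in square brackets at the end of each lspci -nn line, as in the listing earlier in this thread. A minimal sketch for pulling them out is below; the pci_ids function name is hypothetical, and the pattern assumes the lowercase [vendor:device] form that lspci -nn printed above.

```shell
# pci_ids: hypothetical helper.
# Reads lspci -nn lines on stdin and prints "<address> <vendor_id> <product_id>".
pci_ids() {
    # The greedy .* makes the match land on the last [xxxx:xxxx] pair,
    # skipping the class code such as [0200] and names such as [ConnectX-4].
    sed -n 's/^\([0-9a-f:.]*\) .*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*$/\1 \2 \3/p'
}

# Assumed usage on the host:
# lspci -nn | grep Eth | pci_ids
```

Comparing the physical function line with the virtual function lines this way shows that they carry different product IDs, which matters when writing separate PF and VF aliases.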

Revision history for this message
sathish subramanian (sathis5x) wrote :

Hi All,
After logging in to the instance (VM) on compute-1, no device in the lspci output below matches device ID 0x37cd.
compute-1:~$ sudo virsh console 4

$ lspci -nn
00:00.0 Class 0600: 8086:1237
00:01.0 Class 0601: 8086:7000
00:01.1 Class 0101: 8086:7010
00:01.2 Class 0c03: 8086:7020
00:01.3 Class 0680: 8086:7113
00:02.0 Class 0300: 1013:00b8
00:03.0 Class 0200: 1af4:1000
00:04.0 Class 0100: 1af4:1001
00:05.0 Class 00ff: 1af4:1002

Revision history for this message
sathish subramanian (sathis5x) wrote :

Hi All,
Sorry for the update above. I had not noticed that the interface was mapped to a data network (physnet2 was assigned to both the data2 and pcipassthrough interfaces on compute-1).
That is why the instance reached the active state: it was created on the data interface only, not on the pci-passthrough interface.

The issue persists: the instance still goes into the ERROR state when I follow the steps mentioned above.

Please find the logs for the system interface and the instance.

Is the vendor/device ID mapping I added to the YAML correct?

Revision history for this message
sathish subramanian (sathis5x) wrote :

Hi Chenjie,
Please let me know if I missed any configuration needed to fix this issue.

Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

The yaml file should be:
cat > nova-overrides.yaml <<EOF
conf:
 nova:
  DEFAULT:
    debug: True
  pci:
    alias:
        type: multistring
        values:
        - '{"vendor_id": "8086", "product_id": "37d2","device_type":"type-PF","name": "intel-X722-pf"}'
        - '{"vendor_id": "8086", "product_id": "37cd","device_type":"type-VF","name": "intel-X722-vf"}'
EOF

"pci_passthrough:alias" should be added to the flavor.

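The override file above keeps the same nesting (conf, nova, pci, alias) and only changes the alias strings, so that the PF and VF entries carry their own product_id and device_type. A minimal sketch of generating that structure from a list of alias strings follows. The nova_alias_overrides function is hypothetical, and it leaves out the DEFAULT/debug section shown in the thread.

```shell
# nova_alias_overrides: hypothetical helper.
# Prints a nova helm override document with one pci alias per argument.
# Each argument must be a complete JSON alias string.
nova_alias_overrides() {
    printf 'conf:\n  nova:\n    pci:\n      alias:\n        type: multistring\n        values:\n'
    for entry in "$@"; do
        printf "        - '%s'\n" "$entry"
    done
}

# Assumed usage:
# nova_alias_overrides '{"vendor_id": "8086", "product_id": "37d2","device_type":"type-PF","name": "intel-X722-pf"}' > nova-overrides.yaml
```

The output can then be passed to system helm-override-update with --values, as in the commands earlier in the thread.
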
Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

To check whether the NIC has been passed to the VM successfully, you need to use an image which contains the driver required by the passed NIC. You can try the following image:
   http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img

To ssh into the VM, you need to create a security group which allows ICMP, TCP and UDP traffic to the VM. You also need to create a keypair. You can use the commands below as a reference:
   mkdir -p /home/sysadmin/.ssh/
   vi /home/sysadmin/.ssh/id_rsa
   openstack keypair create key1 --private-key /home/sysadmin/.ssh/id_rsa
   openstack security group create security1
   openstack security group rule create --ingress --protocol icmp --remote-ip 0.0.0.0/0 security1
   openstack security group rule create --ingress --protocol tcp --remote-ip 0.0.0.0/0 security1
   openstack security group rule create --ingress --protocol udp --remote-ip 0.0.0.0/0 security1
   openstack server create --image ubuntu --flavor m1.medium.pci_passthrough --network public-net0 --security-group security1 --key-name key1 test-pci
   (public-net0 is a normal network. You will need an IP allocated from public-net0 to ssh into the VM)

Then you can ssh into the VM through the DHCP namespace:
   ip netns
   sudo ip netns exec $UUID bash
   ssh -i /home/sysadmin/.ssh/id_rsa ubuntu@192.168.101.208

To verify the NIC has been passed, you need to verify the driver has been loaded in the VM:
   lsmod | grep $DRIVER

If the driver has been loaded, you can find a NIC which has the same MAC address as the passed NIC:
   ip link

The PCI address seen inside the VM will differ from the one on the host when you check with the command below, but the device should have the same description as on the host.
   lspci
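
The MAC comparison described above can be scripted inside the VM. A minimal sketch follows, assuming the usual ip link layout in which each interface has a link/ether line. The has_mac function is hypothetical, and the MAC in the usage line is the eno1 address from the host output earlier in this thread, used purely as an illustration.

```shell
# has_mac: hypothetical helper.
# Reads ip link output on stdin and succeeds if the given MAC address
# appears on a link/ether line (comparison is case-insensitive).
has_mac() {
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v want="$want" '
        $1 == "link/ether" && tolower($2) == want { found = 1 }
        END { exit !found }
    '
}

# Assumed usage inside the VM:
# ip link | has_mac a4:bf:01:54:82:cd && echo "passed NIC is visible"
```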

Revision history for this message
ChenjieXu (midone) wrote :

Hi Matt,

Based on my testing, the flavor pci_passthrough:alias should not be required for passing a VF to the VM. But the flavor pci_passthrough:alias should be required for passing a physical NIC to the VM.

Revision history for this message
sathish subramanian (sathis5x) wrote :

Hi Chenjie,

Working scenario-1:
 Note: Used the existing network (public-net0), mapped to physnet0 (vlan), for instance creation.
 1. Configure helm chart in nova (controller-0)
 echo '8' > /sys/class/net/eno1/device/sriov_numvfs
 system helm-override-update stx-openstack nova openstack --values nova-overrides.yaml
 system helm-override-show stx-openstack nova openstack
 system application-list

 2. Create the instance using a flavor with the pci passthrough alias set
 mkdir -p /home/sysadmin/.ssh/
 vi /home/sysadmin/.ssh/id_rsa
 openstack keypair create key1 --private-key /home/sysadmin/.ssh/id_rsa
 openstack security group create security1
 openstack security group rule create --ingress --protocol icmp --remote-ip 0.0.0.0/0 security1
 openstack security group rule create --ingress --protocol tcp --remote-ip 0.0.0.0/0 security1
 openstack security group rule create --ingress --protocol udp --remote-ip 0.0.0.0/0 security1

 openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
 openstack flavor set --property "pci_passthrough:alias"="intel-X722-pf:1" m1.medium.pci_passthrough
 openstack image create --file xenial-server-cloudimg-amd64-disk1.img --disk-format qcow2 --public ${NET}-image
 openstack server create --image ${NET}-image --flavor m1.medium.pci_passthrough --network public-net0 --security-group security1 --key-name key1 ${NET}-vm0

 3. No error; the instance is created successfully in the active state.
 Note: When trying to ssh into the VM, I get a No route to host error.
  controller-0:~/standard_script$ openstack network list
  | cb438bc0-0b9e-44bd-9639-4caf62a8bd8b | public-net0 | 335a7149-a4c3-45df-bf9d-746cb7cf7ab3 |

  compute-1:~$ sudo ip netns
   qdhcp-cb438bc0-0b9e-44bd-9639-4caf62a8bd8b (id: 4)
  compute-1:~$ sudo ip netns exec qdhcp-cb438bc0-0b9e-44bd-9639-4caf62a8bd8b bash
  compute-1:/home/sysadmin# ssh -i /home/sysadmin/.ssh/id_rsa ubuntu@192.168.101.234
  ssh: connect to host 192.168.101.234 port 22: No route to host

Failure scenario-2:
 1. Configure PCI-passthrough interface
 DATA0IF=eno1
 PHYSNET2='physnet2'
 system datanetwork-add $PHYSNET2 vlan
 system host-if-modify -m 1500 -n pcipass -c pci-passthrough ${COMPUTE} ${SRIOVIFUUID}
 system interface-datanetwork-assign ${COMPUTE} pcipass ${PHYSNET2}

 2. Update nova helm chart
 system helm-override-update stx-openstack nova openstack --values nova-overrides.yaml
 system application-apply stx-openstack
 system application-list
 3. Create a network on physnet2 (the pci-pt configured interface) and an instance using a flavor with the pci passthrough alias set
 openstack network create ${NET}-net --mtu 1500 --provider-network-type vlan --provider-physical-network $PHYSNET2
 openstack subnet create --network ${NET}-net --subnet-range 192.168.15.0/24 --ip-version 4 --dhcp ${NET}-subnet
 openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
 openstack flavor set --property "pci_passthrough:alias"="intel-X722-pf:1" m1.medium.pci_passthrough
 openstack image create --file xenial-server-cloudimg-amd64-disk1.img --disk-format qcow2 --public ${NET}-image
 openstack server create --image ${NET}-image --flavor m1.medium.pci_passthrough --network public-net0 --securi...


Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

We need to configure the interface (eno1) as pci-passthrough with the following commands:
   system host-if-modify -m 1500 -n pcipass -c pci-passthrough ${COMPUTE} ${ENO1UUID}
   system interface-datanetwork-assign ${COMPUTE} pcipass ${PHYSNET2}

Some steps are wrong in scenario-1 and scenario-2. To test PCI passthrough, you need to do the following:
1. Configure the interface (eno1) with pci-passthrough and don't create VFs.
2. Update the nova helm chart.
3. Do not create a network or subnet on physnet2, which is bound to eno1. This physical NIC will be passed to the VM, so you should not use it as a provider network shared by multiple VMs.
4. Create a flavor and set the property "pci_passthrough:alias". When you create a VM using a flavor with this property, Nova knows that the VM should be given a physical NIC selected by the alias. PCI passthrough is configured this way, not by creating a network.
5. Create the VM with the following command:
   openstack server create --image ${NET}-image --flavor m1.medium.pci_passthrough --network public-net0 --security-group security1 --key-name key1 ${NET}-vm0
   Please make sure public-net0 is used instead of ${NET}-net. In scenario-2, I guess you used ${NET}-net to create the VM, which made the VM fail to launch (though the network in your command is public-net0). The created VM will have 2 NICs: one is created from public-net0 and is used for ssh; the other is the passed-through NIC eno1.

It's strange to get a no route to host error. Please collect the outputs of the following commands:
   ifconfig
   route -n
   ping 192.168.101.234
   Please also try to create a normal VM using the ubuntu image, security group security1 and keypair key1. This lets us check whether the no route error is related to PCI passthrough or not.

Revision history for this message
ChenjieXu (midone) wrote :

Hi Matt,

I found another way to pass an SR-IOV capable physical NIC through to a VM. This way doesn't require configuring a "PCI alias". The key point is to create a port whose vnic_type is direct-physical. See the following link for reference:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/networking_guide/sr-iov-support-for-virtual-networking

However, if we try to pass through a physical NIC that doesn't support SR-IOV, we may still need to configure a "PCI alias". Because I don't have a physical NIC without SR-IOV support on my server, I can't test passing such a NIC to a VM by creating a port whose vnic_type is direct-physical.

Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

Could you please also try to pass an SR-IOV capable physical NIC through to a VM by creating a port whose vnic_type is direct-physical? You can use the following commands as a reference:

1. Use system command to configure interface. One is used for PCI PASSTHROUGH and the other is a normal interface.
   export COMPUTE=controller-0
   PHYSNET0='physnet0'
   PHYSNET1='physnet1'
   system host-lock ${COMPUTE}
   system datanetwork-add ${PHYSNET0} vlan
   system datanetwork-add ${PHYSNET1} vlan
   system host-if-list -a ${COMPUTE}
   system host-if-modify -m 1500 -n pcipass -c pci-passthrough ${COMPUTE} ${DATA0IFUUID}
   system host-if-modify -m 1500 -n data1 -c data ${COMPUTE} ${DATA1IFUUID}
   system interface-datanetwork-assign ${COMPUTE} ${DATA0IFUUID} ${PHYSNET0}
   system interface-datanetwork-assign ${COMPUTE} ${DATA1IFUUID} ${PHYSNET1}
   system interface-datanetwork-list ${COMPUTE}
   system host-unlock ${COMPUTE}
   # make sure stx-openstack has been re-applied successfully
   system application-list

2. Create keypair and security group
   mkdir -p /home/sysadmin/.ssh/
   vi /home/sysadmin/.ssh/id_rsa
   openstack keypair create key1 --private-key /home/sysadmin/.ssh/id_rsa
   openstack security group create security1
   openstack security group rule create --ingress --protocol icmp --remote-ip 0.0.0.0/0 security1
   openstack security group rule create --ingress --protocol tcp --remote-ip 0.0.0.0/0 security1
   openstack security group rule create --ingress --protocol udp --remote-ip 0.0.0.0/0 security1

3. Create networks and subnets. Upload the ubuntu image.
   export OS_CLOUD=openstack_helm
   ADMINID=`openstack project list | grep admin | awk '{print $2}'`
   PHYSNET0='physnet0'
   PHYSNET1='physnet1'
   PUBLICNET0='public-net0'
   PUBLICNET1='public-net1'
   PUBLICSUBNET0='public-subnet0'
   PUBLICSUBNET1='public-subnet1'
   openstack network segment range create ${PHYSNET0}-a --network-type vlan --physical-network ${PHYSNET0} --minimum 400 --maximum 499 --private --project ${ADMINID}
   openstack network segment range create ${PHYSNET1}-a --network-type vlan --physical-network ${PHYSNET1} --minimum 500 --maximum 599 --private --project ${ADMINID}
   openstack network create --project ${ADMINID} --provider-network-type=vlan --provider-physical-network=${PHYSNET0} --provider-segment=400 ${PUBLICNET0}
   openstack network create --project ${ADMINID} --provider-network-type=vlan --provider-physical-network=${PHYSNET1} --provider-segment=500 ${PUBLICNET1}
   openstack subnet create --project ${ADMINID} ${PUBLICSUBNET0} --network ${PUBLICNET0} --subnet-range 192.168.101.0/24
   openstack subnet create --project ${ADMINID} ${PUBLICSUBNET1} --network ${PUBLICNET1} --subnet-range 192.168.102.0/24
   wget http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img
   openstack image create --container-format bare --disk-format qcow2 --file xenial-server-cloudimg-amd64-disk1.img ubuntu
   openstack image list

4. Create PF port whose vnic_type is direct-physical
   net_id=`neutron net-show ${PUBLICNET0} | grep "\ id\ " | awk '{ print $4 }'`
   port_id=`neutron port-create $net_id --name pf-port --binding:vnic_type ...


Revision history for this message
sathish subramanian (sathis5x) wrote :

Hi Chenjie,
Thanks for your support.

1. Configure interface(eno1) with pci-passthrough and don't create VF.
 Note: Without creating VFs, the helm chart apply failed. I then created VFs on eno1 and applied stx-openstack
  echo '8' > /sys/class/net/eno1/device/sriov_numvfs
2. Update nova helm chart
3. Used the existing public net, but did not create a subnet on physnet2
4. Create flavor and set property "pci_passthrough:alias".
5. Created VM successfully
       openstack server create --image ${NET}-image --flavor m1.medium.pci_passthrough --network public-net0 --security-group security1 --key-name key1 ${NET}-vm0

Instance ip: public-net0=192.168.101.243

About the no route to host error, please find the attached log.

Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

The no route to host error should not be related to PCI passthrough. According to the following bug, your 2+2+2 system has a bug that prevents VMs from running correctly. The console freezes when you log in to the VM using the command "virsh console $num". This means the VM is not running, so ping will fail.
https://bugs.launchpad.net/starlingx/+bug/1835575

Could you please try creating a normal VM and ssh into it?

Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

Could you please also try passing an SR-IOV capable physical NIC through to a VM by creating a port whose vnic_type is direct-physical? Based on the email discussion with Matt, this is the way StarlingX should use.

Changed in starlingx:
status: Triaged → Incomplete
Revision history for this message
sathish subramanian (sathis5x) wrote :

Yes, Chenjie. This bug report is about the instance error, not about pinging the instance.

I am also getting the expected test behaviour, so I recommend closing this bug.

Update: when creating an instance with a vnic-type port and without adding the flavor property "pci_passthrough:alias", the instance goes into the ERROR state.
openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
Note: pci_passthrough:alias was not added to the flavor (the command below is commented out)
#openstack flavor set --property "pci_passthrough:alias"="intel-X722-pf:1" m1.medium.pci_passthrough

openstack port create --vnic-type direct-physical --network public-net0 ${NET}-Port0
openstack server create --flavor m1.medium.pci_passthrough --nic port-id=${NET}-Port0 --image ${NET}-image --security-group security1 --key-name key1 ${NET}-vm0

Revision history for this message
ChenjieXu (midone) wrote :

Hi Sathish,

The second way should be used by StarlingX. Since you still get an error with the second way, we need to figure out what is wrong in your steps. In addition, the physical NIC i210 should be tested to see whether the second way covers all physical NICs.

For your testing of the second way, have you configured public-net0 as described in comment 20?
https://bugs.launchpad.net/starlingx/+bug/1836682/comments/20

Revision history for this message
Yang Liu (yliu12) wrote :

I think this LP mixed up the Ethernet devices and QAT devices.
- The system host-if-modify + --vnic-type in neutron are for Ethernet devices.
- And for QAT devices, the devices are automatically populated in system host-device-list, and the corresponding pci alias should also be automatically added to nova helm override. Then user can set the nova flavor to use the QAT device.

Revision history for this message
ChenjieXu (midone) wrote :

Agree with Yang. This bug should test:
- The system host-if-modify + --vnic-type in neutron are for Ethernet devices.

Revision history for this message
Paulina Flores (paulina-flores) wrote :

Hi everyone,

Thanks for your collaboration and efforts. I've run this test case with all the information gathered in here and have marked it as passed. I got the PCI alias from the nova.conf document inside the compute nova pod and set that property for the flavour, and the instance spawned correctly in the corresponding compute.

As such, here are the steps used:

Steps:
1. SSH to the compute host and configure VFs on the eno1 port
 echo '8' > /sys/class/net/eno1/device/sriov_numvfs
2. Lock host machine
3. Configure interface (eno1) as pci-pt
  DATA0IF=eno1
  PHYSNET0='physnet0'
  SPL=/tmp/tmp-system-port-list
  SPIL=/tmp/tmp-system-host-if-list

  for COMPUTE in compute-0 compute-1; do
    echo "Configuring interface for: $COMPUTE"
    set -ex
    system host-port-list ${COMPUTE} --nowrap > ${SPL}
    system host-if-list -a ${COMPUTE} --nowrap > ${SPIL}
    DATA0PCIADDR=$(cat $SPL | grep $DATA0IF |awk '{print $8}')
    DATA0PORTUUID=$(cat $SPL | grep ${DATA0PCIADDR} | awk '{print $2}')
    DATA0PORTNAME=$(cat $SPL | grep ${DATA0PCIADDR} | awk '{print $4}')
    DATA0IFUUID=$(cat $SPIL | awk -v DATA0PORTNAME=$DATA0PORTNAME '($12 ~ DATA0PORTNAME) {print $2}')
    system host-if-modify -m 1500 -n pcipass -c pci-passthrough ${COMPUTE} ${DATA0IFUUID}
    system interface-datanetwork-assign ${COMPUTE} pcipass ${PHYSNET0}
    set +ex
  done

4. Find physical and virtual device IDs using lspci -nn | grep Eth
5. Unlock host machine
6. Wait until the stx-openstack application apply is complete
7. Create an instance using a flavor with the pci passthrough alias set
 mkdir -p /home/sysadmin/.ssh/
 vi /home/sysadmin/.ssh/id_rsa
 openstack keypair create key1 --private-key /home/sysadmin/.ssh/id_rsa
 openstack security group create security1
 openstack security group rule create --ingress --protocol icmp --remote-ip 0.0.0.0/0 security1
 openstack security group rule create --ingress --protocol tcp --remote-ip 0.0.0.0/0 security1
 openstack security group rule create --ingress --protocol udp --remote-ip 0.0.0.0/0 security1

 kubectl -n openstack get pod | grep nova-compute
 # NOVA_COMPUTE_POD is the name of a nova-compute pod from the previous command
 kubectl -n openstack exec -it ${NOVA_COMPUTE_POD} -- bash
 cd /etc/nova/
 cat nova.conf

        [pci]
        alias = {"vendor_id": "8086", "product_id": "0435", "name": "qat-dh895xcc-pf"}
        alias = {"vendor_id": "8086", "product_id": "0443", "name": "qat-dh895xcc-vf"}
        alias = {"vendor_id": "8086", "product_id": "37c8", "name": "qat-c62x-pf"}
        alias = {"vendor_id": "8086", "product_id": "37c9", "name": "qat-c62x-vf"}

 openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
 openstack flavor set --property "pci_passthrough:alias"="${alias}:0" m1.medium.pci_passthrough
 openstack image create --file xenial-server-cloudimg-amd64-disk1.img --disk-format qcow2 --public ${NET}-image
        openstack server create --image ${NET}-image --flavor m1.medium.pci_passthrough --network public-net0 --security-group security1 --key-name key1 ${NET}-vm0
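
The flavor property has to name one of the aliases that appear in the [pci] section of nova.conf. A minimal sketch for listing those names is below. The alias_names function is hypothetical, and the pattern assumes the alias = {...} line format shown in the excerpt above.

```shell
# alias_names: hypothetical helper.
# Reads nova.conf text on stdin and prints the "name" of every alias line.
alias_names() {
    sed -n 's/^[[:space:]]*alias[[:space:]]*=.*"name":[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Assumed usage inside the nova-compute pod:
# alias_names < /etc/nova/nova.conf
```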

Revision history for this message
ChenjieXu (midone) wrote :

Hi all,

The steps have been documented in the following page:
https://wiki.openstack.org/wiki/StarlingX/Networking#Useful_Networking_Commands

Based on the discussion in the StarlingX bi-weekly networking sub-project meeting on 08/22/2019, I will test binding a port with vnic_type direct-physical on an I210 NIC, which doesn't support SR-IOV.

Revision history for this message
ChenjieXu (midone) wrote :

Hi Matt,

I have finished my testing with the I210 NIC and the result shows that this way should only be used for SR-IOV physical functions. The blueprint "SR-IOV physical functions assignment with Neutron port" indicates the same thing.
https://blueprints.launchpad.net/nova/+spec/sriov-pf-passthrough-neutron-port

For this limitation, which way do you suggest?
1. Record this as a known limitation and document how to pass through NICs which don't support SR-IOV. Users need to override helm with a PCI alias like the following:
cat > nova-overrides.yaml <<EOF
conf:
  nova:
    DEFAULT:
      debug: True
    pci:
      alias:
        type: multistring
        values:
        - '{"vendor_id": "8086", "product_id": "37d2","device_type":"type-PCI","name": "pci-pass"}'
EOF
system helm-override-update stx-openstack nova openstack --values nova-overrides.yaml

2. Generate the helm override automatically. If this way is chosen, do you think this should be fixed in stx 2.0 or stx 3.0?
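
Which procedure applies depends on whether the target NIC supports SR-IOV. A minimal sketch of checking that on the host is below. It relies on the standard Linux sysfs attribute sriov_totalvfs, which sits next to the sriov_numvfs file used earlier in this thread. The supports_sriov function and its optional sysfs-root argument are hypothetical.

```shell
# supports_sriov: hypothetical helper.
# Succeeds if the interface exposes sriov_totalvfs with a value above zero.
# Usage: supports_sriov <interface> [sysfs-root]
supports_sriov() {
    ifname=$1
    root=${2:-/sys/class/net}
    attr="$root/$ifname/device/sriov_totalvfs"
    # NICs without SR-IOV support do not expose this attribute at all.
    [ -r "$attr" ] || return 1
    total=$(cat "$attr")
    [ "${total:-0}" -gt 0 ] 2>/dev/null
}

# Assumed usage on the compute host:
# if supports_sriov eno1; then echo "use a direct-physical port"; else echo "use a PCI alias override"; fi
```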

Revision history for this message
ChenjieXu (midone) wrote :

Hi all,

As aligned with Matt by email, the first way has been adopted and the limitation has been documented in the below page:
https://wiki.openstack.org/wiki/StarlingX/Networking#Useful_Networking_Commands

Changed in starlingx:
status: Incomplete → Invalid
tags: added: stx.docs
Changed in starlingx:
status: Invalid → In Progress
Changed in starlingx:
status: In Progress → Invalid
Changed in starlingx:
status: Invalid → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to docs (master)

Reviewed: https://review.opendev.org/c/starlingx/docs/+/853359
Committed: https://opendev.org/starlingx/docs/commit/bc0870eade7842a968e37da3742a3571e9adaca1
Submitter: "Zuul (22348)"
Branch: master

commit bc0870eade7842a968e37da3742a3571e9adaca1
Author: Thales Elero Cervi <email address hidden>
Date: Tue Aug 16 15:59:13 2022 -0300

    PCI-PT configuration when SR-IOV is not available (stx 7.0, stx8, ds7)

    There is a known limitation [1] and NICs that do not support SR-IOV
    require a different procedure [2] when configuring PCI-PT.

    This change adds a note on checking SR-IOV support for the target NIC,
    when configuring PCI-Passthrough for it, and adds the necessary
    steps for the configuration to work properly with this type of NIC.

    For completeness, it also duplicates the PCI-PT example for when
    configuring PCI SRIOV Ethernet Interfaces, with the necessary
    changes to the procedure.

    [1] https://bugs.launchpad.net/starlingx/+bug/1836682
    [2] https://wiki.openstack.org/wiki/StarlingX/Networking#Useful_Networking_Commands

    Partial-bug: 1836682

    Signed-off-by: Thales Elero Cervi <email address hidden>
    Change-Id: I7258ab34cb7ce69a2f4b82c682f72d9467d95c70

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to docs (r/stx.7.0)

Fix proposed to branch: r/stx.7.0
Review: https://review.opendev.org/c/starlingx/docs/+/861840

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to docs (r/stx.7.0)

Reviewed: https://review.opendev.org/c/starlingx/docs/+/861840
Committed: https://opendev.org/starlingx/docs/commit/c4aaec1c73856332f7deb08f4cc0a5523061892e
Submitter: "Zuul (22348)"
Branch: r/stx.7.0

commit c4aaec1c73856332f7deb08f4cc0a5523061892e
Author: Thales Elero Cervi <email address hidden>
Date: Tue Aug 16 15:59:13 2022 -0300

    PCI-PT configuration when SR-IOV is not available (stx 7.0, stx8, ds7)

    There is a known limitation [1] and NICs that do not support SR-IOV
    require a different procedure [2] when configuring PCI-PT.

    This change adds a note on checking SR-IOV support for the target NIC,
    when configuring PCI-Passthrough for it, and adds the necessary
    steps for the configuration to work properly with this type of NIC.

    For completeness, it also duplicates the PCI-PT example for when
    configuring PCI SRIOV Ethernet Interfaces, with the necessary
    changes to the procedure.

    Fix merge conflict.

    [1] https://bugs.launchpad.net/starlingx/+bug/1836682
    [2] https://wiki.openstack.org/wiki/StarlingX/Networking#Useful_Networking_Commands

    Partial-bug: 1836682

    Signed-off-by: Thales Elero Cervi <email address hidden>
    Change-Id: I7258ab34cb7ce69a2f4b82c682f72d9467d95c70
    (cherry picked from commit bc0870eade7842a968e37da3742a3571e9adaca1)
