arm64 VM fails to boot when num_pcie_ports is set to 28

Bug #1865120 reported by norman shen
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Kevin Zhao

Bug Description

We are testing OpenStack on Phytium FT2000PLUS hardware:

root@compute01:~# lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 16
NUMA node(s): 8
Model name: Phytium,FT2000PLUS
CPU max MHz: 2200.0000
CPU min MHz: 1000.0000
BogoMIPS: 3600.00
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
NUMA node4 CPU(s): 32-39
NUMA node5 CPU(s): 40-47
NUMA node6 CPU(s): 48-55
NUMA node7 CPU(s): 56-63
Flags: fp asimd evtstrm crc32

The problem we initially hit was that we could not attach more than 2 volumes (virtio-blk) when config drive was enabled. We worked around it by using the SCSI bus instead.

But we are still interested in making it possible to plug in more than 2 virtio-blk devices. After some investigation, I think `num_pcie_ports` might be too small (it looks like it defaults to 9 if unspecified). Since `pcie-root` does not allow hot-plugging and a `pcie-root-port` provides only one slot, the only mitigation I can think of is to increase this option to its maximum.
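
To see how many `pcie-root-port` controllers a guest actually received and how many are still free for hot-plug, the `virsh dumpxml` output can be inspected. The following is only a minimal sketch, assuming the domain XML is available as a string and that a device's `bus` value refers to the `index` of the PCI controller it sits behind. `count_root_ports` and `SAMPLE_XML` are hypothetical names used for illustration, not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened domain XML: three root ports, one of them in use.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='network' device='disk'>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
  </devices>
</domain>
"""


def count_root_ports(domain_xml):
    """Count pcie-root-port controllers and how many have a device plugged in."""
    root = ET.fromstring(domain_xml)

    # Each pcie-root-port controller provides one bus, numbered by its index.
    port_buses = {
        int(ctrl.get("index"))
        for ctrl in root.findall("./devices/controller[@type='pci']")
        if ctrl.get("model") == "pcie-root-port"
    }

    # A port counts as used when some device has a PCI address on its bus.
    used = set()
    for addr in root.findall("./devices/*/address[@type='pci']"):
        bus = int(addr.get("bus"), 16)
        if bus in port_buses:
            used.add(bus)

    return {
        "total": len(port_buses),
        "used": len(used),
        "free": len(port_buses - used),
    }


if __name__ == "__main__":
    print(count_root_ports(SAMPLE_XML))
```

Running the same function against the output of `virsh dumpxml <domain>` for a working and a non-working guest would show whether the number of root ports is the only difference between them.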

The current problem is that VMs with previously working images now fail to boot, and when I attach with `virsh console`, I only see the UEFI shell.

Maybe this is not a bug in the code, but I definitely think the documentation needs to be improved so that these terms are easier to understand. I am glad to provide additional details if asked. Thanks.

norman shen (jshen28) wrote :

The XML for the problematic VM is shown below:

virsh # dumpxml 52
<domain type='kvm' id='52'>
  <name>instance-000002f6</name>
  <uuid>809a8d51-8ad0-4472-9479-8d0554a66265</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="18.2.3"/>
      <nova:name>sw-test2</nova:name>
      <nova:creationTime>2020-02-17 05:56:08</nova:creationTime>
      <nova:flavor name="ecs_1C2G40G">
        <nova:memory>2048</nova:memory>
        <nova:disk>40</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="b4653e7431264ebf99809dc8efe9e49b">admin</nova:user>
        <nova:project uuid="5d51ef16f6644af2971560618c93f071">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="d3fe4150-af9f-47af-b3a8-93a31bfc2ebe"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='aarch64' machine='virt-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/instance-000002f6_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <gic version='2'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-aarch64</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='457eb676-33da-42ec-9a8c-9293d545c337'/>
      </auth>
      <source protocol='rbd' name='nova.vms/809a8d51-8ad0-4472-9479-8d0554a66265_disk'>
        <host name='172.16.22.67' port='6789'/>
        <host name='172.16.22.68' port='6789'/>
        <host name='172.16.22.69' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <...

Kevin Zhao (kevin-zhao)
Changed in nova:
assignee: nobody → Kevin Zhao (kevin-zhao)
status: New → Confirmed
Kevin Zhao (kevin-zhao) wrote :
norman shen (jshen28) wrote :

Hi, thank you for the reply. I am definitely missing something here, but why does the guest fail to boot if I change this value to something other than 0? Thanks.

Jeffrey Zhang (jeffrey4l) wrote :

I have a similar issue. It seems that the aarch64 QEMU guest boots with num_pcie_ports=15 but fails to boot with num_pcie_ports=16. Here is what I see.

With num_pcie_ports=16, the guest OS is stuck at the UEFI shell with the following message:

```
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Mapping table
     BLK1: Alias(s):
          VenHw(F9B94AE2-8BA6-409B-9D56-B9B417F53CB3)
     BLK0: Alias(s):
          VenHw(8047DB4B-7E9C-4C0C-8EBC-DFBBAACACE8F)
Press ESC in 5 seconds to skip startup.nsh or any other key to continue.
```

With num_pcie_ports=15, the guest OS boots successfully. I checked the UEFI shell and found the following:

```
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Mapping table
      FS0: Alias(s):HD1b:;BLK2:
          PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)/HD(1,GPT,8AD1EB5A-1F33-44F8-8A74-EEF37D4802F5,0x800,0x113000)
     BLK6: Alias(s):
          VenHw(F9B94AE2-8BA6-409B-9D56-B9B417F53CB3)
     BLK5: Alias(s):
          VenHw(8047DB4B-7E9C-4C0C-8EBC-DFBBAACACE8F)
     BLK1: Alias(s):
          PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)
     BLK3: Alias(s):
          PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)/HD(2,GPT,1F802CD3-DD0B-4DA7-BF90-B928E443E0F0,0x113800,0x4000)
     BLK4: Alias(s):
          PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)/HD(3,GPT,18DB5C2F-CE88-43D6-8A0F-011E4A1FEE29,0x117800,0x3EE800)
     BLK0: Alias(s):
          PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x0)

Press ESC in 1 seconds to skip startup.nsh or any other key to continue.
Shell>
```

It seems that with num_pcie_ports=16, the KVM guest cannot find its disks.
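
The difference between the two mapping tables can be checked mechanically: in the failing case no entry has a `PciRoot(...)` device path, only `VenHw(...)` entries. Below is a rough sketch, assuming the UEFI shell output has been captured as text. `pci_backed_mappings`, `STUCK`, and `BOOTS` are hypothetical names, and the sample strings are shortened excerpts of the output above.

```python
def pci_backed_mappings(shell_output):
    """Return mapping names (FS0, BLK1, ...) whose device path starts with PciRoot(."""
    names = []
    current = None  # name of the mapping whose device path is on the next line
    for line in shell_output.splitlines():
        text = line.strip()
        if "Alias(s):" in text:
            # e.g. "BLK1: Alias(s):" or "FS0: Alias(s):HD1b:;BLK2:"
            current = text.split(":", 1)[0]
        else:
            if current is not None and text.startswith("PciRoot("):
                names.append(current)
            current = None
    return names


# Excerpt of the mapping table seen with num_pcie_ports=16.
STUCK = """\
Mapping table
     BLK1: Alias(s):
          VenHw(F9B94AE2-8BA6-409B-9D56-B9B417F53CB3)
     BLK0: Alias(s):
          VenHw(8047DB4B-7E9C-4C0C-8EBC-DFBBAACACE8F)
"""

# Excerpt of the mapping table seen with num_pcie_ports=15.
BOOTS = """\
Mapping table
     BLK6: Alias(s):
          VenHw(F9B94AE2-8BA6-409B-9D56-B9B417F53CB3)
     BLK1: Alias(s):
          PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0)
     BLK0: Alias(s):
          PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x0)
"""

if __name__ == "__main__":
    print(pci_backed_mappings(STUCK))
    print(pci_backed_mappings(BOOTS))
```

An empty result means the firmware enumerated no block devices behind the PCIe root, which matches the symptom of the guest dropping into the UEFI shell. It does not by itself explain why the devices are missing.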

The libvirt XML file is attached.

Jeffrey Zhang (jeffrey4l) wrote :

aarch-16.xml

Jeffrey Zhang (jeffrey4l) wrote :

By the way, my environment is:

* CentOS 7 4.18.0-193.28.1.el7.aarch64
* libvirt-daemon-4.5.0-36.el7_9.3.aarch64
* qemu-kvm-ev-2.12.0-44.1.el7_8.1.aarch64
* CPU: FT/2000
