VM with single GPU flavor gets two GPUs assigned
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Expired | Undecided | Unassigned |
Bug Description
Description
===========
Some VMs that requested a single PCI device (GPU) were provisioned with two GPUs attached.
Steps to reproduce
==================
Deploy a GPU-assigned VM with the following Heat template:
$ openstack stack template show vmaas-p1220-kvm1
description: Template to create a single VM for a self service project in CCCC OpenStack
heat_template_
outputs:
instance_ip:
description: The IP address of the deployed instance
value:
get_attr:
- server_1
- first_address
instance_name:
description: Name of the instance
value:
get_attr:
- server_1
- name
parameters:
flavor:
default: vmaas.p9.
description: Type of instance (flavor) to be used
label: Flavor
type: string
image:
default: rhel7.6alt-ppc64le
description: Image to be used for compute instance
label: Image name or ID
type: string
instance_
default: p1220-kvm1-boot
description: Name of instance boot volume
label: Instance disk name
type: string
instance_
default: '200'
description: Size of instance boot volume
label: Instance disk size
type: string
instance_ip:
default: AAA.BB.CC.DD
description: IP address of compute instance
label: Instance IP address
type: string
instance_name:
default: p1220-kvm1
description: Name of compute instance
label: Instance name
type: string
key:
default: ''
description: Name of existing ssh key-pair to be used for compute instance
label: Key name
type: string
project_vlan:
default: '1220'
description: Project VLAN to attach instance to
label: Network name or ID
type: string
resources:
cloud_
properties:
cloud_config:
- content: ==cloud_
encoding: b64
owner: root:root
path: /cloud-config.sh
type: OS::Heat:
cloud_
==cloud_
type: OS::Heat:
cloud_config_run:
properties:
parts:
- config:
- config:
type: OS::Heat:
server_1:
depends_on:
- cloud_config_run
- volume_1
properties:
block_
- boot_index: 0
volume_id:
config_drive: true
flavor:
get_param: flavor
key_name:
get_param: key
metadata:
Flavor:
Image:
Project: XXXXXXXX
Submitter: Portal/ZZZZZZZ
name:
get_param: instance_name
networks:
- fixed_ip:
network:
- ''
- - v
- get_param: project_vlan
user_data:
user_
type: OS::Nova::Server
volume_1:
properties:
image:
get_param: image
metadata:
Project: XXXXXXXX
name:
get_param: instance_
size:
get_param: instance_
type: OS::Cinder::Volume
And the following flavor:
$ openstack flavor show 02542b5c-
+------
| Field | Value |
+------
| OS-FLV-
| OS-FLV-
| access_project_ids | 4fcaf92a3fa148b
| disk | 0 |
| id | 02542b5c-
| name | vmaas.p9.
| os-flavor-
| properties | aggregate_
| ram | 65536 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 32 |
+------
Expected result
===============
The VM is assigned a single PCI device (GPU).
Actual result
=============
Got a VM with two GPUs attached.
p1220-kvm1 ~]$ lspci
0000:00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
0000:00:02.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
0000:00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
0000:00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)
0001:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0002:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
$ nvidia-smi
Fri Sep 11 11:28:21 2020
+------
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|------
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|======
| 0 Tesla V100-SXM2... On | 00000001:00:01.0 Off | 0 |
| N/A 26C P0 38W / 300W | 0MiB / 32510MiB | 0% Default |
+------
| 1 Tesla V100-SXM2... On | 00000002:00:01.0 Off | 0 |
| N/A 28C P0 39W / 300W | 0MiB / 32510MiB | 0% Default |
+------
+------
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|======
| No running processes found |
+------
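To check the guest-side symptom on several instances without reading each listing by eye, the lspci output can be parsed with a small script. The following is only a sketch: the helper name nvidia_devices is hypothetical, the sample text is copied from the lspci output above, and matching on the string "NVIDIA" is an assumption that fits this report but may not fit other hardware.

```python
import re

# Sample lines copied from the lspci output in this report.
LSPCI_OUTPUT = """\
0000:00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)
0001:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0002:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
"""

# domain:bus:slot.function followed by the device description
LSPCI_LINE = re.compile(r"^([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")


def nvidia_devices(lspci_text):
    """Return the PCI addresses of lines whose description mentions NVIDIA.

    Hypothetical helper for illustration; the "NVIDIA" substring match is
    an assumption, not a general way to identify GPUs.
    """
    found = []
    for line in lspci_text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and "NVIDIA" in match.group(2):
            found.append(match.group(1))
    return found


gpus = nvidia_devices(LSPCI_OUTPUT)
print(len(gpus), gpus)
```

With a single-GPU flavor the list would be expected to contain one address; for the guest in this report it contains two.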
# virsh dumpxml instance-00003268
<domain type='kvm' id='24'>
<name>
<uuid>
<metadata>
<nova:instance xmlns:nova="http://
<nova:package version="18.2.1"/>
<
<
<nova:flavor name="vmaas.
<
<nova:owner>
<nova:user uuid="cc738b417
</nova:owner>
</nova:
</metadata>
<memory unit='KiB'
<currentMemory unit='KiB'
<vcpu placement=
<cputune>
<shares>
</cputune>
<resource>
<partition>
</resource>
<os>
<type arch='ppc64le' machine=
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu mode='host-
<topology sockets='1' cores='8' threads='4'/>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy=
<timer name='rtc' tickpolicy=
</clock>
<on_poweroff>
<on_reboot>
<on_crash>
<devices>
<emulator>
<disk type='network' device='cdrom'>
<driver name='qemu' type='raw' cache='none' discard='unmap'/>
<auth username=
<secret type='ceph' uuid='514c9fca-
</auth>
<source protocol='rbd' name='nova/
<host name='10.0.0.11' port='6789'/>
<host name='10.0.0.12' port='6789'/>
<host name='10.0.0.13' port='6789'/>
</source>
<target dev='sda' bus='scsi'/>
<readonly/>
<alias name='scsi0-
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' discard='unmap'/>
<auth username=
<secret type='ceph' uuid='046a66b2-
</auth>
<source protocol='rbd' name='cinder-
<host name='10.0.0.11' port='6789'/>
<host name='10.0.0.12' port='6789'/>
<host name='10.0.0.13' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
<
<alias name='virtio-
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<controller type='scsi' index='0' model='
<alias name='scsi0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</controller>
<controller type='usb' index='0' model='qemu-xhci'>
<alias name='usb'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<model name='spapr-
<target index='0'/>
<alias name='pci.0'/>
</controller>
<controller type='pci' index='1' model='pci-root'>
<model name='spapr-
<target index='1'/>
<alias name='pci.1'/>
</controller>
<controller type='pci' index='2' model='pci-root'>
<model name='spapr-
<target index='2'/>
<alias name='pci.2'/>
</controller>
<interface type='bridge'>
<mac address=
<source bridge='br-int'/>
<virtualport type='openvswitch'>
<parameters interfaceid=
<
<target dev='tapbb1b7bf
<model type='virtio'/>
<mtu size='9000'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/0'/>
<log file='/
<target type='spapr-
<model name='spapr-vty'/>
</target>
<alias name='serial0'/>
<address type='spapr-vio' reg='0x30000000'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<log file='/
<target type='serial' port='0'/>
<alias name='serial0'/>
<address type='spapr-vio' reg='0x30000000'/>
</console>
<input type='tablet' bus='usb'>
<alias name='input0'/>
<address type='usb' bus='0' port='1'/>
</input>
<input type='keyboard' bus='usb'>
<alias name='input1'/>
<address type='usb' bus='0' port='2'/>
</input>
<input type='mouse' bus='usb'>
<alias name='input2'/>
<address type='usb' bus='0' port='3'/>
</input>
<graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'>
<listen type='address' address='0.0.0.0'/>
</graphics>
<video>
<model type='vga' vram='16384' heads='1' primary='yes'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</video>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0004' bus='0x04' slot='0x00' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0004' bus='0x05' slot='0x00' function='0x0'/>
</source>
<alias name='hostdev1'/>
<address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
</hostdev>
<memballoon model='virtio'>
<stats period='10'/>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memballoon>
<panic model='pseries'/>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'>
<label>
<imagelabel
</seclabel>
<seclabel type='dynamic' model='dac' relabel='yes'>
<label>
<imagelabel
</seclabel>
</domain>
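The host PCI devices passed through to the guest are the `<source><address .../>` entries inside each `<hostdev>` element of the domain XML above. As a rough sketch, they can be extracted with Python's standard xml.etree.ElementTree module. The function name host_pci_addresses is hypothetical, and the sample XML is reduced to the two hostdev elements from this report so that the snippet is self-contained.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the two <hostdev> elements from the dump above.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0004' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0004' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </hostdev>
  </devices>
</domain>
"""


def host_pci_addresses(domain_xml):
    """Return host-side PCI addresses (dddd:bb:ss.f) of passthrough devices.

    Hypothetical helper: only the <source>/<address> child is read, because
    the other <address> element is the guest-side slot.
    """
    root = ET.fromstring(domain_xml)
    addresses = []
    for addr in root.findall("./devices/hostdev[@type='pci']/source/address"):
        domain = int(addr.get("domain"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        addresses.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return addresses


print(host_pci_addresses(DOMAIN_XML))
```

In practice the XML string would come from `virsh dumpxml <instance>` on the compute node rather than from an embedded sample.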
Environment
===========
1. Canonical OpenStack Rocky on ppc64le
dpkg -l | grep nova
ii nova-api-os-compute 2:18.2.
ii nova-common 2:18.2.
ii nova-conductor 2:18.2.
ii nova-consoleauth 2:18.2.
ii nova-novncproxy 2:18.2.
ii nova-placement-api 2:18.2.
ii nova-scheduler 2:18.2.
ii python-novaclient 2:11.0.
ii python3-nova 2:18.2.
2. Which hypervisor did you use?
QEMU-KVM ppc64le version 2.11
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
# dpkg -l | egrep "libvirt|kvm|nova"
ii libvirt-clients 4.0.0-1ubuntu8.13 ppc64el Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.13 ppc64el Virtualization daemon
ii libvirt-
ii libvirt-
ii libvirt0:ppc64el 4.0.0-1ubuntu8.13 ppc64el library for interfacing with different virtualization systems
ii nova-api-metadata 2:18.2.
ii nova-common 2:18.2.
ii nova-compute 2:18.2.
ii nova-compute-kvm 2:18.2.
ii nova-compute-
ii python3-libvirt 4.0.0-1 ppc64el libvirt Python 3 bindings
ii python3-nova 2:18.2.
ii python3-novaclient 2:11.0.
ii qemu-kvm 1:2.11+
Logs & Configs
==============
Logs and configs attached
This is weird: I only see 0004:05:00.0 allocated for GPU passthrough, as tracked in the pci_devices SQL query, but for some reason the guest itself is using both this PCI address and 0004:04:00.0.
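The mismatch described in this comment amounts to a set difference between the addresses Nova tracks as allocated and the addresses libvirt actually attached. A minimal sketch follows. The function name unexpected_passthrough is hypothetical, and both input lists are typed in by hand from this report (the comment above and the hostdev sources in the domain XML), not queried from a live database or hypervisor.

```python
def unexpected_passthrough(allocated, attached):
    """Return addresses attached to the guest that are not tracked as allocated.

    Hypothetical helper for illustration only.
    """
    return sorted(set(attached) - set(allocated))


# Allocation as described in the comment (from the pci_devices query).
allocated_in_db = ["0004:05:00.0"]
# Host addresses from the <hostdev> sources in the domain XML of this report.
attached_in_guest = ["0004:04:00.0", "0004:05:00.0"]

print(unexpected_passthrough(allocated_in_db, attached_in_guest))
```

For the values in this report, the only address in the difference is 0004:04:00.0, which is the device the comment points out as not being tracked.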