Activity log for bug #2008146

Date Who What changed Old value New value Message
2023-02-23 01:22:55 Andy Wu bug added bug
2023-02-23 01:24:12 Andy Wu description

Old value:

Tested this on a node with an Nvidia Tesla A10 card.
vGPU software: nvidia-vgpu-ubuntu-525_525.85.07_amd64.deb
channel: yoga/stable
OS: jammy

After attaching the vGPU driver to nova-compute-nvidia-vgpu and rebooting the node, the nova-compute-nvidia-vgpu unit is active with status:
    Unit is ready: NVIDIA GPU found; installed NVIDIA software: 525.85.07

Executing nvidia-smi on the node confirms that the driver is installed successfully:

    ubuntu@ps6-rb2-n1:~$ sudo nvidia-smi
    Thu Feb 23 01:20:24 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.85.07    Driver Version: 525.85.07    CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A10          On   | 00000000:25:00.0 Off |                    0 |
    |  0%   33C    P8    22W / 150W |      0MiB / 23028MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

However, juju run-action --wait nova-compute-nvidia-vgpu/0 list-vgpu-types does not return anything:

    ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
    unit-nova-compute-nvidia-vgpu-5:
      UnitId: nova-compute-nvidia-vgpu/5
      id: "346"
      results:
        output: ""
      status: completed

Inside the node, the GPU card bus info is 25:00.0:

    ubuntu@ps6-rb2-n1:~$ lspci -nn | grep -i Nvidia
    25:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)

But no virtual functions are created:

    cd /sys/bus/pci/devices/0000\:25\:00.0/
    ls | grep virtfn

I need to create the virtual functions manually:

    /usr/lib/nvidia/sriov-manage -e 0000:25:00.0

After that I can see the virtual functions:

    ls | grep virtfn
    virtfn0
    virtfn1
    virtfn10
    virtfn11

Re-running list-vgpu-types:

    ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
    unit-nova-compute-nvidia-vgpu-5:
      UnitId: nova-compute-nvidia-vgpu/5
      id: "348"
      results:
        output: |-
          nvidia-588, 0000:25:02.3, NVIDIA A10-1B, num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
          nvidia-589, 0000:25:02.3, NVIDIA A10-2B, num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
          nvidia-590, 0000:25:02.3, NVIDIA A10-1Q, num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
          nvidia-591, 0000:25:02.3, NVIDIA A10-2Q, num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12

New value:

Tested this on a node with an Nvidia Tesla A10 card.
vGPU software: nvidia-vgpu-ubuntu-525_525.85.07_amd64.deb
channel: yoga/stable
OS: jammy

After attaching the vGPU driver to nova-compute-nvidia-vgpu and rebooting the node, the nova-compute-nvidia-vgpu unit is active with status:
    Unit is ready: NVIDIA GPU found; installed NVIDIA software: 525.85.07

Executing nvidia-smi on the node confirms that the driver is installed successfully.

However, juju run-action --wait nova-compute-nvidia-vgpu/0 list-vgpu-types does not return anything:

    ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
    unit-nova-compute-nvidia-vgpu-5:
      UnitId: nova-compute-nvidia-vgpu/5
      id: "346"
      results:
        output: ""
      status: completed

Inside the node, the GPU card bus info is 25:00.0:

    ubuntu@ps6-rb2-n1:~$ lspci -nn | grep -i Nvidia
    25:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)

But no virtual functions are created:

    cd /sys/bus/pci/devices/0000\:25\:00.0/
    ls | grep virtfn

I need to create the virtual functions manually:

    /usr/lib/nvidia/sriov-manage -e 0000:25:00.0

After that I can see the virtual functions:

    ls | grep virtfn
    virtfn0
    virtfn1
    virtfn10
    virtfn11

Re-running list-vgpu-types:

    ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
    unit-nova-compute-nvidia-vgpu-5:
      UnitId: nova-compute-nvidia-vgpu/5
      id: "348"
      results:
        output: |-
          nvidia-588, 0000:25:02.3, NVIDIA A10-1B, num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
          nvidia-589, 0000:25:02.3, NVIDIA A10-2B, num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
          nvidia-590, 0000:25:02.3, NVIDIA A10-1Q, num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
          nvidia-591, 0000:25:02.3, NVIDIA A10-2Q, num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
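The manual check in the description entry above (look for virtfn entries under the device's sysfs directory, and fall back to sriov-manage when there are none) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the charm's code: the helper names list_virtual_functions and sriov_enable_command are hypothetical, and the sketch only builds the sriov-manage command reported in the bug rather than running it.

```python
from pathlib import Path

# Path reported in the bug description; it may differ with other driver packages.
SRIOV_MANAGE = "/usr/lib/nvidia/sriov-manage"


def list_virtual_functions(device_dir):
    """Return the virtfn* entry names found in a PCI device's sysfs directory.

    Equivalent in spirit to `ls | grep virtfn` run inside
    /sys/bus/pci/devices/<address>/. Names are ordered by their numeric suffix.
    """
    device_dir = Path(device_dir)
    if not device_dir.is_dir():
        return []
    names = [p.name for p in device_dir.iterdir() if p.name.startswith("virtfn")]

    def suffix(name):
        tail = name[len("virtfn"):]
        return int(tail) if tail.isdigit() else -1

    return sorted(names, key=suffix)


def sriov_enable_command(pci_address):
    """Build (but do not run) the command used in the bug to create the VFs."""
    return [SRIOV_MANAGE, "-e", pci_address]


if __name__ == "__main__":
    address = "0000:25:00.0"  # bus address of the A10 card from the report
    found = list_virtual_functions(Path("/sys/bus/pci/devices") / address)
    if found:
        print("virtual functions present:", ", ".join(found))
    else:
        print("no virtual functions; manual workaround:",
              " ".join(sriov_enable_command(address)))
```

On the node from the report, an empty result corresponds to the state in which list-vgpu-types returned no output; the printed command is the manual workaround and needs root privileges to run.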
2023-02-23 01:26:27 Andy Wu summary charm does not create vgpu functions charm does not create gpu virtual functions
2023-02-23 01:28:55 Andy Wu summary charm does not create gpu virtual functions can not list vgpu types
2023-02-23 01:29:28 Andy Wu summary can not list vgpu types charm does not create gpu virtual functions
2023-02-23 08:39:23 Alex Kavanagh charm-nova-compute-nvidia-vgpu: status New Incomplete
2023-02-23 15:58:39 Andy Wu attachment added LP2008146-log.tar https://bugs.launchpad.net/charm-nova-compute-nvidia-vgpu/+bug/2008146/+attachment/5649606/+files/LP2008146-log.tar
2023-02-23 17:19:57 Andy Wu charm-nova-compute-nvidia-vgpu: status Incomplete New
2023-02-24 10:15:29 Alex Kavanagh summary charm does not create gpu virtual functions Charm doesn't initialise the driver fully on nvidia-gpu versions >= 11.0 (was: charm does not create gpu virtual functions)
2023-02-24 10:15:33 Alex Kavanagh charm-nova-compute-nvidia-vgpu: status New Triaged
2023-02-24 10:15:37 Alex Kavanagh charm-nova-compute-nvidia-vgpu: importance Undecided Medium
2023-03-02 08:55:09 DUFOUR Olivier bug added subscriber Canonical Field Critical
2023-03-08 15:49:21 Junien F bug added subscriber The Canonical Sysadmins
2023-03-08 15:49:25 Junien F bug added subscriber Junien Fridrick
2023-03-10 23:09:26 Billy Olsen summary Charm doesn't initialise the driver fully on nvidia-gpu versions >= 11.0 (was: charm does not create gpu virtual functions) Charm doesn't initialise SRIOV gpu devices on nvidia-gpu versions >= 11.0 (was: charm does not create gpu virtual functions)
2023-03-11 00:39:27 Nobuto Murata bug added subscriber Nobuto Murata