2023-02-23 01:24:12 |
Andy Wu |
description |
Tested this on a node with an Nvidia Tesla A10 card, using vGPU software: nvidia-vgpu-ubuntu-525_525.85.07_amd64.deb
Channel: yoga/stable
OS: jammy
After attaching the vGPU driver to nova-compute-nvidia-vgpu and rebooting the node, the nova-compute-nvidia-vgpu unit is active with status: Unit is ready: NVIDIA GPU found; installed NVIDIA software: 525.85.07
Running nvidia-smi on the node confirms that the driver is installed successfully:
ubuntu@ps6-rb2-n1:~$ sudo nvidia-smi
Thu Feb 23 01:20:24 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.07    Driver Version: 525.85.07    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:25:00.0 Off |                    0 |
|  0%   33C    P8    22W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
However, running the list-vgpu-types action on the unit returns an empty output:
ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
unit-nova-compute-nvidia-vgpu-5:
UnitId: nova-compute-nvidia-vgpu/5
id: "346"
results:
output: ""
status: completed
On the node, the GPU card's PCI bus address is 25:00.0:
ubuntu@ps6-rb2-n1:~$ lspci -nn | grep -i Nvidia
25:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)
But no virtual functions have been created:
cd /sys/bus/pci/devices/0000\:25\:00.0/
ls | grep virtfn
I need to create the virtual functions manually:
/usr/lib/nvidia/sriov-manage -e 0000:25:00.0
After that, I can see the virtual functions:
ls | grep virtfn
virtfn0
virtfn1
virtfn10
virtfn11
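The manual check and workaround above could be scripted roughly as follows. This is only a sketch, not something the charm provides: the helper names count_virtfns and ensure_virtfns and the SYSFS_PCI_ROOT and SRIOV_MANAGE override variables are hypothetical, and the sriov-manage path and -e flag are the ones used in this report.

```shell
#!/bin/sh
# Hypothetical helper: count the virtfn* entries under a PCI device directory.
count_virtfns() {
    dev_dir=$1
    count=0
    for entry in "$dev_dir"/virtfn*; do
        # An unmatched glob stays literal, so confirm the entry really exists.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Hypothetical helper: enable virtual functions only when none exist yet,
# then print how many are present.
# Assumption: sriov-manage lives at the path used in this report.
ensure_virtfns() {
    pci_addr=$1
    dev_dir="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}/$pci_addr"
    if [ "$(count_virtfns "$dev_dir")" -eq 0 ]; then
        "${SRIOV_MANAGE:-/usr/lib/nvidia/sriov-manage}" -e "$pci_addr"
    fi
    count_virtfns "$dev_dir"
}
```

After sourcing these functions as root, ensure_virtfns 0000:25:00.0 would call sriov-manage only if no virtfn entries are present for that device.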
Re-running list-vgpu-types now returns the vGPU types:
ubuntu@ps6-infra1:~$ juju run-action --wait nova-compute-nvidia-vgpu/5 list-vgpu-types
unit-nova-compute-nvidia-vgpu-5:
UnitId: nova-compute-nvidia-vgpu/5
id: "348"
results:
output: |-
nvidia-588, 0000:25:02.3, NVIDIA A10-1B, num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-589, 0000:25:02.3, NVIDIA A10-2B, num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-590, 0000:25:02.3, NVIDIA A10-1Q, num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-591, 0000:25:02.3, NVIDIA A10-2Q, num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12 |
|