GPU Driver extension issue (NVIDIA)

Bug #1866407 reported by Joseph Salisbury
This bug affects 1 person
Affects: walinuxagent (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The Nvidia GPU driver cannot be installed for VM size Standard_NC6 but it is correctly installed for Standard_NV6. This is happening on Ubuntu Server 18.04 LTS.

To repro this issue:

1. Create a Virtual Machine with image Ubuntu Server 18.04 LTS and size Standard_NC6

2. Add extension NvidiaGpuDriver and wait for it to fail

3. Connect to the VM and try to install the following package: sudo apt install -y xubuntu-desktop

4. The installation fails, and apt suggests running apt --fix-broken install, which does not work either.
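
For anyone who wants to script steps 1 and 2, here is a minimal sketch using the Azure CLI. The resource group and VM names are placeholders, the image URN is an assumption for "Ubuntu Server 18.04 LTS", and the extension publisher and name are inferred from the extension directory quoted in the workaround comment, so adjust as needed. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch of repro steps 1 and 2 using the Azure CLI.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: repro_steps RESOURCE_GROUP VM_NAME   (both names are placeholders)
repro_steps() {
    rg="$1"
    vm="$2"
    # Step 1: Ubuntu Server 18.04 LTS VM of size Standard_NC6.
    # The image URN below is an assumption, not taken from the report.
    run az vm create --resource-group "$rg" --name "$vm" \
        --image Canonical:UbuntuServer:18.04-LTS:latest \
        --size Standard_NC6 --generate-ssh-keys || return 1
    # Step 2: add the GPU driver extension (expected to fail per this report).
    run az vm extension set --resource-group "$rg" --vm-name "$vm" \
        --publisher Microsoft.HpcCompute --name NvidiaGpuDriverLinux
}
```

Calling `repro_steps my-rg my-nc6-vm` prints the two `az` commands for review before anything is created.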

It should work according to this doc: https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-Linux

This may be related to an older bug for an older Nvidia version: bug 1753796

Joseph Salisbury (jsalisbury) wrote :

The following can be used as a workaround:

For new VM:
Build the VM without the Nvidia extension
sudo apt-get -o Dpkg::Options::="--force-overwrite" install -y nvidia-440
Apply the Nvidia extension for Linux to the VM
Continue with setup

For existing VM with failed installation:
sudo apt-get -o Dpkg::Options::="--force-overwrite" install -y nvidia-440
sudo /var/lib/waagent/Microsoft.HpcCompute.NvidiaGpuDriverLinux-1.2.0.0/scripts/enable.sh
Reboot
Continue with remaining setup
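
The "existing VM" steps above can be wrapped in a small script. This is only a sketch: the extension directory is copied from the comment above and its version suffix may differ on other VMs, and the DRY_RUN switch and function names are my own. By default it prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the "existing VM" workaround. DRY_RUN=1 (default) only prints.
DRY_RUN="${DRY_RUN:-1}"
# Extension directory as quoted in the workaround; the version suffix
# (1.2.0.0) may be different on another VM.
EXT_DIR="${EXT_DIR:-/var/lib/waagent/Microsoft.HpcCompute.NvidiaGpuDriverLinux-1.2.0.0}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

workaround_existing_vm() {
    # Let dpkg overwrite the file that both packages ship.
    run sudo apt-get -o Dpkg::Options::="--force-overwrite" install -y nvidia-440 || return 1
    # Re-run the extension's enable script, then reboot.
    run sudo "$EXT_DIR/scripts/enable.sh" || return 1
    run sudo reboot
}
```

Run `workaround_existing_vm` once to review the printed commands, then again with `DRY_RUN=0` to apply them.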

Chris Newcomer (cnewcomer) wrote :

I found that the walinuxagent extension installs the 16.04 CUDA drivers on Ubuntu 18.04. The driver packaging changed with the inclusion of libglx-mesa0 in Ubuntu 18.04, which was not present in 16.04.

This causes the nvidia-driver-* package to fail to install. The nvidia-driver-* package should ship the "/usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0" library on 16.04, but not on 18.04, where the libglx-mesa0 package already provides it.
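
To confirm a file conflict like this on an affected VM, one option is to pull the file and its current owner out of dpkg's "trying to overwrite" error. The helper below is a sketch: the function name is mine, and it assumes dpkg's English message format, so run apt with LC_ALL=C.

```shell
#!/bin/sh
# Extract the conflicting file and the package that already owns it from
# dpkg "trying to overwrite" error lines read on stdin.
# Prints one "FILE OWNER" pair per matching line; other lines are ignored.
parse_overwrite_conflicts() {
    sed -n "s/.*trying to overwrite '\([^']*\)', which is also in package \([^ ]*\).*/\1 \2/p"
}
```

For example, `sudo LC_ALL=C apt-get install -y nvidia-440 2>&1 | parse_overwrite_conflicts` would list each clashing file next to its owning package. `dpkg -S /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0` answers the same ownership question directly for a single file.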

If you install the 18.04 Nvidia drivers using the following instructions, the installation succeeds:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork

affects: linux-azure (Ubuntu) → walinuxagent (Ubuntu)
Chris Newcomer (cnewcomer) wrote :

Adding repository files from each instance

Chris Newcomer (cnewcomer) wrote :
Changed in walinuxagent (Ubuntu):
status: New → Confirmed