The following errors are logged after resuming from hibernation on an NV6 (GPU passthrough size) VM in Azure:
[ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
[ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
This happens with both the latest stable release of the linux-azure kernel (5.4.0-1023.23) and the latest mainline Linux kernel.
How reproducible:
100%
Steps to Reproduce:
1. Start a Standard_NV6 VM in Azure and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14).
E.g. here I create a Generation-1 Ubuntu 20.04 Standard NV6_Promo (6 vcpus, 56 GiB memory) VM in East US 2.
2. Make sure the in-kernel open-source nouveau driver is loaded, or blacklist the nouveau driver and install the official Nvidia GPU driver (please follow https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup : "Install GRID drivers on NV or NVv3-series VMs"; the most important step is to run "./NVIDIA-Linux-x86_64-grid.run").
3. Run hibernation from serial console
# systemctl hibernate
4. After hibernation finishes, start VM and check dmesg
# dmesg|grep fail
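The check in step 4 can be narrowed to the specific message from this report. A minimal sketch, assuming a POSIX shell; the helper name count_unmask_failures is hypothetical and exists only for illustration:

```shell
# Count hv_irq_unmask() failure lines in kernel log text read from stdin.
# count_unmask_failures is a hypothetical helper name, not part of any tool.
count_unmask_failures() {
    # -F matches the string literally; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function's exit status clean while the count (0) is still printed.
    grep -c -F 'hv_irq_unmask() failed' || true
}

# Usage on the VM (as root):
#   dmesg | count_unmask_failures
```

A non-zero count after resume indicates that the problem described here was hit.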
Actual results:
[ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
[ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
And /proc/interrupts shows that the GPU interrupts are no longer happening.
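One way to confirm that the interrupts have stopped is to total the per-CPU counts for the GPU driver's lines and compare the totals over time. This is a rough sketch, assuming a POSIX shell with awk; sum_irqs is a hypothetical helper, and the label "nvidia" (or "nouveau") is an assumption about how the driver appears in /proc/interrupts on a given VM:

```shell
# Sum the per-CPU counts of every line containing a given name in a
# /proc/interrupts-style file. sum_irqs is a hypothetical helper name.
#   $1: literal substring to match (assumed driver label, e.g. nvidia)
#   $2: file to read (defaults to /proc/interrupts)
sum_irqs() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }     # header row: one column per CPU
        index($0, name) {               # literal substring match
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { print total + 0 }         # print 0 when nothing matched
    ' "${2:-/proc/interrupts}"
}

# Usage on the VM:
#   before=$(sum_irqs nvidia); sleep 10; after=$(sum_irqs nvidia)
#   echo "GPU interrupts in 10s: $((after - before))"
```

If the total does not increase after resume while the GPU is in use, the driver is no longer receiving interrupts.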
Expected results:
No failure messages in dmesg, and GPU interrupts should still be delivered after hibernation.
BUG FIX:
I made a fix here: https://lkml.org/lkml/2020/9/4/1268.
Without the patch, we see the error "hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5" during hibernation when the VM has the Nvidia GPU driver loaded, and after hibernation the GPU driver can no longer receive any MSI/MSI-X interrupts when we check /proc/interrupts.
With the patch, we should no longer see the error, and the GPU driver should still receive interrupts after hibernation.