[linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-azure (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Fix Released | Medium | Unassigned |
Groovy | Fix Released | Medium | Unassigned |
Bug Description
[Impact]
The kernel logs failure messages after resuming from hibernation on an NV6 (GPU passthrough size) VM in Azure:
[ 1432.153730] hv_pci 47505500-
[ 1432.167910] hv_pci 47505500-
This happens with the latest stable release of the linux-azure kernel (5.4.0-1023.23) and with the latest mainline Linux kernel.
[Test Case]
How reproducible:
100%
Steps to Reproduce:
1. Start a Standard_NV6 VM in Azure and enable hibernation properly (please refer to https:/
E.g. here I create a Generation-1 Ubuntu 20.04 Standard NV6_Promo (6 vcpus, 56 GiB memory) VM in East US 2.
2. Make sure the in-kernel open-source nouveau driver is loaded, or blacklist the nouveau driver and install the official Nvidia GPU driver (please follow https:/
3. Hibernate the VM from the serial console:
# systemctl hibernate
4. After hibernation finishes, start the VM and check dmesg:
# dmesg | grep fail
Actual results:
[ 1432.153730] hv_pci 47505500-
[ 1432.167910] hv_pci 47505500-
In addition, /proc/interrupts shows that the GPU no longer receives interrupts.
Expected results:
No failure messages in the log, and the GPU should still receive interrupts after hibernation.
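The two checks in this test case (failure lines in dmesg, and GPU interrupt counters in /proc/interrupts) can be scripted. The following is only a minimal sketch, not part of the original report: the helper names `count_hv_pci_failures` and `sum_irq_counts` are hypothetical, and the pattern used to find the GPU's lines in /proc/interrupts (for example `nvidia` or `nouveau`) is an assumption that depends on how the loaded driver names its interrupts.

```shell
#!/bin/sh
# Hypothetical helpers for the checks described in the test case.

# Count kernel log lines from hv_pci that mention a failure.
# Reads log text from the given file(s), or from stdin when none is given.
count_hv_pci_failures() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c -E 'hv_pci .*fail' "$@" || true
}

# Sum the per-CPU counters of every interrupt line matching PATTERN.
# Usage: sum_irq_counts PATTERN [FILE]   (FILE defaults to /proc/interrupts)
sum_irq_counts() {
    awk -v pat="$1" '
        $0 ~ pat {
            # Field 1 is the IRQ label; the per-CPU counters follow it.
            # Stop at the first non-numeric field (the controller name).
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) break
                total += $i
            }
        }
        END { printf "%.0f\n", total }
    ' "${2:-/proc/interrupts}"
}
```

After resume, `dmesg | count_hv_pci_failures` should print 0 on a fixed kernel. Calling `sum_irq_counts nvidia` twice, a few seconds apart while the GPU is busy, should give a larger value the second time; an unchanged value matches the broken behaviour described above.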
[Regression Potential]
The fix touches the pci-hyperv driver and could affect the Hyper-V guest drivers. However, the change is confined to the execution path used for hibernation, which is not yet officially supported.
[Other info]
BUG FIX:
I made a fix here: https:/
Without the patch, we see the error "hv_pci 47505500-
With the patch, we should no longer see the error, and the GPU driver should still receive interrupts after hibernation.
Changed in linux-azure (Ubuntu Focal):
status: New → In Progress
Changed in linux-azure (Ubuntu Groovy):
status: New → Fix Committed
status: Fix Committed → In Progress
description: updated
Changed in linux-azure (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux-azure (Ubuntu Groovy):
importance: Undecided → Medium
Changed in linux-azure (Ubuntu):
status: New → Invalid
Changed in linux-azure (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu Groovy):
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu Groovy):
status: Fix Committed → Fix Released
The fix is in the PCI tree now:
"PCI: hv: Fix hibernation in case interrupts are not re-created" (https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=915cff7f38c5e4d47f187f8049245afc2cb3e503)
https:/