Comment 108 for bug 1803179

In , josh.farwell (josh.farwell-linux-kernel-bugs) wrote :

Karol, I tried your kernel patch and the results were very promising. No more kernel lockups!!

My computer is a Gigabyte Aero 15X v8 with an i7-8750H and a GTX 1070. I am running Fedora 28 with kernel version 4.18.

I have been experiencing hard kernel lockups when nouveau is loaded and the GPU has entered the D3 power state. Running `lspci` or trying to suspend the machine locks it up, as do other programs such as the Power Manager in GNOME or the Steam client. Trying to unload the nouveau driver once it has been loaded also results in a kernel lockup. I can use the acpi_osi="Windows 2009" workaround, but then the nouveau driver never seems to put the card into the low power state.

My use case is infrequent CUDA and gaming, so I want to use the proprietary drivers when I need them and turn the card off when I don't. I am trying to use nouveau as a workaround to power off the card, as the older methods (bbswitch) also give me kernel lockups. I am using the current draw from tlp-stat to figure out whether the card is on or off. Luckily, it draws almost an amp(!) so it's easy to tell.
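For anyone who wants to watch the current draw without running tlp-stat by hand, here is a minimal sketch that reads the battery current straight from sysfs. It assumes the battery shows up as `BAT0` and exposes `current_now` in microamps; some machines only expose `power_now`, in which case this will not work as written. The 0.5 A threshold is a made-up value for illustration, not something I measured.

```python
from pathlib import Path

# Assumption: the battery is exposed as BAT0 and provides current_now (microamps).
BATTERY_DIR = Path("/sys/class/power_supply/BAT0")


def read_current_amps(battery_dir=BATTERY_DIR):
    """Return the battery current in amps, read from the sysfs current_now file."""
    raw = (Path(battery_dir) / "current_now").read_text().strip()
    # The sign convention differs between drivers, so only the magnitude is used.
    return abs(int(raw)) / 1_000_000


def gpu_probably_on(baseline_amps, now_amps, delta_amps=0.5):
    """Guess that the GPU is powered when draw exceeds the baseline by delta_amps.

    delta_amps is a hypothetical threshold; choose one from your own readings.
    """
    return (now_amps - baseline_amps) >= delta_amps
```

Take a baseline reading with the card known to be off, then compare later readings against it with `gpu_probably_on(baseline, read_current_amps())`. This only makes sense on battery power with an otherwise idle system.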

With the PCIe link speed patch applied to nouveau, the kernel lockup issues disappear under certain conditions. If I load nouveau during boot and run X on the Intel GPU, the NVIDIA card never powers down when it isn't in use, and xorg-x11-drv-nouveau reports a crash after a while. However, if I load nouveau *after* X has started up, it does power down the card and seems to be stable.

Suspend and resume works. Unloading the nouveau module works. Running lspci works.

I am getting some interesting results when I run lspci. Instead of a hard kernel lockup, lspci pauses and "thinks" for a moment, which coincides with an increase in current draw. This suggests to me that the card is turning back on when something tries to get a response from it. After some seconds, nouveau turns the card off again.
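Since lspci itself seems to wake the card, a gentler way to check its state is the kernel's runtime power management status in sysfs. The sketch below assumes the GPU sits at PCI address `0000:01:00.0`, which is common for laptop discrete GPUs but is only a guess for this machine. As far as I know, reading `power/runtime_status` does not touch the device's config space and so should not wake it, but I have not verified that on this hardware. Also note that "suspended" means runtime-suspended, which is not necessarily the same as a specific D3 substate.

```python
from pathlib import Path

PCI_DEVICES = Path("/sys/bus/pci/devices")


def runtime_status(pci_addr, devices_dir=PCI_DEVICES):
    """Return the runtime PM status string, e.g. 'active' or 'suspended'."""
    status_file = Path(devices_dir) / pci_addr / "power" / "runtime_status"
    return status_file.read_text().strip()


def is_suspended(pci_addr, devices_dir=PCI_DEVICES):
    """True when the kernel reports the device as runtime-suspended."""
    return runtime_status(pci_addr, devices_dir) == "suspended"
```

For example, `is_suspended("0000:01:00.0")` (hypothetical address) can be polled alongside the current draw to see whether the two agree.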

I can dynamically load and unload both the nvidia module and nouveau, which makes this a suitable workaround for me. I am curious whether setting the link speed to 8.0 would make bbswitch work, and may try it as an experiment.
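The module switching can be sketched roughly as follows. This is not the exact procedure I use, just an illustration: it parses the contents of `/proc/modules` and returns the `modprobe` commands needed to go from one driver to the other. It deliberately only builds the commands and does not run them. It also assumes the proprietary driver's helper modules (such as `nvidia_drm` or `nvidia_uvm`), if loaded, have already been removed, and that nothing is still using the GPU.

```python
def loaded_modules(proc_modules_text):
    """Parse the text of /proc/modules into a set of module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def switch_commands(loaded, target):
    """Return modprobe argv lists needed to end up with only `target` loaded."""
    if target not in ("nouveau", "nvidia"):
        raise ValueError(f"unsupported driver: {target}")
    other = "nvidia" if target == "nouveau" else "nouveau"
    commands = []
    if other in loaded:
        # Unload the competing driver first.
        commands.append(["modprobe", "-r", other])
    if target not in loaded:
        commands.append(["modprobe", target])
    return commands
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` as root, for example after calling `loaded_modules(open("/proc/modules").read())`.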