Nvidia driver (proprietary) fails to load after suspend

Bug #1712384 reported by Brain
This bug affects 4 people
Affects: nvidia-graphics-drivers-375 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm running Ubuntu 16.04.3 on an Asus UX501JW. The laptop uses NVIDIA Optimus (Intel HD4600 and NVIDIA GM107M (GeForce GTX 960M)). The latest nvidia-384 driver and bumblebee are installed.

After boot-up everything works fine until the laptop is suspended by closing the lid and then woken up. After that, the syslog is flooded with the errors below and the battery starts draining at 20 W, as the kernel keeps trying to reload the failing module. A reboot "cures" it (until the next suspend).

Interestingly, this does not happen when suspending via the keyboard "Zzz" button.

This error appeared recently, one or two months ago.
I have been using this setup for a few years without this problem.

The same problem happens with nvidia-375 and older drivers.

I would greatly appreciate any help in shedding some light on this.

Aug 22 17:26:26 glorybook kernel: [46572.508501] pci_bus 0000:02: Allocating resources
Aug 22 17:26:26 glorybook kernel: [46572.508514] pci_bus 0000:3b: Allocating resources
Aug 22 17:26:26 glorybook kernel: [46572.508540] pci_bus 0000:3c: Allocating resources
Aug 22 17:26:26 glorybook kernel: [46572.508633] pci_bus 0000:3d: Allocating resources
Aug 22 17:26:26 glorybook kernel: [46572.710716] pcieport 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
Aug 22 17:26:26 glorybook kernel: [46572.710795] pci_bus 0000:3f: busn_res: can not insert [bus 3f-ff] under [bus 00-7e] (conflicts with (null) [bus 00-7e])
Aug 22 17:26:26 glorybook kernel: [46572.710813] pci 0000:3f:00.0: [10de:139b] type 00 class 0x030200
Aug 22 17:26:26 glorybook kernel: [46572.710830] pci 0000:3f:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
Aug 22 17:26:26 glorybook kernel: [46572.710844] pci 0000:3f:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
Aug 22 17:26:26 glorybook kernel: [46572.710858] pci 0000:3f:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
Aug 22 17:26:26 glorybook kernel: [46572.710865] pci 0000:3f:00.0: reg 0x24: [io 0x0000-0x007f]
Aug 22 17:26:26 glorybook kernel: [46572.710874] pci 0000:3f:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
Aug 22 17:26:26 glorybook kernel: [46572.710882] pci 0000:3f:00.0: Max Payload Size set to 256 (was 128, max 256)
Aug 22 17:26:26 glorybook kernel: [46572.711040] pci 0000:3f:00.0: System wakeup disabled by ACPI
Aug 22 17:26:26 glorybook kernel: [46572.711094] pcieport 0000:00:01.0: PCI bridge to [bus 3f-ff]
Aug 22 17:26:26 glorybook kernel: [46572.711101] pci_bus 0000:3f: busn_res: [bus 3f-ff] end is updated to 3f
Aug 22 17:26:26 glorybook kernel: [46572.711104] pci_bus 0000:3f: Allocating resources
Aug 22 17:26:26 glorybook kernel: [46572.711123] pci 0000:3f:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
Aug 22 17:26:26 glorybook kernel: [46572.711125] pci 0000:3f:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
Aug 22 17:26:26 glorybook kernel: [46572.711128] pci 0000:3f:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
Aug 22 17:26:26 glorybook kernel: [46572.711129] pci 0000:3f:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
Aug 22 17:26:26 glorybook kernel: [46572.711131] pci 0000:3f:00.0: BAR 0: no space for [mem size 0x01000000]
Aug 22 17:26:26 glorybook kernel: [46572.711133] pci 0000:3f:00.0: BAR 0: failed to assign [mem size 0x01000000]
Aug 22 17:26:26 glorybook kernel: [46572.711135] pci 0000:3f:00.0: BAR 6: assigned [mem 0xed080000-0xed0fffff pref]
Aug 22 17:26:26 glorybook kernel: [46572.711137] pci 0000:3f:00.0: BAR 5: assigned [io 0xe080-0xe0ff]
Aug 22 17:26:26 glorybook kernel: [46572.738219] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Aug 22 17:26:26 glorybook kernel: [46572.739214] NVRM: This is a 64-bit BAR mapped above 4GB by the system
Aug 22 17:26:26 glorybook kernel: [46572.739214] NVRM: BIOS or the Linux kernel, but the PCI bridge
Aug 22 17:26:26 glorybook kernel: [46572.739214] NVRM: immediately upstream of this GPU does not define
Aug 22 17:26:26 glorybook kernel: [46572.739214] NVRM: a matching prefetchable memory window.
Aug 22 17:26:26 glorybook kernel: [46572.739216] NVRM: This may be due to a known Linux kernel bug. Please
Aug 22 17:26:26 glorybook kernel: [46572.739216] NVRM: see the README section on 64-bit BARs for additional
Aug 22 17:26:26 glorybook kernel: [46572.739216] NVRM: information.
Aug 22 17:26:26 glorybook kernel: [46572.739224] nvidia: probe of 0000:01:00.0 failed with error -1
Aug 22 17:26:26 glorybook kernel: [46572.739246] nvidia 0000:3f:00.0: enabling device (0000 -> 0001)
Aug 22 17:26:26 glorybook kernel: [46572.739445] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Aug 22 17:26:26 glorybook kernel: [46572.739445] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:3f:00.0)
Aug 22 17:26:26 glorybook kernel: [46572.739446] NVRM: The system BIOS may have misconfigured your GPU.
Aug 22 17:26:26 glorybook kernel: [46572.739449] nvidia: probe of 0000:3f:00.0 failed with error -1
Aug 22 17:26:26 glorybook kernel: [46572.739490] NVRM: The NVIDIA probe routine failed for 2 device(s).
Aug 22 17:26:26 glorybook kernel: [46572.739490] NVRM: None of the NVIDIA graphics adapters were initialized!
Aug 22 17:26:26 glorybook kernel: [46572.739667] nvidia-nvlink: Unregistered the Nvlink Core, major device number 243
Aug 22 17:26:26 glorybook systemd[1]: Starting NVIDIA Persistence Daemon...
Aug 22 17:26:26 glorybook systemd[11864]: nvidia-persistenced.service: Failed at step EXEC spawning /usr/bin/nvidia-persistenced: No such file or directory
Aug 22 17:26:26 glorybook systemd[1]: nvidia-persistenced.service: Control process exited, code=exited status=203
Aug 22 17:26:26 glorybook systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 22 17:26:26 glorybook systemd[1]: nvidia-persistenced.service: Unit entered failed state.
Aug 22 17:26:26 glorybook systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Aug 22 17:26:26 glorybook kernel: [46572.760868] nvidia_modeset: Unknown symbol nvidia_register_module (err -2)
Aug 22 17:26:26 glorybook kernel: [46572.760882] nvidia_modeset: Unknown symbol nvidia_get_rm_ops (err -2)
Aug 22 17:26:26 glorybook kernel: [46572.760891] nvidia_modeset: Unknown symbol nvidia_unregister_module (err -2)
Aug 22 17:26:26 glorybook systemd-udevd[11859]: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
Aug 22 17:26:26 glorybook kernel: [46572.831578] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Aug 22 17:26:26 glorybook kernel: [46572.831837] NVRM: This is a 64-bit BAR mapped above 4GB by the system
Aug 22 17:26:26 glorybook kernel: [46572.831837] NVRM: BIOS or the Linux kernel, but the PCI bridge
Aug 22 17:26:26 glorybook kernel: [46572.831837] NVRM: immediately upstream of this GPU does not define
Aug 22 17:26:26 glorybook kernel: [46572.831837] NVRM: a matching prefetchable memory window.
Aug 22 17:26:26 glorybook kernel: [46572.831838] NVRM: This may be due to a known Linux kernel bug. Please
Aug 22 17:26:26 glorybook kernel: [46572.831838] NVRM: see the README section on 64-bit BARs for additional
Aug 22 17:26:26 glorybook kernel: [46572.831838] NVRM: information.
Aug 22 17:26:26 glorybook kernel: [46572.831843] nvidia: probe of 0000:01:00.0 failed with error -1
Aug 22 17:26:26 glorybook kernel: [46572.831853] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Aug 22 17:26:26 glorybook kernel: [46572.831853] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:3f:00.0)
Aug 22 17:26:26 glorybook kernel: [46572.831853] NVRM: The system BIOS may have misconfigured your GPU.
Aug 22 17:26:26 glorybook kernel: [46572.831855] nvidia: probe of 0000:3f:00.0 failed with error -1
Aug 22 17:26:26 glorybook kernel: [46572.831881] NVRM: The NVIDIA probe routine failed for 2 device(s).
Aug 22 17:26:26 glorybook kernel: [46572.831882] NVRM: None of the NVIDIA graphics adapters were initialized!
Aug 22 17:26:26 glorybook kernel: [46572.831999] nvidia-nvlink: Unregistered the Nvlink Core, major device number 243
Aug 22 17:26:26 glorybook systemd-udevd[11859]: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
Aug 22 17:26:26 glorybook kernel: [46572.899907] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Aug 22 17:26:26 glorybook kernel: [46572.900188] NVRM: This is a 64-bit BAR mapped above 4GB by the system
Aug 22 17:26:26 glorybook kernel: [46572.900188] NVRM: BIOS or the Linux kernel, but the PCI bridge
Aug 22 17:26:26 glorybook kernel: [46572.900188] NVRM: immediately upstream of this GPU does not define
Aug 22 17:26:26 glorybook kernel: [46572.900188] NVRM: a matching prefetchable memory window.
Aug 22 17:26:26 glorybook kernel: [46572.900189] NVRM: This may be due to a known Linux kernel bug. Please
Aug 22 17:26:26 glorybook kernel: [46572.900189] NVRM: see the README section on 64-bit BARs for additional
Aug 22 17:26:26 glorybook kernel: [46572.900189] NVRM: information.

Version.log:
Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17

Lspci output (after the error happened) attached.
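
As a rough aid for reading the log above, the failed BAR assignments can be pulled out and summarized per device. The following is only a minimal Python sketch: the function name `failed_bars` and the regular expression are illustrative assumptions based on the message format seen in this log, not part of any driver or kernel tool.

```python
import re
from collections import defaultdict

# Matches kernel messages of the form seen in this report, e.g.:
#   pci 0000:3f:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
# (pattern is an assumption derived from the log lines above)
BAR_FAIL = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): failed to assign "
    r"\[(?P<kind>mem|io) size (?P<size>0x[0-9a-f]+)"
)


def failed_bars(lines):
    """Return {device: {bar_index: size_in_bytes}} for failed BAR assignments."""
    result = defaultdict(dict)
    for line in lines:
        match = BAR_FAIL.search(line)
        if match:
            size = int(match.group("size"), 16)
            result[match.group("dev")][int(match.group("bar"))] = size
    return {dev: dict(bars) for dev, bars in result.items()}


# Three lines copied from the syslog excerpt in this report.
sample = [
    "Aug 22 17:26:26 glorybook kernel: [46572.711125] pci 0000:3f:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]",
    "Aug 22 17:26:26 glorybook kernel: [46572.711129] pci 0000:3f:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]",
    "Aug 22 17:26:26 glorybook kernel: [46572.711133] pci 0000:3f:00.0: BAR 0: failed to assign [mem size 0x01000000]",
]

for dev, bars in failed_bars(sample).items():
    for bar, size in sorted(bars.items()):
        print(f"{dev} BAR {bar}: {size // (1024 * 1024)} MiB unassigned")
```

To run it against a full log, the lines of a file such as /var/log/syslog could be passed in place of `sample`.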

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1712384

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Brain (brain) wrote :

I'm sorry, the 'apport-collect 1712384' command does not work.

The "Collecting problem information" window appears, but the progress bar stops moving almost immediately and the window stops responding, while the python process uses 100% CPU. Even after 5 minutes there is still no response (and still 100% CPU usage).

Console output (after interrupting the collection process):

$ apport-collect 1712384
dpkg-query: no packages found matching linux
^CTraceback (most recent call last):
  File "/usr/share/apport/apport-gtk", line 597, in <module>
    app.run_argv()
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 658, in run_argv
    return self.run_update_report()
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 547, in run_update_report
    self.collect_info(ignore_uninstalled=True)
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 1028, in collect_info
    self.ui_pulse_info_collection_progress()
  File "/usr/share/apport/apport-gtk", line 463, in ui_pulse_info_collection_progress
    while Gtk.events_pending():
KeyboardInterrupt

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Brain (brain)
affects: linux (Ubuntu) → nvidia-graphics-drivers-375 (Ubuntu)
Brain (brain) wrote :

Installing the mainline kernel 4.13rc6 temporarily solved the problem, until the recent Ubuntu update from approximately 4.9.2017. That update also upgraded the 4.10 kernel, which reintroduced the issue even though the running kernel is 4.13rc6.

It seems there is some unseen conflict or hidden dependency between the kernel package and the nvidia drivers.

Cruz Fernandez (cruz-fernandez) wrote :

I'm seeing a similar error on my Dell XPS 15 9550 with a GeForce GTX 960M. I've installed nvidia-384 (384.111-0ubuntu0.17.10.1) from the proprietary repository. It happens after suspending too!

Cruz Fernandez (cruz-fernandez) wrote :

Forgot to add: my Ubuntu installation is version 17.10.
