nvidia-390 causes kernel hang

Bug #1767932 reported by md_5
This bug affects 5 people
Affects: nvidia-graphics-drivers-390 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Here is the hung task:

Apr 30 15:21:50 michael-desktop-ubuntu kernel: INFO: task nvidia-modeset:243 blocked for more than 120 seconds.
Apr 30 15:21:50 michael-desktop-ubuntu kernel: Tainted: P IOE 4.15.0-20-generic #21-Ubuntu
Apr 30 15:21:50 michael-desktop-ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 30 15:21:50 michael-desktop-ubuntu kernel: nvidia-modeset D 0 243 2 0x80000000
Apr 30 15:21:50 michael-desktop-ubuntu kernel: Call Trace:
Apr 30 15:21:50 michael-desktop-ubuntu kernel: __schedule+0x297/0x8b0
Apr 30 15:21:50 michael-desktop-ubuntu kernel: schedule+0x2c/0x80
Apr 30 15:21:50 michael-desktop-ubuntu kernel: schedule_timeout+0x1cf/0x350
Apr 30 15:21:50 michael-desktop-ubuntu kernel: ? schedule_timeout+0x1cf/0x350
Apr 30 15:21:50 michael-desktop-ubuntu kernel: ? __slab_free+0x14d/0x2c0
Apr 30 15:21:50 michael-desktop-ubuntu kernel: __down+0x91/0xe0
Apr 30 15:21:50 michael-desktop-ubuntu kernel: down+0x41/0x50
Apr 30 15:21:50 michael-desktop-ubuntu kernel: ? down+0x41/0x50
Apr 30 15:21:50 michael-desktop-ubuntu kernel: nvkms_kthread_q_callback+0x65/0xe0 [nvidia_modeset]
Apr 30 15:21:50 michael-desktop-ubuntu kernel: _main_loop+0x76/0x140 [nvidia]
Apr 30 15:21:50 michael-desktop-ubuntu kernel: kthread+0x121/0x140
Apr 30 15:21:50 michael-desktop-ubuntu kernel: ? _raw_q_schedule+0x80/0x80 [nvidia]
Apr 30 15:21:50 michael-desktop-ubuntu kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Apr 30 15:21:50 michael-desktop-ubuntu kernel: ret_from_fork+0x35/0x40

uname:
Linux michael-desktop-ubuntu 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Drivers:
ii libnvidia-cfg1-390:amd64 390.48-0ubuntu3 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-390 390.48-0ubuntu3 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-390:amd64 390.48-0ubuntu3 amd64 NVIDIA libcompute package
ii libnvidia-compute-390:i386 390.48-0ubuntu3 i386 NVIDIA libcompute package
ii libnvidia-decode-390:amd64 390.48-0ubuntu3 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-decode-390:i386 390.48-0ubuntu3 i386 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-390:amd64 390.48-0ubuntu3 amd64 NVENC Video Encoding runtime library
ii libnvidia-encode-390:i386 390.48-0ubuntu3 i386 NVENC Video Encoding runtime library
ii libnvidia-fbc1-390:amd64 390.48-0ubuntu3 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-fbc1-390:i386 390.48-0ubuntu3 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-390:amd64 390.48-0ubuntu3 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-gl-390:i386 390.48-0ubuntu3 i386 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-ifr1-390:amd64 390.48-0ubuntu3 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii libnvidia-ifr1-390:i386 390.48-0ubuntu3 i386 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii nvidia-compute-utils-390 390.48-0ubuntu3 amd64 NVIDIA compute utilities
ii nvidia-dkms-390 390.48-0ubuntu3 amd64 NVIDIA DKMS package
ii nvidia-driver-390 390.48-0ubuntu3 amd64 NVIDIA driver metapackage
ii nvidia-headless-no-dkms-390 390.48-0ubuntu3 amd64 NVIDIA headless metapackage - no DKMS
ii nvidia-kernel-common-390 390.48-0ubuntu3 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-390 390.48-0ubuntu3 amd64 NVIDIA kernel source package
ii nvidia-prime 0.8.8 all Tools to enable NVIDIA's Prime
ii nvidia-settings 390.42-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-390 390.48-0ubuntu3 amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-390 390.48-0ubuntu3 amd64 NVIDIA binary Xorg driver

I can't reliably reproduce it, but it happens fairly often, roughly 2-5 minutes after a reboot.
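The hung-task warnings quoted in this report all follow the same kernel message format, so a captured log can be scanned for them mechanically. Below is a minimal sketch in Python, assuming the log text has already been saved (for example from `dmesg` or `journalctl -k`). The names `HUNG_TASK_RE` and `find_hung_tasks` are hypothetical and chosen for illustration; they are not part of any tool mentioned here.

```python
import re

# Matches the kernel's hung-task warning, e.g.
# "INFO: task nvidia-modeset:243 blocked for more than 120 seconds."
# The task name may itself contain colons (e.g. "kworker/4:2"), so the
# PID is taken as the last ":<digits>" before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) for each hung-task warning."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


sample = (
    "Apr 30 15:21:50 michael-desktop-ubuntu kernel: "
    "INFO: task nvidia-modeset:243 blocked for more than 120 seconds."
)
print(find_hung_tasks(sample))  # [('nvidia-modeset', 243, 120)]
```

Running this over a full boot log would show whether the same task hangs on every occurrence, which may help with a bug that is hard to reproduce on demand.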

md_5 (md-5) wrote :

Apport report attached

md_5 (md-5) wrote :

The card is a GTX 770, which doesn't seem to be reported anywhere.

md_5 (md-5) wrote :

Nvidia bug report

md_5 (md-5) wrote :

Hang on nvidia-driver-396 396.18-0ubuntu0~gpu18.04.9 as well.

May 04 08:44:06 michael-desktop-ubuntu kernel: INFO: task nvidia-modeset:244 blocked for more than 120 seconds.
May 04 08:44:06 michael-desktop-ubuntu kernel: Tainted: P IOE 4.15.0-20-generic #21-Ubuntu
May 04 08:44:06 michael-desktop-ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 04 08:44:06 michael-desktop-ubuntu kernel: nvidia-modeset D 0 244 2 0x80000000
May 04 08:44:06 michael-desktop-ubuntu kernel: Call Trace:
May 04 08:44:06 michael-desktop-ubuntu kernel: __schedule+0x297/0x8b0
May 04 08:44:06 michael-desktop-ubuntu kernel: schedule+0x2c/0x80
May 04 08:44:06 michael-desktop-ubuntu kernel: schedule_timeout+0x1cf/0x350
May 04 08:44:06 michael-desktop-ubuntu kernel: ? schedule_timeout+0x1cf/0x350
May 04 08:44:06 michael-desktop-ubuntu kernel: ? __slab_free+0x14d/0x2c0
May 04 08:44:06 michael-desktop-ubuntu kernel: ? ttwu_do_activate+0x7a/0x90
May 04 08:44:06 michael-desktop-ubuntu kernel: __down+0x91/0xe0
May 04 08:44:06 michael-desktop-ubuntu kernel: down+0x41/0x50
May 04 08:44:06 michael-desktop-ubuntu kernel: ? down+0x41/0x50
May 04 08:44:06 michael-desktop-ubuntu kernel: nvkms_kthread_q_callback+0x65/0xe0 [nvidia_modeset]
May 04 08:44:06 michael-desktop-ubuntu kernel: _main_loop+0x76/0x140 [nvidia]
May 04 08:44:06 michael-desktop-ubuntu kernel: kthread+0x121/0x140
May 04 08:44:06 michael-desktop-ubuntu kernel: ? _raw_q_schedule+0x80/0x80 [nvidia]
May 04 08:44:06 michael-desktop-ubuntu kernel: ? kthread_create_worker_on_cpu+0x70/0x70
May 04 08:44:06 michael-desktop-ubuntu kernel: ret_from_fork+0x35/0x40

md_5 (md-5) wrote :

Same on 396.24-0ubuntu0~gpu18.04.1

May 04 22:17:05 michael-desktop-ubuntu kernel: INFO: task nvidia-modeset:245 blocked for more than 120 seconds.
May 04 22:17:05 michael-desktop-ubuntu kernel: Tainted: P IOE 4.15.0-20-generic #21-Ubuntu
May 04 22:17:05 michael-desktop-ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 04 22:17:05 michael-desktop-ubuntu kernel: nvidia-modeset D 0 245 2 0x80000000
May 04 22:17:05 michael-desktop-ubuntu kernel: Call Trace:
May 04 22:17:05 michael-desktop-ubuntu kernel: __schedule+0x297/0x8b0
May 04 22:17:05 michael-desktop-ubuntu kernel: schedule+0x2c/0x80
May 04 22:17:05 michael-desktop-ubuntu kernel: schedule_timeout+0x1cf/0x350
May 04 22:17:05 michael-desktop-ubuntu kernel: ? schedule_timeout+0x1cf/0x350
May 04 22:17:05 michael-desktop-ubuntu kernel: ? __slab_free+0x14d/0x2c0
May 04 22:17:05 michael-desktop-ubuntu kernel: ? ttwu_do_activate+0x7a/0x90
May 04 22:17:05 michael-desktop-ubuntu kernel: __down+0x91/0xe0
May 04 22:17:05 michael-desktop-ubuntu kernel: down+0x41/0x50
May 04 22:17:05 michael-desktop-ubuntu kernel: ? down+0x41/0x50
May 04 22:17:05 michael-desktop-ubuntu kernel: nvkms_kthread_q_callback+0x65/0xe0 [nvidia_modeset]
May 04 22:17:05 michael-desktop-ubuntu kernel: _main_loop+0x76/0x140 [nvidia]
May 04 22:17:05 michael-desktop-ubuntu kernel: kthread+0x121/0x140
May 04 22:17:05 michael-desktop-ubuntu kernel: ? _raw_q_schedule+0x80/0x80 [nvidia]
May 04 22:17:05 michael-desktop-ubuntu kernel: ? kthread_create_worker_on_cpu+0x70/0x70
May 04 22:17:05 michael-desktop-ubuntu kernel: ret_from_fork+0x35/0x40

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-390 (Ubuntu):
status: New → Confirmed

Aaahh Ahh (woohoomoo2u) wrote :

Cannot confirm, but when using nvidia through nvidia-prime I get kernel hangs at seemingly random times. This does not occur with intel or without the nvidia drivers.

Kirill Romanov (djaler1) wrote :

Same shit on GTX 1050 Ti

Apr 26 09:19:47 juno kernel: [ 75.364483] nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
Apr 26 09:19:47 juno kernel: [ 75.365464] BUG: unable to handle kernel paging request at ffff975f5b029100
Apr 26 09:19:47 juno kernel: [ 75.366130] IP: evo_wait+0x5d/0x130 [nouveau]
Apr 26 09:19:47 juno kernel: [ 75.366780] PGD 1333e067 P4D 1333e067 PUD 0
Apr 26 09:19:47 juno kernel: [ 75.367423] Oops: 0002 [#1] SMP PTI
Apr 26 09:19:47 juno kernel: [ 75.368067] Modules linked in: ccm cmac bnep nouveau ttm binfmt_misc nls_iso8859_1 arc4 hid_multitouch dell_wmi dell_smbios_wmi wmi_bmof mxm_wmi dell_wmi_descriptor snd_hda_codec_realtek snd_hda_codec_generic intel_rapl dell_laptop dell_smbios_smm dell_smbios x86_pkg_temp_thermal dcdbas intel_powerclamp coretemp iwlmvm dell_smm_hwmon kvm_intel mac80211 kvm uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep irqbypass crct10dif_pclmul snd_pcm crc32_pclmul ghash_clmulni_intel pcbc snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device snd_timer joydev idma64 btusb input_leds virt_dma btrtl btbcm serio_raw btintel
Apr 26 09:19:47 juno kernel: [ 75.371468] snd cfg80211 bluetooth soundcore mei_me intel_lpss_pci processor_thermal_device intel_soc_dts_iosf mei ecdh_generic shpchp intel_pch_thermal intel_lpss int3403_thermal wmi int3402_thermal int340x_thermal_zone intel_hid tpm_crb sparse_keymap acpi_pad int3400_thermal mac_hid acpi_thermal_rel sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbhid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse r8169 ahci drm mii libahci i2c_hid hid video
Apr 26 09:19:47 juno kernel: [ 75.374295] CPU: 7 PID: 74 Comm: kworker/7:1 Tainted: G W 4.15.0-20-generic #21-Ubuntu
Apr 26 09:19:47 juno kernel: [ 75.375203] Hardware name: Dell Inc. Inspiron 15 7000 Gaming/065C71, BIOS 1.5.3 01/25/2018
Apr 26 09:19:47 juno kernel: [ 75.376127] Workqueue: pm pm_runtime_work
Apr 26 09:19:47 juno kernel: [ 75.377076] RIP: 0010:evo_wait+0x5d/0x130 [nouveau]
Apr 26 09:19:47 juno kernel: [ 75.378006] RSP: 0018:ffffaec001b8fc10 EFLAGS: 00010216
Apr 26 09:19:47 juno kernel: [ 75.378934] RAX: ffff975ea0329000 RBX: 000000002eb40060 RCX: 0000000000000000
Apr 26 09:19:47 juno kernel: [ 75.380028] RDX: 000000002eb40040 RSI: 0000000000000007 RDI: ffff975ebf5e2880
Apr 26 09:19:47 juno kernel: [ 75.381160] RBP: ffffaec001b8fc38 R08: 0000000000000067 R09: 0000000000000000
Apr 26 09:19:47 juno kernel: [ 75.382332] R10: ffffaec00205fd10 R11: 0000000000000065 R12: ffff975eac7ee308
Apr 26 09:19:47 juno kernel: [ 75.383281] R13: ffff975ea77b82b0 R14: 0000000000000020 R15: ffff975eac7ee3a8
Apr 26 09:19:47 juno kernel: [ 75.384222] FS: 0000000000000000(0000) GS:ffff975ebf5c0000(0000) knlGS:0000000000000000
Apr 26 09:19:47 juno kernel: [ 75.385171] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 26 09:19:47 juno kernel: [ 75.386117] ...

md_5 (md-5) wrote :

Kirill, looks to me like you are using the open source nouveau driver.
This is for the proprietary binary nvidia driver.
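The distinction can be read off the traces themselves: a frame that belongs to a loadable module ends with the module name in square brackets, `[nouveau]` in Kirill's log versus `[nvidia]` and `[nvidia_modeset]` in the original report. Here is a rough sketch in Python that counts those tags in a pasted trace. The names `FRAME_RE` and `modules_in_trace` are hypothetical, and the pattern assumes the `symbol+0xOFF/0xLEN [module]` layout seen in the logs above.

```python
import re
from collections import Counter

# A stack frame from a module looks like "evo_wait+0x5d/0x130 [nouveau]".
# Requiring the symbol+offset/length part avoids counting unrelated
# bracketed text such as "[DRM]" or dmesg timestamps.
FRAME_RE = re.compile(r"\S+\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def modules_in_trace(trace_text):
    """Count how many stack frames in trace_text belong to each kernel module."""
    return Counter(FRAME_RE.findall(trace_text))


trace = """\
nvkms_kthread_q_callback+0x65/0xe0 [nvidia_modeset]
_main_loop+0x76/0x140 [nvidia]
kthread+0x121/0x140
? _raw_q_schedule+0x80/0x80 [nvidia]
"""
print(dict(modules_in_trace(trace)))
```

A trace that only shows `nouveau` frames points at the open source driver and would belong in a different bug report.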

Daniel Cox (danielpcox) wrote :

I had this exact problem today (same message in `dmesg`). I found it while investigating a hang of anything CUDA-related, which appeared out of nowhere after my setup had been working for a while.

I was able to fix it by adding `acpi=ht` (or `acpi=off`) to my GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, running `sudo update-grub`, and rebooting.

I've got two nVidia 1080tis in this box, and I'm using the nvidia-396 driver.
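For anyone wanting to try the workaround above, the edit to `GRUB_CMDLINE_LINUX_DEFAULT` can be sketched as a small text transformation. This is only an illustration in Python, assuming the variable is written on one double-quoted line as in a stock `/etc/default/grub`. The helper name `add_kernel_param` is hypothetical, and the function works on a string rather than touching the real file.

```python
import re

# Captures the opening of the assignment, the current parameter list,
# and the closing quote of a line such as:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text, param):
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    The parameter is not added twice if it is already present.
    """
    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = CMDLINE_RE.subn(_append, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(example, "acpi=off"))
```

The change only takes effect after the result is written back to `/etc/default/grub`, `sudo update-grub` is run, and the machine is rebooted, as described above. Keeping a backup of the original file is advisable.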

Jason Priest (justaperson) wrote :

I'm getting this "task kworker blocked for more than 120 seconds" message with nvidia-driver-390 on Ubuntu 18.04 (kernel 4.15.0-36-generic). I have a GTX 1070 Ti and a GTX 770 installed.

John Stowers (nzjrs) wrote :

I get this every couple of days on our CI, from processes which access the GPU, with nvidia-driver 396.24.02. Here are some dmesg warnings from various failures:

[Mon Nov 5 18:43:29 2018] INFO: task kworker/4:2:25281 blocked for more than 120 seconds.
[Mon Nov 5 18:43:29 2018] Tainted: P OE 4.4.0-127-generic #153~14.04.1-Ubuntu
[Mon Nov 5 18:43:29 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Nov 5 18:43:29 2018] kworker/4:2 D ffff88026a5cbb68 0 25281 2 0x00000000
[Mon Nov 5 18:43:29 2018] Workqueue: events os_execute_work_item [nvidia]
[Mon Nov 5 18:43:29 2018] ffff88026a5cbb68 0000000000000036 ffff880827175a00 ffff88026a5cc000
[Mon Nov 5 18:43:29 2018] ffff88082370a768 0000000000000002 0000000000000000 ffff880827175a00
[Mon Nov 5 18:43:29 2018] ffff88026a5cbb80 ffffffff81818105 7fffffffffffffff ffff88026a5cbc28
[Mon Nov 5 18:43:29 2018] Call Trace:
[Mon Nov 5 18:43:29 2018] [<ffffffff81818105>] schedule+0x35/0x80
[Mon Nov 5 18:43:29 2018] [<ffffffff8181aafb>] schedule_timeout+0x23b/0x2d0
[Mon Nov 5 18:43:29 2018] [<ffffffff810b73bf>] ? enqueue_entity+0x3af/0xbe0
[Mon Nov 5 18:43:29 2018] [<ffffffff81819d85>] __down_common+0xa6/0xf9
[Mon Nov 5 18:43:29 2018] [<ffffffff81819df5>] __down+0x1d/0x1f
[Mon Nov 5 18:43:29 2018] [<ffffffff810c88e1>] down+0x41/0x50
[Mon Nov 5 18:43:29 2018] [<ffffffffc0724d97>] os_acquire_mutex+0x37/0x40 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffffc0cdb9fc>] _nv031564rm+0x5c/0x120 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffffc0b33978>] ? _nv007828rm+0x38/0x120 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffffc0d62ad4>] ? _nv001065rm+0x84/0xe0 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffffc0d663f9>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffff811e3701>] ? kmem_cache_alloc+0x191/0x200
[Mon Nov 5 18:43:29 2018] [<ffffffffc0725101>] ? os_execute_work_item+0x1/0x70 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffffc0725146>] ? os_execute_work_item+0x46/0x70 [nvidia]
[Mon Nov 5 18:43:29 2018] [<ffffffff81099716>] ? process_one_work+0x156/0x400
[Mon Nov 5 18:43:29 2018] [<ffffffff8109a0fa>] ? worker_thread+0x11a/0x480
[Mon Nov 5 18:43:29 2018] [<ffffffff81099fe0>] ? rescuer_thread+0x310/0x310
[Mon Nov 5 18:43:29 2018] [<ffffffff8109f5d8>] ? kthread+0xd8/0xf0
[Mon Nov 5 18:43:29 2018] [<ffffffff81817b52>] ? __schedule+0x2a2/0x820
[Mon Nov 5 18:43:29 2018] [<ffffffff8109f500>] ? kthread_park+0x60/0x60
[Mon Nov 5 18:43:29 2018] [<ffffffff8181be75>] ? ret_from_fork+0x55/0x80
[Mon Nov 5 18:43:29 2018] [<ffffffff8109f500>] ? kthread_park+0x60/0x60
[Mon Nov 5 18:43:29 2018] INFO: task kworker/4:1:3562 blocked for more than 120 seconds.
[Mon Nov 5 18:43:29 2018] Tainted: P OE 4.4.0-127-generic #153~14.04.1-Ubuntu
[Mon Nov 5 18:43:29 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Nov 5 18:43:29 2018] kworker/4:1 D ffff880104db3b68 0 3562 2 0x00000000
[Mon Nov 5 18:43:29 2018] Workqueue: events os_execute_work_item [nvidia]
[Mon Nov 5 18:43:29 2018] ffff880104db3b68 ffffffff81817b46 ffff88044e1f8f00 ffff880104db400...

John Stowers (nzjrs) wrote :

BTW: Linux lb-santi 4.4.0-127-generic #153~14.04.1-Ubuntu SMP Sat May 19 14:00:03 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
