task kworker blocked for more than 120 seconds nouveau

Bug #1788044 reported by David R. Hedges
This bug affects 5 people
Affects                              Status      Importance  Assigned to  Milestone
linux (Ubuntu)                       Incomplete  Medium      Unassigned
xserver-xorg-video-nouveau (Ubuntu)  Confirmed   Undecided   Unassigned

Bug Description

I was using Chromium when the whole system GUI stopped responding at 15:42. The system journal shows the following at that point:

Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: fifo: channel 15: killed
Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: compiz[7682]: channel 15 killed!
Aug 20 15:45:50 dh3930 kernel: INFO: task kworker/u24:4:14623 blocked for more than 120 seconds.
Aug 20 15:45:50 dh3930 kernel: Not tainted 4.15.0-32-generic #35-Ubuntu
Aug 20 15:45:50 dh3930 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 20 15:45:50 dh3930 kernel: kworker/u24:4 D 0 14623 2 0x80000000
Aug 20 15:45:50 dh3930 kernel: Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Aug 20 15:45:50 dh3930 kernel: Call Trace:
Aug 20 15:45:50 dh3930 kernel: __schedule+0x291/0x8a0
Aug 20 15:45:50 dh3930 kernel: schedule+0x2c/0x80
Aug 20 15:45:50 dh3930 kernel: schedule_timeout+0x1cf/0x350
Aug 20 15:45:50 dh3930 kernel: ? nvif_object_ioctl+0x47/0x50 [nouveau]
Aug 20 15:45:50 dh3930 kernel: ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
Aug 20 15:45:50 dh3930 kernel: ? nv84_fence_read+0x2e/0x30 [nouveau]
Aug 20 15:45:50 dh3930 kernel: ? nouveau_fence_no_signaling+0x2a/0x80 [nouveau]
Aug 20 15:45:50 dh3930 kernel: dma_fence_default_wait+0x1c7/0x260
Aug 20 15:45:50 dh3930 kernel: ? dma_fence_release+0xa0/0xa0
Aug 20 15:45:50 dh3930 kernel: dma_fence_wait_timeout+0x3e/0xf0
Aug 20 15:45:50 dh3930 kernel: drm_atomic_helper_wait_for_fences+0x63/0xc0 [drm_kms_helper]
Aug 20 15:45:50 dh3930 kernel: nv50_disp_atomic_commit_tail+0x55/0x3b10 [nouveau]
Aug 20 15:45:50 dh3930 kernel: nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
Aug 20 15:45:50 dh3930 kernel: process_one_work+0x1de/0x410
Aug 20 15:45:50 dh3930 kernel: worker_thread+0x32/0x410
Aug 20 15:45:50 dh3930 kernel: kthread+0x121/0x140
Aug 20 15:45:50 dh3930 kernel: ? process_one_work+0x410/0x410
Aug 20 15:45:50 dh3930 kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Aug 20 15:45:50 dh3930 kernel: ret_from_fork+0x35/0x40

The 'blocked for more than 120 seconds' message and call trace repeated every ~121 seconds until I rebooted. At that point, the following additional line appeared with the 'blocked for more than 120 seconds' message:

Aug 20 16:01:51 dh3930 kernel: nouveau 0000:01:00.0: chromium-browse[14187]: failed to idle channel 20 [chromium-browse[14187]]
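
For anyone comparing their own logs against this report, the following is a minimal sketch (not part of the original report) that pulls the hung-task reports out of a journal excerpt and measures how long after the CTXSW_TIMEOUT the first one appeared. The function names are hypothetical, the three sample lines are copied from the excerpt above, and the year is an assumption taken from the report date, since syslog-style timestamps omit it.

```python
import re
from datetime import datetime

# Three lines copied verbatim from the journal excerpt in this report.
JOURNAL = """\
Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Aug 20 15:42:56 dh3930 kernel: nouveau 0000:01:00.0: compiz[7682]: channel 15 killed!
Aug 20 15:45:50 dh3930 kernel: INFO: task kworker/u24:4:14623 blocked for more than 120 seconds.
"""

STAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) ")
HUNG = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def parse_stamp(line, year=2018):
    """Parse the leading syslog timestamp; the year is assumed (not in the log).

    Note: %b expects English month abbreviations (C/English locale).
    """
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def hung_tasks(journal):
    """Return (task name, pid, timeout in seconds) for each hung-task report."""
    found = []
    for line in journal.splitlines():
        match = HUNG.search(line)
        if match:
            found.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return found


def seconds_from_timeout_to_hang(journal):
    """Seconds between the first CTXSW_TIMEOUT and the first hung-task report."""
    timeout_at = hang_at = None
    for line in journal.splitlines():
        if timeout_at is None and "CTXSW_TIMEOUT" in line:
            timeout_at = parse_stamp(line)
        elif hang_at is None and HUNG.search(line):
            hang_at = parse_stamp(line)
    if timeout_at is None or hang_at is None:
        return None
    return int((hang_at - timeout_at).total_seconds())


print(hung_tasks(JOURNAL))
print(seconds_from_timeout_to_hang(JOURNAL))
```

On the lines above this reports the kworker/u24:4 task (PID 14623) and a gap of 174 seconds between the scheduler error and the first hung-task message.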

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-32-generic 4.15.0-32.35
ProcVersionSignature: Ubuntu 4.15.0-32.35-generic 4.15.18
Uname: Linux 4.15.0-32-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: david 2460 F.... pulseaudio
 /dev/snd/controlC0: david 2460 F.... pulseaudio
CurrentDesktop: Unity:Unity7:ubuntu
Date: Mon Aug 20 16:07:40 2018
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=9e4f3d6a-f1b3-40c0-8c97-97d861a7ce11
InstallationDate: Installed on 2016-10-24 (664 days ago)
InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
MachineType: System manufacturer System Product Name
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.15.0-32-generic root=UUID=311cc681-166d-47bd-847d-f41c81578c1a ro rootflags=subvol=@ quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-32-generic N/A
 linux-backports-modules-4.15.0-32-generic N/A
 linux-firmware 1.173.1
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-05-01 (110 days ago)
dmi.bios.date: 05/06/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 4701
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: P9X79
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4701:bd05/06/2014:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnP9X79:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

David R. Hedges (p14nd4) wrote :

Possible duplicate of #1723250 and/or #1723245

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Do you have a way to reproduce this bug, or was it a one time event?

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: New → Confirmed
Brad Figg (brad-figg)
tags: added: cscc
Rick Sayre (whorfin) wrote :

This is the closest recent report I could find that also appears to be kernel related.
Kernel 6.5.0-27 -- works fine
Kernel 6.5.0-28 -- graphics hard-hang: sddm never displays, a VT can't be activated
   ssh works, but sddm and X11 processes cannot be killed, reboot hangs, and a hard power cycle is required

I have seen reports like this going back years, seemingly without resolution. Now I've been bitten too, in a way that looks suspiciously tied to the kernel version.
The common internet advice to "use the nvidia drivers" is unworkable, as old NVIDIA hardware is not supported on 22.04:
04:00.0 VGA compatible controller: NVIDIA Corporation MCP89 [GeForce 320M] (rev a2)

Again, for me, this is with 22.04, kernel 6.5.0-28

None of the various combinations of nouveau kernel boot flags that have previously helped reduce display garbage or sddm troubles help here.

Here's a relevant boot log:
Apr 20 21:06:29 Konnekt sddm-greeter[922]: Adding view for "LVDS-1" QRect(0,0 1280x800)
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000216004 put 0000216088 ib_get 00000007 ib_put 00000007 state 800081a4 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 000021608c put 0000217b7c ib_get 00000009 ib_put 0000000a state 80000000 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000218ff0 put 000021932c ib_get 0000000d ib_put 00000012 state 80000000 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000219a48 put 0000219a64 ib_get 0000001f ib_put 00000024 state 80000000 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000219a64 put 0000219a7c ib_get 00000021 ib_put 00000024 state 80000024 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000219a7c put 0000219a8c ib_get 00000023 ib_put 00000024 state 80000000 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000219a8c put 0000219ccc ib_get 00000025 ib_put 0000002a state 80000024 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000219de8 put 0000219df8 ib_get 0000002f ib_put 00000030 state 40000004 (err: INVALID_MTHD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: CACHE_ERROR - ch 3 [sddm-greeter[922]] subc 0 mthd 0000 data 20000000
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0000219ef0 put 000021a2f0 ib_get 00000033 ib_put 00000038 state 80000000 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: fifo: DMA_PUSHER - ch 3 [sddm-greeter[922]] get 0020b10960 put 0020b10ac8 ib_get 00000042 ib_put 00000044 state 80000000 (err: INVALID_CMD) push 00400040
Apr 20 21:06:29 Konnekt kernel: nouveau 0000:04:00.0: ...

Rick Sayre (whorfin) wrote :

I have found a fix.
First I tried adding "nouveau.modeset=0" to the boot flags, which got me to boot but without a greeter.
sddm appeared to be running happily, but it failed to start the display server.
Restarting it did not help.
"startx" worked, which gave me hope.

It appeared that systemd had determined the session "seat" to have "CanGraphical" false.
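
One way to check this on an affected machine is sketched below. It is a hedged example rather than what was actually run here: it assumes the default seat name "seat0" and that `loginctl show-seat` prints the property as `CanGraphical=yes` or `CanGraphical=no`. The helper names are hypothetical.

```python
import subprocess


def parse_can_graphical(output):
    """Return True/False from a 'CanGraphical=yes|no' line, or None if absent."""
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CanGraphical":
            return value.strip() == "yes"
    return None


def seat_can_graphical(seat="seat0"):
    """Ask logind about a seat; 'seat0' is an assumed default seat name."""
    result = subprocess.run(
        ["loginctl", "show-seat", seat, "--property=CanGraphical"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_can_graphical(result.stdout)
```

Calling `seat_can_graphical()` on a system in the state described above would be expected to return False.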

Then I found this:
https://github.com/mikhailnov/systemd/commit/c00bf275fdfbad3a9db8934b5e266b6abbdb8443

There's a heuristic hack which sets CanGraphical when the kernel starts without graphics, but only if this is done via "nomodeset" rather than the driver-specific ".modeset" flags.

So I just added "nomodeset" to the boot flags, and now not only does 6.5.0-28 boot and land at a working greeter again, but all the weird sddm graphics bugs I'd been encountering since 6.5 are gone.

Hope this helps someone else... And maybe this kernel issue will get fixed, some day

(update - this breaks backlight control)
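
To see which of the two kinds of flag a given boot actually used, a small sketch like the following can classify a kernel command line. It only encodes the distinction described in this comment (generic "nomodeset" versus driver-specific ".modeset=" options); the function names are hypothetical, and the sample string is the ProcKernelCmdLine from the original report.

```python
def modeset_flags(cmdline):
    """Split a kernel command line into (has nomodeset, driver-specific modeset flags)."""
    tokens = cmdline.split()
    generic = "nomodeset" in tokens
    driver_specific = [t for t in tokens if ".modeset=" in t]
    return generic, driver_specific


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().strip()


# ProcKernelCmdLine from the original report: neither kind of flag is present.
reported = (
    "BOOT_IMAGE=/@/boot/vmlinuz-4.15.0-32-generic "
    "root=UUID=311cc681-166d-47bd-847d-f41c81578c1a ro rootflags=subvol=@ "
    "quiet splash vt.handoff=1"
)
print(modeset_flags(reported))
print(modeset_flags(reported + " nouveau.modeset=0"))
print(modeset_flags(reported + " nomodeset"))
```

On a live system, `modeset_flags(read_cmdline())` gives the same breakdown for the current boot.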
