Xorg X11 nouveau video driver for NVIDIA graphics chipsets locks up (hangs)

Bug #1723758 reported by Metta Crawler on 2017-10-15
This bug affects 10 people
Affects Status Importance Assigned to Milestone
Nouveau Xorg driver
Unknown
Critical
Fedora
Confirmed
Critical
xserver-xorg-video-nouveau (Ubuntu)
Undecided
Unassigned

Bug Description

This is an intermittent issue, rumored to be fixed in Ubuntu 17.04 according to comments in bug 1645375.

Oct 15 11:36:26 lakshmi kernel: nouveau 0000:01:00.0: timeout at /build/linux-Pcn0xK/linux-4.4.0/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chang84.c:111/g84_fifo_chan_engine_fini()!

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: xserver-xorg-video-nouveau 1:1.0.12-1build2
ProcVersionSignature: Ubuntu 4.4.0-97.120-generic 4.4.87
Uname: Linux 4.4.0-97-generic x86_64
.tmp.unity_support_test.0:

.tmp.unity_support_test.1:

ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
Date: Sun Oct 15 12:27:28 2017
DistUpgraded: Fresh install
DistroCodename: xenial
DistroVariant: ubuntu
DkmsStatus:
 iscsitarget, 1.4.20.3+svn502, 4.4.0-96-generic, x86_64: installed
 iscsitarget, 1.4.20.3+svn502, 4.4.0-97-generic, x86_64: installed
ExtraDebuggingInterest: Yes, including running git bisection searches
GraphicsCard:
 NVIDIA Corporation GT218 [GeForce 210] [10de:0a65] (rev a2) (prog-if 00 [VGA controller])
   Subsystem: Micro-Star International Co., Ltd. [MSI] GT218 [GeForce 210] [1462:2011]
InstallationDate: Installed on 2016-01-13 (641 days ago)
InstallationMedia: Ubuntu-Server 15.10 "Wily Werewolf" - Release amd64 (20151021)
MachineType: System manufacturer System Product Name
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-97-generic root=/dev/mapper/vg0-lv0 ro
SourcePackage: xserver-xorg-video-nouveau
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/05/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3029
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M4A89GTD-PRO/USB3
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3029:bd07/05/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM4A89GTD-PRO/USB3:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer
version.compiz: compiz 1:0.9.12.2+16.04.20160823-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.76-1~ubuntu16.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 17.0.7-0ubuntu0.16.04.2
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 17.0.7-0ubuntu0.16.04.2
version.xserver-xorg-core: xserver-xorg-core 2:1.18.4-0ubuntu0.6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.10.1-1ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.7.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1.2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.12-1build2
xserver.bootTime: Thu Oct 12 15:08:58 2017
xserver.configfile: default
xserver.devices:
 input Power Button KEYBOARD, id 6
 input Power Button KEYBOARD, id 7
 input Microsoft Comfort Curve Keyboard 2000 KEYBOARD, id 8
 input Microsoft Comfort Curve Keyboard 2000 KEYBOARD, id 9
 input Microsoft Comfort Optical Mouse 1000 MOUSE, id 10
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.18.4-0ubuntu0.6
xserver.video_driver: nouveau

Xorg hangs randomly with the nouveau driver. It can sometimes be reproduced by playing video or starting LibreOffice, but it is not limited to those cases. Pressing Ctrl+Alt+Backspace sends the monitor to sleep immediately. Alt+SysRq combinations usually still work, and so does ssh.

System journal contains:
 kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 16 [soffice.bin[30009]] subc 5 mthd 0184 data beef0201

... and many similar lines with changing numbers after subc, mthd and data.
Followed by:
/usr/libexec/gdm-x-session[11881]: QXcbConnection: XCB error: 3 (BadWindow), sequence: 55765, resource id: 100663298, major code: 18 (ChangeProperty), minor code: 0

and
kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 00000001 [FAULT]
kernel: nouveau 0000:01:00.0: gr: TRAP_CCACHE 000e0080 00000000 00000000 00000000 00000000 00000004 00000000
kernel: nouveau 0000:01:00.0: gr: 00200000 [] ch 16 [001eb0f000 soffice.bin[30009]] subc 3 class 8597 mthd 13bc data 00000054
kernel: nouveau 0000:01:00.0: fb: trapped read at 002027ff00 on channel 16 [1eb0f000 soffice.bin[30009]] engine 00 [PGRAPH] client 05 [CCACHE] subclient 00 [CB] reason 00.......
kernel: nouveau 0000:01:00.0: gr: PGRAPH TLB flush idle timeout fail
kernel: nouveau 0000:01:00.0: gr: PGRAPH_STATUS 00000503 [BUSY DISPATCH CTXPROG CCACHE_PREGEOM]
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS0: 00000008 [CCACHE]
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS1: 00000000 []
kernel: nouveau 0000:01:00.0: gr: PGRAPH_VSTATUS2: 00000000 []
(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/libexec/Xorg (mieqEnqueue+0x253) [0x578753]
(EE) 1: /usr/libexec/Xorg (QueuePointerEvents+0x52) [0x44f352]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x30eb) [0x7f1f83f13dfb]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x3855) [0x7f1f83f15035]
(EE) 4: /usr/libexec/Xorg (DPMSSupported+0xe8) [0x4769c8]
(EE) 5: /usr/libexec/Xorg (xf86SerialModemClearBits+0x2b2) [0x49fe62]
(EE) 6: /lib64/libc.so.6 (__restore_rt+0x0) [0x7f1f8df6fb1f]
(EE) 7: /lib64/libc.so.6 (ioctl+0x5) [0x7f1f8e033705]
(EE) 8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f1f8f32f508]
(EE) 9: /lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f1f8f33208b]
(EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0xbc) [0x7f1f88a2637c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x75f9) [0x7f1f88c3ed19]
(EE) 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x801d) [0x7f1f88c400ed]
(EE) 13: /usr/libexec/Xorg (DRI2SwapBuffers+0x1c8) [0x569268]
(EE) 14: /usr/libexec/Xorg (DRI2GetParam+0xb7c) [0x56ae0c]
(EE) 15: /usr/libexec/Xorg (SendErrorToClient+0x2df) [0x4369bf]
(EE) 16: /usr/libexec/Xorg (remove_fs_handlers+0x453) [0x43a9e3]
(EE) 17: /lib64/libc.so.6 (__libc_start_main+0xf0) [0x7f1f8df5b580]
(EE) 18: /usr/libexec/Xorg (_start+0x29) [0x424ce9]
(EE) 19: ? (?+0x29) [0x29]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
(EE) [mi] EQ overflow continuing. 100 events have been dropped.
(EE)
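
The log itself says mieqEnqueue is a victim and that the culprit may sit higher up the stack. A minimal sketch, assuming the backtrace is available as plain text, that pulls out the frames and keeps only those from nouveau-related modules (`parse_frames` and `frames_from` are hypothetical helper names, not part of Xorg or nouveau):

```python
import re

# One frame of an Xorg "(EE) Backtrace:" listing, for example:
# (EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0xbc) [0x7f1f88a2637c]
FRAME_RE = re.compile(
    r"\(EE\)\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<symbol>[^+)]+)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]"
)


def parse_frames(log_text):
    """Return (index, module, symbol) tuples for every backtrace frame found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (int(match.group("index")), match.group("module"), match.group("symbol"))
            )
    return frames


def frames_from(frames, name):
    """Keep only frames whose module path contains `name` (hypothetical helper)."""
    return [frame for frame in frames if name in frame[1]]


# A few frames copied from the backtrace above.
SAMPLE = """\
(EE) 0: /usr/libexec/Xorg (mieqEnqueue+0x253) [0x578753]
(EE) 8: /lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f1f8f32f508]
(EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0xbc) [0x7f1f88a2637c]
(EE) 11: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x75f9) [0x7f1f88c3ed19]
"""

for index, module, symbol in frames_from(parse_frames(SAMPLE), "nouveau"):
    print(index, module, symbol)
```

On the sample this leaves the `nouveau_bo_wait` frame and the unsymbolized `nouveau_drv.so` frame, which is consistent with the server waiting on a buffer object when the input queue overflowed.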

What hardware? What kernel version? What mesa version?

I forgot to mention that I am running xorg-x11-drv-nouveau-1.0.12-1.fc23.x86_64 (the latest version from the Fedora 23 repository).

Hardware:
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. EN210 SILENT
        Flags: bus master, fast devsel, latency 0, IRQ 45
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at fe000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nouveau

Kernel:
Linux marek.grepo.lan 4.3.3-301.fc23.x86_64 #1 SMP Fri Jan 15 14:03:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Mesa:
mesa-dri-drivers-11.1.0-2.20151218.fc23.i686

The problem definitely arose after a mesa or kernel update, not after an Xorg update. I am not sure which, but I think mesa is more likely. The kernel was updated on Jan 12 (to 4.3.3-300, which I was running when the problem first appeared) and mesa on Jan 16. I am sure I did not have this problem before Jan 12. I am not sure whether I had it between Jan 12 and Jan 16.

My machine locks up several times a week. From /var/log/messages I see:

Apr 25 16:48:36 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00040060
Apr 25 16:48:36 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00140010
Apr 25 16:48:36 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 2001a020
Apr 25 16:48:36 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data ff636880
Apr 25 16:48:36 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00000002
...
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: Xorg[1248]: failed to idle channel 2 [Xorg[1248]]
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00040060
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data beef0201
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00140010
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00000000
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 2001a020
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data ff636881
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00000002
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00000000
Apr 25 16:49:06 jschmitt-dt kernel: nouveau 0000:02:00.0: Xorg[1248]: failed to idle channel 2 [Xorg[1248]]
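
The CACHE_ERROR lines above all share one layout (channel, process, pid, subc, mthd, data). A rough sketch, assuming the log has been saved as text, that splits each line into fields and counts errors per process and channel. The regular expression is inferred only from the lines quoted in this report, so other kernels may format them differently:

```python
import re
from collections import Counter

# Layout inferred from the log lines in this report, e.g.:
# kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00040060
CACHE_ERROR_RE = re.compile(
    r"nouveau (?P<pci>\S+): fifo: CACHE_ERROR - ch (?P<ch>\d+) "
    r"\[(?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\] "
    r"subc (?P<subc>\d+) mthd (?P<mthd>[0-9a-f]+) data (?P<data>[0-9a-f]+)"
)


def parse_cache_errors(lines):
    """Turn CACHE_ERROR log lines into dicts; non-matching lines are skipped."""
    entries = []
    for line in lines:
        match = CACHE_ERROR_RE.search(line)
        if not match:
            continue
        entry = match.groupdict()
        for key in ("ch", "pid", "subc"):
            entry[key] = int(entry[key])  # decimal fields; mthd/data stay as hex strings
        entries.append(entry)
    return entries


def by_channel(entries):
    """Count errors per (process, pid, channel)."""
    return Counter((e["proc"], e["pid"], e["ch"]) for e in entries)


SAMPLE = """\
Apr 25 16:48:36 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data 00040060
Apr 25 16:48:51 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: CACHE_ERROR - ch 2 [Xorg[1248]] subc 0 mthd 0000 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c8 data beef0201
"""

for key, count in by_channel(parse_cache_errors(SAMPLE.splitlines())).items():
    print(key, count)
```

Grouping like this makes it easier to see whether one client (Xorg, soffice.bin, flux) is responsible for a burst or whether several channels fail together.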

Sometimes I can still ssh to the machine, other times I cannot. When I can ssh to the machine, I can sometimes use this to bring back the machine to a working state:

$ systemctl isolate rescue.target

and then

$ systemctl isolate graphical.target

The only other option I have is to physically reset the machine.
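
For anyone who wants to script the two `systemctl isolate` calls above from an ssh session, here is a small sketch. It only wraps the commands already shown; the function names are made up for illustration, it assumes it is run with sufficient privileges, and note that isolating rescue.target stops the running desktop session:

```python
import subprocess

# The two targets used in the manual recovery above, in order.
TARGETS = ("rescue.target", "graphical.target")


def build_commands(targets=TARGETS):
    """Return the systemctl invocations as argument lists."""
    return [["systemctl", "isolate", target] for target in targets]


def restart_graphical_session(run=subprocess.run):
    """Run each command in order and stop at the first failure.

    `run` is injectable so the sequence can be exercised without touching
    the system. Returns True only if every command exits with status 0.
    """
    for command in build_commands():
        result = run(command, check=False)
        if result.returncode != 0:
            return False
    return True
```

Calling `restart_graphical_session()` runs the real commands; passing a stub as `run` allows a dry run first. As in the manual procedure, there is no guarantee that this recovers a hung GPU.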

$ cat /etc/redhat-release
Fedora release 22 (Twenty Two)
$ uname -a
Linux jschmitt-dt 4.4.6-201.fc22.x86_64 #1 SMP Wed Mar 30 18:30:16 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

xorg-x11-drv-nouveau-1.0.11-2.fc22.x86_64
xorg-x11-drv-nouveau-debuginfo-1.0.11-2.fc22.x86_64

Apr 26 16:54:45 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
Apr 26 16:54:45 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 08b8 data 00040314
Apr 26 16:54:57 jschmitt-dt audit[14074]: <audit-1326> auid=1620 uid=1620 gid=1620 ses=1 pid=14074 comm="chromium-browse" exe="/usr/lib64/chromium-browser/chromium-browser" sig=0 arch=c000003e syscall=273 compat=0 ip=0x7fac2cfcd4f4 code=0x50000
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[1253]] get 00200355b4 put 00200355e0 ib_get 00000021 ib_put 000002dc state 80004244 (err: INVALID_CMD) push 00406040
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[1253]] get 002003566c put 0020035698 ib_get 00000023 ib_put 000002dc state 80004244 (err: INVALID_CMD) push 00406040
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 0234 data 0004488c
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 023c data 003048b0
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 0240 data 000002f8
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD]
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 0234 data 0004488c
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 023c data 003048b0
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 0240 data 000002f7
...

Apr 26 17:01:33 jschmitt-dt kernel: nouveau 0000:02:00.0: Xorg[1253]: failed to idle channel 2 [Xorg[1253]]
Apr 26 17:01:48 jschmitt-dt kernel: nouveau 0000:02:00.0: Xorg[1253]: failed to idle channel 2 [Xorg[1253]]

Created attachment 1153011
nouveau error messages

grep nouveau /var/log/messages
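
The grep above returns every nouveau line, which gets long quickly. A possible way to condense it is to tally the lines by unit and error keyword. This is only a sketch: the pattern covers the message shapes quoted in this report (`fifo:`, `gr:`, `fb:` followed by an upper-case keyword, plus "failed to idle channel") and the function name is hypothetical:

```python
import re
from collections import Counter

# "<unit>: <UPPER_CASE_KEYWORD>" as seen in this report, e.g. "gr: DATA_ERROR".
KIND_RE = re.compile(r"nouveau \S+: (?P<unit>fifo|gr|fb): (?P<kind>[A-Z][A-Z_]+)")
IDLE_MARKER = "failed to idle channel"


def tally_nouveau_errors(lines):
    """Count nouveau kernel messages per error kind."""
    counts = Counter()
    for line in lines:
        if "nouveau" not in line:
            continue
        if IDLE_MARKER in line:
            counts[IDLE_MARKER] += 1
            continue
        match = KIND_RE.search(line)
        if match:
            counts[f"{match.group('unit')}: {match.group('kind')}"] += 1
    return counts


SAMPLE = """\
Apr 26 16:54:45 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE]
Apr 26 16:54:45 jschmitt-dt kernel: nouveau 0000:02:00.0: gr: 00100000 [] ch 2 [000fb21000 Xorg[1253]] subc 2 class 502d mthd 08b8 data 00040314
Apr 26 16:55:21 jschmitt-dt kernel: nouveau 0000:02:00.0: fifo: DMA_PUSHER - ch 2 [Xorg[1253]] get 00200355b4 put 00200355e0 ib_get 00000021 ib_put 000002dc state 80004244 (err: INVALID_CMD) push 00406040
Apr 26 17:01:33 jschmitt-dt kernel: nouveau 0000:02:00.0: Xorg[1253]: failed to idle channel 2 [Xorg[1253]]
"""

for kind, count in tally_nouveau_errors(SAMPLE.splitlines()).most_common():
    print(count, kind)
```

Detail lines that start with a hex value (the second sample line) are deliberately not counted, since they belong to the error line before them.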


This is getting worse and worse.

systemd: kdm.service: State 'stop-sigterm' timed out. Killing.
systemd: kdm.service: Main process exited, code=killed, status=9/KILL
kernel: nouveau 0000:02:00.0: Xorg[13210]: failed to idle channel 3 [Xorg[13210]]
kernel: nouveau 0000:02:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/chang84.c:111/g84_fifo_chan_engine_fini()!
kernel: nouveau 0000:02:00.0: fifo: channel 3 [Xorg[13210]] unload timeout
kernel: nouveau 0000:02:00.0: Xorg[13210]: failed to idle channel 3 [Xorg[13210]]
kernel: nouveau 0000:02:00.0: Xorg[13210]: failed to idle channel 2 [Xorg[13210]]
kernel: nouveau 0000:02:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/chang84.c:111/g84_fifo_chan_engine_fini()!
kernel: nouveau 0000:02:00.0: fifo: channel 2 [Xorg[13210]] unload timeout
kernel: nouveau 0000:02:00.0: Xorg[13210]: failed to idle channel 2 [Xorg[13210]]
systemd: Stopped The KDE login manager.

kernel: nouveau 0000:02:00.0: NVIDIA G98 (298e80a2)
kernel: usb 2-5.3: new full-speed USB device number 4 using ehci-pci
kernel: nouveau 0000:02:00.0: bios: version 62.98.75.00.07
kernel: nouveau 0000:02:00.0: fb: 256 MiB GDDR3
kernel: usb 8-2: New USB device found, idVendor=045e, idProduct=0780
kernel: usb 8-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
kernel: usb 8-2: Product: Comfort Curve Keyboard 3000
kernel: usb 8-2: Manufacturer: Microsoft
kernel: input: Microsoft Comfort Curve Keyboard 3000 as /devices/pci0000:00/0000:00:1d.2/usb8/8-2/8-2:1.0/0003:045E:0780.0002/input/input6
kernel: [TTM] Zone kernel: Available graphics memory: 12342504 kiB
kernel: [TTM] Zone dma32: Available graphics memory: 2097152 kiB
kernel: [TTM] Initializing pool allocator
kernel: [TTM] Initializing DMA pool allocator
kernel: nouveau 0000:02:00.0: DRM: VRAM: 256 MiB
kernel: nouveau 0000:02:00.0: DRM: GART: 1048576 MiB
kernel: nouveau 0000:02:00.0: DRM: TMDS table version 2.0
kernel: nouveau 0000:02:00.0: DRM: DCB version 4.0
kernel: nouveau 0000:02:00.0: DRM: DCB outp 00: 02000386 0f200010
kernel: nouveau 0000:02:00.0: DRM: DCB outp 01: 02000332 00020010
kernel: nouveau 0000:02:00.0: DRM: DCB outp 02: 040113a6 0f200010
kernel: nouveau 0000:02:00.0: DRM: DCB outp 03: 04011342 00020010
kernel: nouveau 0000:02:00.0: DRM: DCB conn 00: 00005046
kernel: nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
kernel: usb 2-5.3: New USB device found, idVendor=0a12, idProduct=0001
kernel: usb 2-5.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: hid-generic 0003:045E:0780.0002: input,hidraw1: USB HID v1.11 Keyboard [Microsoft Comfort Curve Keyboard 3000] on usb-0000:00:1d.2-2/input0
kernel: input: Microsoft Comfort Curve Keyboard 3000 as /devices/pci0000:00/0000:00:1d.2/usb8/8-2/8-2:1.1/0003:045E:0780.0003/input/input7
kernel: nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
kernel: tsc: Refined TSC clocksource calibration: 2800.095 MHz
kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x285c9a5e278, max_idle_ns: 440795291489 ns
kernel: hid-generic 0003:045E:0780.0003: inpu...

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

I think I hit exactly the same crash today on a Slackware 14.2 x86 system.

Chromium was loading a page when the whole X session froze. I was able to SSH in to the system, but only a restart really fixed it.

OS info :
Slackware 14.2

kernel : 4.4.14 #2 SMP Fri Jun 24 13:38:27 CDT 2016 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux

GPU: 01:00.0 VGA compatible controller: NVIDIA Corporation G84GL [Quadro FX 370] (rev a1)

Package / library info and versions
kernel-huge-4.4.14-x86_64-1
mesa-11.2.2-x86_64-1
xf86-video-nouveau-1.0.12-x86_64-1
xorg-server-1.18.3-x86_64-2

I am attaching the Xorg log and the kernel log.

Created attachment 126000
xorg-log after crash

Created attachment 126001
Kernel log when crash

I got the same problem here with randomly occurring freezes (only mouse pointer can be moved but I can still ssh into it).

Environment:
- Debian Jessie
- newest backports kernel (which is 4.7.2 currently, it also happened with 4.6*)
- Dual monitor setup with 8800 GTS 320MB

Logs will be attached from dmesg and Xorg.0.log

Created attachment 126923
dmesg output for Debian Jessie + 4.7.2

Created attachment 126924
Xorg.0.log for Debian Jessie + 4.7.2

Hi,

I've got the same issue with kernel 4.4 (but not with 4.1.15) when the screensaver starts up.
How can I help? With a trace or log file?

01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)

Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c8 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c4 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01c0 data beef0201
Nov 5 08:27:01 wizz kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 11 [flux[21781]] subc 6 mthd 01b8 data beef0201

With a newer kernel I get new messages, in case they help:

Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux[28527]] get 0000000000 put 0000000000 ib_get 00000000 ib_put 00000002 state c0000000 (err: MEM_FAULT) push 00400040
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010000 on channel 13 [3eebf000 flux[28527]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux[28527]] get 0000000000 put 0000000000 ib_get 00000002 ib_put 00000004 state c0000000 (err: MEM_FAULT) push 00400040
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010010 on channel 13 [3eebf000 flux[28527]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 13 [flux[28527]] get 0000000000 put 0000000000 ib_get 00000004 ib_put 00000007 state c0000000 (err: MEM_FAULT) push 00400040
Nov 6 16:26:49 wizz kernel: nouveau 0000:01:00.0: fb: trapped read at 0020010020 on channel 13 [3eebf000 flux[28527]] engine 05 [PFIFO] client 08 [PFIFO_READ] subclient 00 [PUSHBUF] reason 0000000f [DMAOBJ_LIMIT]

Created attachment 127928
Kernel trace

I attached a kernel trace which may be related. I got this when:

1. Upgraded Fedora 24 to Fedora 25.

2. Disabled wayland for gdm.

3. Created script with export QSG_RENDER_LOOP=basic in profile.d.

4. Logged in as a first user to the kde session.

5. Pressed Ctrl+Alt+F1 to get gdm login screen.

6. Logged in as a second user to the kde session.

7. Both KDE sessions were stuck, but unlike before I was able to kill both sessions with Ctrl+Alt+Backspace (maybe because of the earlier steps, which I had not used before).

8. After shutting down the second session, I got the kernel trace attached above.

Created attachment 128304
Extraction of crash info when using modesetting driver

I tried switching to the modesetting driver, but my X sessions still crash. The crash info is attached above.

QSG_RENDER_LOOP=basic was also applied.

Is there any workaround available to avoid the crashes? I do not need 3D or anything else, just a stable 2D desktop.

Hi Guys,

I've got the same kind of issue.

It hangs when starting up some software.
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2) (a pretty old card).
nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 12 [gwenview[13360]] subc 3 mthd 01a8 data beef0201
...
(Only X and the keyboard hang; ssh still works.)

I cannot switch to a console; the console does not work with nouveau without fb.
(I had no more issues after recompiling xscreensaver/flux.)

Dominique

I had to replace my graphics card because the old one died. So could these problems arise from memory corruption in a dying card? I now have a GTX 550ti, which has not shown these error messages yet.

I worked around this issue by using the nvidia 340 drivers. But after the upgrade to Fedora 25 these drivers were unavailable for a rather long period. Since nothing had been done in this bug for almost a year, I tried replacing the hardware.

First I tried a GT730. I ended up with the same behaviour: intermittent GUI lockups where only the mouse cursor moved. Only the logs were different. If I remember correctly, they were similar to bug 93629, which has also been open for almost a year. I suspect these bugs have something in common and the logs simply differ between hardware. So I returned the GT730, sold the GT210 and bought an AMD Radeon 6450, and the desktop is rock solid now.

I am not closing the bug because other people have the same problem, but once it is solved, please do not wait for my confirmation. I no longer have the nvidia hardware to test.

Thanks for your interest and for the hard work without specs from the vendor.

xorg-x11-drv-nouveau-1.0.14-2.fc25.x86_64
xorg-x11-server-Xorg-1.19.3-1.fc25.x86_64

May 02 14:08:52 pzhukov-workstation.usersys.redhat.com kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 5 [gltext[2269]] subc 7 mthd 1f08 data 80000000
May 02 14:08:52 pzhukov-workstation.usersys.redhat.com kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 5 [gltext[2269]] subc 7 mthd 1f0c data 00000000
May 02 14:08:52 pzhukov-workstation.usersys.redhat.com kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 5 [gltext[2269]] subc 7 mthd 1efc data 000001bd
May 02 14:08:52 pzhukov-workstation.usersys.redhat.com kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 5 [gltext[2269]] subc 7 mthd 1f00 data be5553af

Created attachment 1275841
journalctl -b -1 | grep nouveau


ago 22 15:59:43 p50.tole.es kernel: nouveau 0000:01:00.0: Xwayland[28959]: nv50cal_space: -16
ago 22 15:59:43 p50.tole.es kernel: nouveau 0000:01:00.0: Xwayland[28959]: nv50cal_space: -16

This kills the system. Sometimes it recovers, but most of the time I need to kill it over ssh because I can no longer work.

(on 4.12.5-300.fc26.x86_64, Fedora 26)

gnome-shell also logs related messages:

ago 24 11:35:28 p50.tole.es kernel: nouveau 0000:01:00.0: Xwayland[2308]: nv50cal_space: -16
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: kernel rejected pushbuf: Device or resource busy
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: krec 0 pushes 1 bufs 5 relocs 0
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: buf 00000000 00000003 00000004 00000004 0000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: buf 00000001 00000006 00000004 00000000 0000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: buf 00000002 0000003b 00000002 00000000 0000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: buf 00000003 00000031 00000004 00000004 0000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: buf 00000004 00000032 00000004 00000004 0000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: ch17: psh 00000000 000007d544 000007d664
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x20056080
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x000000e6
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000040
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000001
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x20046086
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000782
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x0000043a
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x08880000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x2002608c
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x000000cf
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000001
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x20056091
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000680
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000194
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x0000004d
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x00000000
ago 24 11:35:28 p50.tole.es org.gnome.Shell.desktop[2300]: nouveau: 0x0c0a2000
ago 24 11:35:28 p50.tole.es org.gnome.Shell....


BTW, I can connect via ssh from my phone to the laptop and 'reboot' it, so only the graphics layer is frozen (it tends to happen at least once per day).

The nouveau driver hangs with mythfrontend. My system is a desktop with Intel i7 onboard graphics running KDE Plasma, plus a separate user (seat) running mythfrontend directly on X (not KDE) on an nvidia GT240. This allows me to use my desktop system while the TV runs MythTV in an adjacent room. When the nouveau driver hangs it has no effect on the desktop system; the last frame or menu remains static on the TV even after restarting the mythfrontend application.

I can reliably cause the hang by stopping the video player using the remote control then trying to stop it again before it completes the first stop. I see the last frame of video and the remote no longer works. Restarting the mythfrontend application, as root on the desktop, blanks the screen then the last frame of video is shown on the TV even though the application has not started a player. Restarting should display a menu, but once hung the picture does not change.

I suspect that some internal registers or kernel data structures have not finished being cleaned up, and that the second stop prevents the cleanup from completing.

Probably a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=979659, but still present in Fedora.

Metta Crawler (metta-crawler) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: New → Confirmed
description: updated
Changed in nouveau:
importance: Unknown → Critical
status: Unknown → Confirmed
description: updated
Metta Crawler (metta-crawler) wrote :

To see whether this has been fixed in newer versions, I am now running the HWE packages, installed with:

sudo apt-get install xserver-xorg-video-nouveau-hwe-16.04 xserver-xorg-core-hwe-16.04 xserver-xorg-hwe-16.04 linux-generic-hwe-16.04 xserver-xorg-input-libinput-hwe-16.04

Try this only at your own risk.

Next I will test especially with google-earth and LibreOffice as they are involved in a lot of lock-ups.

Metta Crawler (metta-crawler) wrote :

This might not have changed a thing.

Pre-hwe debs:

Oct 17 08:45:28 lakshmi kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0

With hwe debs:

Oct 17 09:47:37 lakshmi kernel: [ 5.538623] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0

Metta Crawler (metta-crawler) wrote :

According to the changelog for the hwe deb

apt changelog xserver-xorg-video-nouveau-hwe-16.04
New upstream release
Thu, 09 Mar 2017 11:46:23 +0200

The changelog for the default deb that comes with 16.04
apt changelog xserver-xorg-video-nouveau
... last update on ...
Thu, 03 Mar 2016 15:44:44 +0200

Plus Fedora 26 has the same "nouveau 1.3.1 20120801" version in the dmesg output. Probably someone did not increment that version message as part of the 1:1.0.14-0ubuntu1 release.
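
To compare the version strings mechanically rather than by eye, something like the following could be used. It is a sketch that assumes the message keeps the form shown in the two dmesg lines above. As far as I understand, the "[drm] Initialized nouveau ..." line is printed by the kernel DRM module and not by the Xorg driver package, so it may simply not be expected to follow the xserver-xorg-video-nouveau package version:

```python
import re

# Matches e.g. "[drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0"
INIT_RE = re.compile(
    r"\[drm\] Initialized nouveau (?P<version>\d+\.\d+\.\d+) (?P<date>\d{8})"
)


def drm_driver_version(line):
    """Return (version, date) from a DRM init line, or None if it does not match."""
    match = INIT_RE.search(line)
    return (match.group("version"), match.group("date")) if match else None


# The two lines quoted earlier in this report.
PRE_HWE = "Oct 17 08:45:28 lakshmi kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0"
WITH_HWE = "Oct 17 09:47:37 lakshmi kernel: [ 5.538623] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0"

print(drm_driver_version(PRE_HWE) == drm_driver_version(WITH_HWE))
```

An identical string before and after the switch therefore does not by itself show that the HWE packages changed nothing; `uname -r` and the installed package versions are better evidence.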

Metta Crawler (metta-crawler) wrote :

No lock-ups after 6 days of using HWE kernel and Xorg.

This includes the weekend when the activity level is high enough to usually cause one or more lockups in the same day.

Changed in fedora:
importance: Unknown → Critical
status: Unknown → Confirmed
Metta Crawler (metta-crawler) wrote :

I'm almost on day 12 of using HWE kernel and Xorg now. No lock-ups.

$ lsb_release -d
Description: Ubuntu 16.04.3 LTS

$ uname -a
Linux lakshmi 4.10.0-37-generic #41~16.04.1-Ubuntu SMP Fri Oct 6 22:42:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ uptime
 07:16:04 up 11 days, 21:29, 10 users, load average: 1.04, 0.86, 0.84

Metta Crawler (metta-crawler) wrote :

Xorg crashed after 13 days 20 hours and 28 minutes of running. The root cause is unknown to me; I can only trace it back to gnome-session-binary running out of file descriptors and crashing on a SIGSEGV which cascades until a nouveau driver message about receiving a signal occurs.

I don't know if it's relevant or not that I was running for about a day with "gconf2:amd64 3.2.6-3ubuntu6" installed and it required a reboot which I had not yet done.
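
Since the trail leads back to gnome-session-binary running out of file descriptors, it may help to watch the descriptor count of that process before the next crash. A minimal sketch, assuming a Linux /proc filesystem and permission to read the target process's entries (the function names are illustrative only):

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries in <proc_root>/<pid>/fd (Linux procfs layout)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def parse_soft_nofile_limit(limits_text):
    """Extract the soft 'Max open files' value from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> <units>
            return None if soft == "unlimited" else int(soft)
    return None


def fd_report(pid, proc_root="/proc"):
    """Return (open descriptor count, soft limit) for a process."""
    with open(os.path.join(proc_root, str(pid), "limits")) as handle:
        limit = parse_soft_nofile_limit(handle.read())
    return count_open_fds(pid, proc_root), limit


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Demonstrated on this script's own process; pass a real pid instead.
    used, limit = fd_report("self")
    print(f"open fds: {used}, soft limit: {limit}")
```

Running `fd_report(6021)` periodically (6021 being the gnome-session-binary pid in the log below) would show whether the count climbs steadily towards the limit, which would point to a descriptor leak rather than a sudden burst.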

Start of crash:

Oct 30 22:15:33 lakshmi gnome-session[6021]: gnome-session-binary[6021]: WARNING: Failed to receive system inhibitor fd: dup: Too many open files
Oct 30 22:15:33 lakshmi gnome-session-binary[6021]: WARNING: Failed to receive system inhibitor fd: dup: Too many open files

Repeats over 6004 times (systemd-journald suppressed excess messages):

$ fgrep -c 'lakshmi gnome-session-binary[6021]: WARNING: Failed to receive system inhibitor fd: dup: Too many open files' gui-out-of-filedesc-crash
6004

After that:

Oct 30 22:17:20 lakshmi gnome-session[6021]: gnome-session-binary[6021]: WARNING: Failed to receive system inhibitor fd: dup: Too many open files
Oct 30 22:17:20 lakshmi gnome-session-binary[6021]: WARNING: Failed to receive system inhibitor fd: dup: Too many open files
Oct 30 22:17:30 lakshmi gnome-session-binary[6021]: GLib-GIO-CRITICAL: g_unix_fd_list_get: assertion 'G_IS_UNIX_FD_LIST (list)' failed
Oct 30 22:17:30 lakshmi kernel: gnome-session-b[6021]: segfault at 8 ip 000000000041e931 sp 00007ffc731d92e0 error 4 in gnome-session-binary[400000+43000]

Oct 30 22:18:02 lakshmi kernel: gdbus[6030]: segfault at 8 ip 00007f3e970ecc9d sp 00007f3e87ffe900 error 6 in libglib-2.0.so.0.4800.2[7f3e97086000+10f000]
Oct 30 22:18:03 lakshmi gnome-session[6021]: ICE default IO error handler doing an exit(), pid = 6312, errno = 11

... skipping ahead ... it appears the driver received a signal, though which signal is not clear to me. By then lightdm had logged me out.

Oct 30 22:18:15 lakshmi lightdm[5646]: pam_unix(lightdm:session): session closed for user mc
Oct 30 22:18:15 lakshmi lightdm[5646]: pam_kwallet(lightdm:session): pam_kwallet: pam_sm_close_session
Oct 30 22:18:15 lakshmi lightdm[5646]: pam_kwallet5(lightdm:session): pam_kwallet5: pam_sm_close_session
Oct 30 22:18:15 lakshmi lightdm[5646]: pam_kwallet(lightdm:setcred): pam_kwallet: pam_sm_setcred
Oct 30 22:18:15 lakshmi lightdm[5646]: pam_kwallet5(lightdm:setcred): pam_kwallet5: pam_sm_setcred
Oct 30 22:18:15 lakshmi pulseaudio[18405]: [pulseaudio] server-lookup.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-2MCP3ISjlb: Connection refused
Oct 30 22:18:15 lakshmi pulseaudio[18405]: [pulseaudio] main.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-2MCP3ISjlb: Connection refused
Oct 30 22:18:16 lakshmi pulseaudio[18405]: [pulseaudio] backend-ofono.c: Failed to register as a handsfree audio agent with ofono: org.freedesktop.DBus.Error.ServiceUnknown: The name org.ofono was not provided by any .service files
Oct 30 22:18:17 lakshmi kernel: ------------[ cut here ]------------
Oct 30 22:18:17 lakshmi kernel: WARNING: CPU: ...


Metta Crawler (metta-crawler) wrote :

I haven't seen this issue in a long time.

I'm still using the same graphics card but in another motherboard.

Gigabyte AX370-Gaming 5 BIOS F22, AMD Ryzen 5 2400G, Linux 4.15.0-13-generic, Ubuntu 18.04

gianfilippo (gianfi) wrote :

Hello, this also happens in Ubuntu 18.04.

May 26 09:59:20 shadow kernel: [ 60.880888] nouveau 0000:01:00.0: timeout
May 26 09:59:20 shadow kernel: [ 60.880933] WARNING: CPU: 7 PID: 81 at /build/linux-hwe-B83fOS/linux-hwe-4.18.0/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x15c/0x1a0 [nouveau]
May 26 09:59:20 shadow kernel: [ 60.880934] Modules linked in: rfcomm ccm cmac bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core arc4 snd_compress ac97_bus snd_pcm_dmaengine intel_rapl x86_pkg_temp_thermal snd_hda_intel intel_powerclamp coretemp snd_hda_codec snd_hda_core snd_hwdep hid_multitouch kvm_intel 8250_dw spi_pxa2xx_platform uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 nouveau snd_pcm kvm irqbypass videobuf2_common snd_seq_midi crct10dif_pclmul snd_seq_midi_event asus_nb_wmi btusb videodev crc32_pclmul btrtl asus_wmi wmi_bmof btbcm sparse_keymap media snd_rawmidi btintel i915 mxm_wmi ghash_clmulni_intel snd_seq pcbc bluetooth iwlmvm ttm ecdh_generic snd_seq_device
May 26 09:59:20 shadow kernel: [ 60.880963] aesni_intel drm_kms_helper mac80211 aes_x86_64 crypto_simd snd_timer cryptd glue_helper intel_cstate drm intel_rapl_perf iwlwifi rtsx_pci_ms snd i2c_algo_bit mei_me fb_sys_fops syscopyarea idma64 sysfillrect input_leds cfg80211 memstick serio_raw soundcore sysimgblt virt_dma mei processor_thermal_device intel_lpss_pci intel_soc_dts_iosf intel_lpss intel_pch_thermal int3403_thermal int3400_thermal wmi int340x_thermal_zone acpi_thermal_rel asus_wireless video mac_hid acpi_pad sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid rtsx_pci_sdmmc ahci rtsx_pci i2c_hid libahci hid pinctrl_sunrisepoint pinctrl_intel
May 26 09:59:20 shadow kernel: [ 60.880991] CPU: 7 PID: 81 Comm: kworker/7:1 Tainted: G W 4.18.0-20-generic #21~18.04.1-Ubuntu
May 26 09:59:20 shadow kernel: [ 60.880992] Hardware name: ASUSTeK COMPUTER INC. UX331UN/UX331UN, BIOS UX331UN.303 01/22/2018
May 26 09:59:20 shadow kernel: [ 60.880995] Workqueue: pm pm_runtime_work
May 26 09:59:20 shadow kernel: [ 60.881018] RIP: 0010:gf100_vmm_flush_+0x15c/0x1a0 [nouveau]
May 26 09:59:20 shadow kernel: [ 60.881019] Code: 5e 41 5f 5d c3 49 8b 7c 24 10 48 8b 5f 50 48 85 db 74 47 e8 a6 84 fa c9 48 89 da 48 89 c6 48 c7 c7 82 b8 fe c0 e8 74 28 9b c9 <0f> 0b eb bf 49 8b 7c 24 10 48 8b 5f 50 48 85 db 74 24 e8 7d 84 fa

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/251.

Changed in nouveau:
status: Confirmed → Unknown

I have had a bug with lock-ups too.

июл 10 12:20:41 Home-WS kernel: nouveau 0000:01:00.0: gr: DATA_ERROR 00000012 [RT_LINEAR_WITH_ZETA]
июл 10 12:20:41 Home-WS kernel: nouveau 0000:01:00.0: gr: 00100000 [] ch 3 [001fa14000 systemd-logind[858]] subc 3 class 8597 mthd 0d78 data 00000004

I don't know whether it's a new bug or the same one.
