10de:0429 [Lenovo ThinkPad T61] frequent Xorg freezes with nouveau
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Nouveau Xorg driver | Fix Released | High | | |
xserver-xorg-video-nouveau (Ubuntu) | Fix Released | Low | Unassigned | |
Bug Description
Binary package hint: xorg
Since switching to nouveau from -nvidia, I get frequent lockups. Mouse still moves, but screen doesn't update anymore.
WORKAROUND: nouveau.noaccel=1
ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: xorg 1:7.6~3ubuntu1
ProcVersionSign
Uname: Linux 2.6.38-1-generic x86_64
Architecture: amd64
DRM.card0.DVI.D.1:
status: disconnected
enabled: disabled
dpms: Off
modes:
edid-base64:
DRM.card0.LVDS.1:
status: connected
enabled: enabled
dpms: On
modes: 1680x1050 1680x1050 1680x1050 1400x1050 1280x1024 1280x960 1152x864 1024x768 800x600 640x480 720x400 640x400 640x350
edid-base64: AP/////
DRM.card0.VGA.1:
status: disconnected
enabled: disabled
dpms: Off
modes:
edid-base64:
Date: Tue Feb 1 14:18:29 2011
DistUpgraded: Yes, recently upgraded Log time: 2010-12-12 14:49:15.473692
DistroCodename: natty
DistroVariant: ubuntu
DkmsStatus:
EcryptfsInUse: Yes
GraphicsCard: Subsystem: Lenovo ThinkPad T61 [17aa:20d8]
MachineType: LENOVO 6459CTO
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcEnviron:
LANGUAGE=en_CA:en
PATH=(custom, user)
LANG=en_CA.UTF-8
LC_MESSAGES=
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
dmi.bios.date: 11/14/2008
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETC5WW (2.25 )
dmi.board.name: 6459CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.
dmi.modalias: dmi:bvnLENOVO:
dmi.product.name: 6459CTO
dmi.product.
dmi.sys.vendor: LENOVO
version.libdrm2: libdrm2 2.4.23-1ubuntu3
version.
version.
version.
version.
version.
In freedesktop.org Bugzilla #26980, Vvv-oktetlabs (vvv-oktetlabs) wrote : | #16 |
In freedesktop.org Bugzilla #26980, Vvv-oktetlabs (vvv-oktetlabs) wrote : | #17 |
Created attachment 33898
dmesg output
In freedesktop.org Bugzilla #26980, Vvv-oktetlabs (vvv-oktetlabs) wrote : | #18 |
Created attachment 33899
lspci output
In freedesktop.org Bugzilla #26980, Xavier (chantry-xavier) wrote : | #19 |
This seems related to two existing fedora bugs :
https:/
https:/
When using Fedora's nouveau code, it's usually advised to file a Fedora bug report.
People following this bug tracker might expect you to run the latest git of all nouveau components:
http://
(two main reasons for that: 1) git is usually better and has more fixes; 2) we do not know exactly what nouveau code is being shipped)
As said in https:/
However, according to mwk, the data error is unlikely to be related to the hangs. Also he believes he might have the same problem using recent nouveau code. To confirm that, he would like to know the value of the 400700 register next time the machine hangs :
$ wget http://
$ gcc peek.c libio.c -lpciaccess -o peek
# ./peek 0x400700
In freedesktop.org Bugzilla #26980, Vvv-oktetlabs (vvv-oktetlabs) wrote : | #20 |
Sorry for misplaced report.
It happens again - ./peek 0x400700 output is
00400700: 00100001
So, is there chance this problem resolved in git driver?
In freedesktop.org Bugzilla #26980, Xavier (chantry-xavier) wrote : | #21 |
(In reply to comment #4)
> Sorry for misplaced report.
>
> It happens again - ./peek 0x400700 output is
> 00400700: 00100001
>
> So, is there chance this problem resolved in git driver?
>
17:53 < mwk> shining^: heh, ok, so the bug report looks like it could be related...
17:53 < mwk> his failing status is 00100001, mine is 01b00001....
17:54 < shining^> mwk: well thats different :)
17:54 < mwk> not so different
17:54 < mwk> mine includes all of his bits
17:54 < mwk> too bad I don't know what most of these bits mean...
17:55 < mwk> 00800000 is CUDA MP execution, 00000001 is the whole PGRAPH... and that's about it
17:57 < shining^> mwk: ok. and it also hangs regularly for you ?
17:57 < shining^> "usually once per 1-2 days"
17:57 < mwk> shining^: much more often when I'm playing some 3d on it
17:58 < shining^> ok
17:58 < mwk> possibly I could get more lockups out of it, and of the 00100001 kind, if I left X running for long
17:58 < shining^> I suppose the answer to that question is no then : "So, is there chance this problem resolved in git driver?"
17:59 < mwk> with current git? /me doubts that.
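The bit comparison mwk describes in the log above ("mine includes all of his bits") can be sketched in Python. This is only an illustration: the two bit meanings are the ones stated in the IRC log, every other bit is reported as unknown, and the helper names `includes_all_bits` and `describe` are made up for this example.

```python
# Status values for register 0x400700 quoted in the IRC log above.
REPORTED = 0x00100001  # value from the bug reporter
MWK = 0x01B00001       # value mwk saw on his own card

# Only these two meanings are given in the log; all other bits are unknown.
KNOWN_BITS = {
    0x00800000: "CUDA MP execution",
    0x00000001: "whole PGRAPH",
}


def includes_all_bits(superset, subset):
    """Return True if every bit set in `subset` is also set in `superset`."""
    return (superset & subset) == subset


def describe(status):
    """List each set bit, lowest first, with its meaning or 'unknown'."""
    labels = []
    for bit in range(32):
        mask = 1 << bit
        if status & mask:
            labels.append("%08x: %s" % (mask, KNOWN_BITS.get(mask, "unknown")))
    return labels


print(includes_all_bits(MWK, REPORTED))
for label in describe(MWK):
    print(label)
```

The check is a plain mask test, so it only says that the two values share bits, not that the underlying failures are the same.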
In freedesktop.org Bugzilla #26980, Vvv-oktetlabs (vvv-oktetlabs) wrote : | #22 |
Thanks. I called ./peek 0x400700 when X hangs another time, its output was:
00400700: 001e0001
Just in case it helps:
- when X hangs, the mouse cursor still moves
- since updating to 2.6.32.9, X appears to hang more often than before, and usually the system cannot be restored by killing X (X restarts, but the screen stays as it was at the moment of the hang)
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #23 |
This bug is a huge mystery and I'm starting to think it's a hw bug. One possibility is that it's triggered by a particular insn sequence that DDX uses. Could you please try http://
If anyone hits this bug, with the above patch or without, please compile pgtest from http://
In freedesktop.org Bugzilla #26980, Axel Beckert (xtaran) wrote : | #24 |
I have observed this bug also on Ubuntu 10.04 (Details below):
[…]
(II) NOUVEAU(0): Modeline "1152x864"x0.0 108.00 1152 1216 1344 1600 864 865 868 900 +hsync +vsync (67.5 kHz)
(II) NOUVEAU(0): Modeline "1400x1050"x0.0 156.00 1400 1504 1648 1896 1050 1053 1057 1099 -hsync +vsync (82.3 kHz)
(II) NOUVEAU(0): Modeline "1440x900"x0.0 106.50 1440 1520 1672 1904 900 903 909 934 -hsync +vsync (55.9 kHz)
(II) NOUVEAU(0): Modeline "1440x900"x0.0 136.75 1440 1536 1688 1936 900 903 909 942 -hsync +vsync (70.6 kHz)
(II) NOUVEAU(0): Modeline "1680x1050"x0.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync (65.3 kHz)
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/X (xorg_backtrace
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a2ac4]
2: /usr/bin/X (xf86PostMotion
3: /usr/lib/
4: /usr/bin/X (0x400000+0x6fca7) [0x46fca7]
5: /usr/bin/X (0x400000+0x11d1f3) [0x51d1f3]
6: /lib/libpthread
7: /lib/libc.so.6 (ioctl+0x7) [0x7f5186ff6157]
8: /lib/libdrm.so.2 (drmIoctl+0x28) [0x7f51855a75b8]
9: /lib/libdrm.so.2 (drmCommandWrit
10: /lib/libdrm_
11: /lib/libdrm_
12: /lib/libdrm_
13: /lib/libdrm_
14: /usr/lib/
15: /usr/lib/
16: /usr/bin/X (0x400000+0xd8a7b) [0x4d8a7b]
17: /usr/bin/X (0x400000+0x2ebb4) [0x42ebb4]
18: /usr/bin/X (0x400000+0x30c3c) [0x430c3c]
19: /usr/bin/X (0x400000+0x261aa) [0x4261aa]
20: /lib/libc.so.6 (__libc_
21: /usr/bin/X (0x400000+0x25d59) [0x425d59]
strace shows tons of this:
[…]
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
[…]
neper:~# uname -a
Linux neper 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010 x86_64 GNU/Linux
neper:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 10.04 LTS
Release: 10.04
Codename: lucid
neper:~# dpkg -l | fgrep nouveau
ii libdrm-nouveau1 2.4.18-1ubuntu3 […]
ii xserver-
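The ioctl request number repeated in the strace excerpt above (0x40086485) can be unpacked by hand. The sketch below assumes the generic Linux ioctl encoding used on x86 (8-bit command number, 8-bit type, 14-bit size, 2-bit direction) and the DRM convention that driver-specific commands start at 0x40; `decode_ioctl` is a made-up helper name, and the snippet deliberately does not claim which nouveau command the number corresponds to.

```python
DRM_COMMAND_BASE = 0x40  # first driver-specific DRM command number


def decode_ioctl(request):
    """Split an ioctl request number into its generic Linux fields."""
    return {
        "dir": (request >> 30) & 0x3,      # 1 = userspace writes data to the kernel
        "size": (request >> 16) & 0x3FFF,  # argument size in bytes
        "type": chr((request >> 8) & 0xFF),
        "nr": request & 0xFF,
    }


fields = decode_ioctl(0x40086485)
print(fields)
print("driver command index: 0x%02x" % (fields["nr"] - DRM_COMMAND_BASE))
```

Under these assumptions the request is a write-direction DRM ioctl (type `d`) with an 8-byte argument, which is consistent with the `drmCommandWrite` frame in the backtrace.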
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #25 |
*** Bug 28320 has been marked as a duplicate of this bug. ***
In freedesktop.org Bugzilla #26980, Mark-hagger (mark-hagger) wrote : | #26 |
(In reply to comment #7)
> If anyone hits this bug, with the above patch or without, please compile pgtest
> from http://
> 0x400000 0x10000 after the lockup.
I can get this to hang almost at will, on a fresh Fedora 13 install, I've attached output from the peek run, hope it helps.
In freedesktop.org Bugzilla #26980, Mark-hagger (mark-hagger) wrote : | #27 |
Created attachment 36471
Xorg and peek output after hang
In freedesktop.org Bugzilla #26980, Greg Wilkins (gregw-wiltel) wrote : | #28 |
(In reply to comment #7)
> This bug is a huge mystery and I'm starting to think it's a hw bug.
I'm getting the same symptoms on a fresh install of Ubuntu 10.04 on a brand new Lenovo W510. X locks up about 2 or 3 times per day - the screen is frozen, even if X is restarted.
2.6.32-
next time it happens, I'll ssh into the machine and gather more information.
In freedesktop.org Bugzilla #26980, Greg Wilkins (gregw-wiltel) wrote : | #29 |
Created attachment 36485
dmesg taken during X lockup
It happened again....
here is the dmesg output that I captured by ssh-ing in while X was in 100% CPU state.
I also thought I had captured an strace... but I made a mistake. I'll capture that next time. Is there anything else you need, this is happening every few hours for me?
In freedesktop.org Bugzilla #26980, Mark-hagger (mark-hagger) wrote : | #30 |
(In reply to comment #13)
> that next time. Is there anything else you need, this is happening every few
> hours for me?
See comment#7, Marcin says he wants output from "peek" after a hang has occurred.
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #31 |
Status update.
First, we don't need any additional dumps - Ben found a reliable way to reproduce this bug semi-instantly, so we can make any dumps we need.
But, we still have no idea what's causing this bug. And right now all developers hunting for this bug are either too busy with other things, or on vacation.
So, please no more dumps. You have four options:
- use other driver
- disable acceleration [nouveau.noaccel=1 on kernel command line]
- decide that rebooting every few hours isn't that bad after all
- debug and fix that bug on your own
Yes, we know the situation sucks. Sorry for that.
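For anyone trying the second option, a quick way to confirm that the parameter actually reached the running kernel is to look at /proc/cmdline. The following is a minimal sketch; the function names and the sample command line are made up for illustration, and it only checks for the exact `nouveau.noaccel=1` token.

```python
def has_noaccel(cmdline):
    """Return True if nouveau.noaccel=1 is one of the kernel parameters."""
    return "nouveau.noaccel=1" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz ro quiet splash nouveau.noaccel=1"
print(has_noaccel(sample))
```

On a live system, `has_noaccel(read_cmdline())` performs the same check against the booted kernel.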
In freedesktop.org Bugzilla #26980, bedahr (grasch-simon-listens) wrote : | #32 |
Would you mind posting the method to reproduce the bug?
Thanks!
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #33 |
06:16:48 <darktama> I can reproduce the issue *very* quickly with http://
06:16:57 <darktama> the first post has an image, and a smaller thumbnail image
06:17:42 <darktama> if I scroll up and down for a bit (<1min) with the thumbnail going off/on the screen, it'll hang
In freedesktop.org Bugzilla #26980, bedahr (grasch-simon-listens) wrote : | #34 |
Hm I don't have any problems with this site (have been scrolling for almost 2 minutes now without any issues).
I am using an 8600M GT, though, but I thought I'd test it because my X has hung twice in the last three days with the same symptoms as well...
In freedesktop.org Bugzilla #26980, Picogeyer (picogeyer) wrote : | #35 |
(In reply to comment #17)
> 06:16:48 <darktama> I can reproduce the issue *very* quickly with
> http://
> 06:16:57 <darktama> the first post has an image, and a smaller thumbnail image
> 06:17:42 <darktama> if I scroll up and down for a bit (<1min) with the the
> thumbnail going off/on the screen, it'll hang
I can't reproduce the bug with that site either using a GeForce 210, though my X hangs about twice a day.
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #36 |
Yeah, that's a funny thing about this testcase. Apparently it works for NVA3 and NVA5 chipsets, but not NVA8 which gt210 is based on. Maybe it's related to different TP configuration or something...
As for the 8600M problem - this is almost certainly some other bug. Not all random hangs are instances of this particular bug. Does that hang give something suspicious in dmesg?
In freedesktop.org Bugzilla #26980, bedahr (grasch-simon-listens) wrote : | #37 |
I'll try to debug it the next time it happens...
In freedesktop.org Bugzilla #26980, Greg Wilkins (gregw-wiltel) wrote : | #38 |
Created attachment 36508
strace taken during 100% X lockup
here is an strace of the X process during the 100% cpu lockup. Looks like a loop to me.
I'm going to switch to the proprietary driver for a while and see if I get lockups with that. But I really want to run the open driver (it's so much more flexible), so I'm happy to switch back if you want me to try anything.
In freedesktop.org Bugzilla #26980, Greg Wilkins (gregw-wiltel) wrote : | #39 |
As per comment #15, I can confirm that setting [nouveau.noaccel=1 on kernel command line] does prevent lockups (at least for 3 days). It does result in occasional strange display artefacts (e.g. black boxes left after tool tips close), but is entirely usable and causes fewer issues than using the proprietary driver.
In freedesktop.org Bugzilla #26980, Greg Wilkins (gregw-wiltel) wrote : | #40 |
squeak?
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #41 |
Well... the status is still the same. We [me and darktama / Ben Skeggs] both have no fucking idea what's
causing this problem.
It turns out, stuff is complicated. NVA3+ cards introduced some sort of
microcontroller on-board that runs all the time and controls stuff like power
management. In all NVA3+ traces, we see a lot of places where blob is talking
to that controller and does stuff to PGRAPH. Atm we can only assume that
lockups are caused by us not doing this.
We have no idea what exactly that microcontroller does and how to talk to it.
Worse, this microcontroller needs microcode that is uploaded by the driver.
And judging by what I've seen already, ctxprogs were a walk in the park
compared to this new uc. The progs are huge, and there are shitloads of
opcodes to RE.
So - we're blocked again on REing some magical code. And we're talking real
code this time, as opposed to ctxprogs which were mostly just a list of regs
to copy...
Until we understand what's going on, status for NVA3+ cards stays the same -
we have absolutely no idea how to fix that bug and ETA is on the order of at
least months.
And for the record: I'm going to kill anyone who calls these new progs
"voodoo", "magic", or anything like that. We already decided on the name of
"fuc progs". And the PM microcontroller is not the only place where they are
used. Right now we know of the following:
- NV98+ cryptographic engine setup
- NV98+ cryptographic engine ctxprogs
- NV98+ video decoding ctxprogs
- NVA3+ unknown engine 104xxx [DMA copier?] ctxprogs
- NVA3+ PM microcontroller
- NVC0 PGRAPH ctxprogs
With such a huge list of users, REing fuc progs is a high priority for me.
Keep your fingers crossed. And find lots of RE-capable people to help, this is
going to be a long ride...
PS. Note that chipset ordering is a bit funny and it does NOT follow NVxx
numerical values. The saner ordering I've come up with is at
http://
matches order of adding new functionality. Also, for video decoding units,
NVA0 and NV98 should be swapped, ie. NV98+ video decoding does not include
NVA0. So NVA3+ cards are NVA3, NVA5, NVA8, NVAF, NVC0. But not NVAA/NVAC.
PPS. In other words, if you want your card to work with nouveau, don't buy NVA3+ cards yet. Or, following nvidia codenames, don't buy stuff with GT21x chipset in it. Nor MCP89, but that one is too damn rare anyway.
PPPS. If you think you're able to RE microcodes, contact me. Srsly. We need more people. Badly.
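Because the chipset ordering does not follow the NVxx numerical values, an "NVA3+" check has to be a membership test rather than a numeric comparison. A small sketch, using only the chipsets listed in the PS above (the names `NVA3_PLUS`, `is_nva3_plus` and `naive_numeric_check` are made up for this example):

```python
# Chipsets named in the comment above as "NVA3+".
NVA3_PLUS = {"NVA3", "NVA5", "NVA8", "NVAF", "NVC0"}


def is_nva3_plus(chipset):
    """Membership test against the list given in the thread."""
    return chipset.upper() in NVA3_PLUS


def naive_numeric_check(chipset):
    """Compare the hex suffix with 0xA3; shown only to illustrate why it is wrong."""
    return int(chipset[2:], 16) >= 0xA3


for name in ("NVA8", "NVAA", "NVAC"):
    print(name, is_nva3_plus(name), naive_numeric_check(name))
```

NVAA and NVAC pass the naive numeric test even though the comment explicitly excludes them, which is the point of the PS.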
In freedesktop.org Bugzilla #26980, Greg Wilkins (gregw-wiltel) wrote : | #42 |
Marcin,
thanks for the update.
Sorry I can't help with the RE side of things (got my own open source projects keeping me 200% busy). But I am happy to help debug/test on my hardware.
Note that I still find using the nouveau driver without acceleration is much better than using the nvidia driver with acceleration.
In freedesktop.org Bugzilla #26980, Dmytro-poplavskiy (dmytro-poplavskiy) wrote : | #43 |
I also have a similar problem with a GT240 on openSUSE 11.3.
I hope the backtrace helps.
#0 0x00007fddfac33e87 in ioctl () from /lib64/libc.so.6
#1 0x00007fddf93ebc38 in drmIoctl (fd=10, request=1074291842, arg=0x7fffca3b0330) at xf86drm.c:184
#2 0x00007fddf93edf3b in drmCommandWrite (fd=<value optimized out>, drmCommandIndex
at xf86drm.c:2398
#3 0x00007fddf8dad07d in nouveau_bo_wait (bo=0x829b00, cpu_write=<value optimized out>, no_wait=<value optimized out>, no_block=<value optimized out>)
at nouveau_bo.c:385
#4 0x00007fddf8dad68e in nouveau_
#5 0x00007fddf8dac21a in nouveau_
#6 0x00007fddf8dac760 in nouveau_
#7 0x00007fddf7f0e7a5 in ?? () from /usr/lib64/
#8 0x00007fddf7f10272 in ?? () from /usr/lib64/
In freedesktop.org Bugzilla #26980, Mark Carey (careym) wrote : | #44 |
Adding a me too.
NVA8, GT218
[drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
Fedora 13. Seems to happen reproducibly after playing freeciv for 45 - 60 minutes.
[ 3407.261] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 3407.262]
Backtrace:
[ 3407.272] 0: /usr/bin/Xorg (xorg_backtrace
[ 3407.272] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x809b8f7]
[ 3407.272] 2: /usr/bin/Xorg (xf86PostMotion
[ 3407.272] 3: /usr/lib/
[ 3407.272] 4: /usr/lib/
[ 3407.272] 5: /usr/bin/Xorg (0x8047000+0x76c30) [0x80bdc30]
[ 3407.272] 6: /usr/bin/Xorg (0x8047000+
[ 3407.272] 7: (vdso) (__kernel_
[ 3407.272] 8: (vdso) (__kernel_
[ 3407.272] 9: /lib/libc.so.6 (ioctl+0x19) [0x74c0b9]
[ 3407.272] 10: /usr/lib/
[ 3407.272] 11: /usr/lib/
[ 3407.273] 12: /usr/lib/
[ 3407.273] 13: /usr/lib/
[ 3407.273] 14: /usr/lib/
[ 3407.273] 15: /usr/lib/
[ 3407.273] 16: /usr/lib/
[ 3407.273] 17: /usr/lib/
[ 3407.273] 18: /usr/lib/
[ 3407.273] 19: /usr/lib/
[ 3407.273] 20: /usr/bin/Xorg (0x8047000+0xe6f26) [0x812df26]
[ 3407.273] 21: /usr/bin/Xorg (0x8047000+
[ 3407.273] 22: /usr/bin/Xorg (miCompositeRec
[ 3407.273] 23: /usr/bin/Xorg (CompositeRects
[ 3407.273] 24: /usr/bin/Xorg (0x8047000+0xe036d) [0x812736d]
[ 3407.273] 25: /usr/bin/Xorg (0x8047000+0xdc494) [0x8123494]
[ 3407.273] 26: /usr/bin/Xorg (0x8047000+0x50a37) [0x8097a37]
[ 3407.273] 27: /usr/bin/Xorg (0x8047000+0x1b595) [0x8062595]
[ 3407.273] 28: /lib/libc.so.6 (__libc_
[ 3407.273] 29: /usr/bin/Xorg (0x8047000+0x1b181) [0x8062181]
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #45 |
Can we please stop with the "me too"s and useless backtraces already? We already know the bug affects anyone with NVA3+ cards and the backtraces only show the fallout of the fallout of the fallout of the original problem...
Also: if you get a DMA_PUSHER, that's another bug. This bug happens with total silence from the kernel - the card just hangs without telling us why.
Now, a status update. So far, I REd the instruction set and disassembled a good chunk of that microcode. Sure enough, it pokes PGRAPH and watches for stuff on it. But the code doesn't yet fully make sense to me, and I'm temporarily away from my NVA5 card, so... could someone dump some registers for me?
I need a dump of the registers both before and after the hang. The interesting stuff is in 10axxx range, and ideally I'd want output of "./peek 10a000 1000". However, if the card doesn't like it and hangs or something upon that command, dump registers individually instead by doing "./peek X" for X being 10a008, 10a6fc, 10a4fc, 10a4f4, 10a714, 10a700, 10a704, 10a4dc, 10a690, 10a688.
Repeating that 2 or 3 times may be useful, esp. across different chipsets. But not TOO much, please.
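The before/after comparison requested above could be done with a few lines of Python. This sketch assumes that each line of peek output has the single-register form `ADDRESS: VALUE` seen earlier in the thread (for example `00400700: 00100001`); the format of a range dump may differ, and the helper names are invented for the example.

```python
# Registers listed in the request above, for the one-by-one fallback.
REGISTERS = ["10a008", "10a6fc", "10a4fc", "10a4f4", "10a714",
             "10a700", "10a704", "10a4dc", "10a690", "10a688"]


def peek_commands():
    """Build the individual ./peek invocations for the fallback case."""
    return ["./peek " + reg for reg in REGISTERS]


def parse_peek(text):
    """Parse 'ADDRESS: VALUE' lines (both hex) into a dict; skip anything else."""
    regs = {}
    for line in text.splitlines():
        addr, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            regs[int(addr.strip(), 16)] = int(value.strip(), 16)
        except ValueError:
            continue
    return regs


def changed_registers(before, after):
    """Return {address: (old, new)} for registers whose value differs."""
    return {addr: (before.get(addr), after[addr])
            for addr in sorted(after)
            if before.get(addr) != after[addr]}


print("\n".join(peek_commands()))
```

Feeding the two saved dumps through `parse_peek` and then `changed_registers` gives only the registers that moved between the working and the hung state.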
In freedesktop.org Bugzilla #26980, Mark Carey (careym) wrote : | #46 |
Created attachment 37625
peek results from before lockup
In freedesktop.org Bugzilla #26980, Mark Carey (careym) wrote : | #47 |
Created attachment 37626
peek results after lockup
In freedesktop.org Bugzilla #26980, Mark Carey (careym) wrote : | #48 |
Created attachment 37627
Xorg.0.log from lockup
In freedesktop.org Bugzilla #26980, Richard-coe (richard-coe) wrote : | #49 |
I found this bug while researching my own X lockup issue.
This solution does not address the hardware issue documented here, but
I want to help anyone in a similar situation who does *NOT* have
the hardware lockup and for whom noaccel=1 does not work.
In my case all you have to do is move a window and then immediately
click in another window. There may be other ways to reproduce this.
The only workaround is to restart the window manager or Xorg.
Investigating the issue I found this patch:
http://
Subject: Revert "dix: use the event mask of the grab for TryClientEvents."
I am using xorg-server-1.8.0 and found the issue was introduced in
xorg-server-1.6.3. The fix appears in xorg-server-1.8.2 and later.
In freedesktop.org Bugzilla #26980, Bugs-sthias (bugs-sthias) wrote : | #50 |
./peek'ing my QUADRO NVS 140M results in all zeroes ("...") before and after the lockup, regardless of whether I peek the range "10a000 1000" or the single addresses. Am I doing anything wrong?
Things that cause lockups with nouveau appear to increasingly slow down the blob-driver, which because of this is also no option for me to use. After some time using KDE4/Skype/
If I can be of any help with testing, please let me know.
In freedesktop.org Bugzilla #26980, Marcin Kościelnicki (koriakin) wrote : | #51 |
(In reply to comment #34)
> ./peek'ing my QUADRO NVS 140M results in all zeroes ("...") before and after
> the lockup, regardless of whether I peek the range "10a000 1000" or the single
> addresses. Am I doing anything wrong?
Yes. you didn't listen when I said this bug is NVA3+ only. Your card is NV86.
Do you get any interesting messages in kernel around the lockup? This is almost certainly some other bug.
In freedesktop.org Bugzilla #26980, Bugs-sthias (bugs-sthias) wrote : | #52 |
(In reply to comment #35)
> Yes. you didn't listen when I said this bug is NVA3+ only. Your card is NV86.
>
> Do you get any interesting messages in kernel around the lockup? This is almost
> certainly some other bug.
OK, sorry for that. Yes, my kernel gives me
[29470.176297] [drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
[29470.176975] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x1560 Data 0x00000000:
[29470.176979] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_VALUE
[29470.176990] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x1564 Data 0x00000000:
[29470.176993] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_BITFIELD
So, I think I'll try my luck with bug 26733 or open a new one, if they don't like my problem there either :)
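Since the NVA3+ hang is described earlier in the thread as happening with total silence from the kernel, a rough first triage is to check whether dmesg contains any nouveau error lines around the lockup. The sketch below counts error types in lines of the form shown above; the regular expression and function names are assumptions made for this example, and finding no errors does not prove that a hang is this particular bug.

```python
import re
from collections import Counter

# Matches e.g. "[drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2"
NOUVEAU_ERROR = re.compile(r"\[drm\] nouveau \S+: ([A-Z_]+) - ")


def nouveau_errors(dmesg_text):
    """Count nouveau error types found in a dmesg capture."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = NOUVEAU_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def looks_silent(dmesg_text):
    """True if no nouveau error lines were logged at all."""
    return not nouveau_errors(dmesg_text)


SAMPLE = """\
[29470.176297] [drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
[29470.176975] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x1560 Data 0x00000000:
[29470.176979] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_VALUE
"""

print(dict(nouveau_errors(SAMPLE)))
print(looks_silent(SAMPLE))
```

A capture like the sample, with PFIFO_DMA_PUSHER or PGRAPH_DATA_ERROR lines, points towards a different bug, as the developers say in this thread.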
In freedesktop.org Bugzilla #26980, Kavol (kavol) wrote : | #53 |
Created attachment 39004
./peek 10a000 1000 after and before the crash
07:00.0 VGA compatible controller: nVidia Corporation Device 0a65 (rev a2)
here are my dumps
note that "before" is actually after reboot - don't know how to capture right before the freeze
In freedesktop.org Bugzilla #26980, Marcin Slusarz (marcin-slusarz) wrote : | #54 |
*** Bug 30817 has been marked as a duplicate of this bug. ***
In freedesktop.org Bugzilla #26980, Jeffm-9 (jeffm-9) wrote : | #55 |
Created attachment 39589
Before and after peek 10a000 on ION2
Here are 3 more peeks, before and after, using a different chipset as you said that might be interesting to you.
01:00.0 VGA compatible controller: nVidia Corporation GT218 [ION] (rev a2) (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at ce000000 (64-bit, prefetchable) [size=32M]
I/O ports at dc00 [size=128]
Expansion ROM at fe980000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel driver in use: nouveau
I do see messages in the kernel, but as you mentioned it's a separate bug. They appear inconsistently during this failure.
[drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
and sometimes also:
[drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/2 Class 0x502d Mthd 0x0240 Data 0x00000000:
[drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_VALUE
This is on openSUSE Factory, kernel 2.6.36-rc4
xorg-x11-
xorg-x11-
libdrm-
In freedesktop.org Bugzilla #26980, Johannes Obermayr (jobermayr) wrote : | #56 |
Jeff, please try whether it is valid with packages from home:jobermayr.
In freedesktop.org Bugzilla #26980, Jeffm-9 (jeffm-9) wrote : | #57 |
(In reply to comment #40)
> Jeff, please try whether it is valid with packages from home:jobermayr.
I'm afraid so. Same symptoms. Is another dump needed or are the ones I've provided already sufficient?
In freedesktop.org Bugzilla #26980, Alex Mayorga (alex-mayorga) wrote : | #58 |
(In reply to comment #3)
> However, according to mwk, the data error is unlikely to be related to the
> hangs. Also he believes he might have the same problem using recent nouveau
> code. To confirm that, he would like to know the value of the 400700 register
> next time the machine hangs :
> $ wget http://
> $ gcc peek.c libio.c -lpciaccess -o peek
> # ./peek 0x400700
Landed here from bug 33357 that might be a duplicate of this one.
For n00bs like me the commands have changed a bit, I managed to figure out the first one to be:
$ wget http://
The second one gives me the following:
$ gcc peek.c libio.c -lpciaccess -o peek
peek.c:1:1: error: expected identifier or ‘(’ before ‘<’ token
peek.c:3:13: warning: character constant too long for its type
peek.c:3:53: warning: multi-character character constant
peek.c:3:63: warning: multi-character character constant
peek.c:6:12: warning: character constant too long for its type
peek.c:6:32: warning: character constant too long for its type
peek.c:7:12: warning: character constant too long for its type
peek.c:7:29: warning: character constant too long for its type
peek.c:8:11: warning: character constant too long for its type
peek.c:8:29: warning: character constant too long for its type
peek.c:8:45: warning: character constant too long for its type
peek.c:9:11: warning: character constant too long for its type
peek.c:9:29: warning: character constant too long for its type
peek.c:9:46: warning: character constant too long for its type
peek.c:9:106: warning: character constant too long for its type
peek.c:12:9: warning: multi-character character constant
peek.c:12:26: warning: character constant too long for its type
peek.c:14:11: warning: multi-character character constant
peek.c:14:38: warning: character constant too long for its type
peek.c:14:66: warning: character constant too long for its type
peek.c:14:87: warning: character constant too long for its type
peek.c:15:11: warning: multi-character character constant
peek.c:15:26: warning: character constant too long for its type
peek.c:15:66: warning: character constant too long for its type
peek.c:15:80: warning: character constant too long for its type
peek.c:15:131: warning: multi-character character constant
peek.c:15:151: warning: multi-character character constant
peek.c:15:164: error: empty character constant
peek.c:16:27: warning: character constant too long for its type
peek.c:17:23: warning: character constant too long for its type
peek.c:17:37: error: empty character constant
peek.c:17:46: warning: character constant too long for its type
peek.c:18:15: warning: multi-character character constant
peek.c:18:46: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘this’
peek.c:18:56: warning: character constant too long for its type
peek.c:19:16: warning: character constant too long for its type
peek.c:19:1: error: stray ‘\305’ in program
peek.c:19:1: error: stray ‘\233’ in program
peek.c:20:14: warning: multi-character character constant
peek.c:21:10: warning: character constant too long for its type
peek.c:21:24: warning: character cons...
Marc Deslauriers (mdeslaur) wrote : | #1 |
- BootDmesg.txt (69.7 KiB, text/plain; charset="utf-8")
- CurrentDmesg.txt (2.0 KiB, text/plain; charset="utf-8")
- Dependencies.txt (3.7 KiB, text/plain; charset="utf-8")
- GdmLog.txt (23.0 KiB, text/plain; charset="utf-8")
- GdmLog1.txt (24.4 KiB, text/plain; charset="utf-8")
- GdmLog2.txt (23.9 KiB, text/plain; charset="utf-8")
- Lspci.txt (18.0 KiB, text/plain; charset="utf-8")
- Lsusb.txt (658 bytes, text/plain; charset="utf-8")
- ProcCpuinfo.txt (1.5 KiB, text/plain; charset="utf-8")
- ProcInterrupts.txt (1.7 KiB, text/plain; charset="utf-8")
- ProcModules.txt (5.0 KiB, text/plain; charset="utf-8")
- UdevDb.txt (120.9 KiB, text/plain; charset="utf-8")
- UdevLog.txt (278.8 KiB, text/plain; charset="utf-8")
- XorgLog.txt (33.8 KiB, text/plain; charset="utf-8")
- XorgLogOld.txt (35.9 KiB, text/plain; charset="utf-8")
- Xrandr.txt (4.2 KiB, text/plain; charset="utf-8")
- monitors.xml.txt (603 bytes, text/plain; charset="utf-8")
- xdpyinfo.txt (17.3 KiB, text/plain; charset="utf-8")
Id2ndR (id2ndr) wrote : | #2 |
I can confirm this problem. It has been happening to me since 31/01.
I tried the package from the stock repository and from the xorg-edgers PPA, but I get the same problem.
It seems to happen more quickly when launching virt-manager and firefox and minimizing/restoring these windows.
I attach the backtrace of the latest crash.
affects: | xorg (Ubuntu) → xserver-xorg-video-nouveau (Ubuntu) |
Thomas (t-hartwig) wrote : | #3 |
Same for me on a ThinkPad W510.
The system is still working, so you can reach it over the network, but the display is locked and no interaction is possible any more.
Thomas (t-hartwig) wrote : | #4 |
Added my Xorg.0.log. BTW, I tried an earlier kernel, 2.6.37, but it froze as well.
Thomas (t-hartwig) wrote : | #5 |
First one was the current session, now the one from the last freeze.
Thomas (t-hartwig) wrote : | #6 |
Probably related:
https:/
Adding "nouveau.noaccel=1" as a boot option is suggested there. Will test and report...
Thomas (t-hartwig) wrote : | #7 |
nouveau.noaccel=1 seems to work around this; I have had no freeze since yesterday. I can see no performance and 3D is not possible either.
Thomas (t-hartwig) wrote : | #8 |
Wanted to write: I can see no performance degradation in my daily work and 3D is not possible either.
In freedesktop.org Bugzilla #26980, Tiziano Müller (tm-dev-zero) wrote : | #59 |
I'd say the current instructions to build the peek (and other utilities) are:
git clone git://0x04.
cd pgtest
make
Id2ndR (id2ndr) wrote : | #9 |
My system is more stable with the latest updates. Unfortunately, I don't know exactly which one helped.
Marc Deslauriers (mdeslaur) wrote : | #10 |
I am still getting this 3-4 times a day.
Mariano Chavero (marianochavero) wrote : | #11 |
Same problem here... It always happens when I hit the Ubuntu logo.
Bryce Harrington (bryce) wrote : | #12 |
Typically gpu lockup bugs all share the same symptoms (more or less) but can have completely different root causes. "Mouse moving but screen not updating" is nigh-universal to all GPU lockup bugs. Unfortunately we don't have the debugging tools available in the distro to distinguish freezes on -nouveau like we do for -intel.
So, it would *really* help if each person could file their bug report as a unique, separate bug report, rather than joining this bug report assuming it's the same issue. If you don't want to, that's ok, but we'll be focusing only on Marc's issue and once that's solved will close this bug report out.
Marc, did you attempt the nouveau.noaccel=1 workaround that Thomas mentioned? Also, the dmesg in your original bug report appears to be from a working session. What we need for analysis purposes is dmesg (and why not, /var/log/Xorg.0.log too) from when it is locked up; you will want to ssh into the box from another machine to do this.
@everyone else, again, please file a new bug report. In each case we're also going to need dmesg and /var/log/
Changed in xserver-xorg-video-nouveau (Ubuntu): | |
status: | New → Incomplete |
Marc Deslauriers (mdeslaur) wrote : | #13 |
OK, I will try the nouveau.noaccel=1 workaround and will report back.
Changed in xserver-xorg-video-nouveau (Ubuntu): | |
importance: | Undecided → High |
In freedesktop.org Bugzilla #26980, Rtguille (rtguille) wrote : | #60 |
I have a GT220 and it sometimes freezes randomly:
[ 3675.146]
Backtrace:
[ 3675.153] 0: /usr/bin/X (xorg_backtrace
[ 3675.153] 1: /usr/bin/X (0x400000+0x63509) [0x463509]
[ 3675.153] 2: /lib64/libc.so.6 (0x34eca00000+
[ 3675.153] 3: /usr/lib64/
(DRI2CloseScree
[ 3675.153] 4: /usr/lib64/
(0x7fb95b809000
[ 3675.154] 5: /usr/bin/X (0x400000+0xa3b49) [0x4a3b49]
[ 3675.154] 6: /usr/bin/X (0x400000+0x15daec) [0x55daec]
[ 3675.154] 7: /usr/bin/X (0x400000+0x2193c) [0x42193c]
[ 3675.154] 8: /lib64/libc.so.6 (__libc_
[ 3675.154] 9: /usr/bin/X (0x400000+0x21449) [0x421449]
[ 3675.154] Segmentation fault at address 0x10
[ 3675.154]
Fatal server error:
[ 3675.154] Caught signal 11 (Segmentation fault). Server aborting
[ 3675.154]
[ 3675.154]
Please consult the Fedora Project support
at http://
for help.
[ 3675.154] Please also check the log file at "/var/log/
additional information.
[ 3675.154]
[ 3675.157] (II) NOUVEAU(0): NVLeaveVT is called.
That error is a bit old.
OK, I read about the random lockups some time ago.
The interesting thing is that at some point I started to use (and still use) the nVIDIA (P) drivers, and to my surprise they also have random GPU lockups (granted, it is a different piece of software).
Then I put in my new ATI HD5670 and found that it also freezes randomly...
I run F13.
I never had issues with Slackware 13.1 and nvidia(P).
The OtherOS never froze, with any card.
For F13 I must use pcie_aspm=off because of issues with the SATA controller.
But pcie_aspm=off also seems to put the PCIe bus into Gen1 mode (it halves the link speed and changes the de-emphasis). I do not know whether it is normal for pcie_aspm=off to do that, or what else it changes.
For example, nvidia(P) reports that the card is at Gen1 speed, although both the motherboard (M4N72-E) and the card are Gen2.
Is the nouveau driver expected to work with any PCIe configuration (different link speeds, ASPM on or off)?
I know that it may not be related, but I do not know whether subtly different PCIe configurations (or PCIe driver bugs) may cause the video card to behave that way.
Thanks in advance.
Marc Deslauriers (mdeslaur) wrote : | #14 |
The nouveau.noaccel=1 workaround seems to have worked for me. With that, I have not had any crashes in the past couple of days.
Bryce, do you still need a dmesg from a crash?
In freedesktop.org Bugzilla #26980, Frédéric Crozat (fcrozat) wrote : | #61 |
Created attachment 43519
before and after peek on nv40 from Lenovo T410
Here is the peek output before and after a freeze, on kernel 2.6.38rc5, on a Lenovo T410 laptop with an integrated nVidia GPU:
01:00.0 VGA compatible controller: nVidia Corporation GT218 [NVS 3100M] (rev a2) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 2142
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at cc000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at ce000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 2000 [size=128]
[virtual] Expansion ROM at cd000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nouveau
In freedesktop.org Bugzilla #26980, Frepdesktop (frepdesktop) wrote : | #62 |
My Xorg was also blocking - only the mouse pointer keeps moving, but nothing else happens.
I solved the problem (for now) by removing the comment for the option NoAccel and setting the value to true. I'm using the xorg.conf generated with Xorg -configure with just that change, and everything seems to be working. I'm even using the composite from xfce4 (for the real transparent xfce4-terminal), and it's working ok - maybe not as fast as with acceleration.
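Since the option only takes effect once the leading "#" is gone, it can be worth checking what the file really says after editing. The following Python sketch lists the Driver and Option entries of each Device section while ignoring commented lines. The function name `device_options` and the sample file content are assumptions for illustration, not the xorg.conf from this comment.

```python
import re


def device_options(conf_text):
    """Return one dict per Section "Device" with its driver and active options."""
    devices = []
    current = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Split into bare words and double-quoted strings, then unquote
        words = [w.strip('"') for w in re.findall(r'"[^"]*"|\S+', line)]
        keyword = words[0].lower()
        if keyword == "section" and len(words) > 1 and words[1].lower() == "device":
            current = {"driver": None, "options": {}}
        elif keyword == "endsection":
            if current is not None:
                devices.append(current)
            current = None
        elif current is not None:
            if keyword == "driver" and len(words) > 1:
                current["driver"] = words[1]
            elif keyword == "option" and len(words) > 1:
                current["options"][words[1].lower()] = words[2] if len(words) > 2 else ""
    return devices


# Hypothetical xorg.conf fragment, for illustration only
sample_conf = '''
Section "Device"
    #Option "SWcursor"   # [<bool>]
    Option "NoAccel" "true"
    Identifier "Card0"
    Driver "nouveau"
EndSection
'''
print(device_options(sample_conf))
```

On a real system the text would come from the configuration file in use, whose path is printed near the top of the X log.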
For me it was very simple to reproduce the error. As soon as gdm was active (the default theme on debian - the one with the stars and the star rocket), a click on any menu would block everything but the mouse movement.
Here is the log for my X when it was blocking.
Anything else I can do to help solve this?
X.Org X Server 1.9.4
Release Date: 2011-02-04
[ 147.356] X Protocol Version 11, Revision 0
[ 147.356] Build Operating System: Linux 2.6.32.28-dsa-ia32 i686 Debian
[ 147.356] Current Operating System: Linux voyager 2.6.37-1-686 #1 SMP Tue Feb 15 18:21:50 UTC 2011 i686
[ 147.356] Kernel command line: BOOT_IMAGE=
[ 147.356] Build Date: 17 February 2011 01:25:01AM
[ 147.356] xorg-server 2:1.9.4-2 (Cyril Brulebois <email address hidden>)
[ 147.356] Current version of pixman: 0.21.4
[ 147.356] Before reporting problems, check http://
to make sure that you have the latest version.
[ 147.356] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 147.356] (==) Log file: "/var/log/
[ 147.356] (==) Using config file: "/etc/X11/
[ 147.356] (==) Using system config directory "/usr/share/
[ 147.373] (==) ServerLayout "X.org Configured"
[ 147.373] (**) |-->Screen "Screen0" (0)
[ 147.373] (**) | |-->Monitor "Monitor0"
[ 147.374] (**) | |-->Device "Card0"
[ 147.374] (**) |-->Input Device "Mouse0"
[ 147.374] (**) |-->Input Device "Keyboard0"
[ 147.374] (==) Automatically adding devices
[ 147.374] (==) Automatically enabling devices
[ 147.452] (WW) The directory "/usr/share/
[ 147.452] Entry deleted from font path.
[ 147.541] (WW) The directory "/usr/share/
[ 147.541] Entry deleted from font path.
[ 147.541] (**) FontPath set to:
/usr/share/
/usr/share/
/usr/share/
/usr/share/
/usr/share/
/usr/share/
/var/lib/
built-ins,
/usr/share/
/usr/share/
/usr/share/
/usr/share/
/usr/share/
/usr/share/
/var/lib/
built-ins
[ 147.541] (**) ModulePath set to "/usr/lib/
[ 147.541] (WW) AllowEmptyInput is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' wi...
In freedesktop.org Bugzilla #26980, Timo Aaltonen (tjaalton) wrote : | #63 |
*** Bug 33357 has been marked as a duplicate of this bug. ***
In freedesktop.org Bugzilla #26980, Jordan Bradley (jordan-w-bradley) wrote : | #64 |
Is there anything the non-programmer can do t
In freedesktop.org Bugzilla #26980, Alex Mayorga (alex-mayorga) wrote : | #65 |
FWIW you can see a peek of my frozen nVidia Corporation GT216 [GeForce GT 230M] (rev a2) at https:/
In freedesktop.org Bugzilla #26980, Thomas Schwinge (tschwinge) wrote : | #66 |
Created attachment 43864
peek_10a000_
Marcin, in case you need further data, see the
peek_10a000_
names should be explanatory. I ran each of the invocations three times.
I will now switch to using nouveau.noaccel=1, but please tell if you need
further data or need something tested.
This system is a DELL PRECISION M4500, the graphics card's lspci output:
01:00.0 VGA compatible controller: nVidia Corporation GT216 [Quadro FX 880M] (rev a2)
I hit this while setting up the system, roughly one hour after finishing
a fresh Ubuntu 10.10 maverick installation, while browsing in Firefox some
web pages about how to get suspend / hibernate / resume working reliably
-- oh the joy of installing GNU/Linux systems on new hardware. :-)
In freedesktop.org Bugzilla #26980, Pav-s (pav-s) wrote : | #67 |
I have been using the git sources for two weeks, and updated to 2.6.39 yesterday:
since somewhere after 25c68aef4e6abcc
great work, thank you
In freedesktop.org Bugzilla #26980, Pav-s (pav-s) wrote : | #68 |
I am sorry, the last comment was probably unclear:
Hardware: Thinkpad T410, NVS 3100M (nva8?)
Kernel: vanilla 2.6.39+ from kernel.org + nouveau tree merged
In freedesktop.org Bugzilla #26980, Emil-l-velikov (emil-l-velikov) wrote : | #69 |
For anyone still experiencing this bug, here is a possible workaround (thanks to ColdFeetBob for pointing it out)
Modify* /xserver/mi/mieq.c and rebuild xserver
* Increase the "#define QUEUE_SIZE 512" to 1024, 2048 or 4096 [1]
Note that this is *not* a proper fix, as you can see in the discussion [2], and it isn't recommended for inexperienced users
[1] http://
[2] https:/
In freedesktop.org Bugzilla #26980, Frédéric Crozat (fcrozat) wrote : | #70 |
I can confirm I haven't had any freeze using the 2.6.39 kernel on a Lenovo T410 for 3 days (I had several freezes per hour before), using GNOME Shell.
In freedesktop.org Bugzilla #26980, Ildar (ildar-users) wrote : | #71 |
(In reply to comment #54)
> I can confirm I didn't had any freeze using 2.6.39 kernel on Lenovo T410 for 3
> days (I had several freezes per hour before), using GNOME Shell.
Frederic, you forgot to mention that it's now terribly unstable. I have regular crashes like below, especially on: VT switching, Suspend-to-RAM, etc.
Details:
(II) NOUVEAU(0): NVLeaveVT is called.
Backtrace:
0: /usr/bin/Xorg (xorg_backtrace
1: /usr/bin/Xorg (0x400000+0x62449) [0x462449]
2: /lib64/
3: /lib64/libc.so.6 (0x7fc7e5f00000
4: /lib64/libc.so.6 (__libc_
5: /usr/lib64/
6: /usr/lib64/
7: /usr/bin/Xorg (0x400000+0xa4a20) [0x4a4a20]
8: /usr/bin/Xorg (ChangeWindowAt
9: /usr/bin/Xorg (0x400000+0x27c38) [0x427c38]
10: /usr/bin/Xorg (0x400000+0x2db61) [0x42db61]
11: /usr/bin/Xorg (0x400000+0x215ce) [0x4215ce]
12: /lib64/libc.so.6 (__libc_
13: /usr/bin/Xorg (0x400000+0x21179) [0x421179]
Segmentation fault at address (nil)
Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting
In freedesktop.org Bugzilla #26980, Skeggsb (skeggsb) wrote : | #72 |
(In reply to comment #55)
> (In reply to comment #54)
> > I can confirm I didn't had any freeze using 2.6.39 kernel on Lenovo T410 for 3
> > days (I had several freezes per hour before), using GNOME Shell.
>
> Frederic, you forgot to mention that it's now terribly unstable. I have regular
> crashes like below, especially on: VT switching, Suspend-to-RAM, etc.
>
> Details:
> (II) NOUVEAU(0): NVLeaveVT is called.
Not even remotely related. And from the backtrace, probably not nouveau's fault either.
>
> Backtrace:
> 0: /usr/bin/Xorg (xorg_backtrace
> 1: /usr/bin/Xorg (0x400000+0x62449) [0x462449]
> 2: /lib64/
> 3: /lib64/libc.so.6 (0x7fc7e5f00000
> 4: /lib64/libc.so.6 (__libc_
> 5: /usr/lib64/
> 6: /usr/lib64/
> 7: /usr/bin/Xorg (0x400000+0xa4a20) [0x4a4a20]
> 8: /usr/bin/Xorg (ChangeWindowAt
> 9: /usr/bin/Xorg (0x400000+0x27c38) [0x427c38]
> 10: /usr/bin/Xorg (0x400000+0x2db61) [0x42db61]
> 11: /usr/bin/Xorg (0x400000+0x215ce) [0x4215ce]
> 12: /lib64/libc.so.6 (__libc_
> 13: /usr/bin/Xorg (0x400000+0x21179) [0x421179]
> Segmentation fault at address (nil)
>
> Fatal server error:
> Caught signal 11 (Segmentation fault). Server aborting
In freedesktop.org Bugzilla #26980, Frédéric Crozat (fcrozat) wrote : | #73 |
I didn't forget anything: nouveau is rock stable with the 2.6.39.x kernel on my T410 (no issue with VT switching; I didn't test suspend-to-RAM). Your issues are a different bug, I think (X crash, no GPU lockup).
In freedesktop.org Bugzilla #26980, List0570 (list0570) wrote : | #74 |
Uuggh, this seems to be a serious problem, even when using nvidia.ko.
Why? Because xorg seems to probe drivers on startup and somehow manages to load the nouveau module even when I want to use the nvidia module. The nouveau module then fails to unload properly (use count is always at least 1, and stays 1 even after changing to runlevel 3 and killing xorg).
I could reliably hang the system just by logging out from kdm. Syslog is full of nouveau problems - why, given that I am using nvidia?
When renaming all instances of nouveau.ko under /lib/modules the hangs disappeared.
This has always been there and seems to have zilch effect:
> ls -l /etc/modprobe.
-rw-r--r-- 1 root root 18 2011-04-29 22:55 /etc/modprobe.
> cat /etc/modprobe.
blacklist nouveau
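The use count mentioned above is the third column of /proc/modules (the same number lsmod shows), so whether the blacklist worked can be checked without guessing. Here is a minimal Python sketch; the function name `module_use_count` and the sample /proc/modules content are hypothetical.

```python
def module_use_count(proc_modules_text, name):
    """Return the use count of a loaded module, or None if it is not loaded.

    Each /proc/modules line is: name size use_count dependents state address
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


# Hypothetical content, for illustration only. On a live system use:
#   text = open("/proc/modules").read()
sample_modules = """\
nouveau 1662531 1 - Live 0xffffffffa0000000
ttm 76949 1 nouveau, Live 0xffffffffa0200000
"""
print(module_use_count(sample_modules, "nouveau"))
print(module_use_count(sample_modules, "nvidia"))
```

A result of None for nouveau after boot would mean the module was never loaded. Any integer means something loaded it despite the blacklist entry.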
Symptoms are random freezes requiring a hardware reset, but always involving some graphics operation. I can detect no other hardware malfunction, and lockup behaviour is not characteristic of general hardware faults (e.g. ram, cpu, disk).
My reset button is really polished now ;-((
System details:
openSUSE 11.4 with all updates installed,
currently that is kernel kernel-
xorg-x11-
The problem seems worse with
kernel-
01:00.0 VGA compatible controller: nVidia Corporation GT216 [GeForce GT 220] (rev a2)
Mobo is Gigabyte GA-880GA-UD3H with
NB/SB: AMD 880G / SB850
USB 2.0 + 3.0, SATA 3Gb + 6Gb (prob not relevant)
The chipset also contains an integrated ATI Radeon HD 4250 (which works fine on the OSS radeon driver, with the limits of the driver).
Phenom II X6 1090T CPU
In freedesktop.org Bugzilla #26980, Ppaalanen (ppaalanen) wrote : | #75 |
(In reply to comment #58)
> Uuggh, this seems to be a serious problem, even when using nvidia.ko.
> Why? Because xorg seems to probe drivers on startup, and somehow managing to
> load the nouveau module even when I want to use the nvidia module. The nouveau
> module then fails to unload properly (use count is always at least 1, and stays
> 1 even after changing to runlevel 3 and killing xorg).
Your problem has nothing to do with this bug report.
You are (or the X server is) trying to use nouveau and nvidia drivers at the same time, which is known to cause havoc and should not be attempted. All the fallout you described fits perfectly.
The simple fix for you is to define Driver "nvidia" in your xorg.conf.
That prevents probing of any drivers and uses only what you want.
If you feel there is bad behavior in the X server, when it probes different drivers, you should file a bug against the X server. But first, make sure your distribution has not patched the X server driver probing code, e.g. by adding drivers not present in the original driver list. If they have, you should complain to your distribution.
In freedesktop.org Bugzilla #26980, Younes Manton (younes-m) wrote : | #76 |
On , <email address hidden> wrote:
> https:/
> --- Comment #58 from Volker Kuhlmann <email address hidden>> 2011-07-13
> 04:03:19 PDT ---
> Uuggh, this seems to be a serious problem, even when using nvidia.ko.
> Why? Because xorg seems to probe drivers on startup, and somehow managing
> to
> load the nouveau module even when I want to use the nvidia module. The
> nouveau
> module then fails to unload properly (use count is always at least 1, and
> stays
> 1 even after changing to runlevel 3 and killing xorg).
> I could reliably hang the system just by logging out from kdm. Syslog is
> full
> of nouveau problems - why, given that I am using nvidia?
> When renaming all instances of nouveau.ko under /lib/modules the hangs
> disappeared.
> This has always been there and seems to have zilch effect:
> > ls -l /etc/modprobe.
> -rw-r--r-- 1 root root 18 2011-04-29 22:55 /etc/modprobe.
> > cat /etc/modprobe.
> blacklist nouveau
> Symptoms are random freezes requiring a hardware reset, but always
> involving
> some graphics operation. I can detect no other hardware malfunction, and
> lockup
> behaviour is not characteristic of general hardware faults (eg ram, cpu,
> disk).
> My reset button is really polished now ;-((
> System details:
> openSUSE 11.4 with all updates installed,
> currently that is kernel kernel-
> xorg-x11-
> The problem seems worse with
> kernel-
> 01:00.0 VGA compatible controller: nVidia Corporation GT216 [GeForce GT
> 220]
> (rev a2)
> Mobo is Gigabyte GA-880GA-UD3H with
> NB/SB: AMD 880G / SB850
> USB 2.0 + 3.0, SATA 3Gb + 6Gb (prob not relevant)
> The chipset also contains an integrated ATI Radeon HD 4250 (which works
> fine on
> the OSS radeon driver, with the limits of the driver).
> Phenom II X6 1090T CPU
Delete nouveau.ko.
In freedesktop.org Bugzilla #26980, Arun Raghavan (arunraghavan) wrote : | #77 |
Using 3.0.4, this bug is very much still there (or at least the backtrace looks very similar). Happy to help provide more information or debug if pointed in the right direction.
(gdb) bt
#0 0x00007f5af04f3007 in ioctl ()
at ../sysdeps/
#1 0x00007f5aeea8f878 in drmIoctl (fd=9,
request=
at /usr/src/
#2 0x00007f5aeea91c2b in drmCommandWrite (
fd=<optimized out>, drmCommandIndex
data=<optimized out>, size=<optimized out>)
at /usr/src/
#3 0x00007f5aee44311d in nouveau_bo_wait (bo=0x1b75420,
cpu_
no_
at /usr/src/
#4 0x00007f5aee443703 in nouveau_
bo=0x1b75420, delta=0, size=<optimized out>, flags=4)
at /usr/src/
#5 0x00007f5aee64d2b5 in NVAccelDownloadM2MF (
dst_pitch=3740, dst=0x2c96644 "", h=20, w=1,
y=<optimized out>, x=237, pspix=0x29275f0)
at /usr/src/
#6 nouveau_
x=237, y=348, w=1, h=20, dst=0x2c96644 "",
dst_pitch=3740)
at /usr/src/
#7 0x00007f5aed9f81ee in exaCopyDirty (
migrate=
pValidSrc=
transfer=
at /usr/src/
#8 0x00007f5aed9fac41 in exaPrepareAcces
pPixmap=
at /usr/src/
#9 0x00007f5aeda061d0 in ExaPrepareCompo
height=20, width=1, yDst=31, xDst=234, yMask=0,
xMask=0, ySrc=0, xSrc=0, pDst=0x28a94e0, pMask=0x0,
pSrc=0x28a9e20, op=57 '9', pScreen=<optimized out>)
at /usr/src/
#10 ExaCheckComposite (op=57 '9', pSrc=0x28a9e20,
pMask=0x0, pDst=0x28a94e0, xSrc=0, ySrc=0, xMask=0,
yMask=0, xDst=234, yDst=31, width=1, height=20)
at /usr/src/
#11 0x00007f5aeda02189 in exaComposite (op=57 '9',
pSrc=0x28a9e20, pMask=0x0, pDst=0x28a94e0,
xSrc=<optimized out>, ySrc=<optimized out>, xMask=0,
yMask=0, xDst=<optimized out>, yDst=<optimized out>,
width=1, height=20)
at /usr/src/
#12 0x00000000004db50a in damageComposite (op=57 '9',
pSrc=0x28a9e20, pMask=0x0, pDst=0x28a94e0, xSrc=0,
ySrc=0, xMask=0, yMask=0, xDst=234, yDst=31, width=1,
height=20)
at /u...
In freedesktop.org Bugzilla #26980, Skeggsb (skeggsb) wrote : | #78 |
(In reply to comment #61)
> Using 3.0.4, this bug is very much still there (or at least the backgrace looks
> very similar). Happy to help provide more information or debug if pointed in
> the right direction.
Any GPU hang etc will have similar backtraces from X's point of view, and it's not usually useful in itself to see what's happening.
Did you have any output in your kernel log from when this occurred?
In freedesktop.org Bugzilla #26980, Arun Raghavan (arunraghavan) wrote : | #79 |
(In reply to comment #62)
[...]
> Did you have any output in your kernel log from when this occurred?
There was no output after the initial module load messages.
In freedesktop.org Bugzilla #26980, Skeggsb (skeggsb) wrote : | #80 |
(In reply to comment #63)
> (In reply to comment #62)
> [...]
> > Did you have any output in your kernel log from when this occurred?
>
> There was no output after the initial module load messages.
Are you able to install envytools (http://
Also, how new are all your userspace components (xf86-video-
In freedesktop.org Bugzilla #26980, Lithium-flower (lithium-flower) wrote : | #81 |
I experienced this bug not long ago. Right now I'm not sure it is exactly the same, since it's not really random: it happens every time I run Supertux.
I recovered the following kernel log lines by rebooting from another partition after the freeze.
kernel 3.0.4, Arch Linux, x86_64, fully updated
nouveau-dri 7.11-2
lspci:
01:00.0 VGA compatible controller: nVidia Corporation GT215 [GeForce GT 240] (rev a2) (prog-if 00 [VGA controller])
kernel.log:
Sep 11 08:44:25 faye kernel: [ 1297.836589] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:25 faye kernel: [ 1298.301418] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44:27 faye kernel: [ 1299.629024] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x04400000 - Ch 2
Sep 11 08:44:27 faye kernel: [ 1300.956673] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x04400000 - Ch 2
Sep 11 08:44:27 faye kernel: [ 1300.956719] psmouse.c: Wheel Mouse at isa0060/
Sep 11 08:44:29 faye kernel: [ 1302.284342] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44:29 faye kernel: [ 1299.632344] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:31 faye kernel: [ 1302.960046] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:32 faye kernel: [ 1303.611794] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:34 faye kernel: [ 1303.615050] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:53 faye kernel: [ 1302.960046] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:53 faye kernel: [ 1305.614198] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44:53 faye kernel: [ 1307.613341] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:53 faye kernel: [ 1310.944310] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:53 faye kernel: [ 1307.613341] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:53 faye kernel: [ 1314.940910] [drm] nouveau 0000:01:00.0: PRAMIN flush timeout
Sep 11 08:44:53 faye kernel: [ 1310.941070] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:53 faye kernel: [ 1316.266913] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:53 faye kernel: [ 1312.943555] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:53 faye kernel: [ 1312.943555] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:55 faye kernel: [ 1318.264271] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x04c00000 - Ch 2
Sep 11 08:44:55 faye kernel: [ 1322.259717] psmouse.c: resync failed, issuing reconnect request
Sep 11 08:44:55 faye kernel: [ 1322.256898] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:55 faye kernel: [ 1322.256898] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:55 faye kernel: [ 1323.596565] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44...
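A log like this is easier to compare between freezes when the nouveau messages are grouped by kind. The Python sketch below does that for lines in the format shown above. The regular expression, the grouping rule, and the function name `summarize_nouveau_log` are assumptions for illustration and are not part of any nouveau tooling. The sample lines are copied from this comment.

```python
import re
from collections import Counter

# Matches e.g. "[ 1297.836589] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0"
NOUVEAU_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] nouveau (?P<dev>[0-9a-f:.]+): (?P<msg>.*)$"
)


def summarize_nouveau_log(lines):
    """Count nouveau DRM messages by kind, ignoring non-nouveau lines."""
    counts = Counter()
    for line in lines:
        match = NOUVEAU_LINE.search(line)
        if not match:
            continue
        # Keep the text before the first ':', ' - ' or hex value as the kind
        kind = re.split(r":| - | 0x", match.group("msg"), maxsplit=1)[0].strip()
        counts[kind] += 1
    return counts


sample_log = [
    "Sep 11 08:44:25 faye kernel: [ 1297.836589] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0",
    "Sep 11 08:44:25 faye kernel: [ 1298.301418] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2",
    "Sep 11 08:44:31 faye kernel: [ 1302.960046] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000",
    "Sep 11 08:44:53 faye kernel: [ 1314.940910] [drm] nouveau 0000:01:00.0: PRAMIN flush timeout",
    "Sep 11 08:44:55 faye kernel: [ 1322.259717] psmouse.c: resync failed, issuing reconnect request",
]
for kind, count in summarize_nouveau_log(sample_log).items():
    print(count, kind)
```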
In freedesktop.org Bugzilla #26980, Arun Raghavan (arunraghavan) wrote : | #82 |
(In reply to comment #64)
> (In reply to comment #63)
> > (In reply to comment #62)
> > [...]
> > > Did you have any output in your kernel log from when this occurred?
> >
> > There was no output after the initial module load messages.
>
> Are you able to install envytools
> (http://
> and run "nvapeek 0x400700 4" while the GPU is hung?
All I get is a '...'.
> Also, how new are all your userspace components (xf86-video-
> mesa etc)?
mesa - 7.11
libdrm - 2.4.26
xf86-video-nouveau - 0.0.16_pre20110801 (that's the Gentoo package, which is presumably a snapshot from that date)
For what it's worth, kernel's 3.0.4, and the GPU is a 9400M (PCI id 10de:0863).
In freedesktop.org Bugzilla #26980, Marcin Slusarz (marcin-slusarz) wrote : | #83 |
This bug was fixed in 2.6.39 ( http://
so I'm closing this bug report.
ColdFeetBob, Arun Raghavan: you are experiencing different bugs.
If you still can reproduce them please read http://
penalvch (penalvch) wrote : | #15 |
Marc Deslauriers, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://
If it remains an issue, could you please run the following command in the development release from a Terminal (Applications-
apport-collect 711434
tags: | added: bios-outdated-2.30 |
description: | updated |
summary: |
- [natty] frequent Xorg freezes with nouveau + [Lenovo ThinkPad T61] frequent Xorg freezes with nouveau |
summary: |
- [Lenovo ThinkPad T61] frequent Xorg freezes with nouveau + 10de:0429 [Lenovo ThinkPad T61] frequent Xorg freezes with nouveau |
Changed in xserver-xorg-video-nouveau (Ubuntu): | |
importance: | High → Low |
Changed in nouveau: | |
importance: | Unknown → High |
status: | Unknown → Fix Released |
Changed in xserver-xorg-video-nouveau (Ubuntu): | |
status: | Incomplete → Fix Released |
Created attachment 33897
Xorg log
I have seen the X server hang a few times (not too often, usually once every 1-2 days). The system continues to work properly and is still accessible over the network. Sending kill -9 to X restores the system, at the cost of losing the X session. Running strace on the X process shows it hanging on an ioctl to /dev/dri/card0:
ioctl(11, 0x40086485, 0x7fff61c01800) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(11, 0x40086485, 0x7fff61c01800) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(11, 0x40086485, 0x7fff61c01800) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
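The request number in the strace output can be decoded with the standard Linux ioctl bit layout (direction, argument size, type character, command number). The Python sketch below applies that layout to 0x40086485. The function name `decode_ioctl` is made up. The layout constants are the generic ones used on x86, and the DRM constants (type 'd', driver-specific commands from 0x40 up to 0xA0) are quoted from memory of the DRM headers, so treat them as assumptions to verify against your kernel sources.

```python
# Generic ioctl request layout (asm-generic/ioctl.h), as used on x86
NR_BITS, TYPE_BITS, SIZE_BITS, DIR_BITS = 8, 8, 14, 2
NR_SHIFT = 0
TYPE_SHIFT = NR_SHIFT + NR_BITS      # 8
SIZE_SHIFT = TYPE_SHIFT + TYPE_BITS  # 16
DIR_SHIFT = SIZE_SHIFT + SIZE_BITS   # 30

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}

# DRM conventions (assumed from drm.h): type 'd', driver commands in [0x40, 0xA0)
DRM_IOCTL_BASE = ord("d")
DRM_COMMAND_BASE = 0x40
DRM_COMMAND_END = 0xA0


def decode_ioctl(request):
    """Split an ioctl request number into its fields."""
    nr = (request >> NR_SHIFT) & ((1 << NR_BITS) - 1)
    type_ = (request >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    size = (request >> SIZE_SHIFT) & ((1 << SIZE_BITS) - 1)
    direction = (request >> DIR_SHIFT) & ((1 << DIR_BITS) - 1)
    is_driver_cmd = type_ == DRM_IOCTL_BASE and DRM_COMMAND_BASE <= nr < DRM_COMMAND_END
    return {
        "direction": DIRECTIONS[direction],
        "size": size,
        "type": chr(type_),
        "nr": nr,
        # Index relative to DRM_COMMAND_BASE, only for driver-specific commands
        "driver_index": nr - DRM_COMMAND_BASE if is_driver_cmd else None,
    }


print(decode_ioctl(0x40086485))
```

Under these assumptions the request is a write-only DRM ioctl with an 8-byte argument in the driver-specific range, which is consistent with the call going through drmCommandWrite as in the backtraces in this report. Which nouveau command the index corresponds to depends on the driver ABI version and is not determined here.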
The X log has a message about an event queue overflow, followed by a backtrace:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x49ea18]
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49e3e4]
2: /usr/bin/X (xf86PostMotionEventP+0xce) [0x478f9e]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7feb9c7cd000+0x516f) [0x7feb9c7d216f]
4: /usr/bin/X (0x400000+0x6be87) [0x46be87]
5: /usr/bin/X (0x400000+0x1171c3) [0x5171c3]
6: /lib64/libpthread.so.0 (0x3e9e200000+0xf0f0) [0x3e9e20f0f0]
7: /lib64/libc.so.6 (ioctl+0x7) [0x3e9d6d6937]
8: /usr/lib64/libdrm.so.2 (drmIoctl+0x23) [0x3eb3e03383]
9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x3eb3e0360b]
10: /usr/lib64/libdrm_nouveau.so.1 (0x7feb9fe68000+0x2f1d) [0x7feb9fe6af1d]
11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfc) [0x7feb9fe6b11c]
12: /usr/lib64/libdrm_nouveau.so.1 (0x7feb9fe68000+0x2106) [0x7feb9fe6a106]
13: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7feb9fe6a49c]
14: /usr/lib64/xorg/modules/libexa.so (0x7feb9dc89000+0x90e1) [0x7feb9dc920e1]
15: /usr/lib64/xorg/modules/libexa.so (0x7feb9dc89000+0x939d) [0x7feb9dc9239d]
16: /usr/bin/X (miCopyRegion+0x28d) [0x545e5d]
17: /usr/bin/X (miDoCopy+0x44a) [0x54636a]
18: /usr/lib64/xorg/modules/libexa.so (0x7feb9dc89000+0x8667) [0x7feb9dc91667]
19: /usr/bin/X (0x400000+0xd44b8) [0x4d44b8]
20: /usr/bin/X (0x400000+0x2b3bc) [0x42b3bc]
21: /usr/bin/X (0x400000+0x2c86c) [0x42c86c]
22: /usr/bin/X (0x400000+0x21e3a) [0x421e3a]
23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3e9d61eb1d]
24: /usr/bin/X (0x400000+0x219f9) [0x4219f9]
I'm using Fedora 12 for x86-64; uname -a output is:
Linux lumos 2.6.32.9-67.fc12.x86_64 #1 SMP Sat Feb 27 09:26:40 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
xorg-x11-drv-nouveau version 0.0.15 release 20.20091105gite1c2efd.fc12
Let me know if I can provide more information or run more experiments.
Thanks, Victor