[GTX 580] X seems to hang a login prompt -- PFIFO errors in dmesg

Bug #925048 reported by Andy Whitcroft on 2012-02-01
This bug affects 18 people
Affects: Nouveau Xorg driver | Status: Fix Released | Importance: Critical
Affects: xserver-xorg-video-nouveau (Ubuntu) | Importance: High | Assigned to: Unassigned

Bug Description

The machine boots to and displays the lightdm login banner, but as soon as the cursor flashes in the password box, output stops. The mouse pointer is still movable but nothing else works.

This was successfully running nouveau on an earlier version of ubuntu. The fault occurred after upgrade to precise.

[ 15.944425] [drm] nouveau 0000:03:00.0: PMFB2_SUBP0: 0x037f0040
[ 15.944430] [drm] nouveau 0000:03:00.0: PMFB2_SUBP1: 0x037f0000
[ 15.944435] [drm] nouveau 0000:03:00.0: PMFB3_SUBP0: 0x037f0000
[ 15.944440] [drm] nouveau 0000:03:00.0: PMFB3_SUBP1: 0x037f0040
[ 15.944445] [drm] nouveau 0000:03:00.0: PMFB4_SUBP0: 0x037f0040
[ 15.944449] [drm] nouveau 0000:03:00.0: PMFB4_SUBP1: 0x037f0000
[ 16.785866] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[ 17.349142] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 17.349654] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 21.940164] [drm] nouveau 0000:03:00.0: PFIFO: read fault at 0x00080da000 [PAGE_SYSTEM_ONLY] from PGRAPH/GPC3/(unknown enum 0x0000000b) on channel 0x0000a21000
[ 21.940171] [drm] nouveau 0000:03:00.0: PFIFO: unknown status 0x40000000

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: xorg 1:7.6+10ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-12.21-generic 3.2.2
Uname: Linux 3.2.0-12-generic x86_64
ApportVersion: 1.91-0ubuntu1
Architecture: amd64
CompizPlugins: [core,bailer,detection,composite,opengl,decor,mousepoll,vpswitch,regex,animation,snap,expo,move,compiztoolbox,place,grid,gnomecompat,wall,ezoom,workarounds,staticswitcher,resize,fade,scale,session,unityshell]
Date: Wed Feb 1 19:19:29 2012
DistUpgraded: Log time: 2012-02-01 16:44:35.369830
DistroCodename: precise
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GlAlternative: lrwxrwxrwx 1 root root 24 Feb 7 2011 /etc/alternatives/gl_conf -> /usr/lib/mesa/ld.so.conf
GraphicsCard:
 NVIDIA Corporation GF110 [GeForce GTX 580] [10de:1080] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: eVga.com. Corp. Device [3842:1587]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha amd64 (20110207)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 003: ID 099a:7202 Zippy Technology Corp.
ProcEnviron:
 LC_CTYPE=en_GB.UTF-8
 LC_COLLATE=en_GB.UTF-8
 LANG=en_GB.UTF-8
 LC_MESSAGES=en_GB.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-12-generic root=UUID=375e3b8f-01ec-41de-80ee-6ab3210f6988 ro quiet splash nolapic_timer vt.handoff=7
SourcePackage: xorg
UpgradeStatus: Upgraded to precise on 2012-02-01 (0 days ago)
dmi.bios.date: 07/15/2011
dmi.bios.vendor: Intel Corp.
dmi.bios.version: LCPBG10J.86A.0548.2011.0715.2025
dmi.board.asset.tag: Base Board Asset Tag
dmi.chassis.type: 3
dmi.modalias: dmi:bvnIntelCorp.:bvrLCPBG10J.86A.0548.2011.0715.2025:bd07/15/2011:svn:pn:pvr:rvn:rn:rvr:cvn:ct3:cvr:
version.compiz: compiz 1:0.9.6+bzr20110929-0ubuntu8
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.30-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.12.0~git20110825.27395cb5-0ubuntu0sarvatt~natty
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental 7.12.0~git20110825.27395cb5-0ubuntu0sarvatt~natty
version.libgl1-mesa-glx: libgl1-mesa-glx 7.12.0~git20110825.27395cb5-0ubuntu0sarvatt~natty
version.xserver-xorg-core: xserver-xorg-core 2:1.11.3-0ubuntu9
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.99.901-1ubuntu3
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20111219.aacbd629-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.17.0-1ubuntu3
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1build2

Andy Whitcroft (apw) wrote :
Bryce Harrington (bryce) on 2012-02-01
affects: xorg (Ubuntu) → xserver-xorg-video-nouveau (Ubuntu)
Changed in xserver-xorg-video-nouveau (Ubuntu):
importance: Undecided → High
status: New → Confirmed
description: updated
description: updated
Bryce Harrington (bryce) on 2012-02-01
summary: - [nouveau] X seems to hang a login prompt -- PFIFO errors in dmesg
+ [GTX 580] X seems to hang a login prompt -- PFIFO errors in dmesg
Bryce Harrington (bryce) wrote :

Hitting nvc0_fifo_isr_vm_fault() at linux-3.2.0/drivers/gpu/drm/nouveau/nvc0_fifo.c:424. The only place in nouveau this gets called from is nvc0_fifo_isr(). The "PFIFO: unknown status 0x40000000" message also comes from nvc0_fifo_isr(). Neither appears to be a terminating fault. Unlikely to be good tho.

Both nvc0_fifo_isr_vm_fault() and nvc0_fifo_isr() appear to be unchanged compared with linux 3.0.0. Presumably these are just catching an error which occurs further up the stack? If that's so, then booting an earlier kernel would not make any difference.


Forwarding this bug from Ubuntu reporter Andy Whitcroft:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/925048

[Problem]
[GTX 580] X seems to hang a login prompt -- PFIFO errors in dmesg

[Original Description]
The machine boots to and displays the lightdm login banner, but as soon as the cursor flashes in the password box, output stops. The mouse pointer is still movable but nothing else works.

This was successfully running nouveau on an earlier version of ubuntu. The fault occurred after upgrade to precise.

[ 15.944425] [drm] nouveau 0000:03:00.0: PMFB2_SUBP0: 0x037f0040
[ 15.944430] [drm] nouveau 0000:03:00.0: PMFB2_SUBP1: 0x037f0000
[ 15.944435] [drm] nouveau 0000:03:00.0: PMFB3_SUBP0: 0x037f0000
[ 15.944440] [drm] nouveau 0000:03:00.0: PMFB3_SUBP1: 0x037f0040
[ 15.944445] [drm] nouveau 0000:03:00.0: PMFB4_SUBP0: 0x037f0040
[ 15.944449] [drm] nouveau 0000:03:00.0: PMFB4_SUBP1: 0x037f0000
[ 16.785866] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[ 17.349142] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 17.349654] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 21.940164] [drm] nouveau 0000:03:00.0: PFIFO: read fault at 0x00080da000 [PAGE_SYSTEM_ONLY] from PGRAPH/GPC3/(unknown enum 0x0000000b) on channel 0x0000a21000
[ 21.940171] [drm] nouveau 0000:03:00.0: PFIFO: unknown status 0x40000000

DistroRelease: Ubuntu 12.04
Package: xorg 1:7.6+10ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-12.21-generic 3.2.2
Uname: Linux 3.2.0-12-generic x86_64
ApportVersion: 1.91-0ubuntu1
Architecture: amd64
CompizPlugins: [core,bailer,detection,composite,opengl,decor,mousepoll,vpswitch,regex,animation,snap,expo,move,compiztoolbox,place,grid,gnomecompat,wall,ezoom,workarounds,staticswitcher,resize,fade,scale,session,unityshell]
Date: Wed Feb 1 19:19:29 2012
DistUpgraded: Log time: 2012-02-01 16:44:35.369830
DistroCodename: precise
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GlAlternative: lrwxrwxrwx 1 root root 24 Feb 7 2011 /etc/alternatives/gl_conf -> /usr/lib/mesa/ld.so.conf
GraphicsCard:
NVIDIA Corporation GF110 [GeForce GTX 580] [10de:1080] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. Device [3842:1587]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha amd64 (20110207)
Lsusb:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 003: ID 099a:7202 Zippy Technology Corp.
ProcEnviron:
LC_CTYPE=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LANG=en_GB.UTF-8
LC_MESSAGES=en_GB.UTF-8
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-12-generic root=UUID=375e3b8f-01ec-41de-80ee-6ab3210f6988 ro quiet splash nolapic_timer vt.handoff=7
SourcePackage: xorg
UpgradeStatus: Upgraded to precise on 2012-02-01 (0 days ago)
dmi.bios.date: 07/15/2011
dmi.bios.vendor: Intel Corp.
dmi.bios.version: LCPBG10J.86A.0548.2011.0715.2025
dmi.board.asset.tag: Base Board ...


Created attachment 56475
BootDmesg.txt

Created attachment 56476
CurrentDmesg.txt

Created attachment 56477
XorgLog.txt

Created attachment 56478
XorgLogOld.txt

Bryce Harrington (bryce) wrote :

Andy, I've forwarded this bug upstream to https://bugs.freedesktop.org/show_bug.cgi?id=45517 - please subscribe to the bug.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

The nvc0_fifo_isr() routine is essentially called from an interrupt handler, so it's not possible for me to trace back in the kernel what triggers it to get called. I've reviewed recent git changes in drm/nouveau; nothing jumps out at me, but I'm not a nouveau guru. I don't think testing an older kernel (like whatever you were running previously) would help, but it certainly couldn't hurt.

There were no bug reports filed about this issue upstream, so I forwarded this one up. If we don't get a response, then it might be worthwhile for you to approach Ben Skeggs directly via IRC on #nouveau.

Bryce Harrington (bryce) wrote :

Testing of drm-next might be worthwhile. I didn't see anything in the upstream kernel tree's drm that leapt out at me as worth testing.

Changed in nouveau:
importance: Unknown → High
status: Unknown → Confirmed

Hello Bryce

Here are some tips that will help triage the issue

* Is the person using "3d" - try reproducing the issue without it
Disable AIGLX in the X server
(Re)move the file (or package providing it) "/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so"

* Try booting with "nouveau.nofbaccel=1"

* Amend the previous option to read "nouveau.noaccel=1"

Although before attempting any of the above I would recommend trying the git master branch (or as close as) of the following

* [1] xf86-video-nouveau
* [2] kernel
* [3] mesa

Note that those tips can be used for any case/issue

Cheers
Emil

[1] http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/
[2] http://cgit.freedesktop.org/nouveau/linux-2.6/
[3] http://cgit.freedesktop.org/mesa/mesa/

Maarten Lankhorst (mlankhorst) wrote :

Does the problem go away if you boot with the kernel from older ubuntu?

Maarten Lankhorst (mlankhorst) wrote :

Not sure yet how to fix it, but can be worked around by adding nouveau.noaccel=1 to command line.
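
A small sketch for confirming that the workaround above is actually in effect. It assumes the option is passed on the kernel command line as described, and only checks for the token in a command-line string; the helper name `has_kernel_param` is made up for this example.

```python
def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as a whole token in a kernel command line."""
    # Kernel command-line options are whitespace-separated tokens, so an exact
    # token match avoids false hits such as "nouveau.noaccel=10".
    return f"{name}={value}" in cmdline.split()


# Command line taken from the ProcKernelCmdLine field of this report (UUID shortened).
reported = "BOOT_IMAGE=/boot/vmlinuz-3.2.0-12-generic root=UUID=... ro quiet splash nolapic_timer vt.handoff=7"
workaround_active = has_kernel_param(reported, "nouveau.noaccel", "1")
```

On a running system, the contents of `/proc/cmdline` can be passed as `cmdline` instead of the string copied from the report.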


Hello,

I've compiled linux-nouveau-3.6.0rc4 x86_64 (made a package for Arch). With this, the system is stable for 1 minute. Then, when I run compiz, or cairo-dock, it freezes..

Here are dmesg messages :

[ 0.000000] Linux version 3.6.0-1-nouveau (bioman@pumpkin) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Sep 2 22:33:28 CEST 2012
[ 1.236259] fb: conflicting fb hw usage nouveaufb vs VESA VGA - removing generic driver
[ 1.237326] nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x0c8000a1
[ 1.237328] nouveau [ DEVICE][0000:01:00.0] Chipset: GF110 (NVC8)
[ 1.237329] nouveau [ DEVICE][0000:01:00.0] Family : NVC0
[ 1.238288] nouveau [ VBIOS][0000:01:00.0] checking PRAMIN for image...
[ 1.339740] nouveau [ VBIOS][0000:01:00.0] ... appears to be valid
[ 1.339741] nouveau [ VBIOS][0000:01:00.0] using image from PRAMIN
[ 1.339822] nouveau [ VBIOS][0000:01:00.0] BIT signature found
[ 1.339823] nouveau [ VBIOS][0000:01:00.0] version 70.10.17.00
[ 1.340038] nouveau [ MXM][0000:01:00.0] no VBIOS data, nothing to do
[ 1.362195] nouveau [ PFB][0000:01:00.0] RAM type: GDDR5
[ 1.362196] nouveau [ PFB][0000:01:00.0] RAM size: 1536 MiB
[ 1.378219] nouveau [ DRM][0000:01:00.0] VRAM: 1536MiB
[ 1.378221] nouveau [ DRM][0000:01:00.0] GART: 512MiB
[ 1.378224] nouveau [ DRM][0000:01:00.0] BIT BIOS found
[ 1.378226] nouveau [ DRM][0000:01:00.0] Bios version 70.10.17.00
[ 1.378229] nouveau [ DRM][0000:01:00.0] TMDS table version 2.0
[ 1.378231] nouveau [ DRM][0000:01:00.0] DCB version 4.0
[ 1.378233] nouveau [ DRM][0000:01:00.0] DCB outp 00: 02000300 00000000
[ 1.378235] nouveau [ DRM][0000:01:00.0] DCB outp 01: 01000302 00020030
[ 1.378237] nouveau [ DRM][0000:01:00.0] DCB outp 02: 04011380 00000000
[ 1.378239] nouveau [ DRM][0000:01:00.0] DCB outp 03: 08011382 00020030
[ 1.378241] nouveau [ DRM][0000:01:00.0] DCB outp 04: 02022362 00020010
[ 1.378243] nouveau [ DRM][0000:01:00.0] DCB conn 00: 00001030
[ 1.378246] nouveau [ DRM][0000:01:00.0] DCB conn 01: 00010130
[ 1.378248] nouveau [ DRM][0000:01:00.0] DCB conn 02: 00002261
[ 1.406991] nouveau [ DRM][0000:01:00.0] 0 available performance level(s)
[ 1.406993] nouveau [ DRM][0000:01:00.0] c: core 50MHz shader 101MHz memory 135MHz voltage 963mV fanspeed 40%
[ 1.421196] nouveau [ DRM][0000:01:00.0] MM: using COPY0 for buffer copies
[ 1.713751] nouveau [ DRM][0000:01:00.0] allocated 1920x1200 fb: 0x60000, bo ffff88031f77b400
[ 1.713938] fbcon: nouveaufb (fb0) is primary device
[ 1.741008] fb0: nouveaufb frame buffer device
[ 1.741010] [drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0
[ 1.741023] nouveau 0000:02:00.0: enabling device (0004 -> 0007)
[ 1.741572] nouveau [ DEVICE][0000:02:00.0] BOOT0 : 0x0c8000a1
[ 1.741574] nouveau [ DEVICE][0000:02:00.0] Chipset: GF110 (NVC8)
[ 1.741576] nouveau [ DEVICE][0000:02:00.0] Family : NVC0
[ 1.742577] nouveau [ VBIOS][0000:02:00.0] checking PRAMIN for image...
[ 1.752297] nouveau [ VBIOS][...


Please,

Also fix those bugs for linux-nouveau 3.4.10

Best regards,

Eric

Hello,

Using external GTX 580 firmware helped !

Eric

Victor Zamanian (victorz) wrote :

I don't know if it's of any importance, but I thought I should mention that this still happens to me with an Nvidia GeForce GTX 580. I tried a while ago to boot up the (then) new Gnome 3 release Live CD, and it just froze immediately after automatically logging in and displaying the desktop. The mouse pointer was moving but nothing responded to clicks or drags, at all. I thought, then, that the Live CD was borked. I tried to re-download the image, try again, but it didn't work.

Then now with the newer versions of Ubuntu, I'm actually experiencing the same thing! I'm running Ubuntu GNOME Remix now, and I can't even boot into the installer without using a "nomodeset" argument to the boot line when booting. Then boot again with the same trick, install nvidia-* and all is fine after that.

Nouveau definitely doesn't look stable from my perspective. :-) Eagerly awaiting a solution to this. Thanks for your time.

Jochen (hajoeg) wrote :

I have the same problem, but it occurs only rarely, maybe once or twice in 100 logins. The only way out of it is a cold restart, sometimes repeated two or three times. I have seen it on two different machines, both running the GNOME 3 classic desktop. The problem did not exist under Lucid; it started with Precise LTS.

This bug persists. I consider it to be a pretty serious bug, too. Anyone with an nvc8 card who runs Gnome or Unity will have the card freeze up in seconds, although somewhat less frequently with KDE. Running firefox will immediately freeze the card. Ubuntu and Fedora both ship their liveCDs with nouveau enabled by default, so people cannot even run the installer for more than a minute, and most will have no idea what went wrong. Note that most nvc0 cards run just fine... I even have an nvc4 on another computer that runs perfectly. It is nvc8 that specifically has this problem, and very few (if any?) devs seem to possess one. Feel free to google "nouveau gtx 580" to see that this is hitting a decent number of people.
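
For anyone unsure which chipset they have, the BOOT0 value that newer nouveau kernels print (see the 3.6.0rc4 dmesg earlier in this thread) can be decoded. This is only a sketch: the bit layout used here (chipset id in bits 20 to 28, revision in the low byte) is an assumption based on my reading of the nouveau source and is not stated anywhere in this report, and `decode_boot0` is a made-up helper name.

```python
def decode_boot0(boot0: int):
    """Split a BOOT0 register value into (chipset, revision).

    Assumed layout (not confirmed in this thread): chipset id in bits 20-28,
    chip revision in bits 0-7.
    """
    chipset = (boot0 & 0x1FF00000) >> 20
    revision = boot0 & 0x000000FF
    return chipset, revision


# BOOT0 value from the dmesg line "BOOT0 : 0x0c8000a1" quoted earlier.
chipset, revision = decode_boot0(0x0C8000A1)
label = f"NV{chipset:X} rev {revision:x}"
```

With the value from that log, this yields the same NVC8 / rev a1 identification that the kernel and lspci report for the GTX 580.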

The card works just fine if using the extracted firmware, but this is a poor solution. I have been reading the various envytools/hwdocs on the fuc and been trying to investigate this but I have hit a wall, this issue is just too difficult for me to handle. I am pretty sure the solution is to do whatever Ben did to get the nvd7/nvd9 cards working, which looked like adding chipset specific firmware data to the loading code, but I don't know nearly enough to do this myself. If anyone has some advice, please let me know, I would really like to see this bug closed.

Created attachment 84279
kernel log of card freeze

Created attachment 84280
kernel log of card freeze

Another dmesg log of the freeze. Note that the read fault is not always at the same address.
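
To compare fault addresses across several logs, the read fault lines can be pulled out with a short script. This is a rough sketch that assumes the message layout seen in the lines quoted in this report; the regular expression and the name `parse_faults` are mine, not part of nouveau, and other kernel versions may word the message differently.

```python
import re

# Matches messages of the form quoted in this report, e.g.
# "read fault at 0x00080da000 [PAGE_SYSTEM_ONLY] from PGRAPH/GPC3/... on channel 0x0000a21000"
FAULT_RE = re.compile(
    r"(?P<access>read|write) fault at 0x(?P<addr>[0-9a-fA-F]+) "
    r"\[(?P<reason>\w+)\] from (?P<unit>.+?) on channel 0x(?P<chan>[0-9a-fA-F]+)"
)


def parse_faults(dmesg_text: str):
    """Return one dict per fault message found in a dmesg dump."""
    faults = []
    for m in FAULT_RE.finditer(dmesg_text):
        faults.append({
            "access": m.group("access"),
            "address": int(m.group("addr"), 16),
            "reason": m.group("reason"),
            "unit": m.group("unit"),
            "channel": int(m.group("chan"), 16),
        })
    return faults


# Fault line copied from the original bug description.
sample = (
    "[ 21.940164] [drm] nouveau 0000:03:00.0: PFIFO: read fault at 0x00080da000 "
    "[PAGE_SYSTEM_ONLY] from PGRAPH/GPC3/(unknown enum 0x0000000b) on channel 0x0000a21000"
)
faults = parse_faults(sample)
```

Feeding it the text of each attached dmesg log and collecting the `address` and `reason` fields makes it easy to see whether the faults cluster around particular addresses or units.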

Worth trying 3.11-rc6. A bunch of changes went into 3.11-rc1 related to register setup on nvc0+ cards.

These logs are from the nouveau git after those changes hit. I've been tracking the git changes pretty carefully and they looked promising but alas, didn't work.

This looks like an older version of bug 54437. Marking it as a duplicate.

*** This bug has been marked as a duplicate of bug 54437 ***

*** Bug 45517 has been marked as a duplicate of this bug. ***

Changed in nouveau:
status: Confirmed → Invalid

(In reply to comment #3)
> The card works just fine if using the extracted firmware

Can you elaborate on how to do that? This sounds like a better solution than using the proprietary nvidia driver.

Changed in nouveau:
importance: High → Unknown
status: Invalid → Unknown
Changed in nouveau:
importance: Unknown → Critical
status: Unknown → Confirmed

*** Bug 81614 has been marked as a duplicate of this bug. ***

This bug is affecting me also, see the last duplicated bug. Any progress in fixing this? Maybe some help in testing (for ex.) required?

(In reply to comment #12)
> This bug is affecting me also, see the last duplicated bug. Any progress in
> fixing this? Maybe some help in testing (for ex.) required?

It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said that it does work just fine on his NVC8, although he has the less powerful versions. Could be something with high ROP/TPC/GPC counts not being handled. (Or multiple PARTs?)

That might actually be an interesting experiment -- before loading nouveau, mask out a bunch of the units and see if it helps. If it does, find the "breaking" point.

This is the code that computes that stuff:

http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330

 priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
 priv->gpc_nr = nv_rd32(priv, 0x409604) & 0x0000001f;
 for (i = 0; i < priv->gpc_nr; i++) {
  priv->tpc_nr[i] = nv_rd32(priv, GPC_UNIT(i, 0x2608));
  priv->tpc_total += priv->tpc_nr[i];
 }

Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the per-GPC TPC counts).
Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it does, figure out which of the values matter and where the breaking points are.

If it doesn't help, perhaps the units need to be disabled a little harder, e.g. by setting 0x22584/0x22588.
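
For anyone who wants to sanity-check the masks before patching the kernel, here is a sketch of the same bit extraction outside the driver. It only mirrors the arithmetic of the C snippet above; the raw register value used below is invented for illustration (it is not read from real hardware), and the function names are mine.

```python
def decode_unit_counts(reg_409604: int):
    """Split a raw 0x409604 value into (rop_nr, gpc_nr), using the masks from the C snippet."""
    rop_nr = (reg_409604 & 0x001F0000) >> 16  # bits 16-20: number of ROPs
    gpc_nr = reg_409604 & 0x0000001F          # bits 0-4: number of GPCs
    return rop_nr, gpc_nr


def total_tpcs(tpc_nr_per_gpc):
    """Sum the per-GPC TPC counts, as the loop accumulating priv->tpc_total does."""
    return sum(tpc_nr_per_gpc)


# Hypothetical raw value, chosen only to show the decoding.
rop_nr, gpc_nr = decode_unit_counts(0x00060004)
tpc_total = total_tpcs([4] * gpc_nr)  # assumes 4 TPCs in every GPC, purely as an example
```

In the driver itself, Step 1 amounts to printing `priv->rop_nr`, `priv->gpc_nr` and each `priv->tpc_nr[i]` right after that loop, and Step 2 to overwriting them there before the rest of the init code uses them.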

(In reply to comment #13)

>
> It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said
> that it does work just fine on his NVC8, although he has the less powerful
> versions. Could be something with high ROP/TPC/GPC counts not being handled.
> (Or multiple PARTs?)
>
> That might actually be an interesting experiment -- before loading nouveau,
> mask out a bunch of the units and see if it helps. If it does, find the
> "breaking" point.
>
> This is the code that computes that stuff:
>
> http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330
>
> priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
> priv->gpc_nr = nv_rd32(priv, 0x409604) & 0x0000001f;
> for (i = 0; i < priv->gpc_nr; i++) {
> priv->tpc_nr[i] = nv_rd32(priv, GPC_UNIT(i, 0x2608));
> priv->tpc_total += priv->tpc_nr[i];
> }
>
> Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the
> per-GPC TPC counts).
> Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it
> does, figure out which of the values matter and where the breaking points
> are.
>
> If it doesn't help, perhaps the units need to be disabled a little harder,
> e.g. by setting 0x22584/0x22588.

Can you describe in more detail what I need to do? I'm afraid I'm not advanced enough at the moment to understand everything in your comment. Maybe not in the comments but by e-mail <email address hidden>

*** Bug 81614 has been marked as a duplicate of this bug. ***

(In reply to comment #13)
> (In reply to comment #12)
> > This bug is affecting me also, see the last duplicated bug. Any progress in
> > fixing this? Maybe some help in testing (for ex.) required?
>
> It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said
> that it does work just fine on his NVC8, although he has the less powerful
> versions. Could be something with high ROP/TPC/GPC counts not being handled.
> (Or multiple PARTs?)
>
> That might actually be an interesting experiment -- before loading nouveau,
> mask out a bunch of the units and see if it helps. If it does, find the
> "breaking" point.
>
> This is the code that computes that stuff:
>
> http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330
>
> priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
> priv->gpc_nr = nv_rd32(priv, 0x409604) & 0x0000001f;
> for (i = 0; i < priv->gpc_nr; i++) {
> priv->tpc_nr[i] = nv_rd32(priv, GPC_UNIT(i, 0x2608));
> priv->tpc_total += priv->tpc_nr[i];
> }
>
> Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the
> per-GPC TPC counts).
> Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it
> does, figure out which of the values matter and where the breaking points
> are.
>
> If it doesn't help, perhaps the units need to be disabled a little harder,
> e.g. by setting 0x22584/0x22588.

Here are the printed values:
[ 3.185455] Rop nr: 6
[ 3.185457] Gpc nr: 4
[ 3.185460] Tpc nr for gpc 0: 4
[ 3.185463] Tpc nr for gpc 1: 4
[ 3.185466] Tpc nr for gpc 2: 4
[ 3.185469] Tpc nr for gpc 3: 4

I tried setting them all to 1, the card freezes pretty much immediately after logging into kwin (which is when I suspect opengl rendering starts), although oddly enough there was no read fault in the dmesg. I also tried setting them all to 2, and it froze pretty quickly too, and the machine became completely unrecoverable. Note that I also tried using the blob firmware with all values set to 2, so I think not having them at their natural amounts simply pisses the card off. Didn't try directly disabling stuff with 0x22584/0x22588, not entirely sure where I would do that even.

Created attachment 103833
dmesg from GPU lockup

I'm affected by this bug as well. Card is ASUS ENGTX580 DCII.

Distro: Arch Linux
X.Org: 1.16
mesa: 10.2.4
xf86-video-nouveau: 1.0.10

I have a Korean Monitor so there are some errors about missing EDID in the dmesg, but even without an xorg.conf nouveau detected the 2560x1440 resolution and Gnome 3 looked fine until it locked up.

The lockup happened while I was typing in a terminal in Gnome 3.12.2. It came out of nowhere. I attached my dmesg log.

I rebooted and got another lockup very quickly but this time without the long list of nouveau E[ PDISP] messages. Channel value is the same, but process is Xorg.bin not mutter-launch.

read fault at 0x4391800000 [PT_NOT_PRESENT] from PGRAPH/CTXCTL on channel 0x005fb79000 [Xorg.bin[909]]

My computer survived the night with the latest patchset that made it into 3.17, so I am marking this as fixed.

Changed in nouveau:
status: Confirmed → Fix Released