10de:0611 nouveau fails at suspend/resume - PAGE_NOT_PRESENT

Bug #1111884 reported by Patrik Lundquist on 2013-01-31
This bug affects 41 people
Affects                              | Status    | Importance | Assigned to | Milestone
Linux                                | Invalid   | Undecided  | Unassigned  |
Nouveau Xorg driver                  | Confirmed | Critical   |             |
xserver-xorg-video-nouveau (Ubuntu)  |           | Critical   | Unassigned  |

Bug Description

Happens when resuming from standby. Reproducible every time.

nouveau E[ PFB][0000:05:00.0] trapped read at 0x002001a020 on channel 0x0001fae9 SEMAPHORE_BG/PFIFO_READ/00 reason: PAGE_NOT_PRESENT
nouveau E[ PFB][0000:05:00.0] trapped write at 0x002001a020 on channel 0x0001fae9 PFIFO/PFIFO_READ/SEMAPHORE reason: PAGE_NOT_PRESENT
nouveau E[ PGRAPH][0000:05:00.0] TRAP_TPDMA_2D - TP 0 - Unknown fault at address 00449b0000
nouveau E[ PGRAPH][0000:05:00.0] TRAP_TPDMA_2D - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00000000, e20: 00000011, e24: 0c030000
nouveau E[ PGRAPH][0000:05:00.0] TRAP
nouveau E[ PGRAPH][0000:05:00.0] ch 2 [0x001fae9000] subc 2 class 0x502d mthd 0x060c data 0x000004b0
nouveau E[ PFB][0000:05:00.0] trapped write at 0x00449b0000 on channel 0x0001fae9 PGRAPH/PROP/DST2D reason: PAGE_NOT_PRESENT
nouveau E[ PGRAPH][0000:05:00.0] magic set 0:
nouveau E[ PGRAPH][0000:05:00.0] 0x00408904: 0x20087701
nouveau E[ PGRAPH][0000:05:00.0] 0x00408908: 0x00449b00
nouveau E[ PGRAPH][0000:05:00.0] 0x0040890c: 0x80000432
nouveau E[ PGRAPH][0000:05:00.0] 0x00408910: 0x9b000000
nouveau E[ PGRAPH][0000:05:00.0] TRAP_TEXTURE - TP0: Unhandled ustatus 0x00000003
nouveau E[ PGRAPH][0000:05:00.0] TRAP

ProblemType: Crash
DistroRelease: Ubuntu 13.04
Package: xserver-xorg-core 2:1.13.2-0ubuntu1
ProcVersionSignature: Ubuntu 3.8.0-2.6-generic 3.8.0-rc4
Uname: Linux 3.8.0-2-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 2.8-0ubuntu3
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
CrashCounter: 1
Date: Thu Jan 31 22:17:49 2013
DistUpgraded: 2013-01-30 03:34:50,679 DEBUG enabling apt cron job
DistroCodename: raring
DistroVariant: ubuntu
ExecutablePath: /usr/bin/Xorg
ExtraDebuggingInterest: Yes
GraphicsCard:
 NVIDIA Corporation G92 [GeForce 8800 GT] [10de:0611] (rev a2) (prog-if 00 [VGA controller])
   Subsystem: XFX Pine Group Inc. Device [1682:2330]
MachineType: System manufacturer System Product Name
MarkForUpload: True
ProcCmdline: /usr/bin/X :0 -core -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch -background none
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-2-generic root=UUID=615f02d8-8bf1-4c07-bf26-9f8b2230d6d6 ro pcie_aspm=force
Signal: 6
SourcePackage: xorg-server
StacktraceTop:
 ?? () from /usr/lib/xorg/modules/libexa.so
 ?? () from /usr/lib/xorg/modules/libexa.so
 ?? () from /usr/lib/xorg/modules/libexa.so
 ?? () from /usr/lib/xorg/modules/libexa.so
 ?? () from /usr/lib/xorg/modules/libexa.so
Title: Xorg crashed with SIGABRT
UpgradeStatus: Upgraded to raring on 2013-01-30 (1 days ago)
UserGroups:

dmi.bios.date: 12/24/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0806
dmi.board.name: P5WD2-Premium
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0806:bd12/24/2009:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnP5WD2-Premium:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer
version.compiz: compiz 1:0.9.9~daily13.01.25-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.41-0ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.0.2-0ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.0.2-0ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.13.2-0ubuntu1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.1.0-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.20.19-0ubuntu3
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.6-0ubuntu2
xserver.bootTime: Thu Jan 31 22:30:57 2013
xserver.configfile: /etc/X11/xorg.conf
xserver.devices:
 input Power Button KEYBOARD, id 6
 input Power Button KEYBOARD, id 7
 input Microsoft Natural Keyboard Pro KEYBOARD, id 8
 input Microsoft Natural Keyboard Pro KEYBOARD, id 9
 input Microsoft Microsoft Basic Optical Mouse MOUSE, id 10
xserver.errors:
 Failed to load module "nvidia" (module does not exist, 0)
 Failed to load module "nvidia" (module does not exist, 0)
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.13.2-0ubuntu1
xserver.video_driver: nouveau

Thank you for taking the time to report this crash and helping to make this software better. This particular crash has already been reported and is a duplicate of bug #1033533, so is being marked as such. Please look at the other bug report to see if there is any missing information that you can provide, or to see if there is a workaround for the bug. Additionally, any further discussion regarding the bug should occur in the other report. Please continue to report any other bugs you may find.

information type: Private → Public
tags: removed: need-amd64-retrace

Adding a log that is typical to the suspend/resume problem for nouveau.

tags: added: nouveau
affects: xorg-server (Ubuntu) → xserver-xorg-video-nouveau (Ubuntu)
summary: - Xorg crashed with SIGABRT due to nouveau
+ nouveau fails at suspend/resume

Ubuntu 3.8.0-4.8-generic 3.8.0-rc6 also fails.

Bryce Harrington (bryce) on 2013-02-19
description: updated
summary: - nouveau fails at suspend/resume
+ nouveau fails at suspend/resume - PAGE_NOT_PRESENT
description: updated

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: New → Confirmed
Hendrik Knackstedt (hennekn) wrote :

Having this issue with a nVidia GeForce 9300M GS under Ubuntu Precise 12.04 with upstream kernel 3.8.2.

Maarten Lankhorst (mlankhorst) wrote :

I hate to ask for a retest, because nouveau in Linux 3.9 broke several times, but could you retest with a recent 3.9 git kernel?

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Confirmed → Incomplete

Still broken on Linux 3.9.0-rc5 (commit 66ade474237745a57b7e87da9a93c7ec69fd52bb to be specific).

Tried it again with nouveau.debug=trace, but it logged a surprisingly small amount after the MMIO write error; the machine probably froze. I couldn't switch to a virtual console, but DPMS(?) strangely enough worked (the monitor both went to sleep and woke up).
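For anyone wanting to capture the same trace, one way to add `nouveau.debug=trace` to the kernel command line on a GRUB-based Ubuntu install is sketched below. It is a minimal sketch that assumes the stock `/etc/default/grub` layout with a `GRUB_CMDLINE_LINUX_DEFAULT="..."` line and GNU `sed`; the helper name `add_kernel_param` is hypothetical.

```shell
# Hypothetical helper (POSIX sh + GNU sed): append a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
# Assumed line format: GRUB_CMDLINE_LINUX_DEFAULT="..."
# Usage: add_kernel_param FILE PARAM
add_kernel_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on that line
    # (preceded by a quote or space, followed by a quote or space).
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}
```

Run it from a root shell (a shell function cannot be passed to `sudo` directly), e.g. `add_kernel_param /etc/default/grub nouveau.debug=trace`, then regenerate the boot configuration with `update-grub` and reboot.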

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Incomplete → Confirmed

Got a slightly better log with Linux 3.9.0-rc6 (I'll attribute it to luck since not much has changed since rc5). Display froze. Resume begins at 174.379193 and errors at 2369.924009.

The X server seems to initially survive in a semi-frozen state, where the mouse pointer moves but nothing else, and the PAGE_NOT_PRESENT errors start to appear after I try to switch to a virtual console.

X crashes and I eventually get a frozen console.

Viktor Mileikovskyi (v-mil) wrote :

Workaround options for hybrid video systems:

Option 1. In the BIOS, switch to using only the integrated video card (for example, "UMA only" on the Lenovo Ideapad Z580A).

Option 2. In the BIOS, select Optimus video and blacklist nouveau:

sudo gedit /etc/modprobe.d/blacklist.conf

Add this line to the end of the file:

blacklist nouveau

Save, close the file and restart. (Running sudo rmmod nouveau does not help because nouveau is in use.) Now, if you need the Optimus graphics, you can run the following (and avoid suspending until the next reboot, because rmmod does not work):

sudo modprobe nouveau
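The same blacklist entry can be added without opening an editor. Below is a minimal, idempotent sketch; the helper name `blacklist_module` is hypothetical. As an assumption to verify on your own system: if nouveau is loaded from the initramfs, the initramfs may also need regenerating (`sudo update-initramfs -u`) before the blacklist takes effect at boot.

```shell
# Hypothetical helper (POSIX sh): add "blacklist MODULE" to a modprobe
# config file, only if an identical line is not already present.
# Usage: blacklist_module FILE MODULE
blacklist_module() {
    conf=$1
    module=$2
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if ! grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        # Start on a fresh line if the file lacks a trailing newline.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        printf 'blacklist %s\n' "$module" >> "$conf"
    fi
}
```

From a root shell: `blacklist_module /etc/modprobe.d/blacklist.conf nouveau`, then restart.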

Juha Luoma (jsluoma) wrote :

Same problem with up-to-date 64-bit 13.04 on Dell Latitude E6400 with

01:00.0 VGA compatible controller: NVIDIA Corporation G98M [Quadro NVS 160M] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Dell Device 0233
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f2000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at df00 [size=128]
        [virtual] Expansion ROM at f4000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau

Florian Fainelli (f-fainelli) wrote :

I am also getting this error after suspend/resume. It is always reproducible by doing a suspend/resume cycle and then trying to play a video with VLC; I could also reproduce it by running Steam. Here is my graphics card:

01:00.0 VGA compatible controller: NVIDIA Corporation G86M [Quadro NVS 140M] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Lenovo ThinkPad T61
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at d4000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at 2000 [size=128]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: nouveau

Here is my configuration:
- kernel: 3.8.0-26-generic
- xserver-xorg-video-nouveau: 1:1.0.7-0ubuntu1
- xserver-xorg: 1:7.7+1ubuntu4
- xserver-xorg-core: 2:1.13.3-0ubuntu6

Florian Fainelli (f-fainelli) wrote :

Has this issue been reported upstream?

Created attachment 83798
dmesg output from boot to attempted resume

On a Dell Latitude E6400 with an nVidia G98M and a 3.10 kernel, I find that the machine will reproducibly resume from suspend with a non-responsive X session. After several attempts I can eventually get to a somewhat functional VT. dmesg output attached.

lspci gives the following device details,

01:00.0 VGA compatible controller: NVIDIA Corporation G98M [Quadro NVS 160M] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: Dell Device 0233
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
 Memory at e0000000 (64-bit, prefetchable) [size=256M]
 Memory at f2000000 (64-bit, non-prefetchable) [size=32M]
 I/O ports at df00 [size=128]
 [virtual] Expansion ROM at f4000000 [disabled] [size=128K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Capabilities: [100] Virtual Channel
 Capabilities: [128] Power Budgeting <?>
 Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
 Kernel driver in use: nouveau

Created attachment 83799
End of log from resume with debug=trace

It seems 3D/compositing might be to blame here; suspending while running metacity seems to resume correctly.

The same issue can be reproduced with 3.8.

Interestingly, 3.5 seems to work correctly, even with compiz.

From discussions with xexaxo in #nouveau, it seems that this might be a similar regression to what happened in 3.8 (described in bug #59057). My plan is as follows,

 1) Verify that 3.8.11 works
 1.1) If not, verify that 3.8.1 works and bisect to find the broken release
 1.2) If so, check whether 3.9 works
 2) Bisect backwards from the first broken release to 3.8.11 (or whichever release was tested to work)

The 3.8.11 kernel fails in the same way that 3.10 does, trying 3.8.1 next.

Rereading bugs #59057 and #62835, it's not entirely clear whether the bug was actually ever fixed; it may be that the reporter simply worked around it. Comment #33 of Bug #59057 (https://bugs.freedesktop.org/show_bug.cgi?id=59057#c33) actually refers to commit e5a58edc94a20a7ef4b7db67c166c4ca0588bad0 (46c13c131d3b73080aa0f50f45e834a9ab3c0e71 in Linus's tree) as working. I'm going to test this and the surrounding commits to verify it.

Tested 46c13c131d3b73080aa0f50f45e834a9ab3c0e71. Things appear to fail in a similar way to the resume failure upon starting compiz.

One factor that I've neglected to mention thus far is that I've been using my own mesa build in the above tests,

    $ glxinfo
    ...
    OpenGL version string: 2.1 Mesa 9.2.0-devel (git-5a7bdd4)

After reverting to Ubuntu's packaged mesa,

    $ glxinfo
    ...
    OpenGL version string: 2.1 Mesa 9.1.4

Resume appears to work as expected.
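When comparing Mesa builds like this, it helps to record exactly which one is active for each test. A small sketch that pulls the Mesa version out of `glxinfo` output; it assumes the `OpenGL version string: ... Mesa X.Y.Z...` format shown above, and `mesa_version` is a hypothetical name.

```shell
# Hypothetical helper (POSIX sh): read glxinfo output on stdin and print
# everything after "Mesa" on the "OpenGL version string" line,
# e.g. "9.1.4" or "9.2.0-devel (git-5a7bdd4)".
# Lines such as "OpenGL core profile version string" are not matched.
# Usage: glxinfo | mesa_version
mesa_version() {
    sed -n 's/^[[:space:]]*OpenGL version string:.*Mesa[[:space:]]*//p'
}
```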

The mesa tests mentioned above were conducted on a 3.10 kernel. In the mesa 9.1 case, nouveau writes nothing interesting to dmesg, only a few status messages:

    [ 312.871019] nouveau [ DRM] suspending fbcon...
    [ 312.871023] nouveau [ DRM] suspending display...
    [ 312.871049] nouveau [ DRM] unpinning framebuffer(s)...
    [ 312.871108] nouveau [ DRM] evicting buffers...
    [ 313.133858] nouveau [ DRM] waiting for kernel channels to go idle...
    [ 313.133883] nouveau [ DRM] suspending client object trees...
    [ 313.134682] nouveau [ DRM] suspending kernel object tree...
    ...
    [ 317.178155] nouveau [ DRM] re-enabling device...
    [ 317.178170] nouveau [ DRM] resuming kernel object tree...
    [ 317.178176] nouveau [ VBIOS][0000:01:00.0] running init tables
    [ 317.287651] serial 00:08: activated
    [ 317.357172] nouveau [ DRM] resuming client object trees...
    [ 317.357691] nouveau [ DRM] resuming display...

I can confirm that mesa 9.1.4 on a 3.10 kernel can successfully resume, even while running glxgears on compiz. The messages mentioned in Comment 12 are the only things produced by nouveau in dmesg.
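Since a hung resume tends to stop partway through this sequence, a quick check is to look at the last nouveau `[ DRM]` status line that was logged. A minimal sketch, assuming the message format shown above (the spacing inside the brackets is matched loosely); `last_drm_stage` is a hypothetical name.

```shell
# Hypothetical helper (POSIX sh): from a dmesg capture on stdin, print the
# message text of the last "nouveau [ DRM]" status line,
# e.g. "resuming display...".
# Usage: dmesg | last_drm_stage
last_drm_stage() {
    sed -n 's/.*nouveau *\[ *DRM] *//p' | tail -n 1
}
```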

Confirmed that 72916698b056d0559263e84372bb45cd83a1c2c2 is bad. Unfortunately this is a merge base. Here is the bisection log,

    git bisect start
    # bad: [5a7bdd4b4173958c53109517b7c95f1039623e7e] docs: Add items for GL4.4
    git bisect bad 5a7bdd4b4173958c53109517b7c95f1039623e7e
    # good: [e64febb4b71475b35765f0dc168df22655444a7f] docs: 9.1.4 release notes
    git bisect good e64febb4b71475b35765f0dc168df22655444a7f
    # bad: [72916698b056d0559263e84372bb45cd83a1c2c2] r600g: fix segfault with old kernel
    git bisect bad 72916698b056d0559263e84372bb45cd83a1c2c2

Unfortunately I'm now having trouble reproducing the working conditions with mesa 9.1.4.

Returning to mapping out kernel versions. It appears that 3.6 works correctly.

It seems that a 3.7 kernel will resume correctly as well.

Confirmed that a clean 3.8 build exhibits the issue.

Starting a bisection between 3.7 and 3.8.

Linux commit 992956189de58cae9f2be40585bc25105cd7c5ad is bad.

(In reply to comment #18)
> Linux commit 992956189de58cae9f2be40585bc25105cd7c5ad is bad.

That seems thoroughly unlikely.

commit 992956189de58cae9f2be40585bc25105cd7c5ad
Author: Eric W. Biederman <email address hidden>
Date: Mon Dec 17 17:19:36 2012 -0800

    efi: Fix the build with user namespaces enabled.

This doesn't apply to your situation on many levels... this is a build fix... to efi vars...

I bet if you checkout to 992956189de58cae9f2be40585bc25105cd7c5ad^ then you will still have a bad kernel. You should probably redo the bisect, but look at the bisect log (git bisect log) and keep all your "bad" commits, since they are likely indeed bad. But you may have been a bit too eager in calling out the "good" kernels. (See git help bisect for how to start with a bunch of bad commits.) While you're at it, you may want to re-test whether 3.7 really is good.
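The restart suggested here can be done with `git bisect log` and `git bisect replay`, both described in `git help bisect`. The sketch below keeps the `start` and `bad` commands from a saved log and drops the `good` ones, so the good candidates can be re-tested; the helper name `keep_bad_only` and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper (POSIX sh): filter a saved "git bisect log" so that
# only "git bisect start" and "git bisect bad" commands survive.
# Comment lines ("# bad: ...", "# good: ...") are dropped too.
# Caveat: a start line recorded with revisions, e.g.
#   git bisect start 'v3.8' 'v3.7'
# still marks its second revision as good; edit it if that is in doubt.
# Usage: keep_bad_only < bisect.log > bad-only.log
keep_bad_only() {
    grep -E '^git bisect (start|bad)( |$)'
}
```

Typical use: save the log with `git bisect log > bisect.log`, filter it, run `git bisect reset`, then `git bisect replay bad-only.log` and mark a freshly re-tested good revision with `git bisect good`.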

@Ilia, I should have been more specific. These next few comments are largely just notes recording the state of my bisection. By "bad" I mean that I have tested the commit and it exhibits the issue, not that it is the first bad commit. Currently I have around 10 more bisection steps to go before the culprit is hopefully identified.

Linux commit 2b8318881ddbcb67c5e8d2178b42284749442222 appears to work.

Linux kernel 3c2e81ef344a90bb0a39d84af6878b4aeff568a2 exhibits the issue.

640631d04cd2cfbb4792d6a8fc5fcab14ee273a5 is bad.
9fabd4eedeb904173d05cb1ced3c3e6b9d2e8137 is good.

74b6685089591fa275929109f7b839bf386890a0 is good.
bd3b49f25a3eae2d91432247b7565489120b6bcf is bad.

2d8b9ccbcee694c9ce681ec596df642e52ddcb15 is bad.
b6e4ad200a726a32c7083f491383713bc8680f86 is good.

47057302f075578618ea36fc3c4c97a5a6f97f00 is good.
4f6029da58ba9204c98e33f4f3737fe085c87a6f is bad.

647bf61d0399515c526c125450cadaade79b1988 is good.
f9887d091149406de5c8b388f7e0bb6932dd621b is good.

Created attachment 83940
Bisection log between v3.7 and v3.8

According to the bisection,

4f6029da58ba9204c98e33f4f3737fe085c87a6f is the first bad commit
commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f
Author: Ben Skeggs <email address hidden>
Date: Fri Nov 16 11:54:31 2012 +1000

    drm/nv50-nvc0: switch to common disp impl, removing previous version

    Signed-off-by: Ben Skeggs <email address hidden>

:040000 040000 9daeb0bd5ed3e9b22b53c21fab853bd2e392f6ed 4bdbb1d96e57d3f254affb8812788f04b7474bf7 M drivers

Created attachment 83942
dmesg output from successful suspend/resume with 4f6029da^

Created attachment 83943
dmesg output from a few failed suspend/resume attempts with 4f6029da

In this case I logged in and suspended the machine. Upon resuming, the X session was frozen, with the cursor being updated occasionally. After several attempts I was able to get to a VT. Shortly thereafter the X server died, causing lightdm to respawn a greeter which functioned correctly, presumably because it doesn't require acceleration. I could log in again, although the X session would freeze before getting to a functional desktop (presumably upon compiz starting). Again, with a few tries I could get back to a VT, at which point lightdm would start a greeter. With every freeze a "failed to idle channel 0xcccc0000" message would be dumped to dmesg.

One idea I just randomly had was that there might be a difference in the teardown process. For example, in the removed code, nv50_display_fini did stuff. In the current code, it's basically empty (well, some small bits in nouveau_display_fini).

It looks like the old code

(a) blanked each crtc
(b) sent out a EVO_UPDATE command
(c) waited for each crtc to hit a vblank
(d) did something with the cursor (cleared it?)
(e) waited for some sort of DPMS thing

It could well be that this now happens elsewhere, but I just wanted to put that thought down on "paper".

tags: added: saucy

This bug is listed in the upstream bug tracker.

Changed in nouveau:
importance: Undecided → Unknown
status: New → Unknown

Since this bug renders the system temporarily or permanently unusable, it has a priority of 'critical'.

Changed in xserver-xorg-video-nouveau (Ubuntu):
importance: Undecided → Critical

Since this bug is:

- Valid.
- Well described.
- Reported in the upstream bug tracker (FreeDesktop).
- Ready to be worked on by a developer.

it's also triaged.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Confirmed → Triaged

Since the root cause of this bug is the Nouveau Xorg driver, it's unrelated to Linux.

Changed in linux:
status: New → Invalid
T_W (walshtc) wrote :

Also seeing issue with Ubuntu 13.10 (Saucy Salamander) Final Beta (AMD64)

Changed in nouveau:
importance: Unknown → Critical
status: Unknown → Confirmed

@jhoechtl: What makes you think the bug you linked is related?

I've updated to the "fixed" libdrm in Ubuntu 13.10 (2.4.46-1ubuntu1), and I still have problems with nouveau on resume. I no longer get the PAGE_NOT_PRESENT nouveau error, but my X server still hangs on resume. I get some new errors in my Xorg.0.log. I'll paste a snippet below, but perhaps this should be the basis of a new bug report?

[ 1995.657] setversion 1.4 failed
(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x49) [0xb77000b9]
(EE) 1: /usr/bin/X (mieqEnqueue+0x213) [0xb76e0113]
(EE) 2: /usr/bin/X (QueuePointerEvents+0x6d) [0xb75b41ad]
(EE) 3: /usr/bin/X (xf86PostMotionEventM+0x24b) [0xb75ed84b]
(EE) 4: /usr/bin/X (xf86PostMotionEvent+0xaa) [0xb75ed9aa]
(EE) 5: /usr/lib/xorg/modules/input/synaptics_drv.so (0xb6a48000+0x434d) [0xb6a4c34d]
(EE) 6: /usr/lib/xorg/modules/input/synaptics_drv.so (0xb6a48000+0x5e81) [0xb6a4de81]
(EE) 7: /usr/bin/X (0xb7560000+0x7ce85) [0xb75dce85]
(EE) 8: /usr/bin/X (0xb7560000+0xa7d5b) [0xb7607d5b]
(EE) 9: (vdso) (__kernel_sigreturn+0x0) [0xb753d400]
(EE) 10: (vdso) (__kernel_vsyscall+0x10) [0xb753d424]
(EE) 11: /lib/i386-linux-gnu/libc.so.6 (ioctl+0x19) [0xb720dd09]
(EE) 12: /usr/lib/i386-linux-gnu/libdrm.so.2 (drmIoctl+0x40) [0xb74309e0]
(EE) 13: /usr/lib/i386-linux-gnu/libdrm.so.2 (drmCommandWrite+0x3c) [0xb74333dc]
(EE) 14: /usr/lib/i386-linux-gnu/libdrm_nouveau.so.2 (nouveau_bo_wait+0xa5) [0xb6c53ee5]
(EE) 15: /usr/lib/xorg/modules/drivers/nouveau_drv.so (0xb6c5a000+0x9a57) [0xb6c63a57]
(EE) 16: /usr/lib/xorg/modules/drivers/nouveau_drv.so (0xb6c5a000+0xa079) [0xb6c64079]
(EE) 17: /usr/bin/X (DRI2SwapBuffers+0x366) [0xb76d0bc6]
(EE) 18: /usr/bin/X (0xb7560000+0x172498) [0xb76d2498]
(EE) 19: /usr/bin/X (0xb7560000+0x3c48d) [0xb759c48d]
(EE) 20: /usr/bin/X (0xb7560000+0x2a52a) [0xb758a52a]
(EE) 21: /lib/i386-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0xb713e905]
(EE) 22: /usr/bin/X (0xb7560000+0x2a908) [0xb758a908]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
(EE) [mi] EQ overflow continuing. 100 events have been dropped.
... (more here) ...

After a few more tests: I still get the same PAGE_NOT_PRESENT errors, as well as a subsequent NULL pointer kernel oops. This occurs both with the new Ubuntu package and with a manually-compiled upstream libdrm. So the mesa/libdrm update does not solve this bug.

After thinking about this, this problem can't be solely a user-space issue; user space should not be able to trigger this kind of kernel oops. Plus, Ben Gamari was already able to bisect this within the kernel. So at least some part of this is related to the kernel-space nouveau component.

No, libdrm 2.4.46-1ubuntu1 doesn't solve the problem. Linux kernel 3.7 is still the last working one.

Not working here either on Quadro NVS 160M, libdrm 2.4.46.

Bug still present on 3.13-rc3

(In reply to comment #32)
> One idea I just randomly had was that there might be a difference in the
> teardown process. For example, in the removed code, nv50_display_fini did
> stuff. In the current code, it's basically empty (well, some small bits in
> nouveau_display_fini).
>
> It looks like the old code
>
> (a) blanked each crtc
> (b) sent out a EVO_UPDATE command
> (c) waited for each crtc to hit a vblank
> (d) did something with the cursor (cleared it?)
> (e) waited for some sort of DPMS thing
>
> It could well be that this now happens elsewhere, but I just wanted to put
> that thought down on "paper".

I tried this idea by doing the following:

1) Checked out 4f6029da58ba9204c98e33f4f3737fe085c87a6f^1 (= f9887d091149406de5c8b388f7e0bb6932dd621b)
2) Deleted everything in nv50_display_fini

With that change suspend/resume works so I guess the problem is elsewhere.

Some new observations made while investigating this issue:

* Without X started suspend/resume works fine
* With NoAccel set in xorg conf file suspend/resume works fine
* If X is stopped before suspending, resume works OK, but if I try to start X a second time after resume, X hangs.

How do I set NoAccel in the xorg configuration? Do I create a file in /usr/share/X11/xorg.conf.d? Also, from what I have read, NoAccel means no 2D acceleration. Doesn't that affect performance drastically?
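On the NoAccel question: the nouveau X driver accepts `Option "NoAccel"` in a `Device` section (see the nouveau(4) man page). The sketch below writes a small drop-in file. Assumptions: `/etc/X11/xorg.conf.d` is the usual place for local additions (`/usr/share/X11/xorg.conf.d` is normally reserved for files shipped by packages), and the file name `20-nouveau-noaccel.conf` and the `Identifier` are arbitrary. If you already have an `/etc/X11/xorg.conf` with a nouveau `Device` section, as in the original report, adding the `Option` line there is probably safer. With acceleration off, 2D drawing falls back to software, so expect a noticeable slowdown.

```shell
# Hypothetical helper (POSIX sh): write an xorg.conf.d drop-in that turns
# off acceleration in the nouveau X driver. File name and Identifier are
# arbitrary choices for this sketch.
# Usage (as root): write_noaccel_conf /etc/X11/xorg.conf.d
write_noaccel_conf() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/20-nouveau-noaccel.conf" <<'EOF'
Section "Device"
    Identifier "nouveau-noaccel"
    Driver     "nouveau"
    Option     "NoAccel" "true"
EndSection
EOF
}
```

Restart X (or reboot) after writing the file, and delete it again to restore acceleration.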

tags: added: latest-bios-0709
summary: - nouveau fails at suspend/resume - PAGE_NOT_PRESENT
+ 10de:0611 nouveau fails at suspend/resume - PAGE_NOT_PRESENT

What does latest-bios-0709 and 10de:0611 mean in this context?

I believe 10de:0611 is the associated PCI ID of the NVIDIA GPU on which this problem was seen. You can check yours with something like lspci -nn.

I'm not sure about latest-bios-0709. I assume it helps with tracking.

BTW, I've also seen this problem on a PCI ID of 10de:0659:

$ lspci -nn | grep NVIDIA
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation G96GL [Quadro FX 580] [10de:0659] (rev a1)
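To pull just the vendor:device ID out of that output (10de is NVIDIA's PCI vendor ID), a filter over `lspci -nn` lines is enough. This is a sketch that assumes the `[vvvv:dddd]` bracket format shown above; `gpu_pci_ids` is a hypothetical name.

```shell
# Hypothetical helper (POSIX sh): read "lspci -nn" output on stdin and print
# the vendor:device ID of every VGA or 3D controller, one per line.
# The 4-digit class code "[0300]" has no colon, so it is not matched.
# Usage: lspci -nn | gpu_pci_ids
gpu_pci_ids() {
    grep -E 'VGA compatible controller|3D controller' |
        grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' |
        sed -e 's/^\[//' -e 's/]$//'
}
```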

Tim Gehpunkt (rollercoaster) wrote :

Still happens on current 13.10, with all updates applied. Sad, after one year...

02:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1)
~# uname -a
Linux Artemis 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:16:27 UTC 2013 i686 athlon i686 GNU/Linux

Re-tested on drm-next, at:

commit ef64cf9d06049e4e9df661f3be60b217e476bee1
Merge: 279b9e0cc300 f3980dc50c51
Author: Dave Airlie <email address hidden>
Date: Thu Jan 30 10:46:06 2014 +1000

    Merge branch 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next

Still reproducible:

  nouveau E[Xorg[1140]] failed to idle channel 0xcccc0000 [Xorg[1140]]
  nouveau E[ PFB][0000:01:00.0] trapped read at 0x002001e020 on channel 0x0001fb14 [unknown] SEMAPHORE_BG/PFIFO_READ/00 reason: PAGE_NOT_PRESENT

I've been following this ticket and attempting to poke around a bit. I just tested my hardware with various points in the 3.6, 3.7, and 3.7-rc kernels, and all of those still gave me a non-responsive screen with messages like the following after resume. e.g., on Linux 3.6:

[ 161.192867] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[ 164.355897] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.

or on Linux 3.7.9:

[ 336.337207] nouveau E[ 3134] failed to idle channel 0xcccc0000

None of these builds give me a PAGE_NOT_PRESENT error, though. This makes it hard to bisect, as I can't find any working point to test...
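Because the failures in this thread come in two flavours, with and without PAGE_NOT_PRESENT, classifying each saved dmesg the same way can keep bisection verdicts consistent. A minimal sketch, assuming only the message texts quoted in this thread; `classify_resume` and its labels are hypothetical. "clean" only means neither signature was found, not that the resume actually succeeded.

```shell
# Hypothetical helper (POSIX sh): classify a dmesg capture (stdin) from a
# resume attempt. Prints one of:
#   page-not-present  a PAGE_NOT_PRESENT trap was logged
#   idle-failure      only "failed to idle channel" errors were logged
#                     (matched case-insensitively, since 3.6 logs
#                     "Failed to idle channel N.")
#   clean             neither signature was found
# Usage: dmesg | classify_resume
classify_resume() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'PAGE_NOT_PRESENT'; then
        echo page-not-present
    elif printf '%s\n' "$log" | grep -qi 'failed to idle channel'; then
        echo idle-failure
    else
        echo clean
    fi
}
```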

My hardware:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218M [NVS 3100M] [10de:0a6c] (rev a2) (prog-if 00 [VGA controller])
 Subsystem: Dell Device [1028:040a]
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
 Memory at d0000000 (64-bit, prefetchable) [size=256M]
 Memory at e0000000 (64-bit, prefetchable) [size=32M]
 I/O ports at 7000 [size=128]
 Expansion ROM at e3000000 [disabled] [size=512K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Capabilities: [b4] Vendor Specific Information: Len=14 <?>
 Capabilities: [100] Virtual Channel
 Capabilities: [128] Power Budgeting <?>
 Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
 Kernel driver in use: nouveau

(In reply to comment #37)
> I've been following this ticket and attempting to poke around a bit. I just
> tested my hardware with various points in the 3.6, 3.7, and 3.7-rc kernels,
> and all of those still gave me a non-responsive screen with messages like
> the following after resume. e.g., on Linux 3.6:

Do you have any reason to believe that you have the same problem? This one was bisected to commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f which appeared in v3.8 (which means that 3.7.x should all be fine).

Of course you have the additional problem of having a nva8 (this bug has nv98 hw users, although that doesn't exclude you from having the same issue -- the bisected commit was fairly generic), which is still unstable for some users, but was even more unstable in earlier kernels. As for not seeing a PAGE_NOT_PRESENT -- are you sure that the kernels in question had code to emit the error in the first place?

(In reply to comment #38)
> Do you have any reason to believe that you have the same problem? This one
> was bisected to commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f which
> appeared in v3.8 (which means that 3.7.x should all be fine).

Not necessarily, although initially the bug symptoms were rather similar, and it's a similar family of hardware. I'm only now checking the bisection myself, and it seems that that particular commit is not my only problem.

> Of course you have the additional problem of having a nva8 (this bug has
> nv98 hw users, although that doesn't exclude you from having the same issue
> -- the bisected commit was fairly generic), which is still unstable for some
> users, but was even more unstable in earlier kernels.

Well, that could complicate my ability to fix things here. I suspect that this particular regression is one of several issues on my hardware, then.

> As for not seeing a
> PAGE_NOT_PRESENT -- are you sure that the kernels in question had code to
> emit the error in the first place?

The code seems to be present. For instance, I'm trying 3.6, where I see nv50_fb_vm_trap() (drivers/gpu/drm/nouveau/nv50_fb.c) has the same "VM: trapped write at 0x...." log message. So I presume that if I was still experiencing the fault in 3.6, it would appear in the log.

BTW, I noticed Ben Gamari's earlier comments about mesa versioning, so I downgraded to 9.1.4, and I still experience the same behavior.

Viktor Mileikovskyi (v-mil) wrote :

It appears that the bug is FIXED in Ubuntu 13.10 x64 (upgraded from 13.04 x64).
lspci -k says that nouveau is active.
Suspend and resume were successful twice.
Tested on Lenovo Ideapad Z580A
With best regards.
Viktor.

Florian Fainelli (f-fainelli) wrote :

I am still able to get that bug to pop up on 13.10. The same sequence as before: play a video with VLC, suspend, resume, then play a video again will make the system crash with the same "PAGE_NOT_PRESENT" error here.

Anders (eddiedog988) on 2014-03-13
Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Triaged → Confirmed
Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Confirmed → Triaged
Tobias Krais (tux-spam) wrote :

Problem exists in trusty, too. ( 1:1.0.10-1ubuntu2)

tags: added: precise trusty
Jochen Fahrner (jofa) wrote :

Is there any progress on this critical bug? My MacBook 5,1 became mostly unusable after the upgrade to Trusty because of Bug #1319899. Neither nouveau nor nvidia is working reliably in Trusty. Since this critical bug has been unsolved for more than a year, I have no hope that it will ever be fixed. If there is no chance to get my graphics working reliably in Trusty, I will have to downgrade to Precise, which was working flawlessly for me.

Jochen, try the kernel in http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7.10-raring/ or the last official 3.5 kernel from Quantal. I don't think you have to downgrade to Precise.

Jochen Fahrner (jofa) wrote :

Thank you Patrik! Kernel 3.7.10 works. But this kernel does not get security updates, right?

I hit this bug upon upgrade from Ubuntu 12.04 to 14.04. I can confirm nouveau is working in Ubuntu kernel 3.7.10 and it freezes on resume in 3.13.0.
My hardware:

02:00.0 VGA compatible controller: NVIDIA Corporation C79 [GeForce 9400M] (rev b1) (prog-if 00 [VGA controller])
 Subsystem: Apple Inc. MacBook5,1
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at d2000000 (32-bit, non-prefetchable) [size=16M]
 Memory at c0000000 (64-bit, prefetchable) [size=256M]
 Memory at d0000000 (64-bit, prefetchable) [size=32M]
 I/O ports at 1000 [size=128]
 Expansion ROM at d3000000 [disabled] [size=128K]
 Capabilities: [60] Power Management version 2
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Kernel driver in use: nouveau

[ 3.918406] nouveau 0000:02:00.0: setting latency timer to 64
[ 3.919370] nouveau [ DEVICE][0000:02:00.0] BOOT0 : 0x0ac180b1
[ 3.919374] nouveau [ DEVICE][0000:02:00.0] Chipset: MCP79/MCP7A (NVAC)
[ 3.919376] nouveau [ DEVICE][0000:02:00.0] Family : NV50
[ 3.930101] nouveau [ VBIOS][0000:02:00.0] checking PRAMIN for image...
[ 3.995688] nouveau [ VBIOS][0000:02:00.0] ... appears to be valid
[ 3.995692] nouveau [ VBIOS][0000:02:00.0] using image from PRAMIN
[ 3.995827] nouveau [ VBIOS][0000:02:00.0] BIT signature found
[ 3.995831] nouveau [ VBIOS][0000:02:00.0] version 62.79.40.00
[ 4.069047] nouveau [ MXM][0000:02:00.0] no VBIOS data, nothing to do
[ 4.147303] nouveau [ PFB][0000:02:00.0] RAM type: stolen system memory
[ 4.147309] nouveau [ PFB][0000:02:00.0] RAM size: 256 MiB
[ 4.938049] nouveau [ DRM] VRAM: 256 MiB
[ 4.938053] nouveau [ DRM] GART: 512 MiB
[ 4.938057] nouveau [ DRM] BIT BIOS found
[ 4.938061] nouveau [ DRM] Bios version 62.79.40.00
[ 4.938065] nouveau [ DRM] TMDS table version 2.0
[ 4.938068] nouveau [ DRM] DCB version 4.0
[ 4.938071] nouveau [ DRM] DCB outp 00: 01000123 00010014
[ 4.938073] nouveau [ DRM] DCB outp 01: 02021232 00000010
[ 4.938076] nouveau [ DRM] DCB outp 02: 02021286 0f220010
[ 4.938078] nouveau [ DRM] DCB conn 00: 00000040
[ 4.938081] nouveau [ DRM] DCB conn 01: 0000a146
[ 6.473715] nouveau [ DRM] 4 available performance level(s)
[ 6.473720] nouveau [ DRM] 0: core 100MHz shader 200MHz voltage 900mV fanspeed 100%
[ 6.473724] nouveau [ DRM] 1: core 150MHz shader 300MHz voltage 900mV fanspeed 100%
[ 6.473728] nouveau [ DRM] 2: core 350MHz shader 800MHz voltage 900mV fanspeed 100%
[ 6.473731] nouveau [ DRM] 3: core 450MHz shader 1100MHz voltage 1010mV fanspeed 100%
[ 6.473734] nouveau [ DRM] c:
[ 6.498971] nouveau [ DRM] MM: using M2MF for buffer copies
[ 6.593834] nouveau [ DRM] allocated 1280x800 fb: 0x50000, bo ffff88013692ac00
[ 6.593930] fbcon: nouveaufb (fb0) is primary device
[ 6.862546] fb0: nouveaufb frame buffer device
[ 6.863165] [drm] Initialized nouveau 1.1.0 20120801 for 0000:02:00.0 on minor 0

Jochen, no, it does not get security updates. I don't know if Canonical still updates their 3.5 kernel. You can get an updated 3.4 from https://www.kernel.org/ if that works for you.

(In reply to comment #40)
> I hit this bug upon upgrade from Ubuntu 12.04 to 14.04. I can confirm
> nouveau is working in Ubuntu kernel 3.7.10 and it freezes on resume in
> 3.13.0.
> My hardware:
>
> 02:00.0 VGA compatible controller: NVIDIA Corporation C79 [GeForce 9400M]
> (rev b1) (prog-if 00 [VGA controller])
> Subsystem: Apple Inc. MacBook5,1

Did you verify that the same commit is responsible? If not, please do a bisect (you can cheat and just assume this is the same issue and test the commit and its parent). If it's the same commit, please let us know. If not, open a separate issue.
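For readers unfamiliar with bisection: `git bisect` performs a binary search over the commit history, asking you to mark each tested kernel as good or bad until only the first bad commit remains (in a real tree, `git bisect start`, `git bisect bad` and `git bisect good <rev>` drive the loop). The Python sketch below models only that search logic and the "test the commit and its parent" shortcut. `is_bad` is a hypothetical callback standing in for building the kernel at a commit, suspending, resuming and checking for the freeze. The search assumes the bug reproduces reliably; an intermittent freeze can cause a bad commit to be marked good.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit, in the style of git bisect.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the bug stays present once introduced.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # hypothetical: build, boot, suspend/resume, check
            bad = mid
        else:
            good = mid
    return commits[bad]


def is_first_bad(commit: str, parent: str, is_bad: Callable[[str], bool]) -> bool:
    """The shortcut: a commit is the culprit if it is bad and its parent is good."""
    return is_bad(commit) and not is_bad(parent)
```

The shortcut costs only two test kernels, but it can only confirm a suspected commit; a full bisection is needed when the culprit is unknown.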

Ilia, I don't know how to do that.

(In reply to comment #42)
> Ilia, I don't know how to do that.

Use your favourite search engine to see how to use 'git bisect'. You may even be able to find some guide specific to your distro. If you can't narrow the problem down, we definitely won't be able to help.

I'm sorry, Ilia, I'm a normal user, not a kernel developer.

(In reply to comment #44)
> I'm sorry Ilia, I'm a normal user, no kernel developer.

If you're unable/unwilling/whatever-the-reason to do some amount of debugging, you will be best-served by your distribution's support channels.

Jochen Fahrner (jofa) wrote :

Look at Ilia Mirkin's answer. It looks like freedesktop.org only cooperates with kernel developers. Normal users have to contact their distribution's support channels. But I see no activity from Canonical to solve this issue. Now I understand why many people don't switch to Linux :-(

Jochen Fahrner (jofa) wrote :

I have to correct myself. Kernel 3.7.10 does not solve the issues with nouveau. The only difference is that it does not always freeze after resume, but sometimes it still does. With later kernel versions it always freezes.

In many years as an Ubuntu user, my experience is that the free graphics drivers (nouveau, radeon) have never worked satisfactorily, and this still has not changed. The proprietary drivers have always worked better, and I hope they get the problems with the nvidia driver under control.

At the moment I think it's best to downgrade to 12.04 and wait for a half year or a year.

I tried to fix this a few months ago but failed. If someone with the right skills and time wants to look at this problem, I'd be happy to give away a laptop with this chipset, and I'll pay the shipping. I can prepare a Linux installation with the kernel sources at the regression point. Contact me if you are up to the task.

(In reply to comment #46)
> I tried to fix this a few months ago but failed. If someone with the right
> skills and time want to have a look at this problem, I'd be happy to give
> away a laptop with this chipset. Shipment cost on me. I can prepare a linux
> installation with sources of the kernel at the regression point. Contact me
> if you are up to the task.

Don't do that.

Read here: https://wiki.ubuntu.com/Kernel/KernelBisection
If you can follow those Ubuntu-specific instructions, that will narrow down the problem.

If you can't follow those instructions, have you filed a Launchpad bug? If so, please post the link or bug # (double-check that it's a public bug or say it's private).

(In reply to comment #47)
> (In reply to comment #46)
> > I tried to fix this a few months ago but failed. If someone with the right
> > skills and time want to have a look at this problem, I'd be happy to give
> > away a laptop with this chipset. Shipment cost on me. I can prepare a linux
> > installation with sources of the kernel at the regression point. Contact me
> > if you are up to the task.
>
> Don't do that.
>
> Read here: https://wiki.ubuntu.com/Kernel/KernelBisection
> If you can follow those Ubuntu-specific instructions, that will narrow down
> the problem.
>
> If you can't follow those instructions, have you filed a Launchpad bug? If
> so, please post the link or bug # (double-check that it's a public bug or
> say it's private).

The problem is already bisected. The commit that introduces the regression switches nv50 to use nvc0's disp implementation. The commit basically deletes all the nv50 code and changes a few function pointers to use the nvc0 implementation. I tried pinpointing the problem (see comment 34), but I was not able to fix it.

(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > I tried to fix this a few months ago but failed. If someone with the right
> > > skills and time want to have a look at this problem, I'd be happy to give
> > > away a laptop with this chipset. Shipment cost on me. I can prepare a linux
> > > installation with sources of the kernel at the regression point. Contact me
> > > if you are up to the task.
> >
> > Don't do that.
> >
> > Read here: https://wiki.ubuntu.com/Kernel/KernelBisection
> > If you can follow those Ubuntu-specific instructions, that will narrow down
> > the problem.
> >
> > If you can't follow those instructions, have you filed a Launchpad bug? If
> > so, please post the link or bug # (double-check that it's a public bug or
> > say it's private).
>
> The problem is already bisected. The commit that introduces the regression
> switches nv50 to use nvc0's disp implementation. The commit is basically
> deleting all the nv50 code and changing a few function pointers to use the
> nvc0 implementation. I tried pin pointing what the problem was (see comment
> 34) but I was not able to fix the problem.

I saw Ben Gamari's bisection log (from 3.7->3.8), but I didn't realize that you had duplicated the bisection; I only saw your bump 4 months later, on 3.13.

The OP reported this for a Dell Latitude E6400 with G98; are you running similar hardware?

>
> I saw Ben Gamari's bisection log (from 3.7->3.8), but I didn't realize that
> you had duplicated the bisection; I only saw your bump 4 months later, on
> 3.13.
>
> The OP reported this for a Dell Latitude E6400 with G98; are you running
> similar hardware?

HW is not identical. I have a Dell XPS 1330 with 8400M GS (10de:0427)

I originally reported "my" problem in bug 62835. After bisection I found this issue, with the same offending commit and an identical symptom.

The donation offer is still valid. I have two of these machines lying around collecting dust.

I also have this problem with the nouveau driver in Ubuntu 14.04.1 LTS 64-bit. I have a Dell D630C with an nvidia 135M graphics card. After power management puts the laptop to sleep, it is unusable after resuming. The login screen appears and the mouse moves for a short time, then freezes. Nothing works except completely powering down and restarting the computer. I would love to see a fix for this issue. Has there been any further progress?

To anyone looking to pile on with a "me too" comment: Only do so *AFTER* verifying that commit 4f6029da is the first bad commit for you.

[Also, check that it's still happening with the latest kernel... 3.17 at the time of writing.]

BobbyJ: I'm guessing you didn't do that. File your own bug with all the relevant info, and we can take it from there.

My bug https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1378881 was marked as a duplicate of this one.

The nouveau driver does not work, but the nvidia drivers work as long as I don't use 3D applications, such as Steam games like Half-Life 2. The moment I fire up a 3D game, I get weird colors and a frozen screen. The keyboard freezes too.

Now that 14.10 is out, maybe the programmers will have more time to fix this bug.

Tim (tburnett80) wrote :

I’m experiencing this with both 14.04 and 15.04

I have tried Ubuntu-only installs and dual boot with Windows 8.1. The bug always affects Ubuntu, but Windows never has any hiccups.

Can any Canonical people, or anybody else, tell us what the status of this bug is? The last activity is from June 2015, and before that from November 2014.

This bug still affects me and I know we would all like it fixed. I know the programmers are busy. Just would like some information on the status of this bug and where it currently stands.

What's the status of this bug? The last activity was a year ago, in November 2014. I think I'm affected by this bug.

I filed this bug report at Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1378881

Which was marked as a duplicate of this bug:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1111884

My desktop is unusable. I know you're all busy with life, but a little reassurance that this is going to get fixed would be appreciated.

Thanks, Nate.

P.S. If I can help out in any way, I will. I used to build from source years ago, before Git came around, and I could quickly learn how to apply patches and build from source if you show me how.

John Lyons (nsnoc) wrote :

This bug appears with several variations on a theme on several sites.

The most common workaround, adding nouveau.nofbaccel=1 to the kernel command line in /etc/default/grub, didn't work for me.

However,

nouveau.noaccel=1

has given me a stable PC for the last 24 hours, which is a 3-year record!
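On Ubuntu such options are usually appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. To confirm that the running kernel actually received the option, read /proc/cmdline. The Python sketch below is an illustrative helper, not part of any tool; the function names are hypothetical.

```python
def nouveau_params(cmdline: str) -> dict:
    """Collect nouveau.<option>=<value> pairs from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("nouveau.") and "=" in token:
            key, value = token[len("nouveau."):].split("=", 1)
            params[key] = value
    return params


def accel_disabled(cmdline: str) -> bool:
    """True if the command line sets nouveau.noaccel=1 (the workaround above)."""
    return nouveau_params(cmdline).get("noaccel") == "1"
```

For example, `nouveau_params(open("/proc/cmdline").read())` should return a dict such as `{'noaccel': '1'}` once the workaround is active; an empty dict means no nouveau option reached the kernel.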
