nouveau failed to idle channel 0xcccc0000

Bug #1097178 reported by Poil
This bug affects 37 people
Affects: linux-lts-raring (Ubuntu)
Status: Invalid
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

With kernels 3.7.*, 3.8-rc1 and 3.8-rc2 I get this error:
Jan 8 09:05:21 pcpoil kernel: [407794.764008] nouveau E[ 1442] failed to idle channel 0xcccc0000
Jan 8 09:05:23 pcpoil kernel: [407796.764446] nouveau W[PCIEGART][0000:01:00.0] flush timeout, 0x00000002
Jan 8 09:05:25 pcpoil kernel: [407798.764540] nouveau W[PCIEGART][0000:01:00.0] flush timeout, 0x00000002
Jan 8 09:05:27 pcpoil kernel: [407800.764586] nouveau W[PCIEGART][0000:01:00.0] flush timeout, 0x00000002
Jan 8 09:05:27 pcpoil kernel: [407800.768002] [sched_delayed] sched: RT throttling activated

The system becomes unstable, and I need to reboot.

It was working fine with kernel 3.5.

Kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline on Ubuntu 12.10
VGA : 01:00.0 VGA compatible controller: NVIDIA Corporation NV43 [GeForce 6600 GT] (rev a2)

dpkg -l |grep nouveau
libdrm-nouveau1a:amd64 2.4.40+git20130104.baf0a7da-0ubuntu0ricotz~quantal amd64
libdrm-nouveau2:amd64 2.4.40+git20130104.baf0a7da-0ubuntu0ricotz~quantal amd64
xserver-xorg-video-nouveau 1:1.0.6+git20130107.8f934fad-0ubuntu0sarvatt~quantal amd64

uname -a
Linux pcpoil 3.7.0-7-generic #15-Ubuntu SMP Sat Dec 15 16:34:25 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
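
The kernel messages quoted in this report can be picked out of a saved log with a short filter. This is only a minimal sketch: `count_idle_failures` and `idle_failure_channels` are hypothetical helper names, not part of any package, and they assume nothing beyond the message text shown above.

```sh
# Hypothetical helper: count "failed to idle channel" lines in kernel log
# text read from standard input and print the count.
count_idle_failures() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c 'nouveau.*failed to idle channel' || true
}

# Hypothetical helper: list the distinct channel ids named in those lines.
idle_failure_channels() {
    grep -o 'failed to idle channel 0x[0-9a-f]*' | awk '{ print $NF }' | sort -u
}
```

Typical use would be something like `dmesg | count_idle_failures` or `idle_failure_channels < /var/log/syslog`.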

Best regards,

Poil (poil)
description: updated
Revision history for this message
Hein van Dam (h-t-vandam) wrote :

With kernel 3.8 I have the same problem; with kernel 3.7 I can still boot. It must be the NVIDIA card, as the 3.8 kernel works fine on my MSI Wind netbook.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-quantal (Ubuntu):
status: New → Confirmed
Revision history for this message
xxx (xgddghxx-deactivatedaccount-deactivatedaccount) wrote :

I've run a bisect on Linus' master branch, and narrowed the issue down to this one line change, which isn't much to go on:

commit 7707b701ebfea64afa6bfb23aa318fd687892754
Author: Marcin Slusarz <email address hidden>
Commit: Ben Skeggs <email address hidden>

    drm/nv40/mpeg: fix context handling

    It slipped in thanks to typeless API.

    Signed-off-by: Marcin Slusarz <email address hidden>
    Signed-off-by: Ben Skeggs <email address hidden>

diff --git a/drivers/gpu/drm/nouveau/core/engine/mpeg/nv40.c b/drivers/gpu/drm/n
index 1241857..f7c581a 100644
--- a/drivers/gpu/drm/nouveau/core/engine/mpeg/nv40.c
+++ b/drivers/gpu/drm/nouveau/core/engine/mpeg/nv40.c
@@ -38,7 +38,7 @@ struct nv40_mpeg_priv {
 };

 struct nv40_mpeg_chan {
-       struct nouveau_mpeg base;
+       struct nouveau_mpeg_chan base;
 };

Revision history for this message
xxx (xgddghxx-deactivatedaccount-deactivatedaccount) wrote :

Bug also exists in Nouveau driver bugzilla, which I've also updated with the bisect result:

https://bugs.freedesktop.org/show_bug.cgi?id=54786

Revision history for this message
xxx (xgddghxx-deactivatedaccount-deactivatedaccount) wrote :

Nouveau driver bug comment states that this issue is no longer present in the latest kernel - after updating to latest from Linus' tree (commit 323a72d83c9) and running, I can confirm that this is the case.

tags: added: kernel-fixed-upstream
Revision history for this message
Jan K. (jan-launchpad-kantert) wrote :

Will there be a kernel update to 3.9 with this fix for 13.04? Also affects Thinkpad T420s with Nvidia Optimus.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-raring (Ubuntu):
status: New → Confirmed
Revision history for this message
FuzzyQ (atomicfuzzyq) wrote :

I'm experiencing this bug on an MSI GT70 with a GeForce GTX 675MX (Optimus).

Revision history for this message
Mohan (dr-mohan) wrote :

This bug affects me too. Details as follows
Kernel 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64

Revision history for this message
giri (bollsg) wrote :

I see the nouveau "failed to idle channel" error when installing 12.10.

Revision history for this message
Marc Quinton (mquinton) wrote :

Hello.

I have this bug with Ubuntu Trusty, at an early stage, when I play a video with VLC.
- my video card : NVIDIA Corporation GT218 [NVS 300]
- xserver-xorg-video-nouveau, 1:1.0.10-1ubuntu2

best regards.

Revision history for this message
Marc Quinton (mquinton) wrote :

The Red Hat Bugzilla points to a kernel patch for 3.14: https://bugzilla.redhat.com/show_bug.cgi?id=918732

Revision history for this message
Vova U (uwl) wrote :

After the upgrade from 12.04 to 14.04 the system is not usable any more.

Revision history for this message
Marian Krause (mkdugi) wrote :

Hi,

This bug affects me :-(.

Up to date Ubuntu trusty 14.04 (today 2014.04.27).
01:00.0 VGA compatible controller: NVIDIA Corporation G71GL [Quadro FX 3500] (rev a1)
xserver-xorg-video-nouveau 1:1.0.10-1ubuntu2

Syslog:
Apr 27 17:27:05 cacko kernel: [ 852.982767] nouveau E[Xorg[3900]] failed to idle channel 0xcccc0001 [Xorg[3900]]
Apr 27 17:27:20 cacko kernel: [ 867.981445] nouveau E[Xorg[3900]] failed to idle channel 0xcccc0001 [Xorg[3900]]
Apr 27 17:27:35 cacko kernel: [ 882.980118] nouveau E[Xorg[3900]] failed to idle channel 0xcccc0000 [Xorg[3900]]
Apr 27 17:27:50 cacko kernel: [ 897.978806] nouveau E[Xorg[3900]] failed to idle channel 0xcccc0000 [Xorg[3900]]

X freezes. I can change to text terminal, do some things and reboot (reboot takes long).

This appears on my system when Adobe Flash Player tries to play Flash content in Firefox (I didn't check other browsers).
Maybe not with every Flash file.

Revision history for this message
Jakub Liška (liska-jakub) wrote :

I got it too, X freezes right after this :

May 1 16:05:56 lisak kernel: [ 1588.032792] nouveau E[Xorg[1323]] failed to idle channel 0xcccc0000 [Xorg[1323]]
May 1 16:05:57 lisak gnome-session[1879]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.#012
May 1 16:05:57 lisak colord: device removed: xrandr-BenQ-BenQ G2411HD-D4910151SL0
May 1 16:05:57 lisak colord: Profile removed: icc-d63583c0a76a513bac920519a90532e8
May 1 16:06:12 lisak kernel: [ 1603.182708] nouveau E[compiz[2063]] failed to idle channel 0xcccc0000 [compiz[2063]]

VGA compatible controller: NVIDIA Corporation C77 [GeForce 8200] (rev a2)

dpkg -l | grep nouveau
ii libdrm-nouveau2:amd64 2.4.52-1 amd64 Userspace interface to nouveau-specific kernel DRM services -- runtime
ii xserver-xorg-video-nouveau 1:1.0.10-1ubuntu2 amd64 X.Org X server -- Nouveau display driver

uname -a
Linux lisak 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Jakub Liška (liska-jakub) wrote :

Btw I don't know if it is related to this issue, but sometimes I also get this error :

411.265798] nouveau E[ PFIFO][0000:02:00.0] DMA_PUSHER - ch 2 [Xorg[1108]] get 0x00200358c4 put 0x00200358e0 ib_get 0x000000cc ib_put 0x000000cf state 0x80000000 (err: INVALID_CMD) push 0x00400040

After that my thinkpad usb keyboard stops working. There is nothing logged about it though except for this nouveau error.

Revision history for this message
Sven Arnold (sven-internetallee) wrote :

I have also been seeing this error repeatedly since upgrading from 13.10 to 14.04. The system has been unusable since then.
I tried different versions of the NVIDIA proprietary drivers, but none of them ran stably either. Currently I use updated drivers from the oibaf PPA.

Kernel:
3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Graphics Device:
VGA compatible controller: NVIDIA Corporation C79 [GeForce 9400] (rev b1)

xserver-xorg-video-nouveau 1:1.0.10+git1405091930.8604a7~gd~t
libdrm-nouveau2:amd64 2.4.54+git1405131830.305478~gd~t
libdrm2:amd64 2.4.54+git1405131830.305478~gd~t

[ 4657.804009] nouveau E[Xorg[1120]] failed to idle channel 0xcccc0001 [Xorg[1120]]
[ 4657.804035] nouveau E[ PFB][0000:02:00.0] trapped write at 0x010028e3a8 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4672.804008] nouveau E[Xorg[1120]] failed to idle channel 0xcccc0001 [Xorg[1120]]
[ 4672.804036] nouveau E[ PFB][0000:02:00.0] trapped write at 0x010028e398 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4672.804063] nouveau E[ PFB][0000:02:00.0] trapped write at 0x010028e3a0 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4672.804078] nouveau E[ PFB][0000:02:00.0] trapped write at 0x010028e390 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4672.804093] nouveau E[ PFB][0000:02:00.0] trapped write at 0x010028e388 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4672.804126] nouveau E[ PFB][0000:02:00.0] trapped write at 0x01003f9020 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4672.804151] nouveau E[ PFB][0000:02:00.0] trapped write at 0x0100000000 on channel 0x0001fee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
[ 4676.506881] nouveau W[ PFIFO][0000:02:00.0] unknown intr 0x06080000, ch 127

above line is repeated about 100 times, then:

May 16 22:12:33 comet kernel: [ 4676.507823] nouveau E[ PFIFO][0000:02:00.0] still angry after 101 spins, halt

Revision history for this message
Juan (elkato) wrote :

Sven,

Are you still using 14.04 and NVIDIA?
How did you solve it?
I'm at that point. My card is an NVIDIA Optimus one.

Thanks in advance!

Revision history for this message
Sven Arnold (sven-internetallee) wrote :

Juan,

Meanwhile I noticed that I have a potential hardware problem (mainboard or RAM), which added confusion:

Strangely, my system works with one 2GB DDR2 module but crashes when using two modules. The RAM itself seems OK (I tried each of the four modules separately and tried every memory bank). While this could be caused by a problem with the mainboard and/or power supply, it is still confusing that the problem occurred exactly when upgrading to 14.04.

Anyway: currently, with one 2GB DIMM in use, I have a setup that is probably working:

kernel 3.13.0-27
xserver-xorg-video-nouveau 1:1.0.10+git1405261930.4a18dd~gd~t
libdrm-nouveau2:amd64 2.4.54+git1405200630.8fc62c~gd~t
nouveau-firmware 20091212-0ubuntu1

The graphics drivers are from the oibaf PPA.

Best regards,

Sven

Revision history for this message
penalvch (penalvch) wrote :

Poil, thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command, as it will automatically gather debugging information, in a terminal:
apport-collect 1097178
When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

tags: added: regression-release
tags: removed: kernel-fixed-upstream
no longer affects: linux-lts-quantal (Ubuntu)
Changed in linux-lts-raring (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
tags: added: raring
Revision history for this message
John Small (jds340) wrote :

Why is this listed as low importance? For people who have the problem it's high importance. I can't use my laptop because of it.

I can't wait for a fix. I'm using 13.10, so I'll wipe it and try 14.04. It's so serious that I have to wipe my setup and do a complete re-install.

Revision history for this message
Poil (poil) wrote :

@Christopher M. Penalver (penalvch)

Sorry, I'm no longer using an NVIDIA card in my computer; I can give away my old card if someone wants to debug...

Revision history for this message
penalvch (penalvch) wrote :

Poil, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1097178/comments/23 regarding you are no longer using the hardware. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

John Small, thank you for your comment. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into the default Ubuntu kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Changed in linux-lts-raring (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
John Small (jds340) wrote :

Ok will do. It'll be a while yet. I've been through many cycles of install-wipe-install trying to sort this one out.

Perham (perham-x)
Changed in linux-lts-raring (Ubuntu):
status: Invalid → Confirmed
penalvch (penalvch)
Changed in linux-lts-raring (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Zerosith (zerosith) wrote :

It also happens to me, and I don't think this bug should be closed. I have an ASUS N53G with Optimus support and haven't been able to fix this with the NVIDIA proprietary drivers.

Can someone provide more assistance?

Thanks

Revision history for this message
penalvch (penalvch) wrote :

Zerosith, thank you for your comment. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into the default Ubuntu kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
Jakub Liška (liska-jakub) wrote :

So, it seems I'm stuck with Raring for good on this hardware :-)

Revision history for this message
penalvch (penalvch) wrote :

Jakub Liška, unfortunately as this bug report is closed, it has nothing to do with you, your problem, or your hardware. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into the default Ubuntu kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
linas (linasvepstas) wrote :

FWIW, hit this just now, with brand new kernel 3.16.1 and ubuntu precise LTS 12.04

About 30-60 seconds into boot, after X comes up, before logging in:
nouveau failed to idle channel 0xcccc0000
Then the monitor shuts off and the system hangs hard; I cannot Alt-F1 to get to a tty.

Revision history for this message
linas (linasvepstas) wrote :

The same bug hits Red Hat too, and seems to affect *all* kernels after 3.7. See https://bugzilla.redhat.com/show_bug.cgi?id=918732 for details. That bug report also suggests a hacky kernel patch that appears to avoid a race condition, and is claimed to fix the problem. Will try it out shortly.

Revision history for this message
penalvch (penalvch) wrote :

linas, unfortunately as this bug report is closed, it has nothing to do with you, your problem, or your hardware. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into the default Ubuntu kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
linas (linasvepstas) wrote :

The kernel patches there are not sufficient to resolve the hang for me.

Revision history for this message
linas (linasvepstas) wrote :

Harrumph. The indicated kernel patch does make the "failed to idle channel 0xcccc0000" message go away!

specifically, this:

diff --git a/drivers/gpu/drm/nouveau/core/subdev/mc/base.c b/drivers/gpu/drm/nouveau/core/subdev/mc/base.c
index b4b9943..719db60 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/mc/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/mc/base.c
@@ -49,6 +49,8 @@ nouveau_mc_intr(int irq, void *arg)
        if (pmc->use_msi)
                oclass->msi_rearm(pmc);

+       udelay(1);
+
        if (intr) {
                u32 stat = intr = nouveau_mc_intr_mask(pmc);
                while (map->stat) {

It does NOT fix the X11 crash/hang, and the message appears never to have been the root cause of the crash/hang. Why? Because the system can still be ssh'ed into. Only the keyboard is unresponsive (and the monitor powers itself off), so I can't Alt-F1 to switch to a tty.

The root cause appears to be this: in /var/log/Xorg.0.log:
33.188] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
followed by a dozen or more stack traces.

ps aux shows that the X server is in D state (uninterruptible sleep), and as a result kill -9 of course does not work on it.
top shows bizarre stuff: a high load average but an idle system. Hrmmm, go figure. That's wrong.

top - 22:03:52 up 19 min, 1 user, load average: 3.00, 2.99, 2.22
Tasks: 133 total, 1 running, 132 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
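
The uninterruptible-sleep observation can be checked directly from `ps`. On Linux, tasks in D state count toward the load average, so a load near 3.00 on an otherwise idle CPU would be consistent with about three stuck tasks. A minimal sketch follows; `list_d_state` is a hypothetical helper name, and it assumes the three-column output of `ps -eo pid,stat,comm`.

```sh
# Hypothetical helper: read `ps -eo pid,stat,comm` output on standard input
# and print the pid and command name of every process whose state starts
# with D (uninterruptible sleep). The header line is skipped.
list_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}
```

Typical use would be `ps -eo pid,stat,comm | list_d_state`. Only the first word of the command name is printed.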

Revision history for this message
linas (linasvepstas) wrote :

some more uninterruptible sleep stuff, from /var/log:

Aug 30 21:48:43 blackspot kernel: [ 240.160069] kworker/1:2 D 0000000000000001 0 64 2 0x00000000
Aug 30 21:48:43 blackspot kernel: [ 240.160088] Workqueue: pm pm_runtime_work
Aug 30 21:50:42 blackspot kernel: [ 360.160069] kworker/0:1 D 0000000000000001 0 26 2 0x00000000
Aug 30 21:50:42 blackspot kernel: [ 360.160102] Workqueue: events output_poll_execute [drm_kms_helper]
Aug 30 21:50:46 blackspot kernel: [ 360.160397] kworker/1:2 D 0000000000000001 0 64 2 0x00000000
Aug 30 21:50:46 blackspot kernel: [ 360.160407] Workqueue: pm pm_runtime_work
Aug 30 21:50:51 blackspot kernel: [ 360.160678] Xorg D 0000000000000001 0 1946 1136 0x00000000
Aug 30 21:52:43 blackspot kernel: [ 480.160069] kworker/0:1 D 0000000000000001 0 26 2 0x00000000
Aug 30 21:52:43 blackspot kernel: [ 480.160101] Workqueue: events output_poll_execute [drm_kms_helper]
Aug 30 21:52:47 blackspot kernel: [ 480.160392] kworker/1:2 D 0000000000000001 0 64 2 0x00000000
Aug 30 21:52:47 blackspot kernel: [ 480.160402] Workqueue: pm pm_runtime_work
Aug 30 21:52:52 blackspot kernel: [ 480.160674] Xorg D 0000000000000001 0 1946 1136 0x00000000
Aug 30 21:54:43 blackspot kernel: [ 600.160071] kworker/0:1 D 0000000000000001 0 26 2 0x00000000
Aug 30 21:54:43 blackspot kernel: [ 600.160105] Workqueue: events output_poll_execute [drm_kms_helper]
Aug 30 21:54:47 blackspot kernel: [ 600.160396] kworker/1:2 D 0000000000000001 0 64 2 0x00000000
Aug 30 21:54:47 blackspot kernel: [ 600.160405] Workqueue: pm pm_runtime_work
Aug 30 21:54:53 blackspot kernel: [ 600.160677] Xorg D 0000000000000001 0 1946 1136 0x00000000

and this:

Aug 30 21:46:04 blackspot kernel: [ 8.731718] nouveau E[ DISPLAY][0000:02:00.0] 01:0130: func 08 lookup failed, -2
Aug 30 21:46:05 blackspot kernel: [ 8.731750] nouveau W[ DRM] TMDS table script pointers not stubbed
Aug 30 21:46:06 blackspot kernel: [ 9.320395] EXT4-fs (md4): mounting with "discard" option, but the device does not support discard
Aug 30 21:46:08 blackspot kernel: [ 17.034452] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[1247]] get 0x00010000 put 0x00010090 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 30 21:46:08 blackspot kernel: [ 17.287691] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[1247]] get 0x00010090 put 0x000100a0 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 30 21:46:08 blackspot kernel: [ 21.764994] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[1247]] get 0x000100a0 put 0x000100b0 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 30 21:46:09 blackspot kernel: [ 21.796431] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Aug 30 21:46:09 blackspot kernel: [ 21.963345] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[1247]] get 0x000100b0 put 0x000100c0 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 30 21:46:09 blackspot kernel: [ 42.381349] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER ...

Revision history for this message
penalvch (penalvch) wrote :
Revision history for this message
linas (linasvepstas) wrote :

All the kernel stack traces are in power-management code, which should not be tripping.

Note also the timestamps in this interesting sequence:

Aug 30 21:46:14 blackspot kernel: [ 90.783374] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[1247]] get 0x65725f64 put 0x000102c8 state 0xc0000000 (err: MEM_FAULT) push 0x00000000
Aug 30 21:46:27 blackspot kernel: [ 105.780011] nouveau E[Xorg[1247]] failed to idle channel 0xcccc0000 [Xorg[1247]]
Aug 30 21:46:28 blackspot kernel: [ 105.780056] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[1247]] get 0x000102c8 put 0x000102d0 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 30 21:46:42 blackspot kernel: [ 120.780015] nouveau E[Xorg[1247]] failed to idle channel 0xcccc0000 [Xorg[1247]]
Aug 30 21:48:41 blackspot kernel: [ 240.160046] INFO: task kworker/1:2:64 blocked for more than 120 seconds.

The DMA pusher error, then almost exactly 15 seconds later the idle-channel failure, then almost exactly 15 seconds later another, then about 120 seconds later the blocked-task warning.
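
The gaps can be computed from the bracketed uptime stamps rather than read off by eye. This is a minimal sketch; `kernel_log_gaps` is a hypothetical helper name, and it assumes each line carries a `[ seconds.micros]` stamp as in the excerpt above. On the five quoted lines it prints 15, 0, 15 and 119. The blocked-task warnings in the earlier excerpt land at about 240, 360, 480 and 600 seconds of uptime, which looks more like a periodic check than a timer started by the nouveau error, though that is only a reading of these logs.

```sh
# Hypothetical helper: read kernel log lines on standard input, take the
# first "[ seconds.micros]" uptime stamp on each line, and print the gap to
# the previous stamped line, rounded to whole seconds.
kernel_log_gaps() {
    awk '{
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            # Strip the surrounding brackets and convert to a number.
            stamp = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen) printf "%d\n", stamp - prev + 0.5
            prev = stamp
            seen = 1
        }
    }'
}
```

Typical use would be `grep nouveau /var/log/syslog | kernel_log_gaps`.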

Revision history for this message
linas (linasvepstas) wrote :

Since mine is a desktop system, and I don't need power management or suspend, I ran make menuconfig and unset CONFIG_PM to disable power management. The subroutines in the stack trace, rpm_suspend (in ./base/power) and pci_pm_runtime_resume, pci_pm_runtime_suspend etc. (in drivers/pci/pci-driver.c), are built only if CONFIG_PM_RUNTIME is set.
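
Whether a given kernel was built with these options can be checked from its config file. This is a minimal sketch; `show_pm_config` is a hypothetical helper name. On Ubuntu the config of an installed kernel is normally at /boot/config-$(uname -r), and for a self-built kernel it is the .config in the build tree.

```sh
# Hypothetical helper: read a kernel config on standard input and print the
# CONFIG_PM and CONFIG_PM_RUNTIME lines, whether set or explicitly unset.
# Related options such as CONFIG_PM_SLEEP are deliberately not matched.
show_pm_config() {
    grep -E '^(# )?CONFIG_PM(_RUNTIME)?(=| is not set)'
}
```

Typical use would be `show_pm_config < /boot/config-$(uname -r)`. No output means neither option appears in the file.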

Recompiled, rebooted. It's... sort of better. X is no longer hung in uninterruptible sleep. The /var/log/Xorg.0.log messages " [mi] EQ overflowing. Additional events will be discarded until..." went away too, because X can now run.

Anyway, X seems to run now. Still getting junk like this though:

Aug 31 00:02:56 blackspot kernel: [ 247.134014] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[2061]] get 0x000102b0 put 0x000102c0 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 31 00:02:56 blackspot kernel: [ 247.134014] nouveau E[ PFIFO][0000:01:06.0] DMA_PUSHER - ch 1 [Xorg[2061]] get 0x000102b0 put 0x000102c0 state 0x80000000 (err: INVALID_CMD) push 0x00000000
Aug 31 00:03:11 blackspot kernel: [ 262.312013] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001 [Xorg[2061]]
Aug 31 00:03:11 blackspot kernel: [ 262.312013] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001 [Xorg[2061]]
Aug 31 00:03:11 blackspot kernel: [ 262.312013] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001 [Xorg[2061]]
Aug 31 00:03:26 blackspot kernel: [ 277.312016] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001 [Xorg[2061]]
Aug 31 00:03:26 blackspot kernel: [ 277.312016] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001 [Xorg[2061]]
Aug 31 00:03:26 blackspot kernel: [ 277.312016] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001 [Xorg[2061]]
Aug 31 00:03:26 blackspot kernel: [ 277.312790] nouveau E[ PFIFO][0000:01:06.0] CACHE_ERROR - ch 1 [Xorg[2061]] subc 0 mthd 0x1130 data 0x00000000
Aug 31 00:03:26 blackspot kernel: [ 277.312790] nouveau E[ PFIFO][0000:01:06.0] CACHE_ERROR - ch 1 [Xorg[2061]] subc 0 mthd 0x1130 data 0x00000000
Aug 31 00:03:26 blackspot kernel: [ 277.312790] nouveau E[ PFIFO][0000:01:06.0] CACHE_ERROR - ch 1 [Xorg[2061]] subc 0 mthd 0x1130 data 0x00000000
Aug 31 00:03:41 blackspot kernel: [ 292.312012] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0000 [Xorg[2061]]
Aug 31 00:03:41 blackspot kernel: [ 292.312012] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0000 [Xorg[2061]]
Aug 31 00:03:41 blackspot kernel: [ 292.312012] nouveau E[Xorg[2061]] failed to idle channel 0xcccc0000 [Xorg[2061]]
Aug 31 00:03:41 blackspot kernel: [ 292.312027] nouveau E[ PFIFO][0000:01:06.0] CACHE_ERROR - ch 1 [Xorg[2061]] subc 0 mthd 0x1134 data 0x0046d1d7
Aug 31 00:03:41 blackspot kernel: [ 292.312027] nouveau E[ PFIFO][0000:01:06.0] CACHE_ERROR - ch 1 [Xorg[2061]] subc 0 mthd 0x1134 data 0x0046d1d7

so not all is well, not just yet. But for now X is up, for me.

Revision history for this message
linas (linasvepstas) wrote :

Summary/wrap-up report: After the above changes, the X server took 8 tries to come up, each time hanging when it went to paint the lightdm pane. Each try took about 3 minutes (almost half an hour elapsed), after which some failsafe tries to restart X. Each try is correlated with the "failed to idle channel" messages and/or the DMA_PUSHER errors. The 8th time it succeeded, at which point I was able to log in and use X as normal. There have been no further X disruptions after that point: the problem, whatever it is, is transient. (I have not yet tried to watch any YouTube videos, though...) BTW, the Xorg.5.log file from the 5th attempt is filled with dozens of "[mi] EQ overflowing. Additional events will be discarded" error messages and dozens of corresponding stack traces. None of the other failed attempts have this.

Since this is the very latest Linux kernel, and what appears to be the latest libdrm, I'll try to pursue this with the kernel devs directly. The above report is an FYI for anyone else encountering this issue, since this bug is the #1 hit that Google currently provides for these error messages, and it's the ONLY bug that provides real, actionable information on how to resolve the issue. (i.e. simply marking this bug as invalid and telling everyone to take a flying leap doesn't actually decrease the relevance of the bug report for real-world users).

Revision history for this message
penalvch (penalvch) wrote :

linas:

>"The above report is an FYI for anyone else enountering this issue,"

Launchpad is a development platform, not an FYI forum. If you want to help on Launchpad, file a bug report as already previously requested of you on multiple occasions. To do otherwise is unhelpful noise on a closed report.

>"...since this bug is the #1 hit that google currently provides for these error messages,"

This being a #1 hit on a search engine is irrelevant. Finding information that just repeats an error message one encounters doesn't help anyone in getting their bug resolved.

"and its the ONLY bug that provides real, actionable information on how to resolve the issue."

There is nothing actionable on this closed report, as it's not about you, your hardware, or your problem. If you want to provide real, actionable information, submit a commit upstream that addresses this problem. Anything else is just a hack, or WORKAROUND, that again, would need to be on a new report, as already fully detailed to you previously on multiple occasions.

"(i.e. simply marking this bug as invalid and telling everyone to take a flying leap"

I've never said that. Please stop making false accusations.

"...doesn't actually decrease the relevance of the bug report for real-world users)."

One wants to make a bug report on Launchpad, a development platform, relevant to developers (i.e. the people actually providing fixes). By not filing a bug report, you are further delaying developers from addressing your issue.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
linas (linasvepstas) wrote :

Solution (for me): see https://bugs.freedesktop.org/show_bug.cgi?id=70388

boot the kernel with vram_pushbuf=1

if that does not work, try agpmode=0
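
A sketch of one common way to pass these options on Ubuntu, assuming GRUB is the boot loader. The comment above gives the bare option names. On the kernel command line, module parameters are normally written as module.parameter, so the `nouveau.` prefix below is an assumption about what was meant, as is the existing "quiet splash" value.

```sh
# /etc/default/grub (fragment). "quiet splash" is the usual Ubuntu default
# and is an assumption here; keep whatever your file already has.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nouveau.vram_pushbuf=1"

# Alternative to try if the line above does not help:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nouveau.agpmode=0"
```

After editing, run `sudo update-grub` and reboot; `cat /proc/cmdline` shows the options the running kernel was actually booted with.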

Revision history for this message
linas (linasvepstas) wrote :

Users with AGP graphics cards and VIA PCIe chipsets might have some luck with agpmode=2, as described here: https://www.libreoffice.org/bugzilla/show_bug.cgi?id=20341

Revision history for this message
rubberducky (rubber-ducky170) wrote :

I had the same problem, with the computer freezing after login. The mouse pointer and wallpaper were displayed, but it did not go any further than that, except to a black screen with the "nouveau E[Xorg[2061]] failed to idle channel 0xcccc0001" error repeated.
I've tried a lot of tricks, including what's on this page, to no avail.

Finally found one that worked:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1313402

And here's the code that worked for me:

sudo apt-get install nvidia-current
sudo reboot
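
After switching drivers and rebooting, it can help to confirm which kernel driver is actually bound to the card. This is a minimal sketch; `vga_driver_in_use` is a hypothetical helper name that parses the output of `lspci -k`. On some Optimus laptops the discrete GPU may be listed as a "3D controller" rather than a VGA controller, in which case this filter would miss it.

```sh
# Hypothetical helper: read `lspci -k` output on standard input and print the
# kernel driver bound to each VGA controller.
vga_driver_in_use() {
    awk '
        # Device lines start with a hex bus address; detail lines are indented.
        /^[0-9a-f]/ { vga = ($0 ~ /VGA compatible controller/) }
        vga && /Kernel driver in use:/ { print $NF }
    '
}
```

Typical use would be `lspci -k | vga_driver_in_use`, which would be expected to print `nouveau` while the open-source driver is bound.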

Revision history for this message
penalvch (penalvch) wrote :

rubberducky, thank you for your comment. Unfortunately, as this bug report is closed, this bug report is not scoped to you, your hardware, or your problem. So your hardware and problem may be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into the default Ubuntu kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

As well, please do not announce in this report you created a new bug report.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Revision history for this message
cagancelik (cagancelik) wrote :

This bug affects me as well. Let alone installation, I can't even use the Live CD due to this bug. Ubuntu is completely unusable. See more in this thread (with screenshots). SuSE Linux boots and works just fine, but Ubuntu hangs all the time.

My laptop has GTX880M 8GB GPU. Obviously this is an Nvidia related bug and no importance is given to fix it. Shame for Linux community not to support a top of the line graphics card in 2015.

Revision history for this message
cagancelik (cagancelik) wrote :

Sorry, I forgot to include my screenshot.

Revision history for this message
penalvch (penalvch) wrote :

cagancelik, as this report is closed, it doesn't affect you. If you want your problem addressed, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

Revision history for this message
mik047 (mik047) wrote :

Affects me too. I have an NVIDIA® Quadro® K2100M (2GB GDDR5) in my configuration. While installing Ubuntu, it complains of the following:

nouveau E [DRM] failed to idle channel 0xcccc0000 [DRM]
xhci_hcd 0000:00:14.0: HC died: cleaning up
INFO: rcu_sched detected stalls on CPUs/tasks: { 2} (detected by 0, t=150002 jiffies, g=324, c=323, q=0)
BUG: soft lockup - CPU#0 stuck for 22s! [khubd:74]
BUG: soft lockup - CPU#0 stuck for 22s! [scsi_id:1042]
...series of the last two lines....
INFO: task kworker/2:1:83 blocked for more than 120 seconds.
Tainted: G W 3.16.0.30-generic #40~04.1-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
...again the above message continues randomly starting from INFO: rcu_sched .......

Revision history for this message
Andrew (am-public-o) wrote :

Another one affected here. NVIDIA GTX 250? I cannot even access shell commands to fix anything. Supposedly the proprietary drivers fix this. I have now had at least three PCs with varying hardware affected by driver issues on a clean install. The bug has been present since sometime after 12.04 LTS, through the 13.x releases, and is still present in 14.04.2 LTS.

Feature Request: Install Proprietary drivers at OS install time by default

Revision history for this message
penalvch (penalvch) wrote :

Andrew, as this report is closed, you wouldn't be affected by it.

However, if you would like your issue addressed, please file a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.
