GF114 cards = "PFIFO - playlist update failed" on boot.

Bug #1041637 reported by 3vi1
This bug affects 14 people
Affects         Status      Importance  Assigned to  Milestone
Linux           Incomplete  Critical
linux (Ubuntu)  Incomplete  Medium      Unassigned

Bug Description

The bug reported at https://bugs.freedesktop.org/show_bug.cgi?id=53101 appears to be affecting Quantal too.

Using a GTX 560 Ti, I see the same "PFIFO - playlist update failed" and "Failed to idle channel #" messages when using the Nouveau driver. LightDM fails to start and is restarted over and over.

This appears to be due to this commit: http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=1a46098e910b96337f0fe3838223db43b923bad4

Revision history for this message
In , M. Oliver Ghingold (mog55356) wrote :

Created attachment 65099
Kernel log from boot

Problem:
    Can no longer start the X server.

Steps to reproduce:
    1. Boot the computer
Expected behaviour:
    User expects to reach the login prompt.
Actual behaviour:
    X Server fails to start

History:
    Updated to latest kernel on Fedora 17 x86_64: Linux version 3.5.0-2.fc17.x86_64 (<email address hidden>) (gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) ) #1 SMP Mon Jul 30 14:48:59 UTC 2012
    Rebooted
    Saw Fedora begin to boot, but instead of being presented with a login screen I saw noise/leftover images from the previous boot
    X server terminated and I saw some nouveau errors on screen:
        [ 43.155163] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
        [ 53.020045] [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
        [ 57.019076] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
        [ 60.017783] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
        [ 64.016807] [drm] nouveau 0000:01:00.0: PFIFO - playlist update failed
    The screen then went back to noise/leftovers for a few seconds, then displayed those error messages again in sequence
    This continued endlessly until I booted with the previous kernel.

Hardware information:
    The model is a GTX 580m. According to the wiki, this is an NVCE (GF114).
    sudo lspci -v | less found this:
        01:00.0 VGA compatible controller: nVidia Corporation Device 1211 (rev a1) (prog-if 00 [VGA controller])
            Subsystem: CLEVO/KAPOK Computer Device 7100
            Flags: bus master, fast devsel, latency 0, IRQ 16
            Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
            Memory at e8000000 (64-bit, prefetchable) [size=128M]
            Memory at f0000000 (64-bit, prefetchable) [size=64M]
            I/O ports at e000 [size=128]
            Expansion ROM at f6000000 [disabled] [size=512K]
            Capabilities: [60] Power Management version 3
            Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
            Capabilities: [78] Express Endpoint, MSI 00
            Capabilities: [b4] Vendor Specific Information: Len=14 <?>
            Capabilities: [100] Virtual Channel
            Capabilities: [128] Power Budgeting <?>
            Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
            Kernel driver in use: nouveau

Attached files:
    messages.txt
        This is the kernel log from the boot with the new kernel. The Fatal X server error and the PFIFO errors can be found near the end of the log. If I had let the computer keep running, the last few messages would have looped, presumably endlessly.
        NOTE: THE LOG CAN ALSO BE ACCESSED HERE - http://pastebin.com/rrVddzgq

Thank you for taking the time to look into this matter. Please let me know if you require any additional information.

Revision history for this message
In , M. Oliver Ghingold (mog55356) wrote :

I was asked to try booting with option nouveau.noaccel=1. Grub didn't complain when I added it to the boot instructions, but the results were identical so I'm not sure whether or not the command "took." Below is a pastebin link to the new /var/log/messages. I hope it is useful.

http://pastebin.com/t39ZHCwP
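
One way to check whether the option "took" is to look at the command line the running kernel actually received. The following is a minimal sketch, not something from the original report: `has_kernel_param` is a hypothetical helper name, and it only inspects `/proc/cmdline` (or a string passed in as a second argument).

```sh
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline; passing it explicitly
# is only meant for trying the function out on sample strings.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so "nouveau.noaccel=1" does not match "nouveau.noaccel=10".
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `has_kernel_param nouveau.noaccel=1 && echo present` on the affected boot would show whether GRUB passed the option through. If the nouveau module exposes the parameter under `/sys/module/nouveau/parameters/noaccel` (an assumption; it depends on the kernel version and on file permissions), reading that file as root is a second check.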

Revision history for this message
In , Michael-weirauch (michael-weirauch) wrote :

Hijacking this bug as I get the same messages, just after resume.

ThinkPad W520 4276CTO NVC0 (2000M)
openSUSE 12.2 + nouveau 20120813 872dcac

* proposed nouveau.noaccel=1 crashes kernel (nouveau_abi16_ioctl_channel_alloc>nouveau_channel_new)

* Booting works (nox2apic, W520 ACPI table issue)
* gdm has graphics distortions though (see early dmesg excerpt)
* double ctrl+alt+backspace "fixes" this and gdm looks good
* suspend from gnome-shell 3.4.2 works
* resume shows gdm-password prompt and usually a white-noise background
** the gnome-shellish top-panel looks intact, though
** mouse cursor not movable, cpu load
** looks like "something" tries to restart gdm/X over and over again
* switching to vt possible with some insisting
* restarting gdm does lock up the system
* the "channel x kick timeout" seems new since some commits IIRC

repeatedly in dmesg:
[ 156.925301] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 159.924800] nouveau E[ DRM][0000:01:00.0] failed to idle channel 0xcccc0000
[ 161.924690] nouveau E[ PFIFO][0000:01:00.0] channel 1 kick timeout
[ 161.924787] nouveau [ PFIFO][0000:01:00.0] unknown status 0x00000100
[ 163.924603] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 163.989722] nouveau [ PFIFO][0000:01:00.0] unknown status 0x00000100
[ 165.989535] nouveau E[ PFIFO][0000:01:00.0] channel 3 kick timeout
[ 165.989670] nouveau [ PFIFO][0000:01:00.0] unknown status 0x00000100
[ 167.989455] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 167.989517] nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x00000001
[ 170.649537] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 172.660200] nouveau E[ PFIFO][0000:01:00.0] playlist update failed
[ 185.103713] nouveau E[ DRM][0000:01:00.0] failed to idle channel 0xcccc0001
[ 187.103627] nouveau E[ PFIFO][0000:01:00.0] channel 2 kick timeout

I tried an fc17 install, and the original kernel (3.3.4-5.fc17.x86_64) worked. Suspend/resume was fine, at least when not in the docking station. After updating that test install to 3.5.1-1.fc17.x86_64, the same issues I see in openSUSE 12.2 cropped up. So this looks distribution-agnostic.

Any pointers on what to try to help diagnose this issue are welcome.
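
To compare boots or kernels, it can help to tally the error strings quoted in this thread rather than reading the log by eye. This is a small sketch, assuming the messages keep the wording shown above (both the older `[drm] nouveau ...: PFIFO - playlist update failed` format and the newer `nouveau E[ PFIFO]` format); `count_fifo_errors` is a hypothetical name.

```sh
# count_fifo_errors: read a kernel log on stdin and print how many lines
# contain each of the three error strings seen in this report.
count_fifo_errors() {
    awk '
        /playlist update failed/    { playlist++ }
        /[Ff]ailed to idle channel/ { idle++ }
        /kick timeout/              { kick++ }
        END {
            # Counters that never matched print as 0.
            printf "playlist_update_failed=%d\n", playlist
            printf "failed_to_idle_channel=%d\n", idle
            printf "kick_timeout=%d\n", kick
        }
    '
}
```

Typical use would be `dmesg | count_fifo_errors`, or `count_fifo_errors < /var/log/messages` for a saved log. The counts only say how often the messages appeared; they do not distinguish the boot-time failure from the resume-time one.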

Revision history for this message
In , Michael-weirauch (michael-weirauch) wrote :

Created attachment 65608
W520-4276CTO-NVC0 dmesg commitish-872dcac gdm + suspend/resume cycle

Revision history for this message
In , M. Oliver Ghingold (mog55356) wrote :

*** Bug 53566 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Michael-weirauch (michael-weirauch) wrote :

Bisection rounds testing successful suspend/resume cycles on NVC0/2000M:
note:
* gdm greeter is showing garbage (screen content from before reboot) somewhere before the last known good commits
** this issue was ignored and still present in the last good commit but is not the topic of this bug

$ git bisect log
# bad: [f9b495fca46836a6a05cedde8058ccb8a3e62c3d] drm/nouveau: use ioread32_native/iowrite32_native for fifo control registers
# good: [f887c425f9eeed8ffbca64c8be45da62b07096c0] drm/nouveau: bump version to 1.0.0
git bisect start 'HEAD' 'f887c425f9eeed8ffbca64c8be45da62b07096c0' '--' 'drivers/gpu/drm/nouveau/'
# bad: [9bd0c15fcfb42f6245447c53347d65ad9e72080b] drm/nouveau/fbcon: using nv_two_heads is not a good idea
git bisect bad 9bd0c15fcfb42f6245447c53347d65ad9e72080b
# good: [5132f37700210740117f5163b5df7aa1c8469a55] drm/nve0/fifo: initial implementation
git bisect good 5132f37700210740117f5163b5df7aa1c8469a55
# bad: [71af5e62db5d7d6348e838d0f79533653e2f8cfe] drm/nv50/gr: make sure NEXT_TO_CURRENT is executed even if nothing done
git bisect bad 71af5e62db5d7d6348e838d0f79533653e2f8cfe
# good: [afada5e0bb3cac8530c2ae36aa0abca41d60e063] drm/nv04/disp: disable vblank interrupts when disabling display
git bisect good afada5e0bb3cac8530c2ae36aa0abca41d60e063
# bad: [5e120f6e4b3f35b741c5445dfc755f50128c3c44] drm/nouveau/fence: convert to exec engine, and improve channel sync
git bisect bad 5e120f6e4b3f35b741c5445dfc755f50128c3c44
# good: [35bcf5d55540e47091a67e5962f12b88d51d7131] drm/nouveau: move flip-related channel setup to software engine
git bisect good 35bcf5d55540e47091a67e5962f12b88d51d7131
# good: [d375e7d56dffa564a6c337d2ed3217fb94826100] drm/nouveau/fence: minor api changes for an upcoming rework
git bisect good d375e7d56dffa564a6c337d2ed3217fb94826100

5e120f6e4b3f35b741c5445dfc755f50128c3c44 is the first bad commit
commit 5e120f6e4b3f35b741c5445dfc755f50128c3c44
Author: Ben Skeggs <email address hidden>
Date: Mon Apr 30 13:55:29 2012 +1000

    drm/nouveau/fence: convert to exec engine, and improve channel sync

    Now have a somewhat simpler semaphore sync implementation for nv17:nv84,
    and a switched to using semaphores as fences on nv84+ and making use of
    the hardware's >= acquire operation.

    Signed-off-by: Ben Skeggs <email address hidden>

:040000 040000 8f2ca4ddf4969c75f688a96fdb152e449fda4852 da67a1bd8d608577e659a26715cf8af3644d8efe M drivers
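
For anyone repeating a bisection like the one above, each round still needs a manual build, reboot and suspend/resume cycle, but the good/bad call can be read from the kernel log. The sketch below assumes the "playlist update failed" string is a usable indicator of a bad kernel, which this thread does not establish for every card; `bisect_verdict` is a hypothetical helper.

```sh
# bisect_verdict: read a kernel log on stdin and print "bad" if it contains
# the PFIFO playlist error, otherwise "good".
bisect_verdict() {
    if grep -q 'playlist update failed'; then
        echo bad
    else
        echo good
    fi
}

# Possible use, limited to the nouveau directory as in the log above:
#   git bisect start HEAD f887c425f9eeed8ffbca64c8be45da62b07096c0 -- drivers/gpu/drm/nouveau/
#   ...build, install, boot the candidate kernel, run a suspend/resume cycle...
#   git bisect "$(dmesg | bisect_verdict)"
#   git bisect log      # record of the rounds so far
#   git bisect reset    # return to the original branch when done
```

A "good" verdict here only means the message was absent from the log at that moment, so it is worth confirming by eye that resume actually worked before marking a commit good.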

Revision history for this message
In , Vlad-kvs (vlad-kvs) wrote :

Michael, either your bug is a different regression and needs new bug report, or I will reopen bug 53566.

Revision history for this message
In , Michael-weirauch (michael-weirauch) wrote :

(In reply to comment #6)
> Michael, either your bug is a different regression and needs new bug report, or
> I will reopen bug 53566.

I am not even sure bug 53566 is a duplicate, as the first bad commit your bisection determined is different from the one I bisected.

What's the stance from the devs on this?
Reopen 53566? Me filing a new bug (replicating the info here)? Both?

Revision history for this message
In , M. Oliver Ghingold (mog55356) wrote :

Based on the description the bug Michael is describing sounds different from mine. Your description of the problem in 53566 sounds exactly like my problem, and matches what I saw in my own kernel log. I must have done a poor job explaining the problem because when Michael hijacked this bug he said that he thought it was the same problem I was having; it obviously is not. His problem probably belongs in a different bug.

Revision history for this message
In , Michael-weirauch (michael-weirauch) wrote :

I based my assumption that I am hitting the same issue as you on your log output with "PFIFO - playlist update failed" and "Failed to idle channel x", which are exactly the errors I get when resuming (just not on boot).

I will create a new bug. Sorry for the noise guys. Perhaps we are bitten by the same root cause, nevertheless.

Revision history for this message
In , Michael-weirauch (michael-weirauch) wrote :

Ok, after finding out the bad commit and looking for it around here I have found bug 50121 where I attached my info (again).

Revision history for this message
In , Kel-p-doran (kel-p-doran) wrote :

I have been playing around with this a bit and made some progress. It seems to affect any nvc0 card (I have a GTX 580). I went through the commits between 3.4.0 and 3.5.0-rc1 and determined that the cause of the error is http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=1a46098e910b96337f0fe3838223db43b923bad4

The cards work fine with the latest nouveau git tree if you comment out:
  { "COPY1", 5, 0x90b8, nvc0_bo_move_copy, nvc0_bo_move_init },
  { "COPY0", 4, 0x90b5, nvc0_bo_move_copy, nvc0_bo_move_init },

which seems to imply that the nvc0_bo_move_copy function is not working correctly. I don't know nearly enough about nouveau to try to fix this function or know what consequence commenting out these lines has, but hopefully this helps.

On a possibly related note, running glxinfo seems to crash xorg and produce some more PFIFO errors in dmesg. I have no idea if this is related to those lines being commented out or not (this is the first time I have ever gotten nouveau working on this computer). Everything else seems stable... so far...

Revision history for this message
In , Oe-frepdesktoh-n8 (oe-frepdesktoh-n8) wrote :

I seem to be seeing the exact same thing at boot with the current Ubuntu 12.10 alphas and my GTX560 Ti (also a GF114).

Shouldn't this be marked as a high priority regression? I would expect that in a month and a half we're going to see a lot of sad pandas saying that Linux sucks when they try the new Ubuntu release and get a looping LightDM crash.

Revision history for this message
In , M. Oliver Ghingold (mog55356) wrote :

Sorry, when I created this bug I had no idea it was affecting other nvc0 cards. I Googled extensively and couldn't find anyone else who had my exact error, so I assumed that it was some esoteric detail about my specific hardware configuration. I didn't want to make it seem like a big deal if it wasn't. Since this seems to be affecting all nvc0's on 3.5+, I'll mark it as high priority critical. If those are not the correct importance settings just let me know.

Revision history for this message
In , Vlad-kvs (vlad-kvs) wrote :

In the meantime, you can just revert commit 1a46098e910b96337f0fe3838223db43b923bad4, which allowed me to boot properly. Ubuntu devs can do the same if it's not fixed in time for release.

Revision history for this message
In , Oe-frepdesktoh-n8 (oe-frepdesktoh-n8) wrote :

>> ...when I created this bug I had no idea it was affecting other
>> nvc0 cards. I Googled extensively and couldn't find anyone
>> else who had my exact error...

Understandable. I would imagine that most users with these card models are using the proprietary drivers for performance reasons. I wouldn't have even noticed it myself, if the new xserver 1.13 hadn't been pushed into Quantal before the supporting nvidia-current package was ready.

I'll open a bug in Ubuntu's launchpad with a reference to this one, as I don't think they're aware of the problem yet.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1041637/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
3vi1 (launchpad-net-eternaldusk) wrote :

I believe the affected file is included in the kernel package. Please correct if wrong.

affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-bug-exists-upstream quantal
Changed in linux:
importance: Unknown → Critical
status: Unknown → Confirmed
Revision history for this message
randomizer (randomizer) wrote :

This bug also affects my GF110-based card in today's daily build of Quantal (I haven't tested other builds yet).

Revision history for this message
Sebastian (slovdahl) wrote :

GTX 560 Ti (GF110) and booting failed with both alpha 3 and today's daily (04-Sep).

Also tried booting 12.04.1, but that seems to be a different issue.

Revision history for this message
randomizer (randomizer) wrote :

This issue still affects Beta 1

Revision history for this message
Mark Abramov (markizko) wrote :

FWIW, I have the same issue as a 560 Ti owner, but I never could get nouveau to work. I tried, I believe, 2.6.x kernels, 2.8.x kernels, 3.0.x kernels, and now 3.5.x kernels. I'm mostly writing this comment to confirm and subscribe to updates.

Revision history for this message
Mark Abramov (markizko) wrote :

Compiled and tried 3.6-rc5 and it doesn't have this error, although I couldn't get a desktop environment to work with it. At least it semi-booted and the display had almost no pixel garbage.

Revision history for this message
Elisha Hastings (tshakah) wrote :

Same problem here. I've got a 570.

Revision history for this message
In , Andrei Amuraritei (sirdeiu) wrote :

Same problem here, ever since kernel 3.5.x. Using an Nvidia GTX 570 card with the nouveau driver and kernel 3.5.x results in no X start-up. Same messages, and I've tried Fedora 17 x64, Fedora 18 Alpha x64 and Ubuntu 12.10 x64.

lspci -v

01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 570] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: eVga.com. Corp. Device 1570
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
 Memory at f0000000 (64-bit, prefetchable) [size=128M]
 Memory at f8000000 (64-bit, prefetchable) [size=32M]
 I/O ports at cc00 [size=128]
 [virtual] Expansion ROM at fe900000 [disabled] [size=512K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Capabilities: [b4] Vendor Specific Information: Len=14 <?>
 Capabilities: [100] Virtual Channel
 Capabilities: [128] Power Budgeting <?>
 Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
 Kernel driver in use: nvidia

uname -a
Linux 3.5.3-1.fc17.x86_64 #1 SMP Wed Aug 29 18:46:34 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
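
Note that the lspci output above reports `Kernel driver in use: nvidia`, so it was captured while the proprietary driver was bound rather than nouveau. When attaching hardware details to a report like this, a quick check of which driver is actually bound can avoid confusion. A minimal sketch (the helper name `driver_in_use` is made up for illustration):

```sh
# driver_in_use: read `lspci -v` or `lspci -k` output on stdin and print the
# driver named on each "Kernel driver in use:" line.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}
```

For the card in this comment, `lspci -v -s 01:00.0 | driver_in_use` would print the bound driver for that slot; the nouveau errors discussed in this bug are only expected when it prints `nouveau`.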

Revision history for this message
saperr.pmi@gmail.com (saperr-pmi) wrote :

Is anything being done about this bug?

Revision history for this message
Ivan (quids) wrote :

This prevented me from testing Ubuntu 12.10 Beta 2, since I couldn't get into the installer.

lspci -v
(output from Ubuntu 12.04 with Kernel 3.5.4 and Nvidia driver installed)

01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 560 Ti] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: ASUSTeK Computer Inc. Device 838b
 Flags: bus master, fast devsel, latency 0, IRQ 24
 Memory at fc000000 (32-bit, non-prefetchable) [size=32M]
 Memory at d0000000 (64-bit, prefetchable) [size=128M]
 Memory at d8000000 (64-bit, prefetchable) [size=64M]
 I/O ports at e000 [size=128]
 [virtual] Expansion ROM at fe000000 [disabled] [size=512K]
 Capabilities: <access denied>
 Kernel driver in use: nvidia
 Kernel modules: nvidia, nouveau, nvidiafb

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
StephanBeal (sgbeal) wrote :

This happens to me (since upgrading to 12.10) after coming out of suspend mode, making suspend absolutely useless.

root@host:/etc# lspci | grep -i nvid
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 425M] (rev a1)

HTH can the maintainers justify triaging a bug which breaks a feature which has worked flawlessly for years?

Revision history for this message
In , Raphgro (raphgro) wrote :

Well, I think I can reproduce.
But I have a current ArchLinux with Cinnamon and a NV44.

https://bugs.freedesktop.org/show_bug.cgi?id=61463
https://bugs.freedesktop.org/show_bug.cgi?id=61611

Someone else asks on the kernel mailing list.
http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/01611.html

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

This is pseudo-similar to bug 53566 which I closed earlier. Do these issues persist, or are they all fixed in recent kernels?

Revision history for this message
Ivan (quids) wrote :

It's all working fine for me now.

Revision history for this message
In , Søren Dalby Larsen (sdlarsen) wrote :

I'm seeing this for the first time on a 3.11 kernel, so I'd say it's still a problem. I haven't seen it on 3.9.* or 3.10.* kernels though.

Revision history for this message
In , Radulican (radulican) wrote :

Same here.
I recently switched to an NVIDIA card (GT 630) on kernel 3.11.6.
After startx I get a blank screen with a cursor in the top left, and then wide stair-like black-and-white stripes with noise.

After trying git kernel 3.12.0rc7 there are no stripes and a different kind of noise, but the same result.

I installed the video card today, so I can't yet tell whether it worked before (I am installing a <3.11 kernel now).

If further information would be helpful, I will provide it.

Revision history for this message
In , Radulican (radulican) wrote :

http://bpaste.net/show/144880/
Xorg.0.log

While this problem persists even with nouveau.noaccel=1, with modesetting everything works fine.

Revision history for this message
penalvch (penalvch) wrote :

3vi1, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
In , Tobias-klausmann (tobias-klausmann) wrote :

Can you still reproduce this with a newer kernel?

Changed in linux:
status: Confirmed → Incomplete