Resume broken with new ATI stack

Bug #411294 reported by Alexander Hunziker
This bug affects 6 people
Affects                           Status         Importance   Assigned to   Milestone
xserver-xorg-driver-ati           Fix Released   High
linux (Ubuntu)                    Fix Released   High         Unassigned
xserver-xorg-video-ati (Fedora)   Fix Released   High

Bug Description

Binary package hint: xserver-xorg-video-ati

When using the new ATI stack from the X-swat PPA (radeon-rewrite with DRI2 and KMS), suspend is broken on my Thinkpad T60 with ATI Mobile X1400. After waking up the system shows colorful vertical bars on screen. I'll attach a screenshot of that as soon as I can.

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub [8086:27a0] (rev 03)
 Subsystem: Lenovo Device [17aa:2015]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145]
 Subsystem: Lenovo Device [17aa:2006]

Revision history for this message
Bryce Harrington (bryce) wrote :

Hi alex-hunziker,

Please attach the output of `lspci -vvnn`, and attach your /var/log/Xorg.0.log (and maybe Xorg.0.log.old) file from after reproducing this issue. If you've made any customizations to your /etc/X11/xorg.conf please attach that as well.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-xorglog
tags: added: needs-lspci-vvnn
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Incomplete
Revision history for this message
Alexander Hunziker (alex-hunziker) wrote :
tags: removed: needs-lspci-vvnn
Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks for the lspci, when you get a chance the /var/log/Xorg.0.log is needed as well.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → New
description: updated
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Incomplete
Revision history for this message
Alexander Hunziker (alex-hunziker) wrote :

Xorg.0.log as requested. Also, dmesg is full of the following messages:

[ 9315.666488] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).
[ 9315.666491] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
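
To gauge how often each of these errors occurs, a saved dmesg can be tallied by the kernel function that emitted the message. The following is a minimal Python sketch and not part of the original report: the regular expression is an assumption based only on the two lines quoted above, and `count_drm_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape, taken from the two messages quoted above, e.g.
#   [ 9315.666488] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\*")


def count_drm_errors(dmesg_text):
    """Count DRM *ERROR* lines per emitting kernel function (hypothetical helper)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = DRM_ERROR.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts


# The two lines quoted in this comment, used as sample input.
sample = """\
[ 9315.666488] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).
[ 9315.666491] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
"""

for func, count in count_drm_errors(sample).items():
    print(func, count)
```

On a real system the input would be the full dmesg text captured after resume rather than the two-line sample.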

tags: removed: needs-xorglog
Revision history for this message
In , Bryce Harrington (bryce) wrote :

Created an attachment (id=28602)
Xorg.0.log.old

Forwarding this bug from Ubuntu reporter Alexander Hunziker:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/411294

[Problem]
Screen corruption shown on resume when using KMS from the radeon-rewrite branch

[Versions]
libdrm - 2.4.12+git20090806.d74c67fb-0ubuntu1
linux - 2.6.31-6.25~radeon2
mesa - 7.6.0~git20090805.ac3de85e-0ubuntu1
xserver-xorg-video-ati - 1:6.12.99+git20090805.bd03977e-0ubuntu2

[Original Description]
When using the new ATI stack from the X-swat PPA (radeon-rewrite with DRI2 and KMS), suspend is broken on my Thinkpad T60 with ATI Mobile X1400. After waking up the system shows colorful vertical bars on screen. I'll attach a screenshot of that as soon as I can.

Also, dmesg is full of the following messages:

[ 9315.666488] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).
[ 9315.666491] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub [8086:27a0] (rev 03)
 Subsystem: Lenovo Device [17aa:2015]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145]
 Subsystem: Lenovo Device [17aa:2006]

Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Bryce Harrington (bryce) wrote :

Hi Alexander,

I've forwarded your bug upstream to https://bugs.freedesktop.org/show_bug.cgi?id=23290 and subscribed you to it. It would probably help if you could attach your dmesg and a screenshot to the upstream bug. Beyond that, please keep an eye out for any requests from upstream for further information or things they need you to test. Thanks ahead of time.

Changed in xserver-xorg-video-ati (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Revision history for this message
In , Alexander Hunziker (alex-hunziker) wrote :

Created an attachment (id=28618)
Screenshot illustrating the problem

Revision history for this message
In , Alexander Hunziker (alex-hunziker) wrote :

Created an attachment (id=28619)
Screenshot during shutdown

On the first screenshot attached, one can see a square that moves when I move the mouse pointer.

After "blindly" shutting down the machine, the screen turns orange (presumably because the Ubuntu usplash is orange), see second screenshot.

Revision history for this message
In , Yang-yangman (yang-yangman) wrote :

Seeing this problem also on a T60 with a M52.

It's not LVDS specific, and the same corruption also occurs on externally connected monitors.

Also, the corruption actually happens before kernel finishes suspending. On suspend-to-disk, corruption is triggered around the time kernel starts writing the image to disk.

Appears to be a Thinkpad-specific BIOS quirk.

Changed in xserver-xorg-driver-ati:
status: Unknown → Confirmed
Revision history for this message
In , Yang-yangman (yang-yangman) wrote :

Duplicate of #23273 ?

Revision history for this message
In , Jmxorg (jmxorg) wrote :

Yes, the pictures look very familiar :-)
Note that I do not see the bars before suspend - but as my system currently doesn't support suspend-to-disk, I can't verify whether that's a suspend-to-disk vs. suspend-to-ram thing.

Revision history for this message
In , Jmxorg (jmxorg) wrote :

I probably should have mentioned that I opened bug 23479 about that. That bug also contains the logs that come up during resume.

Revision history for this message
In , Tom Morton (tomm) wrote :

Can confirm seeing this on my Thinkpad T60 with radeon x1300 mobile.

Corruption just as in those screenshots. System still 'up' (I can switch VT and see different garbled corruption patterns, and do ctrl-alt-delete to initiate restart)

Revision history for this message
In , David Kiliani (mail-davidkiliani) wrote :

Same problem here with 2.6.31 kernel, xorg-server and radeon driver from git (as of today). Kernel commandline option "nomodeset" is a workaround for me, so the problem is obviously KMS related.
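
When comparing results between testers, it helps to confirm whether a given boot actually had the `nomodeset` workaround active. The following is a minimal Python sketch, assuming a Linux system where `/proc/cmdline` holds the running kernel's command line; `kms_disabled` and `read_cmdline` are hypothetical helper names, and the sample command line is made up for illustration. It only looks for the `nomodeset` token and does not consider driver-specific module options.

```python
def kms_disabled(cmdline):
    """Return True if the kernel command line contains the nomodeset option."""
    return "nomodeset" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, for illustration only.
example = "root=/dev/sda1 ro quiet splash nomodeset"
print(kms_disabled(example))
```

On the affected machine, `kms_disabled(read_cmdline())` would report the state of the current boot.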

Revision history for this message
Id2ndR (id2ndr) wrote :

Is this bug a duplicate of bug #318325 ?

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

+++ This bug was initially created as a clone of Bug #473195 +++

Description of problem:
The resume after suspend or hibernate of a Thinkpad T60 with Radeon Mobility X1400 fails. The light of the display is turned on, but it remains black. In /var/log/messages I have the following lines:

kernel: [drm:radeon_resume] *ERROR*
kernel: [drm] Loading R500 Microcode
kernel: [drm] Num pipes: 1
kernel: [drm] writeback test failed
kernel: [drm:drm_ttm_bind] *ERROR* Couldn't bind backend.
kernel: executing set pll
kernel: executing set crtc timing
kernel: [drm] LVDS-8: set mode 1400x1050 11
kernel: executing set LVDS encoder

When booting with nomodeset suspend/resume works just fine, but without the nice new eye candy... The machine has been upgraded from F-9 to F-10 via a yum upgrade.

Version-Release number of selected component (if applicable):
kernel-2.6.27.5-117.fc10.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot laptop (without nomodeset)
2. Suspend
3. Resume, see black screen

Actual results:
The machine cannot be used. A hard power down and power up is required.

Expected results:
The screensaver password prompt should appear.

Additional info:
The smolt profile of the machine can be found at http://www.smolts.org/client/show/pub_d3521300-de3d-40ee-be30-5c99bb593c3b

--- Additional comment from <email address hidden> on 2008-11-27 11:02:46 EDT ---

Same hardware, same problem.

--- Additional comment from <email address hidden> on 2008-11-30 05:31:03 EDT ---

suspend/resume fails on Thinkpad T40 (Radeon Mobility 7500) too after upgrading from FC9 -> FC10 without error messages.

#
Nov 30 10:59:12 thinkpad kernel: [drm] Loading R100 Microcode
Nov 30 10:59:12 thinkpad kernel: [drm] writeback test succeeded in 2 usecs
Nov 30 11:07:46 thinkpad kernel: [drm] Initialized drm 1.1.0 20060810
Nov 30 11:07:46 thinkpad kernel: [drm] Initialized radeon 1.29.0 20080528 on minor 0
Nov 30 11:07:54 thinkpad kernel: [drm] Setting GART location based on new memory map
#

Machine is unusable... black screen after resume.

kernel-2.6.27.5-117.fc10.i686
rhgb/plymouth is enabled using vga=0x318 as kernel boot arg.

--- Additional comment from <email address hidden> on 2008-11-30 05:48:16 EDT ---

pm-suspend --quirk-none doesn't help. Problem is related to xorg. I can find countless related messages in previous xorg.log:

...
(II) Macintosh mouse button emulation: Device reopened after 1 attempts.
(II) USB Optical Mouse: Device reopened after 1 attempts.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg(xorg_backtrace+0x3b) [0x812bc5b]
1: /usr/bin/Xorg(mieqEnqueue+0x289) [0x810b379]
2: /usr/bin/Xorg(xf86PostMotionEventP+0xc2) [0x80d4262]
3: /usr/bin/Xorg(xf86PostMotionEvent+0x68) [0x80d43c8]
4: /usr/lib/xorg/modules/input//evdev_drv.so [0x355a8d]
5: /usr/bin/Xorg [0x80bcdb7]
6: /usr/bin/Xorg [0x80ac91e]
7: [0x110400]
8: [0x110416]
9: /lib/libc.so.6(ioctl+0x19) [0x484949]
10: /usr/lib/libdrm.so.2 [0x20026cf]
11: /usr/lib/libdrm.so.2(drmCommandWriteRead+0x34) [0x2002934]
12: /usr/lib/dri/radeon_dri.so [0x3089b2]
13: /usr/lib/dri/radeon_dri.so [0x308b38]
14: /usr/lib/dri/radeon_dri.so(ra...

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

Created attachment 364054
dmesg after pm-suspend

[phuang@phuang-notebook ~]$ lspci |grep VGA
01:00.0 VGA compatible controller: ATI Technologies Inc M52 [Mobility Radeon X1300]

After below commands, I got the dmesg output.
1> switch vt from X to text console
2> echo 1 > /sys/module/drm/parameters/debug
3> pm-suspend --quirk-none

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

Created attachment 364056
Photo of my screen after suspend/resume in X

Revision history for this message
In , Silvio-frischi (silvio-frischi) wrote :

I was just wondering could it be that this is a 64-bit problem? Or are there also people around who experience this with a 32-bit kernel?

Revision history for this message
In , Alexander Hunziker (alex-hunziker) wrote :

I'm on 32 bit

Revision history for this message
In , David Kiliani (mail-davidkiliani) wrote :

I just checked out the vanilla 2.6.32-rc4 kernel with the KMS initialization path changes and the bug still occurs. I'm also on 32bit here.

Revision history for this message
In , Tom Morton (tomm) wrote :

Is everyone who sees this bug on a Thinkpad T60?

Revision history for this message
In , David Kiliani (mail-davidkiliani) wrote :

Thinkpad T60 with ATI X1400 here.

Is it possible / helpful to supply any additional data, like logs or memory dumps?

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

Created attachment 366381
outputs of lspci

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

Hi Jerome,

This problem also happens in runlevel 3 without the X server. Why assign this bug to xorg-x11-drv-ati?

Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

Peng we assign kms bugs to X drivers because the kernel gets too many bugs, hopefully we can separate kernel stuff out later.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Can you run the same lspci command after resume and vbetool post, with KMS disabled, in init 3? I put the bug under ati because it's easier for us to find ATI hardware-related bugs there.

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

Created attachment 366412
output of lspci for nokms

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Here are my observations before I forget about them:

The register dumps I asked for cover every register that vbetool post ever reads or writes on
this specific hw. Thus any difference in the way vbetool or KMS restores the card should be reflected in the dumps. That is not the case. The dumps
show that with KMS VGA is disabled, and the PLLs are different too (because the video modes
set up by KMS and vbetool are different); of course the video-mode-related registers
are different as well.

The interesting thing the diff between the dumps shows is that the MC is idle on KMS and the
3D pipe configuration isn't restored. The MC being idle could either be the source
of the bug or just reflect the fact that VRAM is not working. If the MC is not
properly restored or is in a bogus state, it could report IDLE because it doesn't
answer any memory request from the GPU. Or, if VRAM is not properly restored, the
MC can simply fail to execute requests from the GPU and thus report IDLE.

Otherwise all other registers have similar values.

My first attempt to fix the issue tried to reset the MC at resume. I found a bug
in my patch, and I am working on a new one which will do the following (order matters):
- stop MC at suspend
- reset MC at resume
- restore MC
- ASIC_Init

I am looking again at the AtomBIOS dump, as I was looking at the wrong disassembly from the AtomBIOS tools, to check whether vbetool post takes a different path than ASIC_Init.

Revision history for this message
In , Robert (robert-redhat-bugs-1) wrote :

I think I am seeing this problem also on an old ThinkPad T41 with RV250 on resume. I first see this:
radeon 0000:01:00.0: PCI INT A -> Link[LNKA] -> GSI 11 (level, low) -> IRQ 11
Oct 29 10:02:08 t41 kernel: [drm] GPU reset succeed (RBBM_STATUS=0x00000140)
Oct 29 10:02:08 t41 kernel: [drm] radeon: cp idle (0x02000000)
Oct 29 10:02:08 t41 kernel: [drm] radeon: ring at 0x00000000D0000000
Oct 29 10:02:08 t41 kernel: [drm:r100_ring_test] *ERROR* radeon: ring test failed (sracth(0x15E4)=0xCAFEDEAD)
Oct 29 10:02:08 t41 kernel: [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
Oct 29 10:02:08 t41 kernel: radeon 0000:01:00.0: failled initializing CP (-22).
Oct 29 10:02:08 t41 kernel: [drm] LVDS-13: set mode 1400x1050 1e

After which I get a continuous stream of these errors:
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).

My display is garbled, but I can switch to a VT. Would it help if I collected the debug data also?

kernel-2.6.31.5-97.fc12.i686
xorg-x11-drv-ati-6.13.0-0.10.20091006git457646d73.fc12.i686
xorg-x11-server-Xorg-1.7.1-1.fc12.i686

Revision history for this message
In , Robert (robert-redhat-bugs-1) wrote :

not sure if it helps, but after installing the -104 kernel from koji, IB(7) changed to IB(11)

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Robert, I am confident your issue is different. Please open a new bug with the following bug title:
RADEON:RV250:KMS Suspend/Resume fails (ThinkPad T41)

Attach full output of lspci -v and full dmesg after resume. Thanks.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Created attachment 366649
Stop mc at suspend and reset it at resume

Please try the attached patch; it applies on top of the latest drm-next branch of Dave's repo.

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

I'm seeing the same issue on my T60 with the X1400 as in comment #5. Everything works until I suspend/resume, after which the whole screen is garbled and blinking, but otherwise responsive. This is with 2.6.32-rc5-git4. Unfortunately the drm-next branch didn't work (unclear why, produced a hard lockup), so I stuck with drm-linus which seemed to contain a few safe bugfixes.

After applying your proposed workaround patch, things are still not working, and the error message you added is triggered: "[drm] (rv370_pcie_gart_set_page 78) VRAM seems to not work properly !", which seems to get emitted every time I am switching between a VT and the X server (none of which puts the graphics card into a sane state), with tons of the "couldn't schedule IB(15)" around them.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Peng, Christophe can you try to build :
http://people.freedesktop.org/~glisse/radeonvram.tar.bz2

You will need libpciaccess-dev (if I recall the name correctly). Then boot with KMS enabled in init 3 (add 3 to the kernel boot command line). Suspend/resume, and on resume, when you get the garbled screen, run the radeondump program (which is in radeonvram.tar.bz2) as root and report its output. You will likely need to do all this through ssh from another computer. Thanks.

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

Before suspending:

Found card 1002:7145 RV515
  region: (base: 0x0000000000000000, bus: 0x00000000D8000000, size: 134217728, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000002000, size: 256, is_io: 1)
  region: (base: 0x0000000000000000, bus: 0x00000000EE100000, size: 65536, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
BUS_CNTL: 0x00000001
CONFIG_CNTL: 0x00020100
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x00100107
vram_test_hdp succeed

After resuming:

Found card 1002:7145 RV515
  region: (base: 0x0000000000000000, bus: 0x00000000D8000000, size: 134217728, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000002000, size: 256, is_io: 1)
  region: (base: 0x0000000000000000, bus: 0x00000000EE100000, size: 65536, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
BUS_CNTL: 0x00000001
CONFIG_CNTL: 0x00020000
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x20100107
vram_test_hdp failed
vram_test_gpu succeed
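
The two dumps above can also be compared mechanically. The following is a minimal Python sketch, not something radeondump itself provides: it assumes only the `NAME: 0x...` line shape visible in the output above, and `parse_registers` and `changed_registers` are hypothetical helper names.

```python
import re

# Matches register lines such as "CONFIG_CNTL: 0x00020100" from the output above.
# Inside the character class, "|" is a literal, so "COMMAND|STATUS" matches too.
REGISTER_LINE = re.compile(r"^(?P<name>[A-Z_|]+):\s+(?P<value>0x[0-9A-Fa-f]+)$")


def parse_registers(dump_text):
    """Collect {register name: integer value} from a radeondump-style text dump."""
    registers = {}
    for line in dump_text.splitlines():
        match = REGISTER_LINE.match(line.strip())
        if match:
            registers[match.group("name")] = int(match.group("value"), 16)
    return registers


def changed_registers(before, after):
    """Return {name: (before, after, changed bits)} for registers that differ."""
    return {
        name: (before[name], after[name], before[name] ^ after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Register lines copied from the two dumps in this comment.
before_suspend = """\
BUS_CNTL: 0x00000001
CONFIG_CNTL: 0x00020100
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x00100107
"""
after_resume = """\
BUS_CNTL: 0x00000001
CONFIG_CNTL: 0x00020000
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x20100107
"""

diff = changed_registers(parse_registers(before_suspend), parse_registers(after_resume))
for name, (old, new, bits) in diff.items():
    print(f"{name}: {old:#010x} -> {new:#010x} (changed bits {bits:#010x})")
```

Only CONFIG_CNTL and COMMAND|STATUS differ between the two dumps; the sketch reports the XOR of each pair so the changed bits are visible directly.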

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

Sorry, I cut off the last line containing a "vram_test_gpu succeed" from the first output before suspending. This is BTW with the last patch you posted (where you attempt to reset the part that supposedly broke).

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

*** Bug 522253 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

Just an interesting observation from just now:

I put my laptop into suspend out of habit, and now that I resumed it about an hour later (as opposed to my tests, where I always suspended it for about 5 seconds), it surprisingly came back correctly.

It shows:

BUS_CNTL: 0x00000001
CONFIG_CNTL: 0x00020000
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x00100107
vram_test_hdp succeed
vram_test_gpu succeed

While CONFIG_CNTL is 0x20000 like for the non-working case after resuming, COMMAND|STATUS doesn't have 0x20 in the upper byte, but 0x00 instead like before resuming.
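
The extra 0x20 in the upper byte can be decoded if one assumes that radeondump's COMMAND|STATUS value is the 32-bit dword at PCI configuration offset 0x04, with the Command register in the low 16 bits and the Status register in the high 16 bits. That layout is standard PCI, but whether radeondump prints exactly this dword is an assumption, since its source is not shown here. The following is a minimal Python sketch under that assumption; the helper names are hypothetical.

```python
# Bit positions in the 16-bit PCI Status register (standard PCI configuration header).
STATUS_FLAGS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    5: "66 MHz Capable",
    7: "Fast Back-to-Back Capable",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def split_command_status(dword):
    """Split the dword at config offset 0x04 into (command, status). Layout is assumed."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


def status_flags(dword):
    """Return the names of the Status flags set in the given dword."""
    _, status = split_command_status(dword)
    return [name for bit, name in sorted(STATUS_FLAGS.items()) if status & (1 << bit)]


working = 0x00100107  # value reported before suspend and after the good resume
failed = 0x20100107   # value reported after the failed resume

extra = set(status_flags(failed)) - set(status_flags(working))
print(extra)
```

Under that assumption, the only flag that differs in the failed case is bit 13 of Status, Received Master Abort. This is an interpretation of the numbers, not something confirmed elsewhere in this report.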

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

So, a quick comment: it seems VRAM is working properly but we can't access it through HDP (the PCI aperture). I will do a patch to reset HDP after resume to see if it helps. (Btw, this is kind of good news, as it means VRAM is likely properly restored by AtomBIOS, which was my feeling.)

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Created attachment 367460
Reset HostDataPath at resume

Please test this patch, which applies on top of the drm-next branch of Dave's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git

I can generate an rpm if you want, but this will take me some time.

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

I'm afraid this patch doesn't make any difference. If the power is plugged in, it always comes back in a broken state (and if it is not, chances are good that it comes back correctly, as before). So I guess it must be something else. If I had the slightest idea how modern graphics hardware works, I could have tried to help you figure things out, but unfortunately I don't...

Revision history for this message
In , Matěj (matj-redhat-bugs) wrote :

Since this bugzilla report was filed, there have been several major updates in various components of the Xorg system, which may have resolved this issue. Users who have experienced this problem are encouraged to upgrade their system to the latest version of their packages (at least F12Beta, but even better if the very latest versions).

Please, if you experience this problem on an up-to-date system, let us know in a comment on this bug, or tell us whether the upgraded system works for you.

If you won't be able to reply in one month, I will have to close this bug as INSUFFICIENT_DATA. Thank you.

[This is a bulk message for all open Fedora Rawhide Xorg-related bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]

Revision history for this message
In , David (david-redhat-bugs) wrote :

This problem still happens in F12 beta on a Clevo D870P with Mobility Radeon 9700.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

David you more than likely have another issue, this one is specific to X1400 on T60. Please open a new bug with dmesg, Xorg.log and a description of what you are seeing on resume. Also if it's an AGP GPU try booting with radeon.agpmode=-1 and report in the bug if it works when doing that. Thanks.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Created attachment 367798
X1400 restore mc+hdp before asic_init and put vram at 0x10000000

Please try this patch, top of drm-next again, hope it works

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

The last two patches still have the problem.
BTW, this week I am on a trip, so I cannot get more logs.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Created attachment 368232
X1400 restore mc+hdp before asic_init and put vram at 0x10000000 + VGA HDP

Please test the new patch. In this version I program the VGA HDP; maybe some VGA stuff happens at one point. Crossing fingers, but I don't think this one will help much.

Revision history for this message
In , Ferry (ferry-redhat-bugs) wrote :

ok, a step back :-(
with the updates of today my laptop doesn't even boot anymore with KMS. I get a hard hang during modeset. using nomodeset allows the laptop to boot.

kernel.x86_64 2.6.31.5-127.fc12
xorg-x11-drv-ati.x86_64 6.13.0-0.10.20091006git457646d73.fc12
xorg-x11-server-Xorg.x86_64 1.7.1-7.fc12
mesa-dri-drivers.x86_64 7.6-0.13.fc12

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

Yes, I saw this too with drm-next at some point, since then I decided to stick with drm-linus. Anyhow, no luck with the latest patch either. :(

Can any of you confirm the strange effect that 90% of the time everything comes back as it should if you have the notebook unplugged when resuming?

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

(In reply to comment #50)
> Created an attachment (id=368232) [details]
> X1400 restore mc+hdp before asic_init and put vram at 0x10000000 + VGA HDP
>
> Please test new patch. In this version i program the VGA HDP, maybe some VGA
> stuff happens at one point. Crossing fingers, but i don't think this one will
> help much.

Hi Jerome,
Which version is the patch based on? I cannot apply the patch successfully on drm-next of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git.

Revision history for this message
In , Jérôme (jrme-redhat-bugs) wrote :

Created attachment 368412
Shutdown lvds, force mc to be on, dump regs

Please try the new patch; it should apply cleanly on top of drm-next. This one takes a different path: I try to shut things down at suspend and reactivate them at resume. It also dumps a few registers, which might be helpful to further debug the issue. Please try it and attach the full dmesg after a suspend/resume cycle. Thanks

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

Created attachment 368963
dmesg output

The problem and the NMI still happen with the last patch.

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

I see the same effect. Also, nothing useful in the debug output.

I was wondering what could trigger the NMI. I read somewhere (I know that is not a very reliable citation, was somewhere in a forum) that it doesn't need to be the device itself, it might also be the bus. I looked at lspci output and also checked the PCIE bridge. It is some sort of memory access from the CPU through a PCIE "aperture" that isn't working, right? Can the bridge be at fault perhaps?

Here are the diffs for the bridge (00:01.0) and the GPU (01:00.0) between a successful resume (with power unplugged) and a failed resume:

Here is the full output of lspci -vvv for 00:01.0 in the working case:

00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
 I/O behind bridge: 00002000-00002fff
 Memory behind bridge: ee100000-ee1fffff
 Prefetchable memory behind bridge: 00000000d8000000-00000000dfffffff
 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
 BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
  PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
 Capabilities: [88] Subsystem: Lenovo Device 2014
 Capabilities: [80] Power Management version 2
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
  Address: 00000000 Data: 0000
 Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
  DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
   ExtTag- RBE- FLReset-
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
   RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
   MaxPayload 128 bytes, MaxReadReq 128 bytes
  DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
  LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
   ClockPM- Surprise- LLActRep- BwNot-
  LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
   Slot # 1, PowerLimit 75.000000; Interlock- NoCompl-
  SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
   Control: AttnInd Off, PwrInd On, Power- Interlock-
  SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet+ Interlock-
   Changed: MRL- PresDet- LinkState-
  RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
  RootCap: CRSVisible-
  RootSta: PME ReqID 0000, PMEStatus- PMEPending-
 Capabilities: [100] Virtual Channel <?>
 Capabilities: [140] Root Complex Link <?>
 Kernel driver in use: pcieport

and the diff to after a failed resume:

@@ -1,6 +1,6 @@
 ...


Revision history for this message
In , Dave (dave-redhat-bugs) wrote :

Two patches here

http://people.freedesktop.org/~airlied/scratch/0001-drm-radeon-kms-fix-handling-of-d1-d2-vga.patch
http://people.freedesktop.org/~airlied/scratch/0002-drm-radeon-kms-read-back-register-before-writing-in-.patch

Can you guys please try them, I'll try and make a Fedora kernel with them in it ASAP, they are also on the drm-radeon-testing of my drm-2.6 tree.

I've tested them on Peng's laptop with my USB disk, hopefully when he tests them with his normal install they also work.

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

I almost feel bad telling you this, but I am now running 2.6.32-rc6 + drm-radeon-testing and made sure these two patches are in it, and it's still giving the same results as before. If the notebook is unplugged on resume, there's a 90% chance of it coming back correctly, otherwise not.

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

In order to avoid any confusion: With "unplugged" I am referring to the power, not an external monitor (as in the description of patch no 1).

Revision history for this message
In , Christophe (christophe-redhat-bugs) wrote :

OMG, I'm so stupid. I was actually booting the wrong kernel when doing my last test (I made sure the modules contained the patch but I never checked if I was actually loading it).

I can confirm this patch is fixing the issue for me. Great. :-)

Sorry for the confusion, I hope I didn't cause any additional work.

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

Dave's patches also fixed the suspend/resume problem on a KMS-enabled T60/x1400 for me.

Revision history for this message
In , Peng (peng-redhat-bugs) wrote :

The patch fixes this S/R problem. Thanks

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , Matěj (matj-redhat-bugs) wrote :

Thank you for letting us know.

Revision history for this message
In , Tim (tim-redhat-bugs) wrote :

Will this update be available for F-12 (closed as rawhide)?

Revision history for this message
In , Jmxorg (jmxorg) wrote :

*** Bug 23479 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jmxorg (jmxorg) wrote :

The post at http://airlied.livejournal.com/68550.html
describes how the bug was fixed.
Link to the fix:
http://people.freedesktop.org/~airlied/scratch/0002-drm-radeon-kms-read-back-register-before-writing-in-.patch
The bug can also be found in Red Hat's Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=527874
where it is marked as fixed. Unfortunately that is not where non-Red Hat/Fedora
users would look :-(

I can confirm that this fixes my problem.
Not closing it yet as it isn't part of the kernel.org sources yet.

Changed in xserver-xorg-video-ati (Fedora):
status: Unknown → Fix Released
Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

*** Bug 23273 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jmxorg (jmxorg) wrote :

The patch has become part of the kernel.org tree with 2.6.32-rc8-git2
Closing

Changed in xserver-xorg-driver-ati:
status: Confirmed → Fix Released
Revision history for this message
Michele (mikelito) wrote :

Sorry to bother, but since this seems to be fixed upstream, can anyone either port the patch to the Ubuntu repositories, or provide instructions to fix this while we wait for the changes to propagate? tkx

Revision history for this message
Bryce Harrington (bryce) wrote :

[From the upstream bug report, this is a kernel issue; refiling.]

affects: xserver-xorg-video-ati (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Triaged → New
tags: added: xorg-needs-kernel-fix
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

According to the upstream bug, "The patch has become part of the kernel.org tree with 2.6.32-rc8-git2". The Lucid kernel is based on 2.6.32. Can someone experiencing this issue test and confirm whether it is resolved with either a Lucid LiveCD or the 2.6.32 mainline kernel build?

LiveCD - http://cdimage.ubuntu.com/daily-live/current/
2.6.32 mainline kernel build - http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32/

Please let us know your results. Thanks.
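
For anyone unsure which kernel a test machine is actually running, here is a minimal sketch of the version check implied above. It assumes that any final release at or after 2.6.32 carries the patch (the upstream comment only says it landed in 2.6.32-rc8-git2), and it conservatively treats every 2.6.32-rc kernel as unfixed. The names `FIXED_IN`, `kernel_version` and `has_resume_fix` are made up for illustration, and a distro kernel with backported patches would not be detected this way.

```python
import re

# Assumption: the fix merged in 2.6.32-rc8-git2 is present in final 2.6.32 and later.
FIXED_IN = (2, 6, 32)

# Matches the numeric prefix of a `uname -r` string, plus an optional -rcN suffix.
_RELEASE_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(-rc\d+)?")


def kernel_version(release):
    """Return (version_tuple, is_release_candidate) for a `uname -r` style string."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch, rc = match.groups()
    return (int(major), int(minor), int(patch or 0)), rc is not None


def has_resume_fix(release):
    """True if the kernel is a final release at or after 2.6.32 (see assumptions above)."""
    version, is_rc = kernel_version(release)
    if version == FIXED_IN and is_rc:
        # Cannot tell rc8-git2 and later apart from earlier release candidates.
        return False
    return version >= FIXED_IN
```

Passing the output of `uname -r` (or `platform.release()` in Python) to `has_resume_fix` gives a quick yes/no before running the suspend/resume test.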

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Frank (frank-scriptzone) wrote :

I'm on 2.6.32 (Linux piquet 2.6.32-020632-generic #020632 SMP Thu Dec 3 10:58:45 UTC 2009 i686 GNU/Linux) now.
However, the behavior is unchanged.

I'll attach the files mentioned above.

Revision history for this message
Frank (frank-scriptzone) wrote :
Revision history for this message
Frank (frank-scriptzone) wrote :

Xorg log with crash

Revision history for this message
Frank (frank-scriptzone) wrote :

lspci output

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Frank,
    Based on your lspci output, I'd say that you would probably be better served by opening a new bug for your specific issue. The fix noted in the upstream bug was for the ATI Mobility X1400, and you appear to have slightly different hardware.

Alexander,
   As the original bug reporter, can you tell us if you are still seeing this behavior with the upstream kernel?

Thanks in advance,

-JFo

Revision history for this message
Alexander Hunziker (alex-hunziker) wrote :

AFAIK, 2.6.32 fixed this issue. In any case, the current Lucid Lynx suspends and resumes fine with the new ATI stack, so I guess this can be marked as fixed.

Revision history for this message
Michele (mikelito) wrote :

Great to know that this has been fixed for Lynx. Any hope to have the fix backported to the Koala?
Thanks!

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Marked Fix Released.

Michele,
     Normally we only backport security issues, so I doubt this will make it into Karmic.

-JFo

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Revision history for this message
Dimitrios Ntoulas (ntoulasd) wrote :

The system boots, but there is no picture on screen.
There are many errors in dmesg (I can see them remotely over ssh):

[ 3769.950744] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[ 3769.955521] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3770.444452] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
[ 3770.446821] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3770.450594] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).
[ 3770.452966] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3770.951576] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(8).
[ 3770.956356] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3771.447347] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(9).
[ 3771.449933] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3771.453620] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(10).
[ 3771.455965] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3771.954313] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(11).
[ 3771.959109] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3772.446402] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(12).
[ 3772.448792] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 3772.453970] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[ 3772.456333] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !

ubuntu 10.04
Linux dimitris-laptop 2.6.32-21-generic #32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010 i686 GNU/Linux
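
A small sketch for summarizing output like the above when attaching it to a report: it counts the `[drm:...] *ERROR*` lines per reporting function and notes the first and last timestamps. The regular expression is fitted only to the line shape shown in this comment, and `summarize_drm_errors` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Fitted to lines such as:
# [ 3769.950744] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
_DRM_ERROR_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:(\w+)\]\s+\*ERROR\*\s+(.*)$"
)


def summarize_drm_errors(lines):
    """Return (errors per drm function, first timestamp, last timestamp)."""
    counts = Counter()
    first = last = None
    for line in lines:
        match = _DRM_ERROR_RE.match(line.strip())
        if match is None:
            continue  # not a drm error line; ignore it
        timestamp = float(match.group(1))
        counts[match.group(2)] += 1
        if first is None:
            first = timestamp
        last = timestamp
    return counts, first, last


# Sample taken from the dmesg excerpt in this comment.
sample = [
    "[ 3769.950744] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).",
    "[ 3769.955521] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !",
    "[ 3770.444452] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).",
    "[ 3770.446821] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !",
]
counts, first, last = summarize_drm_errors(sample)
print(dict(counts), first, last)
```

To run it against a live system, the lines could come from a saved `dmesg` dump read from a file instead of the hard-coded sample.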

Revision history for this message
Dimitrios Ntoulas (ntoulasd) wrote :

And Xorg.0.log

Changed in xserver-xorg-driver-ati:
importance: Unknown → High
Changed in xserver-xorg-driver-ati:
importance: High → Unknown
Changed in xserver-xorg-driver-ati:
importance: Unknown → High
Changed in xserver-xorg-video-ati (Fedora):
importance: Unknown → High