[RV730] GPU soft reset infinite loop scrolling in firefox with compiz

Bug #564181 reported by Brian Murray on 2010-04-15
This bug affects 7 people
Affects                  Status        Importance  Assigned to  Milestone
xserver-xorg-driver-ati  Fix Released  Medium
linux (Ubuntu)                         High        Unassigned
Lucid                                  High        Unassigned

Bug Description

This has happened approximately 3 times today - only on the 3rd time was I able to ssh into the system and examine things. I believe the test case is scrolling in firefox with compiz on. I saw the following in my dmesg.

Binary package hint: xserver-xorg-video-ati

Apr 15 12:22:16 flash kernel: [ 2660.341313] radeon 0000:01:00.0: GPU softreset
Apr 15 12:22:16 flash kernel: [ 2660.341317] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE00014A4
Apr 15 12:22:16 flash kernel: [ 2660.341320] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00300002
Apr 15 12:22:16 flash kernel: [ 2660.341324] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200030C0
Apr 15 12:22:16 flash kernel: [ 2660.501418] radeon 0000:01:00.0: Wait for MC idle timedout !
Apr 15 12:22:16 flash kernel: [ 2660.501423] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Apr 15 12:22:16 flash kernel: [ 2660.501477] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Apr 15 12:22:16 flash kernel: [ 2660.501537] radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000C02
Apr 15 12:22:16 flash kernel: [ 2660.526198] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
Apr 15 12:22:16 flash kernel: [ 2660.526201] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
Apr 15 12:22:16 flash kernel: [ 2660.526203] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: xserver-xorg-video-radeon 1:6.13.0-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.32-21.31-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-21-generic x86_64
Architecture: amd64
CheckboxSubmission: fee5e196cb921cbd36888f428b38b488
CheckboxSystem: 2a6f54df59af338184485e85cbcf0d32
Date: Thu Apr 15 13:35:13 2010
DkmsStatus:

MachineType: Dell Inc. Dell DXP051
ProcCmdLine: root=/dev/md1 ro debug ignore_loglevel
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
SourcePackage: xserver-xorg-video-ati
dmi.bios.date: 10/28/2005
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A02
dmi.board.name: 0YC523
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA02:bd10/28/2005:svnDellInc.:pnDellDXP051:pvr:rvnDellInc.:rn0YC523:rvr:cvnDellInc.:ct7:cvr:
dmi.product.name: Dell DXP051
dmi.sys.vendor: Dell Inc.
system:
 distro: Ubuntu
 codename: lucid
 architecture: x86_64
 kernel: 2.6.32-21-generic
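
The register dump in the description above can be checked mechanically. The following is a minimal sketch, not part of the original report: it extracts the `R_..._STATUS` lines from a dmesg excerpt and reports which status registers last read back as 0xFFFFFFFF. The function names are hypothetical, and reading all-ones as "the device has stopped answering register reads" is an assumption, not something stated in this bug.

```python
import re

# Matches lines such as:
#   radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE00014A4
REG_LINE = re.compile(r"radeon (\S+): (R_[0-9A-F]+_\w+)=0x([0-9A-Fa-f]{8})")


def parse_registers(log_text):
    """Return (pci_device, register_name, value) tuples in log order."""
    return [(dev, name, int(val, 16)) for dev, name, val in REG_LINE.findall(log_text)]


def status_all_ones(regs):
    """Return names of *_STATUS registers whose most recent value is 0xFFFFFFFF."""
    last = {}
    for _dev, name, value in regs:
        if "STATUS" in name:
            last[name] = value  # later lines overwrite earlier ones
    return sorted(name for name, value in last.items() if value == 0xFFFFFFFF)


# Excerpt copied from the dmesg output in the description.
sample = """\
[ 2660.341317] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE00014A4
[ 2660.501423] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 2660.526198] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
[ 2660.526201] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
"""

regs = parse_registers(sample)
print(status_all_ones(regs))
```

Running it against a full dmesg.log makes it easier to compare reports from different machines in this thread.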

Created an attachment (id=34395)
dmesg.log

Created an attachment (id=34396)
Xorg log

Created an attachment (id=34397)
xorg.conf

Created an attachment (id=34414)
allow bo placement in vram or gart

Does this patch help?

I tried the patch and it did not solve the regression on my machine. I found that I can only duplicate this issue when running with a window manager that does not have composite enabled (i.e. stock metacity). Running with compiz makes the slowness go away.

Yes this patch seems to have fixed the problem. Thanks a lot!

*** Bug 27296 has been marked as a duplicate of this bug. ***

Created an attachment (id=34424)
switching tabs in firefox oprofile

Sorry, I was celebrating too early. Tab switching in Opera works fine now but it's slow when using firefox. I've created another oprofile for firefox.

Not yet resolved.

Created an attachment (id=34425)
switching tabs in firefox oprofile


(In reply to comment #7)
> *** Bug 27296 has been marked as a duplicate of this bug. ***
>

Have done some more testing and it's the first commit -

dda3f5a99e7a2dc5d57860f4d07df3498e1e21df
r6xx EXA/Xv: track src/dst domains

that introduces the problems for me.

(In reply to comment #11)

Testing head + the patch does seem to solve the seamonkey perf, but VT1 is still flooded with

WRITE DOMAIN RELOC FAILURE 0xd 6 2
WRITE DOMAIN RELOC FAILURE 0xd 2 6

*** Bug 27283 has been marked as a duplicate of this bug. ***

Created an attachment (id=34434)
flush command stream if bo domain changes

Can you try this patch both with and without the previous one?

(In reply to comment #12)
> (In reply to comment #11)
>
> Testing head + the patch does seem to solve the seamonkey perf

After about 4 hrs of running this things fell apart -

7fps in glxgears

xv didn't draw.

screen taking 1/2 sec to update its self.

> --- Comment #15 from Andy Furniss <email address hidden>  2010-03-25 10:50:32 PST ---
> (In reply to comment #12)
>> (In reply to comment #11)
>>
>> Testing head + the patch does seem to solve the seamonkey perf
>
> After about 4 hrs of running this things fell apart -
>
> 7fps in glxgears
>
> xv didn't draw.
>
> screen taking 1/2 sec to update its self.
>
Is there any errors in xorg.log or dmesg when the slow down happens?

(In reply to comment #14)
> Created an attachment (id=34434) [details]
> flush command stream if bo domain changes
>
> Can you try this patch both with and without the previous one?
>

Both patches seem to (while it lasts) fix the perf but whether alone or together I still get the RELOC errors to stderr.

Both in any combination seem to fix the dmesg error I get with unpatched head -

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !

(In reply to comment #16)

> Is there any errors in xorg.log or dmesg when the slow down happens?

Nothing in dmesg - I briefly saw a different line scroll off when I quit X but my fbcon scroll back seems to be limited to a couple of lines so I can't say what it said. Now I've tried the other patch I'll try and recreate it and redirect to a file this time.

Created an attachment (id=34449)
oprofile only second patch applied

Created an attachment (id=34450)
oprofile, both patches applied

My system seems to be running fine with the second patch.

(In reply to comment #18)
> (In reply to comment #16)
>
> > Is there any errors in xorg.log or dmesg when the slow down happens?
>
> Nothing in dmesg - I briefly saw a different line scroll off when I quit X but
> my fbcon scroll back seems to be limited to a couple of lines so I can't say
> what it said. Now I've tried the other patch I'll try and recreate it and
> redirect to a file this time.
>

After waiting ages and accumulating 22k reloc errors I decided to retrace my steps and so now I've managed to find a way to trigger this - just use flash - something which doesn't usually happen as I run flashblock. Unblocking a flash totally trashes perf even after seamonkey is closed

The error when perf is trashed is -

space check failed in flush

oprofile (not that I totally trust it) shows most time in -

1116680 80.3342 libpixman-1.so.0.17.3 libpixman-1.so.0.17.3 pixman_blt_mmx

It happens with unpatched head and the first patch.

Running with the second patch alone or + the first patch fixes it.
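
For comparing the oprofile attachments in this thread, a small parser can rank symbols by sample count. This is a sketch under the assumption that the quoted line uses the usual opreport column order (samples, percent, image name, app name, symbol name); the helper names are hypothetical and the only data used is the line quoted above.

```python
from typing import NamedTuple


class ProfileRow(NamedTuple):
    samples: int
    percent: float
    image: str
    app: str
    symbol: str


def parse_rows(text):
    """Parse profile lines; header or malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 4)  # at most five whitespace-separated columns
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        try:
            percent = float(parts[1])
        except ValueError:
            continue
        rows.append(ProfileRow(int(parts[0]), percent, parts[2], parts[3], parts[4].strip()))
    return rows


def top_symbols(rows, n=3):
    """Return the n rows with the most samples, highest first."""
    return sorted(rows, key=lambda r: r.samples, reverse=True)[:n]


sample = "1116680 80.3342 libpixman-1.so.0.17.3 libpixman-1.so.0.17.3 pixman_blt_mmx\n"
rows = parse_rows(sample)
for row in top_symbols(rows):
    print(row.symbol, row.percent)
```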

(In reply to comment #21)

> I've managed to find a way to trigger this - just use flash -
> something which doesn't usually happen as I run flashblock. Unblocking a flash
> totally trashes perf even after seamonkey is closed

More testing shows that not just any flash will trigger it, but this one does -

http://www.speedtest.bbmax.co.uk/

(In reply to comment #21)
> (In reply to comment #18)
> > (In reply to comment #16)
> >
> > > Is there any errors in xorg.log or dmesg when the slow down happens?
> >
> > Nothing in dmesg - I briefly saw a different line scroll off when I quit X but
> > my fbcon scroll back seems to be limited to a couple of lines so I can't say
> > what it said. Now I've tried the other patch I'll try and recreate it and
> > redirect to a file this time.
> >
>
> After waiting ages and accumulating 22k reloc errors I decided to retrace my
> steps and so now I've managed to find a way to trigger this - just use flash -
> something which doesn't usually happen as I run flashblock. Unblocking a flash
> totally trashes perf even after seamonkey is closed
>
> The error when perf is trashed is -
>
> space check failed in flush
>
> oprofile (not that I totally trust it) shows most time in -
>
> 1116680 80.3342 libpixman-1.so.0.17.3 libpixman-1.so.0.17.3
> pixman_blt_mmx
>
> It happens with unpatched head and the first patch.
>
> Running with the second patch alone or + the first patch fixes it.
>

It's much better with the patches (I've tested it only with both patches applied) but it's still possible to provoke the slowdown. I just have to open a few youtube tabs (videos paused). pixman_blt_mmx still seems to be problematic.

Created an attachment (id=34470)
slowdown, both patches applied

I have no idea if those oprofiles are really needed. Please let me know if you find them useful. I don't want to flood this bug report with oprofiles which no one needs.

> Created an attachment (id=34470)
>  --> (http://bugs.freedesktop.org/attachment.cgi?id=34470)
> slowdown, both patches applied
>
> I have no idea if those oprofiles are really needed. Please let me know if
> you find them useful. I don't want to flood this bug report with oprofiles
> which no one needs.
>

They are good but there is still some missing information that has to be solved.

What is causing the pixman calls? (pixman is software rasterizer)

(In reply to comment #22)
> (In reply to comment #21)
>
> > I've managed to find a way to trigger this - just use flash -
> > something which doesn't usually happen as I run flashblock. Unblocking a flash
> > totally trashes perf even after seamonkey is closed
>
> More testing shows that not just any flash will trigger it, but this one does -
>
> http://www.speedtest.bbmax.co.uk/
>

Might be related to bug #15293. In that case the performance hit was caused by flash reading back the video in order to draw stuff (e.g. the controls) over it.
I did see a bit of activity in pixman, though it was nowhere near what you're seeing.

(In reply to comment #21)

> The error when perf is trashed is -
>
> space check failed in flush
>
> oprofile (not that I totally trust it) shows most time in -
>
> 1116680 80.3342 libpixman-1.so.0.17.3 libpixman-1.so.0.17.3
> pixman_blt_mmx
>
> It happens with unpatched head and the first patch.
>
> Running with the second patch alone or + the first patch fixes it.

I was too hasty in saying the second patch fixes it - I can still trigger it, it just takes a bit longer.

With patch2 I don't see the "space check failed in flush" errors when it happens.

The pixman oprofile above was running glxgears after triggering and closing seamonkey.

If I take a profile while just moving an xterm around (which is only redrawing at 2fps) then libc memcpy is the cpu hog and libpixman barely shows.

Created an attachment (id=34512)
sysprof of glxgears running at 7fps

(In reply to comment #25)

> They are good but there is still some missing information that has to be
> solved.
>
> What is causing the pixman calls? (pixman is software rasterizer)

In case it shows anything more than oprofile.
Here's a sysprof of glxgears running at 7fps after I've triggered the bug.

It does look like the X driver is falling back to software for everything for some reason.

bc93395b3eb5e3511c1b62af90693269f4fa6e13 should hopefully fix this.

(In reply to comment #30)
> bc93395b3eb5e3511c1b62af90693269f4fa6e13 should hopefully fix this.
>

I'm using git version 6baa96c44ca93b88acf5233335cee233e59d5af4 and wasn't able to trigger the software fallback. Hopefully this bug is really fixed as it was not clear what actually triggered the bug.

I have no problem anymore (Bug 27283) with the latest git.

thanks !

(In reply to comment #30)
> bc93395b3eb5e3511c1b62af90693269f4fa6e13 should hopefully fix this.
>

Today's head is working OK for me.

I'm still experiencing similar effects. Firefox triggers the "[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !" messages when scrolling up/down fast. Sometimes Xorg freezes. Remote ssh is still possible, but I didn't get a shell, just the motd. On one occasion it didn't crash completely and I got the following dmesg output (attachment). Xorg.0.log is not reporting anything special. Maybe it's a different bug, but the message is still the same...

Software is:
Kernel: 2.6.33-2-amd64 from Debian experimental
libdrm2: 2.4.18-4
libgl: 7.7.1-1
radeon: 1:6.13.0-1
xorg-core: 2:1.7.6-2

Hardware:
01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3470 (prog-if 00 [VGA controller])
 Subsystem: PC Partner Limited Device e390
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Interrupt: pin A routed to IRQ 34
 Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
 Region 2: Memory at d0100000 (64-bit, non-prefetchable) [size=64K]
 Region 4: I/O ports at 2000 [size=256]
 Expansion ROM at d0120000 [disabled] [size=128K]
 Capabilities: <access denied>
 Kernel driver in use: radeon

Created an attachment (id=35012)
dmesg output on hang

Created an attachment (id=35013)
Xorg.log after hang

Brian Murray (brian-murray) wrote :
description: updated
summary: - GPU soft reset infinite loop
+ [RV730] GPU soft reset infinite loop

I was able to recreate this again by dragging around the world in googleearth. This time I lost network connectivity.

Brian Murray (brian-murray) wrote :

I was able to recreate it with compiz disabled and also with 2.6.32-19 which I was running for quite some time with no issues.

Brian Murray (brian-murray) wrote :

So I got netconsole set up and sending messages to another system; these are the last messages before the signal went dead.

[ 2984.193110] [drm:radeon_fence_wait] *ERROR* fence(ffff88000dbea700:0x00015BCB) 510ms timeout going to reset GPU
[ 2984.193126] radeon 0000:01:00.0: GPU softreset
[ 2984.193133] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xA53224A4
[ 2984.193139] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00030002
[ 2984.193144] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200030C0
[ 2984.351699] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 2984.351706] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 2984.351763] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 2984.351834] radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000C02
[ 2984.376075] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
[ 2984.376081] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
[ 2984.376086] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF
[ 2984.390814] [drm:radeon_fence_wait] *ERROR* fence(ffff88000dbea700:0x00015BCB) 710ms timeout
[ 2984.390820] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x00015BCB)

It is no longer possible to communicate with computer ship flash.
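
To compare the fence messages across the netconsole captures in this bug, the values can be pulled out with a short script. This is only a sketch: the function name is hypothetical, and treating the second hex value in `fence(addr:0x...)` as the fence sequence number is an assumption based on how the message reads, not something confirmed in this report.

```python
import re

# fence(ffff88000dbea700:0x00015BCB) 510ms timeout ...
WAIT = re.compile(r"fence\(([0-9a-f]+):0x([0-9A-Fa-f]+)\) (\d+)ms timeout")
# last signaled fence(0x00015BCB)
LAST = re.compile(r"last signaled fence\(0x([0-9A-Fa-f]+)\)")


def fence_timeouts(log_text):
    """Return ([(sequence, timeout_ms), ...], last_signaled_or_None)."""
    waits = [(int(seq, 16), int(ms)) for _addr, seq, ms in WAIT.findall(log_text)]
    signaled = LAST.findall(log_text)
    last = int(signaled[-1], 16) if signaled else None
    return waits, last


# Lines copied from the netconsole capture above.
sample = """\
[ 2984.193110] [drm:radeon_fence_wait] *ERROR* fence(ffff88000dbea700:0x00015BCB) 510ms timeout going to reset GPU
[ 2984.390814] [drm:radeon_fence_wait] *ERROR* fence(ffff88000dbea700:0x00015BCB) 710ms timeout
[ 2984.390820] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x00015BCB)
"""

waits, last = fence_timeouts(sample)
for seq, ms in waits:
    print(f"waited on 0x{seq:08X} for {ms}ms, last signaled 0x{last:08X}")
```

In this capture the waited-on value and the last signaled value are the same number; the script only reports that, it does not interpret it.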

Brian Murray (brian-murray) wrote :

Attached is an incomplete output of radeontool regmatch '*' - apparently it crashed while in the process of running it.

Bryce Harrington (bryce) on 2010-04-15
summary: - [RV730] GPU soft reset infinite loop
+ [RV730] GPU soft reset infinite loop scrolling in firefox with compiz
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Triaged
importance: Undecided → High
Bryce Harrington (bryce) wrote :

Brian Murray - I've forwarded this bug upstream to http://bugs.freedesktop.org/show_bug.cgi?id=27678 - please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

Bryce Harrington (bryce) wrote :

A few other ideas to try out:

There have been some updates to -ati recently, and since you indicate this to be a regression that occurred in the last day, this would be the first thing I would suggest looking at. None of the ubuntu changes look like they could cause this kind of a bug, but who knows. It would be helpful if you could downgrade and test the following versions:

  * -ati: 1:6.13.0-1ubuntu1 - this will rule out two (innocuous?) changes in the last couple days
  * -ati: 1:6.12.192-2ubuntu2 or earlier - this will rule out upstream changes we brought in about a week ago
  * -mesa: 7.7.1-1ubuntu1 - this will rule out a change in the last couple days which I think is innocuous but who knows
  * xorg-server: 2:1.7.6-2ubuntu3 - this will rule out several patches recently added to xserver which were taken from upstream that look safe but may have changed something unexpectedly.

I haven't looked at what has changed in the kernel recently, but you probably have the -19 or -20 kernel. If you find you cannot reproduce it after booting into an earlier kernel that could point to a regression in the kernel.

Beyond that, look in your /var/log/dpkg.log file to see what got updated and go through downgrading them until you find a good suspect.

Brian Murray (brian-murray) wrote :

It seems to be working quite well with xserver-xorg-video-ati 1:6.12.192-2ubuntu2.

2010-04-15 18:29:36 upgrade xserver-xorg-video-ati 1:6.13.0-1ubuntu3 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 status half-configured xserver-xorg-video-ati 1:6.13.0-1ubuntu3
2010-04-15 18:29:36 status unpacked xserver-xorg-video-ati 1:6.13.0-1ubuntu3
2010-04-15 18:29:36 status half-installed xserver-xorg-video-ati 1:6.13.0-1ubuntu3
2010-04-15 18:29:36 status triggers-pending man-db 2.5.7-2
2010-04-15 18:29:36 status half-installed xserver-xorg-video-ati 1:6.13.0-1ubuntu3
2010-04-15 18:29:36 status half-installed xserver-xorg-video-ati 1:6.13.0-1ubuntu3
2010-04-15 18:29:36 status unpacked xserver-xorg-video-ati 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 status unpacked xserver-xorg-video-ati 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 configure xserver-xorg-video-ati 1:6.12.192-2ubuntu2 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 status unpacked xserver-xorg-video-ati 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 status half-configured xserver-xorg-video-ati 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 status triggers-awaited xserver-xorg-video-ati 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 trigproc man-db 2.5.7-2 2.5.7-2
2010-04-15 18:29:36 status half-configured man-db 2.5.7-2
2010-04-15 18:29:36 status installed xserver-xorg-video-ati 1:6.12.192-2ubuntu2
2010-04-15 18:29:37 status installed man-db 2.5.7-2
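
Going through /var/log/dpkg.log for candidate packages, as suggested earlier in this thread, can be partly automated. The sketch below assumes the six-field `upgrade` line layout shown in the log above (date, time, action, package, old version, new version); the function name is hypothetical.

```python
def list_upgrades(log_text):
    """Return (timestamp, package, old_version, new_version) for each 'upgrade' line."""
    upgrades = []
    for line in log_text.splitlines():
        fields = line.split()
        # Only lines of the form: DATE TIME upgrade PACKAGE OLD NEW
        if len(fields) == 6 and fields[2] == "upgrade":
            date, time, _action, package, old, new = fields
            upgrades.append((f"{date} {time}", package, old, new))
    return upgrades


# Two lines copied from the dpkg.log excerpt above.
sample = """\
2010-04-15 18:29:36 upgrade xserver-xorg-video-ati 1:6.13.0-1ubuntu3 1:6.12.192-2ubuntu2
2010-04-15 18:29:36 status installed xserver-xorg-video-ati 1:6.12.192-2ubuntu2
"""

upgrades = list_upgrades(sample)
for when, package, old, new in upgrades:
    print(f"{when}  {package}: {old} -> {new}")
```

Note that dpkg logs a downgrade with the same `upgrade` action, as the excerpt above shows, so the old and new columns need to be read rather than assumed to be increasing.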

Brian Murray (brian-murray) wrote :

I did also try with xserver-xorg-video-ati 1:6.13.0-1ubuntu2 and that didn't work out so well. Okay it crashed too!

*** Bug 27678 has been marked as a duplicate of this bug. ***

still seems to be problematic.

Alex, on bug 27678 Brian has identified that Ubuntu's 1:6.12.192-2ubuntu2 does not show the problem, so it looks like this is a regression between 6.12.192 and 6.13.0 if that helps.

(In reply to comment #39)
> Alex, on bug 27678 Brian has identified that Ubuntu's 1:6.12.192-2ubuntu2 does
> not show the problem, so it looks like this is a regression between 6.12.192
> and 6.13.0 if that helps.

Ubuntu's 6.12.192-2ubuntu2 mentioned above was a git checkout up to commit 5c256808cb5fea955eea96ffe9196473715156aa
"XAA: disable render accel"

after the 6.12.192 tag for future reference.

The debian version 1:6.12.192-2 also works fine on my system.
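
The package versions compared in this thread follow the Debian `[epoch:]upstream[-revision]` layout, which makes it easy to see that the good and bad packages differ in the upstream part. The following is a minimal sketch with hypothetical helper names; the numeric comparison only handles purely numeric dotted upstream versions and is not the full Debian version ordering.

```python
def split_debian_version(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        # No epoch given; Debian treats a missing epoch as 0.
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        # No hyphen, so there is no revision part.
        upstream, revision = rest, ""
    return epoch, upstream, revision


def numeric_upstream(upstream):
    """Turn a dotted numeric upstream version such as '6.12.192' into a tuple."""
    return tuple(int(part) for part in upstream.split("."))


good = split_debian_version("1:6.12.192-2ubuntu2")
bad = split_debian_version("1:6.13.0-1ubuntu2")
print(good, bad)
print(numeric_upstream(good[1]) < numeric_upstream(bad[1]))
```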

Can you narrow down which op is causing the problem? Add:
return FALSE;
to the top of R600PrepareCopy() or R600PrepareSolid() or R600UploadToScreenCS() or R600DownloadFromScreenCS() or R600PrepareComposite() in r600_exa.c and see if any of them prevent the problem.

I tried disabling, one at a time. Results are not very useful, I fear:

1 Disabling R600PrepareCopy resulted in slow FF scrolling

2 Disabling R600PrepareSolid: Xorg freeze

3 Disabling R600UploadToScreenCS: Xorg freeze

4 Disabling R600DownloadFromScreenCS: Xorg freeze

5 Disabling R600PrepareComposite: no freeze, just slow scrolling. I tried it for ~20 minutes, methods 2, 3, 4 crashed after <5 minutes.

The funny thing is, that I got no "Failed to parse relocation" dmesg errors. Maybe debian/ubuntu isn't using the git tag xf86-video-ati-6.13.0 as I did in these tests...

Bryce Harrington (bryce) wrote :

Sarvatt has extracted the patches that added the -ati accel stuff to 6.13.0:
http://sarvatt.com/downloads/radeon/

Bryce Harrington (bryce) wrote :

I've added the reversion patches to this PPA:

https://edge.launchpad.net/~bryceharrington/+archive/silver

Brian, please test and confirm this does indeed solve the issue. Sarvatt, your feedback on doing this revert in Lucid would be appreciated.

Brian Murray (brian-murray) wrote :

The packages from the PPA do seem to have resolved my issue. Thanks!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-ati - 1:6.13.0-1ubuntu5

---------------
xserver-xorg-video-ati (1:6.13.0-1ubuntu5) lucid; urgency=low

  * Add 103_new_pci_ids.patch: Add support for newer ATI hardware. Adds
    PCI IDs for a number of RV7xx chips and one Redwood.
  * Revert recent performance enhancement work included in 6.13.0, as it
    appears to regress performance fairly severely in some circumstances
    such as using googleearth.
    (LP: #564181, #563400)
    + 0001-Revert-r600-exa-further-cleanup-use-the-object-struc.patch
    + 0002-Revert-r600-cleanup-wasteful-variables.patch
    + 0003-Revert-r600-reduce-function-call-overhead.patch
    + 0004-Revert-r6xx-EXA-fix-swapped-domains-in-kms-UTS.patch
    + 0005-Revert-r6xx-EXA-Xv-add-a-R600SetAccelState-function.patch
    + 0006-Revert-r6xx-EXA-always-use-a-temp-surface-for-overla.patch
    + 0007-Revert-r6xx-EXA-always-use-the-accel_state-state-in-.patch
    + 0008-Revert-r6xx-EXA-Xv-track-src-dst-domains.patch
 -- Bryce Harrington <email address hidden> Fri, 16 Apr 2010 15:20:49 -0700

Changed in xserver-xorg-video-ati (Ubuntu Lucid):
status: Triaged → Fix Released
Brian Murray (brian-murray) wrote :

This came up again this morning and this is all I caught from netconsole.

[56149.934658] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
[56149.934664] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
[56149.934668] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF
[56149.947673] [drm:radeon_fence_wait] *ERROR* fence(ffff88005094bd40:0x000FC2D9) 710ms timeout
[56149.947680] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x000FC2D9)

Michael (starkpc) wrote :

I still have problems with my ATi Radeon HD 4350.

dpkg -s xserver-xorg-video-ati
Version: 1:6.13.0-1ubuntu5

uname -a
Linux 2.6.32-21-generic #32-Ubuntu SMP Fri Apr 16 08:09:38 UTC 2010 x86_64 GNU/Linux

Apr 17 16:48:42 mpc kernel: [ 7555.089804] radeon 0000:02:00.0: GPU softreset
Apr 17 16:48:42 mpc kernel: [ 7555.089807] radeon 0000:02:00.0: R_008010_GRBM_STATUS=0xE00014A4
Apr 17 16:48:42 mpc kernel: [ 7555.089810] radeon 0000:02:00.0: R_008014_GRBM_STATUS2=0x00100002
Apr 17 16:48:42 mpc kernel: [ 7555.089813] radeon 0000:02:00.0: R_000E50_SRBM_STATUS=0x200020C0
Apr 17 16:48:42 mpc kernel: [ 7555.246660] radeon 0000:02:00.0: Wait for MC idle timedout !
Apr 17 16:48:42 mpc kernel: [ 7555.246663] radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Apr 17 16:48:42 mpc kernel: [ 7555.246716] radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Apr 17 16:48:42 mpc kernel: [ 7555.246775] radeon 0000:02:00.0: R_000E60_SRBM_SOFT_RESET=0x00000C02
Apr 17 16:48:42 mpc kernel: [ 7555.271045] radeon 0000:02:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
Apr 17 16:48:42 mpc kernel: [ 7555.271047] radeon 0000:02:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
Apr 17 16:48:42 mpc kernel: [ 7555.271050] radeon 0000:02:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF

Changed in xserver-xorg-video-ati (Ubuntu Lucid):
status: Fix Released → New
Brian Murray (brian-murray) wrote :

I received some additional information that I had not seen before with my latest crash. Notice the last three lines here.

[ 4754.254728] [drm:radeon_fence_wait] *ERROR* fence(ffff88006c4324c0:0x000102A4) 510ms timeout going to reset GPU
[ 4754.254746] radeon 0000:01:00.0: GPU softreset
[ 4754.254752] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE00014A4
[ 4754.254757] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00300002
[ 4754.254763] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200030C0
[ 4754.412213] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 4754.412220] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 4754.412280] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 4754.412342] radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000C02
[ 4754.436581] radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xFFFFFFFF
[ 4754.436586] radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0xFFFFFFFF
[ 4754.436591] radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0xFFFFFFFF
[ 4754.451314] [drm:radeon_fence_wait] *ERROR* fence(ffff88006c4324c0:0x000102A4) 720ms timeout
[ 4754.451320] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x000102A4)
[ 4754.460002] Uhhuh. NMI received for unknown reason a1 on CPU 0.
[ 4754.460002] You have some hardware problem, likely on the PCI bus.
[ 4754.460002] Dazed and confused, but trying to continue

*** Bug 24003 has been marked as a duplicate of this bug. ***

Bryce Harrington (bryce) on 2010-04-18
tags: added: compiz

I don't know if it is helpful but here I just get slow scrolling in firefox.

A good example of extremely slow webpage is http://www.ofai.at/research/agents/conf/at2ai7/

And I also get the error:
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !

But that's all.

Bryce Harrington (bryce) wrote :

Yeah those last three lines are just generic kernel "something went wrong" error messages. The actually relevant bits are the lines above it.

Changed in xserver-xorg-video-ati (Ubuntu Lucid):
status: New → Triaged
Brian Murray (brian-murray) wrote :

This happened again today but this time I was able to connect via ssh - for what it's worth I had booted with "pci=nomsi" this time. I ran the radeontool command this time and got a complete regmatch.

Brian Murray (brian-murray) wrote :

I've been testing the packages from ppa:ubuntu-x-swat/x-updates as documented at https://wiki.ubuntu.com/X/Testing/GEMLeak and have yet to recreate this bug.

Bryce Harrington (bryce) wrote :

Brian suggested #568605 as a possible dupe, but I think I'd like to keep the bugs separate until we know for sure. Freeze bugs can be awfully hard to distinguish from each other.

Toby Meehan (themeehans) wrote :

Immediately after upgrading to Kubuntu 10.04 AMD64, the system would enter this video loop symptom as described after login and as items were added to the system tray. I removed several packages (hplip-gui and kdebluetooth) that load into the system tray and it had no effect.

I booted into recovery mode as root and started X. It came up, but I was notified that Nepomuk needed Virtuoso and that some language files were missing. I installed the missing language files and the virtuoso-server package, and was then able to log in and run for several hours without the system going into this loop mode.

When it did enter the video loop state again, it occurred immediately after I removed a USB flash drive. It does not happen consistently as I was unable to reproduce this. This parallels bugs 568605 and 565323. I hope this provides some clue as to what's happening.

I am closing this bug as the original issue is fixed, please test a kernel which has the e86527533586259875f08fccb173e3347046cc3f commit and if such kernel fails open a new bug and attach full dmesg + full lspci -v output.

You can test :
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=shortlog;h=refs/heads/drm-radeon-testing

Hi. I have manually applied the patch that the mentioned commit consists of, but the problem only seems mitigated, not fixed.

In dmesg I just found this:
radeon 0000:01:05.0: ffff880109913200 reserve failed for wait

Video device is:
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3200 Graphics (prog-if 00 [VGA controller])

Also, I was wondering whether an issue which was apparently pinpointed as a radeon driver regression could be fixed by a kernel patch alone (of course it's possible, I was just wondering). And then, would it be possible that it's not that commit alone, but a "family" of patches that fixed the issue?

Honestly I don't feel I have the authority to reopen this bug, but I'm not sure it can actually be called "resolved-fixed".

I am of course available to provide detailed information and to help troubleshoot and further isolate the issue, if needed.

Jesse Sweetland (sweetlandj) wrote :

Re: #19

I was having this same error recently until I followed the instructions in the X/Testing/GEMLeak wiki page (https://wiki.ubuntu.com/X/Testing/GEMLeak). After upgrading to the PPA packages glxinfo started to report OpenGL 1.2, my GPU reset problems went away, and things got much snappier.

Then I noticed that the issue referenced by the wiki (https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/565981) was closed with the status "Fix Released", which means that, being up to date, I presumably should have already had the fix. (Unless by "Fix Released" they meant the PPA packages.)

Looking at the wiki page it seems like testing is still underway, but perhaps it's just out of date. If the fix actually was released into the official repositories, then it seems like something is different between the PPA and final packages, as I no longer have this issue. (Previously I'd get GPU reset loops 3 times a day.)

Since this ticket is still open I thought I'd comment on it in case anyone else is still in the same boat.

Changed in xserver-xorg-driver-ati:
importance: Unknown → Medium
status: Unknown → Fix Released
Alendit (alendit) wrote :

Hey,

the issue isn't fixed for me in Maverick 32-bit.

Using Radeon 2600XT DDR2.

Attaching dmesg and Xorg.log

Alendit (alendit) wrote :

My Xorg.log

Alex, that last error looks to be a different one to this bug. See bug 693754 which I filed yesterday - looks more likely to be relevant.

Changed in xserver-xorg-driver-ati:
importance: Medium → Unknown
Changed in xserver-xorg-driver-ati:
importance: Unknown → Medium
Jan K. (jan-launchpad-kantert) wrote :

Since the last Ubuntu update this happens to me every time I try to watch a video on youtube.

Linux xxxx 2.6.32-29-generic #58-Ubuntu SMP Fri Feb 11 20:52:10 UTC 2011 x86_64 GNU/Linux

Distributor ID: Ubuntu
Description: Ubuntu 10.04.2 LTS
Release: 10.04
Codename: lucid

radeon 0000:01:00.0: GPU softreset
radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE57024A4
radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00110302
radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:00.0: R_000E60_SRBM_SOFT_RESET=0x00000402
radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003028
radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000002
radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: GPU softreset
radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003028
radeon 0000:01:00.0: R_008014_GRBM_STATUS2=0x00000002
radeon 0000:01:00.0: R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
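
A log like the one above contains several reset attempts back to back. The sketch below groups the register lines by the `GPU softreset` line that precedes them, so the last status value of each attempt can be compared. The function name is hypothetical, and the script makes no claim about what any particular register value means.

```python
import re

REG = re.compile(r"(R_[0-9A-F]+_\w+)=0x([0-9A-Fa-f]+)")


def reset_episodes(log_text):
    """Return one list of (register, value) pairs per 'GPU softreset' line."""
    episodes = []
    for line in log_text.splitlines():
        if "GPU softreset" in line:
            episodes.append([])  # a new reset attempt starts here
            continue
        match = REG.search(line)
        if match and episodes:
            episodes[-1].append((match.group(1), int(match.group(2), 16)))
    return episodes


# Subset of the lines from the log above.
sample = """\
radeon 0000:01:00.0: GPU softreset
radeon 0000:01:00.0: R_008010_GRBM_STATUS=0xE57024A4
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003028
radeon 0000:01:00.0: GPU softreset
radeon 0000:01:00.0: R_008010_GRBM_STATUS=0x00003028
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
"""

episodes = reset_episodes(sample)
for number, regs in enumerate(episodes, start=1):
    statuses = [f"0x{value:08X}" for name, value in regs if name.endswith("GRBM_STATUS")]
    print(f"reset {number}: GRBM_STATUS values {statuses}")
```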

Bryce Harrington (bryce) wrote :

Looks like this was fixed in the kernel. Retargeting, and leaving the lucid task open for backporting purposes, if desired.

affects: xserver-xorg-video-ati (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Triaged → Fix Released
Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".

Changed in linux (Ubuntu Lucid):
status: Triaged → Won't Fix