Resume on glamor causes GPU lockup/screen corruption on Radeon 3100 Graphics (ChipID = 0x9611) when ShadowPrimary is off

Bug #1944991 reported by Paul Dufresne
This bug affects 1 person
Affects: xserver-xorg-video-ati (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

GPU: ATI Radeon 3100 Graphics (ChipID = 0x9611) (RS780)

GPU Lockup message in dmesg output after resume:
[ 916.000550] radeon 0000:01:05.0: ring 0 stalled for more than 10084msec
[ 916.000569] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000020cc last fence id 0x00000000000020d6 on ring 0)
[ 916.516556] radeon 0000:01:05.0: ring 0 stalled for more than 10600msec
[ 916.516569] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000020cc last fence id 0x00000000000020d6 on ring 0)
[ 917.024566] radeon 0000:01:05.0: ring 0 stalled for more than 11108msec
[ 917.024575] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000020cc last fence id 0x00000000000020d6 on ring 0)
[ 917.536558] radeon 0000:01:05.0: ring 0 stalled for more than 11620msec
[ 917.536570] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000020cc last fence id 0x00000000000020d6 on ring 0)
[ 918.048571] radeon 0000:01:05.0: ring 0 stalled for more than 12132msec
[ 918.048580] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000020cc last fence id 0x00000000000020d6 on ring 0)
...
[ 935.726783] radeon 0000:01:05.0: GPU softreset: 0x00000019
[ 935.726788] radeon 0000:01:05.0: R_008010_GRBM_STATUS = 0xE57034E0
[ 935.726793] radeon 0000:01:05.0: R_008014_GRBM_STATUS2 = 0x00110103
[ 935.726797] radeon 0000:01:05.0: R_000E50_SRBM_STATUS = 0x20000040
[ 935.726801] radeon 0000:01:05.0: R_008674_CP_STALLED_STAT1 = 0x01000000
[ 935.726805] radeon 0000:01:05.0: R_008678_CP_STALLED_STAT2 = 0x00001002
[ 935.726809] radeon 0000:01:05.0: R_00867C_CP_BUSY_STAT = 0x00028486
[ 935.726813] radeon 0000:01:05.0: R_008680_CP_STAT = 0x80838645
[ 935.726817] radeon 0000:01:05.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[ 935.779906] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
[ 935.779962] radeon 0000:01:05.0: SRBM_SOFT_RESET=0x00000100
[ 935.782061] radeon 0000:01:05.0: R_008010_GRBM_STATUS = 0xA0003030
[ 935.782066] radeon 0000:01:05.0: R_008014_GRBM_STATUS2 = 0x00000003
[ 935.782070] radeon 0000:01:05.0: R_000E50_SRBM_STATUS = 0x20008040
[ 935.782074] radeon 0000:01:05.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 935.782078] radeon 0000:01:05.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[ 935.782081] radeon 0000:01:05.0: R_00867C_CP_BUSY_STAT = 0x00000000
[ 935.782085] radeon 0000:01:05.0: R_008680_CP_STAT = 0x80100000
[ 935.782089] radeon 0000:01:05.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[ 935.782097] radeon 0000:01:05.0: GPU reset succeeded, trying to resume
Graphics are broken after this: text is barely readable and the interface does not respond.
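
To check a saved kernel log for this symptom, one option is to count the radeon lockup lines. This is only a sketch: the function name count_gpu_lockups is made up for illustration, and the pattern is based solely on the message format shown in the log above.

```shell
#!/bin/sh
# Count radeon "GPU lockup" lines in a kernel log read from stdin.
# Hypothetical helper; the pattern mirrors the dmesg lines quoted in this report.
count_gpu_lockups() {
    # grep -c prints the match count; "|| true" keeps the exit status 0
    # when the count is zero.
    grep -c 'radeon .*GPU lockup' || true
}
```

After a resume it could be used as "dmesg | count_gpu_lockups" (dmesg may need root on some systems). Any non-zero count means the ring stalled at least once.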

First observed on the latest Mint (20.2). I then discovered that Peppermint 10 was not affected.
I believe Peppermint 10 is based on Ubuntu 18.04, and I am unsure why it is not affected.

I tested many versions of Ubuntu:
Before 17.04: no problem, but those releases used EXA acceleration.
17.04: the first release to use glamor. Screen corruption is not obvious, but dmesg shows GPU lockups.
17.10 and later: really problematic after resume, but it is still possible to press Ctrl-Alt-F4 to reach a text console and run sudo reboot.
I may have the exact versions wrong, as I don't have my notes with me.

Adding Option "AccelMethod" "EXA"
to the Device section of the xorg.conf file fixed the issue.

I am unlikely to have access again to that specific computer in the future.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu69
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 6610 F.... pulseaudio
CasperMD5CheckResult: pass
CasperVersion: 1.465
DistroRelease: Ubuntu 21.10
IwConfig:
 lo no wireless extensions.

 enp63s0 no wireless extensions.
LiveMediaBuild: Ubuntu 21.10 "Impish Indri" - Alpha amd64 (20210920)
MachineType: Hewlett-Packard HP Compaq dc5850 Small Form Factor
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/hostname.seed maybe-ubiquity quiet splash ---
ProcVersionSignature: Ubuntu 5.13.0-16.16-generic 5.13.13
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-16-generic N/A
 linux-backports-modules-5.13.0-16-generic N/A
 linux-firmware 1.200
RfKill:

Tags: impish
Uname: Linux 5.13.0-16-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 11/15/2011
dmi.bios.release: 3.14
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 786F6 v03.14
dmi.board.name: 3029h
dmi.board.vendor: Hewlett-Packard
dmi.chassis.type: 4
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr786F6v03.14:bd11/15/2011:br3.14:svnHewlett-Packard:pnHPCompaqdc5850SmallFormFactor:pvr:skuAP417US#ABC:rvnHewlett-Packard:rn3029h:rvr:cvnHewlett-Packard:ct4:cvr:
dmi.product.family: 103C_53307F
dmi.product.name: HP Compaq dc5850 Small Form Factor
dmi.product.sku: AP417US#ABC
dmi.sys.vendor: Hewlett-Packard

Paul Dufresne (dufresnep) wrote :

I note that yesterday's daily Ubuntu build (which will become Ubuntu 21.10) was affected by this problem.

Paul Dufresne (dufresnep) wrote :

The BIOS of that computer has some ACPI bugs. (I believe it is the latest BIOS, but I am unsure: I did upgrade to 3.14, and it still shows as 3.14.)
DMI: Hewlett-Packard HP Compaq dc5850 Small Form Factor/3029h, BIOS 786F6 v03.14 11/15/2011

[ 0.283156] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 0.283163] ACPI BIOS Error (bug): \_SB.PCI0._OSC: Excess arguments - ASL declared 5, ACPI requires 4 (20210331/nsarguments-162)
[ 0.283317] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0._OSC.CAPD], AE_ALREADY_EXISTS (20210331/dsfield-184)
[ 0.283373] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210331/dswload2-477)

[ 0.283422] Initialized Local Variables for Method [_OSC]:
[ 0.283424] Local0: (____ptrval____) <Obj> Integer 0000000000000001
[ 0.283432] Local1: (____ptrval____) <Obj> Integer 0000000000000005

[ 0.283439] Initialized Arguments for Method [_OSC]: (5 arguments defined for method invocation)
[ 0.283440] Arg0: (____ptrval____) <Obj> Buffer(16) 5B 4D DB 33 F7 1F 1C 40
[ 0.283452] Arg1: (____ptrval____) <Obj> Integer 0000000000000001
[ 0.283457] Arg2: (____ptrval____) <Obj> Integer 0000000000000003
[ 0.283462] Arg3: (____ptrval____) <Obj> Buffer(12) 01 00 00 00 1F 01 00 00

[ 0.283479] ACPI Error: Aborting method \_SB.PCI0._OSC due to previous error (AE_ALREADY_EXISTS) (20210331/psparse-529)
[ 0.283536] acpi PNP0A08:00: _OSC: platform retains control of PCIe features (AE_ALREADY_EXISTS)
[ 0.283549] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1944991

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Paul Dufresne (dufresnep) wrote : Re: Resume with glamor cause GPU lockup/screen corruption on Radeon 3100 Graphics" (ChipID = 0x9611)

I am unsure whether I will have access to the computer again.
If I do, I intend to run the suggested apport-collect command.

Also, for my tests I used:
sudo su
echo mem > /sys/power/state
to trigger suspend-to-RAM.
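
The same test can be wrapped in a small script that first checks whether the kernel lists "mem" as a supported sleep state. This is a sketch under assumptions: the function names are hypothetical, and the state file is taken as a parameter (defaulting to the kernel's /sys/power/state) so the logic can be tried on an ordinary file without suspending anything.

```shell
#!/bin/sh
# Return success if the state file lists "mem" (suspend-to-RAM) as supported.
# $1: state file; defaults to /sys/power/state.
supports_mem() {
    state_file="${1:-/sys/power/state}"
    # The file holds space-separated state names, e.g. "freeze mem disk".
    tr ' ' '\n' < "$state_file" | grep -qx 'mem'
}

# Request suspend-to-RAM by writing "mem" to the state file.
# Writing to the real /sys/power/state requires root.
suspend_to_ram() {
    state_file="${1:-/sys/power/state}"
    supports_mem "$state_file" || { echo "mem not supported" >&2; return 1; }
    printf 'mem\n' > "$state_file"
}
```

Run as root, "suspend_to_ram" with no argument suspends the machine; the dmesg output can then be inspected after waking it.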

Paul Dufresne (dufresnep) wrote :

I confirm I will not have access to that particular machine.
But maybe I will find another computer affected by this problem.

Paul Dufresne (dufresnep) wrote :

Suggested /etc/X11/xorg.conf file.
I only edited it a bit; it is not really tested.

Paul Dufresne (dufresnep) wrote :

If you are thinking of using this with the amdgpu driver rather than radeon, use 'Option "AccelMethod" "none"' rather than EXA, because I think (but am not sure) that amdgpu does not support EXA.

Paul Dufresne (dufresnep) wrote :

I found the following link:
https://forum.manjaro.org/t/new-install-amd-fx-6300-wont-wake-from-suspend/39471/10

It suggests that setting TearFree to on may help (RS780L).
However, I wonder whether it was actually ShadowPrimary that helped.

Section "Device"
    Identifier "Radeon"
    Driver "radeon"
    Option "AccelMethod" "glamor" #legacy "exa"
    Option "DRI" "3" #legacy "2"
    Option "TearFree" "on"
    Option "ColorTiling" "on"
    Option "ColorTiling2D" "on"
    Option "ShadowPrimary" "on" #only possible with "glamor" AccelMethod
EndSection

Note that on Ubuntu this should go in the file /usr/share/X11/xorg.conf.d/20-radeon.conf,
and my suggested /etc/X11/xorg.conf should be renamed to xorg.conf.old.

I might have temporary access to the computer in the next 48h to test it.

Paul Dufresne (dufresnep) wrote : apport information

Attached files: AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, PaInfo.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

tags: added: apport-collected impish
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Paul Dufresne (dufresnep) wrote : Re: Resume with glamor cause GPU lockup/screen corruption on Radeon 3100 Graphics" (ChipID = 0x9611)

I see this in the dmesg log after resume:
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: debugfs: File 'radeon_ring_gfx' in directory '0' already pr>

Also:
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: serial 00:04: disabled
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: parport_pc 00:03: disabled
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: ACPI: Preparing to enter system sleep state S3
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: PM: Saving platform NVS memory
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: Disabling non-boot CPUs ...
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: smpboot: CPU 1 is now offline
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: IRQ 22: no longer affine to CPU2
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: smpboot: CPU 2 is now offline
oct 01 16:48:25 client-HP-Compaq-dc5850-Small-Form-Factor kernel: ACPI: Low-level resume complete

Paul Dufresne (dufresnep) wrote :

It looks like suspend/resume works with glamor if I have the two uncommented Option lines below (I think, but am still unsure, that both are necessary):

Section "Device"
    #Identifier "Configured Video Device"
    #Driver "radeon"
    #Option "AccelMethod" "EXA"
    Identifier "Configured Video Device"
    Driver "radeon"
    #Option "AccelMethod" "glamor" #legacy "exa"
    #Option "DRI" "3" #legacy "2"
    #Option "TearFree" "on"
    #Option "ColorTiling" "on"
    Option "ColorTiling2D" "on"
    Option "ShadowPrimary" "on" #only possible with "glamor" AccelMethod
EndSection

The following dmesg log is from a successful suspend/resume:
[ 76.376583] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 76.378801] OOM killer disabled.
[ 76.378803] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 76.380005] printk: Suspending console(s) (use no_console_suspend to debug)
[ 76.402987] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 76.403159] sd 0:0:0:0: [sda] Stopping disk
[ 76.445147] serial 00:04: disabled
[ 76.445223] parport_pc 00:03: disabled
[ 76.847345] ACPI: Preparing to enter system sleep state S3
[ 76.848098] PM: Saving platform NVS memory
[ 76.848100] Disabling non-boot CPUs ...
[ 76.848717] IRQ 25: no longer affine to CPU1
[ 76.849735] smpboot: CPU 1 is now offline
[ 76.850726] IRQ 22: no longer affine to CPU2
[ 76.851750] smpboot: CPU 2 is now offline
[ 76.853525] ACPI: Low-level resume complete
[ 76.853608] PM: Restoring platform NVS memory
[ 76.853633] PCI-DMA: Resuming GART IOMMU
[ 76.853635] PCI-DMA: Restoring GART aperture settings
[ 76.853643] LVT offset 1 assigned for vector 0x400
[ 76.853661] LVT offset 1 assigned
[ 76.854003] Enabling non-boot CPUs ...
[ 76.854101] x86: Booting SMP configuration:
[ 76.854103] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 76.854551] microcode: CPU1: patch_level=0x01000095
[ 76.857615] CPU1 is up
[ 76.857672] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 76.858117] microcode: CPU2: patch_level=0x01000095
[ 76.861277] CPU2 is up
[ 76.863872] ACPI: Waking up from system sleep state S3
[ 76.864583] ahci 0000:00:11.0: set SATA to AHCI mode
[ 76.888031] parport_pc 00:03: activated
[ 76.889094] serial 00:04: activated
[ 76.890950] tg3 0000:3f:00.0 enp63s0: Link is down
[ 76.891910] sd 0:0:0:0: [sda] Starting disk
[ 76.893464] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
[ 76.893531] radeon 0000:01:05.0: WB enabled
[ 76.893537] radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0x00000000a0000c00
[ 76.893856] debugfs: File 'radeon_ring_gfx' in directory '0' already present!
[ 76.925648] [drm] ring test on 0 succeeded in 1 usecs
[ 76.925681] [drm] ib test on ring 0 succeeded in 0 usecs
[ 77.210929] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 77.210973] ata3: SATA link down (SStatus 0 SControl 300)
[ 77.211016] ata4: SATA link down (SStatus 0 SControl 300)
[ 77.219093] ata2.00: configured for UDMA/100
[ 77.737610] tpm tpm0: TPM is disabled/deactivated (0x7)
[ 77.741747] OOM killer enabled.
[ 77.741750] Restarting tasks ... done.
[ 77.759486] PM: suspend exit
[ 79.981259] tg3...


Changed in linux (Ubuntu):
status: Incomplete → Confirmed
summary: - Resume with glamor cause GPU lockup/screen corruption on Radeon 3100
- Graphics" (ChipID = 0x9611)
+ Resume on glamor cause GPU lockup/screen corruption on Radeon 3100
+ Graphics" (ChipID = 0x9611) if some options not used
Paul Dufresne (dufresnep) wrote : Re: Resume on glamor cause GPU lockup/screen corruption on Radeon 3100 Graphics" (ChipID = 0x9611) if some options not used

It looks like, with glamor, adding:
    Option "ShadowPrimary" "on"
is enough to work around the bug on resume.
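
For anyone wanting to try this workaround, the option can be placed in a drop-in file. The script below is a sketch, not a tested fix: the function name is hypothetical, the Identifier is arbitrary, and the default directory and the 20-radeon.conf file name are the ones suggested earlier in this report. AccelMethod is left unset on the assumption that glamor is already the default, as it was in the tests described here.

```shell
#!/bin/sh
# Write a radeon Device section that enables ShadowPrimary into a drop-in file.
# $1: target directory; defaults to the location suggested earlier in this report.
write_shadowprimary_conf() {
    conf_dir="${1:-/usr/share/X11/xorg.conf.d}"
    mkdir -p "$conf_dir" || return 1
    # The quoted 'EOF' keeps the heredoc body literal (no shell expansion).
    cat > "$conf_dir/20-radeon.conf" <<'EOF'
Section "Device"
    Identifier "Radeon"
    Driver "radeon"
    Option "ShadowPrimary" "on"
EndSection
EOF
}
```

Writing to the default directory needs root, and the X server has to be restarted before the option takes effect.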

summary: Resume on glamor cause GPU lockup/screen corruption on Radeon 3100
- Graphics" (ChipID = 0x9611) if some options not used
+ Graphics" (ChipID = 0x9611) when shadowPrimary is off
affects: linux (Ubuntu) → xserver-xorg-video-ati (Ubuntu)