GPU lockup ring 0 stalled for more than X msec

Bug #1863390 reported by Jamie Bainbridge
This bug affects 8 people
Affects: xserver-xorg-video-ati (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Since the update:

 xserver-xorg-video-ati-hwe-18.04 (1:19.0.1-1ubuntu1~18.04.1) bionic;

which resulted from:

 https://bugs.launchpad.net/fedora/+source/xserver-xorg-video-ati/+bug/1841718

I've experienced GPU freezes where all video output becomes unresponsive (both Xorg and Ctrl+Alt terminal switching) and the GPU fan goes to full speed. I am still able to access the system via SSH.

Sometimes dmesg ends up full of this message repeating over and over:

 radeon 0000:01:00.0: ring 0 stalled for more than 24040msec
 radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000009e44 last fence id 0x0000000000009e49 on ring 0)
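
The two messages above can be read mechanically. The following is a minimal Python sketch, assuming the message format shown above and assuming that "current fence id" is the last fence the GPU completed while "last fence id" is the last one submitted, so that their difference is the number of fences still outstanding. The name `parse_lockups` is hypothetical and not part of any existing tool.

```python
import re

# Patterns for the two radeon messages quoted in this report.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)

def parse_lockups(log_text):
    """Return (stalls, lockups) parsed from radeon kernel messages.

    stalls:  list of (ring, stalled_msec)
    lockups: list of (ring, current_fence, last_fence, outstanding)
    """
    stalls = []
    lockups = []
    for line in log_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            stalls.append((int(m.group(1)), int(m.group(2))))
            continue
        m = LOCKUP_RE.search(line)
        if m:
            current = int(m.group(1), 16)
            last = int(m.group(2), 16)
            lockups.append((int(m.group(3)), current, last, last - current))
    return stalls, lockups

sample = """\
radeon 0000:01:00.0: ring 0 stalled for more than 24040msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000009e44 last fence id 0x0000000000009e49 on ring 0)
"""
stalls, lockups = parse_lockups(sample)
print(stalls)                 # ring 0, stalled for about 24 seconds
print(lockups[0][3])          # 0x9e49 - 0x9e44 = 5 outstanding fences
```

Feeding it the whole of `dmesg` instead of `sample` would show whether the fence ids ever advance between repeats; if they do not, the ring has made no progress at all.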

I sometimes get a few GPU soft resets, which seem to fail in drm(?):

 radeon 0000:01:00.0: Saved 110839 dwords of commands on ring 0.
 radeon 0000:01:00.0: GPU softreset: 0x00000008
 ...
 radeon 0000:01:00.0: Wait for MC idle timedout !
 radeon 0000:01:00.0: Wait for MC idle timedout !
 [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
 radeon 0000:01:00.0: WB enabled
 radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x00000000725651ad
 radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x00000000c3678ed8
 radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0x00000000dbd9e01b
 [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
 [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume
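
Whether a soft reset recovered the GPU can be judged from the messages that follow it. Here the ring test still reads back 0xCAFEDEAD, which, as far as I understand the driver, is the placeholder written before the test, meaning the ring never executed the test packet. A rough Python sketch of that check, assuming the message strings shown above; `reset_failed` is a hypothetical helper, not an existing utility.

```python
import re

RESET_RE = re.compile(r"GPU softreset")
# Failure messages taken from the excerpt above.
FAILURE_RE = re.compile(r"ring \d+ test failed|startup failed on resume")

def reset_failed(log_text):
    """Return True if a 'GPU softreset' line is later followed by a failure message."""
    seen_reset = False
    for line in log_text.splitlines():
        if RESET_RE.search(line):
            seen_reset = True
        elif seen_reset and FAILURE_RE.search(line):
            return True
    return False

sample = """\
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0: Wait for MC idle timedout !
[drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume
"""
print(reset_failed(sample))
```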

Even if the above reset doesn't happen, this freeze always results in an "unable to handle page fault" BUG in radeon_ring_backup, entered from various call paths, e.g.:

 BUG: unable to handle page fault for address: ffffbc2d80574ffc
 ...
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 11243 Comm: kworker/2:1H Not tainted 5.5.0-050500-generic #202001262030
 Workqueue: radeon-crtc radeon_flip_work_func [radeon]
 RIP: 0010:radeon_ring_backup+0xc9/0x140 [radeon]
 Call Trace:
  radeon_gpu_reset+0xc3/0x2f0 [radeon]
  radeon_flip_work_func+0x1f3/0x250 [radeon]
  ? __schedule+0x2e0/0x760
  process_one_work+0x1b5/0x370
  worker_thread+0x50/0x3d0
  kthread+0x104/0x140
  ? process_one_work+0x370/0x370
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

or:

 BUG: unable to handle page fault for address: ffffc03901000ffc
 ...
 Oops: 0000 [#1] SMP PTI

 CPU: 3 PID: 2227 Comm: compton Not tainted 5.3.0-28-generic #30~18.04.1-Ubuntu
 RIP: 0010:radeon_ring_backup+0xd3/0x140 [radeon]
 Call Trace:
  radeon_gpu_reset+0xb9/0x340 [radeon]
  ? dma_fence_wait_timeout+0x48/0x110
  ? reservation_object_wait_timeout_rcu+0x19d/0x340
  radeon_gem_handle_lockup.part.4+0xe/0x20 [radeon]
  radeon_gem_wait_idle_ioctl+0xa6/0x110 [radeon]
  ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
  drm_ioctl_kernel+0xb0/0x100 [drm]
  drm_ioctl+0x389/0x450 [drm]
  ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  radeon_drm_ioctl+0x4f/0x80 [radeon]
  do_vfs_ioctl+0xa9/0x640
  ? __schedule+0x2b0/0x670
  ksys_ioctl+0x75/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x5a/0x130
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
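
The two traces above can be compared programmatically to confirm that they fault in the same place. This is a small Python sketch, assuming the Oops layout shown above; frames prefixed with `?` are skipped because the kernel marks them as unreliable (addresses found on the stack but not confirmed as part of the call chain). `parse_oops` is a hypothetical name, and the samples are shortened copies of the traces above.

```python
import re

RIP_RE = re.compile(r"RIP: [0-9a-f]+:([A-Za-z_][\w.]*)\+0x")
FRAME_RE = re.compile(r"^\s*(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def parse_oops(text):
    """Return (faulting_function, reliable_frames) from an Oops excerpt."""
    rip = None
    frames = []
    in_trace = False
    for line in text.splitlines():
        m = RIP_RE.search(line)
        if m:
            rip = m.group(1)
            continue
        if line.strip() == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.match(line)
            if m and not m.group(1):  # drop '?' (unreliable) frames
                frames.append(m.group(2))
    return rip, frames

trace_a = """\
RIP: 0010:radeon_ring_backup+0xc9/0x140 [radeon]
Call Trace:
 radeon_gpu_reset+0xc3/0x2f0 [radeon]
 radeon_flip_work_func+0x1f3/0x250 [radeon]
 ? __schedule+0x2e0/0x760
 process_one_work+0x1b5/0x370
"""
trace_b = """\
RIP: 0010:radeon_ring_backup+0xd3/0x140 [radeon]
Call Trace:
 radeon_gpu_reset+0xb9/0x340 [radeon]
 ? dma_fence_wait_timeout+0x48/0x110
 radeon_gem_handle_lockup.part.4+0xe/0x20 [radeon]
 radeon_gem_wait_idle_ioctl+0xa6/0x110 [radeon]
"""
rip_a, frames_a = parse_oops(trace_a)
rip_b, frames_b = parse_oops(trace_b)
print(rip_a == rip_b, frames_a[0] == frames_b[0])
```

Both excerpts fault in `radeon_ring_backup` called from `radeon_gpu_reset`; only the path leading into the reset differs (a page-flip worker in one case, a GEM wait-idle ioctl in the other).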

I've tried both 5.3.0-28-generic and 5.5.0-050500-generic from kernel-ppa but that made no difference. It appears to be a bug in radeon.

Nothing specific makes this happen, just regular usage with a compositing window manager. I'm not playing games or particularly exercising the GPU. The last two times I was just reading in a web browser. It's also happened in the middle of the night while I was asleep. Sometimes I have a few days of uptime, sometimes it happens less than 24 hours after boot.

This never happened before the radeon update mentioned on the first line.

I'll attach two files of dmesg output. As per https://wiki.ubuntu.com/X/Troubleshooting/Freeze I've installed and started apport for the next time it happens.

Jamie Bainbridge (superjamie) wrote :

After happening every day for a week, this hasn't happened again since I logged this bug.

I also disabled Firefox WebRender so maybe that was a contributor.

I'll re-open if I can provide any useful data.

Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Incomplete

Wladyslaw Ostrowski (w-ostrowski78) wrote :

I have a very old Siemens computer. This bug started affecting it after I upgraded the RAM from 2 GB to 4 GB.
I found some rumours that the bug was introduced with kernel 4.10. I'm not able to debug the problem myself, but I've attached a dmesg log in case it helps to fix this annoying bug.

Wladyslaw Ostrowski (w-ostrowski78) wrote :

System: Host: mint Kernel: 5.0.0-32-generic x86_64 bits: 64 Desktop: Xfce 4.14.1 Distro: Linux Mint 19.3 Tricia
Machine: Type: Desktop System: FUJITSU SIEMENS product: D2030-A1 v: N/A serial: <root required>
           Mobo: FUJITSU SIEMENS model: D2030-A1 v: S26361-D2030-A1 serial: <root required> BIOS: FUJITSU SIEMENS // Phoenix
           v: 5.00 R1.07.2030.A1 date: 04/21/2006
CPU: Dual Core: AMD Athlon 64 X2 4400+ type: MCP speed: 1000 MHz min/max: 1000/2200 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] driver: radeon v: kernel
           Display: x11 server: X.Org 1.20.4 driver: ati,radeon unloaded: fbdev,modesetting,vesa resolution: 1280x1024~60Hz
           OpenGL: renderer: llvmpipe (LLVM 8.0 128 bits) v: 3.3 Mesa 19.0.8
Network: Device-1: Realtek RTL8169 PCI Gigabit Ethernet driver: r8169
           Device-2: Qualcomm Atheros AR9227 Wireless Network Adapter driver: ath9k
Drives: Local Storage: total: 506.77 GiB used: 263.8 MiB (0.1%)
Info: Processes: 170 Uptime: 7h 03m Memory: 3.91 GiB used: 2.25 GiB (57.6%) Shell: bash inxi: 3.0.32

Gauthier Ostervall (gauthier-i) wrote :

This happens regularly to me too.

$ uname -a
Linux ionian 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 20.04.

$ sudo dpkg -l | grep xserver-xorg-video
ii xserver-xorg-video-all 1:7.7+19ubuntu14 amd64 X.Org X server -- output driver metapackage
ii xserver-xorg-video-amdgpu 19.1.0-1 amd64 X.Org X server -- AMDGPU display driver
ii xserver-xorg-video-ati 1:19.1.0-1 amd64 X.Org X server -- AMD/ATI display driver wrapper
ii xserver-xorg-video-fbdev 1:0.5.0-1ubuntu1 amd64 X.Org X server -- fbdev display driver
ii xserver-xorg-video-intel 2:2.99.917+git20200226-1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
rc xserver-xorg-video-modesetting 0.9.0-1build1 amd64 X.Org X server -- Generic modesetting driver
ii xserver-xorg-video-nouveau 1:1.0.16-1 amd64 X.Org X server -- Nouveau display driver
ii xserver-xorg-video-openchrome 1:0.6.0-3build1 amd64 X.Org X server -- OpenChrome display driver
ii xserver-xorg-video-qxl 0.1.5+git20200331-1 amd64 X.Org X server -- QXL display driver
ii xserver-xorg-video-radeon 1:19.1.0-1 amd64 X.Org X server -- AMD/ATI Radeon display driver
ii xserver-xorg-video-vesa 1:2.4.0-2 amd64 X.Org X server -- VESA display driver
ii xserver-xorg-video-vmware 1:13.3.0-3 amd64 X.Org X server -- VMware display driver

journalctl gives this at the start of the crash. It coincides with starting Slack (the chat program; it uses google-chrome):

Dec 09 08:56:07 ionian kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10156msec
Dec 09 08:56:07 ionian kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000010df8 last fence id 0x0000000000010e00 on ring 0)
Dec 09 08:56:08 ionian kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10216msec
Dec 09 08:56:08 ionian kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000002eef last fence id 0x0000000000002ef1 on ring 3)
Dec 09 08:56:08 ionian kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10664msec
Dec 09 08:56:08 ionian kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000010df8 last fence id 0x0000000000010e00 on ring 0)
Dec 09 08:56:08 ionian kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10728msec
Dec 09 08:56:08 ionian kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000002eef last fence id 0x0000000000002ef1 on ring 3)
Dec 09 08:56:09 ionian kernel:...

Wladyslaw Ostrowski (w-ostrowski78) wrote :

I reduced the RAM back to 2 GB on my Siemens computer, and everything works fine.
I haven't experienced a crash since then.
My computer is not stable with 4 GB of RAM.

Nemanja V (vooxo) wrote :

Recently I've also been experiencing this on:

Ubuntu 20.20 (Linux 5.8.0-41-generic)
AMD® A10-5750m apu with radeon(tm) hd graphics × 4

This is an older system indeed, but still...

Brian (thwaller) wrote :

I am also experiencing this issue. I was using 18.04.5 and did not have the issue. I did a new install of 20.04 with the same hardware and the problem started. It continues after upgrade to 20.10.

~~~
user@8560w:~$ sudo lshw -C display
[sudo] password for user:
  *-display
       description: VGA compatible controller
       product: Whistler [Radeon HD 6730M/6770M/7690M XT]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:44 memory:c0000000-cfffffff memory:d4400000-d441ffff ioport:4000(size=256) memory:c0000-dffff

~~~

Brian (thwaller) wrote :

I hope I did this properly, but I changed status back to new. This is still an issue, and if more info is needed I can provide it, at least from my end of things.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → New

Ian! D. Allen (idallen) wrote :

I'm now getting radeon GPU lockup with the Ubuntu 5.11.0-25 and 5.11.0-27 kernels.
UPDATE Sep 10 2021: The 5.11.0-34-generic kernel works fine.
The 5.8.0-63 kernel and all previous work fine.

$ lspci | grep ATI
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV620 GL [FirePro 2450]
06:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV620 GL [FirePro 2450]

I use three (of four possible) 1600x1200 LCD monitors on the four FirePro outputs.

Selected lines from /var/log/syslog are attached. These are the actual repeating lockup lines:

Aug 17 15:20:44 linux kernel: [ 7403.029720] radeon 0000:06:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
Aug 17 15:20:44 linux kernel: [ 7403.541687] radeon 0000:06:00.0: ring 0 stalled for more than 11304msec

The last two lines "GPU lockup" and "ring 0 stalled" repeat over and over. The keyboard and the X11 display won't respond. I can ssh into the machine, and I can type "reboot", but it hangs somewhere in the shutdown (or else it is waiting for something longer than I'm willing to wait). The system reboots when I type ALT-SYSRQ-B on the keyboard, and I select the 5.8.0-63 kernel and everything is fine.

Allan McCombs (amccombs) wrote :

I think I have the same issue. Ubuntu 20.04.

dmesg
[350143.716757] radeon 0000:01:05.0: ring 0 stalled for more than 10220msec
[350143.716768] radeon 0000:01:05.0: GPU lockup (current fence id 0x0000000009a06e34 last fence id 0x0000000009a06e8e on ring 0)

sudo lshw -C display
  *-display
       description: VGA compatible controller
       product: RS780L [Radeon 3000]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 5
       bus info: pci@0000:01:05.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:18 memory:d0000000-dfffffff ioport:d000(size=256) memory:fe9f0000-fe9fffff memory:fe800000-fe8fffff memory:c0000-dffff

sudo dpkg -l | grep xserver-xorg-video
ii xserver-xorg-video-all 1:7.7+19ubuntu14 amd64 X.Org X server -- output driver metapackage
ii xserver-xorg-video-amdgpu 19.1.0-1 amd64 X.Org X server -- AMDGPU display driver
ii xserver-xorg-video-ati 1:19.1.0-1 amd64 X.Org X server -- AMD/ATI display driver wrapper
ii xserver-xorg-video-fbdev 1:0.5.0-1ubuntu1 amd64 X.Org X server -- fbdev display driver
ii xserver-xorg-video-intel 2:2.99.917+git20200226-1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
ii xserver-xorg-video-nouveau 1:1.0.16-1 amd64 X.Org X server -- Nouveau display driver
ii xserver-xorg-video-nvidia-460 460.91.03-0ubuntu0.20.04.1 amd64 NVIDIA binary Xorg driver
ii xserver-xorg-video-qxl 0.1.5+git20200331-1 amd64 X.Org X server -- QXL display driver
ii xserver-xorg-video-radeon 1:19.1.0-1 amd64 X.Org X server -- AMD/ATI Radeon display driver
ii xserver-xorg-video-vesa 1:2.4.0-2 amd64 X.Org X server -- VESA display driver
ii xserver-xorg-video-vmware 1:13.3.0-3 amd64 X.Org X server -- VMware display driver

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed

Gauthier Ostervall (gauthier-i) wrote :

For what it's worth, I don't get these messages anymore since I replaced my CPU fan.

My graphics card has passive cooling only, and relies on air flow in the case, so on the CPU and case fans. The CPU fan was dead, and probably had been for a long time. I replaced it and have not seen this issue since. Of course it probably is something else, but you could try to monitor temperature, just in case.
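
For anyone who wants to check temperatures without extra tools, here is a minimal Python sketch that reads the kernel's hwmon sysfs interface, where `temp*_input` files hold millidegrees Celsius. Whether the radeon driver exposes a sensor there depends on the card, so treat this as an assumption to verify on your own machine; `read_hwmon_temps` is a hypothetical helper name.

```python
import glob
import os

def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, sensor_file): degrees_celsius} for every readable sensor."""
    temps = {}
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Each hwmon directory normally has a 'name' file identifying the chip.
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        for path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now
            temps[(chip, os.path.basename(path))] = millideg / 1000.0
    return temps

if __name__ == "__main__":
    for (chip, sensor), celsius in read_hwmon_temps().items():
        print(f"{chip} {sensor}: {celsius:.1f} C")
```

Running it periodically (for example from a loop over SSH) and comparing the readings with the timestamps of the "ring 0 stalled" messages would show whether the lockups follow a temperature rise.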

Vladimir Mokrozub (mogaba2009) wrote :

I had the same problem with Ubuntu 20.04 LTSP client: black screen during boot, keyboard and mouse don't work, ssh works and "ring 0 stalled" messages in the system log. I had to switch to Ubuntu 18.04 which works fine.

CPU: AMD A8-5500B APU with Radeon(tm) HD Graphics

lshw -C display
  *-display
       description: VGA compatible controller
       product: Trinity [Radeon HD 7560D]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 1
       bus info: pci@0000:00:01.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:35 memory:c0000000-cfffffff ioport:f000(size=256) memory:feb00000-feb3ffff memory:c0000-dffff
