Bug #1902981 “AGP GPUs driven as PCI ones (when AGP is disabled ...” : Bugs : linux package : Ubuntu

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2020-11-05: Missing required logs.

#1

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1902981

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Thomas Debesse (illwieckz) wrote on 2020-11-05: Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

#2

dmesg on Linux 5.9 vanilla on Ubuntu 20.04, K10 platform, ATI Radeon HD 4670 AGP (2.2 MiB, text/plain)

As a reminder, this is a dmesg captured when running an ATI Radeon HD 4670 AGP on a K10 host with Linux 5.9 (vanilla).

The ATI Radeon HD 4670 AGP (RV730 XT) is a very capable TeraScale GPU: it supports OpenGL 3.3 (DirectX 10 on Windows) and OpenCL 1.0, and features an HDMI output and 1 GB of VRAM. The host is also very capable, with an AMD Phenom II quad-core CPU and 16 GB of RAM.

To verify whether its performance matches 2020 expectations, I entered it (running Ubuntu 20.04) in the 2020 Xonotic Defrag World Championship, which is currently running (https://xdwc.teichisma.info/), and some players told me this hardware may be better than the hardware they compete with. In fact, competitive games like Xonotic run at 144 fps at 1920×1080.

The last kernel able to drive this GPU on Ubuntu 20.04 LTS is 5.4.0-47-generic; 5.4.0-48-generic is believed to have backported the AGP disablement from 5.9-rc1 (ba806f9).
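For anyone triaging their own machine, here is a minimal Python sketch that classifies a kernel release string (as printed by `uname -r`) using only the two data points in the previous paragraph. The function name `agp_likely_disabled` is made up for this example, and any release not covered by those data points is reported as unknown (`None`) rather than guessed.

```python
import re

def agp_likely_disabled(release):
    """Classify a kernel release string as affected (True), not affected
    (False) or unknown (None), based only on what this report states."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if not m:
        return None
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    abi = int(m.group(4)) if m.group(4) else None
    # Upstream: the AGP disablement is reported as landing in 5.9-rc1.
    if (major, minor) >= (5, 9):
        return True
    # Ubuntu 20.04 series: 5.4.0-47 works, 5.4.0-48 is believed affected.
    if (major, minor, patch) == (5, 4, 0) and abi is not None:
        return abi >= 48
    # Anything else is not covered by this report.
    return None
```

The "believed to have backported" caveat still applies: a `True` result only means the release falls in a range this report describes as affected.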

So, when running the 5.4.0-48-generic kernel from the Ubuntu repositories or, as here, a vanilla 5.9 kernel I compiled myself, the interesting parts of the dmesg log are:

```
[    5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[    5.242359] radeon 0000:01:00.0: disabling GPU acceleration
```

and:

```
[   34.558889] trying to bind memory to uninitialized GART !
[   34.559048] WARNING: CPU: 1 PID: 2516 at drivers/gpu/drm/radeon/radeon_gart.c:299 radeon_gart_bind+0xdf/0xf0 [radeon]
[   34.559050] Modules linked in: zram snd_usb_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_usbmidi_lib snd_hda_core snd_hwdep snd_pcm snd_seq_midi kvm_amd snd_seq_midi_event ccp joydev kvm snd_seq snd_rawmidi input_leds snd_timer snd_seq_device snd soundcore k10temp mac_hid serio_raw binfmt_misc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear uas usb_storage hid_generic usbhid hid radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm psmouse forcedeth i2c_nforce2
[   34.559107] CPU: 1 PID: 2516 Comm: gnome-shell Not tainted 5.9.0 #1
[   34.559109] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AM2NF3-VSTA, BIOS P3.20 10/09/2009
[   34.559178] RIP: 0010:radeon_gart_bind+0xdf/0xf0 [radeon]
[   34.559184] Code: 00 48 89 ef 48 8b 40 60 e8 0e 2f 44 df 31 c0 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 38 6f 6b c0 e8 23 0c 6d de <0f> 0b b8 ea ff ff ff eb dc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[   34.559187] RSP: 0018:ffffc030838f7a28 EFLAGS: 00010282
[   34.559191] RAX: 0000000000000000 RBX: ffffa0cf6b88eb80 RCX: 0000000000000027
[   34.559193] RDX: 0000000000000027 RSI: 0000000000000086 RDI: ffffa0cf6fc98d08
[   34.559196] RBP: ffffc030838f7b28 R08: ffffa0cf6fc98d00 R09: 0000000000000004
[   34.559198] R10: 0000000000000000 R11: 0000000000000001 R12: ffffc030838f7b28
[   34.559201] R13: ffffa0cf6a622868 R14: ffffa0cf6c7cc6e8 R15: ffffc030838f7b28
[   34.559204] FS:  00007f46ae245cc0(0000) GS:ffffa0cf6fc80000(0000) knlGS:0000000000000000
[   34.559207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.559210] CR2: 000056494261c1c8 CR3: 000000040bfe6000 CR4: 00000000000006e0
[   34.559212] Call Trace:
[   34.559286]  radeon_ttm_backend_bind+0x58/0x210 [radeon]
[   34.559305]  ttm_tt_bind+0x32/0x60 [ttm]
[   34.559321]  ttm_bo_handle_move_mem+0x236/0x590 [ttm]
[   34.559339]  ttm_bo_validate+0x16c/0x180 [ttm]
[   34.559407]  ? drm_ioctl_kernel+0xe9/0xf0 [drm]
[   34.559422]  ttm_bo_init_reserved+0x2ae/0x320 [ttm]
[   34.559438]  ttm_bo_init+0x6d/0xf0 [ttm]
[   34.559504]  ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[   34.559569]  radeon_bo_create+0x184/0x210 [radeon]
[   34.559634]  ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[   34.559703]  radeon_gem_object_create+0xa9/0x180 [radeon]
[   34.559773]  ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[   34.559840]  radeon_gem_create_ioctl+0x66/0x120 [radeon]
[   34.559850]  ? tomoyo_path_number_perm+0x66/0x1d0
[   34.559918]  ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[   34.559968]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[   34.560021]  drm_ioctl+0x1ec/0x390 [drm]
[   34.560090]  ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[   34.560152]  radeon_drm_ioctl+0x49/0x80 [radeon]
[   34.560160]  __x64_sys_ioctl+0x83/0xb0
[   34.560167]  do_syscall_64+0x33/0x80
[   34.560174]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.560179] RIP: 0033:0x7f46b369550b
[   34.560183] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
[   34.560186] RSP: 002b:00007ffdb7421658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   34.560189] RAX: ffffffffffffffda RBX: 00007ffdb74216d0 RCX: 00007f46b369550b
[   34.560192] RDX: 00007ffdb74216d0 RSI: 00000000c020645d RDI: 000000000000000e
[   34.560194] RBP: 00000000c020645d R08: 0000000000000011 R09: 0000000000000005
[   34.560197] R10: 000056494245c010 R11: 0000000000000246 R12: 0000000000001000
[   34.560199] R13: 000000000000000e R14: 0000000000010000 R15: 0000000000001000
[   34.560205] ---[ end trace 9ea277f1e2a7c575 ]---
[   34.560271] [drm:radeon_ttm_backend_bind [radeon]] *ERROR* failed to bind 16 pages at 0x00000000
[   34.560363] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (65536, 2, 4096, -22)
```
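To check whether another machine shows the same symptoms, the messages quoted above can be searched for mechanically. This is a minimal Python sketch: the signature names and the helper `find_radeon_failures` are made up for this example, and the patterns are copied from the excerpts in this comment only, so other failure modes will not be detected.

```python
import re

# Message fragments copied from the log excerpts in this comment.
SIGNATURES = {
    "ring_test_failed": re.compile(r"ring \d+ test failed"),
    "accel_disabled": re.compile(r"disabling GPU acceleration"),
    "gart_uninitialized": re.compile(
        r"trying to bind memory to uninitialized GART"
    ),
}

def find_radeon_failures(dmesg_text):
    """Return the sorted names of the signatures found in the dmesg text."""
    found = set()
    for line in dmesg_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(name)
    return sorted(found)
```

It can be fed a saved log, for example `find_radeon_failures(open("dmesg.txt").read())`, where `dmesg.txt` is a placeholder file name.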

Thomas Debesse (illwieckz) wrote on 2020-11-05:

#3

dmesg on Linux 5.9 with the 32-bit DMA patch on Ubuntu 20.04, K10 platform, ATI Radeon HD 4670 AGP (AGP-as-PCI since AGP is disabled at build time), showing some errors being worked around and new ones occurring (95.5 KiB, text/plain)

When applying the patch from https://bugs.launchpad.net/bugs/1902795

- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902795/+attachment/5431335/+files/0001-drm-radeon-make-all-PCI-GPUs-use-32bits-DMA-bit-mask.patch

which reduces (but does not completely fix) the breakage seen with PCI GPUs on K8 and K10 hosts by setting the DMA bit mask to 32 bits for all PCI GPUs, we can see that what is fixed on PCI GPUs is not fixed on AGP-as-PCI GPUs (and there are even more errors before that):

```
[ 5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
```

Things go so wrong that we don't even see the other errors expected after that:

```
[ 5.242359] radeon 0000:01:00.0: disabling GPU acceleration
```

```
[ 34.558889] trying to bind memory to uninitialized GART !
```

Instead, the kernel loops before reaching those errors, desperately trying to pass the r600_ring_test step.

But before the r600_ring_test failure message is printed, additional new issues about ring 0 being stalled and GPU lockups occur with AGP-as-PCI GPUs. These are never seen with native PCI GPUs, especially considering that PCI GPUs can at least pass r600_ring_test with the patch.

Also, after the r600_ring_test failure message, instead of the message saying GPU acceleration is disabled, we get a new message about r600 startup failing on resume.

This is why it is believed that fixing PCI GPUs may not be enough to fix AGP GPUs running as PCI ones when AGP is disabled at kernel build time.
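For readers unfamiliar with what a DMA bit mask restricts, here is a small Python sketch of the arithmetic. `dma_bit_mask` mirrors the usual `(1 << n) - 1` definition of an n-bit mask; the helper names are made up for this example, and the 40-bit width in the usage note is only an illustrative wider mask, not a value taken from this report.

```python
GIB = 1 << 30

def dma_bit_mask(bits):
    """Highest bus address reachable with an n-bit DMA mask."""
    return (1 << bits) - 1

def reachable(addr, bits):
    """True if a device limited to an n-bit DMA mask can address addr."""
    return addr <= dma_bit_mask(bits)
```

With a 32-bit mask, `reachable(3 * GIB, 32)` is true but `reachable(5 * GIB, 32)` is false, which matters on a host with 16 GB of RAM like the one described in comment #2; with a wider mask such as 40 bits, both addresses are reachable.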

Here are the issues that are only seen with AGP-as-PCI GPUs, occurring before and after the r600_ring_test failure message:

```
[   45.763336] radeon 0000:01:00.0: ring 0 stalled for more than 10256msec
[   45.763349] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   46.275324] radeon 0000:01:00.0: ring 0 stalled for more than 10768msec
[   46.275335] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   46.787322] radeon 0000:01:00.0: ring 0 stalled for more than 11280msec
[   46.787332] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   47.299336] radeon 0000:01:00.0: ring 0 stalled for more than 11792msec
[   47.299346] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   47.811320] radeon 0000:01:00.0: ring 0 stalled for more than 12304msec
[   47.811332] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   48.323331] radeon 0000:01:00.0: ring 0 stalled for more than 12816msec
[   48.323344] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   48.835307] radeon 0000:01:00.0: ring 0 stalled for more than 13328msec
[   48.835318] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   49.347328] radeon 0000:01:00.0: ring 0 stalled for more than 13840msec
[   49.347341] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   49.859316] radeon 0000:01:00.0: ring 0 stalled for more than 14352msec
[   49.859326] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   50.371471] radeon 0000:01:00.0: ring 0 stalled for more than 14864msec
[   50.371483] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   50.883318] radeon 0000:01:00.0: ring 0 stalled for more than 15376msec
[   50.883328] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   51.395315] radeon 0000:01:00.0: ring 0 stalled for more than 15888msec
[   51.395327] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   51.907325] radeon 0000:01:00.0: ring 0 stalled for more than 16400msec
[   51.907338] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   52.419319] radeon 0000:01:00.0: ring 0 stalled for more than 16912msec
[   52.419330] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   52.931321] radeon 0000:01:00.0: ring 0 stalled for more than 17424msec
[   52.931331] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   53.443321] radeon 0000:01:00.0: ring 0 stalled for more than 17936msec
[   53.443333] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   53.955335] radeon 0000:01:00.0: ring 0 stalled for more than 18448msec
[   53.955346] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   54.467324] radeon 0000:01:00.0: ring 0 stalled for more than 18960msec
[   54.467333] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   54.979306] radeon 0000:01:00.0: ring 0 stalled for more than 19472msec
[   54.979316] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   55.491309] radeon 0000:01:00.0: ring 0 stalled for more than 19984msec
[   55.491318] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   56.003337] radeon 0000:01:00.0: ring 0 stalled for more than 20496msec
[   56.003347] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   56.515327] radeon 0000:01:00.0: ring 0 stalled for more than 21008msec
[   56.515337] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   57.027325] radeon 0000:01:00.0: ring 0 stalled for more than 21520msec
[   57.027335] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   57.539315] radeon 0000:01:00.0: ring 0 stalled for more than 22032msec
[   57.539327] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   58.051318] radeon 0000:01:00.0: ring 0 stalled for more than 22544msec
[   58.051328] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   58.563304] radeon 0000:01:00.0: ring 0 stalled for more than 23056msec
[   58.563314] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   59.075306] radeon 0000:01:00.0: ring 0 stalled for more than 23568msec
[   59.075315] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   59.587308] radeon 0000:01:00.0: ring 0 stalled for more than 24080msec
[   59.587317] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   60.099321] radeon 0000:01:00.0: ring 0 stalled for more than 24592msec
[   60.099331] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   60.611309] radeon 0000:01:00.0: ring 0 stalled for more than 25104msec
[   60.611318] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   61.123314] radeon 0000:01:00.0: ring 0 stalled for more than 25616msec
[   61.123324] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   61.635321] radeon 0000:01:00.0: ring 0 stalled for more than 26128msec
[   61.635331] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   62.147328] radeon 0000:01:00.0: ring 0 stalled for more than 26640msec
[   62.147338] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   62.659314] radeon 0000:01:00.0: ring 0 stalled for more than 27152msec
[   62.659323] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   63.171313] radeon 0000:01:00.0: ring 0 stalled for more than 27664msec
[   63.171327] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   63.683340] radeon 0000:01:00.0: ring 0 stalled for more than 28176msec
[   63.683352] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   64.195326] radeon 0000:01:00.0: ring 0 stalled for more than 28688msec
[   64.195336] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   64.707316] radeon 0000:01:00.0: ring 0 stalled for more than 29200msec
[   64.707325] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   65.219312] radeon 0000:01:00.0: ring 0 stalled for more than 29712msec
[   65.219321] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   65.731325] radeon 0000:01:00.0: ring 0 stalled for more than 30224msec
[   65.731334] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   66.243296] radeon 0000:01:00.0: ring 0 stalled for more than 30736msec
[   66.243305] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   66.755306] radeon 0000:01:00.0: ring 0 stalled for more than 31248msec
[   66.755317] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   66.840372] radeon 0000:01:00.0: Saved 25 dwords of commands on ring 0.
[   66.840402] radeon 0000:01:00.0: GPU softreset: 0x00000019
[   66.840408] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA27034A1
[   66.840414] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000102
[   66.840419] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200028C0
[   66.840424] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x04000000
[   66.840429] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010100
[   66.840434] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008C80
[   66.840438] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x808182E7
[   66.840443] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   67.364934] radeon 0000:01:00.0: Wait for MC idle timedout !
[   67.364940] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007F6B
[   67.365005] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[   67.367106] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0x00003028
[   67.367110] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
[   67.367114] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200028C0
[   67.367118] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   67.367122] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   67.367126] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   67.367130] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[   67.367134] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   67.367152] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[   67.842179] radeon 0000:01:00.0: Wait for MC idle timedout !
[   68.068765] radeon 0000:01:00.0: Wait for MC idle timedout !
[   68.082273] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
[   68.082448] radeon 0000:01:00.0: WB enabled
[   68.082454] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00
[   68.082459] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
[   68.088977] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c598
[   68.374095] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[   68.374176] [drm:rv770_resume [radeon]] *ERROR* r600 startup failed on resume
```

See more details in dmesg log attached.
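The long run of "ring 0 stalled" messages can be condensed with a short script. This is a minimal Python sketch assuming the message format shown in the excerpt above; `parse_stalls` and `stall_summary` are names made up for this example.

```python
import re

# Matches lines such as:
# [   45.763336] radeon 0000:01:00.0: ring 0 stalled for more than 10256msec
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] radeon \S+ ring (\d+) stalled for more than (\d+)msec"
)

def parse_stalls(dmesg_text):
    """Return (timestamp_s, ring, stalled_ms) for each stall message."""
    return [
        (float(ts), int(ring), int(ms))
        for ts, ring, ms in STALL_RE.findall(dmesg_text)
    ]

def stall_summary(stalls):
    """Summarise stall messages: count, first and last value, mean step."""
    if not stalls:
        return None
    values = [ms for _, _, ms in stalls]
    steps = [b - a for a, b in zip(values, values[1:])]
    return {
        "count": len(values),
        "first_ms": values[0],
        "last_ms": values[-1],
        "mean_step_ms": sum(steps) / len(steps) if steps else 0.0,
    }
```

In the excerpt above the reported value grows by 512 ms between consecutive messages, from 10256 msec to 31248 msec, before the driver attempts the soft reset.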

Thomas Debesse (illwieckz) wrote on 2020-11-05:

#4

To get a better picture of the performance of such a top-of-the-line AGP GPU, we can compare it with other GPUs in the Unvanquished GPU compatibility matrix: https://wiki.unvanquished.net/wiki/GPU_compatibility_matrix

There we can see the ATI Radeon HD 4670 AGP (RV730 XT, TeraScale 1) performs:

- better than the PCI Express ATI Radeon HD 7450 from Q1 2012 (RV910, Caicos, TeraScale 2),
- like the mobile Nvidia GeForce GT 740M from Q2 2013 with nvidia driver (NVE7, GK107M, Kepler),
- like the mobile Quadro K1100M from Q3 2013 with nvidia driver (NVE7, GK107GLM, Kepler),
- like the integrated Intel HD 4600 from Q1 2014 (i7-4810MQ, Haswell, Gen7 GT2),
- like the integrated Intel HD 520 from Q3 2015 (i3-6100U, Skylake, Gen9 GT2),
- like the PCI Express GeForce GTX 1050 Ti from Q4 2016 when running the nouveau driver (Pascal).

On the Nvidia side, outperforming this GPU on Linux with the free and open-source nouveau driver requires at least a GeForce GTX 1060 from 2016 (NV136, GP106-300-A1, Pascal).

Intel users may have had to wait for the UHD 600 series (2016) to outperform this ATI AGP GPU. To date, the first verified Intel GPU known to outperform this ATI AGP GPU is the UHD 620 from Q3 2019.

Thomas Debesse (illwieckz) wrote on 2020-11-05:

#5

It looks like comment #3 was truncated; the interesting part of the dmesg log that is missing is:

```

[   66.755306] radeon 0000:01:00.0: ring 0 stalled for more than 31248msec
[   66.755317] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x0000000000000002 on ring 0)
[   66.840372] radeon 0000:01:00.0: Saved 25 dwords of commands on ring 0.
[   66.840402] radeon 0000:01:00.0: GPU softreset: 0x00000019
[   66.840408] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA27034A1
[   66.840414] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000102
[   66.840419] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200028C0
[   66.840424] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x04000000
[   66.840429] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010100
[   66.840434] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008C80
[   66.840438] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x808182E7
[   66.840443] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   67.364934] radeon 0000:01:00.0: Wait for MC idle timedout !
[   67.364940] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007F6B
[   67.365005] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[   67.367106] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0x00003028
[   67.367110] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
[   67.367114] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200028C0
[   67.367118] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[   67.367122] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[   67.367126] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[   67.367130] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[   67.367134] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   67.367152] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[   67.842179] radeon 0000:01:00.0: Wait for MC idle timedout !
[   68.068765] radeon 0000:01:00.0: Wait for MC idle timedout !
[   68.082273] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
[   68.082448] radeon 0000:01:00.0: WB enabled
[   68.082454] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00
[   68.082459] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
[   68.088977] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c598
[   68.374095] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[   68.374176] [drm:rv770_resume [radeon]] *ERROR* r600 startup failed on resume
```

This is what happens when applying the patch to force 32-bit DMA bit mask on PCI devices.
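To see what the soft reset actually changed, the two register dumps in the log can be compared. This is a minimal Python sketch assuming the `R_xxxxxx_NAME = 0x...` line format shown above; `register_dumps` and `changed_registers` are names made up for this example, and a new dump is assumed to start whenever a register name repeats.

```python
import re

# Matches register lines such as "R_008010_GRBM_STATUS      = 0xA27034A1".
REG_RE = re.compile(r"(R_[0-9A-F]{6}_\w+)\s*=\s*(0x[0-9A-Fa-f]+)")

def register_dumps(dmesg_text):
    """Split register lines into successive dumps (name -> value)."""
    dumps, current = [], {}
    for name, value in REG_RE.findall(dmesg_text):
        if name in current:
            # A repeated register name marks the start of the next dump.
            dumps.append(current)
            current = {}
        current[name] = int(value, 16)
    if current:
        dumps.append(current)
    return dumps

def changed_registers(before, after):
    """Return {name: (old, new)} for registers whose value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
```

In the log above, `R_000E50_SRBM_STATUS` and `R_00D034_DMA_STATUS_REG` keep the same value across the reset while the GRBM and CP status registers change; this sketch only reports that difference and does not interpret the bits.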

Thomas Debesse (illwieckz) wrote on 2020-11-05:

#6

On a side note, because we see a clear difference in behaviour when applying the PCI patch, we can assume the driver takes the `rdev->flags & RADEON_IS_PCI` branch instead of the `rdev->flags & RADEON_IS_AGP` one when running an AGP GPU with AGP disabled in the kernel at build time.
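The assumption can be pictured with a toy model. This Python sketch is not the driver's code: the bit values in `BusFlag` are illustrative placeholders rather than the kernel's definitions, and `effective_flags` and `code_path` are hypothetical helpers that only encode the behaviour guessed at in this comment.

```python
from enum import IntFlag

class BusFlag(IntFlag):
    # Illustrative bit values, not copied from the kernel headers.
    IS_AGP = 1 << 0
    IS_PCI = 1 << 1
    IS_PCIE = 1 << 2

def effective_flags(hw_is_agp, agp_compiled_in):
    """Hypothetical model: an AGP card is flagged as PCI when the kernel
    was built without AGP support."""
    if hw_is_agp and agp_compiled_in:
        return BusFlag.IS_AGP
    return BusFlag.IS_PCI

def code_path(flags):
    """Mimic the order of the two flag tests mentioned in this comment."""
    if flags & BusFlag.IS_AGP:
        return "agp"
    if flags & BusFlag.IS_PCI:
        return "pci"
    return "other"
```

Under this model, `code_path(effective_flags(True, False))` returns `"pci"`, which would explain why a patch targeting PCI GPUs changes the behaviour of an AGP card.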

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed
tags:	added: amd64 focal kernel-bug

Thomas Debesse (illwieckz) on 2020-11-05

summary:

- AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to
- fail on K8 and K10 platforms
+ AGP GPUs driven as PCI ones (when AGP is disabled at kernel build time)
+ are known to fail on K8 and K10 platforms

Thomas Debesse (illwieckz) wrote on 2021-05-13:

#7

As said there: https://lkml.org/lkml/2021/5/13/752

The bug was also reproduced on an Intel Kentsfield platform (Core 2 Quad Q6600 with VIA PT880/VT82xx) with R300 and TeraScale GPUs.

summary:

AGP GPUs driven as PCI ones (when AGP is disabled at kernel build time)
- are known to fail on K8 and K10 platforms
+ are known to fail on AMD K8, K10 and Intel Kentsfield platforms
