radeon: GPU lockup when restarting a video on RV730

Bug #1900854 reported by Philippe Coval
This bug affects 1 person
Affects          Status   Importance  Assigned to  Milestone
Linux            New      Unknown
linux (Ubuntu)   Triaged  Medium      Unassigned

Bug Description

To reproduce, run the following in a terminal (or open the same file locally):
mpv https://conf.tube/download/videos/ea60f030-90c1-4e8e-9782-bef14dd3b1d1-1080.mp4

Stop the video by pressing Esc. Run the command again: the process hangs for a while while creating the window frame, then the video starts, but its first frame flickers as a mess of triangles. The system remains usable, and the audio keeps playing.

Log from when the issue occurred:
https://launchpadlibrarian.net/502954026/CurrentDmesg.txt

---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu50
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: rzr 4431 F.... pulseaudio
 /dev/snd/pcmC0D0p: rzr 4431 F...m pulseaudio
 /dev/snd/controlC1: rzr 4431 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: KDE
DistroRelease: Ubuntu 20.10
IwConfig:
 lo no wireless extensions.

 enp2s0 no wireless extensions.

 docker0 no wireless extensions.
MachineType: Dell Inc. Precision T1500
Package: linux (not installed)
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-25-generic root=UUID=036d3d8c-98d4-48ec-a18b-dce21dcf0bb5 ro initrd=initrd.gz splash quiet vt.handoff=7
ProcVersionSignature: Ubuntu 5.8.0-25.26-generic 5.8.14
RebootRequiredPkgs:
 linux-image-unsigned-5.9.1-050901-generic
 linux-base
 linux-image-unsigned-5.9.1-050901-lowlatency
 linux-base
RelatedPackageVersions:
 linux-restricted-modules-5.8.0-25-generic N/A
 linux-backports-modules-5.8.0-25-generic N/A
 linux-firmware 1.190
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
Tags: groovy
Uname: Linux 5.8.0-25-generic x86_64
UpgradeStatus: Upgraded to groovy on 2020-10-13 (8 days ago)
UserGroups: adm cdrom dialout dip docker lpadmin netdev plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 01/13/2011
dmi.bios.release: 8.15
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.4.0
dmi.board.name: 0XC7MM
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.4.0:bd01/13/2011:br8.15:svnDellInc.:pnPrecisionT1500:pvr00:rvnDellInc.:rn0XC7MM:rvrA00:cvnDellInc.:ct3:cvr:
dmi.product.family: 0
dmi.product.name: Precision T1500
dmi.product.sku: 0
dmi.product.version: 00
dmi.sys.vendor: Dell Inc.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1900854

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
penalvch (penalvch)
tags: added: apport-collected latest-bios-2.4.0
tags: added: needs-upstream-testing
penalvch (penalvch) wrote :

Philippe Coval, in order to allow additional upstream mainline kernel developers to examine the issue, at your earliest convenience, could you please test the latest mainline kernel? Please keep in mind the following:
1) The one to test is in a folder at the very top of the page (not the daily folder).
2) The release names are irrelevant.
3) The folder time stamps aren't indicative of when the kernel actually was released upstream.
4) Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds .

If testing on your main install would be inconvenient, one may:
1) Install Ubuntu to a different partition and then test this there.
2) Backup, or clone the primary install.

If the latest kernel did not allow you to test for the issue (e.g. you couldn't boot into the OS), please make a comment in your report about this, and continue to test the next most recent kernel version until you can test for the issue. Once you've tested the mainline kernel, please comment on which kernel version specifically you tested. If this issue is not reproducible in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the Bug Description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Where X and Y are the first two numbers of the kernel version, and Z is the release candidate number, if any.

If the issue is reproducible with the mainline kernel, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ

Please note that a failure to install the kernel does not meet the criteria for kernel-bug-exists-upstream.
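As a rough illustration of the X.Y-rcZ naming rule, the POSIX shell sketch below derives the tag suffix from a kernel version string. The function name `upstream_tag_suffix` is hypothetical (it is not part of apport or any Ubuntu tool), and it assumes version strings shaped like `5.9.1-050901-generic` (as used in this report) or a mainline folder name like `v5.10-rc1`; other formats may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Ubuntu tool): derive the "X.Y" or
# "X.Y-rcZ" suffix used in kernel-fixed-upstream-* and
# kernel-bug-exists-upstream-* tags from a kernel version string.
upstream_tag_suffix() {
    ver=${1#v}    # drop a leading "v", as in an assumed folder name like v5.10-rc1
    # X.Y: the first two dot-separated numbers.
    xy=$(printf '%s\n' "$ver" | sed -n 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p')
    # Z: the digits following "rc", if the string names a release candidate.
    rc=$(printf '%s\n' "$ver" | sed -n 's/.*rc\([0-9][0-9]*\).*/\1/p')
    if [ -n "$rc" ]; then
        printf '%s-rc%s\n' "$xy" "$rc"
    else
        printf '%s\n' "$xy"
    fi
}

# Example: print both tags for the running kernel.
# suffix=$(upstream_tag_suffix "$(uname -r)")
# echo "kernel-bug-exists-upstream kernel-bug-exists-upstream-$suffix"
```

Following the rule as written, the 5.9.1-050901-generic mainline kernel tested in this report maps to the suffix `5.9`, i.e. the tag `kernel-bug-exists-upstream-5.9`.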

Also, you don't need to apport-collect further unless specifically requested to do so.

In addition, to keep this issue relevant to upstream, please continue to test the latest mainline kernel as it becomes available.

Lastly, it is most helpful that after testing of the latest mainline kernel is complete, you mark this report Status Confirmed.

Thank you for your help.

Changed in linux (Ubuntu):
importance: Undecided → Medium
penalvch (penalvch)
tags: added: regression-potential
Philippe Coval (rzr) wrote (one attachment per comment, each labelled "apport information"):
 AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt,
 Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, PaInfo.txt, ProcCpuinfo.txt,
 ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt,
 ProcModules.txt, PulseList.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

description: updated

Philippe Coval (rzr) wrote :

Those previous files were generated on the current kernel (not mainline) before the issue occurred.

I am about to send a new set after reproducing the problem.

Then I will upgrade the kernel to mainline and send a new batch.

description: updated
Philippe Coval (rzr) wrote (second set, after reproducing the problem; one attachment per comment, each labelled "apport information"):
 AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt,
 Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, PaInfo.txt, ProcCpuinfo.txt,
 ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt,
 ProcModules.txt, PulseList.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

Philippe Coval (rzr) wrote :

Maybe I should forward this issue upstream; my dmesg is full of:

  Oct 21 18:33:09 cis kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
  Oct 21 18:33:09 cis kernel: radeon 0000:01:00.0: couldn't schedule ib

I should also mention that mpv is using VAAPI hardware decoding.

tags: added: kernel-bug-exists-upstream-5.9.1-050901-generic
Philippe Coval (rzr)
tags: added: kernel-bug-exists-upstream
removed: kernel-bug-exists-upstream-5.9.1-050901-generic
tags: added: kernel-bug-exists-upstream-5.9.1-050901-generic
tags: added: kernel-bug-exists-upstream-5.9.1
removed: kernel-bug-exists-upstream-5.9.1-050901-generic
penalvch (penalvch) wrote :

Philippe Coval, please advise to all of the following:

1) Did this issue not occur in a prior release of Ubuntu?

2) Is this still reproducible if the file is local versus streamed from a website?

Philippe Coval (rzr) wrote :

Local files do not work any better.

I saw you tagged this as a regression, but I don't think I have ever seen this working properly since I got this system.

I know it also affected Ubuntu 20.04 after I upgraded from 19.10, as stated at:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574130?comments=all

I don't think I tested on earlier Ubuntu versions, but I just booted Debian 10 (installed from current i386 packages) to run a quick test on an earlier kernel (4.19.0-11-686): the display does not break (but it is not accelerated, according to the logs).

Some interesting bits from the log:

root@host:~# Xorg

X.Org X Server 1.20.4
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.19.0-10-amd64 i686 Debian
Current Operating System: Linux host 4.19.0-11-686 #1 SMP Debian 4.19.146-1 (2020-09-17) i686
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-11-686 root=UUID=60f4c98d-1dac-45af-b3a2-9ff0586a1008 ro quiet
Build Date: 27 August 2020 08:51:48AM
xorg-server 2:1.20.4-1+deb10u1 (https://www.debian.org/support)
Current version of pixman: 0.36.0
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Wed Oct 21 18:32:43 2020
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(II) [KMS] drm report modesetting isn't supported.
error setting MTRR (base = 0x00000000d0000000, size = 0x01000000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x00000000d0000000, size = 0x01000000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x00000000d0000000, size = 0x01000000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x00000000d0000000, size = 0x01000000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x00000000d0000000, size = 0x01000000, type = 1) Invalid argument (22)
error setting MTRR (base = 0x00000000d0000000, size = 0x01000000, type = 1) Invalid argument (22)

DISPLAY=:0 mpv https://conf.tube/download/videos/ea60f030-90c1-4e8e-9782-bef14dd3b1d1-1080.mp4
Playing: https://conf.tube/download/videos/ea60f030-90c1-4e8e-9782-bef14dd3b1d1-1080.mp4
 (+) Video --vid=1 (*) (h264 1920x1080 30.000fps)
 (+) Audio --aid=1 (*) (aac 2ch 48000Hz)
[vo/gpu/opengl] Suspected software renderer or indirect context.
libEGL warning: DRI2: failed to authenticate
[vo/gpu/opengl] Suspected software renderer or indirect context.
[vo/gpu/opengl] Suspected software renderer or indirect context.
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
[vo/vdpau] Error when calling vdp_device_create_x11: 1
[vo/xv] No Xvideo support found.
[vo/sdl] Using opengl
AO: [alsa] 48000Hz stereo 2ch float
VO: [sdl] 1920x1080 yuv420p
AV: 00:01:20 / 00:53:25 (2%) A-V: -0.000 Cache: 679s+159MB

I can stop and run it again without lockups.

To bisect, I could try an older Ubuntu release (bionic).

Philippe Coval (rzr) wrote :

Another test, using 5.9.1-050901-lowlatency: the display no longer breaks, but issues still show up in the logs after the video has been played once:

export DISPLAY=:0 ; while true ; do mpv 'http://youtu.be/9J5CHTFWnTc' ; sleep 10 ; done

 (+) Video --vid=1 (*) (h264 1280x720)
 (+) Audio --aid=1 --alang=eng (*) (aac 2ch 44100Hz)
AO: [pulse] 44100Hz stereo 2ch float
Using hardware decoding (vaapi).
VO: [gpu] 1280x720 vaapi[nv12]
AV: 00:00:33 / 00:00:33 (99%) A-V: 0.000 Cache: 0.0s

Exiting... (End of file)
 (+) Video --vid=1 (*) (h264 1280x720)
 (+) Audio --aid=1 --alang=eng (*) (aac 2ch 44100Hz)
AO: [pulse] 44100Hz stereo 2ch float
Using hardware decoding (vaapi).
VO: [gpu] 1280x720 vaapi[nv12]
radeon: The kernel rejected CS, see dmesg for more information (-22).
radeon: The kernel rejected CS, see dmesg for more information (-22).

Audio/Video desynchronisation detected! Possible reasons include too slow
hardware, temporary CPU spikes, broken drivers, and broken files. Audio
position will not match to the video (see A-V status field).

radeon: The kernel rejected CS, see dmesg for more information (-22).

On later runs, I am not sure VAAPI is still used (yuv420p instead of vaapi[nv12]):

 (+) Video --vid=1 (*) (h264 1280x720)
 (+) Audio --aid=1 --alang=eng (*) (aac 2ch 44100Hz)
AO: [pulse] 44100Hz stereo 2ch float
[ffmpeg/video] h264: No support for codec h264 profile 100.
VO: [gpu] 1280x720 yuv420p
AV: 00:00:31 / 00:00:33 (92%) A-V: 0.000 Cache: 2.1s/433KB

dmesg is pretty much the same as:

https://launchpadlibrarian.net/502954026/CurrentDmesg.txt

[ 99.002309] radeon 0000:01:00.0: ring 5 stalled for more than 10080msec
[ 99.002317] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000003fb last fence id 0x00000000000003ff on ring 5)
[ 99.012046] radeon 0000:01:00.0: couldn't schedule ib
[ 99.012107] [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-22)!
[ 99.028154] radeon 0000:01:00.0: Saved 1433 dwords of commands on ring 0.
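To pull just these radeon messages out of a long kernel log, a filter along the following lines may help. It is only a sketch: the function name `radeon_lockup_lines` is hypothetical, and the patterns are simply the error strings quoted in this report, so other failure modes would not be caught.

```shell
#!/bin/sh
# Hypothetical filter: print only kernel log lines matching the radeon
# errors quoted in this report (ring stall, GPU lockup, IB scheduling
# failures, UVD teardown errors). Reads log text on standard input.
# The "." in "couldn.t" stands in for the apostrophe to keep quoting simple.
radeon_lockup_lines() {
    grep -E 'stalled for more than|GPU lockup|couldn.t schedule ib|Failed to schedule IB|Error destroying UVD'
}

# Example (reading the kernel log may require root on some systems):
# sudo dmesg | radeon_lockup_lines
```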

If it happens again and I manage to get a backtrace around the DRM parts, I'll share it.

Do you need more information?

Meanwhile, let me cross-link this issue to:

https://gitlab.freedesktop.org/drm/amd/-/issues/630

penalvch (penalvch)
tags: added: focal
removed: needs-upstream-testing
penalvch (penalvch)
description: updated
penalvch (penalvch) wrote :

Philippe Coval, could you please address all of the following:

1) In order for upstream to help you, could you please make a net new bug report (not link a report you didn't make) via https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/-/issues/new?issue%5Bassignee_id%5D=&issue%5Bmilestone_id%5D= ? Once done, please advise on the URL.

2) To confirm, did this issue not occur in Ubuntu 19.10 before you upgraded?

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
penalvch (penalvch) wrote :

Philippe Coval (rzr) wrote :

1)
https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/-/issues/192

2) It also occurs on 19.10.
I don't remember any working setup.
Should I try under Windows 10,
or could it be a hardware problem?

Extra info: displaying videos in Firefox does not cause any issue;
I suspect it uses a different acceleration path.

Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
penalvch (penalvch) wrote :

Philippe Coval, one thing that would be helpful is if you could record a short video of the computer screen with a cell phone, covering the moment right before the problem begins, as it starts, and a little while as it is happening. This helps developers root-cause the issue.

tags: added: eoan
Changed in linux:
status: Unknown → New