[radeon] GUI becomes slow and hangs with radeon, but not with amdgpu

Bug #1975566 reported by Sebastian Schauenburg
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Recently I upgraded my machine and moved my GPU over from the old machine (Intel something) to the new one (AMD Ryzen 7 5700X, B550 motherboard). I did a fresh Ubuntu install (server) and added packages to get my GUI rolling (lightdm + awesomewm), just like before.

The graphics card is a "Gigabyte GV-R928XOC-3GD Rev. 2.0", which has the R9 280X chip (TAHITI / Southern Islands). On my old system, I could play games with Vulkan, but with this system it doesn't work.

If I run the 'default' setup, it uses the 'radeon' kernel module. For the first couple of seconds I get reasonable performance (glxgears @ 60 FPS), then it suddenly drops to 4 FPS and motion is barely visible.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: xorg (not installed)
Uname: Linux 5.18.0-051800-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5CheckResult: pass
Date: Tue May 24 06:49:56 2022
DistUpgraded: Fresh install
DistroCodename: jammy
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X] [1002:6798] (prog-if 00 [VGA controller])
   Subsystem: Gigabyte Technology Co., Ltd Tahiti XTL [Radeon R9 280X OC] [1458:3001]
InstallationDate: Installed on 2022-05-14 (9 days ago)
InstallationMedia: Ubuntu-Server 22.04 LTS "Jammy Jellyfish" - Release amd64 (20220421)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.18.0-051800-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro
SourcePackage: xorg
Symptom: display
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/25/2022
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: P2.30
dmi.board.name: B550 Phantom Gaming-ITX/ax
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrP2.30:bd02/25/2022:br5.17:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnB550PhantomGaming-ITX/ax:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.110-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 22.0.1-1ubuntu2
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.3-2ubuntu2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2build3
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Sebastian Schauenburg (sschauenburg) wrote :
Sebastian Schauenburg (sschauenburg) wrote :

Already tried out some different things:
- Tried the latest linux kernel
- Fiddled with amdgpu & radeon kernel module options
- Tried running the amdgpu kernel module instead of radeon

Had to SSH into the machine to do anything (usually the system locks up).

Some error messages from the fiddling:
# kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=19833, emitted seq=19835
# kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1758 thread Xorg:cs0 pid 1798

# kernel: [drm] Fence fallback timer expired on ring sdma1
# kernel: [drm] Fence fallback timer expired on ring gfx
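
For anyone wanting to pull the same kind of messages out of their own logs, here is a minimal sketch. It assumes a systemd journal is available and that the two message patterns quoted above are the ones of interest; the function name `filter_gpu_errors` is made up for illustration.

```shell
# Keep only the GPU timeout messages from a kernel log read on stdin.
# The patterns are taken from the messages quoted in this comment.
filter_gpu_errors() {
    grep -E 'amdgpu_job_timedout|Fence fallback timer expired'
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k -b0 | filter_gpu_errors
```

Reading from stdin keeps the filter usable on a saved journal.txt as well as on live journalctl output.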

Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report. Please run:

  sudo apt install mesa-utils vulkan-tools
  glxinfo > glxinfo.txt
  vulkaninfo > vulkaninfo.txt

and attach the resulting text files here.

affects: xorg (Ubuntu) → mesa (Ubuntu)
Changed in mesa (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

The errors in comment #2 are kernel bugs so please also try the official Ubuntu kernel (5.15) instead.

Sebastian Schauenburg (sschauenburg) wrote :
Sebastian Schauenburg (sschauenburg) wrote :
Sebastian Schauenburg (sschauenburg) wrote :
Sebastian Schauenburg (sschauenburg) wrote :

Noticed something quite weird when using the default kernel module (radeon).

- When running glxgears, it starts running normally, but stalls quickly. If I trigger activity (highlighting part of my terminal while moving the mouse around), the frame rate shoots up and it runs normally. When I stop moving the mouse (while still highlighting a piece of the terminal), it stops running again (5 FPS, not being redrawn).
- When using Firefox, it appears to 'stall' regularly. But if I hide the window, the page seems to keep being processed (for example when visiting a site), and when I show the window again, it is updated. This does not happen while the window is shown; it just appears to hang.

Additional piece of info: I have a 4K 60 Hz monitor.

Daniel van Vugt (vanvugt) wrote :

OK so multiple problems there, which may have the same root cause:

* Your shell/compositor seems to be causing stalling.

* vulkaninfo says your GPU is "not using the AMDGPU kernel driver: Invalid argument (VK_ERROR_INCOMPATIBLE_DRIVER)"

Sounds fundamentally like the kernel graphics driver isn't working. Please try the supported Ubuntu kernel instead.

Also what desktop environment is this?

Sebastian Schauenburg (sschauenburg) wrote :

I am currently using AwesomeWM ( https://awesomewm.org/ ) with LightDM.

- when I ran the vulkaninfo this time, I temporarily switched back to 'stock' config (which is radeon, not amdgpu). So that might be correct. Should I switch to amdgpu and run it again?
- between first post and now, I also switched back to the stock jammy kernel
- thanks for the pointer about the compositor, didn't look into that yet. Checked and I didn't have a compositor installed (probably). So I installed "compton" as a compositor again (used it as well on my previous system), but that didn't help one bit (made it worse actually). So I uninstalled it again.

Daniel van Vugt (vanvugt) wrote :

I don't know much about AMD GPUs but from what I've seen there is usually only one kernel graphics driver that works for a given GPU.

After you have switched back to a 'stock' config and kernel, what does vulkaninfo report? Does the "bad performance" persist?

Timo Aaltonen (tjaalton) wrote :

As I understand it, the radv Vulkan driver requires the amdgpu kernel driver, so I don't know how you could have had it working before the upgrade.

The default driver for SI cards is radeon, and that won't change, so essentially we don't support Vulkan on those.
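
Since radv depends on the amdgpu kernel driver, it can help to confirm which kernel driver is actually bound to the card before reading anything into vulkaninfo output. A minimal sketch, assuming the GPU shows up as card0 under /sys/class/drm (it may be card1 on some systems). The function name `drm_driver` and its optional sysfs-root argument are hypothetical; the latter exists only so the function can be tried against a fake directory tree.

```shell
# Print the kernel driver bound to a DRM card, e.g. "radeon" or "amdgpu".
# $1: card name (default: card0)
# $2: sysfs root (default: /sys; an assumption-friendly override for testing)
drm_driver() {
    card="${1:-card0}"
    sysfs="${2:-/sys}"
    link="$sysfs/class/drm/$card/device/driver"
    # The "driver" entry is a symlink to the bound driver's directory.
    [ -e "$link" ] || { echo "unknown"; return 1; }
    basename "$(readlink -f "$link")"
}

# Usage on a live system:
#   drm_driver card0
```

`lspci -k` reports the same information on its "Kernel driver in use" line.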

Daniel van Vugt (vanvugt) wrote :

Before the upgrade it was an Intel GPU.

Sebastian Schauenburg (sschauenburg) wrote :

Sorry for the late reply.

Tonight I installed the full ubuntu-desktop package (with a "clean" test user) and experienced crashes/hangs and other weird behaviour with the stock setup. When I disabled Wayland in GDM3, it got a little better, until that had issues as well.

Note: I went from a 7+ year old Intel CPU -> AMD Ryzen CPU (GPU stayed the same)

What kind of information is useful to provide, including which logs? Would it help if I provided journalctl logs with and without Wayland, up to the moment of the crash/hang?

P.S. thanks for trying to help me out here

Daniel van Vugt (vanvugt) wrote :

Oh I see you started with Ubuntu Server.

Please:

1. Don't disable Wayland. Wayland should be enabled.

2. Answer the previous question: After you have switched back to a 'stock' config and kernel, what does vulkaninfo report?

3. Try booting desktop Ubuntu from USB: https://ubuntu.com/download/desktop If that just works then we're probably wasting time here figuring out how to un-customise this server install. Just install desktop instead.

Sebastian Schauenburg (sschauenburg) wrote :

Here's the vulkaninfo. I got lucky, since the entire GUI hung shortly after (don't think it's related to running this specific command though).

Reverted the Wayland (GDM) change, so it's stock behaviour. And yes, I switched back to stock config and kernel.

Will try the USB live version shortly.

Daniel van Vugt (vanvugt) wrote :

The attachment in comment #16 shows Vulkan is now working, but only with software rendering(?) so that's going to be slow...
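
One rough way to see whether a vulkaninfo dump lists a software device is to look at the device name and type lines. The sketch below assumes the dump contains `deviceName` and `deviceType` fields and that a software rasterizer reports `PHYSICAL_DEVICE_TYPE_CPU`; the function name `vulkan_devices` is made up for illustration.

```shell
# Print the device name/type lines from vulkaninfo output read on stdin,
# with leading whitespace removed.
vulkan_devices() {
    grep -E 'deviceName|deviceType' | sed 's/^[[:space:]]*//'
}

# Usage with the file produced earlier in this report:
#   vulkan_devices < vulkaninfo.txt
```

A system can list both a hardware and a CPU device, so a CPU entry alone does not prove the GPU is unused.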

Please run these again:

  lspci -k > lspci.txt
  journalctl -b0 > journal.txt
  glxinfo > glxinfo.txt

and attach the resulting text files here.

Sebastian Schauenburg (sschauenburg) wrote :

With the live USB stick, there are also Xorg issues. Even got a nice crash report.

Daniel van Vugt (vanvugt) wrote :

We can't read crash files. You need to use the 'ubuntu-bug' or 'apport-cli' tools to upload crash files.

Sebastian Schauenburg (sschauenburg) wrote :

Here are logs from 3 separate types of boot (actually I had to reboot at least 20 times, because I had a lot of crashes/hangs):

1) USB safemode boot (since the normal USB boot didn't work ~ 5 times)
2) USB normal boot (it worked, don't know why, but I'm not complaining)
3) normal installed boot (could log in to GDM, but was not able to start a terminal. Had to switch to VT3 and back to get something useful from the system. Eventually I logged in via SSH. So I was not able to get glxinfo, hope the glxinfo from "2)" suffices.)

Daniel van Vugt (vanvugt) wrote :

1) Safe mode is expected to fail because it disables graphics drivers.
2) Live session all working fine.
3) Graphics working but not able to start a terminal.

So the only problem here is the last one where you couldn't start a terminal. Did you try Ctrl+Alt+T ? Did the shell respond to mouse clicks? Were you able to launch any other apps?

summary: - radeon / amdgpu bad performance or panics
+ Unknown bug
affects: mesa (Ubuntu) → ubuntu
Daniel van Vugt (vanvugt) wrote : Re: Unknown bug

Please also try selecting 'Ubuntu on Xorg' on the login screen in case the problem is specific to Wayland sessions.

Sebastian Schauenburg (sschauenburg) wrote :

Re-reading my previous post, I need to clarify.

About 1) and 2):
I tried to USB boot (normal) and it didn't work. I got many GUI hangs/crashes. After ~5-10 times I got fed up with it and tried a USB (safe) boot, which worked. Then I tried the USB (normal) boot again and it suddenly worked.

About 3):
If the GUI works, it hangs quickly, though never after the same interval (it probably crashes in the background), and does not recover. I seem to remember that sometimes even switching to a VT didn't work. I was surprised that remote SSH login worked, but in the end most reboots had to be 'hard' (e.g. the reboot command didn't completely finish).
When the GUI hangs, the mouse cursor is static and nothing can be done. Alt-tabbing also did not work.

But, I found a workaround!
I installed the "5.18.0-051800-generic" kernel from https://wiki.ubuntu.com/Kernel/MainlineBuilds
and used this modprobe file with options:
# disable the older 'radeon' kernel module
blacklist radeon
# disable audio
options amdgpu audio=0
# magic options for my TAHITI R9 280X
options amdgpu si_support=1
options amdgpu cik_support=1
options amdgpu dpm=0
options amdgpu aspm=0
options amdgpu runpm=0
options amdgpu bapm=0
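
To check after a reboot that options like the ones above actually took effect, something along these lines may help. It is only a sketch under several assumptions: that the options file lives under /etc/modprobe.d/ (the filename is not given in this report), that the initramfs was regenerated with `sudo update-initramfs -u` before rebooting, and that the loaded amdgpu module exposes these parameters under /sys/module/amdgpu/parameters. The function name `amdgpu_params` and its sysfs-root argument are hypothetical.

```shell
# Print "name=value" for the amdgpu parameters set in the modprobe file.
# $1: sysfs root (default: /sys; overridable so the function can be tested)
amdgpu_params() {
    dir="${1:-/sys}/module/amdgpu/parameters"
    for p in si_support cik_support dpm aspm runpm bapm; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            # Parameter file missing: module not loaded or parameter not exposed.
            printf '%s=(not available)\n' "$p"
        fi
    done
}

# Usage on a live system:
#   amdgpu_params
```

If every line reads "(not available)", the amdgpu module is most likely not loaded at all, which would point back at radeon still being in use.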

Daniel van Vugt (vanvugt) wrote :

That's the same kernel version you were using in the first place. So it sounds like the only change made was to switch from radeon to amdgpu and tweak some parameters.

Does the same change work for the stock Ubuntu kernel (5.15)?

summary: - Unknown bug
+ [radeon] GUI becomes slow and hangs in 5.18.0-051800-generic with
+ radeon, but not with amdgpu
Sebastian Schauenburg (sschauenburg) wrote : Re: [radeon] GUI becomes slow and hangs in 5.18.0-051800-generic with radeon, but not with amdgpu

Have switched back to stock 5.15.0-37-generic kernel and it appears to be stable as well :)

summary: - [radeon] GUI becomes slow and hangs in 5.18.0-051800-generic with
- radeon, but not with amdgpu
+ [radeon] GUI becomes slow and hangs with radeon, but not with amdgpu
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Incomplete → New
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed