Kubuntu 18.04.01 - Ryzen 2400G / AMDGPU - Random system freezes

Bug #1804505 reported by Cannot Remember
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am experiencing random lockups that are most likely related to AMDGPU. There is no apparent pattern to when the system hangs, and I cannot provide steps to reproduce the problem. One time it happened as I clicked a link in Firefox: a new tab opened and the system froze. Another time, Firefox was not open; I was about to edit a text file by clicking "F4 Edit" in Krusader when the system froze. When the system freezes, it is completely unresponsive and the only option is to reset and reboot. As far as I can tell, the system generates no error reports; the only apparent information is some entries in the syslog. I have appended the relevant info below. If further info is needed, please ask :)

inxi -Fz:

System: Host: computername Kernel: 4.19.1-041901-generic x86_64 bits: 64 Desktop: KDE Plasma 5.12.6
           Distro: Ubuntu 18.04.1 LTS
Machine: Device: desktop Mobo: ASUSTeK model: PRIME A320M-A v: Rev X.0x serial: N/A
           UEFI: American Megatrends v: 4023 date: 08/20/2018
Battery hidpp__0: charge: 80% condition: NA/NA Wh
CPU: Quad core AMD Ryzen 5 2400G with Radeon Vega Graphics (-MT-MCP-) cache: 2048 KB
           clock speeds: max: 3600 MHz 1: 1454 MHz 2: 1418 MHz 3: 1419 MHz 4: 1419 MHz 5: 1419 MHz 6: 1419 MHz
           7: 1436 MHz 8: 1582 MHz
Graphics: Card: Advanced Micro Devices [AMD/ATI] Vega [Radeon Vega 8 Mobile]
           Display Server: x11 (X.Org 1.19.6 ) drivers: ati,amdgpu (unloaded: modesetting,fbdev,vesa,radeon)
           Resolution: 1680x1050@59.88hz
           OpenGL: renderer: AMD RAVEN (DRM 3.27.0 / 4.19.1-041901-generic, LLVM 6.0.0) version: 4.5 Mesa 18.0.5
Audio: Card-1 Advanced Micro Devices [AMD] Device 15e3 driver: snd_hda_intel
           Card-2 Advanced Micro Devices [AMD/ATI] Device 15de driver: snd_hda_intel
           Sound: Advanced Linux Sound Architecture v: k4.19.1-041901-generic
Network: Card: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller driver: r8169
           IF: enp5s0 state: up speed: 100 Mbps duplex: full mac: <filter>
Drives: HDD Total Size: 250.1GB (26.2% used)
           ID-1: /dev/nvme0n1 model: Samsung_SSD_970_EVO_250GB size: 250.1GB
Partition: ID-1: / size: 228G used: 61G (29%) fs: ext4 dev: /dev/nvme0n1p2
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: System Temperatures: cpu: N/A mobo: N/A gpu: 26.0
           Fan Speeds (in rpm): cpu: 0
Info: Processes: 232 Uptime: 22 min Memory: 1176.8/6960.8MB Client: Shell (bash) inxi: 2.3.56

SYSLOG:

Prior to the first relevant entry posted here there is a CRON entry at 13:18:01, which seems unrelated to the issue. All entries were preceded by "Nov 21 13:23:13 computername kernel: ":

[ 4335.012512] gmc_v9_0_process_interrupt: 13 callbacks suppressed
[ 4335.012516] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012516] )
[ 4335.012521] amdgpu 0000:08:00.0: at address 0x0000000101c0d000 from 27
[ 4335.012523] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012530] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012530] )
[ 4335.012532] amdgpu 0000:08:00.0: at address 0x0000000101c11000 from 27
[ 4335.012534] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012540] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012540] )
[ 4335.012542] amdgpu 0000:08:00.0: at address 0x0000000101c29000 from 27
[ 4335.012544] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012550] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012550] )
[ 4335.012552] amdgpu 0000:08:00.0: at address 0x0000000101c2d000 from 27
[ 4335.012554] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012560] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012560] )
[ 4335.012562] amdgpu 0000:08:00.0: at address 0x0000000101c10000 from 27
[ 4335.012563] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012569] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012569] )
[ 4335.012571] amdgpu 0000:08:00.0: at address 0x0000000101c2c000 from 27
[ 4335.012572] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012579] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012579] )
[ 4335.012581] amdgpu 0000:08:00.0: at address 0x0000000101c2b000 from 27
[ 4335.012582] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012588] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012588] )
[ 4335.012590] amdgpu 0000:08:00.0: at address 0x0000000101c0f000 from 27
[ 4335.012591] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012597] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012597] )
[ 4335.012599] amdgpu 0000:08:00.0: at address 0x0000000101c29000 from 27
[ 4335.012601] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4335.012607] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012607] )
[ 4335.012609] amdgpu 0000:08:00.0: at address 0x0000000101c2a000 from 27
[ 4335.012610] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 4345.036479] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=120458, emitted seq=120461
[ 4345.036486] [drm] GPU recovery disabled.
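
For anyone comparing their own syslog against the burst above, here is a minimal Python sketch that counts how often each faulting address and each faulting process appears. It assumes the kernel lines have the same shape as the ones quoted in this report (other kernel versions word them differently, e.g. "in page starting at address"). The names `summarise_faults` and `SAMPLE_LOG` are made up for illustration and are not part of any Ubuntu or amdgpu tool.

```python
import re
from collections import Counter

# Patterns modelled on the lines quoted in this report (assumed format).
FAULT_ADDR = re.compile(r"amdgpu [0-9a-f:.]+: .*?address (0x[0-9a-fA-F]+) from \d+")
FAULT_PROC = re.compile(r"VMC page fault \(.*?for process (\S+) pid (\d+)")

# A few lines copied from the syslog excerpt above.
SAMPLE_LOG = """\
[ 4335.012516] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 863 thread amdgpu_cs:0 pid 870
[ 4335.012516] )
[ 4335.012521] amdgpu 0000:08:00.0: at address 0x0000000101c0d000 from 27
[ 4335.012542] amdgpu 0000:08:00.0: at address 0x0000000101c29000 from 27
[ 4335.012599] amdgpu 0000:08:00.0: at address 0x0000000101c29000 from 27
"""


def summarise_faults(log_text):
    """Return (addresses, processes) Counters for amdgpu VMC page faults."""
    addresses = Counter()
    processes = Counter()
    for line in log_text.splitlines():
        match = FAULT_ADDR.search(line)
        if match:
            addresses[match.group(1)] += 1
        match = FAULT_PROC.search(line)
        if match:
            processes[(match.group(1), int(match.group(2)))] += 1
    return addresses, processes


if __name__ == "__main__":
    addresses, processes = summarise_faults(SAMPLE_LOG)
    for addr, count in sorted(addresses.items()):
        print(addr, count)
    for (name, pid), count in processes.items():
        print(name, pid, count)
```

To run it on a real log, read the file contents (for example /var/log/syslog) and pass the text to `summarise_faults` in place of the sample.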

Tags: bot-comment
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1804505/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1804505

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.20 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20-rc4

Changed in linux (Ubuntu):
importance: Undecided → Medium
Steven Ellis (steven-openmedia) wrote :

I appear to have a new variant of this issue using the Athlon 200GE CPU

Reference
 - https://bugs.launchpad.net/linux/+bug/1810546

Cannot Remember (dinkidonk) wrote :

I can add that I had lots of video lockups with 4.19.1; a day without one was a wonder! I then upgraded to 4.19.5, which did not help either. During one lockup I noticed that sound from a video continued to play for a while, so I plugged a flash drive into a USB port, and both the insertion and the removal were registered (syslog entries). Since the kernel seemed to keep running even while video was frozen, I thought this might be a Mesa issue, so I upgraded Mesa using the Ubuntu-X Team PPA. That did not help either, so I upgraded to kernel 4.19.9, and this was a HUGE improvement in stability: no lockups for almost 2 weeks! But eventually it locked up, still with the same error being logged. Now I am on kernel 4.20, and there have been no problems for almost a week, but only time will tell... :)

Cannot Remember (dinkidonk) wrote :

Just had a lockup with kernel 4.20 :(

[ 6637.284879] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=84488, emitted seq=84491
[ 6637.284887] [drm] GPU recovery disabled.
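
The two timeout lines in this report (ring gfx earlier, ring sdma0 here) can be compared with a short Python sketch like the following. It assumes the message format shown in the logs above; the helper name `pending_jobs` is hypothetical, and reading the gap between the emitted and signaled sequence numbers as "submissions that had not completed" is an interpretation, not something stated in the thread.

```python
import re

# Pattern modelled on the amdgpu_job_timedout lines quoted in this report.
TIMEOUT = re.compile(
    r"ring (?P<ring>\w+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

# Both timeout lines as they appear in this bug report.
LINES = [
    "[ 4345.036479] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=120458, emitted seq=120461",
    "[ 6637.284879] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=84488, emitted seq=84491",
]


def pending_jobs(line):
    """Return (ring, emitted - signaled) for a timeout line, else None."""
    match = TIMEOUT.search(line)
    if match is None:
        return None
    gap = int(match.group("emitted")) - int(match.group("signaled"))
    return match.group("ring"), gap


if __name__ == "__main__":
    for line in LINES:
        print(pending_jobs(line))
```

For both lines the gap works out to 3, even though the hang was reported on different rings.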

Cannot Remember (dinkidonk) wrote :

And another lockup just a few hours later :'(

[ 8604.056686] gmc_v9_0_process_interrupt: 14 callbacks suppressed
[ 8604.056693] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056700] amdgpu 0000:08:00.0: in page starting at address 0x0000800103402000 from 27
[ 8604.056703] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 8604.056712] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056716] amdgpu 0000:08:00.0: in page starting at address 0x0000800103404000 from 27
[ 8604.056718] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056728] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056731] amdgpu 0000:08:00.0: in page starting at address 0x0000800103402000 from 27
[ 8604.056734] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056746] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056750] amdgpu 0000:08:00.0: in page starting at address 0x0000800103404000 from 27
[ 8604.056753] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056762] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056764] amdgpu 0000:08:00.0: in page starting at address 0x0000800103401000 from 27
[ 8604.056767] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056775] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056778] amdgpu 0000:08:00.0: in page starting at address 0x0000800103406000 from 27
[ 8604.056780] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056788] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056791] amdgpu 0000:08:00.0: in page starting at address 0x0000800103407000 from 27
[ 8604.056793] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056801] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056804] amdgpu 0000:08:00.0: in page starting at address 0x0000800103401000 from 27
[ 8604.056806] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056814] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056817] amdgpu 0000:08:00.0: in page starting at address 0x0000800103409000 from 27
[ 8604.056819] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8604.056827] amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 933 thread Xorg:cs0 pid 945)
[ 8604.056829] amdgpu 0000:08:00.0: in page starting at address 0x00008001...


Kai-Heng Feng (kaihengfeng) wrote :

Please try the latest linux-firmware:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/

New AMDGPU firmware was included in December.
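
Before and after updating, it can help to record which firmware blobs are actually installed. The Python sketch below lists them with size and modification time. It assumes the files live in /lib/firmware/amdgpu and that the blobs for this APU start with `raven_` (inxi above reports the renderer as "AMD RAVEN"); both are assumptions to adjust for your system, and `list_firmware` is a hypothetical helper name. The modification time reflects when the file was written locally, not the upstream commit date.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_firmware(directory="/lib/firmware/amdgpu", prefix="raven_"):
    """Return (name, size_in_bytes, mtime) for matching files, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    rows = []
    for path in sorted(root.glob(prefix + "*")):
        if path.is_file():
            info = path.stat()
            mtime = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append((path.name, info.st_size, mtime))
    return rows


if __name__ == "__main__":
    for name, size, mtime in list_firmware():
        print(f"{mtime:%Y-%m-%d}  {size:>8}  {name}")
```

Comparing the output from before and after the update shows whether the new files were really installed.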

experimancer (experimancer) wrote :

I have it too (random hangs of the whole system, both under load and when idle):
- Ryzen 5 2400G
- Ubuntu 18.04
- kernel 4.18

$ sudo lshw -c video
  *-display
       description: VGA compatible controller
       product: Advanced Micro Devices, Inc. [AMD/ATI]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0b:00.0
       version: c6
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi msix vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:88 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:d000(size=256) memory:fe300000-fe37ffff memory:c0000-dfff

$ lspci -v
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15d0
 Subsystem: ASUSTeK Computer Inc. Device 876b
 Flags: fast devsel

00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 15d1
 Subsystem: Advanced Micro Devices, Inc. [AMD] Device 15d1
 Flags: bus master, fast devsel, latency 0, IRQ 27
 Capabilities: <access denied>

00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
 Flags: fast devsel

00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3 (prog-if 00 [Normal decode])
 Flags: bus master, fast devsel, latency 0, IRQ 28
 Bus: primary=00, secondary=01, subordinate=09, sec-latency=0
 I/O behind bridge: 0000e000-0000efff
 Memory behind bridge: fe400000-fe7fffff
 Capabilities: <access denied>
 Kernel driver in use: pcieport

00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3 (prog-if 00 [Normal decode])
 Flags: bus master, fast devsel, latency 0, IRQ 29
 Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
 Memory behind bridge: fe900000-fe9fffff
 Capabilities: <access denied>
 Kernel driver in use: pcieport

00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
 Flags: fast devsel

00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15db (prog-if 00 [Normal decode])
 Flags: bus master, fast devsel, latency 0, IRQ 26
 Bus: primary=00, secondary=0b, subordinate=0b, sec-latency=0
 I/O behind bridge: 0000d000-0000dfff
 Memory behind bridge: fe000000-fe3fffff
 Prefetchable memory behind bridge: 00000000e0000000-00000000f01fffff
 Capabilities: <access denied>
 Kernel driver in use: pcieport

00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15dc (prog-if 00 [Normal decode])
 Flags: bus master, fast devsel, latency 0, IRQ 26
 Bus: primary=00, secondary=0c, subordinate=0c, sec-latency=0
 Memory behind bridge: fe800000-fe8fffff
 Capabilities: <access denied>
 Kernel driver in use: pcieport

00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
 Subsystem: ASUSTeK Computer Inc. FCH SMBus Controller
 Flags: 66MHz, medium devsel
 Kernel driver in use: piix4_smbus
 Kernel modules: i2c_piix4, sp5100_tco

00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
 Subsystem: ASUSTeK Computer Inc. FCH LPC Bridge
 Flags: bus master, 66MHz, medium devsel, latency 0

00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Devic...

experimancer (experimancer) wrote :

More info about my system:

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 17
model name : AMD Ryzen 5 2400G with Radeon Vega Graphics
stepping : 0
microcode : 0x810100b
cpu MHz : 3531.252
cache size : 512 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7400.52
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.2.2
OpenGL version string: 4.4 (Compatibility Profile) Mesa 18.2.2
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.2.2

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired