Comment 60 for bug 1810546

experimancer (experimancer) wrote :

I have this too: random hangs of the whole system, both under load and when idle.
- Ryzen 5 2400G
- Ubuntu 18.04 and 18.10
- kernel 4.18

Log from /var/log/syslog (or kern.log) just before the freeze, after which only a hard reset (power off/on) makes the system boot again:

Jan 15 02:34:13 elrond kernel: [ 476.185425] gmc_v9_0_process_interrupt: 32 callbacks suppressed
Jan 15 02:34:13 elrond kernel: [ 476.185429] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185433] amdgpu 0000:0b:00.0: at page 0x000000010780a000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185434] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Jan 15 02:34:13 elrond kernel: [ 476.185440] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185441] amdgpu 0000:0b:00.0: at page 0x000000010780b000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185443] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185448] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185449] amdgpu 0000:0b:00.0: at page 0x0000000107805000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185450] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185455] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185457] amdgpu 0000:0b:00.0: at page 0x0000000107806000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185458] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185463] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185464] amdgpu 0000:0b:00.0: at page 0x0000000107808000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185465] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185470] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185471] amdgpu 0000:0b:00.0: at page 0x0000000107809000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185473] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185478] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185479] amdgpu 0000:0b:00.0: at page 0x0000000107803000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185480] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185485] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185486] amdgpu 0000:0b:00.0: at page 0x0000000107804000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185487] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185492] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185493] amdgpu 0000:0b:00.0: at page 0x0000000107809000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185494] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:13 elrond kernel: [ 476.185500] amdgpu 0000:0b:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768)
Jan 15 02:34:13 elrond kernel: [ 476.185501] amdgpu 0000:0b:00.0: at page 0x0000000107808000 from 27
Jan 15 02:34:13 elrond kernel: [ 476.185502] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Jan 15 02:34:23 elrond kernel: [ 486.421830] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=41331, last emitted seq=41333
Jan 15 02:34:23 elrond kernel: [ 486.421834] [drm] GPU recovery disabled.
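
In case anyone wants to compare their own logs against the excerpt above, here is a rough Python sketch that counts how often each faulting page address appears. The function name summarize_faults and the regex are my own and hypothetical, not part of any amdgpu tooling, and the pattern only assumes the "at page 0x... from NN" line format shown in this log.

```python
import re
from collections import Counter

# Assumption: the fault address always appears as "at page 0x<hex> from <n>"
# on an amdgpu line, as in the syslog excerpt in this comment.
PAGE_RE = re.compile(
    r"amdgpu\s+[0-9a-f:.]+:\s+at page\s+(0x[0-9a-fA-F]+)\s+from\s+(\d+)"
)


def summarize_faults(lines):
    """Return a Counter mapping each faulting page address to its count."""
    pages = Counter()
    for line in lines:
        match = PAGE_RE.search(line)
        if match:
            pages[match.group(1)] += 1
    return pages


# Four lines copied from the log above; one address occurs twice.
sample = [
    "Jan 15 02:34:13 elrond kernel: [ 476.185433] amdgpu 0000:0b:00.0: at page 0x000000010780a000 from 27",
    "Jan 15 02:34:13 elrond kernel: [ 476.185441] amdgpu 0000:0b:00.0: at page 0x000000010780b000 from 27",
    "Jan 15 02:34:13 elrond kernel: [ 476.185471] amdgpu 0000:0b:00.0: at page 0x0000000107809000 from 27",
    "Jan 15 02:34:13 elrond kernel: [ 476.185493] amdgpu 0000:0b:00.0: at page 0x0000000107809000 from 27",
]

for page, count in summarize_faults(sample).most_common():
    print(page, count)
```

To run it on a real log, pass an open file instead of the sample list, for example open("/var/log/syslog", errors="replace"); reading that file may need root depending on its permissions.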

$ sudo lshw -c video
  *-display
       description: VGA compatible controller
       product: Advanced Micro Devices, Inc. [AMD/ATI]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0b:00.0
       version: c6
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi msix vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:88 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:d000(size=256) memory:fe300000-fe37ffff memory:c0000-dfff

$ uname -a
Linux elrond 4.18.20-041820-generic #201812030624 SMP Mon Dec 3 11:25:55 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ glxinfo | grep Mesa
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.2.2
OpenGL version string: 4.4 (Compatibility Profile) Mesa 18.2.2
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.2.2

$ dmesg | grep drm
[ 1.641459] [drm] amdgpu kernel modesetting enabled.
[ 1.645616] fb: switching to amdgpudrmfb from EFI VGA
[ 1.645835] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1043:0x876B 0xC6).
[ 1.645844] [drm] register mmio base: 0xFE300000
[ 1.645845] [drm] register mmio size: 524288
[ 1.645851] [drm] probing gen 2 caps for device 1022:15db = 700d03/e
[ 1.645853] [drm] probing mlw for device 1022:15db = 700d03
[ 1.645855] [drm] add ip block number 0 <soc15_common>
[ 1.645856] [drm] add ip block number 1 <gmc_v9_0>
[ 1.645856] [drm] add ip block number 2 <vega10_ih>
[ 1.645857] [drm] add ip block number 3 <psp>
[ 1.645857] [drm] add ip block number 4 <powerplay>
[ 1.645858] [drm] add ip block number 5 <dm>
[ 1.645859] [drm] add ip block number 6 <gfx_v9_0>
[ 1.645859] [drm] add ip block number 7 <sdma_v4_0>
[ 1.645860] [drm] add ip block number 8 <vcn_v1_0>
[ 1.645889] [drm] VCN decode is enabled in VM mode
[ 1.645890] [drm] VCN encode is enabled in VM mode
[ 1.669303] [drm] BIOS signature incorrect 0 0
[ 1.669348] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 1.669358] [drm] Detected VRAM RAM=1024M, BAR=1024M
[ 1.669359] [drm] RAM width 128bits DDR4
[ 1.669466] [drm] amdgpu: 1024M of VRAM memory ready
[ 1.669467] [drm] amdgpu: 3072M of GTT memory ready.
[ 1.669473] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 1.669634] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 1.670752] [drm] use_doorbell being set to: [true]
[ 1.670839] [drm] Found VCN firmware Version: 1.73 Family ID: 18
[ 1.670841] [drm] PSP loading VCN firmware
[ 1.843030] [drm] Display Core initialized with v3.1.44!
[ 1.868553] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 1.868554] [drm] Driver supports precise vblank timestamp query.
[ 1.891323] [drm] VCN decode and encode initialized successfully.
[ 1.892629] [drm] fb mappable at 0xA1100000
[ 1.892630] [drm] vram apper at 0xA0000000
[ 1.892630] [drm] size 8294400
[ 1.892630] [drm] fb depth is 24
[ 1.892631] [drm] pitch is 7680
[ 1.892683] fbcon: amdgpudrmfb (fb0) is primary device
[ 1.950133] amdgpu 0000:0b:00.0: fb0: amdgpudrmfb frame buffer device
[ 1.966332] [drm] Initialized amdgpu 3.26.0 20150101 for 0000:0b:00.0 on minor 0