Ubuntu
linux package

Comment 5 for bug 1782716

Joel Stanley (shenki) wrote on 2018-07-30:

With -rc7:

[  333.596521] EEH: PHB#33 failure detected, location: N/A
[  333.596563] CPU: 12 PID: 811 Comm: kworker/u257:1 Not tainted 4.18.0-041800rc7-generic #201807292230
[  333.596576] Workqueue: events_unbound commit_work [drm_kms_helper]
[  333.596578] Call Trace:
[  333.596582] [c000000fec1036c0] [c000000000d4d6fc] dump_stack+0xb0/0xf4 (unreliable)
[  333.596587] [c000000fec103700] [c00000000003b114] eeh_dev_check_failure+0x514/0x5e0
[  333.596589] [c000000fec1037a0] [c00000000003b26c] eeh_check_failure+0x8c/0xd0
[  333.596616] [c000000fec1037e0] [c00800000d5119f8] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
[  333.596649] [c000000fec103840] [c00800000d623250] amdgpu_cgs_read_register+0x28/0x50 [amdgpu]
[  333.596685] [c000000fec103860] [c00800000d6ce81c] dce110_timing_generator_get_vblank_counter+0x44/0x70 [amdgpu]
[  333.596717] [c000000fec103880] [c00800000d6f4430] dc_stream_get_vblank_counter+0x88/0xb0 [amdgpu]
[  333.596752] [c000000fec1038a0] [c00800000d67f5f4] dm_vblank_get_counter+0x4c/0xa8 [amdgpu]
[  333.596774] [c000000fec103900] [c00800000d518630] amdgpu_get_vblank_counter_kms+0xa8/0x250 [amdgpu]
[  333.596808] [c000000fec1039b0] [c00800000d67c1b8] amdgpu_dm_do_flip+0xd0/0x3b0 [amdgpu]
[  333.596844] [c000000fec103b00] [c00800000d6801a0] amdgpu_dm_atomic_commit_tail+0x7f8/0xf20 [amdgpu]
[  333.596850] [c000000fec103c60] [c00800000cb72da4] commit_tail+0x6c/0xe0 [drm_kms_helper]
[  333.596853] [c000000fec103c90] [c0000000001385f0] process_one_work+0x2b0/0x560
[  333.596855] [c000000fec103d20] [c000000000138928] worker_thread+0x88/0x610
[  333.596858] [c000000fec103dc0] [c0000000001415dc] kthread+0x1ac/0x1c0
[  333.596861] [c000000fec103e30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
[  333.596886] EEH: Detected error on PHB#33
[  333.596890] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[  333.596891] EEH: Notify device drivers to shutdown
[  333.596895] EEH: Beginning: 'error_detected(IO frozen)'
[  333.596898] EEH: PE#1fe (PCI 0033:00:00.0): no driver
[  333.596900] EEH: PE#0 (PCI 0033:01:00.1): driver not EEH aware
[  333.596902] EEH: PE#0 (PCI 0033:01:00.0): driver not EEH aware
[  333.596904] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'none'
[  333.596908] EEH: Collect temporary log
[  333.596910] PHB4 PHB#51 Diag-data (Version: 1)
[  333.596911] brdgCtl:    00000002
[  333.596913] RootSts:    00000020 00402000 e9010008 00100107 00000000
[  333.596915] nFir:       0000800000000000 0030001c00000000 0000800000000000
[  333.596916] PhbSts:     0000001800000000 0000001800000000
[  333.596918] Lem:        0004000100000100 0000000000000000 0000000100000000
[  333.596919] PhbErr:     000005a000000000 0000008000000000 2148000098000240 a008400000000000
[  333.596921] PhbTxeErr:  0000200000000000 0000200000000000 4000000000000000 0000000000000000
[  333.596923] RxeMrgErr:  0000000000000001 0000000000000001 0000000000000000 0000000000000000
[  333.596925] PblErr:     0000000000000800 0000000000000800 0000000000000000 00000000028de410
[  333.596926] RegbErr:    0010001000000000 0010000000000000 4800003c00000000 0000000000000200
[  333.596929] EEH: Reset with hotplug activity
[  334.084373] iommu: Removing device 0033:01:00.1 from group 3
[  334.084445] pci 0033:01:00.1: Dropping the link to 0033:01:00.0
[  334.085057] [drm] amdgpu: finishing device.
[  343.769080] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=590, last emitted seq=591
[  343.769186] [drm] GPU recovery disabled.
[  344.281128] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=349, last emitted seq=350
[  344.281189] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0] hw_done or flip_done timed out
[  344.281390] [drm] GPU recovery disabled.
