linux 4.10 and AMD Polaris11 card -> graphics crash

Bug #1696240 reported by Török Edwin
This bug affects 2 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Won't Fix  Medium      Unassigned
Zesty            Won't Fix  Medium      Unassigned

Bug Description

Running any of the Unigine benchmarks (Heaven-4.0, Valley-1.0, Superposition-1.0) on Ubuntu's 4.10.0-22-generic kernel makes the screen go black and crashes the graphics system.
The graphics card's fan stops spinning and sensors reports 511°C, which is clearly wrong.
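
The figure printed by sensors comes from the kernel's hwmon interface, which exposes temperatures in millidegrees Celsius under /sys/class/hwmon. A minimal sketch for flagging readings like the 511°C above, assuming the card's hwmon node is still readable after the crash; the helper name and the 150°C cut-off are arbitrary choices for illustration, not something taken from this report:

```shell
# flag_bogus_temps [HWMON_ROOT] [LIMIT_C]
# Prints every hwmon temperature input whose reading exceeds LIMIT_C.
# Both the helper name and the 150 C default are illustrative assumptions.
flag_bogus_temps() {
    root=${1:-/sys/class/hwmon}
    limit=${2:-150}
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        # Values are in millidegrees Celsius; a failed read is skipped.
        milli=$(cat "$f" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        celsius=$((milli / 1000))
        if [ "$celsius" -gt "$limit" ]; then
            echo "$f: ${celsius} C looks implausible"
        fi
    done
}
```

Calling `flag_bogus_temps` with no arguments scans the live system; passing a directory as the first argument lets it run against a copied sysfs tree.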

I can still log in via SSH and attempt to stop X, but the application (e.g. heaven) remains in a zombie state and I can't start X again, so the system is unusable. The graphics card itself ends up in a bad state: if I press the reset button, the UEFI BIOS can no longer detect it, and I have to power the whole system off and on again to make the card work.

Upgrading to mainline 4.11.3 avoids this problem: all three benchmarks run fine, with no crashes.

I've attached two dmesg logs. The first is from a default boot, where the IOMMU is on and lots of AMD-Vi warnings are logged:
[ 439.903842] ------------[ cut here ]------------
[ 439.903848] WARNING: CPU: 5 PID: 0 at /build/linux-nOqmtv/linux-4.10.0/drivers/iommu/amd_iommu.c:1252 __domain_flush_pages+0x1f7/0x220
[ 439.903848] Modules linked in: overlay ccm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap video edac_mce_amd edac_core kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek arc4 aes_x86_64 crypto_simd glue_helper cryptd snd_hda_codec_generic ath9k snd_hda_codec_hdmi ath9k_common ath9k_hw snd_hda_intel snd_hda_codec snd_hda_core ath snd_hwdep input_leds joydev mac80211 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi cfg80211 snd_seq fam15h_power i2c_piix4 snd_seq_device
[ 439.903873] snd_timer snd k10temp mac_hid soundcore tpm_infineon shpchp tcp_bbr sch_fq cuse parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect r8169 sysimgblt fb_sys_fops mii drm ahci libahci fjes wmi
[ 439.903893] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.10.0-22-generic #24-Ubuntu
[ 439.903894] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
[ 439.903895] Call Trace:
[ 439.903896] <IRQ>
[ 439.903899] dump_stack+0x63/0x81
[ 439.903900] __warn+0xcb/0xf0
[ 439.903901] warn_slowpath_null+0x1d/0x20
[ 439.903903] __domain_flush_pages+0x1f7/0x220
[ 439.903904] __queue_flush+0x4b/0xd0
[ 439.903905] ? queue_flush_all+0x90/0x90
[ 439.903907] queue_flush_all+0x77/0x90
[ 439.903908] queue_flush_timeout+0x18/0x20
[ 439.903910] call_timer_fn+0x35/0x140
[ 439.903911] run_timer_softirq+0x215/0x4b0
[ 439.903912] ? ktime_get+0x41/0xb0
[ 439.903914] ? lapic_next_event+0x1d/0x30
[ 439.903916] ? clockevents_program_event+0x7f/0x120
[ 439.903918] __do_softirq+0x104/0x2af
[ 439.903919] irq_exit+0xb6/0xc0
[ 439.903921] smp_apic_timer_interrupt+0x3d/0x50
[ 439.903922] apic_timer_interrupt+0x89/0x90
[ 439.903924] RIP: 0010:cpuidle_enter_state+0x122/0x2c0
[ 439.903925] RSP: 0018:ffffb4e181a23e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 439.903926] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000000001f
[ 439.903926] RDX: 0000006665f96c97 RSI: ffff9dbcded56a98 RDI: 0000000000000000
[ 439.903927] RBP: ffffb4e181a23e98 R08: cccccccccccccccd R09: 0000000000000018
[ 439.903927] R10: 0000000000000da8 R11: 0000000000003557 R12: ffff9dbcd036b600
[ 439.903928] R13: ffffffffbaeeba38 R14: 0000000000000002 R15: ffffffffbaeeba20
[ 439.903929] </IRQ>
[ 439.903930] ? cpuidle_enter_state+0x110/0x2c0
[ 439.903931] cpuidle_enter+0x17/0x20
[ 439.903933] call_cpuidle+0x23/0x40
[ 439.903934] do_idle+0x189/0x200
[ 439.903935] cpu_startup_entry+0x71/0x80
[ 439.903937] start_secondary+0x154/0x190
[ 439.903938] start_cpu+0x14/0x14
[ 439.903939] ---[ end trace 9edd64d3e01a6c8c ]---
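
With the IOMMU on, the log contains many warnings like the one above. A rough sketch for summarising them from a saved dmesg file, assuming every warning header follows the `WARNING: CPU: N PID: N at <file>:<line> <function>` layout seen in this trace; `warning_sites` is a hypothetical helper name:

```shell
# warning_sites FILE
# Lists each distinct "WARNING: ... at <file>:<line> <function>" site found
# in a saved dmesg dump, most frequent first. The helper name is illustrative.
warning_sites() {
    grep -oE 'WARNING: CPU: [0-9]+ PID: [0-9]+ at [^ ]+ [^ ]+' "$1" \
        | awk '{ print $(NF-1), $NF }' \
        | sort | uniq -c | sort -rn
}
```

Run against a saved copy of the log, this prints one line per warning site with its occurrence count, which shows whether all the warnings come from `__domain_flush_pages` or from several places.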

The second is from a boot with the iommu=soft option: nothing interesting shows up in dmesg, but the system still crashes.

Note: if I turn the IOMMU off completely, USB devices stop working and I cannot use my keyboard or mouse, so I cannot test that scenario.
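
For anyone reproducing the iommu=soft test: on a stock Ubuntu install the option is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. The sketch below only checks that the option actually reached the running kernel. `has_kernel_param` is a hypothetical helper, and the GRUB steps in the comments assume Ubuntu's default bootloader setup:

```shell
# Manual steps (not run by this snippet), assuming Ubuntu's default GRUB setup:
#   1. In /etc/default/grub set, for example:
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=soft"
#   2. sudo update-grub
#   3. Reboot.

# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline.
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Typical use after the reboot:
#   has_kernel_param iommu=soft && echo "iommu=soft is active"
```

Matching whole words avoids a false positive from a different option that merely starts with the same text.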

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-generic 4.10.0.22.24
ProcVersionSignature: Ubuntu 4.10.0-22.24-generic 4.10.15
Uname: Linux 4.10.0-22-generic x86_64
ApportVersion: 2.20.4-0ubuntu4.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: edwin 2753 F.... pulseaudio
 /dev/snd/controlC2: edwin 2753 F.... pulseaudio
 /dev/snd/controlC1: edwin 2753 F.... pulseaudio
Date: Tue Jun 6 21:09:45 2017
HibernationDevice: RESUME=UUID=3401e45a-9619-4ae8-9e4d-6dc1e7982524
InstallationDate: Installed on 2017-03-25 (72 days ago)
InstallationMedia: Ubuntu-MATE 17.04 "Zesty Zapus" - Beta amd64 (20170321.1)
MachineType: To be filled by O.E.M. To be filled by O.E.M.
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-22-generic root=/dev/mapper/ubuntu--mate--vg-root ro quiet splash vt.handoff=7
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-22-generic N/A
 linux-backports-modules-4.10.0-22-generic N/A
 linux-firmware 1.164.1
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/07/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2501
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: M5A99FX PRO R2.0
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2501:bd04/07/2014:svnTobefilledbyO.E.M.:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnASUSTeKCOMPUTERINC.:rnM5A99FXPROR2.0:rvrRev1.xx:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: To be filled by O.E.M.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream stable kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.10 stable kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.17/

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: New → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-da-key
Török Edwin (edwintorok) wrote :

Ubuntu 4.10.0-22-generic: crash
mainline 4.10.17: crash
mainline 4.11.3: OK

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu Zesty):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

It is probably a good idea to use the out-of-tree amdgpu.ko for now, until the Artful release.

Download the driver from [1] and extract it.
The archive contains a package with "dkms" in its name; install that package:
$ sudo dpkg -i *dkms*.deb

If there are missing dependencies:
$ sudo apt -f install

[1] https://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx
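
After installing the DKMS package, it can be worth confirming which amdgpu.ko the kernel will actually load. A small sketch, assuming DKMS places its build under an `updates/dkms` directory (as it normally does on Ubuntu) and that the stock module lives under `kernel/`; `module_origin` is a hypothetical helper:

```shell
# module_origin PATH
# Classifies a module file path as DKMS-built or in-tree, based on the
# directory layout assumed in the lead-in.
module_origin() {
    case "$1" in
        */updates/dkms/*) echo "dkms" ;;
        */kernel/*)       echo "in-tree" ;;
        *)                echo "unknown" ;;
    esac
}

# Typical use on the affected machine:
#   module_origin "$(modinfo -F filename amdgpu)"
```

If the result is still "in-tree" after installation, the out-of-tree driver is probably not the one in use for the running kernel.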

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu Zesty):
status: Confirmed → Won't Fix