linux 4.10 and AMD Polaris11 card -> graphics crash

Bug #1696240 reported by Török Edwin on 2017-06-06
This bug affects 2 people
Affects           Importance  Assigned to
linux (Ubuntu)    Medium      Unassigned
  Zesty           Medium      Unassigned

Bug Description

Using the Ubuntu 4.10.0-22-generic kernel, running any of the Unigine benchmarks (Heaven-4.0, Valley-1.0, Superposition-1.0) causes the screen to go black and the graphics system to crash.
The graphics card's fan stops, and sensors reports a temperature of 511°C, which is clearly wrong.
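For cross-checking a reading like this, it may help to know that lm-sensors takes its values from the kernel's hwmon sysfs interface, where temperatures are exposed as temp*_input files in millidegrees Celsius. A minimal sketch, assuming the driver registers a hwmon device named "amdgpu" under /sys/class/hwmon; the helper name gpu_temps and its optional root argument (which lets it be pointed at a fake tree) are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every temperature exposed by an "amdgpu" hwmon
# device, converted from millidegrees Celsius to whole degrees Celsius.
# $1 optionally overrides the sysfs root (default /sys), e.g. for testing.
gpu_temps() {
  root=${1:-/sys}
  for hw in "$root"/class/hwmon/hwmon*; do
    # Skip entries without a readable name, and devices other than amdgpu.
    [ -r "$hw/name" ] || continue
    [ "$(cat "$hw/name")" = amdgpu ] || continue
    for t in "$hw"/temp*_input; do
      [ -r "$t" ] || continue
      echo "$(basename "$t"): $(( $(cat "$t") / 1000 ))"
    done
  done
}
```

Running gpu_temps with no argument alongside sensors would show whether the raw sysfs value matches the reading quoted above.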

I can still log in via SSH and attempt to stop X; however, the application (e.g. heaven) just remains in a zombie state, the system is unusable, and I can't start X again. In fact, the graphics card ends up in a pretty bad state: if I press the reset button, the UEFI BIOS can no longer detect it, and I have to power the whole system off and on again to make the card work.

Upgrading to mainline 4.11.3 avoids this problem: all three benchmarks run fine, with no crashes.

I've attached two dmesg logs. The first is with the default settings, where the IOMMU is on and lots of AMD-Vi warnings are logged:
[ 439.903842] ------------[ cut here ]------------
[ 439.903848] WARNING: CPU: 5 PID: 0 at /build/linux-nOqmtv/linux-4.10.0/drivers/iommu/amd_iommu.c:1252 __domain_flush_pages+0x1f7/0x220
[ 439.903848] Modules linked in: overlay ccm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap video edac_mce_amd edac_core kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek arc4 aes_x86_64 crypto_simd glue_helper cryptd snd_hda_codec_generic ath9k snd_hda_codec_hdmi ath9k_common ath9k_hw snd_hda_intel snd_hda_codec snd_hda_core ath snd_hwdep input_leds joydev mac80211 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi cfg80211 snd_seq fam15h_power i2c_piix4 snd_seq_device
[ 439.903873] snd_timer snd k10temp mac_hid soundcore tpm_infineon shpchp tcp_bbr sch_fq cuse parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect r8169 sysimgblt fb_sys_fops mii drm ahci libahci fjes wmi
[ 439.903893] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.10.0-22-generic #24-Ubuntu
[ 439.903894] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
[ 439.903895] Call Trace:
[ 439.903896] <IRQ>
[ 439.903899] dump_stack+0x63/0x81
[ 439.903900] __warn+0xcb/0xf0
[ 439.903901] warn_slowpath_null+0x1d/0x20
[ 439.903903] __domain_flush_pages+0x1f7/0x220
[ 439.903904] __queue_flush+0x4b/0xd0
[ 439.903905] ? queue_flush_all+0x90/0x90
[ 439.903907] queue_flush_all+0x77/0x90
[ 439.903908] queue_flush_timeout+0x18/0x20
[ 439.903910] call_timer_fn+0x35/0x140
[ 439.903911] run_timer_softirq+0x215/0x4b0
[ 439.903912] ? ktime_get+0x41/0xb0
[ 439.903914] ? lapic_next_event+0x1d/0x30
[ 439.903916] ? clockevents_program_event+0x7f/0x120
[ 439.903918] __do_softirq+0x104/0x2af
[ 439.903919] irq_exit+0xb6/0xc0
[ 439.903921] smp_apic_timer_interrupt+0x3d/0x50
[ 439.903922] apic_timer_interrupt+0x89/0x90
[ 439.903924] RIP: 0010:cpuidle_enter_state+0x122/0x2c0
[ 439.903925] RSP: 0018:ffffb4e181a23e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 439.903926] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000000001f
[ 439.903926] RDX: 0000006665f96c97 RSI: ffff9dbcded56a98 RDI: 0000000000000000
[ 439.903927] RBP: ffffb4e181a23e98 R08: cccccccccccccccd R09: 0000000000000018
[ 439.903927] R10: 0000000000000da8 R11: 0000000000003557 R12: ffff9dbcd036b600
[ 439.903928] R13: ffffffffbaeeba38 R14: 0000000000000002 R15: ffffffffbaeeba20
[ 439.903929] </IRQ>
[ 439.903930] ? cpuidle_enter_state+0x110/0x2c0
[ 439.903931] cpuidle_enter+0x17/0x20
[ 439.903933] call_cpuidle+0x23/0x40
[ 439.903934] do_idle+0x189/0x200
[ 439.903935] cpu_startup_entry+0x71/0x80
[ 439.903937] start_secondary+0x154/0x190
[ 439.903938] start_cpu+0x14/0x14
[ 439.903939] ---[ end trace 9edd64d3e01a6c8c ]---

The second is with the iommu=soft boot option: nothing interesting shows up in dmesg, but the system still crashes.

Note: if I turn the IOMMU off completely, USB devices stop working and I cannot use my keyboard or mouse, so I cannot test that scenario.
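The iommu=soft run requires the option on the kernel command line. On a stock Ubuntu install this is typically done by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, running sudo update-grub, and rebooting; /proc/cmdline (the ProcKernelCmdLine field below) then shows what the running kernel was actually booted with. A small sketch to confirm the option took effect; has_kernel_param is a hypothetical helper, and its optional second argument exists only so it can be checked against a file other than /proc/cmdline:

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 appears as a whole word on the kernel
# command line read from $2 (default: /proc/cmdline), fail otherwise.
has_kernel_param() {
  # Put one parameter per line, then match a full line as a fixed string.
  tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example: has_kernel_param iommu=soft && echo "booted with iommu=soft"
```

Matching whole words avoids false positives such as iommu=soft matching inside a longer parameter.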

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-generic 4.10.0.22.24
ProcVersionSignature: Ubuntu 4.10.0-22.24-generic 4.10.15
Uname: Linux 4.10.0-22-generic x86_64
ApportVersion: 2.20.4-0ubuntu4.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: edwin 2753 F.... pulseaudio
 /dev/snd/controlC2: edwin 2753 F.... pulseaudio
 /dev/snd/controlC1: edwin 2753 F.... pulseaudio
Date: Tue Jun 6 21:09:45 2017
HibernationDevice: RESUME=UUID=3401e45a-9619-4ae8-9e4d-6dc1e7982524
InstallationDate: Installed on 2017-03-25 (72 days ago)
InstallationMedia: Ubuntu-MATE 17.04 "Zesty Zapus" - Beta amd64 (20170321.1)
MachineType: To be filled by O.E.M. To be filled by O.E.M.
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-22-generic root=/dev/mapper/ubuntu--mate--vg-root ro quiet splash vt.handoff=7
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-22-generic N/A
 linux-backports-modules-4.10.0-22-generic N/A
 linux-firmware 1.164.1
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/07/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2501
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: M5A99FX PRO R2.0
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2501:bd04/07/2014:svnTobefilledbyO.E.M.:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnASUSTeKCOMPUTERINC.:rnM5A99FXPROR2.0:rvrRev1.xx:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: To be filled by O.E.M.


This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream stable kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.10 stable kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.17/

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: New → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-da-key
Török Edwin (edwintorok) wrote :

Ubuntu 4.10.0-22-generic: crash
mainline 4.10.17: crash
mainline 4.11.3: OK

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu Zesty):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

It's probably a good idea to use the out-of-tree amdgpu.ko for now, until the Artful release.

Download the driver from [1] and extract it.
There's a package with "dkms" in its name; install that package:
$ sudo dpkg -i *dkms*.deb

If there are missing dependencies:
$ sudo apt -f install

[1] https://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx
