Acer Aspire A315 IOAPIC failure on Ubuntu 18.04, kernel hangs, can't load, kernel freeze (AMD Ryzen 5/Radeon/Raven) / AMDGPU Hybrid crash

Bug #1776563 reported by Richard Baka on 2018-06-12
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Linux
Incomplete
Medium
amd
Undecided
Unassigned
linux (Ubuntu)
Medium
Unassigned
linux-firmware (Ubuntu)
Undecided
Unassigned

Bug Description

CPU: Ryzen 5 2500U
VGA: Radeon 535
Notebook: Acer Aspire A315

This is a brand new notebook on the market with Ryzen 5/Radeon.
The default Ubuntu 18.04 kernel hangs while booting with this message:

tsc: Refined TSC clocksource calibration: 1996.250 MHz
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: (...), max_idle_ns: (...)
Soft lockup

With the pci=noacpi kernel parameter the kernel boots without any problem, but my notebook produces more heat than under Windows 10. As far as I know, Acer notebooks need ACPI for correct power management.

The same thing happens on mainline 4.17 and 4.18-rc1/rc2.
Upgrading the BIOS to the latest version (1.08) hasn't helped.

This problem has been reported upstream:
https://bugzilla.kernel.org/show_bug.cgi?id=200087

The last kernel that booted correctly was 4.13.*, but the heat problem was present there too.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1776563

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic

apport-collect 1776563 can't be run because the kernel does not boot.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
summary: - Acer Aspire A315 (Ryzen5/Radeon/FHD) Ubuntu 18.04 kernel cant load
+ Ubuntu 18.04 kernel can't load kernel on Acer Aspire A315
+ (Ryzen5/Radeon/FHD)
summary: - Ubuntu 18.04 kernel can't load kernel on Acer Aspire A315
- (Ryzen5/Radeon/FHD)
+ Ubuntu 18.04 can't load kernel on Acer Aspire A315 (Ryzen5/Radeon/FHD)
no longer affects: bugzilla (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-amdgpu (Ubuntu):
status: New → Confirmed
Freihut (freihut) wrote :

Had this on my A315 too, but I returned it to the vendor. It seems to be a UEFI bug, because it didn't happen with my Ryzen 2500U from HP. It could also be related to the Ryzen/Radeon 535 combination (Vega/GCN 3).

In the GRUB menu, press E and add "pci=noacpi" as a kernel parameter (where "quiet splash" normally is). Then continue booting by pressing F10.
Sometimes (XFCE) it was also necessary to add "nomodeset" to boot; GNOME, for example, didn't need it (AFAIK).

I remember that I also needed to install AMD's pro driver (for 18.04) via amdgpu-pro-install to get rid of "nomodeset". I was able to run amdgpu-pro-uninstall later and still didn't need "nomodeset". This could be specific to my system, but you may give it a try.
I was also using kernel 4.17 (mainline), which is available at http://kernel.ubuntu.com/~kernel-ppa/mainline/ or with UKUU https://www.omgubuntu.co.uk/2017/02/ukuu-easy-way-to-install-mainline-kernel-ubuntu
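
To confirm that a parameter typed at the GRUB prompt actually reached the running kernel, the command line can be inspected after boot. The following is a minimal sketch in POSIX shell; `has_param` is a hypothetical helper name, not part of any tool mentioned in this thread.

```sh
# has_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole, space-separated word in CMDLINE.
# Quoting "$2" inside the pattern keeps the brackets in parameters such as
# ivrs_ioapic[4]=... literal instead of treating them as a glob.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a booted system (reads the real kernel command line):
#   has_param "$(cat /proc/cmdline)" pci=noacpi && echo "pci=noacpi is active"
```

Matching whole words avoids false positives when one parameter name is a prefix of another.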

Richard Baka (bakarichard91) wrote :

Thanks Freihut, I will try this.

Richard Baka (bakarichard91) wrote :

It works, but it is very slow. This could be an ACPI problem.

Richard Baka (bakarichard91) wrote :

I installed the new amdgpu pro driver and everything is very fast now. This bug should be reported to freedesktop, would you like somebody to do it? :D

Richard Baka (bakarichard91) wrote :

*Sorry correction: Who would like to do it? :D

Richard Baka (bakarichard91) wrote :

"The fact that ACPI was designed by a group of monkeys high on LSD, and is some of the worst designs in the industry obviously makes running it at any point pretty damn ugly."
Torvalds, Linus (2005-07-31). Message. linux-kernel mailing list. IU. Retrieved on 2006-08-28.

Richard Baka (bakarichard91) wrote :

Power management doesn't work well this way. The notebook got a little hot. I've switched back to Windows 10. This should be fixed by the kernel developers or with a downstream patch.

Created attachment 276583
dmesg after starting kernel with pci=noacpi

This is a brand new notebook on the market with Ryzen 5/Radeon. With ACPI disabled the kernel boots without any problem, but my notebook produces more heat than under Windows 10. The same heat also appears when the notebook is left on the BIOS screen for a while.

CPU: AMD Ryzen 5 2500U
GPU1: AMD Radeon Vega 8
GPU2: AMD Radeon 535

(I asked Acer to fix their BIOS problems, but they said Linux is not supported. I don't think they are right, but what can I do?)

Created attachment 276585
attachment-31427-0.html

Out of office 6/18-6/27

Created attachment 276587
Soft lockup failure without noacpi

Nothing changes with the IOMMU disabled.

Created attachment 276589
dmesg after amd_iommu_dump=1

[ 0.000000] AMD-Vi: Using IVHD type 0x11
[ 0.000000] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[ 0.000000] AMD-Vi: mmio-addr: 00000000fd900000
[ 0.000000] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:01.0 flags: 00
[ 0.000000] AMD-Vi: DEV_RANGE_END devid: ff:1f.6
[ 0.000000] AMD-Vi: DEV_ALIAS_RANGE devid: ff:00.0 flags: 00 devid_to: 00:14.4
[ 0.000000] AMD-Vi: DEV_RANGE_END devid: ff:1f.7
[ 0.000000] AMD-Vi: DEV_SPECIAL(HPET[0]) devid: 00:14.0
[ 0.000000] AMD-Vi: DEV_SPECIAL(IOAPIC[33]) devid: 00:14.0
[ 0.000000] AMD-Vi: DEV_SPECIAL(IOAPIC[34]) devid: 00:00.1
[ 0.000000] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found
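
To see at a glance which IOAPIC IDs the firmware's IVRS table declares, the DEV_SPECIAL lines can be pulled out of a dump like the one above. A minimal sketch in POSIX shell, assuming the dmesg text is fed on standard input; `ivrs_ioapics` is a hypothetical helper name.

```sh
# ivrs_ioapics: read amd_iommu_dump=1 output on stdin and print one
# "ID DEVID" pair per IOAPIC entry found in the IVRS table.
# HPET entries and all other lines are ignored.
ivrs_ioapics() {
    sed -n 's/.*DEV_SPECIAL(IOAPIC\[\([0-9]*\)]) devid: \([0-9a-f:.]*\).*/\1 \2/p'
}

# Usage on a system booted with amd_iommu_dump=1:
#   dmesg | ivrs_ioapics
```

For the dump above this lists the IDs 33 and 34, whereas the kernel parameters used later in this thread refer to IOAPIC[4] and IOAPIC[5].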

no longer affects: xserver-xorg-video-amdgpu (Ubuntu)

Created attachment 276591
Error message before freezing (without quiet splash)

Please try booting with linux 4.18-rc1 or later. Also, please try 4.18-rc1+ with/without ACPI

Hi Erik,

Exactly the same thing happens on 4.18-rc1 and on rc2 too.

Fedora boots without any additional parameters (mysterious).

[ 0.000000] Switched APIC routing to physical flat.
[ 0.002000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.007000] tsc: Fast TSC calibration using PIT
[ 0.008000] tsc: Detected 1996.299 MHz processor
[ 0.008000] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398d0c7513b, max_idle_ns: 881590744042 ns
[ 0.008000] Calibrating delay loop (skipped), value calculated using timer frequency.. 3992.59 BogoMIPS (lpj=1996299)

The heat problem may still be present, but I can't measure it because "sensors" shows no temperature values (Windows 10 shows 5 values).

Created attachment 277069
Fedora loads without noacpi

summary: - Ubuntu 18.04 can't load kernel on Acer Aspire A315 (Ryzen5/Radeon/FHD)
+ Acer Aspire A315 ACPI failure on Ubuntu 18.04 (Ryzen5/Radeon/FHD)
summary: - Acer Aspire A315 ACPI failure on Ubuntu 18.04 (Ryzen5/Radeon/FHD)
+ Acer Aspire A315 ACPI failure on Ubuntu 18.04 (Ryzen5/Radeon)
summary: - Acer Aspire A315 ACPI failure on Ubuntu 18.04 (Ryzen5/Radeon)
+ Acer Aspire A315 ACPI failure on Ubuntu, kernel hangs, can't load 18.04
+ (Ryzen5/Radeon)
summary: - Acer Aspire A315 ACPI failure on Ubuntu, kernel hangs, can't load 18.04
+ Acer Aspire A315 ACPI failure on Ubuntu 18.04, kernel hangs, can't load
(Ryzen5/Radeon)
description: updated
summary: Acer Aspire A315 ACPI failure on Ubuntu 18.04, kernel hangs, can't load
- (Ryzen5/Radeon)
+ (AMD Ryzen 5/Radeon/Raven)
summary: - Acer Aspire A315 ACPI failure on Ubuntu 18.04, kernel hangs, can't load
- (AMD Ryzen 5/Radeon/Raven)
+ Acer Aspire A315 ACPI failure on Ubuntu 18.04, kernel hangs, can't load,
+ kernel freeze (AMD Ryzen 5/Radeon/Raven)

Erik, I think this is connected to the clocksource calibration, but I'm not an expert.

This works:
[ 0.007000] tsc: Fast TSC calibration using PIT
[ 0.008000] tsc: Detected 1996.299 MHz processor
[ 0.008000] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398d0c7513b, max_idle_ns: 881590744042 ns

This doesn't:
[...] tsc: Refined tsc clocksource calibration: ...
[...] clocksource: tsc: mask: 0xfff...f (...)

Changed in linux:
importance: Unknown → Medium
status: Unknown → Incomplete

Hi, I tried other kernel parameters and noapic seems to work. There is no need to disable the whole of ACPI; however, I don't know how important the APIC is. On kernel 4.18 even the temperature sensors appear.
Power management is almost perfect if the CPU governor is set to powersave.

However, amdgpu crashes now, so the kernel doesn't start without nomodeset. Could this be an ACPI problem, or should I ask the kernel firmware developers?

Hi,
amdgpu doesn't crash on my a315-41g-r40x (BIOS V1.08) with
  linux-next-next-20180713 compiled with VGA_SWITCHEROO=N
and with
  kernel parameters: ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2

gg71, where have you been till now? :D
Thanks, I will try it.

gg71, it works almost perfectly, thanks again. I have been working on this for about a month. Please send me a mail if you have any new info.

The solution for Acer A315-41G-* notebooks (USE AT YOUR OWN RISK, please be very careful):

1. Boot the kernel with these parameters: ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2 nomodeset
This is how it can be done (first answer, steps 1-4 of the first half): https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter

1/b. (If Ubuntu is not installed yet) Install Ubuntu and boot the installed kernel again using the same parameters (see 1.)

2. Start a terminal and do these steps:
> cd ~
> mkdir kernelbuild
> cd kernelbuild
> wget -c https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.17.6.tar.xz
> tar -xvf linux-4.17.6.tar.xz
> cd linux-4.17.6
> sudo apt install git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache bison flex
> make menuconfig
+> Save, OK, Exit
> nano .config
+> Ctrl+W and search for CONFIG_VGA_SWITCHEROO=y
+> replace y with n (this is not ideal and should be fixed later)
+> Ctrl+O, Enter, then Ctrl+X to leave nano
> make -j4 (this will take a while, be patient)
> sudo make modules_install
> sudo make install
> sudo nano /etc/default/grub
+> Edit the GRUB_CMDLINE_LINUX_DEFAULT line and add the parameters: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2"
+> Ctrl+O, Enter, then Ctrl+X
> sudo update-grub
+> reboot and start the new kernel
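
The manual nano edit of .config in the steps above can also be scripted. A minimal sketch in POSIX shell, assuming the option is currently set to y or m; `disable_kconfig_option` is a hypothetical helper name, and "is not set" is the form that kernel .config files themselves use for disabled options.

```sh
# disable_kconfig_option NAME
# Filter a kernel .config on stdin, turning CONFIG_NAME=y (or =m)
# into the "# CONFIG_NAME is not set" form used for disabled options.
# All other lines pass through unchanged.
disable_kconfig_option() {
    sed "s/^CONFIG_$1=[ym]\$/# CONFIG_$1 is not set/"
}

# Usage inside the kernel source tree:
#   disable_kconfig_option VGA_SWITCHEROO < .config > .config.new
#   mv .config.new .config
```

Writing to a new file first keeps the original .config intact until the result has been checked.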

If you install xsensors (sudo apt install xsensors) and start it (xsensors) you can monitor the temperature values of your notebook. (Recommended)
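
The /etc/default/grub edit from the steps above can be expressed as a filter, which makes it easy to review the result before touching the real file. A minimal sketch in POSIX shell with awk; `add_grub_params` is a hypothetical helper name, and it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is double-quoted on a single line with nothing after the closing quote.

```sh
# add_grub_params "PARAMS"
# Read /etc/default/grub content on stdin and append PARAMS inside the
# quotes of the GRUB_CMDLINE_LINUX_DEFAULT line. Other lines, including
# GRUB_CMDLINE_LINUX, are printed unchanged.
add_grub_params() {
    awk -v extra="$1" '
        /^GRUB_CMDLINE_LINUX_DEFAULT="/ {
            # Replace the closing quote with " PARAMS" plus a new closing quote.
            sub(/"$/, " " extra "\"")
        }
        { print }
    '
}

# Usage (prints the modified file to the terminal, changes nothing on disk):
#   add_grub_params 'ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2' < /etc/default/grub
```

Once the output looks right it still has to be written back as root and followed by sudo update-grub, as in the manual steps.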

Richard Baka (bakarichard91) wrote :

Dear Ubuntu maintainers,

couldn't this be fixed by an Ubuntu kernel patch? The hardest part is disabling GPU switching at kernel load time. I think the APIC-fixing parameters could be hardcoded for these models, or a smart script could search for the correct PCI controller.

This was a hell of an investigation, never again. Thanks to gg71, he/she is a lifesaver.

Hi Richard:

This issue is probably related to the buggy IVRS table in the BIOS.
The kernel panics when it finds no southbridge device ID.

Could you try booting the kernel with "amd_iommu_dump=1 amd_iommu=off" (remove the other kernel parameters you tried for this issue)?

If it works, please attach the dmesg here.
I will try to write a kernel patch that makes the kernel boot with the IRQ map disabled instead of panicking.

Richard Baka (bakarichard91) wrote :

Hi AaronMa,

thanks for the response. I tried it, but it didn't work. I don't think the IOMMU problem is the main reason for the kernel hang. Besides, the IOMMU can be disabled in the BIOS, and that makes no difference.

The main reason, as you can see in this picture (https://bugzilla.kernel.org/attachment.cgi?id=276587), is that IOAPIC[4] and IOAPIC[5] are not in the IVRS table, so we have to find the correct PCI controllers using lspci and pass them to the kernel.

Like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2"

The kernel can be started even with noapic, but two sensors will be missing and the advanced touchpad functions will not work. This is the reason for the CONFIG_VGA_SWITCHEROO=n compile-time kernel option.

There is another problem: this notebook has two GPUs and amdgpu (or the kernel, I don't know) cannot handle this correctly, so GPU switching has to be disabled.

Richard Baka (bakarichard91) wrote :

Kernel can be started even with noapic but two sensors will be missing and the advanced touchpad functions will not work.

!!!This line does not belong here: This is the reason for the CONFIG_VGA_SWITCHEROO=n compile-time kernel option.

There is another problem: this notebook has two GPUs and amdgpu (or the kernel, I don't know) cannot handle this correctly, so GPU switching has to be disabled.
!!!But here: This is the reason for the CONFIG_VGA_SWITCHEROO=n compile-time kernel option.

Richard Baka (bakarichard91) wrote :

AaronMa,

This is the iommu debug:

[ 0.000000] AMD-Vi: Using IVHD type 0x11
[ 0.000000] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[ 0.000000] AMD-Vi: mmio-addr: 00000000fd900000
[ 0.000000] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:01.0 flags: 00
[ 0.000000] AMD-Vi: DEV_RANGE_END devid: ff:1f.6
[ 0.000000] AMD-Vi: DEV_ALIAS_RANGE devid: ff:00.0 flags: 00 devid_to: 00:14.4
[ 0.000000] AMD-Vi: DEV_RANGE_END devid: ff:1f.7
[ 0.000000] AMD-Vi: DEV_SPECIAL(HPET[0]) devid: 00:14.0
[ 0.000000] AMD-Vi: DEV_SPECIAL(IOAPIC[33]) devid: 00:14.0
[ 0.000000] AMD-Vi: DEV_SPECIAL(IOAPIC[34]) devid: 00:00.1
[ 0.000000] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found

I will give you the correct iommu "addresses" after dinner :).

Richard Baka (bakarichard91) wrote :

HOT NEWS!!

CONFIG_VGA_SWITCHEROO=n can be avoided by using these kernel parameters: amdgpu.runpm=0 radeon.modeset=0.
Further investigation is in progress...

Richard Baka (bakarichard91) wrote :

This could be the better solution because the notebook heats up less, but I'm not sure.

Richard Baka (bakarichard91) wrote :

Hi all,

After a bit of testing, power management seems to be better, but it is far from perfect. I don't see any anomaly in the temperature sensors (instead of ath10k_hwmon-pci(?!??)), but my notebook is definitely warm if I hold it on my lap.
This is much better on Windows 10, I don't know why.

mosomaci@pc:~$ sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +55.0°C (high = +70.0°C)
Tctl: +55.0°C

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: +0.81 V
fan1: N/A
temp1: +50.0°C (crit = +104000.0°C, hyst = -273.1°C)
power1: 1.13 kW (cap = 28.00 W)

ath10k_hwmon-pci-0300
Adapter: PCI adapter
temp1: +91.0°C

amdgpu-pci-0400
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
fan1: N/A
temp1: +55.0°C (crit = +80.0°C, hyst = +0.0°C)
power1: N/A
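
Some of the readings above are clearly bogus (a critical limit of +104000.0°C, for example), so not every value in this output can be taken at face value. A minimal sketch in POSIX shell with awk that flags such lines; `implausible_temps` is a hypothetical helper name and the 150 degree threshold in the usage line is an arbitrary assumption.

```sh
# implausible_temps MAX
# Read `sensors` output on stdin and print every line that contains a
# temperature reading above MAX degrees Celsius.
implausible_temps() {
    awk -v max="$1" -v deg="°C" '
        {
            rest = $0
            # Visit each degree marker on the line and parse the number before it.
            while ((pos = index(rest, deg)) > 0) {
                before = substr(rest, 1, pos - 1)
                if (match(before, /[+-][0-9]+\.[0-9]+$/) &&
                    (substr(before, RSTART) + 0) > max) {
                    print $0
                    break
                }
                rest = substr(rest, pos + length(deg))
            }
        }
    '
}

# Usage:
#   sensors | implausible_temps 150
```

Only values followed by the degree marker are checked, so voltage and power lines are skipped.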

Could it be that our APIC fix is not a perfect solution for this problem? I know that the DSDT is totally broken:

[ 0.088280] ACPI: Added _OSI(Module Device)
[ 0.088280] ACPI: Added _OSI(Processor Device)
[ 0.088280] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.088280] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.088280] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.092591] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[ 0.100296] ACPI BIOS Error (bug): Failure creating [\_SB.PCI0.LPC0.EC0._Q46], AE_ALREADY_EXISTS (20180531/dswload2-316)
[ 0.100309] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
[ 0.100313] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100321] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.LPC0.EC0.UX**], AE_NOT_FOUND (20180531/psargs-330)
[ 0.100326] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100332] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.LPC0.EC0.M000], AE_NOT_FOUND (20180531/psargs-330)
[ 0.100336] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100343] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.LPC0.EC0.M049], AE_NOT_FOUND (20180531/psargs-330)
[ 0.100347] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100353] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.LPC0.EC0.M280], AE_NOT_FOUND (20180531/psargs-330)
[ 0.100357] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100364] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.LPC0.EC0.M009], AE_NOT_FOUND (20180531/psargs-330)
[ 0.100369] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100372] ACPI Error: Skipping While/If block (20180531/psloop-594)
[ 0.100378] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.LPC0.EC0.M000], AE_NOT_FOUND (20180531/psargs-330)
[ 0.100383] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100390] ACPI Error: Cannot release Mutex [QMUX], not acquired (20180531/exmutex-359)
[ 0.100394] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
[ 0.100402] ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.GPP2.BCM5], AE_NOT_FOUND (20180531...
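
A log like this is easier to reason about once the repeated errors are collapsed into the list of ACPI objects they mention. A minimal sketch in POSIX shell, assuming dmesg text on standard input; `acpi_error_objects` is a hypothetical helper name.

```sh
# acpi_error_objects: read dmesg text on stdin and print each distinct
# ACPI object path named in an "ACPI BIOS Error (bug)" line, sorted.
# The path is whatever stands between the first [ ] pair after the message.
acpi_error_objects() {
    sed -n 's/.*ACPI BIOS Error (bug): [^[]*\[\([^]]*\)].*/\1/p' | LC_ALL=C sort -u
}

# Usage:
#   dmesg | acpi_error_objects
```

The resulting list of names is a natural starting point for anyone who wants to look at the decompiled DSDT/SSDT.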


Richard Baka (bakarichard91) wrote :

*instead of ath10k_hwmon-pci(?!??) -> except of ath10k_hwmon-pci

Richard Baka (bakarichard91) wrote :

Here is a hiDPI scaling script for Gnome3:

#!/bin/bash
gsettings set org.gnome.desktop.interface scaling-factor 2
sleep 1
xrandr --output eDP --scale 1.6x1.6 --panning 3072x1728

Richard Baka (bakarichard91) wrote :

Dear Ubuntu Maintainers,

here is the summary:

1. The kernel freeze can be resolved by using the kernel parameters mentioned above:
> ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2

It would be best if the broken DSDT tables were fixed, but I think nobody will do it.
The workaround with the parameters seems to be a correct solution.

2. For the amdgpu crash there is a patch that works correctly. It will be merged upstream after testing.
https://bugzilla.kernel.org/show_bug.cgi?id=200517

Patch: https://bugzilla.kernel.org/attachment.cgi?id=277375&action=diff&collapsed=&headers=1&format=raw

summary: - Acer Aspire A315 ACPI failure on Ubuntu 18.04, kernel hangs, can't load,
- kernel freeze (AMD Ryzen 5/Radeon/Raven)
+ Acer Aspire A315 IOAPIC failure on Ubuntu 18.04, kernel hangs, can't
+ load, kernel freeze (AMD Ryzen 5/Radeon/Raven) / AMDGPU Hybrid crash
Richard Baka (bakarichard91) wrote :

@@ -, +, @@
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c | 1 +
 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
@@ -575,6 +575,7 @@ static const struct amdgpu_px_quirk amdgpu_px_quirk_list[] = {
  { 0x1002, 0x6900, 0x1002, 0x0124, AMDGPU_PX_QUIRK_FORCE_ATPX },
  { 0x1002, 0x6900, 0x1028, 0x0812, AMDGPU_PX_QUIRK_FORCE_ATPX },
  { 0x1002, 0x6900, 0x1028, 0x0813, AMDGPU_PX_QUIRK_FORCE_ATPX },
+ { 0x1002, 0x6900, 0x1025, 0x125A, AMDGPU_PX_QUIRK_FORCE_ATPX },
  { 0, 0, 0, 0, 0 },
 };

--

Richard Baka (bakarichard91) wrote :
tags: added: patch
Kai-Heng Feng (kaihengfeng) wrote :

Please send that patch to <email address hidden>

Richard Baka (bakarichard91) wrote :

Hi Kai-Heng Feng,

I received the patch from Alex Deucher. Is it really necessary to send it to that address? He said:

"Assuming it fixes the issue, I'll go ahead and apply it to upstream and stable kernels."

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Kai-Heng Feng (kaihengfeng) wrote :

Right, then let's wait until the commit lands in mainline.

Zhang Rui,

why have you changed the title? It was correct: the kernel boots only with noapic or noacpi. The problem is that the DSDT/SSDT tables cannot be loaded because of the AE_ALREADY_EXISTS error. Someone should fix them properly.

Sorry, nothing, I changed it before :)

Richard Baka (bakarichard91) wrote :

The VGA fix has been released in 4.18-rc7. I don't think the SSDT will be fixed.

Kai-Heng Feng (kaihengfeng) wrote :

Have you tried the latest amdgpu [1]?

Also, please attach acpidump, thanks!

[1] https://cgit.freedesktop.org/~agd5f/linux/ branch amd-staging-drm-next

you (bountou) wrote :

Hi,

I don't have much knowledge of Ubuntu. I tried to install 18.04 on an A315-41-r163 with BIOS 1.08 (Ryzen 5 2500U).

I tried to make it work by several methods; the only way to make it start is "pci=noacpi" in the kernel parameters.

Kernel 4.18-rc8 with "ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2" doesn't work either. (I get this: http://prntscr.com/kg5t1j )

The only way to make it work is "pci=noacpi".

I'm pasting my boot-info in case it helps: http://paste.ubuntu.com/p/4qZrHPK8Tz

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu):
status: New → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Also please attach ACPI dump.

Richard Baka (bakarichard91) wrote :

you (bountou), please add the output of lspci -vv
Kai-Heng Feng (kaihengfeng), no I tried the mentioned kernel and that works well.

Richard Baka (bakarichard91) wrote :

you (bountou), IOMMU has to be enabled in BIOS. Could you check that too?

Kai-Heng Feng (kaihengfeng) wrote :

Richard, do you mean that the kernel I built solves the issue for you?

Richard Baka (bakarichard91) wrote :

No, I mean that the 4.18-rc7/rc8 mainline kernel solved the GPU problem. What is special about your kernel? What have you changed compared to mainline?

Kai-Heng Feng (kaihengfeng) wrote :

It uses the latest amdgpu branch.

you (bountou) wrote :

Hi.

I have checked and IOMMU is indeed disabled.
"AMD-SVM=Disabled
AMD-IOMMU=Disabled"

The lspci -vv = https://pastebin.com/46gZfuDE

I'm installing the Kai-Heng's kernel now.

you (bountou) wrote :

lspci -vv as root: https://pastebin.com/kGgzPVxf

you (bountou) wrote :

Kernel installed, exactly the same problem (cf. screenshot from post #54); the only way to start is still with pci=noacpi.

I don't know how to make an ACPI dump. I made one, but it seems to be in hex, something unreadable -> https://paste.ubuntu.com/p/z37K3QTQwZ/

you (bountou) wrote :

And this is with IOMMU enabled (and without the pci=noacpi) : http://prntscr.com/khn3jw
I can still start with IOMMU enabled if I add "pci=noacpi".

Richard Baka (bakarichard91) wrote :

you (bountou), this is exactly the same problem that I have. There may be small differences, but not many. We will be able to fix it with a little work.
So this is an ACPI problem. ACPI (Advanced Configuration and Power Interface) is a hardware interface that the manufacturer provides so that the operating system can use the notebook's power management features correctly.
There are tables in the ROM of your notebook that contain a lot of information for this. For some reason these appear broken under Linux. This is what you see in the attached kernel output screenshot.
This is not a perfect situation.

Things you can do:
a) Install Windows. Windows generally understands the tables correctly. If not, the manufacturer provides drivers that fix them.
b) Disable ACPI: in this case the OS tries to guess the correct behavior, but it will never be perfect. This is why noacpi is not the correct solution (temperature anomalies will occur).
c) Manually give the kernel the addresses of the IOMMU and the SMBus controller (the "lite" version of the correct solution). This can be done with the kernel parameters I wrote, for example ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2. These values are right for my notebook type, but they can differ between notebook subtypes.
I need the lspci -vv output after you have enabled IOMMU to give you the correct addresses. I can see your SMBus is on 00:14.0, but I can't find the IOMMU, which is necessary. This is a half-correct way.
d) The best solution: recompile the DSDT/SSDT tables. This is the hardcore version, because you have to dump, decompile, fix and recompile the proper tables. This is not easy, but it can give power management similar to what you get on Windows 10.

So the first thing to do:
Enable IOMMU, start the kernel with noacpi and copy the lspci -vv output for me. The IOMMU should show up there.
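
Option c) boils down to reading two device addresses from lspci and turning them into kernel parameters. A minimal sketch in POSIX shell with awk, under the assumption (taken from the values quoted in this thread, and possibly wrong for other models) that IOAPIC[4] belongs to the SMBus controller and IOAPIC[5] to the IOMMU; `ivrs_params_from_lspci` is a hypothetical helper name.

```sh
# ivrs_params_from_lspci: read `lspci` output on stdin and print the
# ivrs_ioapic parameters built from the SMBus and IOMMU addresses.
# Exits with status 1 if either device is missing from the listing,
# which is what happens when the IOMMU is disabled in the BIOS.
ivrs_params_from_lspci() {
    awk '
        $2 == "SMBus:" { smbus = $1 }
        $2 == "IOMMU:" { iommu = $1 }
        END {
            if (smbus == "" || iommu == "") exit 1
            # ASSUMPTION: IDs 4 and 5 are the ones used in this thread.
            printf "ivrs_ioapic[4]=%s ivrs_ioapic[5]=%s\n", smbus, iommu
        }
    '
}

# Usage:
#   lspci | ivrs_params_from_lspci || echo "SMBus or IOMMU not listed"
```

The output is only a candidate; it should be tried once from the GRUB edit screen before being made permanent.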

Richard Baka (bakarichard91) wrote :

No, sorry: instead of pci=noacpi, try to start the kernel with noapic and then check lspci. If there is a kernel panic this way, then use pci=noacpi.

you (bountou) wrote :

Hi.
Thanks for your valuable information.

I started with noapic and here is the lspci output: https://paste.ubuntu.com/p/jm8MBy4qND/

I can see the same values as you for the IOMMU (0:00.2)... So I'll try rebooting again with "ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2" instead of "quiet splash".

you (bountou) wrote :

OK. It starts fine with "ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2"; I probably didn't type it correctly the first time... (oooops)

It seems to be a good solution, as my touchpad is working and my special keys too... (But since lspci is able to find the addresses, I don't know why the kernel couldn't find them on its own too. Anyway, I don't know enough about all of this.)

Thanks for your help. I hope it could help more people with this hardware.

Richard Baka (bakarichard91) wrote :

I'm glad you did it.
Why does the kernel not find the correct address like lspci does? That is a good question. Maybe the kernel doesn't know what to search for.
I have a somewhat fixed (not perfect) DSDT/SSDT for the A315-41G (Ryzen 5 + Radeon Vega 8/Radeon 535) which provides better CPU power management (the CPU temperature can drop below 50 °C) than using the kernel parameters alone. If you send me a mail I can give it to you, but I can't guarantee error-free behavior.

you (bountou) wrote :

Yes, can I send you my email in PM in some way from here?

BTW, I've made a fresh install of 18.04 with the ivrs params.

dmesg looks really different depending on the kernel (uploaded to jsfiddle so you can spot the RED lines quickly):

with default 4.15 kernel : https://jsfiddle.net/qg34bury/
with last 4.18.0 kernel : https://jsfiddle.net/pkchrt6n/

I didn't install amdgpu-pro... I don't know how, and I guess it's not really necessary as everything is working well (no need for nomodeset or anything, and no crashes).

And here are my temperatures after 30 minutes at low usage: https://prnt.sc/khu5dy

Richard Baka (bakarichard91) wrote :

For me the latest 4.18 is the best. AMDGPU PRO is not necessary. Please copy-paste the "sensors" output on the 4.18 kernel for me.

you (bountou) wrote :

This morning (still with 4.18) the PC had a hard time starting: 90 seconds to boot, with the CPU stuck for 23 seconds... dmesg here: https://paste.ubuntu.com/p/KbywwGWRSF/
After 5 to 10 minutes it froze completely, so I was forced to do a hard reboot (power button).

I'll keep testing 4.18.0 and, if I keep getting problems, I'll retry 4.15.

Here are my sensors, 1 minute after start:
$ sensors
ath10k_hwmon-pci-0200
Adapter: PCI adapter
temp1: +75.0°C

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
fan1: N/A
temp1: +44.0°C (crit = +80.0°C, hyst = +0.0°C)
power1: N/A

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +44.9°C (high = +70.0°C)
Tctl: +44.9°C

you (bountou) wrote :

Another kernel parameter that seems worth adding: "rcu_nocbs=0-7" (for my 8 cores)

I still get "CPU stuck for 23 seconds" with it, but at least "lscpu" now reports 2000 MHz instead of 1600 MHz, which is the real clock of my CPU. (The temperature doesn't seem higher.)
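
The 0-7 range is simply CPU 0 through the last logical CPU of a machine that reports 8 of them. A minimal sketch in POSIX shell that derives the same parameter from a CPU count; `rcu_nocbs_param` is a hypothetical helper name.

```sh
# rcu_nocbs_param COUNT
# Print the rcu_nocbs kernel parameter covering logical CPUs 0..COUNT-1.
rcu_nocbs_param() {
    echo "rcu_nocbs=0-$(( $1 - 1 ))"
}

# Usage on the target machine (nproc prints the number of logical CPUs):
#   rcu_nocbs_param "$(nproc)"
```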

you (bountou) wrote :

I ran a 100-second stress test on my CPU to check the temperatures (stress -c 8 -t 100).
Here are my temperatures: https://prnt.sc/kijyrs

$ sensors
ath10k_hwmon-pci-0200
Adapter: PCI adapter
temp1: +88.0°C

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
fan1: N/A
temp1: +59.0°C (crit = +80.0°C, hyst = +0.0°C)
power1: N/A

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +59.2°C (high = +70.0°C)
Tctl: +59.2°C

Do you get something similar?

siyia (siyia) wrote :

Richard Baka, please send me your tweaked DSDT tables. I would like to test them; my laptop is an A315-41G-r1n2 and my email is <email address hidden>

I can confirm the bug is also present on the Acer Aspire A315-41G with Ryzen 3 2200U and Radeon 535.

With the kernel parameters "ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2 iommu=pt" on kernel 4.18.5, my A315-41G laptop works flawlessly without crashes or kernel panics. However, one issue is that after resuming from sleep, lscpu reports that the CPU runs constantly at turbo frequency.
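
To check a claim like this independently of lscpu's summary, the per-CPU clocks in /proc/cpuinfo can be averaged before and after a suspend cycle. A minimal sketch in POSIX shell with awk, assuming the x86 "cpu MHz" field is present; `avg_cpu_mhz` is a hypothetical helper name.

```sh
# avg_cpu_mhz: read /proc/cpuinfo content on stdin and print the average
# of all "cpu MHz" values, rounded to a whole number.
# Exits with status 1 if no such field is found.
avg_cpu_mhz() {
    awk -F: '
        /^cpu MHz/ { sum += $2; n++ }
        END {
            if (n == 0) exit 1
            printf "%.0f\n", sum / n
        }
    '
}

# Usage (run once before suspend and once after resume, then compare):
#   avg_cpu_mhz < /proc/cpuinfo
```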

Suspend sometimes crashes with a screen freeze; the only way out is to reboot the laptop. Kernel 4.18.8.

you (bountou) wrote :

So what's new, guys? It's clearly unstable with this CPU/mainboard... Will it be fixed soon?
I mean, if I have to use this laptop for my job, I need something reliable.

siyia (siyia) wrote :
you (bountou) wrote :

Thanks for the update, Siyia. I just updated the BIOS to 1.09.
I did not see any change. (I'm still on the same 4.18 kernel.)

In my logs I can just see one error fewer than before (kvm disabled)

https://jsfiddle.net/ojy4umer/

siyia (siyia) wrote :

Damn, I cannot see your log from my cellphone. Do you have KVM disabled? I do not get such an error.

siyia (siyia) wrote :

After BIOS 1.09, do you still need to add the IOAPIC addresses to the boot parameters?
