System doesn't boot properly on Gigabyte AM4 motherboards (AMD Ryzen)

Bug #1671360 reported by Maciej Dziardziel
This bug affects 47 people
Affects          Status        Importance  Assigned to     Milestone
Linux            Fix Released  Medium
linux (Ubuntu)   Fix Released  Medium      Kai-Heng Feng
  Zesty          Fix Released  Undecided   Kai-Heng Feng
  Artful         Fix Released  Medium      Kai-Heng Feng

Bug Description

[Impact]
Users of Gigabyte AM4 boards cannot boot Ubuntu successfully.
Commit linux-gpio/fixes babdc22b0ccf4ef5a3075ce6e4afc26b7a279faf "pinctrl/amd: Use regular interrupt instead of chained" fixes the issue.

[Test Case]
The issue reproduces on all Gigabyte AM4 boards.
With the patch applied, the issue is resolved, per comment #170.

[Regression Potential]
Regression potential is low. The change is limited to relatively new AMD platforms that use pinctrl-amd.
As the commit log says, using a chained interrupt is not a good idea; using a regular interrupt is the correct way.

I also tested the patch on an AMD laptop whose touchpad depends on pinctrl-amd. No regression was found.

Original bug report:
I'm trying to run Ubuntu on a Ryzen 1700X with a Gigabyte GA-AB350-Gaming-3 motherboard,
and it has a load of problems, starting with not being able to boot normally.

During normal boot, on 16.10 as well as the 17.04 beta,
the system doesn't boot normally; it hangs with a lot of "unexpected irq trap at vector 07"
messages displayed.

Following advice from various places, I've tried the following:

1. Add "acpi=off" to the boot parameters.

That helps, allowing me to boot into recovery mode, though it leaves the system seeing only one core; it is really slow and still boots only into recovery mode.

2. Compile my own kernel from 4.11-rc1 with the CPU frequency governor and CPU handling disabled in the ACPI settings, and boot with the "quiet loglevel=3" options.

That gets me even further: the system sees all cores now. It still boots only into recovery mode,
but that's enough to collect the information for this bug report.

Some observed problems:

1. dmesg reports *a lot* of messages like this all the time:

[ 163.362068] ->handle_irq(): ffffffff87a7e090,
[ 163.362081] bad_chained_irq+0x0/0x40
[ 163.362089] ->handle_irq(): ffffffff87a7e090,
[ 163.362090] amd_gpio_irq_handler+0x0/0x200
[ 163.362090] ->irq_data.chip(): ffffffff88587e20,
[ 163.362090] ioapic_ir_chip+0x0/0x120
[ 163.362090] ->action(): ffffffff884601c0
[ 163.362091] IRQ_NOPROBE set
[ 163.362099] ->handle_irq(): ffffffff87a7e090,
[ 163.362099] amd_gpio_irq_handler+0x0/0x200
[ 163.362100] ->irq_data.chip(): ffffffff88587e20,
[ 163.362100] ioapic_ir_chip+0x0/0x120
[ 163.362101] ->action(): ffffffff884601c0

I tried redirecting dmesg to a file; when I stopped it after a short while, it had generated 400 MB of those.

2. Systemd cannot start journald, perhaps because it cannot cope with the volume of kernel logs.

3. Looking at PCI, I noticed something called AMDI0040 (/sys/bus/acpi/devices/AMDI0040, path=_SB_.EMMC), alongside AMDI0010, AMDI0020 and AMDI0030. Those three are mentioned in the kernel source, but the kernel and Google are completely silent about AMDI0040.

Phoronix tested Ryzen on a different motherboard, and it worked better (though not well),
so I suspect the issue is with the motherboard.
---
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
DistroRelease: Ubuntu 17.04
InstallationDate: Installed on 2015-08-06 (581 days ago)
InstallationMedia: Kubuntu 15.10 "Wily Werewolf" - Alpha amd64 (20150728.1)
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=linux
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: zesty
Uname: Linux 4.11.0-rc1-custom x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: Upgraded to zesty on 2017-03-03 (6 days ago)
UserGroups:

_MarkForUpload: True

Revision history for this message
Maciej Dziardziel (fiedzia) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1671360

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
In , fiedzia (fiedzia-linux-kernel-bugs) wrote :

Created attachment 255151
kernel config

Revision history for this message
In , fiedzia (fiedzia-linux-kernel-bugs) wrote :

Created attachment 255153
lspci_vv_nn

Revision history for this message
In , fiedzia (fiedzia-linux-kernel-bugs) wrote :

Created attachment 255155
dmidecode

Revision history for this message
In , fiedzia (fiedzia-linux-kernel-bugs) wrote :

Created attachment 255157
find_sys_bus_acpi

Revision history for this message
In , fiedzia (fiedzia-linux-kernel-bugs) wrote :

Created attachment 255159
content of /sys/bus/acpi/devices/AMDI0040

Revision history for this message
Maciej Dziardziel (fiedzia) wrote : JournalErrors.txt

apport information

tags: added: apport-collected zesty
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Maciej Dziardziel (fiedzia) wrote : Re: System doesn't boot properly on AMD Ryzen / Gigabyte GA-AB350-gaming-3

I've added the apport info.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc1/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Revision history for this message
Maciej Dziardziel (fiedzia) wrote :

The issue has nothing to do with upgrades; it's the same problem on 16.10 as well as on 17.04.
I've tested it on my installed system and on a live Ubuntu USB.

I've tried to use mainline kernel and the problem is still there.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Maciej Dziardziel (fiedzia) wrote :

I've found a workaround (requires compiling your own kernel, tested on 4.11 and r4.11rc1):

When configuring the kernel (I used make menuconfig), disable the following:

Everything under Device Drivers / GPIO Support, especially
Memory mapped GPIO drivers / AMD Promontory GPIO support (GPIO_AMDPT)
and PCI GPIO expanders / AMD 8111 GPIO driver (GPIO_AMD8111);
I turned off everything there to make sure nothing is using GPIO.

Device Drivers / Pin controllers / AMD GPIO pin control (PINCTRL_AMD)

Perhaps only one of them is really necessary, but I haven't tested that.

System boots normally, no problems with journald and no weird messages from dmesg.
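To double-check which of these options a given kernel was actually built with, a small Python sketch can read the kernel's `.config` (on Ubuntu, the installed kernel's configuration is usually at `/boot/config-$(uname -r)`). It only inspects the three symbols named in this comment, and the helper names are hypothetical. A later comment in this thread reports that disabling only PINCTRL_AMD was enough, and that disabling the GPIO drivers as well broke USB keyboards.

```python
# Kconfig symbols named in the workaround above, with the CONFIG_ prefix
# that they carry inside a kernel .config file.
WORKAROUND_OPTIONS = ("CONFIG_PINCTRL_AMD", "CONFIG_GPIO_AMDPT", "CONFIG_GPIO_AMD8111")
NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map each option to its value: 'y', 'm', a literal string, or 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" means the option is disabled.
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def workaround_status(text):
    """Report the three options; an option missing from the file counts as disabled."""
    options = parse_kconfig(text)
    return {name: options.get(name, "n") for name in WORKAROUND_OPTIONS}
```

With this workaround applied, every value should come back as `"n"`.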

Revision history for this message
Robert Sandru (rsandru) wrote :

Maciej, do you have any suggestions if there is no previous OS installed? Is there any way to fix the installation image so that I can at least get a base installation going?

Revision history for this message
Eric Hartmann (hartmann-eric) wrote :

Guys, I've updated the firmware of my GA-X370-Gaming-5 to F4, and with kernel 4.10.1 and acpi=off it's almost working. My last issue is that only one core is available.

I've tested with Fedora 25 (4.8 kernel with backports) and everything works perfectly: all cores are available. So it may be due to the kernel configuration, but I haven't found where the issue is.

Revision history for this message
Maciej Dziardziel (fiedzia) wrote :

> Maciej, do you have any suggestions if there is no previous OS installed?

You can try booting with these options:

acpi=off quiet loglevel=3

(just acpi=off might be enough). That should get you through installation.
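For a one-off boot, these options can be typed in at the GRUB menu (press `e` on the boot entry). To make them persistent on an installed system, they normally go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub`. The Python sketch below only shows that text edit on a string; `add_kernel_params` is a hypothetical helper and assumes the line uses plain double quotes. As reported in this thread, acpi=off leaves the system using only one core.

```python
import re

# Matches e.g.: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"\s*$')


def add_kernel_params(grub_defaults, params):
    """Return a copy of /etc/default/grub text with params appended once."""
    out = []
    for line in grub_defaults.splitlines():
        match = CMDLINE_RE.match(line)
        if match:
            current = match.group(2).split()
            # Append only parameters that are not already present.
            for param in params:
                if param not in current:
                    current.append(param)
            line = match.group(1) + '"' + " ".join(current) + '"'
        out.append(line)
    return "\n".join(out) + "\n"
```

Because existing parameters are kept and duplicates are skipped, running it twice leaves the file unchanged.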

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

Getting the same problem on an AMD Ryzen X18700 with a GIGABYTE AB350-GAMING3 motherboard (F5, the latest BIOS version).

Setting acpi=off gets me to the login screen, but it fails to authenticate me, so I'm stuck there.

This occurred when I tried to upgrade from 16.04 to 16.10. I then tried installing 17.04, with the same problems: the installer was hitting the IRQ vector 07 problem.

If you guys need any other info, please let me know and I can collect some more.

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

AMD Ryzen X1800 (don't know how a 7 got in there)

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

Authentication does work from a terminal (Ctrl-Alt-F1).

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

Just booted 16.04 from disk (try mode) and it works without problems. I attached a screenshot as well.

Revision history for this message
Heiko Hartmann (yrwyddfa) wrote :

I can confirm this for a Ryzen X1700 and the same mainboard Cosmin mentioned. Ubuntu Studio 16.04 works without any problems; the live version of 16.10 fails with thousands of IRQ traps.

Revision history for this message
Robert Sandru (rsandru) wrote :

Still no dice.

Tried another path:
- Installed on a spare HDD using VMware in Windows (assigned the full disk to VMware), with no additional drivers or other changes
- Updated the kernel to 4.11-rc1 (using prebuilt binaries)
- Updated the GRUB parameters: acpi=off quiet loglevel=3
- Rebooted the computer with that drive as the boot disk

It still gets stuck during startup. Removing the acpi=off parameter brings back the IRQ trap messages, so it's the same situation.

My setup is a 1700X on a Gigabyte GA-AX370-Gaming 5 motherboard.
For what it's worth, I'm also using an NVMe boot drive; not sure it makes a difference.

Giving up until a kernel dev with the appropriate knowledge looks into this.

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

Installed 16.04 and it runs just fine (just as Heiko mentioned above). All 16 CPUs are visible; however, there is no way to get temperature information (sensors can't find anything).

Revision history for this message
Robert Sandru (rsandru) wrote :

Cosmin, when you say you installed 16.04, do you mean Ubuntu Studio 16.04 or the standard distribution?

Did you install the OS from scratch using a USB stick or some other media? You were trying to update to 17.04, so did 16.04 boot correctly initially? Please provide details; just saying it works doesn't help pinpoint the issue.

Revision history for this message
Eric Hartmann (hartmann-eric) wrote :

As Maciej Dziardziel suggested, I rebuilt the 4.10.1 kernel with GPIO_AMDPT, GPIO_AMD8111 and PINCTRL_AMD deactivated.
The system booted correctly; however, I couldn't use any USB keyboard.
So I rebuilt the kernel deactivating only PINCTRL_AMD, and everything works fine now.

Here is my setup: Aorus GA-X370-Gaming-5 with a Ryzen 1800X.

And here are the links to the builds:
* https://mega.nz/#!R493SRIb!UgMpquYIqQuJkmHoY1kgp5rtVSMK7yo7tw_KP3pUKzw
* https://mega.nz/#!YoMmHBRQ!plT0X61pQJWz00rkZ5ANZC50rjNHKID3BZ4v-wia2mE
* https://mega.nz/#!F8801QKD!pRSay0_qWhKqvfl6MKOJSpGSBxf1t-NMspdsYPM5eDQ

Revision history for this message
Huang YangWen (yangwen5301) wrote :

Works fine with Ubuntu 16.04 + 4.10.1-041001-generic

The kernel was downloaded from the Ubuntu website.

motherboard ASUS PRIME B350-PLUS.

Revision history for this message
Huang YangWen (yangwen5301) wrote :

CPU is Ryzen 1700x

However, lm-sensors is not working properly.

Revision history for this message
In , fiedzia (fiedzia-linux-kernel-bugs) wrote :

>tested on 4.11 and r4.11rc1

Should be "tested on 4.10 and r4.11rc"

Revision history for this message
Maciej Dziardziel (fiedzia) wrote :

Huang YangWen: can you attach the output of these commands:

ls -l /sys/bus/acpi/devices/
sudo lspci -vvnn

I wonder what the differences between motherboards are. I suspect Gigabyte has something other boards don't have.
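To compare such listings between boards, one option is to pull out only the AMD ACPI IDs (the AMDIxxxx names discussed in the original report) and diff them. The Python sketch below assumes entries in `/sys/bus/acpi/devices` look like `AMDI0030:00` (an ID plus an instance suffix); the helper names and the sample listings in the tests are hypothetical.

```python
import os
import re

# An AMD ACPI hardware ID such as AMDI0040, optionally followed by an
# instance suffix like ":00" as used for entries in /sys/bus/acpi/devices.
AMDI_RE = re.compile(r"^(AMDI[0-9A-F]{4})(?::[0-9A-F]+)?$")


def amd_acpi_ids(names):
    """Return the sorted, de-duplicated AMDI IDs found in a list of names."""
    ids = set()
    for name in names:
        match = AMDI_RE.match(name)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


def only_on_first(names_a, names_b):
    """AMDI IDs present on board A but missing on board B."""
    return sorted(set(amd_acpi_ids(names_a)) - set(amd_acpi_ids(names_b)))


def local_amd_acpi_ids(path="/sys/bus/acpi/devices"):
    """Read the listing on the current machine (Linux only)."""
    return amd_acpi_ids(os.listdir(path))
```

Running `local_amd_acpi_ids()` on each machine and feeding both results to `only_on_first` would show whether, for example, AMDI0040 exists only on the Gigabyte boards.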

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

@Robert ... sure, let me clarify the events.

1. I had Ubuntu 16.04 LTS for some time already running on my system.
2. I bought an AMD Ryzen 1800 + Gigabyte AB350 Gaming 3, took the SSD from the old system and put it into the new one
3. Everything worked fine, except I had to install NVIDIA drivers (I also bought a better video card)
4. I noticed I couldn't get the CPU temperature, so I tried my luck by upgrading to 16.10 (BAD DECISION)
----> during OS boot the IRQ VECTOR 07 error is repeated over and over
5. I then downloaded 17.04 on a disk and tried to install it
----> ended up with the IRQ VECTOR 07 error during install (so, no go)
6. I tried Ubuntu 16.04 LTS from a disk (the same source as the OS in step 1) --> worked fine
7. I installed Ubuntu 16.04 LTS and here I am, back at step 3 :)

I've never installed Studio, only Desktop, as in this: http://releases.ubuntu.com/16.04.2/ubuntu-16.04.2-desktop-amd64.iso?_ga=1.156896511.598999195.1489430754

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

"on a disk" I mean on a DVD :)

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

@Maciej: I think you misunderstood. Both Gigabyte (mine) and Asus (Huang's) boards work fine with 16.04 LTS. What you need to do is ask Huang to try his luck with 16.10 :) or 17.04

So, Huang, can you make 2 DVDs, one with 16.10 and another with 17.04 (or 2 bootable flash drives), and just "TRY" both of them without installing? I wonder if that works for you.

Revision history for this message
Robert Sandru (rsandru) wrote :

Hi Cosmin, thanks a lot for the details.

I can see a couple of differences that I need to look into:
 - Chipset is different B350 vs. X370
 - I attempted the fresh 16.04 install from a USB stick instead of a DVD (I've got no optical drive at all, so can't test that) -> IRQ TRAP.... (same with 16.10 and 17.04 beta)

So my next attempt will be to take the drive out of that computer, install it in a different one, perform the 16.04 install there and move it back to the Ryzen system. That would be similar to your initial state with a working 16.04.

One last question (:-), was your existing 16.04 running on a special kernel version before you moved your SSD to your new Ryzen system?

Thanks!

Revision history for this message
Robert Sandru (rsandru) wrote :

@Eric: I've managed to boot the system using your custom built kernel!

Not usable really now as the nvidia driver installation is failing but much better...

So clearly the code around PINCTRL_AMD seems to be related.

Revision history for this message
Daniele (protomucca) wrote :

Dear All,

I had exactly the same situation as in post #23. Coming from Kubuntu 16.04, I bought a new Ryzen 1800X + GA-AB350 Gaming 3 system.
The first boot, using the "old" SSD, worked fine, so I decided to upgrade to 16.10. The procedure went well, but after the reboot I got plenty of "IRQ VECTOR 07" error messages.
So I tried a USB stick with the Kubuntu 17.04 live version, both Beta 1 and a nightly, but got the same error.
The only way to make the system work is to manually select kernel 4.4.0-64-generic from GRUB.

I've also tried installing the Ubuntu 4.10 and 4.11-rc1 kernels, but I always get the same error.

Hope this helps

Revision history for this message
Eric Hartmann (hartmann-eric) wrote :

@Robert, on Ubuntu 16.10 you can install the nvidia-378 drivers, which contain the fix for kernel 4.10 (I'm using them).

The patches are also available here: https://devtalk.nvidia.com/default/topic/995636/linux/-patches-378-13-4-10-and-4-11-rc1/

Revision history for this message
Eric Hartmann (hartmann-eric) wrote :

@Maciej, here are the results for the GA-X370-Gaming-5 board (with an 1800X processor).
Please note that it's the result on 4.10.1 with PINCTRL_AMD removed.

Revision history for this message
Eric Hartmann (hartmann-eric) wrote :

And lspci

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

@Robert : what I have on DVD is an image of 16.04.01 LTS : http://old-releases.ubuntu.com/releases/xenial/ubuntu-16.04.1-desktop-amd64.iso

So I would suggest you "try / install" this first, and then update to 16.04.02.
At least that's how it worked for me.

Current kernel that I have (I didn't perform any tricks to get it in place; it's just from installing 16.04.01 and upgrading to 16.04.02 via the normal upgrade procedure):

$ uname -a
Linux alpha-desktop 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Cosmin Mutu (cosmin-mutu) wrote :

By normal upgrade procedure I mean :

1. Right Click on "Power Icon" (top right corner)
2. Select "About this computer"
3. On opened pop-up in the bottom right corner there should be a button which either says "Update" or "System Up-To-Date"

Let me know if the 16.04.01 install worked ... hope it does!

Revision history for this message
anders_c_ (anders-c-) wrote :

Ryzen 1700
Gigabyte ab350 Gaming 3

I created a USB stick with 16.04.1 and it starts without problems. When I tried to install, it asked whether I wanted UEFI or BIOS mode; UEFI failed to install, but BIOS worked.

Revision history for this message
blarg (blargblarg) wrote :

Disable the CPU-based USB ports: that's all of the USB 3.0 ones, probably most of the rear and front ports.

Only use the 3.1 or 2.0 USB ports associated with the B350 southbridge.

Revision history for this message
anders_c_ (anders-c-) wrote :

@blarg
Looked around in the BIOS but found no way to disable the CPU USB ports.
Tried booting with nothing connected to them, but it failed with the same error messages as before.

Revision history for this message
Marc Singer (nextized) wrote :

I can replicate the issue on a Gigabyte GA370x Gaming 5 mainboard with an AMD 1700X. After I upgraded Mint to Linux 4.10.3, the bug even happened on older kernel versions installed with Ukuu (I could only boot 4.4.0, the default on Mint 18.2).

Revision history for this message
James Willcox (snorp) wrote :

I have this issue with a Ryzen 1800X and a Gigabyte GA-AB350 Gaming (not 3) mainboard. I can boot the live desktop from USB if I pass acpi=off, but /proc/cpuinfo lists only one core.

Revision history for this message
Eric Joslyn (mirth99) wrote :

I have exactly the same problem with a Gigabyte AB350-Gaming 3 motherboard and a Ryzen 1800+.

- Ubuntu variations (Ubuntu, Neon, Kubuntu, Budgie) and openSUSE do not work out of the box. They stall with a bunch of 'irq trap at vector 07' messages.
- For all of these, after adding acpi=off to the kernel parameters, it runs/installs, but runs on only one core. The system locks up when you try to shutdown.
- I tried other acpi parameters but nothing seems to make a difference. Only acpi=off has an effect.
- Fedora and KaOSx run and install with no obvious problem.

Revision history for this message
Roland Kaiser (roland-kaiser666) wrote :

Here is some additional information. I built upstream kernel 4.11-rc5 with

    CONFIG_PINCTRL_AMD=m
    CONFIG_DEBUG_PINCTRL=y
    CONFIG_DEBUG_GPIO=y

and blacklisted pinctrl-amd so as to be able to boot that kernel and set up netconsole. Running

    modprobe pinctrl-amd

then results in a crash (obviously) and output along the lines of what you can see in the attached files. Case A seems to be more common than B. In case B, the system can still be used for a few seconds before becoming unresponsive.
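The blacklisting step above can be done with a `blacklist pinctrl-amd` line in a file under `/etc/modprobe.d/` (the file name is up to you). That only stops the module from being loaded automatically through its aliases; an explicit `modprobe pinctrl-amd`, as above, still loads it, and blacklisting only applies because the driver was built as a module (`=m`). The Python sketch below checks whether such a config text blacklists a module. It is a simplified, hypothetical parser, not modprobe's own logic, and it assumes modprobe treats `-` and `_` in module names as equivalent.

```python
def normalize(module):
    # Assumption: modprobe treats '-' and '_' in module names as interchangeable.
    return module.replace("-", "_")


def blacklisted_modules(conf_text):
    """Collect module names from 'blacklist <module>' lines."""
    modules = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        words = line.split()
        if words[0] == "blacklist" and len(words) >= 2:
            modules.add(normalize(words[1]))
    return modules


def is_blacklisted(module, conf_text):
    """True if conf_text contains a blacklist entry for module."""
    return normalize(module) in blacklisted_modules(conf_text)
```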

summary: - System doesn't boot properly on AMD Ryzen / Gigabyte GA-AB350-gaming-3
+ System doesn't boot properly on Gigabyte AM4 motherboards (AMD Ryzen)
tags: added: xenial yakkety
tags: added: regression-release
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
status: Incomplete → Confirmed
tags: added: kernel-da-key
tags: added: bitesize kernel-oops
Revision history for this message
In , jonnyboysmithy (jonnyboysmithy-linux-kernel-bugs) wrote :

Link to what appears to be the same bug over at the ubuntu launchpad site: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1671360

(It's linked from there to here, but not vice versa, so I'm adding the link above.)

Revision history for this message
In , mtanski (mtanski-linux-kernel-bugs) wrote :

This appears to affect all of Gigabyte's AMD AM4 motherboards. I have attempted to contact the vendor on their forums here: http://forum.gigabyte.us/thread/886/am4-beta-bios-thread?page=47&scrollTo=4429 . They promised to look into it, but I haven't heard back. In a PM, I offered to fix it upstream myself if given enough info.

Revision history for this message
In , dfk_7677 (dfk7677-linux-kernel-bugs) wrote :

Knowing that kernel 4.4 boots normally (Ubuntu 16.04) and 4.8 is the first not booting (Ubuntu 16.10+), can't we assume that the problem is in the commits for 4.8 after 16/09/15?

https://github.com/torvalds/linux/commits/v4.8/drivers/pinctrl/pinctrl-amd.c
https://github.com/torvalds/linux/commits/v4.4/drivers/pinctrl/pinctrl-amd.c
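If the regression really lies between a good and a bad release, the usual way to pin it down is a binary search over the commits in between, which `git bisect` automates (for example `git bisect start v4.8 v4.4 -- drivers/pinctrl/pinctrl-amd.c`, then marking each test boot with `git bisect good` or `git bisect bad`). One caveat from this thread: another comment reports that the stock 4.4.70 kernel also shows the error while Ubuntu's 4.4 pinctrl-amd.c works, so upstream v4.4 may not be a good endpoint. The Python sketch below shows only the search itself; the version list and the `is_bad` predicate in the tests are hypothetical.

```python
def first_bad(candidates, is_bad):
    """Binary-search an oldest-to-newest list for the first bad entry.

    Assumes every entry before the first bad one is good and every entry
    from it onward is bad (a single regression), as git bisect does.
    """
    if not candidates or not is_bad(candidates[-1]):
        raise ValueError("the newest candidate must be bad")
    lo, hi = 0, len(candidates) - 1  # candidates[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the first bad entry is at mid or earlier
        else:
            lo = mid + 1  # the first bad entry is after mid
    return candidates[lo]
```

Each test halves the remaining range, so about 7 test boots suffice for 100 candidate commits.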

description: updated
Seth Forshee (sforshee)
Changed in linux (Ubuntu Artful):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
Changed in linux (Ubuntu Artful):
status: Confirmed → Fix Committed
Revision history for this message
In , adebeus (adebeus-linux-kernel-bugs) wrote :

I'm getting a similar error on an Acer E5-553G laptop (AMD Excavator, not Ryzen). It doesn't prevent the system from booting, but it does make the ELAN touchpad stop working shortly after boot, with similar dmesg output (though seemingly at a smaller volume: the bad_chained_irq errors take a while to page through with less, but don't come anywhere near filling a 400 MB file, and they stop completely after the touchpad switches off). This occurs with all recent kernel versions I've tried (4.9.30, 4.10.17, 4.11.3) except 4.12-rc2, which doesn't produce this error but also doesn't recognize the touchpad at all. With 4.12-rc3, it is back to the same behavior as the 4.11.3 kernel.

The only kernel I have found that works is the Ubuntu 4.4 kernel. The stock 4.4.70 kernel gets the same error, but if I replace the pinctrl-amd.c file with the version from the Ubuntu 16.04 kernel, it works fine. I haven't tried using the Ubuntu 4.4 pinctrl-amd.c file with newer kernels, but presume that it would be incompatible. I did look at the Ubuntu 17.04 kernel tree (4.10) on github and noticed that there's no difference between that pinctrl-amd.c and the upstream one, so it looks like Ubuntu has only fixed this issue in the 4.4 kernel series.

I have not yet tried updating the BIOS since Acer only provides the update as a Windows EXE.

Revision history for this message
In , adebeus (adebeus-linux-kernel-bugs) wrote :

I should also note that I'm using OpenRC and not systemd, which might be why I'm able to complete the boot process.

Changed in linux (Ubuntu Zesty):
status: New → Confirmed
Changed in linux (Ubuntu Zesty):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Stefan Bader (smb)
Changed in linux (Ubuntu Zesty):
status: Fix Released → Fix Committed
tags: added: verification-needed-zesty
tags: added: verification-done-zesty
removed: verification-needed-zesty
Revision history for this message
Christos (christosmichaelas) wrote :

Hi Kai,

Sorry for the late reply.

Kernel v4.11.3

More info of the iso: https://www.archlinux.org/releng/releases/2017.06.01/

Hope this helps!

Christos

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Christos,

What's the kernel version of the 17.10 live USB?
Linux version 4.10.0-25 has the fix but 4.10.0-24 does not.
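Based on that statement, a quick way to check a Zesty kernel is to compare the ABI number in `uname -r` (for example `4.10.0-22-generic`) against 25. The Python sketch below only knows about the 4.10.0 series mentioned here and returns None for anything else (such as the 4.4 or 4.11 kernels discussed in this thread); the function name is hypothetical. In Python, `platform.release()` returns the same string as `uname -r`.

```python
import re

# Per the comment above: 4.10.0-25 has the fix, 4.10.0-24 does not.
ZESTY_SERIES = (4, 10, 0)
FIRST_FIXED_ABI = 25

# "4.10.0-22-generic" -> (4, 10, 0, 22)
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def zesty_kernel_has_fix(release):
    """True/False for Ubuntu 4.10.0-N kernels, None for any other series."""
    match = RELEASE_RE.match(release)
    if not match:
        return None  # e.g. "4.11.0-rc1-custom" has no numeric ABI part
    major, minor, patch, abi = (int(g) for g in match.groups())
    if (major, minor, patch) != ZESTY_SERIES:
        return None
    return abi >= FIRST_FIXED_ABI
```

For the image reported below, `zesty_kernel_has_fix("4.10.0-22-generic")` returns False.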

Revision history for this message
Christos (christosmichaelas) wrote :

Hi Kai,

Kernel version for the image I'm using is: 4.10.0-22-generic.

I downloaded the image from the following URL:

http://cdimage.ubuntu.com/daily-live/pending/

Should I download it again? I see it was modified today, whereas I downloaded it on the 26th.

Thanks

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-26.30

---------------
linux (4.10.0-26.30) zesty; urgency=low

  * linux: 4.10.0-26.30 -proposed tracker (LP: #1700528)

  * CVE-2017-1000364
    - Revert "UBUNTU: SAUCE: mm: Only expand stack if guard area is hit"
    - Revert "mm: do not collapse stack gap into THP"
    - Revert "mm: enlarge stack guard gap"
    - mm: larger stack guard gap, between vmas
    - mm: fix new crash in unmapped_area_topdown()
    - Allow stack to grow up to address space limit

linux (4.10.0-25.29) zesty; urgency=low

  * linux: 4.10.0-25.29 -proposed tracker (LP: #1699028)

  * CVE-2017-1000364
    - SAUCE: mm: Only expand stack if guard area is hit

  * CVE-2017-9074
    - ipv6: Prevent overrun when parsing v6 header options
    - ipv6: Check ip6_find_1stfragopt() return value properly.

  * [Zesty] QDF2400 ARM64 server - NMI watchdog: BUG: soft lockup - CPU#8 stuck
    for 22s! (LP: #1680549)
    - iommu/dma: Stop getting dma_32bit_pfn wrong
    - iommu/dma: Implement PCI allocation optimisation
    - iommu/dma: Convert to address-based allocation
    - iommu/dma: Clean up MSI IOVA allocation
    - iommu/dma: Plumb in the per-CPU IOVA caches
    - iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range

  * Zesty update to 4.10.17 stable release (LP: #1692898)
    - xen: adjust early dom0 p2m handling to xen hypervisor behavior
    - target: Fix compare_and_write_callback handling for non GOOD status
    - target/fileio: Fix zero-length READ and WRITE handling
    - iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
    - usb: xhci: bInterval quirk for TI TUSB73x0
    - usb: host: xhci: print correct command ring address
    - USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit
    - USB: Proper handling of Race Condition when two USB class drivers try to
      call init_usb_class simultaneously
    - USB: Revert "cdc-wdm: fix "out-of-sync" due to missing notifications"
    - staging: vt6656: use off stack for in buffer USB transfers.
    - staging: vt6656: use off stack for out buffer USB transfers.
    - staging: gdm724x: gdm_mux: fix use-after-free on module unload
    - staging: wilc1000: Fix problem with wrong vif index
    - staging: comedi: jr3_pci: fix possible null pointer dereference
    - staging: comedi: jr3_pci: cope with jiffies wraparound
    - usb: misc: add missing continue in switch
    - usb: gadget: legacy gadgets are optional
    - usb: Make sure usb/phy/of gets built-in
    - usb: hub: Fix error loop seen after hub communication errors
    - usb: hub: Do not attempt to autosuspend disconnected devices
    - x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
    - selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug
    - x86, pmem: Fix cache flushing for iovec write < 8 bytes
    - um: Fix PTRACE_POKEUSER on x86_64
    - perf/x86: Fix Broadwell-EP DRAM RAPL events
    - KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
    - KVM: arm/arm64: fix races in kvm_psci_vcpu_on
    - arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses
    - block: fix blk_integrity_register to use templ...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
status: Fix Committed → Fix Released
Revision history for this message
Christos (christosmichaelas) wrote :

Will this fix be implemented in the downloadable daily images? I have downloaded the most recently modified (29/06/2017) and kernel 4.10.0-22 is still in use.

Thanks

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

When Artful (17.10) switches to 4.11, the daily image will also get the new kernel.

    - usb: hub: Fix error loop seen after hub communication errors
    - usb: hub: Do not attempt to autosuspend disconnected devices
    - x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
    - selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug
    - x86, pmem: Fix cache flushing for iovec write < 8 bytes
    - um: Fix PTRACE_POKEUSER on x86_64
    - perf/x86: Fix Broadwell-EP DRAM RAPL events
    - KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
    - KVM: arm/arm64: fix races in kvm_psci_vcpu_on
    - arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses
    - block: fix blk_integrity_register to use templ...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Kai-Heng Feng (kaihengfeng) wrote :

Artful should switch to linux kernel 4.11 pretty soon.

Kai-Heng Feng (kaihengfeng) wrote :

OK, the Artful live image (the pending one) now uses 4.10.0-26.30. If you still can't boot, it is probably a different issue.

rootexpression (rootexpression-linux-kernel-bugs) wrote :

I'm using a Gigabyte X370 K7 (as far as I know, this issue exists on all Gigabyte B350 and X370 boards).

I have this issue and am completely unable to boot Ubuntu 16.04; I have also found reports of the same issue on 17.04. However, I can install and boot Manjaro (Arch-based) just fine. The system mostly runs fine, but has what appears to be a memory leak related to vector 7 and Xorg (possibly a conflict?). The issue persists with kernel 4.12-rc7 on the Arch base.

Found this: https://bugzilla.proxmox.com/show_bug.cgi?id=1366

rootexpression (rootexpression-linux-kernel-bugs) wrote :

I should note that Gigabyte was able to get Ubuntu 16.04 to boot, and supposedly to install, on the F4 BIOS. I could not replicate their results.

edisonalvaringo (edisonalvaringo-linux-kernel-bugs) wrote :

I get this:

irq 7: nobody cared (try booting with the "irqpoll" option)
Jul 15 19:17:29 linux-x5uw kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-2.g2399a91-default #1
Jul 15 19:17:29 linux-x5uw kernel: Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming/AB350-Gaming-CF, BIOS F5 06/19/2017
Jul 15 19:17:29 linux-x5uw kernel: Call Trace:
Jul 15 19:17:29 linux-x5uw kernel: <IRQ>
Jul 15 19:17:29 linux-x5uw kernel: dump_stack+0x5c/0x84
Jul 15 19:17:29 linux-x5uw kernel: __report_bad_irq+0x30/0xc0
Jul 15 19:17:29 linux-x5uw kernel: note_interrupt+0x23e/0x290
Jul 15 19:17:29 linux-x5uw kernel: handle_irq_event_percpu+0x41/0x50
Jul 15 19:17:29 linux-x5uw kernel: handle_irq_event+0x37/0x60
Jul 15 19:17:29 linux-x5uw kernel: handle_fasteoi_irq+0x95/0x160
Jul 15 19:17:29 linux-x5uw kernel: handle_irq+0x19/0x30
Jul 15 19:17:29 linux-x5uw kernel: do_IRQ+0x41/0xc0
Jul 15 19:17:29 linux-x5uw kernel: common_interrupt+0x8c/0x8c
Jul 15 19:17:29 linux-x5uw kernel: </IRQ>
Jul 15 19:17:29 linux-x5uw kernel: RIP: 0010:native_safe_halt+0x2/0x10
Jul 15 19:17:29 linux-x5uw kernel: RSP: 0018:ffffffffa8e03df8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc8
Jul 15 19:17:29 linux-x5uw kernel: RAX: 0000000000000000 RBX: ffff9cea5661a000 RCX: 0000000000000034
Jul 15 19:17:29 linux-x5uw kernel: RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa8ed9cc0 RDI: ffff9cea5661a064
Jul 15 19:17:29 linux-x5uw kernel: RBP: ffff9cea5661a064 R08: cccccccccccccccd R09: 0000000000000008
Jul 15 19:17:29 linux-x5uw kernel: R10: 000000000000000a R11: 000000000000000a R12: 0000000000000001
Jul 15 19:17:29 linux-x5uw kernel: R13: 0000000000000001 R14: 0000000000000001 R15: 00000004118ee7d0
Jul 15 19:17:29 linux-x5uw kernel: acpi_safe_halt.part.5+0xa/0x20
Jul 15 19:17:29 linux-x5uw kernel: acpi_idle_enter+0xf6/0x290
Jul 15 19:17:29 linux-x5uw kernel: cpuidle_enter_state+0xef/0x2e0
Jul 15 19:17:29 linux-x5uw kernel: do_idle+0x184/0x1e0
Jul 15 19:17:29 linux-x5uw kernel: cpu_startup_entry+0x5d/0x60
Jul 15 19:17:29 linux-x5uw kernel: start_kernel+0x422/0x42a
Jul 15 19:17:29 linux-x5uw kernel: ? early_idt_handler_array+0x120/0x120
Jul 15 19:17:29 linux-x5uw kernel: x86_64_start_kernel+0x12c/0x13b
Jul 15 19:17:29 linux-x5uw kernel: secondary_startup_64+0x9f/0x9f
Jul 15 19:17:29 linux-x5uw kernel: handlers:
Jul 15 19:17:29 linux-x5uw kernel: [<ffffffffc0604030>] amd_gpio_irq_handler [pinctrl_amd]
Jul 15 19:17:29 linux-x5uw kernel: Disabling IRQ #7

rootexpression (rootexpression-linux-kernel-bugs) wrote :

(In reply to edisonalvaringo from comment #14)

What kernel is that on? What distro, etc.? I haven't seen that before.

edisonalvaringo (edisonalvaringo-linux-kernel-bugs) wrote :

openSUSE 42.2, with linux-4.11.0-2.g057f66f from the openSUSE Tumbleweed repos.

edisonalvaringo (edisonalvaringo-linux-kernel-bugs) wrote :

Booting with irqpoll makes things much worse and spams dmesg with "lost hpet rtc xxxx interrupts".

edisonalvaringo (edisonalvaringo-linux-kernel-bugs) wrote :

I don't know if it is related, but I'm having trouble resuming from hibernation: sometimes the USB ports stop working.

I also get this message:

[ 877.231395] ------------[ cut here ]------------
[ 877.231395] WARNING: CPU: 0 PID: 4060 at ../arch/x86/kernel/cpu/mcheck/mce_amd.c:191 mce_amd_feature_init+0x27d/0x2c0
[ 877.231396] Modules linked in: ccm dm_crypt dm_mod loop ppdev parport_pc parport vmw_vsock_vmci_transport vsock vmw_vmci nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4 iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack libcrc32c ip6table_filter ip6_tables x_tables it87 hwmon_vid joydev fuse snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec edac_mce_amd snd_hda_core snd_hwdep snd_pcm snd_timer kvm i2c_designware_platform snd irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ath9k ath9k_common ath9k_hw ath mac80211
[ 877.231405] cfg80211 ghash_clmulni_intel rfkill r8169 mii pcbc soundcore ccp pinctrl_amd shpchp aesni_intel i2c_piix4 aes_x86_64 acpi_cpufreq tpm_infineon tpm_tis tpm_tis_core tpm gpio_amdpt gpio_generic pcspkr crypto_simd glue_helper cryptd button i2c_designware_core wmi arc4 ppp_mppe hid_generic usbhid bcache amdgpu i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm xhci_pci xhci_hcd sr_mod drm cdrom usbcore msr sg ppp_generic slhc efivarfs
[ 877.231412] CPU: 0 PID: 4060 Comm: systemd-sleep Tainted: G W 4.12.0-2.g2399a91-default #1
[ 877.231413] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming/AB350-Gaming-CF, BIOS F5 06/19/2017
[ 877.231413] task: ffff995671398000 task.stack: ffffbc5a81a0c000
[ 877.231414] RIP: 0010:mce_amd_feature_init+0x27d/0x2c0
[ 877.231414] RSP: 0018:ffffbc5a81a0fd08 EFLAGS: 00010096
[ 877.231415] RAX: 0000000000000028 RBX: 0000000000000005 RCX: ffffffff98e5abe8
[ 877.231415] RDX: ffffffff98e5abe8 RSI: 0000000000000096 RDI: 0000000000000002
[ 877.231415] RBP: 00000000c000205d R08: 0000000000000000 R09: 0000000000000028
[ 877.231416] R10: ffffbc5a81a0fc80 R11: 0000000000000000 R12: 0000000000000006
[ 877.231416] R13: 0000000000000000 R14: 0000000000000000 R15: ffffbc5a81a0fd24
[ 877.231417] FS: 00007f96b749e740(0000) GS:ffff9956d6600000(0000) knlGS:0000000000000000
[ 877.231417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 877.231417] CR2: 00007f420b571926 CR3: 0000000213bc5000 CR4: 00000000003406f0
[ 877.231418] Call Trace:
[ 877.231419] mce_syscore_resume+0x1e/0x30
[ 877.231420] syscore_resume+0x4b/0x1a0
[ 877.231421] suspend_devices_and_enter+0x488/0x7e0
[ 877.231422] pm_suspend+0x31a/0x390
[ 877.231423] state_store+0x42/0x90
[ 877.231424] kernfs_fop_write+0xfa/0x180
[ 877.231424] __vfs_write+0x23/0x140
[ 877.231426] ? __getnstimeofday64+0x3b/0xd0
[ 877.231427] ? getnstimeofday64+0xa/0x20
[ 877.231427] ? __audit_syscall_entry+0xab/0x100
[ 877.231428] ? security_file_permission+0x36/0xb0
[ 877.231429] v...

edisonalvaringo (edisonalvaringo-linux-kernel-bugs) wrote :

Should I open a new bug? The IRQ problem looks similar and also involves pinctrl_amd: if I compile a kernel without pinctrl, there are no strange messages.

ldkxingzhe (ldkxingzhe-linux-kernel-bugs) wrote :

I have the same problem. Arch, kernel: 4.13, CPU: Ryzen 7 1700, motherboard: B350 Plus.

rootexpression (rootexpression-linux-kernel-bugs) wrote :

https://www.phoronix.com/scan.php?page=news_item&px=Ryzen-Segv-Response

Perhaps this is related to the hardware bug from that first batch of processors.

I just don't want to see you waste your time trying to fix in software something that can't be fixed in software.

ldkxingzhe (ldkxingzhe-linux-kernel-bugs) wrote :

Maybe, but my Win10 is fine.

rootexpression (rootexpression-linux-kernel-bugs) wrote :

(In reply to ldk from comment #22)
> maybe. but my win10 is fine.

Uh huh. Please read up on the hardware bug.

"AMD engineers found the problem to be very complex and characterize it as a performance marginality problem exclusive to certain workloads on Linux."

LCID Fire (lcid-fire) wrote :

I encountered a similar (or the same) problem on a Gigabyte AB350 with Ubuntu 16.04 running `4.13.0-21-generic #24~16.04.1-Ubuntu SMP Mon Dec 18 19:39:31 UTC 2017`. It even shows up on the 18.04 alpha.
The only thing that makes it halfway work is setting `acpi=off`.

Isn't this supposed to be fixed by now?

Kai-Heng Feng (kaihengfeng) wrote : Re: [Bug 1671360] Re: System doesn't boot properly on Gigabyte AM4 motherboards (AMD Ryzen)

> On 25 Dec 2017, at 4:44 AM, LCID Fire <email address hidden> wrote:
>
> I encountered a similar or same problem on Gigabyte AB350 with Ubuntu 16.04 running `4.13.0-21-generic #24~16.04.1-Ubuntu SMP Mon Dec 18 19:39:31 UTC 2017`. It even shows up on 18.04 Alpha.
> Only thing that makes it halfway work is setting `acpi=off`.
>
> Isn't this supposed to be fixed by now?

Please file a new bug with log attached.


jpujades (jpujades-linux-kernel-bugs) wrote :

Similar problem with an Acer TravelMate B117-M (SSD disk).
I upgraded the BIOS from 1.07 to 1.13 and then to 1.15 (the latest).
Lubuntu 16.04 LTS HWE 64-bit

The system works fine with 4.10.0-42.
It shows the messages at start-up with the 4.13.0-26 and 4.13.0-31 kernels.

Curiously, if I turn off the system, disconnect and reconnect the battery (via the battery reset hole at the bottom of the laptop) and turn it back on, there are NO errors.

At the next boot, the errors reappear.

However, the system seems to work without problems despite the start-up IRQ errors. I need to test more; I don't know yet whether I will run into performance problems or other kinds of trouble.

jpujades (jpujades-linux-kernel-bugs) wrote :

It seems to be related to https://bugzilla.kernel.org/show_bug.cgi?id=194945#c73

I also see bad interrupts for chv-gpio.

There are no more IRQ errors with kernel 4.14.15-041415-generic.

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14.15/

jpujades (jpujades-linux-kernel-bugs) wrote :

Regarding "System shows the messages at start-up when using 4.13.0-26 or 4.13.0-31 kernels.": sorry, I didn't say which messages. They were:

unexpected irq trap at vector 73

Tested on an HP Convertible x360 11-ab0XX with the same results: no more IRQ error messages with kernel 4.14.15-041415-generic.

rootexpression (rootexpression-linux-kernel-bugs) wrote :

(In reply to Josep Pujadas-Jubany from comment #24)

This bug report is for AMD Ryzen CPUs, and the bug discussed here is specific to them. You'll need to find or file a separate bug report for your laptop.

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Changed in linux:
status: Confirmed → Fix Released