kernel crashes during boot unless IOMMU is disabled on Ryzen 1800X

Bug #1747463 reported by Peridot on 2018-02-05
This bug affects 6 people
Affects          Status    Importance  Assigned to        Milestone
Linux            Unknown   Unknown
linux (Ubuntu)             Medium      Joseph Salisbury
Bionic                     Medium      Joseph Salisbury
Cosmic                     Medium      Joseph Salisbury

Bug Description

I'm running Kubuntu Bionic on a Ryzen 1800X with a Biostar B350GT5 motherboard.

With a 'normal' boot, the kernel logs lots of AMD-Vi events and I get IRQ crashes or ACPI hangs. I got it to boot by disabling IOMMU in the BIOS and adding "iommu=soft" to the kernel boot options in GRUB.

Linux can then detect everything properly (all cores) and I've had zero crashes. The only issue is that it's using software IOMMU, which could carry a performance penalty because it has to copy the data of some PCI devices to regions below 4G.
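To make the performance concern concrete, here is a minimal Python sketch of the decision a software IOMMU (bounce buffering) has to make for each DMA buffer. It is a simplified model, not kernel code: `needs_bounce` and `DMA_MASK_32BIT` are hypothetical names chosen for illustration, and it assumes the affected devices can only address the low 4 GiB.

```python
# Simplified model of bounce buffering; not taken from the kernel source.
DMA_MASK_32BIT = 0xFFFF_FFFF  # assumption: the device can only address the low 4 GiB


def needs_bounce(phys_addr: int, size: int, dma_mask: int = DMA_MASK_32BIT) -> bool:
    """Return True if any byte of the buffer lies above what the device can address."""
    if size <= 0:
        raise ValueError("size must be positive")
    last_byte = phys_addr + size - 1  # highest physical address the transfer touches
    return last_byte > dma_mask
```

Every buffer for which this returns True would cost an extra copy through low memory, which is where the possible penalty would come from.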

Alternatively, it boots with the kernel option "acpi=off", but then it only detects a single core/thread.
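For anyone reproducing the workaround, the sketch below shows one way to add a kernel option such as "iommu=soft" to a GRUB configuration line. It assumes a stock Ubuntu setup where the options live in the `GRUB_CMDLINE_LINUX_DEFAULT` line of /etc/default/grub and `sudo update-grub` is run afterwards; the report does not say how the options were actually edited, and `add_kernel_param` is a hypothetical helper name.

```python
def add_kernel_param(grub_line: str, param: str) -> str:
    """Return the GRUB_CMDLINE_LINUX_DEFAULT line with `param` appended if it is missing.

    Any other line is returned unchanged.
    """
    key, sep, value = grub_line.partition("=")
    if not sep or key.strip() != "GRUB_CMDLINE_LINUX_DEFAULT":
        return grub_line  # not the line that holds the default kernel options
    # Drop the surrounding quotes, then split the options on whitespace.
    params = value.strip().strip("\"'").split()
    if param not in params:
        params.append(param)
    joined = " ".join(params)
    return f'{key.strip()}="{joined}"'
```

For example, passing the line `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"` and the parameter `iommu=soft` yields a line that ends in `quiet splash iommu=soft"`. Calling it again with the same parameter leaves the line unchanged.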

I attached a kernel log.

I believe(d) this might be related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1671360
and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085
---
ApportVersion: 2.20.8-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: fixme 1487 F.... pulseaudio
 /dev/snd/controlC0: fixme 1487 F.... pulseaudio
CurrentDesktop: KDE
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=UUID=bc971fcc-8e63-4fa5-a149-af4af6c8eece
InstallationDate: Installed on 2018-01-31 (4 days ago)
InstallationMedia: Kubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180131)
IwConfig:
 lo no wireless extensions.

 enp3s0 no wireless extensions.
MachineType: BIOSTAR Group B350GT5
Package: linux (not installed)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-32-generic.efi.signed root=/dev/mapper/kubuntu--vg-root ro iommu=soft quiet splash vt.handoff=1
ProcVersionSignature: Ubuntu 4.13.0-32.35-generic 4.13.13
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-32-generic N/A
 linux-backports-modules-4.13.0-32-generic N/A
 linux-firmware 1.170
RfKill:

Tags: bionic
Uname: Linux 4.13.0-32-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 11/30/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 5.13
dmi.board.asset.tag: None
dmi.board.name: B350GT5
dmi.board.vendor: BIOSTAR Group
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr5.13:bd11/30/2017:svnBIOSTARGroup:pnB350GT5:pvr:rvnBIOSTARGroup:rnB350GT5:rvr:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: None
dmi.product.name: B350GT5
dmi.sys.vendor: BIOSTAR Group

Peridot (peridot) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1747463

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Peridot (peridot) on 2018-02-05
Changed in linux (Ubuntu):
status: Incomplete → Confirmed

apport information

tags: added: apport-collected bionic
description: updated
Peridot (peridot) wrote : CRDA.txt

apport information

Peridot (peridot) wrote : Lspci.txt

apport information

Peridot (peridot) wrote : Lsusb.txt

apport information

Peridot (peridot) wrote : UdevDb.txt

apport information

Peridot (peridot) on 2018-02-07
summary: - kernel crashes unless IOMMU is disabled on Ryzen 1800X
+ kernel crashes during boot unless IOMMU is disabled on Ryzen 1800X
description: updated
Peridot (peridot) on 2018-02-07
description: updated
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Peridot (peridot) wrote :

This is the result of booting the upstream kernel with IOMMU turned on in the BIOS.

Peridot (peridot) wrote :

When I exit the initramfs prompt pictured above, I get a full crash.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Peridot (peridot) wrote :

I found it also boots with IOMMU turned on in the BIOS as long as you set iommu=soft, both with the Ubuntu kernel and the mainline kernel.

Kai-Heng Feng (kaihengfeng) wrote :

Is AMD-V enabled in BIOS?

Peridot (peridot) wrote :

AMD-V was not enabled. With it enabled, booting without iommu=soft results in a crash that lasts only about a second, after which the HDMI connection is lost. This is with the Ubuntu kernel.

Peridot (peridot) wrote :

And with the mainline kernel

Peridot (peridot) wrote :

With iommu=soft, though, it boots on both kernels with IOMMU and AMD-V enabled in the BIOS.

Kai-Heng Feng (kaihengfeng) wrote :

Looks like there's a patch, but it hasn't been upstreamed yet:
https://patchwork.freedesktop.org/patch/157327/

Peridot (peridot) wrote :

The upstream bug https://bugs.freedesktop.org/show_bug.cgi?id=101029 makes reference to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1683184 which is marked as fix released, just a heads up.

Kai-Heng Feng (kaihengfeng) wrote :

I don't see the patch in upstream Linux though, so it's still worth a try.

Peridot (peridot) wrote :

I agree, I just meant that that bug might also be useful for debugging. It was closed because zesty reached EOL.

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Ubuntu Cosmic):
status: Confirmed → Triaged
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

The patch mentioned in the upstream bug report and comment #25 never landed in mainline. I tried to apply it to Bionic, but it does not apply cleanly. I'll work on backporting it and will post a test kernel shortly. We can then update upstream with the testing results.

Joseph Salisbury (jsalisbury) wrote :

Can you see if this bug still exists in current mainline:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc5/

Hello Arindam,

There is a bug report[0] that you created a patch[1] for a while back.
However, the patch never landed in mainline.  There is a bug reporter in
Ubuntu[2] that is affected by this bug and is willing to test the
patch.  I attempted to build a test kernel with the patch, but it does
not apply to current mainline cleanly.  Do you still think this patch
may resolve this bug?  If so, is there a version of your patch available
that will apply to current mainline?

Thanks,

Joe

[0] https://bugs.freedesktop.org/show_bug.cgi?id=101029
[1] https://patchwork.freedesktop.org/patch/157327/
[2] http://pad.lv/1747463

Arindam Nath (arindam-nath) wrote :

Adding Tom.

Hi Joe,

My original patch was never accepted. Tom and Joerg worked on another patch series which was supposed to fix the issue in question in addition to doing some code cleanups. I believe their patches are already in the mainline. If I remember correctly, one of the patches disabled PCI ATS for the graphics card which was causing the issue.

Do you still see the issue with latest mainline kernel?

BR,
Arindam

-----Original Message-----
From: Joseph Salisbury [mailto:<email address hidden>]
Sent: Tuesday, May 15, 2018 1:17 AM
To: Nath, Arindam <email address hidden>
Cc: <email address hidden>; Bridgman, John <email address hidden>; joro@8bytes.org; <email address hidden>; <email address hidden>; <email address hidden>; Suthikulpanit, Suravee <email address hidden>; Deucher, Alexander <email address hidden>; Kuehling, Felix <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>
Subject: iommu/amd: flush IOTLB for specific domains only (v2)

Joseph Salisbury (jsalisbury) wrote :

Hi Arindam,

Thanks for the feedback.  Yes, the latest mainline kernel was tested,
and it is reported the bug still happens in the Ubuntu kernel bug[0].
Is there any specific diagnostic info we can collect that might help?

Thanks,

Joe

[0] http://pad.lv/1747463

Arindam Nath (arindam-nath) wrote :

Joe, I believe all the information needed is already provided in [2]. Let us wait for inputs from Tom and Joerg.

I could take a look at the issue locally, but it will take me a really long time since I am occupied with other assignments right now.

BR,
Arindam

Joseph Salisbury (jsalisbury) wrote :

@Peridot, I know you responded on IRC that the current mainline kernel still exhibits the bug. However, could you also add that test result to this bug report for upstream tracking?

Peridot (peridot) wrote :

I tested with 4.17-rc4 and the problem persists.

Joseph Salisbury (jsalisbury) wrote :

@Peridot,
Request from Upstream:

For the original 4.13 kernel, I don't
see any attachments that have the AMD-Vi messages in question. Were they
completion timeouts (like in the later mainline kernel test, which I'll
get to in a bit) or I/O page fault messages? Without that information it
is hard to determine what the issue really is.

(Just as an FYI, if the IOMMU is disabled in BIOS, then iommu=soft is not
 necessary on the kernel command line).

For the upstream kernel test, since this is a Ryzen system, it's possible
that the BIOS does not have a requisite fix for SME and IOMMU (see [1]).
On the upstream kernel, if memory encryption is active by default without
this BIOS fix, then the result is AMD-Vi completion-wait timeout messages.
Try booting with mem_encrypt=off on the kernel command line or build a
kernel with CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n and see if that
allows the kernel to boot.

Thanks,
Tom

[1] https://bugzilla.kernel.org/show_bug.cgi?id=199513
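The distinction Tom asks about (completion timeouts versus I/O page faults) can be checked mechanically on a saved kernel log. The following is a minimal Python sketch under stated assumptions: the two match patterns are based on commonly seen AMD-Vi message wording, which can differ between kernel versions, and `classify_amd_vi` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed patterns; the exact wording of AMD-Vi messages varies between kernel versions.
PATTERNS = {
    "completion_timeout": re.compile(r"Completion-Wait loop timed out", re.IGNORECASE),
    "io_page_fault": re.compile(r"IO_PAGE_FAULT"),
}


def classify_amd_vi(lines: Iterable[str]) -> Counter:
    """Count AMD-Vi log lines by kind: completion timeout, I/O page fault, or other."""
    counts: Counter = Counter()
    for line in lines:
        if "AMD-Vi" not in line:
            continue  # ignore everything that is not an AMD IOMMU message
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[kind] += 1
                break
        else:
            counts["other"] += 1  # AMD-Vi line that matched neither pattern
    return counts
```

It could be fed the lines of a saved `dmesg` or journal capture from the failing boot. A result dominated by completion timeouts would point toward the SME/BIOS issue described above, while I/O page faults would suggest a different cause.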

Changed in linux (Ubuntu Bionic):
status: Triaged → Incomplete
Changed in linux (Ubuntu Cosmic):
status: Triaged → Incomplete
Peridot (peridot) wrote :

The attached screenshots are the result of booting 4.18 rc1 kernel

Peridot (peridot) wrote :

4.18 rc1

Peridot (peridot) wrote :

4.18 rc1 + mem_encrypt=off

Peridot (peridot) wrote :

4.18 rc1 + mem_encrypt=off

Peridot (peridot) wrote :

4.18 rc1 + mem_encrypt=off

Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Cosmic):
status: Incomplete → Confirmed
Peridot (peridot) wrote :

All the tests were done with IOMMU turned off and SVE turned on in the BIOS, and it does not boot without iommu=soft.

When booting with SVE and IOMMU enabled in the BIOS, I got an endless screen of text and couldn't make anything out of it.

Booting with IOMMU turned on in the kernel and SVE turned off gives the screenshot below.
