Kernel crash during boot with IOMMU - on Dual GPU AMD system using Ryzen 7 1700

Bug #1823547 reported by calcatinge
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have this issue with a Ryzen 7 1700 with the following setups:

Setup 1:
Motherboard: Gigabyte B450 Aorus M (Latest BIOS installed)
RAM: 32 GB DDR4 Crucial 3000 MHz
GPU1: Sapphire Radeon RX 580 Nitro+ Special Edition 8GB GDDR5
GPU2: Asus Mining Radeon RX 470 4G GDDR5
SSD: 240 GB WD Green

Setup 2:
Motherboard: ASRock AB350M Pro4 (Latest BIOS installed)
RAM: 32 GB DDR4 Crucial 3000 MHz
GPU1: Sapphire Radeon RX 580 Nitro+ Special Edition 8GB GDDR5
GPU2: Sapphire Radeon RX 550 2G GDDR5
SSD: 240 GB WD Green

The issue first started after installing the second GPU on each of the systems. I can't even boot the system, as the errors appear right after BIOS initialization.

The errors look like this:

[excerpt]
AMD-Vi: Completion-Wait loop timed out
AMD-Vi: Completion-Wait loop timed out
AMD-Vi: Completion-Wait loop timed out

...

AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb7560]

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.

....

AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb75d0]
AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb7700]
AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb7630]
AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb7660]
AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb7690]
AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb76c0]
AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=22:00.0 address=0x0000000174bb76f0]

...

AMD-Vi: Completion-Wait loop timed out
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.3 domain=0x000 address=0x00000000fffd3990 flags=0x0070]

...

Buffer I/O error on dev dm-0, logical block 104857472, async page read
Buffer I/O error on dev dm-0, logical block 104857473, async page read
Buffer I/O error on dev dm-0, logical block 104857474, async page read
Buffer I/O error on dev dm-0, logical block 104857475, async page read
Buffer I/O error on dev dm-0, logical block 104857476, async page read
Buffer I/O error on dev dm-0, logical block 104857477, async page read

.....
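
For anyone triaging a similar log, the AMD-Vi event lines above can be condensed to see which PCI devices are involved. The following is a minimal sketch, not part of the original report: summarize_amd_vi is a hypothetical helper name, and it assumes the log has been saved as text (for example a photographed screen typed out, or journalctl output from a boot that got far enough).

```shell
#!/bin/sh
# summarize_amd_vi: hypothetical helper, not an existing tool.
# Reads kernel log text on stdin and prints "count event device",
# most frequent first, for every "AMD-Vi: Event logged [...]" line.
summarize_amd_vi() {
    # Keep only the event name and the device address of each event line.
    sed -n 's/.*AMD-Vi: Event logged \[\([A-Z_]*\) device=\([0-9a-f:.]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2, $3}'   # normalize the padding added by uniq -c
}
```

On a system that boots, something like `journalctl -k -b | summarize_amd_vi` would feed it, and `lspci -s 22:00.0` should then show which device sits at the reported address.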

I can't install a new system either, because the same errors appear.

After some research I discovered that it is an IOMMU issue.
I turned IOMMU off on both motherboards, and I managed to boot the system.
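
Where a firmware switch for the IOMMU is not available, the kernel command line offers a comparable toggle: amd_iommu=off is a documented kernel parameter. The sketch below is an assumption-laden illustration, not something tested on this hardware: add_kernel_param is a hypothetical helper, it assumes a GRUB-based system with the usual /etc/default/grub layout, and it relies on GNU sed for in-place editing.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM  (hypothetical helper, not an existing tool)
# Appends PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE unless it is already listed.
add_kernel_param() {
    file=$1
    param=$2
    # Already present as a whole word inside the quoted value: nothing to do.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote (GNU sed -i assumed).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}
```

A typical use on Ubuntu would be `sudo sh -c '. ./helper.sh; add_kernel_param /etc/default/grub amd_iommu=off'` followed by `sudo update-grub` (helper.sh is a placeholder file name). For an installer that does not boot at all, the same parameter can be typed by hand after pressing `e` on the GRUB menu entry.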

I have tried CentOS 7, Ubuntu 18.04.2 and Fedora 29.

I am an architect and I use Blender for GPU rendering (this is the reason for having two cards), but the AMDGPU-Pro driver from AMD's website (the only one Blender can use for GPU rendering, as it can't use the open-source one) affects the way GNOME works on Xorg. In Ubuntu 18.04 GNOME doesn't start at all; it hangs with an error about snapd. Even after purging snapd, it still doesn't start. Fedora 29 is the only one that works, but the official AMDGPU-Pro driver doesn't work on it, so I don't have GPU rendering available.

Any ideas? Two years have passed, or probably even more, and these IOMMU issues on AMD are still not resolved. I have two different chipsets on my motherboards (B350 and B450) and the issues are still there.

Please help. My last resort is to start using Windows, after 15 years of using only Linux, but I don't find that very appealing.

Thanks.

calcatinge (calcatinge) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1823547

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
calcatinge (calcatinge) wrote :

I cannot add a kernel log, as I wasn't even able to boot the system and install the operating system.

description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v5.1-rc3 kernel [0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed", and attach dmesg.

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.1-rc3/
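
As a rough aid for the tagging step above, a saved dmesg can be checked for the AMD-Vi timeouts quoted in the bug description. This is only a sketch under stated assumptions: suggest_tag is a hypothetical helper, and the presence or absence of these two message patterns is a heuristic, not proof that the bug is or is not fixed.

```shell
#!/bin/sh
# suggest_tag: hypothetical helper, not part of any Ubuntu tooling.
# Reads a kernel log on stdin and prints the tag name from the comment above
# that matches whether the AMD-Vi timeout messages still appear.
suggest_tag() {
    if grep -Eq 'AMD-Vi: (Completion-Wait loop timed out|Event logged \[IOTLB_INV_TIMEOUT)'; then
        echo kernel-bug-exists-upstream
    else
        echo kernel-fixed-upstream
    fi
}
```

After booting the mainline kernel with the IOMMU enabled, `dmesg > dmesg.txt` (root may be required on some systems) produces the file to attach, and `suggest_tag < dmesg.txt` gives a first hint about which tag applies.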

tags: added: bionic