amd_iommu conflict with Marvell 88SE9230 SATA Controller

Bug #1810239 reported by Steven Ellis on 2019-01-02
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
Linux            Fix Released   Medium
linux (Debian)   New            Unknown
linux (Fedora)   Unknown        Unknown
linux (Ubuntu)                  Low          Unassigned

Bug Description

Booting with kernel 4.18.0-13.14~18.04.1-generic shows errors:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810239/comments/153

WORKAROUND: Use kernel boot parameter:
amd_iommu=off
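
To confirm that the workaround actually reached the running kernel, the command line can be checked for the parameter. The following is a minimal Python sketch, assuming the command line is available as a string (for example the contents of /proc/cmdline). The helper name `iommu_setting` is made up for illustration and is not part of any tool mentioned in this report.

```python
def iommu_setting(cmdline, key="amd_iommu"):
    """Return the value of key=value on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        if name == key and sep:
            value = val  # this sketch simply keeps the last occurrence it sees
    return value


# Command line taken from ProcKernelCmdLine in this report.
reported = ("BOOT_IMAGE=/vmlinuz-4.18.0-13-generic root=/dev/mapper/mythnew-root "
            "ro verbose drm_kms_helper.edid_firmware=HDMI-A-3:edid/panasonic.edid")

print(iommu_setting(reported))                     # parameter not set
print(iommu_setting(reported + " amd_iommu=off"))  # workaround applied
```

The same helper can be pointed at `intel_iommu` by passing `key="intel_iommu"`.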

---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mythfe 1170 F.... pulseaudio
CurrentDesktop: XFCE
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=/dev/mythnew/swap
InstallationDate: Installed on 2016-07-02 (913 days ago)
InstallationMedia: Mythbuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
IwConfig:
 lo no wireless extensions.

 enp9s0 no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. B450M S2H
Package: linux (not installed)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.18.0-13-generic root=/dev/mapper/mythnew-root ro verbose drm_kms_helper.edid_firmware=HDMI-A-3:edid/panasonic.edid
ProcVersionSignature: Ubuntu 4.18.0-13.14~18.04.1-generic 4.18.17
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-13-generic N/A
 linux-backports-modules-4.18.0-13-generic N/A
 linux-firmware 1.173.2
RfKill:

Tags: bionic
Uname: Linux 4.18.0-13-generic x86_64
UpgradeStatus: Upgraded to bionic on 2018-07-27 (158 days ago)
UserGroups: adm cdrom dip lpadmin mythtv nopasswdlogin plugdev sambashare sudo video
_MarkForUpload: True
dmi.bios.date: 12/04/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F2c
dmi.board.asset.tag: Default string
dmi.board.name: B450M S2H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF2c:bd12/04/2018:svnGigabyteTechnologyCo.,Ltd.:pnB450MS2H:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB450MS2H:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B450M S2H
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Created attachment 72217
Output of `dmesg' command

I have an MSI Z68A-GD80 B3 motherboard, and when I enable Intel's IOMMU (kernel booted with intel_iommu=on), the integrated Marvell 88SE9128 SATA controller doesn't work.

To reproduce:
1. Compile and prepare kernel with Intel IOMMU support enabled (CONFIG_INTEL_IOMMU=y).
2. Reboot the computer.
3. Enter BIOS and enable VT-d.
4. Boot the kernel with intel_iommu=on parameter.

Right after boot, kernel reports the following errors (SATA controller is at 0b:00.0):

[ 2.639774] DRHD: handling fault status reg 3
[ 2.639782] DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
[ 2.639783] DMAR:[fault reason 02] Present bit in context entry is clear

After a while these entries appear:

[ 7.625837] ata14.00: qc timeout (cmd 0xa1)
[ 7.628341] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.935483] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 17.908407] ata14.00: qc timeout (cmd 0xa1)
[ 17.910935] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 17.912276] ata14: limiting SATA link speed to 1.5 Gbps
[ 18.219077] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 48.134607] ata14.00: qc timeout (cmd 0xa1)
[ 48.137508] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 48.444646] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

When a disk is connected to the controller, the disk does not work. When none is connected, the computer starts normally, apart from a long delay presumably caused by probing the device.

Since this is the secondary controller on these motherboards, you can eliminate the symptoms by plugging the disk into one of the available ports of the built-in Intel SATA controller and disabling the Marvell controller in the BIOS. The other workaround, if you need the eSATA capabilities of the Marvell controller, is to disable VT-d in the BIOS.

Created attachment 72218
Output of `lspci -knnv' command

Created attachment 72219
Kernel config

The same problem occurs on a Z68A-GD65 MSI G3 system with a Marvell 88SE91xx controller.

grep DMAR:

ACPI: DMAR beaff508 000B0 (v01 ALASKA A M I 00000001 INTL 00000001)
DMAR: Host address width 36
DMAR: DRHD base: 0x000000fed91000 flags: 0x1
DMAR: RMRR base: 0x000000bf4cc000 end: 0x000000bf4eefff
DMAR: No ATSR found
DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
DMAR:[fault reason 02] Present bit in context entry is clear

grep IOMMU:

Intel-IOMMU: enabled
IOMMU 0: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
IOMMU 0 0xfed91000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
Intel-IOMMU: enabled
IOMMU 0: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
IOMMU 0 0xfed91000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]

grep ata8:

ata8: SATA max UDMA/133 abar m2048@0xfa310000 port 0xfa310180 irq 48
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: limiting SATA link speed to 3.0 Gbps
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)

Created attachment 72419
config file / kernel 3.2.2

Kernel config

Created attachment 72420
Output of `lspci -knnv' command

I confirm this bug with kernel 3.2.6: same error with VT-d enabled in bios.

With mainboard "Asus Rampage III Gene", Z68, onboard Marvell; CPU Xeon L5520; 3x4GB Ram. Logs/Printouts follow this evening.

Created attachment 72733
kernel config

above bug confirmed with 3.2.13

Created attachment 72734
dmesg intel z68, asus rampage III gene, vt-d enable

Created attachment 72735
lspci, asus rampage III gene, z68, vt-d enable, 3.2.13

(In reply to comment #6)
> I confirm this bug with kernel 3.2.6: same error with VT-d enabled in bios.
>
> With mainboard "Asus Rampage III Gene", Z68, onboard Marvell; CPU Xeon L5520;
> 3x4GB Ram. Logs/Printouts follow this evening.

Also confirmed for current kernel 3.2.13.

From a PDF file by Intel with the title "Intel® Virtualization Technology for Directed I/O Architecture Specification":
--snip--
3.6.1.4 PCI Express Devices Using Phantom Functions
To increase the maximum possible number of outstanding requests requiring completion, PCI Express allows a device to use function numbers not assigned to implemented functions to logically extend the Tag identifier. Unclaimed function numbers are referred to as Phantom Function Numbers (PhFN). A device reports its support for phantom functions through the Device Capability configuration register, and requires software to explicitly enable use of phantom functions through the Device Control configuration register.

Since the function number is part of the requester-id used to locate the context-entry for processing a DMA request, when assigning PCI Express devices with phantom functions enabled, software must program multiple context entries, each corresponding to the PhFN enabled for use by the device function. Each of these context-entries must be programmed identically to ensure the DMA requests with any of these requester-ids are processed identically.
--snip--
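
To make the quoted passage concrete: a PCI requester ID is 16 bits wide (8-bit bus, 5-bit device, 3-bit function), and in the legacy VT-d layout the bus number selects the root-table entry while the device/function byte selects the context entry. The Python sketch below only illustrates that arithmetic; the helper names are invented for this example.

```python
def parse_bdf(bdf):
    """Split 'bb:dd.f' (hex fields, as printed by lspci) into integers."""
    bus, rest = bdf.split(":")
    dev, fn = rest.split(".")
    return int(bus, 16), int(dev, 16), int(fn, 16)


def requester_id(bdf):
    """16-bit requester ID: bus[15:8], device[7:3], function[2:0]."""
    bus, dev, fn = parse_bdf(bdf)
    return (bus << 8) | (dev << 3) | fn


def context_index(bdf):
    """Return (root-table index, context-table index) for a requester."""
    rid = requester_id(bdf)
    return rid >> 8, rid & 0xFF


# The SATA controller from the first report and the function that faulted.
for bdf in ("0b:00.0", "0b:00.1"):
    print(bdf, hex(requester_id(bdf)), context_index(bdf))
```

Functions .0 and .1 land in different context entries, so a mapping installed only for .0 does not cover requests tagged with .1.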

grep -ri phant says pci_regs.h knows about the capability, but it doesn't appear anywhere else in the kernel as far as I can see. Look for PCI_EXP_DEVCAP_PHANTOM and PCI_EXP_DEVCTL_PHANTOM.
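
For anyone who wants to check those two fields on raw register values, here is a small Python sketch. The mask values are what I believe pci_regs.h defines for PCI_EXP_DEVCAP_PHANTOM and PCI_EXP_DEVCTL_PHANTOM; please verify them against your kernel tree. The sample register values at the bottom are hypothetical, not read from any device in this thread.

```python
PCI_EXP_DEVCAP_PHANTOM = 0x00000018  # Device Capabilities: Phantom Functions Supported
PCI_EXP_DEVCTL_PHANTOM = 0x0200      # Device Control: Phantom Functions Enable


def phantom_bits(devcap):
    """Number of function-number bits the device may claim for phantom functions (0-3)."""
    return (devcap & PCI_EXP_DEVCAP_PHANTOM) >> 3


def phantom_enabled(devctl):
    """True if software has enabled phantom functions in Device Control."""
    return bool(devctl & PCI_EXP_DEVCTL_PHANTOM)


# Hypothetical register values, for illustration only.
print(phantom_bits(0x00008000), phantom_enabled(0x2010))  # no phantom support, disabled
print(phantom_bits(0x00000008), phantom_enabled(0x0200))  # one bit supported, enabled
```

In `lspci -vv` output the same information appears as "PhantFunc N" on the DevCap line and "PhantFunc+" or "PhantFunc-" on the DevCtl line.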

Unfortunately, lspci indicates that the Marvell chip is not using phantom functions (lspci upload to follow), so at this point I can't tell if I'm on the right trail.

Caveat lector: I don't have any previous experience with low-level PCI stuff.

Created attachment 73265
lspci output including device capabilities

I'm seeing similar errors with AMD-Vi (AMD's IOMMU implementation) and a couple of Marvell 88SE9128-based cards, and can confirm that it is still present in 3.7.0 builds.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768

This problem happens here as well. Asus P9X79 WS, BIOS 3306, X79, i7-3930K. Running kernel 3.7.3. In addition to being unable to use the Marvell SATA controller ports, this causes a ~40s hang during boot.

I tried contacting Asus about this, as I think this could be fixed by a BIOS update, but they replied to me in horrible English they do not support Linux. I'll think twice before buying Asus again in the future, but it would be nice if a workaround could be implemented in the kernel.

Created attachment 91521
dmesg on Asus P9X79 WS, kernel 3.7.3

Created attachment 91531
lspci -knvv on Asus P9X79 WS, kernel 3.7.3

FWIW, I still have this issue with 3.7.8 and 3.8-rc7. BIOS update 3401 for the P9X79 WS didn't help. Additionally the hang during boot becomes worse (up to ~65 seconds), when a hard drive is connected. Since the drive is unusable anyway, I hacked the AHCI driver to ignore the Marvell controller. While no solution to this problem, at least my boot time is back to normal (<30s).

Same problem with Marvell 88SE9172 SATA Controller.
I have a Gigabyte GA-Z77X-UD5H with two Marvell 88SE9172 SATA controllers and an Intel E3-1245v2 CPU. VT-d is enabled. When running stock Debian 7 or Ubuntu 12.04 and later, I can see the HDDs and SSDs connected to the Marvell ports. After installing XenServer 6.1 and Xen Cloud Platform 1.6, the HDDs and SSDs are not detected, although lspci shows that the Marvell 88SE9172 controllers are detected.

The root cause of this bug seems to be that the device illegally accessed memory that should be reserved for the IOMMU module, and this changed the IOMMU registers.

ZhenHua, can you elaborate on this? Do you mean a device accessed the MMIO space used to program the IOMMU itself? If so, how did you conclude that? I doubt the IOMMU space is at address 0xfff00000.

Based on the following data:

  Paweł:
    DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear
    0b:00.0 [0106]: Marvell [1b4b:9123]
  Korneliusz:
    DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
    DMAR:[fault reason 02] Present bit in context entry is clear
    03:00.0 [0106]: Marvell 88SE9123 SATA [1b4b:9123]
  Daniel:
    IOMMU identity map errors (assuming unrelated for now)
    DMAR:[DMA Read] Request device [01:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear
    01:00.0 [0106]: Marvell 88SE9123 SATA [1b4b:9123]
  Stijn:
    dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear
    07:00.0 0106: 1b4b:9130 (rev 11) (prog-if 01 [AHCI 1.0])

in each case the IOMMU saw a DMA read to an address that wasn't mapped for the requesting device. In each case, the requester is function .1, the kernel doesn't know about a .1 function, and there is a Marvell 912x SATA controller at the corresponding .0 function.

Andrew's Phantom Function theory seems like a good direction to explore. Maybe these devices incorrectly report Phantom Function support in the Device Capability & Control, and we just need some sort of quirk to work around that.

It would be interesting to know whether the .0 Marvell function has valid IOMMU mappings for the fault addresses (0xfff00000 or 0xfffc0000), or whether there is really anything at those addresses. They seem like dubious targets for DMA.
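
The pattern in the data above (a faulting .1 requester next to a real .0 function) can be pulled out of a dmesg dump mechanically. This is a rough Python sketch, assuming the fault lines keep the format shown in this thread; newer kernels may word them differently, and `parse_faults` is just a name chosen for this example.

```python
import re

FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault addr ([0-9a-f]+)"
)


def parse_faults(log):
    """Yield (direction, requester, fault address, function-0 sibling) per fault line."""
    for match in FAULT_RE.finditer(log):
        direction, requester, addr = match.groups()
        sibling = requester[:-1] + "0"  # same bus and device, function 0
        yield direction, requester, int(addr, 16), sibling


log = """\
DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
DMAR:[fault reason 02] Present bit in context entry is clear
DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
DMAR:[fault reason 02] Present bit in context entry is clear
"""

for direction, requester, addr, sibling in parse_faults(log):
    print(f"{direction} by {requester} at {addr:#x}; check lspci for {sibling}")
```

The sibling address is the one to look up in lspci to see whether a Marvell controller sits at function 0.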

Hi guys,

    1. Since the only lspci output posted so far was captured with "intel_iommu=on", could you paste the output of lspci -vvv, lspci -t, and lspci -n when intel_iommu is not set to on?

Thanks
ZhenHua

Created attachment 109981
Patch with quirk for incorrect PCI requester IDs

Here's a patch that provides a quirk for what I believe to be the root cause: devices that use incorrect PCI requester IDs, including Marvell 91xx controllers.

Various revisions have been sent to LKML and IOMMU-list in the past and a number of people have reported that it solved their problem and I've been running this on two boxes for months. I'm not sure why it hasn't been accepted.

Note that there are several devices that suffer from the same affliction, i.e., using incorrect PCI requester IDs in their transactions. The Marvell devices use both xx:yy.0 and xx:yy.1, possibly related to the SATA port number. Other devices, like Ricoh's R5C832 PCIe IEEE 1394 Controller commonly found in T410 and T420 Thinkpads, use a single incorrect requester ID.

Please try this patch and let me know if it works for you.

Each context_entry has a present bit. If a context entry is used for a device, but its present bit is not set to 1, an error with fault number 2 will occur.

I tested on my PC: commenting out the line "context_set_present(context);" causes the same error. So I guess the devices that show the error may be using a context entry whose present bit is 0.
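
A toy model may help picture this. The Python sketch below is not kernel code; it only mimics a context table with one present bit per requester ID, and the class and method names are invented for illustration.

```python
class ContextTable:
    """Toy model: tracks which (bus, devfn) context entries have the present bit set."""

    def __init__(self):
        self.present = set()

    def set_present(self, bus, dev, fn):
        self.present.add((bus, (dev << 3) | fn))

    def translate(self, bus, dev, fn):
        """Return 'ok' if a present context entry exists, else the fault text."""
        if (bus, (dev << 3) | fn) not in self.present:
            return "fault reason 02: Present bit in context entry is clear"
        return "ok"


table = ContextTable()
table.set_present(0x0b, 0x00, 0)       # the kernel only knows about function 0
print(table.translate(0x0b, 0x00, 0))  # request tagged with .0 is translated
print(table.translate(0x0b, 0x00, 1))  # request tagged with .1 faults
table.set_present(0x0b, 0x00, 1)       # what an alias entry for .1 would add
print(table.translate(0x0b, 0x00, 1))  # now translated as well
```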

Created attachment 154941
dmesg intremap=off

Created attachment 154951
dmesg intel_iommu=off

Created attachment 154961
dmesg intel_iommu=on

Created attachment 154971
dmesg pci=nomsi

I tried with all of the kernel options you recommended, and also included dmesg with intel_iommu=on and intel_iommu=off for comparison. Regardless of the other kernel options, the SSD is not visible when intel_iommu=on

Created attachment 155441
dmesg linux 3.17

Arch just pushed the 3.17 kernel, bug is still present. Would you like me to re-run the intremap and nomsi kernel options?

Created attachment 156481
attachment-10507-0.html

I can't see any complaint about present bit being cleared in comment 94. There are likely entries for both function 0 and 1. It seems like you have another problem...

Did you use the controller to boot the kernel? I noticed issues when using the Marvell controller as boot device. My best guess is that the BIOS assigned memory to the controller that it is still accessing. Problem is that the kernel wasn't informed about it. Could your problem be the same?

2014-10-27 16:22 GMT+01:00 <email address hidden>:

> https://bugzilla.kernel.org/show_bug.cgi?id=42679
>
> --- Comment #104 from Elliott <email address hidden> ---
> Created attachment 155441
> --> https://bugzilla.kernel.org/attachment.cgi?id=155441&action=edit
> dmesg linux 3.17
>
> Arch just pushed the 3.17 kernel, bug is still present. Would you like me to
> re-run the intremap and nomsi kernel options?

Accidentally created an attachment. Can't seem to find any way to remove it.
Sorry about that... Please feel free to remove it if possible.

My kernel lives on another disk drive. /dev/sda1 is my EFI system partition, /dev/sda2 is the MSR, /dev/sda3 is NTFS Windows 7, /dev/sda4 is my / partition. The Marvell controller is the SSD on /dev/sdb. I don't know what you mean by "present bit" (sorry, I'm not so fluent in C).

I'm using the SSD with an embedded Marvell controller as a caching device (enhanceio when I posted to this bug, but I just switched to bcache) for a slower hard drive. I did briefly consider enhanceio might be the problem, so I disabled it completely to test. This didn't make a difference; with intel_iommu, the kernel throws the dmar errors, and I can't access /dev/sdb.

The quirk installs entries for both function numbers. If function 1 had been unknown, you would have seen warnings about the present bit not being set (see comment 78 for an example). The lack of those messages indicates that you successfully installed entries for both function 0 and 1, hence that the patch is working.

You can still run into problems if the chip tries to read/write memory that isn't allocated by the driver module. The problems I saw were related to the controller being initialized and used by the BIOS during boot. It tried to read memory that didn't belong to it (as far as the Linux kernel was concerned). The controller stopped working when the DMA read failed (blocked by the IOMMU).

It is not necessarily an error that the controller is assigned memory during boot, but these memory regions must be reported to the operating system. This is where the VT-d support seems to fail on many consumer boards.
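
On Intel systems the firmware reports such regions through RMRR entries in the ACPI DMAR table, which show up in dmesg as "DMAR: RMRR base: ... end: ...". A quick way to see whether a faulting address was ever reported is a range check. The Python sketch below uses the RMRR range and fault address from the Z68A-GD65 log earlier in this thread; the function name is invented for this example.

```python
def in_reserved_region(addr, regions):
    """True if addr falls inside any inclusive (base, end) range."""
    return any(base <= addr <= end for base, end in regions)


# "DMAR: RMRR base: 0x000000bf4cc000 end: 0x000000bf4eefff"
rmrr = [(0x000000bf4cc000, 0x000000bf4eefff)]

# "DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000"
print(in_reserved_region(0xfffc0000, rmrr))  # fault address against the RMRR list
print(in_reserved_region(0xbf4cd000, rmrr))  # an address inside the reported range
```

On that system the fault address lies outside the only RMRR the firmware reported, which fits the description above of firmware not telling the kernel about memory the controller still uses.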

Is there any progress?

I'm hitting this error on Fedora 3.17.8-200.fc20 kernel, which makes my system pretty much unusable :(

07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])
        DeviceName: Marvell 9230 AHCI controller
        Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230]
        Flags: bus master, fast devsel, latency 0, IRQ 92
        I/O ports at b050 [size=8]
        I/O ports at b040 [size=4]
        I/O ports at b030 [size=8]
        I/O ports at b020 [size=4]
        I/O ports at b000 [size=32]
        Memory at 90610000 (32-bit, non-prefetchable) [size=2K]
        Expansion ROM at 90600000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [70] Express Legacy Endpoint, MSI 00
        Capabilities: [e0] SATA HBA v0.0
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: ahci

Motherboard is Supermicro X10SBA - http://www.supermicro.nl/products/motherboard/celeron/X10/X10SBA.cfm

(In reply to frollic from comment #109)
> Is there any progress?
>
> I'm hitting this error on Fedora 3.17.8-200.fc20 kernel, which makes my
> system pretty much unusable :(
>
> 07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe
> SATA 6Gb/s Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])

It should have been fixed in v3.16 by cc346a4714 for this device. Are you sure you're seeing the same error? What are the symptoms?

Actually, refreshing my memory in the comments here, others are also reporting that issues for 1b4b:9230 persist, but they're different than the problem we're trying to fix here and suggest either broken hardware or broken driver (or both). As suggested previously, if you're not getting DMAR faults, file a new bug.

Indeed, I don't have DMAR errors in my syslog.

Drives are 3 * WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133 running soft-RAID5.
One SAMSUNG SSD SM841 mSATA 128GB, DXM43D0Q, max UDMA/133 in an mSATA->SATA case/converter.

Feb 4 19:09:43 atlantis kernel: [ 464.228813] ata3: failed to read log page 10h (errno=-5)
Feb 4 19:09:43 atlantis kernel: [ 464.231988] ata3.00: exception Emask 0x1 SAct 0xc000 SErr 0x0 action 0x0
Feb 4 19:09:43 atlantis kernel: [ 464.235233] ata3.00: irq_stat 0x40000008
Feb 4 19:09:43 atlantis kernel: ata3: failed to read log page 10h (errno=-5)
Feb 4 19:09:43 atlantis kernel: ata3.00: exception Emask 0x1 SAct 0xc000 SErr 0x0 action 0x0
Feb 4 19:09:43 atlantis kernel: ata3.00: irq_stat 0x40000008
Feb 4 19:09:43 atlantis kernel: [ 464.238596] ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: [ 464.242000] ata3.00: cmd 60/00:70:90:3b:bc/04:00:0c:00:00/40 tag 14 ncq 524288 in
Feb 4 19:09:43 atlantis kernel: [ 464.242000] res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: ata3.00: cmd 60/00:70:90:3b:bc/04:00:0c:00:00/40 tag 14 ncq 524288 in
         res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: [ 464.248733] ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: [ 464.252192] ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: [ 464.255558] ata3.00: cmd 60/00:78:90:3f:bc/04:00:0c:00:00/40 tag 15 ncq 524288 in
Feb 4 19:09:43 atlantis kernel: [ 464.255558] res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: ata3.00: cmd 60/00:78:90:3f:bc/04:00:0c:00:00/40 tag 15 ncq 524288 in
         res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: [ 464.262523] ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: [ 464.272877] ata3.00: revalidation failed (errno=-2)
Feb 4 19:09:43 atlantis kernel: [ 464.276284] ata3: hard resetting link
Feb 4 19:09:43 atlantis kernel: ata3.00: revalidation failed (errno=-2)
Feb 4 19:09:43 atlantis kernel: ata3: hard resetting link
Feb 4 19:09:44 atlantis kernel: [ 464.586712] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 19:09:44 atlantis kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 19:09:44 atlantis kernel: [ 464.593370] ata3.00: configured for UDMA/133
Feb 4 19:09:44 atlantis kernel: [ 464.596855] ata3: EH complete
Feb 4 19:09:44 atlantis kernel: ata3.00: configured for UDMA/133
Feb 4 19:09:44 atlantis kernel: ata3: EH complete
Feb 4 19:10:03 atlantis kernel: [ 484.234979] ata3: failed to read log page 10h (errno=-5)
Feb 4 19:10:03 atlantis kernel: [ 484.238484] ata3.00: exception Emask 0x1 SAct 0xc000000 SErr 0x0 action 0x0
Feb 4 19:10:03 atlantis kernel: [ 484.242039] ata3.00: irq_stat 0x40000008
Fe...

In addition, mobo is brand new (doesn't mean it can't be faulty), WDC drives are 2 months old (installed just before X-mas last year). The SSD was purchased used, so I can't tell you how old that is.

All of the hardware, except for the Samsung SSD, ran just fine on my Supermicro X7SPA-H, before I swapped mobo just two days ago.

(In reply to Alex Williamson from comment #95)

I encountered the same problem on a PX-G128M6e (Plextor M6e series SSD) and resolved it with the patch.
(Actually, I used the 4.0.5 kernel patched with the code described in https://lkml.org/lkml/2015/2/2/226 )

Booting with the SSD and passing the SSD through to a guest OS both work correctly.

My system is an Asus H97M-PLUS with BIOS 2501 and a PX-G128M6e with firmware revision 1.06.
The kernel .config is Arch's linux 4.0.5-1 package.

Created attachment 179951
dmesg of 4.0.5 vanilla kernel with iommu=on

`grep -i -e dmar -e iommu` is below

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux-vanilla root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] ACPI: DMAR 0x00000000DAC6CED0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-linux-vanilla root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] Intel-IOMMU: enabled
[ 0.107086] dmar: Host address width 39
[ 0.107098] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.107123] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.107138] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.107154] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.107169] dmar: RMRR base: 0x000000dbe7b000 end: 0x000000dbe89fff
[ 0.107179] dmar: RMRR base: 0x000000dd000000 end: 0x000000df1fffff
[ 0.107191] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.685402] DMAR: No ATSR found
[ 0.685642] IOMMU: dmar0 using Queued invalidation
[ 0.685651] IOMMU: dmar1 using Queued invalidation
[ 0.685662] IOMMU: Setting RMRR:
[ 0.685694] IOMMU: Setting identity map for device 0000:00:02.0 [0xdd000000 - 0xdf1fffff]
[ 0.686154] IOMMU: Setting identity map for device 0000:00:14.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686215] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686268] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686308] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.686329] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 0.847930] dmar: DRHD: handling fault status reg 2
[ 0.848264] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 1.161006] dmar: DRHD: handling fault status reg 3
[ 1.161963] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 6.159656] dmar: DRHD: handling fault status reg 2
[ 6.160750] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 6.472980] dmar: DRHD: handling fault status reg 3
[ 6.473513] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 11.471329] dmar: DRHD: handling fault status reg 2
[ 11.471661] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 11.784476] dmar: DRHD: handling fault status reg 3
[ 11.785472] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 16.783038] dmar: DRHD: handling fault status reg 2
[ 16.783646] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:...

Created attachment 179961
dmesg of 4.0.5 patched kernel with iommu=on

`grep -i -e dmar -e iommu` is below

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux-m6e root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] ACPI: DMAR 0x00000000DAC6CED0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-linux-m6e root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] Intel-IOMMU: enabled
[ 0.107025] dmar: Host address width 39
[ 0.107037] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.107060] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.107075] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.107092] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.107107] dmar: RMRR base: 0x000000dbe7b000 end: 0x000000dbe89fff
[ 0.107117] dmar: RMRR base: 0x000000dd000000 end: 0x000000df1fffff
[ 0.107129] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.688999] DMAR: No ATSR found
[ 0.689240] IOMMU: dmar0 using Queued invalidation
[ 0.689249] IOMMU: dmar1 using Queued invalidation
[ 0.689259] IOMMU: Setting RMRR:
[ 0.689292] IOMMU: Setting identity map for device 0000:00:02.0 [0xdd000000 - 0xdf1fffff]
[ 0.689754] IOMMU: Setting identity map for device 0000:00:14.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689816] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689868] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689908] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.689930] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 66.222474] [drm] DMAR active, disabling use of stolen memory

`lspci -nnvv`

02:00.0 SATA controller [0106]: Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. Device [1b4b:9183]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Interrupt: pin A routed to IRQ 30
 Region 0: I/O ports at e050 [size=8]
 Region 1: I/O ports at e040 [size=4]
 Region 2: I/O ports at e030 [size=8]
 Region 3: I/O ports at e020 [size=4]
 Region 4: I/O ports at e000 [size=32]
 Region 5: Memory at f7c20000 (32-bit, non-prefetchable) [size=512]
 Expansion ROM at f7c00000 [disabled] [size=128K]
 Capabilities: [40] Power Management version 3
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
  Address: fee00378 Data: 0000
 Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
  DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
   ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
   RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
   MaxPayload 128 bytes, MaxReadReq 512 bytes
  DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
  LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
   ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
  LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
  LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
    Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
    Compliance De-emphasis: -6dB
  LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
    EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
 Capabilities: [100 v1] Advanced Error Reporting
  UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
  UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
  UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
  CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout+ NonFatalErr-
  CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
  AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
 Kernel driver in use: ahci
 Kernel modules: ahci

----

`lspci -nnvv` on the host while passing the SSD through to a guest OS

02:00.0 SATA controller [0106]: Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. Device [1b...

I believe I am affected by the same bug with the Marvell 88SE9120 controller on an ASRock 990FX Extreme 4 motherboard.
Although there are no DMAR errors in dmesg, when AMD's IOMMU is enabled in the BIOS I get the following a couple of times before it gives up:

[ 117.616423] ata9: hard resetting link
[ 117.632972] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020440 flags=0x0070]
[ 117.632982] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020450 flags=0x0070]
[ 118.340472] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020000 flags=0x0050]
[ 122.616621] ata9: softreset failed (1st FIS failed)
[ 122.616632] ata9: reset failed, giving up
[ 122.616640] ata9: EH complete
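
The faults above are all logged against function 1 (02:00.1), which is the pattern quirk_dma_func1_alias exists for: the controller issues DMA with function 1 as the requester ID. As a rough sketch only (the regular expression is an assumption based on the three lines quoted here, and `faulting_functions` is a hypothetical helper name), the requester can be tallied from a dmesg capture like this:

```python
import re

# Field layout assumed from the three AMD-Vi lines quoted above; other
# kernel versions may print these events differently.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7]) "
    r"domain=0x[0-9a-f]+ address=0x(?P<addr>[0-9a-f]+) flags=0x(?P<flags>[0-9a-f]+)"
)


def faulting_functions(log_text):
    """Map each 'bus:dev.fn' requester to the number of page faults logged."""
    counts = {}
    for m in FAULT_RE.finditer(log_text):
        key = f"{m['bus']}:{m['dev']}.{m['fn']}"
        counts[key] = counts.get(key, 0) + 1
    return counts


# The three fault lines from the dmesg excerpt above.
SAMPLE = """\
[ 117.632972] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020440 flags=0x0070]
[ 117.632982] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020450 flags=0x0070]
[ 118.340472] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020000 flags=0x0050]
"""

if __name__ == "__main__":
    for requester, n in faulting_functions(SAMPLE).items():
        print(requester, n)
```

If every fault comes from a `.1` function that `lspci` does not list, that would be consistent with the alias problem rather than with a failing drive.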

Once the controller's dev ID was added to drivers/pci/quirks.c everything worked as expected in kernel 4.1 from git.kernel.org (23b7776290b10297fe2cae0fb5f166a4f2c68121)

[ 1520.100391] ata9: hard resetting link
[ 1526.038156] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 330)
[ 1526.044554] ata9.00: ATA-7: SAMSUNG HD502IJ, 1AA01112, max UDMA7
[ 1526.044559] ata9.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1526.050996] ata9.00: configured for UDMA/133
[ 1526.051007] ata9: EH complete

And here is the patch

--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3589,6 +3589,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x91a0,
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
     quirk_dma_func1_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9120,
+ quirk_dma_func1_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
     quirk_dma_func1_alias);
 /* https://bugs.gentoo.org/show_bug.cgi?id=497630 */

Could this device id be added to the list of affected devices?

(In reply to Tasos Sahanidis from comment #118)
>
> Could this device id be added to the list of affected devices?

It's already queued in the pull request for v4.2:

http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/quirks.c?id=247de694349c2eeea11b8d8936541f5012a09318

(In reply to Alex Williamson from comment #119)
> It's already queued in the pull request for v4.2:
>
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/drivers/
> pci/quirks.c?id=247de694349c2eeea11b8d8936541f5012a09318

Apologies for that, did not see it.
Thank you for your time!

Hi. Old Newbie to kernel things here. I see from Alex's (initial?) patch at https://github.com/awilliam/linux-vfio/blob/02f8c6aee8df3cdc935e9bdd4f2d020306035dbe/drivers/ata/ahci.c that my 88SE9128 is in the quirks list.

However, exploring at https://github.com/awilliam/linux-vfio/blob/02f8c6aee8df3cdc935e9bdd4f2d020306035dbe/drivers/ata/ahci.c I don't see it.

So - I'm probably looking in all the wrong places.

I've just set up Fedora 22 4.1.3-200.fc22.x86_64. I'm getting this fatal error.

ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.00: failed command: WRITE DMA
ata10.00: cmd ca/00:01:08:08:00/00:00:00:00:00/e0 tag 5 dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.00: status: { DRDY }
ata10: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata10: COMRESET failed (errno=-16)
ata10: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.00: qc timeout (cmd 0xec)
ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata10.00: revalidation failed (errno=-5)
ata10: hard resetting link
ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata11.00: failed command: READ DMA EXT
ata11.00: cmd 25/00:10:20:d5:c5/00:00:12:00:00/e0 tag 24 dma 8192 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata11.00: status: { DRDY }
ata11: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata11: link is slow to respond, please be patient (ready=0)
ata10: COMRESET failed (errno=-16)
ata10: hard resetting link
ata11: COMRESET failed (errno=-16)
ata11: hard resetting link

This is a StarTech PEXSAT31E1 add-on so it's not booting the system. It's connected to a external cabinet, and I'm using mdadm for RAID-5. All drives report the same issues (logging not included here) which is what had me looking at the controller.

I am really hoping it's not included yet - which would both explain the issue and the fact that 'the fix is in'.

I've not built a kernel since - well, a long time ago - Ubuntu 6.10 or so. Now I might get a chance to try it on Fedora.

Please let me know if it would help if I provided more info. Sure looks like I'm just like most others here...

Can anyone Help?

Many Thanks :-)
/Bill

*bump*

I'm down here. I'm contemplating getting a 3ware and going the hardware route. I've had pretty horrid experience with Highpoint support (non-existent) and the Marvell controllers seem to be dysfunctional. Vendor who sold me the card could not provide any drivers or firmware updates, so this is my only possible path to a solution using this type of controller - the kernel patch(es).

Thanks.

(In reply to frollic from comment #123)
> For the 9230 you might want to check the updated BIOS we've discussed at:
> http://homeservershow.com/forums/index.php?/topic/9179-marvell-9230-firmware-
> updates-and-such/

I had found that thread in a websearch as I have encountered similar issues as you had, also using a Supermicro X10SBA. I had contacted Supermicro about this, but support did not really seem to be aware of this issue, and no update for the controller was sent to me. The thread you refer to does not state the outcome of applying the firmware to the X10SBA, does it solve the issue?

(In reply to oh-itsme from comment #124)
> I had found that thread in a websearch as I have encountered similar issues
> as you had, also using a Supermicro X10SBA. I had contacted Supermicro about
> this, but support did not really seem to be aware of this issue, and no
> update for the controller was sent to me.

I was in touch with the Dutch support of Supermicro; they were very helpful, and it took them about 10 days to obtain the update from Marvell.
The person I was in contact with wrote that the update would be posted along with the next BIOS update for the motherboard, but I don't think it actually happened :(

> The thread you refer to does not state the outcome of applying the firmware
> to the X10SBA, does it solve the issue?

Yes, it helped me; the soft-RAID is running fine now, even though I occasionally get "mismatch_cnt is not 0 on /dev/mdXXX" when running raid-check.

There seems to have been a regression sometime after the 4.3 tag (6a13feb9c82803e2b815eca72fa7a9f5561d7861) and before one of the commits on 2015-11-07 (as that's when my kernel was compiled), which causes the same errors in dmesg as Comment #118.
This results in the drives attached to the controller becoming inaccessible.

Please note that this time the quirk for my device is present in drivers/pci/quirks.c but it seems to have no effect.

Hi There

Just want to address a problem with Asrock Extreme 9 X79 with BIOS P4.00 platform and its Marvell 88SE9220 controller.

I experience the same faults as the DMAR faults above when this controller is enabled.
However the problem appears to be resolved by adding a new entry in quirks.c

DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9220,
                        quirk_dma_func1_alias);

Let me know if you need me to attach any logs of faults, at the moment I'm using a custom compiled kernel with the above fix on Arch Linux but can switch to a standard kernel.

Kind Regards,

If you've got the quirk fix and done the testing then I would see Documentation/process/submitting-patches.rst and submit your quirk fix as a patch with an explanation of what it fixes. The change looks correct to me.

Send it to <email address hidden> and it should get reviewed and merged

Alan

I can confirm that this issue occurs with the Marvell 88SE9128 controller on my Gigabyte GA-X59A-UD7 (rev2.0) motherboard. As with Kevin Hunt above, adding a new entry in quirks.c appears to resolve the issue.

Given the name of this bug, I was surprised that the 9128 wasn't in there.

Addendum to the above:

The 9128 *does* appear to be in the quirks file for mainline, but not in the kernel provided by Arch Linux (4.15.15). It seems that it was either added in 4.16 or Arch's patches removed it for some reason.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aa0082066343 for Marvell 9128 appeared in v4.16-rc1.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=832e4e1f76b8 for Marvell 88SE9220 appeared in v4.17-rc1.

Are there any devices that are still broken in v4.17-rc1? If not, maybe we can close this bug?

(In reply to Bjorn Helgaas from comment #131)
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=aa0082066343 for Marvell 9128 appeared in v4.16-rc1.
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=832e4e1f76b8 for Marvell 88SE9220 appeared in v4.17-rc1.
>
> Are there any devices that are still broken in v4.17-rc1? If not, maybe we
> can close this bug?

I still have this issue with a Marvell 88SE9230 and kernel v4.16.8 under Arch Linux. It's probably worth checking all their SATA Controllers before closing this bug: https://www.marvell.com/storage/system-solutions/

v4.16 already contains a quirk for the Marvell 88SE9230 (added by cc346a4714a5 ("PCI: Add function 1 DMA alias quirk for Marvell devices") way back in v3.16).

But from comment #44 and comments #49-#58, it sounds like the 9230 has other problems in addition to this one, so I suspect you're seeing those other problems. If so, can you open a new bug for that and copy Joshua and Alex? I took a quick look and didn't see a definitive resolution for the problems Joshua reported.

I'm going to close this one and if people see more problems that are resolved by quirk_dma_func1_alias(), they can add them here and reopen the bug.

I have this issue with "Marvell Technology Group Ltd. 88SS9183 PCIe SSD Controller" in my "Asus Rog Strix Z370-F Gaming" and solved it by adding "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9183, quirk_dma_func1_alias);" to the quirk_dma_func1_alias entries in drivers/pci/quirks.c.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1810239

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

root@mythfe-amd:~# lspci -knnv -s 01:00.0
01:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 11) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230]
 Flags: bus master, fast devsel, latency 0, IRQ 56
 I/O ports at f050 [size=8]
 I/O ports at f040 [size=4]
 I/O ports at f030 [size=8]
 I/O ports at f020 [size=4]
 I/O ports at f000 [size=32]
 Memory at f7d10000 (32-bit, non-prefetchable) [size=2K]
 Expansion ROM at f7d00000 [disabled] [size=64K]
 Capabilities: [40] Power Management version 3
 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [70] Express Legacy Endpoint, MSI 00
 Capabilities: [e0] SATA HBA v0.0
 Capabilities: [100] Advanced Error Reporting
 Kernel driver in use: ahci
 Kernel modules: ahci
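
The `[1b4b:9230]` pair in the output above is the vendor:device ID that the quirk entries key on. A minimal sketch for pulling it out of an `lspci -nn` line follows; `THREAD_QUIRKED_IDS` is a hypothetical set built only from the device IDs mentioned earlier in this thread, not an authoritative copy of drivers/pci/quirks.c.

```python
import re

# Device IDs this thread reports as having a quirk_dma_func1_alias entry
# (illustrative assumption; check drivers/pci/quirks.c in your own tree).
THREAD_QUIRKED_IDS = {0x9120, 0x9128, 0x9220, 0x9230, 0x91A0}
MARVELL_EXT_VENDOR = 0x1B4B  # vendor half of [1b4b:9230] above

# Matches the [vendor:device] pair; the class code, e.g. [0106], has no colon.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_ids(lspci_line):
    """Return (vendor, device) as integers, or None if no ID pair is found."""
    m = ID_RE.search(lspci_line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def mentioned_in_thread(lspci_line):
    """True if the line names a Marvell device ID listed in THREAD_QUIRKED_IDS."""
    ids = pci_ids(lspci_line)
    return ids is not None and ids[0] == MARVELL_EXT_VENDOR and ids[1] in THREAD_QUIRKED_IDS


LINE = ("01:00.0 SATA controller [0106]: Marvell Technology Group Ltd. "
        "88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 11) "
        "(prog-if 01 [AHCI 1.0])")

if __name__ == "__main__":
    print(pci_ids(LINE), mentioned_in_thread(LINE))
```

The 9230 is in that set, which fits the earlier remark that this controller already has the alias quirk and may be hitting a separate problem.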

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: cosmic

apport information

tags: added: apport-collected bionic
description: updated

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.20 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20/

Changed in linux:
importance: Unknown → Medium
status: Unknown → Fix Released

I attempted a boot with the following upstream kernel packages

  linux-image-unsigned-4.20.0-042000-generic_4.20.0-042000.201812232030_amd64.deb
  linux-modules-4.20.0-042000-generic_4.20.0-042000.201812232030_amd64.deb

On boot I see the following errors

Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xef)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4: limiting SATA link speed to 1.5 Gbps
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: revalidation failed (errno=-5)
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: revalidation failed (errno=-5)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: disabled
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
Jan 02 22:09:23 mythfe-amd kernel: ata10: SATA link down (SStatus 0 SControl 330)
Jan 02 22:09:23 mythfe-amd kernel: ata13: SATA link down (SStatus 0 SControl 330)
Jan 02 22:09:23 mythfe-amd kernel: ata14: SATA link down (SStatus 0 SControl 330)
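
To see at a glance which ports are timing out in a journal capture like the one above, a small tally can help. This is only a sketch: `timeouts_per_port` is a hypothetical helper, and the pattern assumes the `kernel: ataN.NN: message` layout of the lines quoted here.

```python
import re
from collections import Counter

# Assumes journal lines shaped like "... kernel: ata4.00: qc timeout (cmd 0xec)".
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def timeouts_per_port(journal_text):
    """Count 'qc timeout' messages per ATA port (ata2, ata4, ...)."""
    counts = Counter()
    for line in journal_text.splitlines():
        m = PORT_RE.search(line)
        if m and m.group(2).startswith("qc timeout"):
            counts[m.group(1)] += 1
    return counts


# A few lines copied from the failing boot above.
SAMPLE = """\
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xef)
"""

if __name__ == "__main__":
    for port, n in sorted(timeouts_per_port(SAMPLE).items()):
        print(port, n)
```

In the excerpt above the timeouts are confined to ata2, ata4 and ata8, the same ports that enumerate drives once the IOMMU is disabled.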

Rebooted with the 4.20.0-042000-generic and the option "amd_iommu=off" and the card works

Jan 02 22:10:52 mythfe-amd kernel: ata8.00: ATAPI: MARVELL VIRTUAL, 1.09, max UDMA/66
Jan 02 22:10:52 mythfe-amd kernel: ata8.00: configured for UDMA/66
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: ATA-8: ST3500418AS, CC46, max UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: configured for UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: ATA-7: ST3250820AS, 3.AAE, max UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: configured for UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: scsi 1:0:0:0: Direct-Access ATA ST3250820AS E PQ: 0 ANSI: 5
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: Attached scsi generic sg0 type 0
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/233 GiB)
Jan 02 22:10:52 mythfe-amd kernel: scsi 3:0:0:0: Direct-Access ATA ST3500418AS CC46 PQ: 0 ANSI: 5
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: [sda] Write Protect is off
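
When comparing logs from several boots it is easy to lose track of which ones had the workaround applied. A minimal sketch for checking a kernel command line string follows; `amd_iommu_off` is a hypothetical helper, and the sample string is the ProcKernelCmdLine value from the bug description.

```python
def amd_iommu_off(cmdline):
    """True if the command line carries the amd_iommu=off workaround."""
    # Parameters are whitespace-separated, so compare whole tokens only.
    return "amd_iommu=off" in cmdline.split()


# Command line from the bug description (no workaround applied).
REPORTED = ("BOOT_IMAGE=/vmlinuz-4.18.0-13-generic root=/dev/mapper/mythnew-root "
            "ro verbose drm_kms_helper.edid_firmware=HDMI-A-3:edid/panasonic.edid")

if __name__ == "__main__":
    print(amd_iommu_off(REPORTED))                     # boot without the workaround
    print(amd_iommu_off(REPORTED + " amd_iommu=off"))  # boot with the workaround
    # On a live system the running command line can be read from /proc/cmdline.
```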

Changed in linux (Debian):
status: Unknown → New
description: updated
tags: added: kernel-bug-exists-upstream-4.20 latest-bios-f2
summary: - amd_iommu conflict with Marvell Sata controller
+ amd_iommu conflict with Marvell 88SE9230 SATA Controller

Steven Ellis, for you personally:

1) Did this problem not occur in a prior Ubuntu or kernel release, and if so which?

2) If this issue has always occurred, could you please advise on the earliest kernel you tested?

3) To keep this relevant to upstream, one will want to test the latest mainline kernel as it is released (now 5.0-rc2). Could you please advise?

Changed in linux (Ubuntu):
importance: Undecided → Low