amd_iommu conflict with Marvell 88SE9230 SATA Controller

Bug #1810239 reported by Steven Ellis
This bug affects 5 people
Affects         Status        Importance  Assigned to  Milestone
Linux           Fix Released  Medium
linux (Debian)  Fix Released  Unknown
linux (Fedora)  Unknown       Unknown
linux (Ubuntu)  Incomplete    Low         Unassigned

Bug Description

Booting with kernel 4.18.0-13.14~18.04.1-generic shows errors:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810239/comments/153

WORKAROUND: Use kernel boot parameter:
amd_iommu=off
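
A minimal sketch of making the workaround persistent, assuming a GRUB 2 setup where kernel parameters live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub (the usual place on Ubuntu). The helper name add_kernel_param is hypothetical; it only rewrites a string and does not touch any file.

```python
def add_kernel_param(grub_line, param):
    """Return the GRUB_CMDLINE_LINUX_DEFAULT line with `param` set exactly once."""
    key, sep, value = grub_line.strip().partition("=")
    if key != "GRUB_CMDLINE_LINUX_DEFAULT" or not sep:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    # Drop the surrounding quotes and split into individual parameters.
    args = value.strip().strip("\"'").split()
    name = param.split("=", 1)[0]
    # Remove any earlier setting of the same parameter (e.g. amd_iommu=on).
    args = [a for a in args if a.split("=", 1)[0] != name]
    args.append(param)
    joined = " ".join(args)
    return f'{key}="{joined}"'


print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"', "amd_iommu=off"))
```

After writing the resulting line back, `sudo update-grub` and a reboot would be needed for it to take effect. Note that this turns the AMD IOMMU off entirely, so it is not an option where device passthrough is required.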

---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mythfe 1170 F.... pulseaudio
CurrentDesktop: XFCE
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=/dev/mythnew/swap
InstallationDate: Installed on 2016-07-02 (913 days ago)
InstallationMedia: Mythbuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
IwConfig:
 lo no wireless extensions.

 enp9s0 no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. B450M S2H
Package: linux (not installed)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.18.0-13-generic root=/dev/mapper/mythnew-root ro verbose drm_kms_helper.edid_firmware=HDMI-A-3:edid/panasonic.edid
ProcVersionSignature: Ubuntu 4.18.0-13.14~18.04.1-generic 4.18.17
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-13-generic N/A
 linux-backports-modules-4.18.0-13-generic N/A
 linux-firmware 1.173.2
RfKill:

Tags: bionic
Uname: Linux 4.18.0-13-generic x86_64
UpgradeStatus: Upgraded to bionic on 2018-07-27 (158 days ago)
UserGroups: adm cdrom dip lpadmin mythtv nopasswdlogin plugdev sambashare sudo video
_MarkForUpload: True
dmi.bios.date: 12/04/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F2c
dmi.board.asset.tag: Default string
dmi.board.name: B450M S2H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF2c:bd12/04/2018:svnGigabyteTechnologyCo.,Ltd.:pnB450MS2H:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB450MS2H:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B450M S2H
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

In , pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote :

Created attachment 72217
Output of `dmesg' command

I have an MSI Z68A-GD80 B3 motherboard, and when I enable Intel's IOMMU (kernel booted with intel_iommu=on), the integrated Marvell 88SE9128 SATA controller doesn't work.

To reproduce:
1. Compile and prepare kernel with Intel IOMMU support enabled (CONFIG_INTEL_IOMMU=y).
2. Reboot the computer.
3. Enter BIOS and enable VT-d.
4. Boot the kernel with intel_iommu=on parameter.

Right after boot, kernel reports the following errors (SATA controller is at 0b:00.0):

[ 2.639774] DRHD: handling fault status reg 3
[ 2.639782] DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
[ 2.639783] DMAR:[fault reason 02] Present bit in context entry is clear

After a while these entries appear:

[ 7.625837] ata14.00: qc timeout (cmd 0xa1)
[ 7.628341] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.935483] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 17.908407] ata14.00: qc timeout (cmd 0xa1)
[ 17.910935] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 17.912276] ata14: limiting SATA link speed to 1.5 Gbps
[ 18.219077] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 48.134607] ata14.00: qc timeout (cmd 0xa1)
[ 48.137508] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 48.444646] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

When a disk is connected to the controller, the disk does not work. When no disk is connected, the computer starts normally apart from a long delay, presumably caused by probing the device.

Since this is the secondary controller on these motherboards, you can avoid the symptoms by plugging the disk into one of the available ports of the built-in Intel SATA controller and disabling the Marvell controller in the BIOS. The other workaround, if you need the eSATA capabilities of the Marvell controller, is to disable VT-d in the BIOS.
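
The "qc timeout" lines above report only a hexadecimal ATA opcode. A small sketch for turning those opcodes into names follows; the table holds only opcodes that appear in logs in this thread, with names as I understand them from the ATA command set (treat the table as an assumption and check it against the specification). The function name describe_qc_timeouts is hypothetical.

```python
import re

# Opcode names are an assumption based on the ATA command set.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0xA1: "IDENTIFY PACKET DEVICE",
    0xCA: "WRITE DMA",
    0xEC: "IDENTIFY DEVICE",
}

QC_TIMEOUT = re.compile(r"(ata\d+(?:\.\d+)?): qc timeout \(cmd (0x[0-9a-fA-F]+)\)")


def describe_qc_timeouts(log):
    """Return (port, opcode, name) for every 'qc timeout' line in the log text."""
    found = []
    for m in QC_TIMEOUT.finditer(log):
        opcode = int(m.group(2), 16)
        found.append((m.group(1), opcode, ATA_OPCODES.get(opcode, "unknown")))
    return found


sample = "[ 7.625837] ata14.00: qc timeout (cmd 0xa1)"
print(describe_qc_timeouts(sample))
```

Read this way, the timeouts in this comment hit the identify step, which suggests the controller never completed the very first command sent to the attached device.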

In , pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote :

Created attachment 72218
Output of `lspci -knnv' command

In , pawel.zaq (pawel.zaq-linux-kernel-bugs) wrote :

Created attachment 72219
Kernel config

In , public (public-linux-kernel-bugs) wrote :

The same problem occurs on an MSI Z68A-GD65 G3 system with a Marvell 88SE91xx controller.

grep DMAR:

ACPI: DMAR beaff508 000B0 (v01 ALASKA A M I 00000001 INTL 00000001)
DMAR: Host address width 36
DMAR: DRHD base: 0x000000fed91000 flags: 0x1
DMAR: RMRR base: 0x000000bf4cc000 end: 0x000000bf4eefff
DMAR: No ATSR found
DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
DMAR:[fault reason 02] Present bit in context entry is clear

grep IOMMU:

Intel-IOMMU: enabled
IOMMU 0: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
IOMMU 0 0xfed91000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
Intel-IOMMU: enabled
IOMMU 0: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
IOMMU 0 0xfed91000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf4cc000 - 0xbf4eefff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]

grep ata8:

ata8: SATA max UDMA/133 abar m2048@0xfa310000 port 0xfa310180 irq 48
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: limiting SATA link speed to 3.0 Gbps
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
ata8.00: qc timeout (cmd 0xec)
ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
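
The SStatus and SControl values in the "SATA link up" lines can be unpacked to see what the link negotiated and what speed limit the driver requested. The sketch below assumes the standard SATA register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that the kernel prints both registers in hexadecimal without a prefix; decode_link is a hypothetical helper name.

```python
import re

# SPD field encoding; 0 means "no speed negotiated" / "no limit requested".
SPEEDS = {0: None, 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

LINK_REGS = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def decode_link(line):
    """Decode the SStatus/SControl pair from one 'SATA link up' log line."""
    m = LINK_REGS.search(line)
    if m is None:
        return None
    sstatus = int(m.group(1), 16)
    scontrol = int(m.group(2), 16)
    return {
        "det": sstatus & 0xF,                              # 3 = device present, PHY up
        "negotiated": SPEEDS.get((sstatus >> 4) & 0xF),    # current link speed
        "ipm": (sstatus >> 8) & 0xF,                       # 1 = interface active
        "speed_limit": SPEEDS.get((scontrol >> 4) & 0xF),  # limit requested by driver
    }


print(decode_link("ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)"))
```

For the last line of the log above, this reports a negotiated speed of 6.0 Gbps with a requested limit of 3.0 Gbps, consistent with the "limiting SATA link speed to 3.0 Gbps" message just before it.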

In , public (public-linux-kernel-bugs) wrote :

Created attachment 72419
config file / kernel 3.2.2

Kernel config

In , public (public-linux-kernel-bugs) wrote :

Created attachment 72420
Output of `lspci -knnv' command

In , listenmitglied (listenmitglied-linux-kernel-bugs) wrote :

I confirm this bug with kernel 3.2.6: same error with VT-d enabled in bios.

With mainboard "Asus Rampage III Gene", Z68, onboard Marvell; CPU Xeon L5520; 3x4GB Ram. Logs/Printouts follow this evening.

In , listenmitglied (listenmitglied-linux-kernel-bugs) wrote :

Created attachment 72733
kernel config

above bug confirmed with 3.2.13

In , listenmitglied (listenmitglied-linux-kernel-bugs) wrote :

Created attachment 72734
dmesg intel z68, asus rampage III gene, vt-d enable

In , listenmitglied (listenmitglied-linux-kernel-bugs) wrote :

Created attachment 72735
lspci, asus rampage III gene, z68, vt-d enable, 3.2.13

In , listenmitglied (listenmitglied-linux-kernel-bugs) wrote :

(In reply to comment #6)
> I confirm this bug with kernel 3.2.6: same error with VT-d enabled in bios.
>
> With mainboard "Asus Rampage III Gene", Z68, onboard Marvell; CPU Xeon L5520;
> 3x4GB Ram. Logs/Printouts follow this evening.

Also confirmed for current kernel 3.2.13.

In , acooks (acooks-linux-kernel-bugs) wrote :

From a pdf file by Intel with title "Intel® Virtualization Technology for Directed I/O
Architecture Specification":
--snip--
3.6.1.4 PCI Express Devices Using Phantom Functions
To increase the maximum possible number of outstanding requests requiring completion, PCI Express allows a device to use function numbers not assigned to implemented functions to logically extend the Tag identifier. Unclaimed function numbers are referred to as Phantom Function Numbers (PhFN). A device reports its support for phantom functions through the Device Capability configuration register, and requires software to explicitly enable use of phantom functions through the Device Control configuration register.

Since the function number is part of the requester-id used to locate the context-entry for processing a DMA request, when assigning PCI Express devices with phantom functions enabled, software must program multiple context entries, each corresponding to the PhFN enabled for use by the device function. Each of these context-entries must be programmed identically to ensure the DMA requests with any of these requester-ids are processed identically.
--snip--

grep -ri phant says pci_regs.h knows about the capability, but it doesn't appear anywhere else in the kernel as far as I can see. Look for PCI_EXP_DEVCAP_PHANTOM and PCI_EXP_DEVCTL_PHANTOM.

Unfortunately, lspci indicates that the Marvell chip is not using phantom functions (lspci upload to follow), so at this point I can't tell if I'm on the right trail.

Caveat lector: I don't have any previous experience with low-level PCI stuff.
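
To make the quoted passage concrete: a PCI Express requester ID packs bus, device and function into 16 bits, so a request tagged 0b:00.1 selects a different context entry than one tagged 0b:00.0. The sketch below decodes such IDs and the phantom-function fields. The two constant names come from pci_regs.h as mentioned above; their values are given as I recall them from that header and should be treated as assumptions. The helper names are hypothetical.

```python
# Values assumed from include/uapi/linux/pci_regs.h; verify against your tree.
PCI_EXP_DEVCAP_PHANTOM = 0x00000018  # Device Capabilities: phantom function bits
PCI_EXP_DEVCTL_PHANTOM = 0x0200      # Device Control: phantom functions enable


def parse_bdf(text):
    """Convert 'bb:dd.f' (hex bus and device, function 0-7) to a 16-bit requester ID."""
    bus, rest = text.split(":")
    dev, fn = rest.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def split_requester_id(rid):
    """Return (bus, device, function) for a 16-bit requester ID."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


def phantom_bits(devcap):
    """Number of function-number bits the device claims for phantom functions."""
    return (devcap & PCI_EXP_DEVCAP_PHANTOM) >> 3


def phantom_enabled(devctl):
    """True if the phantom functions enable bit is set in Device Control."""
    return bool(devctl & PCI_EXP_DEVCTL_PHANTOM)


print(hex(parse_bdf("0b:00.0")), hex(parse_bdf("0b:00.1")))
```

A device that reports zero phantom bits and has the enable bit clear, as lspci suggests for the Marvell chip, should never legitimately emit a function number other than its own.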

In , acooks (acooks-linux-kernel-bugs) wrote :

Created attachment 73265
lspci output including device capabilities

In , grythumn (grythumn-linux-kernel-bugs) wrote :

I'm seeing similar errors with AMD-Vi (AMD's IOMMU implementation) and a couple of Marvell 88SE9128-based cards, and can confirm that it is still present in 3.7.0 builds.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768

In , stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote :

This problem happens here as well: Asus P9X79 WS, BIOS 3306, X79, i7-3930K, running kernel 3.7.3. In addition to making the Marvell SATA controller ports unusable, it causes a ~40s hang during boot.

I tried contacting Asus about this, as I think this could be fixed by a BIOS update, but they replied, in horrible English, that they do not support Linux. I'll think twice before buying Asus again in the future, but it would be nice if a workaround could be implemented in the kernel.

In , stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote :

Created attachment 91521
dmesg on Asus P9X79 WS, kernel 3.7.3

In , stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote :

Created attachment 91531
lspci -knvv on Asus P9X79 WS, kernel 3.7.3

In , stijn+bugs (stijn+bugs-linux-kernel-bugs) wrote :

FWIW, I still have this issue with 3.7.8 and 3.8-rc7. BIOS update 3401 for the P9X79 WS didn't help. Additionally the hang during boot becomes worse (up to ~65 seconds), when a hard drive is connected. Since the drive is unusable anyway, I hacked the AHCI driver to ignore the Marvell controller. While no solution to this problem, at least my boot time is back to normal (<30s).

In , tradofox (tradofox-linux-kernel-bugs) wrote :

Same problem with the Marvell 88SE9172 SATA controller.
I have a Gigabyte GA-Z77X-UD5H with two Marvell 88SE9172 SATA controllers and an Intel E3-1245v2 CPU. VT-d is enabled. When running normal Debian 7 or Ubuntu later than 12.04, I can see the HDDs and SSDs connected to the Marvell ports. After installing XenServer 6.1 and Xen Cloud Platform 1.6, the HDDs and SSDs are not detected, although lspci shows that the Marvell 88SE9172 controllers are detected.

In , lizhenhua (lizhenhua-linux-kernel-bugs) wrote :

The root cause of this bug seems to be that the device illegally accessed memory that should be reserved for the IOMMU module, and this changed the IOMMU registers.

In , bhelgaas (bhelgaas-linux-kernel-bugs) wrote :

ZhenHua, can you elaborate on this? Do you mean a device accessed the MMIO space used to program the IOMMU itself? If so, how did you conclude that? I doubt the IOMMU space is at address 0xfff00000.

Based on the following data:

  Paweł:
    DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear
    0b:00.0 [0106]: Marvell [1b4b:9123]
  Korneliusz:
    DMAR:[DMA Read] Request device [03:00.1] fault addr fffc0000
    DMAR:[fault reason 02] Present bit in context entry is clear
    03:00.0 [0106]: Marvell 88SE9123 SATA [1b4b:9123]
  Daniel:
    IOMMU identity map errors (assuming unrelated for now)
    DMAR:[DMA Read] Request device [01:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear
    01:00.0 [0106]: Marvell 88SE9123 SATA [1b4b:9123]
  Stijn:
    dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear
    07:00.0 0106: 1b4b:9130 (rev 11) (prog-if 01 [AHCI 1.0])

In each case, the IOMMU saw a DMA read to an address that wasn't mapped for the requesting device. In each case, the requester is function .1, the kernel doesn't know about a .1 function, and there is a Marvell 912x SATA controller at the corresponding .0 function.

Andrew's Phantom Function theory seems like a good direction to explore. Maybe these devices incorrectly report Phantom Function support in the Device Capability & Control, and we just need some sort of quirk to work around that.

It would be interesting to know whether the .0 Marvell function has valid IOMMU mappings for the fault addresses (0xfff00000 or 0xfffc0000), or whether there is really anything at those addresses. They seem like dubious targets for DMA.
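
The pattern described in this comment (a fault from function .1, no such function known, a device at .0) can be checked mechanically. The sketch below assumes each report is pasted as text containing the DMAR fault lines and the lspci line for the controller, as in the data listed above; unknown_requesters is a hypothetical helper name.

```python
import re

# "Request device [0b:00.1] fault addr fff00000" as printed in the DMAR fault lines.
FAULT = re.compile(
    r"Request device \[([0-9a-f]{2}:[0-9a-f]{2})\.([0-7])\] fault addr ([0-9a-f]+)"
)
# An lspci-style line: a bus:dev.fn address followed somewhere by vendor:device.
DEVICE_LINE = re.compile(
    r"([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s.*?([0-9a-f]{4}:[0-9a-f]{4})"
)


def unknown_requesters(report):
    """List faults whose requester is not a listed device but whose .0 sibling is."""
    devices = {}
    for line in report.splitlines():
        m = DEVICE_LINE.match(line.strip())
        if m:
            devices[m.group(1)] = m.group(2)
    hits = []
    for m in FAULT.finditer(report):
        requester = f"{m.group(1)}.{m.group(2)}"
        sibling = f"{m.group(1)}.0"
        if requester not in devices and sibling in devices:
            hits.append((requester, sibling, devices[sibling], int(m.group(3), 16)))
    return hits


report = """DMAR:[DMA Read] Request device [0b:00.1] fault addr fff00000
DMAR:[fault reason 02] Present bit in context entry is clear
0b:00.0 [0106]: Marvell [1b4b:9123]"""
print(unknown_requesters(report))
```

Each tuple holds the faulting requester, the sibling function that does exist, that sibling's vendor:device ID, and the fault address.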

In , zhen-hual (zhen-hual-linux-kernel-bugs) wrote :

Hi guys,

    1. Since the only lspci output posted so far was taken with "intel_iommu=on", could you paste the output of lspci -vvv, lspci -t and lspci -n taken when intel_iommu is not set to on?

Thanks
ZhenHua

In , acooks (acooks-linux-kernel-bugs) wrote :

Created attachment 109981
Patch with quirk for incorrect PCI requester IDs

Here's a patch that provides a quirk for what I believe to be the root cause: devices that use incorrect PCI requester IDs, including Marvell 91xx controllers.

Various revisions have been sent to LKML and IOMMU-list in the past and a number of people have reported that it solved their problem and I've been running this on two boxes for months. I'm not sure why it hasn't been accepted.

Note that there are several devices that suffer from the same affliction, i.e., using incorrect PCI requester IDs in their transactions. The Marvell devices use both xx:yy.0 and xx:yy.1, possibly related to the SATA port number. Other devices, like Ricoh's R5C832 PCIe IEEE 1394 Controller commonly found in T410 and T420 ThinkPads, use a single incorrect requester ID.

Please try this patch and let me know if it works for you.
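
A rough model of what such a quirk has to compute for the Marvell case, where the device also issues requests as function 1 of its own slot. This is an illustrative sketch only, not the code from the attached patch; the helper names are hypothetical and mirror the usual devfn encoding (slot in bits 7:3, function in bits 2:0).

```python
def pci_devfn(slot, func):
    """Pack a slot (0-31) and function (0-7) into an 8-bit devfn."""
    return ((slot & 0x1F) << 3) | (func & 0x07)


def pci_slot(devfn):
    return (devfn >> 3) & 0x1F


def pci_func(devfn):
    return devfn & 0x07


def func1_aliases(devfn):
    """Return every devfn the IOMMU should accept for a device at `devfn`."""
    aliases = {devfn}
    # A device that is not itself function 1 may also show up as function 1.
    if pci_func(devfn) != 1:
        aliases.add(pci_devfn(pci_slot(devfn), 1))
    return aliases


print(sorted(func1_aliases(pci_devfn(0, 0))))
```

The IOMMU driver would then need to set up identical mappings for every devfn in the returned set, which is the same requirement the VT-d specification states for phantom functions.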

In , zhen-hual (zhen-hual-linux-kernel-bugs) wrote :

Each context entry has a present bit. If a context entry is used for a device but its present bit is not set to 1, a fault with reason 02 occurs.

I tested this on my PC: commenting out the line "context_set_present(context);" causes the same error. So I guess the affected devices are using a context entry whose present bit is 0.
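
A toy model of the lookup described here, assuming a table keyed by (bus, devfn) in which a missing key stands for a context entry with its present bit clear. The class and method names are hypothetical and this is not how the kernel stores context entries; it only illustrates why a request from an unprogrammed function faults.

```python
class ContextFault(Exception):
    """Raised when a request hits a context entry that is not present."""


class ContextTable:
    def __init__(self):
        # Only entries marked present are stored; absence means present bit = 0.
        self._present = {}

    def set_present(self, bus, devfn, domain):
        self._present[(bus, devfn)] = domain

    def lookup(self, bus, devfn):
        """Return the domain for a requester, or raise like fault reason 02."""
        try:
            return self._present[(bus, devfn)]
        except KeyError:
            slot, func = devfn >> 3, devfn & 0x7
            raise ContextFault(
                f"[fault reason 02] Present bit in context entry is clear "
                f"({bus:02x}:{slot:02x}.{func})"
            ) from None


table = ContextTable()
table.set_present(0x0B, 0x00, "ahci-domain")   # only function 0 is programmed
try:
    table.lookup(0x0B, 0x01)                   # request arrives as function 1
except ContextFault as fault:
    print(fault)
```

Programming the same domain for devfn 0x01 as well makes the second lookup succeed, which is the effect a requester-ID alias is meant to have.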

In , alex.williamson (alex.williamson-linux-kernel-bugs) wrote :

(In reply to frollic from comment #109)
> Is there any progress ?
>
> I'm hitting this error on Fedora 3.17.8-200.fc20 kernel, which makes my
> system pretty much unusable :(
>
> 07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe
> SATA 6Gb/s Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])

It should have been fixed in v3.16 by cc346a4714 for this device. Are you sure you're seeing the same error? What are the symptoms?

In , alex.williamson (alex.williamson-linux-kernel-bugs) wrote :

Actually, refreshing my memory in the comments here, others are also reporting that issues for 1b4b:9230 persist, but they're different than the problem we're trying to fix here and suggest either broken hardware or broken driver (or both). As suggested previously, if you're not getting DMAR faults, file a new bug.
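
The triage rule stated here (IOMMU faults belong to this bug, plain ATA errors without them do not) can be sketched as a small log classifier. The patterns cover only the message formats quoted in this thread, and the function name and return labels are hypothetical.

```python
import re

# Fault messages as they appear in this thread for Intel (DMAR) and AMD (AMD-Vi).
IOMMU_FAULT = re.compile(
    r"DMAR:\s*\[fault reason \d+\]|AMD-Vi: Event logged \[IO_PAGE_FAULT"
)
ATA_ERROR = re.compile(
    r"ata\d+(?:\.\d+)?: (?:failed command|qc timeout|exception Emask)"
)


def classify(log):
    """Return a coarse label for a kernel log excerpt."""
    if IOMMU_FAULT.search(log):
        return "iommu-fault"      # matches the problem tracked in this bug
    if ATA_ERROR.search(log):
        return "ata-error-only"   # likely a different problem; file a new bug
    return "no-match"


print(classify("ata3.00: failed command: READ FPDMA QUEUED"))
```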

In , frollic (frollic-linux-kernel-bugs) wrote :

Indeed, I don't have DMAR errors in my syslog.

Drives are 3 * WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133 running
soft-RAID5.
One SAMSUNG SSD SM841 mSATA 128GB, DXM43D0Q, max UDMA/133 in an mSATA->SATA case/converter.

Feb 4 19:09:43 atlantis kernel: [ 464.228813] ata3: failed to read log page 10h (errno=-5)
Feb 4 19:09:43 atlantis kernel: [ 464.231988] ata3.00: exception Emask 0x1 SAct 0xc000 SErr 0x0 action 0x0
Feb 4 19:09:43 atlantis kernel: [ 464.235233] ata3.00: irq_stat 0x40000008
Feb 4 19:09:43 atlantis kernel: ata3: failed to read log page 10h (errno=-5)
Feb 4 19:09:43 atlantis kernel: ata3.00: exception Emask 0x1 SAct 0xc000 SErr 0x0 action 0x0
Feb 4 19:09:43 atlantis kernel: ata3.00: irq_stat 0x40000008
Feb 4 19:09:43 atlantis kernel: [ 464.238596] ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: [ 464.242000] ata3.00: cmd 60/00:70:90:3b:bc/04:00:0c:00:00/40 tag 14 ncq 524288 in
Feb 4 19:09:43 atlantis kernel: [ 464.242000] res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: ata3.00: cmd 60/00:70:90:3b:bc/04:00:0c:00:00/40 tag 14 ncq 524288 in
         res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: [ 464.248733] ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: [ 464.252192] ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: [ 464.255558] ata3.00: cmd 60/00:78:90:3f:bc/04:00:0c:00:00/40 tag 15 ncq 524288 in
Feb 4 19:09:43 atlantis kernel: [ 464.255558] res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 19:09:43 atlantis kernel: ata3.00: cmd 60/00:78:90:3f:bc/04:00:0c:00:00/40 tag 15 ncq 524288 in
         res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
Feb 4 19:09:43 atlantis kernel: [ 464.262523] ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: ata3.00: status: { DRDY }
Feb 4 19:09:43 atlantis kernel: [ 464.272877] ata3.00: revalidation failed (errno=-2)
Feb 4 19:09:43 atlantis kernel: [ 464.276284] ata3: hard resetting link
Feb 4 19:09:43 atlantis kernel: ata3.00: revalidation failed (errno=-2)
Feb 4 19:09:43 atlantis kernel: ata3: hard resetting link
Feb 4 19:09:44 atlantis kernel: [ 464.586712] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 19:09:44 atlantis kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 19:09:44 atlantis kernel: [ 464.593370] ata3.00: configured for UDMA/133
Feb 4 19:09:44 atlantis kernel: [ 464.596855] ata3: EH complete
Feb 4 19:09:44 atlantis kernel: ata3.00: configured for UDMA/133
Feb 4 19:09:44 atlantis kernel: ata3: EH complete
Feb 4 19:10:03 atlantis kernel: [ 484.234979] ata3: failed to read log page 10h (errno=-5)
Feb 4 19:10:03 atlantis kernel: [ 484.238484] ata3.00: exception Emask 0x1 SAct 0xc000000 SErr 0x0 action 0x0
Feb 4 19:10:03 atlantis kernel: [ 484.242039] ata3.00: irq_stat 0x40000008
Fe...

In , frollic (frollic-linux-kernel-bugs) wrote :

In addition, mobo is brand new (doesn't mean it can't be faulty), WDC drives are 2 months old (installed just before X-mas last year). The SSD was purchased used, so I can't tell you how old that is.

All of the hardware, except for the Samsung SSD, ran just fine on my Supermicro X7SPA-H, before I swapped mobo just two days ago.

In , kernel (kernel-linux-kernel-bugs) wrote :

(In reply to Alex Williamson from comment #95)

I encountered the same problem with a PX-G128M6e (Plextor M6e series SSD) and resolved it with the patch.
(Actually, I used the 4.0.5 kernel patched with the code described in https://lkml.org/lkml/2015/2/2/226 )

Booting with the SSD and passing the SSD through to a guest OS both work correctly.

My system is an Asus H97M-PLUS with BIOS 2501 and a PX-G128M6e with firmware revision 1.06.
The kernel .config is from Arch's linux 4.0.5-1 package.

In , kernel (kernel-linux-kernel-bugs) wrote :

Created attachment 179951
dmesg of 4.0.5 vanilla kernel with iommu=on

`grep -i -e dmar -e iommu` is below

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux-vanilla root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] ACPI: DMAR 0x00000000DAC6CED0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-linux-vanilla root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] Intel-IOMMU: enabled
[ 0.107086] dmar: Host address width 39
[ 0.107098] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.107123] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.107138] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.107154] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.107169] dmar: RMRR base: 0x000000dbe7b000 end: 0x000000dbe89fff
[ 0.107179] dmar: RMRR base: 0x000000dd000000 end: 0x000000df1fffff
[ 0.107191] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.685402] DMAR: No ATSR found
[ 0.685642] IOMMU: dmar0 using Queued invalidation
[ 0.685651] IOMMU: dmar1 using Queued invalidation
[ 0.685662] IOMMU: Setting RMRR:
[ 0.685694] IOMMU: Setting identity map for device 0000:00:02.0 [0xdd000000 - 0xdf1fffff]
[ 0.686154] IOMMU: Setting identity map for device 0000:00:14.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686215] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686268] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.686308] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.686329] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 0.847930] dmar: DRHD: handling fault status reg 2
[ 0.848264] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 1.161006] dmar: DRHD: handling fault status reg 3
[ 1.161963] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 6.159656] dmar: DRHD: handling fault status reg 2
[ 6.160750] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 6.472980] dmar: DRHD: handling fault status reg 3
[ 6.473513] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 11.471329] dmar: DRHD: handling fault status reg 2
[ 11.471661] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 11.784476] dmar: DRHD: handling fault status reg 3
[ 11.785472] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:[fault reason 02] Present bit in context entry is clear
[ 16.783038] dmar: DRHD: handling fault status reg 2
[ 16.783646] dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr fffe0000
               DMAR:...


In , kernel (kernel-linux-kernel-bugs) wrote :

Created attachment 179961
dmesg of 4.0.5 patched kernel with iommu=on

`grep -i -e dmar -e iommu` is below

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux-m6e root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] ACPI: DMAR 0x00000000DAC6CED0 0000B8 (v01 INTEL BDW 00000001 INTL 00000001)
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-linux-m6e root=UUID=8445003e-6304-4d86-b970-2afa31781a9b rw intel_iommu=on
[ 0.000000] Intel-IOMMU: enabled
[ 0.107025] dmar: Host address width 39
[ 0.107037] dmar: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.107060] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.107075] dmar: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.107092] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008c20660462 ecap f010da
[ 0.107107] dmar: RMRR base: 0x000000dbe7b000 end: 0x000000dbe89fff
[ 0.107117] dmar: RMRR base: 0x000000dd000000 end: 0x000000df1fffff
[ 0.107129] IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.688999] DMAR: No ATSR found
[ 0.689240] IOMMU: dmar0 using Queued invalidation
[ 0.689249] IOMMU: dmar1 using Queued invalidation
[ 0.689259] IOMMU: Setting RMRR:
[ 0.689292] IOMMU: Setting identity map for device 0000:00:02.0 [0xdd000000 - 0xdf1fffff]
[ 0.689754] IOMMU: Setting identity map for device 0000:00:14.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689816] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689868] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdbe7b000 - 0xdbe89fff]
[ 0.689908] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.689930] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 66.222474] [drm] DMAR active, disabling use of stolen memory

In , kernel (kernel-linux-kernel-bugs) wrote :

`lspci -nnvv`

02:00.0 SATA controller [0106]: Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. Device [1b4b:9183]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Interrupt: pin A routed to IRQ 30
 Region 0: I/O ports at e050 [size=8]
 Region 1: I/O ports at e040 [size=4]
 Region 2: I/O ports at e030 [size=8]
 Region 3: I/O ports at e020 [size=4]
 Region 4: I/O ports at e000 [size=32]
 Region 5: Memory at f7c20000 (32-bit, non-prefetchable) [size=512]
 Expansion ROM at f7c00000 [disabled] [size=128K]
 Capabilities: [40] Power Management version 3
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
  Address: fee00378 Data: 0000
 Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
  DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
   ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
   RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
   MaxPayload 128 bytes, MaxReadReq 512 bytes
  DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
  LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
   ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
  LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
  LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
    Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
    Compliance De-emphasis: -6dB
  LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
    EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
 Capabilities: [100 v1] Advanced Error Reporting
  UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
  UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
  UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
  CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout+ NonFatalErr-
  CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
  AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
 Kernel driver in use: ahci
 Kernel modules: ahci

----

`lspci -nnvv` on the host while passing the SSD through to a guest OS

02:00.0 SATA controller [0106]: Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] [1c28:0122] (rev 14) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. Device [1b...


In , tasos (tasos-linux-kernel-bugs) wrote :

I believe I am affected by the same bug with the Marvell 88SE9120 controller on an ASRock 990FX Extreme 4 motherboard.
Although there are no DMAR errors in dmesg, when AMD's IOMMU is enabled in the BIOS I get the following a couple of times before it gives up:

[ 117.616423] ata9: hard resetting link
[ 117.632972] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020440 flags=0x0070]
[ 117.632982] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020450 flags=0x0070]
[ 118.340472] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 domain=0x0000 address=0x0000000000020000 flags=0x0050]
[ 122.616621] ata9: softreset failed (1st FIS failed)
[ 122.616632] ata9: reset failed, giving up
[ 122.616640] ata9: EH complete
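
The AMD-Vi events above carry the same kind of information as the Intel DMAR faults, in a different format. A short sketch for extracting the fields follows; it handles only the IO_PAGE_FAULT layout shown in this log, leaves the flags value undecoded, and uses a hypothetical function name.

```python
import re
from collections import Counter

AMD_VI_FAULT = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=0x([0-9a-f]+) "
    r"address=0x([0-9a-f]+) "
    r"flags=0x([0-9a-f]+)\]"
)


def parse_io_page_faults(log):
    """Return one dict per IO_PAGE_FAULT event found in the log text."""
    return [
        {
            "device": m.group(1),
            "domain": int(m.group(2), 16),
            "address": int(m.group(3), 16),
            "flags": int(m.group(4), 16),   # raw value, not interpreted here
        }
        for m in AMD_VI_FAULT.finditer(log)
    ]


line = ("[ 117.632972] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.1 "
        "domain=0x0000 address=0x0000000000020440 flags=0x0070]")
events = parse_io_page_faults(line)
print(Counter(e["device"] for e in events))
```

As with the Intel reports, the faulting device here is function .1 of the slot that holds the Marvell controller.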

Once the controller's dev ID was added to drivers/pci/quirks.c everything worked as expected in kernel 4.1 from git.kernel.org (23b7776290b10297fe2cae0fb5f166a4f2c68121)

[ 1520.100391] ata9: hard resetting link
[ 1526.038156] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 330)
[ 1526.044554] ata9.00: ATA-7: SAMSUNG HD502IJ, 1AA01112, max UDMA7
[ 1526.044559] ata9.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1526.050996] ata9.00: configured for UDMA/133
[ 1526.051007] ata9: EH complete

And here is the patch

--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3589,6 +3589,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x91a0,
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
     quirk_dma_func1_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9120,
+    quirk_dma_func1_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
     quirk_dma_func1_alias);
 /* https://bugs.gentoo.org/show_bug.cgi?id=497630 */

Could this device id be added to the list of affected devices?
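
To check whether a given controller is covered by the entries visible in this patch, the bracketed vendor:device pair from `lspci -nn` can be compared against a list. The sketch below includes only the Marvell IDs that appear in the diff above (0x91a0, 0x9230 and the proposed 0x9120) and assumes PCI_VENDOR_ID_MARVELL_EXT is 0x1b4b, matching the lspci output in this thread. It is not a copy of the kernel's full quirk table, and the names are hypothetical.

```python
import re

MARVELL_EXT = 0x1B4B  # assumed value of PCI_VENDOR_ID_MARVELL_EXT

# Only the entries visible in the diff above, not the complete upstream list.
FUNC1_ALIAS_IDS = {
    (MARVELL_EXT, 0x91A0),
    (MARVELL_EXT, 0x9230),
    (MARVELL_EXT, 0x9120),
}

LSPCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def has_func1_alias_entry(lspci_line, table=FUNC1_ALIAS_IDS):
    """True if any [vendor:device] pair on the line is in the given table."""
    ids = [(int(v, 16), int(d, 16)) for v, d in LSPCI_ID.findall(lspci_line)]
    return any(pair in table for pair in ids)


line = ("07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. "
        "88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 10)")
print(has_func1_alias_entry(line))
```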

In , alex.williamson (alex.williamson-linux-kernel-bugs) wrote :

(In reply to Tasos Sahanidis from comment #118)
>
> Could this device id be added to the list of affected devices?

It's already queued in the pull request for v4.2:

http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/quirks.c?id=247de694349c2eeea11b8d8936541f5012a09318

In , tasos (tasos-linux-kernel-bugs) wrote :

(In reply to Alex Williamson from comment #119)
> It's already queued in the pull request for v4.2:
>
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/quirks.c?id=247de694349c2eeea11b8d8936541f5012a09318

Apologies for that, did not see it.
Thank you for your time!

Revision history for this message
In , bill.hudacek (bill.hudacek-linux-kernel-bugs) wrote :

Hi. Old Newbie to kernel things here. I see from Alex's (initial?) patch at https://github.com/awilliam/linux-vfio/blob/02f8c6aee8df3cdc935e9bdd4f2d020306035dbe/drivers/ata/ahci.c that my 88SE9128 is in the quirks list.

However, exploring at https://github.com/awilliam/linux-vfio/blob/02f8c6aee8df3cdc935e9bdd4f2d020306035dbe/drivers/ata/ahci.c I don't see it.

So - I'm probably looking in all the wrong places.

I've just set up Fedora 22 4.1.3-200.fc22.x86_64. I'm getting this fatal error.

ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.00: failed command: WRITE DMA
ata10.00: cmd ca/00:01:08:08:00/00:00:00:00:00/e0 tag 5 dma 512 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.00: status: { DRDY }
ata10: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata10: COMRESET failed (errno=-16)
ata10: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.00: qc timeout (cmd 0xec)
ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata10.00: revalidation failed (errno=-5)
ata10: hard resetting link
ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata11.00: failed command: READ DMA EXT
ata11.00: cmd 25/00:10:20:d5:c5/00:00:12:00:00/e0 tag 24 dma 8192 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata11.00: status: { DRDY }
ata11: hard resetting link
ata10: link is slow to respond, please be patient (ready=0)
ata11: link is slow to respond, please be patient (ready=0)
ata10: COMRESET failed (errno=-16)
ata10: hard resetting link
ata11: COMRESET failed (errno=-16)
ata11: hard resetting link

This is a StarTech PEXSAT31E1 add-on so it's not booting the system. It's connected to a external cabinet, and I'm using mdadm for RAID-5. All drives report the same issues (logging not included here) which is what had me looking at the controller.

I am really hoping it's not included yet - which would both explain the issue and the fact that 'the fix is in'.

I've not built a kernel since - well, a long time ago - Ubuntu 6.10 or so. Now I might get a chance to try it on Fedora.

Please let me know if it would help if I provided more info. Sure looks like I'm just like most others here...

Can anyone Help?

Many Thanks :-)
/Bill

Revision history for this message
In , bill.hudacek (bill.hudacek-linux-kernel-bugs) wrote :

*bump*

I'm down here. I'm contemplating getting a 3ware and going the hardware route. I've had pretty horrid experience with Highpoint support (non-existent) and the Marvell controllers seem to be dysfunctional. Vendor who sold me the card could not provide any drivers or firmware updates, so this is my only possible path to a solution using this type of controller - the kernel patch(es).

Thanks.

Revision history for this message
In , frollic (frollic-linux-kernel-bugs) wrote :

For the 9230 you might want to check the updated BIOS we've discussed at:
http://homeservershow.com/forums/index.php?/topic/9179-marvell-9230-firmware-updates-and-such/

Revision history for this message
In , oh-itsme (oh-itsme-linux-kernel-bugs) wrote :

(In reply to frollic from comment #123)
> For the 9230 you might want to check the updated BIOS we've discussed at:
> http://homeservershow.com/forums/index.php?/topic/9179-marvell-9230-firmware-updates-and-such/

I had found that thread in a websearch as I have encountered similar issues as you had, also using a Supermicro X10SBA. I had contacted Supermicro about this, but support did not really seem to be aware of this issue, and no update for the controller was sent to me. The thread you refer to does not state the outcome of applying the firmware to the X10SBA, does it solve the issue?

Revision history for this message
In , frollic (frollic-linux-kernel-bugs) wrote :

(In reply to oh-itsme from comment #124)
> I had found that thread in a websearch as I have encountered similar issues
> as you had, also using a Supermicro X10SBA. I had contacted Supermicro about
> this, but support did not really seem to be aware of this issue, and no
> update for the controller was sent to me.

I was in touch with Supermicro's Dutch support. They were very helpful, and it took them about 10 days to obtain the update from Marvell.
The person I was in contact with wrote that the update would be posted along with the next BIOS update for the motherboard, but I don't think that actually happened :(

> The thread you refer to does not state the outcome of applying the firmware
> to the X10SBA, does it solve the issue?

Yes, it helped me. The soft-RAID is running fine now, although raid-check occasionally reports that mismatch_cnt is not 0 on /dev/mdXXX.

Revision history for this message
In , tasos (tasos-linux-kernel-bugs) wrote :

There seems to have been a regression sometime after the 4.3 tag (6a13feb9c82803e2b815eca72fa7a9f5561d7861) and before one of the commits on 2015-11-07 (as that's when my kernel was compiled), which causes the same errors in dmesg as Comment #118.
This results in the drives attached to the controller becoming inaccessible.

Please note that this time the quirk for my device is present in drivers/pci/quirks.c but it seems to have no effect.

Revision history for this message
In , kevosev23194 (kevosev23194-linux-kernel-bugs) wrote :

Hi There

Just want to address a problem with Asrock Extreme 9 X79 with BIOS P4.00 platform and its Marvell 88SE9220 controller.

I experience the same DMAR faults as described above when this controller is enabled.
However the problem appears to be resolved by adding a new entry in quirks.c

DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9220,
                        quirk_dma_func1_alias);

Let me know if you need me to attach any logs of faults, at the moment I'm using a custom compiled kernel with the above fix on Arch Linux but can switch to a standard kernel.

Kind Regards,

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

If you've got the quirk fix and done the testing then I would see Documentation/process/submitting-patches.rst and submit your quirk fix as a patch with an explanation of what it fixes. The change looks correct to me.

Send it to <email address hidden> and it should get reviewed and merged

Alan

Revision history for this message
In , microsoftenator (microsoftenator-linux-kernel-bugs) wrote :

I can confirm that this issue occurs with the Marvell 88SE9128 controller on my Gigabyte GA-X59A-UD7 (rev2.0) motherboard. As with Kevin Hunt above, adding a new entry in quirks.c appears to resolve the issue.

Given the name of this bug, I was surprised that the 9128 wasn't in there.

Revision history for this message
In , microsoftenator (microsoftenator-linux-kernel-bugs) wrote :

Addendum to the above:

The 9128 *does* appear to be in the quirks file for mainline, but not in the kernel provided by Arch Linux (4.15.15). It seems it was either added in 4.16 or removed by Arch's patches for some reason.
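
One way to settle "is my device ID in this tree or not" without reading the file by eye is to scan the source for the relevant declarations. This is a minimal sketch, assuming the entries use the same `DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, ...)` form as the patches quoted in this bug. The function name and the sample text are illustrative only.

```python
import re

# \s also matches newlines, so declarations wrapped over two lines are found.
QUIRK_RE = re.compile(
    r"DECLARE_PCI_FIXUP_HEADER\(\s*PCI_VENDOR_ID_MARVELL_EXT\s*,"
    r"\s*(0x[0-9a-fA-F]+)\s*,\s*quirk_dma_func1_alias\s*\)"
)


def marvell_func1_alias_ids(source_text):
    """Return the set of Marvell device IDs that carry quirk_dma_func1_alias."""
    return {int(m.group(1), 16) for m in QUIRK_RE.finditer(source_text)}


# Sample text modelled on the patch context quoted earlier in this bug.
SAMPLE_QUIRKS = """\
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
     quirk_dma_func1_alias);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9120,
 quirk_dma_func1_alias);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
     quirk_dma_func1_alias);
"""

if __name__ == "__main__":
    ids = marvell_func1_alias_ids(SAMPLE_QUIRKS)
    print("0x9128 present:", 0x9128 in ids)
    print("known IDs:", sorted(hex(i) for i in ids))
```

To check a real tree, read `drivers/pci/quirks.c` from the exact source your distribution kernel was built from and pass its contents to the function. Checking mainline only tells you about mainline.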

Revision history for this message
In , bhelgaas (bhelgaas-linux-kernel-bugs) wrote :

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aa0082066343 for Marvell 9128 appeared in v4.16-rc1.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=832e4e1f76b8 for Marvell 88SE9220 appeared in v4.17-rc1.

Are there any devices that are still broken in v4.17-rc1? If not, maybe we can close this bug?

Revision history for this message
In , k8wtaylnuuz7 (k8wtaylnuuz7-linux-kernel-bugs) wrote :

(In reply to Bjorn Helgaas from comment #131)
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aa0082066343 for Marvell 9128 appeared in v4.16-rc1.
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=832e4e1f76b8 for Marvell 88SE9220 appeared in v4.17-rc1.
>
> Are there any devices that are still broken in v4.17-rc1? If not, maybe we
> can close this bug?

I still have this issue with a Marvell 88SE9230 and kernel v4.16.8 under Arch Linux. It's probably worth checking all their SATA Controllers before closing this bug: https://www.marvell.com/storage/system-solutions/

Revision history for this message
In , bhelgaas (bhelgaas-linux-kernel-bugs) wrote :

v4.16 already contains a quirk for the Marvell 88SE9230 (added by cc346a4714a5 ("PCI: Add function 1 DMA alias quirk for Marvell devices") way back in v3.16).

But from comment #44 and comments #49-#58, it sounds like the 9230 has other problems in addition to this one, so I suspect you're seeing those other problems. If so, can you open a new bug for that and copy Joshua and Alex? I took a quick look and didn't see a definitive resolution for the problems Joshua reported.

I'm going to close this one and if people see more problems that are resolved by quirk_dma_func1_alias(), they can add them here and reopen the bug.

Revision history for this message
In , f.bluethner (f.bluethner-linux-kernel-bugs) wrote :

I have this issue with "Marvell Technology Group Ltd. 88SS9183 PCIe SSD Controller" in my "Asus Rog Strix Z370-F Gaming" and solved it by adding "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9183, quirk_dma_func1_alias);" to the quirk_dma_func1_alias entries in drivers/pci/quirks.c.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1810239

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Revision history for this message
Steven Ellis (steven-openmedia) wrote : Re: amd_iommu conflict with Marvell Sata controller

root@mythfe-amd:~# lspci -knnv -s 01:00.0
01:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 11) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230]
 Flags: bus master, fast devsel, latency 0, IRQ 56
 I/O ports at f050 [size=8]
 I/O ports at f040 [size=4]
 I/O ports at f030 [size=8]
 I/O ports at f020 [size=4]
 I/O ports at f000 [size=32]
 Memory at f7d10000 (32-bit, non-prefetchable) [size=2K]
 Expansion ROM at f7d00000 [disabled] [size=64K]
 Capabilities: [40] Power Management version 3
 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [70] Express Legacy Endpoint, MSI 00
 Capabilities: [e0] SATA HBA v0.0
 Capabilities: [100] Advanced Error Reporting
 Kernel driver in use: ahci
 Kernel modules: ahci

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: cosmic
Revision history for this message
Steven Ellis (steven-openmedia) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected bionic
description: updated
Revision history for this message
Steven Ellis (steven-openmedia) wrote : CRDA.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : Lspci.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : Lsusb.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : ProcEnviron.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : ProcModules.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : PulseList.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : UdevDb.txt

apport information

Revision history for this message
Steven Ellis (steven-openmedia) wrote : WifiSyslog.txt

apport information

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote : Re: amd_iommu conflict with Marvell Sata controller

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.20 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20/

Changed in linux:
importance: Unknown → Medium
status: Unknown → Fix Released
Revision history for this message
Steven Ellis (steven-openmedia) wrote : Re: amd_iommu conflict with Marvell Sata controller

Looks like there is a new upstream issue with
 - https://bugzilla.kernel.org/show_bug.cgi?id=199733

Revision history for this message
Steven Ellis (steven-openmedia) wrote :

I attempted a boot with the following upstream kernel packages

  linux-image-unsigned-4.20.0-042000-generic_4.20.0-042000.201812232030_amd64.deb
  linux-modules-4.20.0-042000-generic_4.20.0-042000.201812232030_amd64.deb

On boot I see the following errors

Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xef)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to set xfermode (err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4: limiting SATA link speed to 1.5 Gbps
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: revalidation failed (errno=-5)
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: revalidation failed (errno=-5)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: disabled
Jan 02 22:09:23 mythfe-amd kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 02 22:09:23 mythfe-amd kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
Jan 02 22:09:23 mythfe-amd kernel: ata10: SATA link down (SStatus 0 SControl 330)
Jan 02 22:09:23 mythfe-amd kernel: ata13: SATA link down (SStatus 0 SControl 330)
Jan 02 22:09:23 mythfe-amd kernel: ata14: SATA link down (SStatus 0 SControl 330)
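
The `qc timeout (cmd 0x..)` lines are easier to read once the opcodes are turned into names. The sketch below covers only the opcodes that show up in this bug. The table and helper names are illustrative, and anything not in the table is reported as unknown rather than guessed.

```python
import re
from collections import defaultdict

# ATA command opcodes seen in the logs in this bug report.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x61: "WRITE FPDMA QUEUED",
    0xA1: "IDENTIFY PACKET DEVICE",
    0xCA: "WRITE DMA",
    0xEC: "IDENTIFY DEVICE",
    0xEF: "SET FEATURES",
}

# Also accepts the newer "qc timeout after 5000 msecs (cmd 0xa1)" wording.
TIMEOUT_RE = re.compile(
    r"(ata\d+\.\d+): qc timeout(?: after \d+ msecs)? \(cmd 0x([0-9a-f]{2})\)"
)


def summarize_timeouts(log_text):
    """Map each ataN.MM device to the list of commands that timed out, in log order."""
    summary = defaultdict(list)
    for device, opcode in TIMEOUT_RE.findall(log_text):
        code = int(opcode, 16)
        summary[device].append(ATA_OPCODES.get(code, f"unknown (0x{code:02x})"))
    return dict(summary)


SAMPLE = """\
Jan 02 22:09:23 mythfe-amd kernel: ata4.00: qc timeout (cmd 0xec)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xef)
Jan 02 22:09:23 mythfe-amd kernel: ata8.00: qc timeout (cmd 0xa1)
"""

if __name__ == "__main__":
    for device, commands in summarize_timeouts(SAMPLE).items():
        print(device, "->", ", ".join(commands))
```

In the boot log above the timeouts are almost all identification commands, so the drives never get as far as normal I/O. The links come up, but no command completes.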

Revision history for this message
Steven Ellis (steven-openmedia) wrote :

Rebooted with the 4.20.0-042000-generic and the option "amd_iommu=off" and the card works

Jan 02 22:10:52 mythfe-amd kernel: ata8.00: ATAPI: MARVELL VIRTUAL, 1.09, max UDMA/66
Jan 02 22:10:52 mythfe-amd kernel: ata8.00: configured for UDMA/66
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: ATA-8: ST3500418AS, CC46, max UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 02 22:10:52 mythfe-amd kernel: ata4.00: configured for UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: ATA-7: ST3250820AS, 3.AAE, max UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 02 22:10:52 mythfe-amd kernel: ata2.00: configured for UDMA/133
Jan 02 22:10:52 mythfe-amd kernel: scsi 1:0:0:0: Direct-Access ATA ST3250820AS E PQ: 0 ANSI: 5
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: Attached scsi generic sg0 type 0
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/233 GiB)
Jan 02 22:10:52 mythfe-amd kernel: scsi 3:0:0:0: Direct-Access ATA ST3500418AS CC46 PQ: 0 ANSI: 5
Jan 02 22:10:52 mythfe-amd kernel: sd 1:0:0:0: [sda] Write Protect is off
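
When comparing boots it helps to confirm that the workaround parameter actually reached the kernel. The following is a small sketch that checks a command line string for `amd_iommu=off`. The function names are hypothetical, and the parser splits on whitespace only, so it ignores quoted values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def amd_iommu_disabled(cmdline):
    """True if the command line contains amd_iommu=off."""
    return parse_cmdline(cmdline).get("amd_iommu") == "off"


# Example modelled on the ProcKernelCmdLine in the bug description, plus the workaround.
SAMPLE = "BOOT_IMAGE=/vmlinuz-4.18.0-13-generic root=/dev/mapper/mythnew-root ro verbose amd_iommu=off"

if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            running = handle.read()
    except OSError:
        running = ""
    print("amd_iommu=off active:", amd_iommu_disabled(running))
```

Keep in mind that turning the IOMMU off only hides the symptom. The faults suggest the device is still issuing DMA under an ID the kernel did not set up mappings for.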

Changed in linux (Debian):
status: Unknown → New
penalvch (penalvch)
description: updated
tags: added: kernel-bug-exists-upstream-4.20 latest-bios-f2
penalvch (penalvch)
summary: - amd_iommu conflict with Marvell Sata controller
+ amd_iommu conflict with Marvell 88SE9230 SATA Controller
Revision history for this message
penalvch (penalvch) wrote :

Steven Ellis, for you personally:

1) Did this problem not occur in a prior Ubuntu or kernel release, and if so which?

2) If this issue has always occurred, could you please advise which is the earliest kernel you tested?

3) To keep this relevant to upstream, one will want to test the latest mainline kernel as it is released (now 5.0-rc2). Could you please advise?

Changed in linux (Ubuntu):
importance: Undecided → Low
Revision history for this message
Steven Ellis (steven-openmedia) wrote :

I've only recently traced the issue to the iommu kernel option. This device has been unstable since I bought it and I pull it out occasionally to see if the driver issues have been addressed.

I'm afraid that the test system I'm using is currently unavailable. I'll post an update when I have a chance for fresh testing.

Revision history for this message
piktogramm (piktogramm) wrote :

Hi,
I had similar problems with my Marvell 88SE9230. I was able to improve the situation considerably by updating the firmware of the controller itself. In general, all firmware versions beyond 2.3.xxx improved things a lot. The remaining problem is that I get failures on ata6, which is the only port not connected to any drive at all. Every drive connected to the Marvell controller itself is perfectly stable (24/7 for 400+ days).

Source for Firmwares: https://www.station-drivers.com/index.php?option=com_remository&Itemid=352&func=select&id=347&lang=en

May 05 03:16:10 doomsdaydevice kernel: ata6.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x6 frozen
May 05 03:16:11 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 03:16:11 doomsdaydevice kernel: ata6.00: cmd 61/10:08:18:6a:14/00:00:01:00:00/40 tag 1 ncq dma 8192 out
                                                res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 05 03:16:11 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 03:16:11 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 03:16:11 doomsdaydevice kernel: ata6.00: cmd 61/10:10:10:e8:5e/00:00:3a:00:00/40 tag 2 ncq dma 8192 out
                                                res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 05 03:16:11 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 03:16:11 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 03:16:11 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 06:01:33 doomsdaydevice kernel: ata6.00: exception Emask 0x0 SAct 0x60 SErr 0x0 action 0x6 frozen
May 05 06:01:33 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 06:01:33 doomsdaydevice kernel: ata6.00: cmd 61/08:28:90:da:14/00:00:23:00:00/40 tag 5 ncq dma 4096 out
                                                res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 05 06:01:33 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 06:01:33 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 06:01:33 doomsdaydevice kernel: ata6.00: cmd 61/08:30:a8:fd:20/00:00:05:00:00/40 tag 6 ncq dma 4096 out
                                                res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 05 06:01:33 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 06:01:33 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 06:01:33 doomsdaydevice kernel: ata6.00: supports DRM functions and may not be fully accessible
May 05 06:37:03 doomsdaydevice kernel: ata6.00: exception Emask 0x0 SAct 0x30 SErr 0x0 action 0x6 frozen
May 05 06:37:03 doomsdaydevice kernel: ata6.00: failed command: WRITE FPDMA QUEUED
May 05 06:37:03 doomsdaydevice kernel: ata6.00: cmd 61/08:20:48:08:10/00:00:00:00:00/40 tag 4 ncq dma 4096 out
                                                res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 05 06:37:03 doomsdaydevice kernel: ata6.00: status: { DRDY }
May 05 06:37:03 doomsdaydevice kernel: ata6.00: fail...

Revision history for this message
penalvch (penalvch) wrote :

Johannes (piktogrammdd+ubuntu), it will help immensely if you use Ubuntu with the computer the problem is reproducible with and file a new report via a terminal to provide necessary debugging logs:
ubuntu-bug linux

Please feel free to subscribe me to it.

tags: added: bios-outdated-f.40
removed: latest-bios-f2
tags: added: needs-upstream-testing
Revision history for this message
piktogramm (piktogramm) wrote :

Christopher, I filed the bug. However, I made a mistake: I took the output from lshw, where scsi@6 was not populated, and took for granted that ata6 equals scsi@6, which isn't the case. The errors mentioned above are therefore on my boot drive.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832383
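
Since the ataN number and the SCSI host number are easy to mix up, here is a rough sketch for reading both out of a disk's resolved sysfs path. It assumes the usual libata layout in which the path contains `.../ataN/hostM/...`. The sample path is a made-up example, not one taken from this bug.

```python
import re

# libata places the SCSI host directory directly below the ataN port directory.
ATA_HOST_RE = re.compile(r"/ata(\d+)/host(\d+)/")


def ata_and_host(sysfs_path):
    """Return (ata_port, scsi_host) from a resolved sysfs device path, or None."""
    match = ATA_HOST_RE.search(sysfs_path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical example path; on a real system resolve /sys/block/<disk> first.
SAMPLE_PATH = (
    "/sys/devices/pci0000:00/0000:00:01.3/0000:01:00.0/"
    "ata2/host1/target1:0:0/1:0:0:0/block/sda"
)

if __name__ == "__main__":
    port, host = ata_and_host(SAMPLE_PATH)
    print(f"sda is on ata{port}, SCSI host {host}")
```

On a live system the input would come from something like `os.path.realpath("/sys/block/sda")`. The point is simply that the two numbers are assigned independently and often differ by one.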

Revision history for this message
In , LK7S2ED64JHGLKj75shg9klejHWG49h5hk (lk7s2ed64jhglkj75shg9klejhwg49h5hk-linux-kernel-bugs) wrote :

"Marvell Technology Group Ltd. 88SS9215 PCIe SSD Controller" has the same bug.
Fixed by:

DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9215,
    quirk_dma_func1_alias);

Changed in linux (Debian):
status: New → Fix Released
Revision history for this message
In , sam (sam-linux-kernel-bugs) wrote :

Also "Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller [1b4b:9125]" - fixed with:

DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
    quirk_dma_func1_alias);

Is this sufficient or should I open a new bug?

Revision history for this message
In , alan (alan-linux-kernel-bugs) wrote :

Even better would be to make a git diff of it and then submit it with explanation to

<email address hidden> and cc <email address hidden>

See:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html

Revision history for this message
In , biergaizi2009 (biergaizi2009-linux-kernel-bugs) wrote :

(In reply to sbingner from comment #136)
> Also "Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller
> [1b4b:9125]" - fixed with:
>
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
> quirk_dma_func1_alias);
>
> Is this sufficient or should I open a new bug?

I have the same hardware and was able to test and confirm the bug. I just submitted the patch to the Linux kernel maintainers. Hopefully it'll be accepted soon.

https://patchwork.kernel.org/project/linux-pci/patch/YZPA+gSsGWI6+xBP@work/

Revision history for this message
In , biergaizi2009 (biergaizi2009-linux-kernel-bugs) wrote :

(In reply to Tom Li from comment #138)
> (In reply to sbingner from comment #136)
> > Also "Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller
> > [...]
> > Is this sufficient or should I open a new bug?
>
> I have the same hardware and was able to test and confirm the bug. I just
> submitted the patch to the Linux kernel maintainers. Hopefully it'll be
> accepted soon.
>
> https://patchwork.kernel.org/project/linux-pci/patch/YZPA+gSsGWI6+xBP@work/

Patch for 88SE9125 has been merged into the upstream kernel since Linux v5.17-rc1.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e445375882883f69018aa669b67cbb37ec873406

Greg K.H. has also queued this patch for Linux 4.4, 4.9, 4.14, 5.4, 5.10, 5.15, 5.16. The patch should appear in the next stable kernel release in each branch.

Revision history for this message
In , biergaizi2009 (biergaizi2009-linux-kernel-bugs) wrote :

(In reply to Tom Li from comment #139)
> Patch for 88SE9125 has been merged into the upstream kernel since Linux
> v5.17-rc1.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e445375882883f69018aa669b67cbb37ec873406
>
> Greg K.H. has also queued this patch for Linux 4.4, 4.9, 4.14, 5.4, 5.10,
> 5.15, 5.16. The patch should appear in the next stable kernel release in
> each branch.

My patch has just been included in Linux 4.4.300, 4.9.298, 4.14.263, 4.19.226, 5.4.174, 5.10.94, 5.15.17, and 5.16.3.
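
For anyone trying to work out whether a given upstream kernel already carries the 88SE9125 quirk, the release list above can be turned into a simple lookup. This is a sketch under the assumption that the version string follows upstream stable numbering. Distribution kernels such as Ubuntu's often number their releases differently, so treat the answer as a hint only. The names below are illustrative.

```python
import re

# First stable release per branch that includes the patch, as listed above.
FIXED_IN_STABLE = {
    (4, 4): 300, (4, 9): 298, (4, 14): 263, (4, 19): 226,
    (5, 4): 174, (5, 10): 94, (5, 15): 17, (5, 16): 3,
}
MAINLINE_FIX = (5, 17)  # merged upstream in v5.17-rc1


def parse_version(release):
    """Turn '5.10.94-foo' into (5, 10, 94); a missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_88se9125_quirk(release):
    """True/False for known branches, None when the branch is not in the list."""
    major, minor, patch = parse_version(release)
    if (major, minor) >= MAINLINE_FIX:
        return True
    fixed_patch = FIXED_IN_STABLE.get((major, minor))
    if fixed_patch is None:
        return None
    return patch >= fixed_patch


if __name__ == "__main__":
    for release in ("5.10.93", "5.10.94", "5.17.0", "5.8.18"):
        print(release, "->", has_88se9125_quirk(release))
```

A `None` result means the branch was not among those listed in this comment, so the sketch cannot say either way.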

Revision history for this message
kogiyuuki (kogichan) wrote :

I have encountered a similar problem on Linux 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux.

Here is my hardware information:
```
$ sudo lspci -knnv -s 3:00.0
03:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller [1b4b:9230] (rev 11) (prog-if 01 [AHCI 1.0])
 Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller [1b4b:9230]
 Flags: bus master, fast devsel, latency 0, IRQ 39, IOMMU group 13
 I/O ports at e050 [size=8]
 I/O ports at e040 [size=4]
 I/O ports at e030 [size=8]
 I/O ports at e020 [size=4]
 I/O ports at e000 [size=32]
 Memory at fc410000 (32-bit, non-prefetchable) [size=2K]
 Expansion ROM at fc400000 [disabled] [size=64K]
 Capabilities: [40] Power Management version 3
 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [70] Express Legacy Endpoint, MSI 00
 Capabilities: [e0] SATA HBA v0.0
 Capabilities: [100] Advanced Error Reporting
 Kernel driver in use: ahci
 Kernel modules: ahci
```
and my syslog (snipped):
```
kernel: [ 2.076538] ata14.00: ATAPI: MARVELL VIRTUAL, 1.09, max UDMA/66
kernel: [ 2.082784] ahci 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xcffe3840 flags=0x0050]
kernel: [ 74.990910] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
kernel: [ 80.032244] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
kernel: [ 80.532224] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)

```
I haven't been able to reproduce this on Linux 5.15.0.
Does anyone have information about this?

Remote bug watches

Bug watches keep track of this bug in other bug trackers.