SATA card SiL 3112 doesn't work well

Bug #1690632 reported by psl
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

Ubuntu 14.04.5 i386
Ubuntu 16.04.2 i386

I have run a small DIY server for a long time, and I used a PCI card with the "SiL 3112" chip to connect the SATA drives for a RAID volume; I use software RAID (mdadm). I am sure that when I built this system long ago it worked fine, but it now reports so many DMA errors that it runs slowly because of all the retries. I changed all my SATA cables, etc., but they were not the source of the trouble. The real trouble is the SiL 3112 chip. I believe that some change in the Linux kernel module sata_sil started it. I cannot tell when it happened: 2017, 2016, or even 2015?

Anyway, I replaced my SATA card with a card based on the SiL 3114 and it works fine!

$ lspci | grep 3114
00:0d.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

I see this message in dmesg output (14.04.5, 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:06:36 UTC 2017 i686):

[ 3.543370] sata_sil 0000:00:0d.0: version 2.4
[ 3.544339] sata_sil 0000:00:0d.0: Applying R_ERR on DMA activate FIS errata fix

Could the same workaround be applied to the SiL 3112 card? When the SiL 3112 is in my PC, I cannot find any "Applying R_ERR on DMA activate FIS errata fix" message in the dmesg output.
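
One way to check for that message on a given boot is to filter the kernel log for it. The sketch below assumes the message text is exactly as quoted in this report; `has_rerr_fix` is a hypothetical helper name, not part of any existing tool.

```shell
# Hypothetical helper: reads kernel log text on stdin, prints any line that
# contains the errata message quoted in this report, and returns non-zero
# when no such line is present.
has_rerr_fix() {
    grep -F 'Applying R_ERR on DMA activate FIS errata fix'
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | has_rerr_fix || echo "errata fix message not found"
```

With the SiL 3114 log lines quoted above this should print the second line; with the SiL 3112 installed, the reporter's observation suggests it would print nothing.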

BTW, I have found some comments on the net saying that these cards are "broken" and should be avoided like the plague, etc. Well, my SiL 3112 worked fine in the past and the SiL 3114 works fine right now, so I think it is a driver problem (plus some HW design bug, so the performance of the chips is limited by this SW workaround).

This report is from Ubuntu 14.04.4, but I have already tried 16.04.2 and it was no better; the DMA errors were still there and were even more visible (annoying).

I can update this report with details about the SiL 3112 card, but it is out of my PC now...

SATA card with SiL 3112 is loaded with Non-RAID BIOS v 4.2.84.
http://www.latticesemi.com/view_document?document_id=51771
file b4284.bin

---
ApportVersion: 2.14.1-0ubuntu3.23
Architecture: i386
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp', '/dev/snd/controlC0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/hwC0D0', '/dev/snd/midiC0D0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=468d6e78-760e-4825-9e16-3b8e4110f981
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Compaq Deskpro EN Series SFF
Package: linux (not installed)
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-117-generic root=UUID=afab7acb-1aa5-4ac8-a8ff-4cb3c57b5eb7 ro quiet splash acpi=force nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw vt.handoff=7
ProcVersionSignature: Ubuntu 3.13.0-117.164-generic 3.13.11-ckt39
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-117-generic N/A
 linux-backports-modules-3.13.0-117-generic N/A
 linux-firmware 1.127.23
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.13.0-117-generic i686
UpgradeStatus: Upgraded to trusty on 2015-03-08 (797 days ago)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 08/21/99
dmi.bios.vendor: Compaq
dmi.bios.version: 686T5
dmi.board.name: 042Ch
dmi.board.vendor: Compaq
dmi.chassis.asset.tag: 8031DLCC0314
dmi.chassis.type: 3
dmi.chassis.vendor: Compaq
dmi.modalias: dmi:bvnCompaq:bvr686T5:bd08/21/99:svnCompaq:pnDeskproENSeriesSFF:pvr:rvnCompaq:rn042Ch:rvr:cvnCompaq:ct3:cvr:
dmi.product.name: Deskpro EN Series SFF
dmi.sys.vendor: Compaq

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1690632

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Revision history for this message
psl (slansky) wrote :

The DMA errors are visible when the RAID-1 array resyncs (and both SATA disks, 2 TB each, are connected to the same SiL 3112 card). Maybe that is why I had not noticed this problem in the past. When I write new data to the volume, I do not see DMA errors. I first noticed trouble when the RAID-1 array was degraded and I tried to resync it. It was not possible; the resync process failed. Once I activated the internal bitmap (mdadm -G /dev/md0 -b internal), it helped and I was able to synchronize the disks, though I needed several tries.
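
For reference, the short options in the command above have long-form equivalents, and resync progress can be followed in /proc/mdstat. This is a minimal sketch, assuming the array is /dev/md0 as in this report; `run` and `enable_bitmap` are hypothetical helper names, and with DRY_RUN=1 the command is only printed, not executed.

```shell
# Array name taken from this report; override with MD=/dev/mdX if needed.
MD=${MD:-/dev/md0}

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Long-form equivalent of: mdadm -G /dev/md0 -b internal
enable_bitmap() {
    run mdadm --grow "$MD" --bitmap=internal
}

# Usage (as root):
#   enable_bitmap
#   cat /proc/mdstat    # shows resync progress for the array
```

With a write-intent bitmap, an interrupted resync does not have to start again from the beginning, which may be why several tries eventually completed.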

Revision history for this message
psl (slansky) wrote :

I replaced the SiL 3114 with the SiL 3112, and these are the details:

# lspci -n | grep 3112
00:0d.0 0180: 1095:3112 (rev 02)

# lspci -v # fragment
00:0d.0 Mass storage controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
 Subsystem: Silicon Image, Inc. SiI 3112 SATALink Controller
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 64, Cache Line Size: 32 bytes
 Interrupt: pin A routed to IRQ 11
 Region 0: I/O ports at 2050 [size=8]
 Region 1: I/O ports at 2060 [size=4]
 Region 2: I/O ports at 2058 [size=8]
 Region 3: I/O ports at 2064 [size=4]
 Region 4: I/O ports at 2040 [size=16]
 Region 5: Memory at 42200000 (32-bit, non-prefetchable) [size=512]
 [virtual] Expansion ROM at 20100000 [disabled] [size=512K]
 Capabilities: [60] Power Management version 2
  Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
 Kernel driver in use: sata_sil

Several DMA errors from dmesg output:

[ 65.792416] ata4.00: configured for UDMA/66
[ 65.792466] ata4: EH complete
[ 65.807896] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 65.808245] ata4.00: BMDMA2 stat 0x6d0009
[ 65.808425] ata4.00: failed command: READ DMA
[ 65.808631] ata4.00: cmd c8/00:18:af:8a:22/00:00:00:00:00/e0 tag 0 dma 12288 in
[ 65.808631] res 51/04:18:af:8a:22/00:00:00:00:00/e0 Emask 0x1 (device error)
[ 65.809272] ata4.00: status: { DRDY ERR }
[ 65.809445] ata4.00: error: { ABRT }
[ 65.960423] ata4.00: configured for UDMA/66
[ 65.960472] ata4: EH complete
[ 65.973628] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 65.973922] ata4.00: BMDMA2 stat 0x6d0009
[ 65.974100] ata4.00: failed command: READ DMA
[ 65.974305] ata4.00: cmd c8/00:18:af:8a:22/00:00:00:00:00/e0 tag 0 dma 12288 in
[ 65.974305] res 51/04:18:af:8a:22/00:00:00:00:00/e0 Emask 0x1 (device error)
[ 65.974946] ata4.00: status: { DRDY ERR }
[ 65.975120] ata4.00: error: { ABRT }
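
To get a sense of how often each command fails, log lines like those above can be tallied per device. This is only a sketch: it assumes the "ataN.M: failed command: ..." line format shown in this report, and `count_ata_failures` is a hypothetical helper name.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints one line per
# (device, command) pair in the form "<count> <device> <command>".
count_ata_failures() {
    awk '/ata[0-9]+\.[0-9]+: failed command:/ {
        # Locate the "ataN.M:" token and strip its trailing colon.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+\.[0-9]+:$/) dev = $i
        sub(/:$/, "", dev)
        # Everything after "failed command: " is the command name.
        cmd = $0
        sub(/.*failed command: /, "", cmd)
        count[dev " " cmd]++
    }
    END { for (k in count) print count[k], k }'
}

# Usage:
#   dmesg | count_ata_failures
```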

I am going to generate a detailed report.

tags: added: apport-collected
description: updated
Revision history for this message
psl (slansky) wrote : apport information

Attached files: AlsaInfo.txt, BootDmesg.txt, CRDA.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, ProcCpuinfo.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

psl (slansky)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
psl (slansky) wrote :

Previous report was for 14.04.5.

I added a disk with Ubuntu 16.04.2 and booted from it. The kernel is:

4.4.0-77-generic #98-Ubuntu SMP Wed Apr 26 08:33:44 UTC 2017 i686

I manually degraded the RAID array /dev/md0 by setting /dev/sdc1 as faulty. After that I added it back, the resync process started, and dmesg (and the console) filled with DMA errors. I am going to create a detailed report.
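
The reproduction steps can be sketched with mdadm's manage-mode options. The device names are the ones from this report; the explicit remove step between fail and add is an assumption (the report does not say whether it was used), and `run` and `degrade_and_readd` are hypothetical helper names. With DRY_RUN=1 the commands are only printed.

```shell
# Device names taken from this report; override via the environment if needed.
MD=${MD:-/dev/md0}
MEMBER=${MEMBER:-/dev/sdc1}

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mark one member faulty, remove it (assumed step), then add it back,
# which triggers a resync of the array.
degrade_and_readd() {
    run mdadm "$MD" --fail "$MEMBER"
    run mdadm "$MD" --remove "$MEMBER"
    run mdadm "$MD" --add "$MEMBER"
}

# Usage (as root, on a test array only):
#   degrade_and_readd
#   dmesg --follow      # watch for DMA errors during the resync
```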

tags: added: xenial
description: updated
Revision history for this message
psl (slansky) wrote : apport information

Attached files: AlsaDevices.txt, CRDA.txt, CurrentDmesg.txt, JournalErrors.txt, Lspci.txt, ProcCpuinfo.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

psl (slansky)
description: updated
Revision history for this message
psl (slansky) wrote :

dmesg is full of DMA errors; I attach kern.log, which has some useful information.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.12 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12-rc1/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
psl (slansky) wrote :

The SiL 3114 worked OK for some time (5 years?) but it failed today. I replaced it with the old SiL 3112 card and I see DMA errors, so this bug was not fixed in the newer kernel. I replaced Ubuntu with Debian some time ago (as Ubuntu does not support 32-bit anymore). This is my current kernel, and the SiL 3112 is still a troublemaker...

Linux server 5.10.0-23-686-pae #1 SMP Debian 5.10.179-1 (2023-05-12) i686 GNU/Linux
