SATA Multiplier Timeout instead of automatic spin-up severely prolongs boot

Bug #1193809 reported by Rene Schickbauer
This bug affects 1 person
Affects         Status    Importance  Assigned to  Milestone
Linux           Unknown   Unknown
linux (Ubuntu)  Triaged   Medium      Unassigned

Bug Description

I have an external housing with 15 disks (three SATA port multipliers with 5 disks each, as far as I can tell).

[ 5.544035] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 5.544331] ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[ 5.545460] ata4.00: hard resetting link

This housing is configured to NOT spin up the disks on power on.

Booting takes extremely long, on the order of 5 minutes or so. The problem is that, instead of spinning up the disks via software, the kernel runs into an error condition after accessing each disk for the first time and starts over from the beginning:

[ 16.428193] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 16.428225] ata4.02: hard resetting link
[ 26.428023] ata4.02: softreset failed (timeout)
[ 29.428038] ata4.15: qc timeout (cmd 0xe4)
[ 29.428051] ata4.02: failed to read SCR 0 (Emask=0x5)
[ 29.428055] ata4.02: reset failed, giving up
[ 29.428104] ata4.15: hard resetting link
[ 39.436023] ata4.15: softreset failed (timeout)
[ 39.436068] ata4.15: hard resetting link
[ 41.636029] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 41.636260] ata4.00: hard resetting link
[ 41.988175] ata4.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 41.988201] ata4.01: hard resetting link
[ 42.340177] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 42.340203] ata4.02: hard resetting link
[ 42.692175] ata4.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 42.692200] ata4.03: hard resetting link
[ 52.692023] ata4.03: softreset failed (timeout)
[ 55.692028] ata4.15: qc timeout (cmd 0xe4)
....
and so on.
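
To put a rough number on one such cycle, the timestamps from the excerpt above can be tallied. This is only an illustrative sketch: the regular expression and the helper names (`parse`, `timeout_waits`, `cycle_seconds`) are made up for this comment, and the estimate for 15 disks assumes that every disk triggers exactly one cycle like the one shown.

```python
import re

# Lines copied from the dmesg excerpt above (one failed cycle on ata4.02).
LOG = """\
[ 16.428225] ata4.02: hard resetting link
[ 26.428023] ata4.02: softreset failed (timeout)
[ 29.428038] ata4.15: qc timeout (cmd 0xe4)
[ 29.428104] ata4.15: hard resetting link
[ 39.436023] ata4.15: softreset failed (timeout)
[ 39.436068] ata4.15: hard resetting link
[ 41.636029] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
"""

# "[ <seconds>] ataN.MM: <message>"
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+\.\d+): (.*)")


def parse(log):
    """Return (timestamp, link, message) tuples for every matching log line."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2), m.group(3)))
    return events


def timeout_waits(events):
    """Sum the time between a hard reset and the softreset timeout on the same link."""
    reset_started = {}
    total = 0.0
    for ts, link, msg in events:
        if msg == "hard resetting link":
            reset_started[link] = ts
        elif msg == "softreset failed (timeout)" and link in reset_started:
            total += ts - reset_started.pop(link)
    return total


def cycle_seconds(events):
    """Time from the first to the last event of the excerpt."""
    return events[-1][0] - events[0][0]


events = parse(LOG)
cycle = cycle_seconds(events)
print(f"one cycle: {cycle:.1f} s, of which {timeout_waits(events):.1f} s are softreset timeouts")
print(f"15 disks, one cycle each (assumption): {15 * cycle / 60:.1f} min")
```

For this excerpt that comes to roughly 25 seconds per cycle, about 20 of them spent waiting for the two softreset timeouts, which is in the same range as the boot delay described above.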

Looking at the storage housing, I can see that the kernel tries to access the first disk (and I hear it spinning up), and the screen immediately shows a timeout. Then nothing happens for a number of seconds; the kernel accesses the first disk again (succeeding), then times out while the second disk is spinning up, and so on.

After the first five disks are spun up (first multiplier), the kernel accesses all disks once more (after the last multiplier reset) and succeeds, and then gets stuck on the second multiplier. And then on the third...

I can't be certain of that, but I'm pretty sure I hear the already spun-up disks parking/unparking their heads when the multiplier gets reset (which can't be good, either).

I think the correct solution would be to implement a staggered spin-up of the disks: send a spin-up command (or whatever the kernel usually uses to wake up disks) to each port of the multiplier, staggered by 250 ms or so, before doing the disk detection. This should speed up boot considerably and will most likely reduce wear on the disks.
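
The timing of such a staggered spin-up can be sketched as follows. This is not kernel code and does not talk to any hardware; `staggered_schedule` is a hypothetical helper, and the port names and the 250 ms spacing are simply the values from the description above.

```python
def staggered_schedule(ports, delay_ms=250):
    """Return (port, start_offset_ms) pairs: one spin-up request per port, delay_ms apart."""
    return [(port, index * delay_ms) for index, port in enumerate(ports)]


# Five disk ports behind one multiplier, named like the links in the log.
ports = [f"ata4.{n:02d}" for n in range(5)]
schedule = staggered_schedule(ports)

for port, offset in schedule:
    print(f"{offset:5d} ms  request spin-up on {port}")
```

With five ports and 250 ms spacing, the last request would go out 1000 ms after the first. The disks still need their own spin-up time after that, which this sketch does not model, but the requests themselves would take about a second per multiplier instead of roughly 25 seconds per disk as in the log.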

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-23-generic 3.8.0-23.34
ProcVersionSignature: Ubuntu 3.8.0-23.34-generic 3.8.11
Uname: Linux 3.8.0-23-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.9.2-0ubuntu8.1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Sun Jun 23 12:38:49 2013
HibernationDevice: RESUME=UUID=40dffe0d-945c-43d2-b196-783d72d41937
InstallationDate: Installed on 2013-01-28 (145 days ago)
InstallationMedia: Xubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.1)
IwConfig:
 eth5 no wireless extensions.

 eth6 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 002 Device 002: ID 046d:c018 Logitech, Inc. Optical Wheel Mouse
 Bus 002 Device 003: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: empty empty
MarkForUpload: True
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-23-generic root=UUID=9bbfc83a-95f8-413a-b412-5b2091205864 ro
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-23-generic N/A
 linux-backports-modules-3.8.0-23-generic N/A
 linux-firmware 1.106
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to raring on 2013-04-26 (57 days ago)
dmi.bios.date: 10/22/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 'V2.06 '
dmi.board.asset.tag: empty
dmi.board.name: S2932/S2932-E/S2932-SI
dmi.board.vendor: TYAN Computer Corporation
dmi.board.version: empty
dmi.chassis.asset.tag: empty
dmi.chassis.type: 3
dmi.chassis.vendor: empty
dmi.chassis.version: empty
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr'V2.06':bd10/22/2009:svnempty:pnempty:pvrempty:rvnTYANComputerCorporation:rnS2932/S2932-E/S2932-SI:rvrempty:cvnempty:ct3:cvrempty:
dmi.product.name: empty
dmi.product.version: empty
dmi.sys.vendor: empty

Rene Schickbauer (rene-schickbauer) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Rene Schickbauer (rene-schickbauer) wrote :

I managed to get the boot delays down to a few seconds by disabling soft resets on the port multiplier. This is the same workaround that's already in the kernel for some other port multipliers.

[ 6.990717] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[ 9.116035] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 9.116322] ata10.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[ 9.117397] ata10.00: hard resetting link
[ 9.436262] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 9.436343] ata10.01: hard resetting link
[ 9.756282] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 9.756369] ata10.02: hard resetting link
[ 9.814423] Adding 4191228k swap on /dev/sdb5. Priority:-1 extents:1 across:4191228k
[ 10.076264] ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.076350] ata10.03: hard resetting link
[ 10.396281] ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.396364] ata10.04: hard resetting link
[ 10.451568] EXT4-fs (sdb1): re-mounted. Opts: errors=remount-ro
[ 10.716264] ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.716350] ata10.05: hard resetting link
[ 11.036265] ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[ 12.191655] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[ 12.194589] XFS (sda): Mounting Filesystem
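
For comparison with the original log, the per-port reset times in this log can be extracted the same way. Again just a sketch: the regular expression and the name `reset_durations` are made up here, and the input is the set of `ata10` lines copied from the log above.

```python
import re

# ata10 lines copied from the log above (soft resets disabled on the multiplier).
LOG = """\
[ 9.117397] ata10.00: hard resetting link
[ 9.436262] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 9.436343] ata10.01: hard resetting link
[ 9.756282] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 9.756369] ata10.02: hard resetting link
[ 10.076264] ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.076350] ata10.03: hard resetting link
[ 10.396281] ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.396364] ata10.04: hard resetting link
[ 10.716264] ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.716350] ata10.05: hard resetting link
[ 11.036265] ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
"""

LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+\.\d+): (.*)")


def reset_durations(log):
    """Map each link to the seconds between its hard reset and its 'SATA link up' line."""
    started, durations = {}, {}
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, link, msg = float(m.group(1)), m.group(2), m.group(3)
        if msg == "hard resetting link":
            started[link] = ts
        elif msg.startswith("SATA link up") and link in started:
            durations[link] = ts - started.pop(link)
    return durations


durations = reset_durations(LOG)
for link, secs in durations.items():
    print(f"{link}: {secs:.3f} s")
print(f"all ports: {sum(durations.values()):.2f} s")
```

Each of the six ports comes up about 0.32 s after its hard reset, so the whole multiplier takes a little under 2 seconds.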

tags: added: patch
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.10 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-rc7-saucy/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Rene Schickbauer (rene-schickbauer) wrote :

Confirmed that the bug exists upstream.

During and after testing with mainline I ran into all sorts of problems. It took me hours to be able to access the ZFS filesystem again (using zfsonlinux, not that FUSE stuff). So I will not be able to do any more testing of upstream kernels...

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

Rene Schickbauer (rene-schickbauer) wrote :

Reported upstream on https://bugzilla.kernel.org/show_bug.cgi?id=60596

I'm already subscribed to too many mailing lists, so I chose to go directly to the bugzilla system.

tags: added: kernel-bug-reported-upstream
Changed in linux (Ubuntu):
status: Confirmed → Triaged