Ubuntu

SATA link power management causes disk errors and corruption

Reported by Chris Coulson on 2010-03-16
This bug affects 56 people
Affects                               Status    Importance  Assigned to
Linux                                 Expired   Medium
linux (Ubuntu)                                  High        Unassigned
  Lucid                                         Low         Unassigned
  Maverick                                      Medium      Unassigned
  Natty                                         High        Unassigned
pm-utils (Ubuntu)                               High        Unassigned
  Lucid                                         Undecided   Unassigned
  Maverick                                      Undecided   Unassigned
  Natty                                         High        Unassigned
pm-utils-powersave-policy (Ubuntu)              High        Unassigned
  Lucid                                         Undecided   Unassigned
  Maverick                                      Undecided   Unassigned
  Natty                                         High        Unassigned

Bug Description

SRU Justification for pm-utils-powersave-policy:

Impact: On certain hardware, enabling power saving for the SATA link can cause data corruption.

How Addressed: The proposed branch removes the SATA link power policy script. As a result, the link stays at normal power instead of dropping to a lower-power state when the machine is unplugged from AC power.

Reproduction: On an affected machine, unplug and re-plug the AC power a few times. Data corruption will result.

Regression Potential: Removing the script will cause the SATA link to stay fully powered at all times. This may increase battery usage on some machines. There should be no functionality regressions or bugs introduced by this change.

=====

Using Lucid on my laptop, I see errors like this in dmesg quite frequently (every few hours):

Mar 14 23:00:09 chris-laptop kernel: [42987.460608] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x50000 action 0xe frozen
Mar 14 23:00:09 chris-laptop kernel: [42987.460618] ata1.00: irq_stat 0x00400000, PHY RDY changed
Mar 14 23:00:09 chris-laptop kernel: [42987.460627] ata1: SError: { PHYRdyChg CommWake }
Mar 14 23:00:09 chris-laptop kernel: [42987.460635] ata1.00: failed command: READ FPDMA QUEUED
Mar 14 23:00:09 chris-laptop kernel: [42987.460649] ata1.00: cmd 60/08:00:97:23:44/00:00:01:00:00/40 tag 0 ncq 4096 in
Mar 14 23:00:09 chris-laptop kernel: [42987.460652] res 40/00:04:97:23:44/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
Mar 14 23:00:09 chris-laptop kernel: [42987.460669] ata1.00: status: { DRDY }
Mar 14 23:00:09 chris-laptop kernel: [42987.460681] ata1: hard resetting link
Mar 14 23:00:09 chris-laptop kernel: [42987.523336] ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Mar 14 23:00:09 chris-laptop kernel: [42987.523346] ata2: irq_stat 0x00400000, PHY RDY changed
Mar 14 23:00:09 chris-laptop kernel: [42987.523355] ata2: SError: { PHYRdyChg CommWake }
Mar 14 23:00:09 chris-laptop kernel: [42987.523368] ata2: hard resetting link
Mar 14 23:00:09 chris-laptop kernel: [42988.202586] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 14 23:00:09 chris-laptop kernel: [42988.205443] ata1.00: configured for UDMA/133
Mar 14 23:00:09 chris-laptop kernel: [42988.205459] ata1: EH complete
Mar 14 23:00:09 chris-laptop kernel: [42988.280089] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 14 23:00:09 chris-laptop kernel: [42988.285567] ata2.00: configured for UDMA/100
Mar 14 23:00:09 chris-laptop kernel: [42988.289370] ata2: EH complete
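The "SErr" value in these exception lines is a bitmask of the SATA SError register. libata spells out the bits that are set on the following "SError: { ... }" line. Below is a minimal shell sketch for decoding such a value by hand. decode_serror is a made-up helper. The bit-to-name table is assumed from the flag names libata prints, not taken from this report:

```sh
#!/bin/sh
# Sketch: name the bits set in a libata "SErr 0x..." value.
# Bit positions and names are assumed to match the SError flags libata
# prints on its "SError: { ... }" line.
decode_serror() {
    local value out pair bit name
    value=$(( $1 ))          # accepts hex input such as 0x50000
    out=
    for pair in 0:RecovData 1:RecovComm 8:UnrecovData 9:Persist \
                10:Proto 11:HostInt 16:PHYRdyChg 17:PHYInt \
                18:CommWake 19:10B8B 20:Dispar 21:BadCRC \
                22:Handshk 23:LinkSeq 24:TrStaTrns 25:UnrecFIS \
                26:DevExch; do
        bit=${pair%%:*}      # text before the colon: bit number
        name=${pair#*:}      # text after the colon: flag name
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}
```

For example, decode_serror 0x50000 prints "PHYRdyChg CommWake", the same pair libata printed above. The value 0x280100 from a later report gives "UnrecovData 10B8B BadCRC".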

Every couple of days, this results in data corruption and my filesystem being remounted read-only:

[ 6148.305806] Aborting journal on device sda1-8.
[ 6148.325011] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 6148.325018] EXT4-fs (sda1): Remounting filesystem read-only
[ 6148.326702] journal commit I/O error
[ 6148.330975] EXT4-fs error (device sda1) in ext4_reserve_inode_write: Journal has aborted
[ 6148.462572] __ratelimit: 15 callbacks suppressed

Those messages generally appear at the end of dmesg after the event, just after the "hard resetting link" message. I then have to boot a live CD and manually run fsck, as I can no longer boot the laptop.
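While the machine is still running, you can confirm that the filesystem really has been remounted read-only by checking the options column of /proc/mounts. Below is a minimal sketch that assumes the usual Linux /proc/mounts format. is_mounted_ro is a hypothetical helper, and its optional second argument exists only so it can be tried against a sample file:

```sh
#!/bin/sh
# Sketch: return 0 if TARGET is mounted read-only, 1 if it is mounted
# read-write, 2 if it is not mounted at all. Reads the 4th column (mount
# options) of /proc/mounts; the optional 2nd argument is only for testing.
is_mounted_ro() {
    local target mounts found dev mnt fstype opts rest
    target=$1
    mounts=${2:-/proc/mounts}
    found=
    while read -r dev mnt fstype opts rest; do
        # Keep the last matching entry: later mounts hide earlier ones.
        if [ "$mnt" = "$target" ]; then
            found=$opts
        fi
    done < "$mounts"
    [ -n "$found" ] || return 2
    case ",$found," in
        *,ro,*) return 0 ;;   # exact "ro" option, not e.g. errors=remount-ro
        *)      return 1 ;;
    esac
}
```

Usage: is_mounted_ro / && echo "root filesystem is read-only"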

This is happening every couple of days generally, although it happened 3 times in one day last Thursday.

I did consider that it might be a hardware issue, but I ran the kernel from Karmic for a couple of days, and that worked OK without a single error message.

ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: chr1s 4010 F.... pulseaudio
 /dev/snd/controlC1: chr1s 4010 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6afc000 irq 21'
   Mixer name : 'Intel G45 DEVCTG'
   Components : 'HDA:111d76b2,10280263,00100302 HDA:80862802,80860101,00100000'
   Controls : 22
   Simple ctrls : 11
Card1.Amixer.info:
 Card hw:1 'U0x46d0x9a4'/'USB Device 0x46d:0x9a4 at usb-0000:00:1a.7-3.3, high speed'
   Mixer name : 'USB Mixer'
   Components : 'USB046d:09a4'
   Controls : 2
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'Mic',0
   Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
   Capture channels: Mono
   Limits: Capture 0 - 14
   Mono: Capture 0 [0%] [23.75dB] [on]
Date: Tue Mar 16 10:07:41 2010
DistroRelease: Ubuntu 10.04
Frequency: Once a day.
HibernationDevice: RESUME=UUID=762f3439-67ac-4828-aa94-caf2a2ba0f9a
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
LiveMediaBuild: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: Dell Inc. Latitude E5500
Package: linux-image-2.6.32-16-generic 2.6.32-16.25
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-16-generic root=UUID=4ce5e12b-6e82-4fa4-90ff-7d9859d7504e ro quiet splash
ProcEnviron:
 LANG=en_GB.utf8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
Regression: Yes
RelatedPackageVersions: linux-firmware 1.32
Reproducible: No
SourcePackage: linux
TestedUpstream: No
Uname: Linux 2.6.32-16-generic x86_64
dmi.bios.date: 11/05/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0DW635
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd11/05/2009:svnDellInc.:pnLatitudeE5500:pvr:rvnDellInc.:rn0DW635:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude E5500
dmi.sys.vendor: Dell Inc.

Chris Coulson (chrisccoulson) wrote :
Torsten Spindler (tspindler) wrote :

I'm also affected by this, running kernel 2.6.32-16-generic #25-Ubuntu SMP Tue Mar 9 16:33:52 UTC 2010 i686 GNU/Linux
$ dpkg -l linux-image
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii linux-image 2.6.32.16.17 Generic Linux kernel image.

Jeremy Foshee (jeremyfoshee) wrote :

Chris,
     I'm adding this to my list to be reviewed by the team.

Thanks!

~JFo

Changed in linux (Ubuntu):
status: New → Triaged
Surbhi Palande (csurbhi) wrote :

Chris Coulson, the log in your description seems to be part of a dmesg. Would it be possible to post the full dmesg output? That would be helpful!

Surbhi Palande (csurbhi) wrote :

Chris Coulson, can you also try booting the lucid kernel with acpi=off kernel parameter, and check if you get these same errors?

Christian Reis (kiko) wrote :

I picked up 2.6.31-02063112-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and have only rebooted once, but the problem hasn't manifested itself yet, whereas with every other kernel it manifested itself upon boot.

Stefan Bader (smb) wrote :

Chris, as I realized later, Christian's dmesg does not help that much. If you could boot a working kernel and just post the dmesg from that, that would help. Thanks.

Stefan Bader (smb) on 2010-03-30
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Crashbit (crashbit-gmail) wrote :

I have a similar problem.

I have a new Asus P7P55D-E LX motherboard with two SATA controllers, a JMicron and a Marvell 9123.
I connected a Seagate Barracuda XT 6Gb/s hard drive to the Marvell controller and an SSD to the JMicron controller.

The SSD works perfectly, but the Barracuda doesn't work properly.

With 2.6.32 and 2.6.33 kernels the problems persist, even after connecting the Barracuda to the JMicron controller.
With a 2.6.34-rc2 kernel, the problems are solved.

The problems appear when I try to access the Barracuda.

I don't know if this is the same problem or if it is caused by the AHCI controller, but these are my dmesg errors:
[ 9.205423] groups: 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178)
[ 11.491620] ata9: exception Emask 0x0 SAct 0xe SErr 0x0 action 0x10 frozen
[ 11.491623] ata9.00: failed command: READ FPDMA QUEUED
[ 11.491627] ata9.00: cmd 60/02:08:81:74:e0/00:00:e8:00:00/40 tag 1 ncq 1024 in
[ 11.491628] res 40/00:18:85:74:e0/00:00:e8:00:00/40 Emask 0x4 (timeout)
[ 11.491629] ata9.00: status: { DRDY }
[ 11.491631] ata9.00: failed command: READ FPDMA QUEUED
[ 11.491634] ata9.00: cmd 60/02:10:83:74:e0/00:00:e8:00:00/40 tag 2 ncq 1024 in
[ 11.491634] res 40/00:18:85:74:e0/00:00:e8:00:00/40 Emask 0x4 (timeout)
[ 11.491636] ata9.00: status: { DRDY }
[ 11.491637] ata9.00: failed command: READ FPDMA QUEUED
[ 11.491640] ata9.00: cmd 60/02:18:85:74:e0/00:00:e8:00:00/40 tag 3 ncq 1024 in
[ 11.491641] res 40/00:18:85:74:e0/00:00:e8:00:00/40 Emask 0x4 (timeout)
[ 11.491642] ata9.00: status: { DRDY }
[ 11.491650] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 11.491652] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 11.491654] Descriptor sense data with sense descriptors (in hex):
[ 11.491655] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 11.491660] e8 e0 74 85
[ 11.491661] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 11.491663] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 e8 e0 74 81 00 00 02 00
[ 11.491668] end_request: I/O error, dev sdb, sector 3907024001
[ 11.491678] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 11.491680] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 11.491682] Descriptor sense data with sense descriptors (in hex):
[ 11.491683] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 11.491687] e8 e0 74 85
[ 11.491689] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 11.491691] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 e8 e0 74 83 00 00 02 00
[ 11.491694] end_request: I/O error, dev sdb, sector 3907024003
[ 11.491697] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 11.491699] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 11.491701] Descriptor sense data with sense descriptors (in hex):
[ 11.491702] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 11.491706] e8 e0 74 85
[ 11.491708] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 11.491709] ...
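The "cmd" lines in these logs are libata taskfile dumps. The sketch below makes three assumptions. First, the field order is cmd/feat:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. Second, for READ FPDMA QUEUED (0x60) the sector count sits in the feature registers and the NCQ tag in bits 7:3 of nsect. Third, logical sectors are 512 bytes. With those assumptions, the failing request can be decoded as shown. decode_fpdma is a hypothetical helper:

```sh
#!/bin/sh
# Sketch: decode a libata taskfile dump for READ/WRITE FPDMA QUEUED, e.g.
#   60/02:08:81:74:e0/00:00:e8:00:00/40
# Assumed field order:
#   cmd/feat:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# Assumes 512-byte logical sectors.
decode_fpdma() {
    local IFS count tag lba
    IFS='/:'
    set -- $1                          # split the dump into its 12 byte fields
    [ $# -eq 12 ] || return 1
    count=$(( (0x$7 << 8) | 0x$2 ))    # NCQ sector count lives in feat/hob_feat
    tag=$(( 0x$3 >> 3 ))               # NCQ tag lives in bits 7:3 of nsect
    lba=$(( (0x$9 << 24) | (0x$6 << 16) | (0x$5 << 8) | 0x$4 ))
    lba=$(( lba | (0x${10} << 32) | (0x${11} << 40) ))
    printf 'tag=%s lba=%s sectors=%s bytes=%s\n' \
        "$tag" "$lba" "$count" "$(( count * 512 ))"
}
```

Applied to the tag 1 command above, decode_fpdma 60/02:08:81:74:e0/00:00:e8:00:00/40 prints "tag=1 lba=3907024001 sectors=2 bytes=1024". This agrees with the sector in the first end_request line and with the Read(10) CDB (28 00 e8 e0 74 81 ...).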

Happens here too, always while I am away. Maybe some problem with suspending?

Kubuntu Lucid, all updates installed
Dell Studio 1749 (Intel i520)
Disk: TOSHIBA MK5056GSY

Apr 11 18:57:22 localhost kernel: [ 1852.336426] CPU0 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.336430] CPU1 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.336432] CPU2 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.336434] CPU3 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.415822] ata5: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Apr 11 18:57:22 localhost kernel: [ 1852.415827] ata5: irq_stat 0x00400000, PHY RDY changed
Apr 11 18:57:22 localhost kernel: [ 1852.415831] ata5: SError: { PHYRdyChg CommWake }
Apr 11 18:57:22 localhost kernel: [ 1852.415837] ata5: hard resetting link
Apr 11 18:57:22 localhost kernel: [ 1852.429190] CPU0 attaching sched-domain:
Apr 11 18:57:22 localhost kernel: [ 1852.429193] domain 0: span 0,2 level SIBLING
Apr 11 18:57:22 localhost kernel: [ 1852.429195] groups: 0 (cpu_power = 589) 2 (cpu_power = 589)
Apr 11 18:57:22 localhost kernel: [ 1852.429199] domain 1: span 0-3 level MC
Apr 11 18:57:22 localhost kernel: [ 1852.429201] groups: 0,2 (cpu_power = 1178) 1,3 (cpu_power = 1178)

Another one:
Apr 11 14:11:19 localhost kernel: [ 3518.250989] CPU3 attaching sched-domain:
Apr 11 14:11:19 localhost kernel: [ 3518.250990] domain 0: span 1,3 level SIBLING
Apr 11 14:11:19 localhost kernel: [ 3518.250992] groups: 3 (cpu_power = 589) 1 (cpu_power = 589)
Apr 11 14:11:19 localhost kernel: [ 3518.250995] domain 1: span 0-3 level MC
Apr 11 14:11:19 localhost kernel: [ 3518.250997] groups: 1,3 (cpu_power = 1178) 0,2 (cpu_power = 1178)
Apr 11 14:11:19 localhost kernel: [ 3518.259013] ata5: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Apr 11 14:11:19 localhost kernel: [ 3518.259018] ata5: irq_stat 0x00400000, PHY RDY changed
Apr 11 14:11:19 localhost kernel: [ 3518.259022] ata5: SError: { PHYRdyChg CommWake }
Apr 11 14:11:19 localhost kernel: [ 3518.259028] ata5: hard resetting link
Apr 11 14:11:19 localhost kernel: [ 3519.000927] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 11 14:11:19 localhost kernel: [ 3519.030125] ata5.00: configured for UDMA/100
Apr 11 14:11:19 localhost kernel: [ 3519.031779] ata5: EH complete
Apr 4 22:53:40 localhost kernel: [ 3.288635] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 4 22:53:40 localhost kernel: [ 3.303600] ata5.00: ATAPI: TSSTcorp DVD+/-RW TS-T633C, D700, max UDMA/100, ATAPI AN
Apr 4 22:53:40 localhost kernel: [ 3.303614] ata5.00: applying bridge limits
Apr 4 22:53:40 localhost kernel: [ 3.318833] ata5.00: configured for UDMA/100
Apr 4 22:53:40 localhost kernel: [ 3.338543] ata5: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
Apr 4 22:53:40 localhost kernel: [ 3.338547] ata5: irq_stat 0x40000008
Apr 4 22:53:40 localhost kernel: [ 3.340653] scsi 4:0:0:0: CD-ROM TSSTcorp DVD+-RW TS-T633C D700 PQ: 0 ANSI: 5
Apr 4 22:53:40 localhost kernel: [ 3.346733] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/for...

Chase Douglas (chasedouglas) wrote :

I hit the same issue:

[367890.287614] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x1e frozen
[367890.287623] ata1: irq_stat 0x00400001, PHY RDY changed
[367890.287632] ata1: SError: { PHYRdyChg CommWake }
[367890.287644] ata1: hard resetting link
[367891.008114] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[367891.053065] ata1.00: configured for UDMA/133
[367891.068116] ata1: EH complete
[367891.088172] end_request: I/O error, dev sda, sector 239506656
[367891.088213] Aborting journal on device sda1-8.
[367891.092737] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[367891.092751] EXT4-fs (sda1): Remounting filesystem read-only

Christian Reis (kiko) wrote :

Chase, are you running the latest 32-21 kernel? I just upgraded and have yet to reproduce the error message in my kernel log when booting up or suspending. I was on 32-19 before, and I could definitely see the errors (and resulting corruption) there. I'd move to mark this fixed if it stays this way on -21 for me and others.

Chase Douglas (chasedouglas) wrote :

@Christian:

I'm running -19; I haven't had a chance to upgrade yet. I've only seen this occur once after a few weeks of use, so if I can't reproduce it easily on -21, I won't be able to say definitively that this bug is fixed. I hope it does resolve this issue, though.

Thanks

Chris Halse Rogers (raof) wrote :

It looks like I've just reproduced this on my Thinkpad x200s with the 2.6.32-21-generic kernel:

[ 5911.516865] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x1e frozen
[ 5911.516871] ata1: irq_stat 0x00400001, PHY RDY changed
[ 5911.516875] ata1: SError: { PHYRdyChg CommWake }
[ 5911.516881] ata1: hard resetting link
[ 5912.280041] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
(fuller dmesg here: http://paste.ubuntu.com/415384/ )

Chris Halse Rogers wrote:
> It looks like I've just reproduced this on my Thinkpad x200s with the
> 2.6.32-21-generic kernel:
>
> [ 5911.516865] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x1e frozen
> [ 5911.516871] ata1: irq_stat 0x00400001, PHY RDY changed
> [ 5911.516875] ata1: SError: { PHYRdyChg CommWake }
> [ 5911.516881] ata1: hard resetting link
> [ 5912.280041] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> (fuller dmesg here: http://paste.ubuntu.com/415384/ )
>
Do we have a more in-depth description of what might reproduce this?
What was going on at the time this happened? A higher level of I/O load, or a
certain application running?

I have the same problem, but it's constant, to the point that I can't even install Lucid most of the time. Windows 7 works perfectly, so it's not a hardware problem. I've seen this bug on various hardware and Ubuntu versions, so I'm thinking Debian/Ubuntu has a problem with the way drive access is set up in the kernel; Fedora 12 doesn't have this problem.

[ 776.594450] ata7.00: status: { DRDY }
[ 776.594454] ata7: hard resetting link
[ 777.123108] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 777.125321] ata7.00: configured for UDMA/33
[ 777.125328] ata7: EH complete
[ 777.151827] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x280100 action 0x6 frozen
[ 777.151830] ata7.00: irq_stat 0x08000000, interface fatal error
[ 777.151833] ata7: SError: { UnrecovData 10B8B BadCRC }
[ 777.151836] ata7.00: failed command: READ FPDMA QUEUED
[ 777.151841] ata7.00: cmd 60/08:00:bf:28:54/00:00:02:00:00/40 tag 0 ncq 4096 in
[ 777.151842] res 40/00:00:bf:28:54/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[ 777.151844] ata7.00: status: { DRDY }
[ 777.151847] ata7: hard resetting link
[ 777.682849] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 777.685059] ata7.00: configured for UDMA/33
[ 777.685065] ata7: EH complete
[ 777.798069] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x280100 action 0x6 frozen
[ 777.798072] ata7.00: irq_stat 0x08000000, interface fatal error
[ 777.798075] ata7: SError: { UnrecovData 10B8B BadCRC }
[ 777.798078] ata7.00: failed command: READ FPDMA QUEUED
[ 777.798083] ata7.00: cmd 60/08:00:39:4b:38/00:00:3a:00:00/40 tag 0 ncq 4096 in
[ 777.798084] res 40/00:00:39:4b:38/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
[ 777.798086] ata7.00: status: { DRDY }
[ 777.798089] ata7: hard resetting link
[ 778.322554] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 778.324640] ata7.00: configured for UDMA/33
[ 778.324648] ata7: EH complete
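The SStatus and SControl numbers on the "SATA link up" lines are the raw link registers, printed in hex. Assuming the usual SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8), they can be decoded with a sketch like this one. Both function names are made up:

```sh
#!/bin/sh
# Sketch: decode the SStatus / SControl values from a libata
# "SATA link up ... (SStatus XXX SControl YYY)" line. libata prints both
# registers in hex without a 0x prefix. Assumed layout (SATA spec):
#   bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM
decode_sstatus() {
    local v det spd ipm speed state
    v=$(( 0x${1#0x} ))
    det=$(( v & 0xf ))            # 3 = device present, PHY communication up
    spd=$(( (v >> 4) & 0xf ))     # negotiated interface speed
    ipm=$(( (v >> 8) & 0xf ))     # current interface power state
    case $spd in
        1) speed="1.5 Gbps" ;;
        2) speed="3.0 Gbps" ;;
        3) speed="6.0 Gbps" ;;
        *) speed="none" ;;
    esac
    case $ipm in
        1) state=active ;;
        2) state=partial ;;
        6) state=slumber ;;
        *) state=unknown ;;
    esac
    printf 'det=%s speed=%s ipm=%s\n' "$det" "$speed" "$state"
}

decode_scontrol() {
    local v spd ipm limit allowed
    v=$(( 0x${1#0x} ))
    spd=$(( (v >> 4) & 0xf ))     # highest speed the host may negotiate
    ipm=$(( (v >> 8) & 0xf ))     # which low-power transitions are disabled
    case $spd in
        0) limit=none ;;
        1) limit="1.5 Gbps" ;;
        2) limit="3.0 Gbps" ;;
        *) limit=other ;;
    esac
    case $ipm in
        0) allowed="partial+slumber" ;;
        1) allowed="slumber only" ;;
        2) allowed="partial only" ;;
        3) allowed=none ;;
        *) allowed=other ;;
    esac
    printf 'speed-limit=%s ipm-allowed=%s\n' "$limit" "$allowed"
}
```

With these definitions, SStatus 113 decodes to "det=3 speed=1.5 Gbps ipm=active" and SControl 310 to "speed-limit=1.5 Gbps ipm-allowed=none". In other words, this link is capped at 1.5 Gbps with partial and slumber transitions disabled. The SControl 300 in the other reports has the same IPM setting but no speed cap.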

I've got no idea how to reproduce this. For me, it only occurs once or
twice a month, and it's difficult to remember what was happening each time.

I vaguely associate it with happening after at least one resume from
suspend, but since I suspend this laptop all the time, I wouldn't put
much stock in that.

Same here on Thinkpad T61:

[22359.500669] ata3.00: exception Emask 0x10 SAct 0x3 SErr 0x50000 action 0xe frozen
[22359.500675] ata3.00: irq_stat 0x00400000, PHY RDY changed
[22359.500680] ata3: SError: { PHYRdyChg CommWake }
[22359.500685] ata3.00: failed command: READ FPDMA QUEUED
[22359.500694] ata3.00: cmd 60/20:00:08:7f:5e/00:00:00:00:00/40 tag 0 ncq 16384 in
[22359.500697] res 40/00:0c:40:7f:5e/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[22359.500701] ata3.00: status: { DRDY }
[22359.500705] ata3.00: failed command: READ FPDMA QUEUED
[22359.500714] ata3.00: cmd 60/48:08:40:7f:5e/00:00:00:00:00/40 tag 1 ncq 36864 in
[22359.500716] res 40/00:0c:40:7f:5e/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[22359.500720] ata3.00: status: { DRDY }
[22359.500728] ata3: hard resetting link
[22360.250117] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[22360.252801] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[22360.252812] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[22360.252821] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[22360.256923] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[22360.256934] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[22360.256943] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[22360.258711] ata3.00: configured for UDMA/133
[22360.263591] ata3.00: configured for UDMA/133
[22360.263618] ata3: EH complete

It seems to happen just before the machine suspends:

[22364.620394] PM: Syncing filesystems ... done.
[22364.890139] PM: Preparing system for mem sleep
[22364.890145] Freezing user space processes ... (elapsed 0.00 seconds) done.
[22364.891281] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
[22364.891353] PM: Entering mem sleep
[22364.891367] Suspending console(s) (use no_console_suspend to debug)
[22365.112468] PM: suspend of drv:psmouse dev:serio2 complete after 220.652 msecs
[22365.230163] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[22365.230346] sd 2:0:0:0: [sda] Stopping disk
[22365.808260] PM: suspend of drv:sd dev:2:0:0:0 complete after 578.100 msecs
[22366.183944] PM: suspend of drv:psmouse dev:serio1 complete after 375.492 msecs
[22366.780123] PM: suspend of drv:atkbd dev:serio0 complete after 596.170 msecs
[22366.782343] parport_pc 00:0b: disabled
[22366.782345] ACPI handle has no context!
[22366.782462] serial 00:0a: disabled
[22366.782464] ACPI handle has no context!
[22366.787653] ACPI handle has no context!
[22366.810094] iwlagn 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0x000033D8
[22366.852765] iwlagn 0000:03:00.0: RF_KILL bit toggled to disable radio.
[22366.960249] ata2: port disabled. ignoring.
[22366.960358] ata_piix 0000:00:1f.1: PCI INT C disabled
[22366.960389] ehci_hcd 0000:00:1d.7: PCI INT D disabled
[22366.960410] uhci_hcd 0000:00:1d.2: PCI INT C disabled
[22366.960436] uhci_hcd 0000:00:1d.1: PCI INT B disabled
[22366.960455] uhci_hcd 0000:00:1d.0: PCI INT A disabled
[22366.960472] pciehp 0000:00:1c.3:pcie04: pciehp_suspend ENTRY
[22367.070402] HDA Intel 0000:00:1b.0: PCI INT B disabled
[22367.09014...

red_hood (chris-red-hood) wrote :

Just followed discussion upstream on https://bugzilla.kernel.org/show_bug.cgi?id=14543.
The error can be reproduced by attaching external power to the notebook.
The file /sys/class/scsi_host/host3/link_power_management_policy then changes from "min_power" to "max_performance".
I was not able to reproduce the error by manually writing to the file and toggling between the two values.
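To watch the change described above while plugging and unplugging the AC adapter, you can list the policy of every host in one go. Below is a minimal sketch. show_lpm_policy is a made-up name, and the optional directory argument (default /sys/class/scsi_host) exists only so it can be tested against a scratch directory:

```sh
#!/bin/sh
# Sketch: print "hostN policy" for every SCSI host exposing
# link_power_management_policy. ROOT defaults to /sys/class/scsi_host;
# the argument exists only so the function can be tested elsewhere.
show_lpm_policy() {
    local root f host
    root=${1:-/sys/class/scsi_host}
    for f in "$root"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue          # also skips an unmatched glob
        host=${f%/link_power_management_policy}
        host=${host##*/}
        printf '%s %s\n' "$host" "$(cat "$f")"
    done
}
```

Run it once on battery and once on AC. According to the report above, the hosts should flip from min_power to max_performance.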

I can reproduce it too by attaching external power, so setting power management to performance in battery mode fixes it for me. Moreover, the kernel freezes I had seem to be gone since I got rid of the closed-source drivers (Broadcom and ATI Radeon).

red_hood (chris-red-hood) wrote :

Is it possible to find out which program is writing to the "/sys/class/scsi_host/hostX/link_power_management_policy" files? Setting the files' permissions to 000 stops the link power mode from changing, so I guess the changes are made from user space.
As a workaround, setting the S-ATA links to "Compatibility Mode" in the BIOS (switching off AHCI) worked for me.

lightweight (dave-egressive) wrote :

I'm getting the same sorts of problems every time I attempt a suspend on my ASUS Z62E laptop with

Linux 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010 x86_64 GNU/Linux

The machine doesn't suspend, and eventually returns to the X login window for unlocking the screen (not the GDM login) with a very high load average, and a lot of CPU wait (unsurprisingly). Ultimately, the only fix is a reboot.

Note that this laptop was recently upgraded from Karmic, where suspend worked without problems. It also worked after the initial upgrade to Lucid, but this problem has sprung up since about 8 May, possibly after a kernel update.

The error messages start on suspend, either via closing the lid or explicitly selecting suspend from the Session menu. In syslog, the only temporally related event is the DHCPREQUEST, which I've included below.

May 12 08:45:24 stampy dhclient: DHCPREQUEST of 172.20.4.212 on wlan0 to 172.20.4.1 port 67
May 12 08:45:24 stampy kernel: [48315.060765] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 12 08:45:24 stampy kernel: [48315.060779] ata3.00: irq_stat 0x40000008
May 12 08:45:24 stampy kernel: [48315.060790] ata3.00: failed command: READ FPDMA QUEUED
May 12 08:45:24 stampy kernel: [48315.060810] ata3.00: cmd 60/08:00:57:03:90/00:00:01:00:00/40 tag 0 ncq 4096 in
May 12 08:45:24 stampy kernel: [48315.060813] res 41/40:00:59:03:90/00:00:01:00:00/40 Emask 0x409 (media error) <F>
May 12 08:45:24 stampy kernel: [48315.060823] ata3.00: status: { DRDY ERR }
May 12 08:45:24 stampy kernel: [48315.060830] ata3.00: error: { UNC }
May 12 08:45:24 stampy kernel: [48315.089709] ata3.00: configured for UDMA/133
May 12 08:45:24 stampy kernel: [48315.089741] ata3: EH complete
May 12 08:45:24 stampy kernel: [48319.288891] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 12 08:45:24 stampy kernel: [48319.288905] ata3.00: irq_stat 0x40000008
May 12 08:45:24 stampy kernel: [48319.288916] ata3.00: failed command: READ FPDMA QUEUED
May 12 08:45:24 stampy kernel: [48319.288935] ata3.00: cmd 60/08:00:57:03:90/00:00:01:00:00/40 tag 0 ncq 4096 in
May 12 08:45:24 stampy kernel: [48319.288939] res 41/40:00:59:03:90/00:00:01:00:00/40 Emask 0x409 (media error) <F>
May 12 08:45:24 stampy kernel: [48319.288949] ata3.00: status: { DRDY ERR }
May 12 08:45:24 stampy kernel: [48319.288957] ata3.00: error: { UNC }
May 12 08:45:24 stampy kernel: [48319.317872] ata3.00: configured for UDMA/133
May 12 08:45:24 stampy kernel: [48319.317900] ata3: EH complete
May 12 08:45:24 stampy kernel: [48323.496473] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 12 08:45:24 stampy kernel: [48323.496487] ata3.00: irq_stat 0x40000008
May 12 08:45:24 stampy kernel: [48323.496498] ata3.00: failed command: READ FPDMA QUEUED
May 12 08:45:24 stampy kernel: [48323.496518] ata3.00: cmd 60/08:00:57:03:90/00:00:01:00:00/40 tag 0 ncq 4096 in
May 12 08:45:24 stampy kernel: [48323.496522] res 41/40:00:59:03:90/00:00:01:00:00/40 Emask 0x409 (media error) <F>
... and many more

Chris Coulson (chrisccoulson) wrote :

I've traced the trigger of the errors down to /usr/lib/pm-utils/power.d/powersave-policy-sata-link-power. This triggers the errors for me every time on my laptop:

echo "max_performance" | sudo tee /sys/class/scsi_host/host1/link_power_management_policy

Anton Zayats (anton-zayats) wrote :

Same problem here. Multiple ata1.00 errors even when the PC is idle.
GA-E7AUM-DS2H motherboard.
One disk (Seagate) is already dead; it was running Jaunty.
The next one (WD) was failing on a fresh minimal 10.04 install with the NVIDIA proprietary drivers.
Switching off AHCI in the BIOS worked for me as well.
Multiple reports about this issue:
https://bugs.launchpad.net/ubuntu/+bug/353812
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/217920
http://www.nvnews.net/vbulletin/showthread.php?t=126152

Stefan Bader (smb) wrote :

@Anton, as long as this is not triggered by either suspend or removal of AC power, it seems to be a different problem. If you have not done so yet, please file a separate bug.

I changed the bug's title to be more specific to this. It is the same class of bug only if switching between "min_power" and "max_performance", as described in comment #23, causes the errors.

summary: - Frequent ATA errors and disk corruption
+ SATA link power management causes disk errors and corruption

Unplugging and re-plugging the power causes a switch between "min_power" and
"max_performance". See
/usr/lib/pm-utils/power.d/powersave-policy-sata-link-power
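Conceptually, a power.d hook like this one maps the power source to a policy and writes it to every host. The sketch below is hypothetical and is not the contents of the packaged powersave-policy-sata-link-power script. It assumes pm-utils passes "true" to power.d hooks on battery and "false" on AC power. The directory argument exists only for testing:

```sh
#!/bin/sh
# Hypothetical sketch of a pm-utils power.d hook body; NOT the packaged
# powersave-policy-sata-link-power script. Assumes pm-utils passes "true"
# on battery and "false" on AC. ROOT defaults to /sys/class/scsi_host and
# is a parameter only so the function can be tested.
apply_lpm_policy() {
    local on_battery root policy f
    on_battery=$1
    root=${2:-/sys/class/scsi_host}
    case $on_battery in
        true)  policy=min_power ;;        # battery: allow link power saving
        false) policy=max_performance ;;  # AC: keep the link fully powered
        *)     return 0 ;;                # unknown argument: change nothing
    esac
    for f in "$root"/host*/link_power_management_policy; do
        [ -w "$f" ] || continue           # also skips an unmatched glob
        printf '%s\n' "$policy" > "$f"
    done
}
```

On affected hardware, the "false" case is the one that triggers the link resets: the switch from min_power to max_performance when AC is plugged back in.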

Christian Reis (kiko) wrote :

Stefan, Chris: you guys have nailed it; plugging a power cable back into the laptop causes the problem. Is there a fix or workaround that we can apply to avoid losing data meanwhile?

I guess the best short-term action is probably to edit the script in
/usr/lib/pm-utils/power.d/powersave-policy-sata-link-power and put an early
"exit 0" in there.
It's a bit hacky, though.
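A less invasive variant of the early-exit idea is to install a no-op override instead of editing the packaged file, so that a package upgrade does not undo it. This relies on an assumption about pm-utils: that a hook in /etc/pm/power.d runs instead of a same-named hook in /usr/lib/pm-utils/power.d. Below is a sketch. mask_power_hook is a made-up helper, and the directory argument defaults to /etc/pm/power.d, which needs root:

```sh
#!/bin/sh
# Sketch: install a no-op override for the SATA link power hook.
# Assumes pm-utils runs a hook from /etc/pm/power.d instead of a
# same-named hook in /usr/lib/pm-utils/power.d. DIR is only for testing.
mask_power_hook() {
    local dir name
    dir=${1:-/etc/pm/power.d}
    name=powersave-policy-sata-link-power
    mkdir -p "$dir" || return 1
    # The override ignores its "true"/"false" argument and leaves the
    # SATA link policy untouched.
    printf '#!/bin/sh\n# Disabled locally: see LP #539467\nexit 0\n' \
        > "$dir/$name" || return 1
    chmod 755 "$dir/$name"
}
```

Deleting the override file restores the packaged behavior.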

Chris Coulson (chrisccoulson) wrote :

The upstream bug report (https://bugzilla.kernel.org/show_bug.cgi?id=14543) has a patch which might help fix this

Changed in pm-utils-powersave-policy (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Chase Douglas (chasedouglas)
description: updated
Martin Pitt (pitti) wrote :

For Lucid we'll just disable the script entirely, so I'm marking the Lucid kernel task Won't Fix.

Changed in linux (Ubuntu Lucid):
status: New → Won't Fix
Changed in pm-utils-powersave-policy (Ubuntu Lucid):
status: New → In Progress
assignee: nobody → Chase Douglas (chasedouglas)
Martin Pitt (pitti) on 2010-05-21
Changed in pm-utils-powersave-policy (Ubuntu Lucid):
status: In Progress → Fix Committed

Accepted pm-utils-powersave-policy into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed

While I'm all for a temporary workaround, this is, IIUC, a real kernel
bug. I don't want having my drive in full power mode while on battery
to be the final solution. :)

Chase Douglas (chasedouglas) wrote :

Kees,

Note that we still have the task open against "linux (ubuntu)". If we find a fix in the kernel we can consider re-enabling the power save policy for Lucid. For now, though, it seems prudent to disable the power save policy. We might find a fix for one SATA chipset, re-enable everything, and then find that another chipset has the same problem. I'd prefer to force that on users testing development versions rather than released versions.

Martin Pitt (pitti) wrote :

Chris, any chance to test the lucid-proposed package to see that it stops the HDD breakage?

red_hood (chris-red-hood) wrote :

I installed pm-utils-powersave-policy from lucid-proposed and can confirm that S-ATA link reset does not occur anymore.

Martin Pitt (pitti) on 2010-06-08
tags: added: verification-done
removed: verification-needed
Christian Reis (kiko) wrote :

Agreed; just verified that this version works around the problem:

  ii pm-utils-powersave-policy 0.3.1 lightweight power saving policy when on battery

Does an upstream bug need to be filed (or identified) in the kernel bugzilla?

Chase Douglas (chasedouglas) wrote :

There have been reports in bug 528981, which may be a dupe of this bug, that the issue seems resolved from 2.6.33 and onward. It would be great if anyone here could test out a mainline kernel with the current version of pm-utils-powersave-policy to verify that it has been fixed. Mainline kernels may be found at http://kernel.ubuntu.com/~kernel-ppa/mainline/.

Thanks

Please note that bug 528981 concerns a PATA controller. But yes, the problem has been fixed in 2.6.33+. The problem didn't manifest itself until sometime after 2.6.28, and by the time Karmic rolled around it was impossible to even complete an installation without corruption so severe that the installation failed. The default Lucid kernel is markedly better, but still results in corruption.

H.i.M (hir-i-mogul-gmail) wrote :

Could somebody take a look at this dmesg log?
I'd like to know if this bug is the problem that affects me.
If not, I would report a new bug.

------
[ 279.189053] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x4080000 action 0xe frozen
[ 279.189063] ata1.00: irq_stat 0x00000040, connection status changed
[ 279.189071] ata1: SError: { 10B8B DevExch }
[ 279.189079] ata1.00: failed command: READ FPDMA QUEUED
[ 279.189093] ata1.00: cmd 60/00:00:10:6b:53/01:00:05:00:00/40 tag 0 ncq 131072 in
[ 279.189096] res 40/00:20:70:1d:84/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
[ 279.189103] ata1.00: status: { DRDY }
[ 279.189113] ata1: hard resetting link
[ 280.430362] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 280.434063] ata1.00: configured for UDMA/133
[ 280.434088] ata1: EH complete
-----

Thanks H.i.M
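To compare a log like this with the original report, it can help to decode the SErr value bit by bit. A minimal sketch, assuming the bit-to-name mapping of libata's SERR_* definitions (the names match the "SError: { ... }" lines quoted throughout this report); `decode_serror` is a hypothetical helper name. For example, 0x4080000 from the log above sets 10B8B and DevExch, whereas the original report's 0x50000 sets PHYRdyChg and CommWake.

```shell
#!/bin/sh
# Decode a libata SErr value (e.g. "SErr 0x50000") into the short names the
# kernel prints on its "SError: { ... }" line.
# Assumption: bit positions follow the SATA SError register layout as used
# by libata; the names mirror those seen in the logs in this report.
decode_serror() {
    val=$(( $1 ))               # POSIX arithmetic accepts hex such as 0x4080000
    names=""
    for entry in \
        0:RecovData 1:RecovComm 8:UnrecovData 9:Persist 10:Proto 11:HostInt \
        16:PHYRdyChg 17:PHYInt 18:CommWake 19:10B8B 20:Dispar 21:BadCRC \
        22:Handshk 23:LinkSeq 24:TrStaTrns 25:UnrecFIS 26:DevExch
    do
        bit=${entry%%:*}        # text before the colon: bit number
        name=${entry#*:}        # text after the colon: kernel's short name
        if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "$names"
}
```

Usage: `decode_serror 0x150000` prints the three flags seen in Roman's log further down.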

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pm-utils-powersave-policy - 0.3.1

---------------
pm-utils-powersave-policy (0.3.1) lucid-proposed; urgency=low

  [ Chase Douglas ]
  * Remove sata link power save policy due to data corruption potential.
    -LP: #539467

  [ Martin Pitt ]
  * debian/control: Drop Vcs-Bzr: header, switching to auto-import branch.
 -- Chase Douglas <email address hidden> Fri, 21 May 2010 17:16:50 +0200

Changed in pm-utils-powersave-policy (Ubuntu Lucid):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Chase, should we also upload the p-u-p-p change to maverick, or do we expect a kernel-side solution for this?

Brian Rogers (brian-rogers) wrote :

According to comment 38, this should already be fixed in the kernel in maverick.

Martin Pitt (pitti) wrote :

Ah, closing the maverick linux task then; thanks for pointing that out.

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in pm-utils-powersave-policy (Ubuntu):
status: In Progress → Won't Fix

I can confirm this is still occurring in Maverick as of this morning. The kernel fix doesn't appear to be catching all instances of the problem.

Marco Cavallini (koansoftware) wrote :

Same error here.
I have upgraded this machine from 8.04 all the way to the latest 10.04.

[ 2860.060493] Restarting tasks ... done.
[ 2892.988087] ata3.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[ 2892.988213] ata3.00: failed command: READ FPDMA QUEUED
[ 2892.988307] ata3.00: cmd 60/70:00:63:a8:ba/00:00:08:00:00/40 tag 0 ncq 57344 in
[ 2892.988310] res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 2892.988539] ata3.00: status: { DRDY }
[ 2892.988603] ata3.00: failed command: READ FPDMA QUEUED
[ 2892.988695] ata3.00: cmd 60/48:08:93:b0:ba/00:00:08:00:00/40 tag 1 ncq 36864 in
[ 2892.988698] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2892.988926] ata3.00: status: { DRDY }
[ 2892.988995] ata3: hard resetting link
[ 2893.308073] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2893.311007] ata3.00: _GTF unexpected object type 0x1
[ 2893.317617] ata3.00: _GTF unexpected object type 0x1
[ 2893.321210] ata3.00: configured for UDMA/133
[ 2893.321220] ata3.00: device reported invalid CHS sector 0
[ 2893.321227] ata3.00: device reported invalid CHS sector 0
[ 2893.321243] ata3: EH complete

Acer TravelMate 6292
Ubuntu 10.04
/dev/sda5 on / type ext3 (rw,relatime,errors=remount-ro)

Most of the errors aren't visible with dmesg but rather on a text console.
Please help.
TIA

Christian Reis (kiko) wrote :

This actually seems not to be fixed in Maverick. I upgraded today, and looking through my dmesg I again see:
[ 2766.533670] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 2766.533675] ata3.00: irq_stat 0x40000008
[ 2766.533679] ata3.00: failed command: READ FPDMA QUEUED
[ 2766.533687] ata3.00: cmd 60/00:00:7a:f5:2a/01:00:0a:00:00/40 tag 0 ncq 131072 in
[ 2766.533689] res 40/00:00:7a:f5:2a/00:00:0a:00:00/40 Emask 0x401 (device error) <F>
[ 2766.533692] ata3.00: status: { DRDY }
[ 2766.534991] ata3.00: configured for UDMA/100
[ 2766.535001] ata3: EH complete
[ 2766.576800] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 2766.576806] ata3.00: irq_stat 0x40000008
[ 2766.576810] ata3.00: failed command: READ FPDMA QUEUED
[ 2766.576818] ata3.00: cmd 60/00:00:7a:f5:2a/01:00:0a:00:00/40 tag 0 ncq 131072 in
[ 2766.576819] res 40/00:00:7a:f5:2a/00:00:0a:00:00/40 Emask 0x401 (device error) <F>
[ 2766.576823] ata3.00: status: { DRDY }
[ 2766.578039] ata3.00: configured for UDMA/100
[ 2766.578052] ata3: EH complete
[ 2766.619225] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0

etc. Is this a separate issue, or the same problem occurring with 2.6.35? This is a ThinkPad X61s:

00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)

Andy Whitcroft (apw) wrote :

OK, it seems this is not fixed in Maverick. As Natty is now open, the development task applies to Natty, so I have nominated this bug for Maverick so that we can fix it there. I have also reopened the Natty tasks, as we do not know whether it is fixed there.

I suspect the expedient thing to do is pull forward the fix for pm-utils-powersave-policy to Maverick and fix it that way there. It would also be useful if someone who is affected by this issue could test a late Natty kernel and report back if that fixes things for Natty or not.

Changed in pm-utils-powersave-policy (Ubuntu):
status: Won't Fix → New
Changed in pm-utils (Ubuntu Natty):
importance: Undecided → High
tags: added: regression-release
removed: regression-potential
Roman Yepishev (rye) wrote :

Running Linux buzz 2.6.37-11-generic #25-Ubuntu SMP Tue Dec 21 23:42:56 UTC 2010 x86_64 GNU/Linux (latest natty kernel at the time of writing)
pm-utils:
  Installed: 1.4.1-4

After unplugging AC power, the pm-suspend log shows:
"""
/usr/lib/pm-utils/power.d/readahead true: success.
Running hook /usr/lib/pm-utils/power.d/sata_alpm true:
Setting SATA APLM on host2 to min_power...Done.
Setting SATA APLM on host3 to min_power...Done.
Setting SATA APLM on host4 to min_power...Done.
Setting SATA APLM on host5 to min_power...Done.
"""
Subsequent use of the system is rather difficult: the HDD LED stays on for prolonged periods, dmesg hangs for several seconds (I believe its binary is being read from disk), and then the following appears in its output:
"""
[13870.046027] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x150000 action 0x6 frozen
[13870.046044] ata3: SError: { PHYRdyChg CommWake Dispar }
[13870.046052] ata3.00: failed command: SET FEATURES
[13870.046070] ata3.00: cmd ef/05:fe:00:00:00/00:00:00:00:00/40 tag 0
[13870.046073] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[13870.046081] ata3.00: status: { DRDY }
[13870.046094] ata3: hard resetting link
[13870.390116] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[13870.693978] ata3.00: configured for UDMA/133
[13870.694151] ata3: EH complete
"""

I have disabled ALPM by placing
"""
#!/bin/sh
export SATA_ALPM_ENABLE=false
"""
in /etc/pm/config.d/00-no-sata-alpm, but that is just a workaround, and users will not be happy with it if things do not work out of the box.
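To check that such a workaround actually took effect (for example right after unplugging AC), one option is to list the current policy of every SCSI host; with ALPM disabled, each host should still report max_performance. A minimal sketch: `show_link_policies` is a hypothetical name, the sysfs path is the one discussed later in this thread, and the `SYSFS` variable is an assumption added only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print "<host> <policy>" for every SCSI host that exposes
# link_power_management_policy. SYSFS defaults to /sys; overriding it is
# only meant for trying the function on a fake directory tree.
show_link_policies() {
    sysfs=${SYSFS:-/sys}
    for f in "$sysfs"/class/scsi_host/host*/link_power_management_policy; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        host=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$host" "$(cat "$f")"
    done
}
```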

Andy Whitcroft (apw) on 2011-01-05
Changed in linux (Ubuntu Natty):
importance: Undecided → High
status: Fix Released → New
Stefan Bader (smb) wrote :

The safest thing going forward would be to set the default of SATA link power management to false. Of course, this also means we no longer see the problems or get further reports. As this is likely a problem that only happens on certain controllers, certain drives, or a combination of both, it would be nice to have a conditional quirking mechanism (or to have those cases fixed if possible). That, however, requires knowing what is broken and what is not.

So for those whose hardware is known to be broken in Natty (or current upstream), could you please add the output of "sudo lspci -vvvnn" (for the controller) and "sudo hdparm -i /dev/sd..." (for the drive) here? We also need to think of a way to summarize the information somewhere, so that nobody has to read through tons of comments each time...
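The requested details could be gathered in one go with something like the following. A minimal sketch: `collect_ahci_report` is a hypothetical name, it only wraps the `lspci -vvvnn` and `hdparm -i` invocations named above, and awk's paragraph mode (`RS = ""`) is used on the assumption that lspci separates devices with blank lines, so only the SATA controller's block is kept.

```shell
#!/bin/sh
# Gather kernel, controller and drive details into one block of text for a
# bug comment. Run as root: lspci -vvv and hdparm -i need it for full output.
collect_ahci_report() {
    echo "== uname -a =="
    uname -a
    echo "== SATA controllers (lspci -vvvnn) =="
    if command -v lspci >/dev/null 2>&1; then
        # Paragraph mode keeps each device's multi-line block together.
        lspci -vvvnn | awk 'BEGIN { RS = ""; ORS = "\n\n" } /SATA controller/'
    else
        echo "lspci not found"
    fi
    for dev in /dev/sd?; do
        [ -b "$dev" ] || continue        # skip if the glob matched nothing
        echo "== hdparm -i $dev =="
        if command -v hdparm >/dev/null 2>&1; then
            hdparm -i "$dev" || echo "hdparm -i $dev failed"
        else
            echo "hdparm not found"
        fi
    done
}
```

Usage: `sudo sh -c '. ./collect.sh; collect_ahci_report' > report.txt`, then attach report.txt (the file names here are only examples).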

voss749 (voss749) wrote :

I'm using Mint 10, which is based on Maverick, and I'm experiencing the same problem, so it was NOT fixed in Maverick.
It was also affecting my previous Debian Squeeze installation, so if you have a fix you might want to share it with the Debian folks. I'm hoping a Maverick fix will come soon; otherwise I'll go back to Mint 9, which is based on Lucid, and apply the Lucid fix.

Roman Yepishev (rye) wrote :

Here is the info for my machine.
lspci -vvvn

Roman Yepishev (rye) wrote :

hdparm -i /dev/sdb:

 Model=WDC WD2500BEVS-22UST0, FwRev=01.01A01, SerialNo=WD-WXE108A79290
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

Changed in linux:
status: Unknown → Confirmed
Martin Pitt (pitti) wrote :

Closing invalid tasks. In lucid the hook was in pm-utils-powersave-policy, in maverick onwards it got merged into pm-utils itself.

So we should disable /usr/lib/pm-utils/power.d/sata_alpm entirely by default?

Changed in pm-utils-powersave-policy (Ubuntu Maverick):
status: New → Invalid
Changed in pm-utils-powersave-policy (Ubuntu Natty):
status: New → Invalid
Changed in pm-utils (Ubuntu Lucid):
status: New → Invalid
Stefan Bader (smb) wrote :

Yes, I would vote for that, since we cannot say for sure which hardware is safe. Disk corruption is too serious a way to find out that some hardware is not safe.

Stefan Bader (smb) wrote :

And then it would be awesome if people having the problems could try out some recent mainline kernels (http://kernel.ubuntu.com/~kernel-ppa/mainline/) to see whether the problem may have been fixed in the meantime.

Martin Pitt (pitti) wrote :

Disabled by default in current Debian git packaging head now.

Changed in pm-utils (Ubuntu Natty):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pm-utils - 1.4.1-5

---------------
pm-utils (1.4.1-5) experimental; urgency=low

  * Add 13-49bluetooth-sync.patch: Wait for btusb module to get unused, so
    that you can remove it in SUSPEND_MODULES. (LP: #698331)
  * Add 14-disable-sata-alpm.patch: Disable SATA link power management by
    default, as it still causes disk errors and corruptions on many hardware.
    (LP: #539467)
 -- Martin Pitt <email address hidden> Tue, 01 Feb 2011 16:11:40 +0100

Changed in pm-utils (Ubuntu Natty):
status: Fix Committed → Fix Released
Changed in linux:
importance: Unknown → Medium
Jeremy Foshee (jeremyfoshee) wrote :

Marked kernel tasks incomplete pending responses to Stefan's comment #55.

~JFo

Changed in linux (Ubuntu Maverick):
status: New → Incomplete
Changed in linux (Ubuntu Natty):
status: New → Incomplete
Torsten Spindler (tspindler) wrote :

For testing purposes I run 2.6.38-020638rc5-generic #201102160907 from the source Stefan gave. This is on a Lucid system with pm-utils 1.3.0-1ubuntu3 and pm-utils-powersave-policy 0.3.1 installed. Do I need to change something there to make my testing valid?

Stefan Bader (smb) wrote :

As pm-utils now has the feature disabled, the test case would be:

ls -la /sys/block/sd?/device

Note that the link points to h:b:d:l, where h is the host number. Then change into

cd /sys/class/scsi_host/host?/

In there is a link_power_management_policy file, into which (as root) "min_power" and "max_performance" can be written. The latter should be the default. So one would switch from max_performance to min_power and back, then do some read operations involving the disk. In my case I seem to get

ata3: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
[46446.498522] ata3: irq_stat 0x00400000, PHY RDY changed
[46446.498538] ata3: SError: { PHYRdyChg CommWake }
[46446.498559] ata3: hard resetting link
[46447.220211] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[46447.222302] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[46447.242143] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[46447.242603] ata3.00: configured for UDMA/133
[46447.242623] ata3: EH complete

once in dmesg, without any visible effect on the read operations. In the bad case there would be repeated hard errors (involving timeouts), with errors reported to the read operation.
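The test case above could be scripted roughly as follows. A minimal sketch: `host_of_disk` and `set_link_policy` are hypothetical helper names, the accepted policy names are the three mentioned in this thread (medium_power comes up later), and the `SYSFS` variable is an assumption that exists only so the functions can be tried on a fake tree.

```shell
#!/bin/sh
# SYSFS defaults to /sys; overriding it is only for trying this on a scratch tree.

# Print the SCSI host (e.g. "host2") behind a disk such as "sda".
# /sys/block/<disk>/device links to a directory named h:b:t:l (written
# h:b:d:l above); the first field is the host number.
host_of_disk() {
    sysfs=${SYSFS:-/sys}
    target=$(readlink "$sysfs/block/$1/device") || return 1
    hbtl=$(basename "$target")
    echo "host${hbtl%%:*}"
}

# Write a link power management policy for one host (needs root on real sysfs).
set_link_policy() {
    sysfs=${SYSFS:-/sys}
    case "$2" in
        max_performance|medium_power|min_power) ;;
        *) echo "unknown policy: $2" >&2; return 1 ;;
    esac
    echo "$2" > "$sysfs/class/scsi_host/$1/link_power_management_policy"
}
```

For example, as root: `host=$(host_of_disk sda) && set_link_policy "$host" min_power`, then read from the disk, switch back with `set_link_policy "$host" max_performance`, and watch dmesg.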

Torsten Spindler (tspindler) wrote :

I've run the following test:

while true
do
    cat link_power_management_policy
    echo max_performance | sudo tee link_power_management_policy
    sleep 3
    echo min_power | sudo tee link_power_management_policy
    sleep 3
done

While this loop is running, I've created some artificial fs activity:

while true
do
    cat /boot/vmlinuz-2.6.38-020638rc5-generic > /dev/null
    sleep 1
done

And also:

for n in $(seq 0 9)
do
    sudo find /usr -type f -iname \*$n\* -exec cat {} > /dev/null \;
done

I have found no signs of "exception Emask" in kern.log; the test ran for more than 2 hours.
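A check like this can be made repeatable by counting the exception lines before and after a test run. A minimal sketch: `count_ata_exceptions` is a hypothetical name, and the default path /var/log/kern.log assumes Ubuntu's usual syslog layout.

```shell
#!/bin/sh
# Count libata "exception Emask" lines in a kernel log (default
# /var/log/kern.log), the same signature searched for above.
count_ata_exceptions() {
    log=${1:-/var/log/kern.log}
    # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
    grep -c 'exception Emask' "$log" || true
}
```

Usage: record `before=$(count_ata_exceptions)`, run the loops, then compare with `count_ata_exceptions` again; any increase means new link errors were logged.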

For completeness, uname -a:
Linux spitfire 2.6.38-020638rc5-generic #201102160907 SMP Wed Feb 16 10:18:56 UTC 2011 i686 GNU/Linux

Stefan Bader (smb) wrote :

For completeness, what disk controller and brand/type of disk did you test with? As said above, this highly depends on the hardware involved.

Stefan Bader (smb) wrote :

To heed my own advice: I do not see problems, even in 10.04, with:
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)
and WD1600BEVT-2.

Torsten Spindler (tspindler) wrote :

This one:

$ sudo lspci -vvnn | grep -i SATA
00:1f.2 SATA controller [0106]: Intel Corporation ICH9M/M-E SATA AHCI Controller [8086:2929] (rev 03) (prog-if 01)
 Capabilities: [a8] SATA HBA <?>

Chow Loong Jin (hyperair) wrote :

On Thursday 17,February,2011 09:59 PM, Torsten Spindler wrote:
> This one:
>
> $ sudo lspci -vvnn | grep -i SATA
> 00:1f.2 SATA controller [0106]: Intel Corporation ICH9M/M-E SATA AHCI
> Controller [8086:2929] (rev 03) (prog-if 01)
> Capabilities: [a8] SATA HBA <?>
>

No issues with this one:

00:1f.2 SATA controller [0106]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller [8086:2829] (rev 03) (prog-if 01 [AHCI 1.0])

with a WDC_WD5000BEVT-80A0RT0 disk.

However, the controller appears as a SATA-IDE controller (8086:2828) that uses the piix driver by default, and you need a custom kernel (AFAICT) to quirk the 8086:2828 controller into AHCI mode, whereby it turns into 8086:2829 and uses the ahci driver, which allows SATA link power management.

--
Kind regards,
Loong Jin

red_hood (chris-red-hood) wrote :

My hardware details:

Lenovo T61, 7664-18G with BIOS Version 2.26

$ lspci -vvnn |grep -i sata
00:1f.2 SATA controller [0106]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller [8086:2829] (rev 03) (prog-if 01 [AHCI 1.0])

Hard drive is:

$ smartctl -a /dev/sda

Model Family: Western Digital Scorpio Blue Serial ATA family
Device Model: WDC WD5000BEVT-00ZAT0
Firmware Version: 01.01A01

Kernel is:

$ uname -a
Linux redlap 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux

When I do

$ echo "min_power" > /sys/class/scsi_host/host2/link_power_management_policy

there is no message from the kernel.

Switching back to

$ echo "max_performance" > /sys/class/scsi_host/host2/link_power_management_policy

the following messages appear:

[47694.108618] ata3: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
[47694.108626] ata3: irq_stat 0x00400000, PHY RDY changed
[47694.108635] ata3: SError: { PHYRdyChg CommWake }
[47694.108646] ata3: hard resetting link
[47694.850173] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[47694.853238] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[47694.853250] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[47694.853259] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[47694.860000] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[47694.860051] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[47694.860062] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[47694.862814] ata3.00: configured for UDMA/133
[47694.868797] ata3.00: configured for UDMA/133
[47694.868804] ata3: EH complete

It seems the behavior in Maverick is still the same as it was in Lucid.

The original hard drive delivered with the notebook was from Hitachi, so maybe there is some kind of incompatibility between the controller/BIOS and the WD drive. If I find some time, I'll insert the original one and repeat the test.

Roman Yepishev (rye) wrote :

uname -r: 2.6.38-020638rc5-generic

Model Family: Western Digital Scorpio Blue Serial ATA family
Device Model: WDC WD2500BEVS-22UST0
Serial Number: WD-WXE108A79290
Firmware Version: 01.01A01

See also comment #51 for my lspci

In my case the behavior is the same as with older kernels. However, I changed the link power management policy on all hosts with the following script:

#!/bin/sh
# Write the policy given as $1 (e.g. min_power) to every SCSI host.
for X in /sys/class/scsi_host/*/link_power_management_policy; do
    echo "$1" > "$X"
done

used as: sudo ./sata-link-policy min_power

Here's what I get: the drive LED stays lit for about 10 seconds, during which any HDD access results in a hanging process:
[12348.040077] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x150000 action 0x6 frozen
[12348.040086] ata3: SError: { PHYRdyChg CommWake Dispar }
[12348.040091] ata3.00: failed command: READ FPDMA QUEUED
[12348.040099] ata3.00: cmd 60/10:00:b0:94:c5/00:00:03:00:00/40 tag 0 ncq 8192 in
[12348.040101] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[12348.040104] ata3.00: status: { DRDY }
[12348.040112] ata3: hard resetting link
[12348.390082] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[12348.404414] ata3.00: configured for UDMA/133
[12348.404550] ata3.00: device reported invalid CHS sector 0
[12348.404570] ata3: EH complete

Cabalbl4 (i-vohmin) wrote :

Acer ASPIRE 3820 TG: unplugging the power gives this in dmesg:

[ 857.224924] EXT4-fs (sda3): re-mounted. Opts: errors=remount-ro,commit=600
[ 869.230484] EXT4-fs (sda3): re-mounted. Opts: errors=remount-ro,commit=0
[ 869.416988] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
[ 869.416991] ata1: irq_stat 0x00400000, PHY RDY changed
[ 869.416993] ata1: SError: { PHYRdyChg CommWake }
[ 869.416998] ata1: hard resetting link
[ 870.170801] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 870.175479] ata1.00: configured for UDMA/133
[ 870.175487] ata1: EH complete

This can be avoided by switching the SATA mode from AHCI to IDE in the BIOS.
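Whether a controller currently runs in AHCI or IDE mode (and therefore whether the ahci driver, which offers link power management, is in use) can be read from sysfs. A minimal sketch, assuming the PCI class codes 0x0106xx for SATA/AHCI and 0x0101xx for IDE, as in the "[0106]" shown by lspci above; `show_storage_drivers` is a hypothetical name, and in IDE mode the bound driver would typically be something like ata_piix (the piix driver mentioned earlier).

```shell
#!/bin/sh
# List IDE/SATA mass-storage controllers as "<pci-address> <class> <driver>".
# SYSFS defaults to /sys; overriding it is only for testing on a fake tree.
show_storage_drivers() {
    sysfs=${SYSFS:-/sys}
    for dev in "$sysfs"/bus/pci/devices/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case "$class" in
            0x0101*|0x0106*) ;;          # IDE or SATA controller
            *) continue ;;
        esac
        if [ -L "$dev/driver" ]; then
            driver=$(basename "$(readlink "$dev/driver")")
        else
            driver="(none)"
        fi
        printf '%s %s %s\n' "$(basename "$dev")" "$class" "$driver"
    done
}
```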

Cabalbl4 (i-vohmin) wrote :

The distro is Maverick
uname -a
Linux hagel-Aspire-3820 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux

Cabalbl4 (i-vohmin) wrote :

sudo lspci -vvnn | grep -i SATA
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA AHCI Controller [8086:3b29] (rev 05) (prog-if 01 [AHCI 1.0])
 Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004

Stefan Bader (smb) wrote :

@Roman, there was a question from upstream whether using "medium_power" instead of "min_power" avoids the problems. Could you give that a try? Thanks.

Roman Yepishev (rye) wrote :

@Stefan, I set the policy to "medium_power" and there is actually no difference in machine behavior at all, everything is working as expected!
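To try medium_power the same way on all hosts, Roman's loop could be extended to read each value back. A minimal sketch: `apply_link_policy_all` and the `SYSFS` override are hypothetical, and the default of medium_power only reflects the result reported above, not a general recommendation.

```shell
#!/bin/sh
# Apply one policy (default medium_power) to every SCSI host and report any
# host whose value reads back differently, e.g. because the write was rejected.
apply_link_policy_all() {
    policy=${1:-medium_power}
    sysfs=${SYSFS:-/sys}
    status=0
    for f in "$sysfs"/class/scsi_host/host*/link_power_management_policy; do
        [ -w "$f" ] || continue          # glob matched nothing, or not writable
        echo "$policy" > "$f" 2>/dev/null
        if [ "$(cat "$f")" != "$policy" ]; then
            echo "$(basename "$(dirname "$f")"): policy is $(cat "$f")" >&2
            status=1
        fi
    done
    return "$status"
}
```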

Changed in linux (Ubuntu Natty):
status: Incomplete → Triaged
Changed in pm-utils-powersave-policy (Ubuntu Lucid):
assignee: Chase Douglas (chasedouglas) → nobody
Changed in pm-utils-powersave-policy (Ubuntu Natty):
assignee: Chase Douglas (chasedouglas) → nobody
Andy Whitcroft (apw) on 2011-02-24
Changed in linux (Ubuntu Maverick):
status: Incomplete → Triaged
Stefan Bader (smb) wrote :

There is a proposed change from upstream to prevent MCP65 (Nvidia) based controllers from causing problems with minimum power settings. Roman, and everyone on Maverick or Natty with that type of AHCI controller: I created Natty (2.6.38) based kernels with that patch applied. It should be possible to use those packages on Maverick, too. If anybody can, please install one of the kernels at http://people.canonical.com/~smb/lp539467/, boot it, try setting the minimum power policy, and report back here with the results. Thanks.

Roman Yepishev (rye) wrote :

Stefan, confirming that, on the kernel you've built, setting the link policy to min_power does not cause the system to misbehave.
Linux buzz 2.6.38-6-generic #34+lp539467 SMP Fri Mar 11 14:00:36 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

rkrizan (ryankrizan) wrote :

Is this bug limited to the 64-bit kernel or is 32-bit kernel affected as well?

On 03/22/2011 10:15 PM, rkrizan wrote:
> Is this bug limited to the 64-bit kernel or is 32-bit kernel affected as
> well?
>
It depends on your AHCI controller. Whether you use 64-bit or 32-bit does not matter.

rkrizan (ryankrizan) wrote :

What's the overall status on this issue? Still actively being investigated, or is there a fix that I've missed?

capone:/tmp# lspci -vvvnn | grep -i sata
00:1f.2 SATA controller [0106]: Intel Corporation ICH9M/M-E SATA AHCI Controller [8086:2929] (rev 03) (prog-if 01 [AHCI 1.0])
 Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
capone:/tmp#

capone:/tmp# hdparm -i /dev/sda1

/dev/sda1:

 Model=WDC WD3200BEVT-75ZCT2, FwRev=11.01A11, SerialNo=WD-WXEY08JR7234
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=8
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode

capone:/tmp#

Stefan Bader (smb) wrote :

As issues with link power management are so controller specific, the solutions usually only fix one type. The latest upstream changes added a framework to prevent the bad effects, but only for an Nvidia controller. I would strongly suggest opening individual bug reports for each controller.
When doing so, could you please add the exact symptom (for example, whether the fs gets remounted read-only or whether there are only error messages)? Also try writing medium_power to the policy as described above. Does that work OK?
And finally (if you are not already using a Natty kernel), get the 2.6.38 mainline kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/ to see whether that controller works by now. Generally, when I asked about this upstream, I got the impression that ICH controllers were supposed to work. Of course there are always exceptions...

Changed in linux (Ubuntu Lucid):
importance: Undecided → Low
Changed in linux (Ubuntu Maverick):
importance: Undecided → Medium
Changed in linux (Ubuntu Natty):
milestone: none → natty-updates
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in pm-utils (Ubuntu Maverick):
status: New → Confirmed
Changed in linux:
status: Confirmed → Expired
Stefan Bader (smb) on 2013-02-27
Changed in linux (Ubuntu Natty):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
milestone: natty-updates → none
Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
milestone: natty-updates → none
dino99 (9d9) wrote :
Changed in pm-utils (Ubuntu Maverick):
status: Confirmed → Invalid
Changed in linux (Ubuntu Natty):
status: Triaged → Invalid
Changed in linux (Ubuntu Maverick):
status: Triaged → Invalid
Changed in linux (Ubuntu):
status: Triaged → Invalid

I think I see this with Ubuntu Saucy (kernel 3.11.0-12-generic, 64-bit) on a ThinkPad T520. I see it even when running on external power with 'link_power_management_policy' set to 'max_performance':

[23391.818508] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[23391.818513] ata4: irq_stat 0x00000040, connection status changed
[23391.818516] ata4: SError: { CommWake DevExch }
[23391.818523] ata4: limiting SATA link speed to 1.5 Gbps
[23391.818525] ata4: hard resetting link
[23392.538993] ata4: SATA link down (SStatus 0 SControl 310)
[23392.554893] ata4: EH complete

nanog (sorenimpey) wrote :

I am getting this in freshly upgraded Trusty Tahr.

It precisely mirrors the previously reported bug in every respect: I always get the following kernel disk error when I attempt to resume after plugging the power cord back in while running on battery power:

Mar 29 11:01:20 lenovo kernel: [ 51.906860] ideapad_laptop: Unknown event: 1
Mar 29 11:01:20 lenovo kernel: [ 52.029487] ata2.00: exception Emask 0x10 SAct 0x1 SErr 0x50000 action 0xe frozen
Mar 29 11:01:20 lenovo kernel: [ 52.029493] ata2.00: irq_stat 0x00400000, PHY RDY changed
Mar 29 11:01:20 lenovo kernel: [ 52.029496] ata2: SError: { PHYRdyChg CommWake }
Mar 29 11:01:20 lenovo kernel: [ 52.029499] ata2.00: failed command: READ FPDMA QUEUED
Mar 29 11:01:20 lenovo kernel: [ 52.029505] ata2.00: cmd 60/00:00:20:88:00/01:00:0f:00:00/40 tag 0 ncq 131072 in
Mar 29 11:01:20 lenovo kernel: [ 52.029505] res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
Mar 29 11:01:20 lenovo kernel: [ 52.029507] ata2.00: status: { DRDY }

A new bug was filed but I think this bug should be re-opened and my report marked as a duplicate.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1299567
