Bug #539467 “SATA link power management causes disk errors and c...” : Bugs : linux package : Ubuntu

Chris Coulson (chrisccoulson) wrote on 2010-03-16:

#1

AlsaDevices.txt (760 bytes, text/plain; charset="utf-8")
AplayDevices.txt (389 bytes, text/plain; charset="utf-8")
ArecordDevices.txt (285 bytes, text/plain; charset="utf-8")
BootDmesg.txt (49.9 KiB, text/plain; charset="utf-8")
Card0.Amixer.values.txt (2.0 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec.0.txt (7.5 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec.2.txt (982 bytes, text/plain; charset="utf-8")
CurrentDmesg.txt (5.3 KiB, text/plain; charset="utf-8")
Dependencies.txt (1.3 KiB, text/plain; charset="utf-8")
IwConfig.txt (631 bytes, text/plain; charset="utf-8")
Lspci.txt (14.5 KiB, text/plain; charset="utf-8")
Lsusb.txt (1.2 KiB, text/plain; charset="utf-8")
PciMultimedia.txt (586 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (1.5 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt (1.6 KiB, text/plain; charset="utf-8")
ProcModules.txt (6.1 KiB, text/plain; charset="utf-8")
RfKill.txt (240 bytes, text/plain; charset="utf-8")
UdevDb.txt (145.9 KiB, text/plain; charset="utf-8")
UdevLog.txt (344.1 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (100.2 KiB, text/plain; charset="utf-8")

Torsten Spindler (tspindler) wrote on 2010-03-17:

#2

I'm also affected by this, running kernel 2.6.32-16-generic #25-Ubuntu SMP Tue Mar 9 16:33:52 UTC 2010 i686 GNU/Linux
$ dpkg -l linux-image
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii linux-image 2.6.32.16.17 Generic Linux kernel image.

Jeremy Foshee (jeremyfoshee) wrote on 2010-03-17:

#3

Chris,
I'm adding this to my list to be reviewed by the team.

Thanks!

~JFo

Changed in linux (Ubuntu):
status:	New → Triaged

Surbhi Palande (csurbhi) wrote on 2010-03-22:

#4

Chris Coulson, the description you posted seems to be part of a dmesg log. Would it be possible to post the full dmesg output? That would be helpful!

Surbhi Palande (csurbhi) wrote on 2010-03-22:

#5

Chris Coulson, can you also try booting the lucid kernel with the acpi=off kernel parameter and check whether you get the same errors?

Christian Reis (kiko) wrote on 2010-03-30:

#6

I picked up 2.6.31-02063112-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and have only rebooted once, but the problem hasn't manifested itself yet, whereas with every other kernel it manifested itself upon boot.

Christian Reis (kiko) wrote on 2010-03-30:

#7

A dmesg dump of working 2.6.31-02063112-generic (49.7 KiB, text/plain)

Stefan Bader (smb) wrote on 2010-03-30:

#8

Chris, as I realized later, Christian's dmesg does not actually help that much. So could you get a working kernel booted and post the dmesg from that? Thanks.

Stefan Bader (smb) on 2010-03-30

Changed in linux (Ubuntu):
assignee:	nobody → Stefan Bader (stefan-bader-canonical)

Crashbit (crashbit-gmail) wrote on 2010-03-31:

#9

dmesg.txt (79.2 KiB, text/plain)

I have a similar problem.

I have a new Asus P7P55D-E LX mainboard with two SATA controllers, a JMicron and a Marvell 9123.
I connected a Seagate Barracuda XT 6Gb/s hard drive to the Marvell controller and an SSD to the JMicron controller.

The SSD works perfectly, but the Barracuda doesn't work properly.

I tried the 2.6.32 and 2.6.33 kernels, and also connected the Barracuda to the JMicron controller, but the problems persist.
With a 2.6.34-rc2 kernel, the problems are solved.

The problems appear when I try to access the Barracuda hard disk.

I don't know whether this is the same problem, or whether it is caused by the AHCI controller, but these are my dmesg errors:
[    9.205423]    groups: 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178)
[   11.491620] ata9: exception Emask 0x0 SAct 0xe SErr 0x0 action 0x10 frozen
[   11.491623] ata9.00: failed command: READ FPDMA QUEUED
[   11.491627] ata9.00: cmd 60/02:08:81:74:e0/00:00:e8:00:00/40 tag 1 ncq 1024 in
[   11.491628]          res 40/00:18:85:74:e0/00:00:e8:00:00/40 Emask 0x4 (timeout)
[   11.491629] ata9.00: status: { DRDY }
[   11.491631] ata9.00: failed command: READ FPDMA QUEUED
[   11.491634] ata9.00: cmd 60/02:10:83:74:e0/00:00:e8:00:00/40 tag 2 ncq 1024 in
[   11.491634]          res 40/00:18:85:74:e0/00:00:e8:00:00/40 Emask 0x4 (timeout)
[   11.491636] ata9.00: status: { DRDY }
[   11.491637] ata9.00: failed command: READ FPDMA QUEUED
[   11.491640] ata9.00: cmd 60/02:18:85:74:e0/00:00:e8:00:00/40 tag 3 ncq 1024 in
[   11.491641]          res 40/00:18:85:74:e0/00:00:e8:00:00/40 Emask 0x4 (timeout)
[   11.491642] ata9.00: status: { DRDY }
[   11.491650] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   11.491652] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[   11.491654] Descriptor sense data with sense descriptors (in hex):
[   11.491655]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[   11.491660]         e8 e0 74 85
[   11.491661] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[   11.491663] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 e8 e0 74 81 00 00 02 00
[   11.491668] end_request: I/O error, dev sdb, sector 3907024001
[   11.491678] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   11.491680] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[   11.491682] Descriptor sense data with sense descriptors (in hex):
[   11.491683]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[   11.491687]         e8 e0 74 85
[   11.491689] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[   11.491691] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 e8 e0 74 83 00 00 02 00
[   11.491694] end_request: I/O error, dev sdb, sector 3907024003
[   11.491697] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   11.491699] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[   11.491701] Descriptor sense data with sense descriptors (in hex):
[   11.491702]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[   11.491706]         e8 e0 74 85
[   11.491708] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[   11.491709] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 e8 e0 74 85 00 00 02 00
[   11.491713] end_request: I/O error, dev sdb, sector 3907024005
[   18.491162] sd 8:0:0:0: timing out command, waited 7s
[   25.489905] sd 8:0:0:0: timing out command, waited 7s
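
As a cross-check of the log above, the `CDB: Read(10)` bytes can be decoded by hand. In a SCSI READ(10) command, byte 0 is the opcode (0x28), bytes 2-5 hold the big-endian logical block address, and bytes 7-8 hold the transfer length in blocks. The sketch below (the function name `decode_read10` is made up for illustration) suggests that the first failing CDB points at the same sector the kernel reports in the following `end_request` line:

```python
from typing import Tuple


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    # CDB copied from the first I/O error in the dmesg excerpt above.
    print(decode_read10("28 00 e8 e0 74 81 00 00 02 00"))  # (3907024001, 2)
```

The decoded address 3907024001 matches `end_request: I/O error, dev sdb, sector 3907024001`, so the three errors cover six consecutive sectors read two at a time.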

Wolfgang Rohdewald (wolfgang-rohdewald) wrote on 2010-04-11:

#10

This happens here too, always when I am away. Maybe some problem with suspending?

Kubuntu Lucid, all updates installed
Dell Studio 1749 (Intel i520)
Disk: TOSHIBA MK5056GSY

Apr 11 18:57:22 localhost kernel: [ 1852.336426] CPU0 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.336430] CPU1 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.336432] CPU2 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.336434] CPU3 attaching NULL sched-domain.
Apr 11 18:57:22 localhost kernel: [ 1852.415822] ata5: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Apr 11 18:57:22 localhost kernel: [ 1852.415827] ata5: irq_stat 0x00400000, PHY RDY changed
Apr 11 18:57:22 localhost kernel: [ 1852.415831] ata5: SError: { PHYRdyChg CommWake }
Apr 11 18:57:22 localhost kernel: [ 1852.415837] ata5: hard resetting link
Apr 11 18:57:22 localhost kernel: [ 1852.429190] CPU0 attaching sched-domain:
Apr 11 18:57:22 localhost kernel: [ 1852.429193]  domain 0: span 0,2 level SIBLING
Apr 11 18:57:22 localhost kernel: [ 1852.429195]   groups: 0 (cpu_power = 589) 2 (cpu_power = 589)
Apr 11 18:57:22 localhost kernel: [ 1852.429199]   domain 1: span 0-3 level MC
Apr 11 18:57:22 localhost kernel: [ 1852.429201]    groups: 0,2 (cpu_power = 1178) 1,3 (cpu_power = 1178)

another one:
Apr 11 14:11:19 localhost kernel: [ 3518.250989] CPU3 attaching sched-domain:
Apr 11 14:11:19 localhost kernel: [ 3518.250990]  domain 0: span 1,3 level SIBLING
Apr 11 14:11:19 localhost kernel: [ 3518.250992]   groups: 3 (cpu_power = 589) 1 (cpu_power = 589)
Apr 11 14:11:19 localhost kernel: [ 3518.250995]   domain 1: span 0-3 level MC
Apr 11 14:11:19 localhost kernel: [ 3518.250997]    groups: 1,3 (cpu_power = 1178) 0,2 (cpu_power = 1178)
Apr 11 14:11:19 localhost kernel: [ 3518.259013] ata5: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Apr 11 14:11:19 localhost kernel: [ 3518.259018] ata5: irq_stat 0x00400000, PHY RDY changed
Apr 11 14:11:19 localhost kernel: [ 3518.259022] ata5: SError: { PHYRdyChg CommWake }
Apr 11 14:11:19 localhost kernel: [ 3518.259028] ata5: hard resetting link
Apr 11 14:11:19 localhost kernel: [ 3519.000927] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 11 14:11:19 localhost kernel: [ 3519.030125] ata5.00: configured for UDMA/100
Apr 11 14:11:19 localhost kernel: [ 3519.031779] ata5: EH complete
Apr  4 22:53:40 localhost kernel: [    3.288635] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr  4 22:53:40 localhost kernel: [    3.303600] ata5.00: ATAPI: TSSTcorp DVD+/-RW TS-T633C, D700, max UDMA/100, ATAPI AN
Apr  4 22:53:40 localhost kernel: [    3.303614] ata5.00: applying bridge limits
Apr  4 22:53:40 localhost kernel: [    3.318833] ata5.00: configured for UDMA/100
Apr  4 22:53:40 localhost kernel: [    3.338543] ata5: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
Apr  4 22:53:40 localhost kernel: [    3.338547] ata5: irq_stat 0x40000008
Apr  4 22:53:40 localhost kernel: [    3.340653] scsi 4:0:0:0: CD-ROM            TSSTcorp DVD+-RW TS-T633C D700 PQ: 0 ANSI: 5
Apr  4 22:53:40 localhost kernel: [    3.346733] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray

and:
Apr  7 21:36:56 localhost kernel: [ 6348.521788] CPU0 attaching NULL sched-domain.
Apr  7 21:36:56 localhost kernel: [ 6348.521794] CPU1 attaching NULL sched-domain.
Apr  7 21:36:56 localhost kernel: [ 6348.521797] CPU2 attaching NULL sched-domain.
Apr  7 21:36:56 localhost kernel: [ 6348.521800] CPU3 attaching NULL sched-domain.
Apr  7 21:36:56 localhost kernel: [ 6348.544835] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Apr  7 21:36:56 localhost kernel: [ 6348.544840] ata1.00: irq_stat 0x00400000, PHY RDY changed
Apr  7 21:36:56 localhost kernel: [ 6348.544844] ata1: SError: { PHYRdyChg CommWake }
Apr  7 21:36:56 localhost kernel: [ 6348.544847] ata1.00: failed command: FLUSH CACHE EXT
Apr  7 21:36:56 localhost kernel: [ 6348.544853] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Apr  7 21:36:56 localhost kernel: [ 6348.544855]          res 50/00:80:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr  7 21:36:56 localhost kernel: [ 6348.544858] ata1.00: status: { DRDY }
Apr  7 21:36:56 localhost kernel: [ 6348.544864] ata1: hard resetting link
Apr  7 21:36:56 localhost kernel: [ 6348.600788] ata5: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Apr  7 21:36:56 localhost kernel: [ 6348.600792] ata5: irq_stat 0x00400000, PHY RDY changed
Apr  7 21:36:56 localhost kernel: [ 6348.600795] ata5: SError: { PHYRdyChg CommWake }
Apr  7 21:36:56 localhost kernel: [ 6348.600802] ata5: hard resetting link
Apr  7 21:36:56 localhost kernel: [ 6348.604140] CPU0 attaching sched-domain:
Apr  7 21:36:56 localhost kernel: [ 6348.604145]  domain 0: span 0,2 level SIBLING
Apr  7 21:36:56 localhost kernel: [ 6348.604148]   groups: 0 (cpu_power = 589) 2 (cpu_power = 589)
Apr  7 21:36:56 localhost kernel: [ 6348.604154]   domain 1: span 0-3 level MC
Apr  7 21:36:56 localhost kernel: [ 6348.604157]    groups: 0,2 (cpu_power = 1178) 1,3 (cpu_power = 1178)

Chase Douglas (chasedouglas) wrote on 2010-04-15:

#11

I hit the same issue:

[367890.287614] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x1e frozen
[367890.287623] ata1: irq_stat 0x00400001, PHY RDY changed
[367890.287632] ata1: SError: { PHYRdyChg CommWake }
[367890.287644] ata1: hard resetting link
[367891.008114] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[367891.053065] ata1.00: configured for UDMA/133
[367891.068116] ata1: EH complete
[367891.088172] end_request: I/O error, dev sda, sector 239506656
[367891.088213] Aborting journal on device sda1-8.
[367891.092737] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[367891.092751] EXT4-fs (sda1): Remounting filesystem read-only

Christian Reis (kiko) wrote on 2010-04-15:

#12

Chase, are you running the latest 32-21 kernel? I just upgraded and have yet to reproduce the error message in my kernel log when booting up or suspending. I was on 32-19 before, and I could definitely see the errors (and resulting corruption) there. I'd move to mark this fixed if it stays this way on -21 for me and others.

Chase Douglas (chasedouglas) wrote on 2010-04-16:

#13

@Christian:

I'm running -19, I haven't had a chance to upgrade yet. I've only seen this occur once after a few weeks of use, so I won't be able to definitively say this bug is fixed if I can't reproduce easily on -21. I hope it does resolve this issue though.

Thanks

Chris Halse Rogers (raof) wrote on 2010-04-16:

#14

It looks like I've just reproduced this on my Thinkpad x200s with the 2.6.32-21-generic kernel:

[ 5911.516865] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x1e frozen
[ 5911.516871] ata1: irq_stat 0x00400001, PHY RDY changed
[ 5911.516875] ata1: SError: { PHYRdyChg CommWake }
[ 5911.516881] ata1: hard resetting link
[ 5912.280041] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
(fuller dmesg here: http://paste.ubuntu.com/415384/ )

Stefan Bader (smb) wrote on 2010-04-16: Re: [Bug 539467] Re: Frequent ATA errors and disk corruption

#15

Chris Halse Rogers wrote:
> It looks like I've just reproduced this on my Thinkpad x200s with the
> 2.6.32-21-generic kernel:
>
> [ 5911.516865] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x1e frozen
> [ 5911.516871] ata1: irq_stat 0x00400001, PHY RDY changed
> [ 5911.516875] ata1: SError: { PHYRdyChg CommWake }
> [ 5911.516881] ata1: hard resetting link
> [ 5912.280041] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> (fuller dmesg here: http://paste.ubuntu.com/415384/ )
>
Do we have a somewhat more in depth description on what might be a reproducer?
What was going on at the time this happened? A higher level of I/O load or a
certain application running?

deadite66 (lee295012) wrote on 2010-04-18: Re: Frequent ATA errors and disk corruption

#16

I have the same problem, but it is constant, to the point that I can't even install Lucid most of the time. Windows 7 works perfectly, so it's not a hardware problem. I have seen this bug on various hardware and Ubuntu versions, so I'm thinking that Debian/Ubuntu has a problem with the way drive access is set up in the kernel. Fedora 12 doesn't have this problem.

[ 776.594450] ata7.00: status: { DRDY }
[ 776.594454] ata7: hard resetting link
[ 777.123108] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 777.125321] ata7.00: configured for UDMA/33
[ 777.125328] ata7: EH complete
[ 777.151827] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x280100 action 0x6 frozen
[ 777.151830] ata7.00: irq_stat 0x08000000, interface fatal error
[ 777.151833] ata7: SError: { UnrecovData 10B8B BadCRC }
[ 777.151836] ata7.00: failed command: READ FPDMA QUEUED
[ 777.151841] ata7.00: cmd 60/08:00:bf:28:54/00:00:02:00:00/40 tag 0 ncq 4096 in
[ 777.151842] res 40/00:00:bf:28:54/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[ 777.151844] ata7.00: status: { DRDY }
[ 777.151847] ata7: hard resetting link
[ 777.682849] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 777.685059] ata7.00: configured for UDMA/33
[ 777.685065] ata7: EH complete
[ 777.798069] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x280100 action 0x6 frozen
[ 777.798072] ata7.00: irq_stat 0x08000000, interface fatal error
[ 777.798075] ata7: SError: { UnrecovData 10B8B BadCRC }
[ 777.798078] ata7.00: failed command: READ FPDMA QUEUED
[ 777.798083] ata7.00: cmd 60/08:00:39:4b:38/00:00:3a:00:00/40 tag 0 ncq 4096 in
[ 777.798084] res 40/00:00:39:4b:38/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
[ 777.798086] ata7.00: status: { DRDY }
[ 777.798089] ata7: hard resetting link
[ 778.322554] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 778.324640] ata7.00: configured for UDMA/33
[ 778.324648] ata7: EH complete
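
For anyone comparing the different logs in this report: the names printed inside `SError: { ... }` correspond to the bits set in the hexadecimal `SErr` value on the exception line. The sketch below is a partial decoder that only knows the five bits consistent with the pairs quoted in this report (`0x50000` with `PHYRdyChg CommWake`, `0x280100` with `UnrecovData 10B8B BadCRC`). The table and the function name are illustrative assumptions, not the full register layout, and any other bit is reported by number rather than guessed.

```python
# Bit positions consistent with the SErr/SError pairs quoted in this report.
# Deliberately partial: this is not the complete SError register layout.
SERR_BITS = {
    8: "UnrecovData",
    16: "PHYRdyChg",
    18: "CommWake",
    19: "10B8B",
    21: "BadCRC",
}


def decode_serr(value: int) -> list:
    """Return the names of the bits set in an SErr value, lowest bit first."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Unknown bits are shown as "bitN" instead of being guessed.
            names.append(SERR_BITS.get(bit, "bit%d" % bit))
    return names


if __name__ == "__main__":
    print(decode_serr(0x50000))   # ['PHYRdyChg', 'CommWake']
    print(decode_serr(0x280100))  # ['UnrecovData', '10B8B', 'BadCRC']
```

Seen this way, the log above (`0x280100`, CRC and encoding errors) shows a different set of bits from the `0x50000` PHY-ready-change events in the other comments.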

Chris Halse Rogers (raof) wrote on 2010-04-19: Re: [Bug 539467] Re: Frequent ATA errors and disk corruption

#17

I've got no idea how to reproduce this. For me, it only occurs on the
order of once or twice a month. It's difficult to remember what was
happening across those times.

I seem to have a slight association in my mind with this happening after
at least one resume-from-suspend, but since I suspend this laptop all
the time, there shouldn't be too much stock put in this.

red_hood (chris-red-hood) wrote on 2010-04-21: Re: Frequent ATA errors and disk corruption

#18

Same here on Thinkpad T61:

[22359.500669] ata3.00: exception Emask 0x10 SAct 0x3 SErr 0x50000 action 0xe frozen
[22359.500675] ata3.00: irq_stat 0x00400000, PHY RDY changed
[22359.500680] ata3: SError: { PHYRdyChg CommWake }
[22359.500685] ata3.00: failed command: READ FPDMA QUEUED
[22359.500694] ata3.00: cmd 60/20:00:08:7f:5e/00:00:00:00:00/40 tag 0 ncq 16384 in
[22359.500697]          res 40/00:0c:40:7f:5e/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[22359.500701] ata3.00: status: { DRDY }
[22359.500705] ata3.00: failed command: READ FPDMA QUEUED
[22359.500714] ata3.00: cmd 60/48:08:40:7f:5e/00:00:00:00:00/40 tag 1 ncq 36864 in
[22359.500716]          res 40/00:0c:40:7f:5e/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[22359.500720] ata3.00: status: { DRDY }
[22359.500728] ata3: hard resetting link
[22360.250117] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[22360.252801] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[22360.252812] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[22360.252821] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[22360.256923] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[22360.256934] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[22360.256943] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[22360.258711] ata3.00: configured for UDMA/133
[22360.263591] ata3.00: configured for UDMA/133
[22360.263618] ata3: EH complete

It seems to happen just before it's going to suspend:

[22364.620394] PM: Syncing filesystems ... done.
[22364.890139] PM: Preparing system for mem sleep
[22364.890145] Freezing user space processes ... (elapsed 0.00 seconds) done.
[22364.891281] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
[22364.891353] PM: Entering mem sleep
[22364.891367] Suspending console(s) (use no_console_suspend to debug)
[22365.112468] PM: suspend of drv:psmouse dev:serio2 complete after 220.652 msecs
[22365.230163] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[22365.230346] sd 2:0:0:0: [sda] Stopping disk
[22365.808260] PM: suspend of drv:sd dev:2:0:0:0 complete after 578.100 msecs
[22366.183944] PM: suspend of drv:psmouse dev:serio1 complete after 375.492 msecs
[22366.780123] PM: suspend of drv:atkbd dev:serio0 complete after 596.170 msecs
[22366.782343] parport_pc 00:0b: disabled
[22366.782345] ACPI handle has no context!
[22366.782462] serial 00:0a: disabled
[22366.782464] ACPI handle has no context!
[22366.787653] ACPI handle has no context!
[22366.810094] iwlagn 0000:03:00.0: MAC is in deep sleep!.  CSR_GP_CNTRL = 0x000033D8
[22366.852765] iwlagn 0000:03:00.0: RF_KILL bit toggled to disable radio.
[22366.960249] ata2: port disabled. ignoring.
[22366.960358] ata_piix 0000:00:1f.1: PCI INT C disabled
[22366.960389] ehci_hcd 0000:00:1d.7: PCI INT D disabled
[22366.960410] uhci_hcd 0000:00:1d.2: PCI INT C disabled
[22366.960436] uhci_hcd 0000:00:1d.1: PCI INT B disabled
[22366.960455] uhci_hcd 0000:00:1d.0: PCI INT A disabled
[22366.960472] pciehp 0000:00:1c.3:pcie04: pciehp_suspend ENTRY
[22367.070402] HDA Intel 0000:00:1b.0: PCI INT B disabled
[22367.090145] PM: suspend of drv:HDA Intel dev:0000:00:1b.0 complete after 129.640 msecs
[22367.090168] ehci_hcd 0000:00:1a.7: PCI INT C disabled
[22367.090191] uhci_hcd 0000:00:1a.1: PCI INT B disabled
[22367.090212] uhci_hcd 0000:00:1a.0: PCI INT A disabled
[22367.091862] e1000e 0000:00:19.0: PCI INT A disabled
[22367.091873] e1000e 0000:00:19.0: PME# enabled
[22367.091881] e1000e 0000:00:19.0: wake-up capability enabled by ACPI
[22367.110522] PM: suspend of devices complete after 2218.741 msecs
[22367.110525] PM: suspend devices took 2.220 seconds
[22367.150085] ehci_hcd 0000:00:1d.7: power state changed by ACPI to D3
[22367.170071] uhci_hcd 0000:00:1d.2: power state changed by ACPI to D3
[22367.190070] uhci_hcd 0000:00:1d.0: power state changed by ACPI to D3
[22367.230070] ehci_hcd 0000:00:1a.7: power state changed by ACPI to D3
[22367.300064] uhci_hcd 0000:00:1a.1: power state changed by ACPI to D3
[22367.300241] PM: late suspend of devices complete after 189.710 msecs
[22367.400064] ACPI: Preparing to enter system sleep state S3
[22367.790019] Disabling non-boot CPUs ...
[22367.790043] CPU0 attaching NULL sched-domain.
[22367.790046] CPU1 attaching NULL sched-domain.
[22367.950023] CPU0 attaching NULL sched-domain.
[22368.060022] CPU 1 is now offline
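
The `cmd` lines of the failed `READ FPDMA QUEUED` commands can be unpacked to see which sectors were in flight when the link dropped. The sketch below assumes that libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and that for NCQ commands the sector count travels in the feature fields while the tag sits in the upper five bits of nsect. Both are assumptions to check against the kernel source; the function name is made up for illustration.

```python
def decode_ncq_cmd(taskfile: str, sector_size: int = 512) -> dict:
    """Decode a libata 'cmd' taskfile string for an NCQ read or write."""
    command, low, high, _device = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # NCQ: sector count is carried in feature (low byte) and hob_feature (high byte).
    sectors = (hob_feature << 8) | feature
    # 48-bit LBA, most significant byte first: hob_lbah ... lbal.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(command, 16),
        "tag": nsect >> 3,  # tag occupies bits 7:3 of nsect
        "lba": lba,
        "bytes": sectors * sector_size,
    }


if __name__ == "__main__":
    # Both failed commands from the excerpt above.
    print(decode_ncq_cmd("60/20:00:08:7f:5e/00:00:00:00:00/40"))
    print(decode_ncq_cmd("60/48:08:40:7f:5e/00:00:00:00:00/40"))
```

Under these assumptions the two commands decode to tag 0 with 16384 bytes and tag 1 with 36864 bytes, which agrees with the `tag 0 ncq 16384` and `tag 1 ncq 36864` annotations the kernel prints next to them.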

red_hood (chris-red-hood) wrote on 2010-04-21:

#19

Just followed discussion upstream on https://bugzilla.kernel.org/show_bug.cgi?id=14543.
The error can be reproduced by attaching external power to the notebook.
The file /sys/class/scsi_host/host3/link_power_management_policy then changes from "min_power" to "max_performance".
I was not able to reproduce the error by manually writing to the file and toggling between the two values.

Wolfgang Rohdewald (wolfgang-rohdewald) wrote on 2010-04-22:

#20

I can reproduce it too by attaching external power. So setting power management to performance in battery mode fixes it for me. Moreover, the kernel freezes I had seem to be gone since getting rid of the closed-source drivers (Broadcom and ATI Radeon).

red_hood (chris-red-hood) wrote on 2010-04-24:

#21

Is it possible to find out which program is writing to the "/sys/class/scsi_host/hostX/link_power_management_policy" files? Setting permissions to 000 stops changing link power modes, so I guess the changes are handled in user space.
As a workaround, setting the S-ATA links to "Compatibility Mode" in BIOS (switching off AHCI) worked for me.
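
One way to narrow down the question above, sketched under assumptions: search the power management hook directories for scripts that mention the sysfs file. The function name find_alpm_writers is made up for illustration, and the default directories (/usr/lib/pm-utils and /etc/pm) are an assumption about where such hooks live. This only finds scripts that name the file literally, not programs that build the path at run time.

```shell
#!/bin/sh
# Hypothetical helper: list files under the given directories that mention
# link_power_management_policy. With no arguments it falls back to the
# pm-utils locations (an assumption; adjust for your system).
find_alpm_writers() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib/pm-utils /etc/pm
    fi
    # -r: recurse, -l: print only the names of matching files,
    # -s: stay quiet about missing or unreadable files.
    grep -rls -- 'link_power_management_policy' "$@"
}
```

Calling find_alpm_writers without arguments searches the default directories; any other directory can be passed as an argument. grep -r is an extension found in GNU and BSD grep, not in strict POSIX.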

Dave Lane (lightweight) wrote on 2010-05-11:

#22

I'm getting the same sorts of problems every time I attempt a suspend on my ASUS Z62E laptop with

Linux 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010 x86_64 GNU/Linux

The machine doesn't suspend, and eventually returns to the X login window for unlocking the screen (not the GDM login) with a very high load average, and a lot of CPU wait (unsurprisingly). Ultimately, the only fix is a reboot.

Note, this laptop was recently upgraded from Karmic, where suspend worked without problems. It also worked after the initial upgrade to Lucid, but possibly since a kernel update, this problem has sprung up probably starting around 8 May.

The error messages start on suspend, either via closing the lid or explicitly selecting suspend from the Session menu. In syslog, the only temporally related event is the DHCPREQUEST, which I've included below.

May 12 08:45:24 stampy dhclient: DHCPREQUEST of 172.20.4.212 on wlan0 to 172.20.4.1 port 67
May 12 08:45:24 stampy kernel: [48315.060765] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 12 08:45:24 stampy kernel: [48315.060779] ata3.00: irq_stat 0x40000008
May 12 08:45:24 stampy kernel: [48315.060790] ata3.00: failed command: READ FPDMA QUEUED
May 12 08:45:24 stampy kernel: [48315.060810] ata3.00: cmd 60/08:00:57:03:90/00:00:01:00:00/40 tag 0 ncq 4096 in
May 12 08:45:24 stampy kernel: [48315.060813]          res 41/40:00:59:03:90/00:00:01:00:00/40 Emask 0x409 (media error) <F>
May 12 08:45:24 stampy kernel: [48315.060823] ata3.00: status: { DRDY ERR }
May 12 08:45:24 stampy kernel: [48315.060830] ata3.00: error: { UNC }
May 12 08:45:24 stampy kernel: [48315.089709] ata3.00: configured for UDMA/133
May 12 08:45:24 stampy kernel: [48315.089741] ata3: EH complete
May 12 08:45:24 stampy kernel: [48319.288891] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 12 08:45:24 stampy kernel: [48319.288905] ata3.00: irq_stat 0x40000008
May 12 08:45:24 stampy kernel: [48319.288916] ata3.00: failed command: READ FPDMA QUEUED
May 12 08:45:24 stampy kernel: [48319.288935] ata3.00: cmd 60/08:00:57:03:90/00:00:01:00:00/40 tag 0 ncq 4096 in
May 12 08:45:24 stampy kernel: [48319.288939]          res 41/40:00:59:03:90/00:00:01:00:00/40 Emask 0x409 (media error) <F>
May 12 08:45:24 stampy kernel: [48319.288949] ata3.00: status: { DRDY ERR }
May 12 08:45:24 stampy kernel: [48319.288957] ata3.00: error: { UNC }
May 12 08:45:24 stampy kernel: [48319.317872] ata3.00: configured for UDMA/133
May 12 08:45:24 stampy kernel: [48319.317900] ata3: EH complete
May 12 08:45:24 stampy kernel: [48323.496473] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 12 08:45:24 stampy kernel: [48323.496487] ata3.00: irq_stat 0x40000008
May 12 08:45:24 stampy kernel: [48323.496498] ata3.00: failed command: READ FPDMA QUEUED
May 12 08:45:24 stampy kernel: [48323.496518] ata3.00: cmd 60/08:00:57:03:90/00:00:01:00:00/40 tag 0 ncq 4096 in
May 12 08:45:24 stampy kernel: [48323.496522]          res 41/40:00:59:03:90/00:00:01:00:00/40 Emask 0x409 (media error) <F>
... and many more

Chris Coulson (chrisccoulson) wrote on 2010-05-13:

#23

I've traced the trigger of the errors down to /usr/lib/pm-utils/power.d/powersave-policy-sata-link-power. This triggers the errors for me every time on my laptop:

echo "max_performance" | sudo tee /sys/class/scsi_host/host1/link_power_management_policy

Anton Zayats (anton-zayats) wrote on 2010-05-17:

#24

Same problem here. Multiple ata1.00 errors even when the PC is idle.
GA-E7AUM-DS2H motherboard.
One disk (Seagate) is already dead; that was on Jaunty.
The next one (WD) was going down on a fresh minimal 10.04 install with nvidia proprietary drivers.
Switching off AHCI in the BIOS worked for me as well.
Multiple reports about this issue:
https://bugs.launchpad.net/ubuntu/+bug/353812
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/217920
http://www.nvnews.net/vbulletin/showthread.php?t=126152

Stefan Bader (smb) wrote on 2010-05-18:

#25

@Anton, as long as this is not triggered by either suspend or removal of AC power, this seems to be a different problem. If you have not yet, please file a separate bug.

I changed the bug's title to be more specific to this. It is the same class of bug only when switching between "min_power" and "max_performance", as described in comment #23, causes the errors.

summary:

- Frequent ATA errors and disk corruption
+ SATA link power management causes disk errors and corruption

Kees Cook (kees) wrote on 2010-05-18: Re: [Bug 539467] Re: SATA link power management causes disk errors and corruption

#26

Un/re-plugging power causes switching between "min_power" and
"max_performance". See
/usr/lib/pm-utils/power.d/powersave-policy-sata-link-power

Christian Reis (kiko) wrote on 2010-05-20:

#27

Stefan, Chris: you guys have nailed it; plugging a power cable back into the laptop causes the problem. Is there a fix or workaround that we can apply to avoid losing data meanwhile?

Stefan Bader (smb) wrote on 2010-05-20: Re: [Bug 539467] Re: SATA link power management causes disk errors and corruption

#28

I guess the best short-term action is probably to edit the script in
/usr/lib/pm-utils/power.d/powersave-policy-sata-link-power and put an early
"exit 0" in there.
It's a bit hacky, though.
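
A minimal sketch of that early exit, assuming GNU sed and a hook whose first line is the shebang. The function name disable_hook is hypothetical and not part of pm-utils.

```shell
#!/bin/sh
# Hypothetical helper: turn a hook script into a no-op by inserting
# "exit 0" right after its first line (assumed to be the shebang).
disable_hook() {
    hook=$1
    # Do nothing if the early exit is already in place.
    if [ "$(sed -n '2p' "$hook")" = "exit 0" ]; then
        return 0
    fi
    # GNU sed: append the line "exit 0" after line 1, editing in place.
    sed -i '1a exit 0' "$hook"
}
```

It would be run as root on the script named above, for example: disable_hook /usr/lib/pm-utils/power.d/powersave-policy-sata-link-power. Since the file belongs to a package, a later package upgrade would likely overwrite the edit.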

Chris Coulson (chrisccoulson) wrote on 2010-05-21:

#29

The upstream bug report (https://bugzilla.kernel.org/show_bug.cgi?id=14543) has a patch which might help fix this

Chase Douglas (chasedouglas) on 2010-05-21

Changed in pm-utils-powersave-policy (Ubuntu):
status:	New → In Progress
importance:	Undecided → High
assignee:	nobody → Chase Douglas (chasedouglas)

Chase Douglas (chasedouglas) on 2010-05-21

description:

updated

Martin Pitt (pitti) wrote on 2010-05-21:

#30

For lucid we'll just disable the script entirely, so wontfixing the kernel lucid task.

Changed in linux (Ubuntu Lucid):
status:	New → Won't Fix
Changed in pm-utils-powersave-policy (Ubuntu Lucid):
status:	New → In Progress
assignee:	nobody → Chase Douglas (chasedouglas)

Martin Pitt (pitti) on 2010-05-21

Changed in pm-utils-powersave-policy (Ubuntu Lucid):
status:	In Progress → Fix Committed

Martin Pitt (pitti) wrote on 2010-05-21: Please test proposed package

#31

Accepted pm-utils-powersave-policy into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags:

added: verification-needed

Kees Cook (kees) wrote on 2010-05-21: Re: [Bug 539467] Re: SATA link power management causes disk errors and corruption

#32

While I'm all for a temporary work-around, this is, IIUC, a real kernel
bug. I don't want to have my drive in full power mode when I'm on battery
as the final solution. :)

Chase Douglas (chasedouglas) wrote on 2010-05-21:

#33

Kees,

Note that we still have the task open against "linux (ubuntu)". If we find a fix in the kernel we can consider reenabling the power save policy for Lucid. For now though, it seems prudent to disable the power save policy. We might find a fix for one SATA chipset, reenable everything, and then find that another chipset has the same problem. I'd prefer to force that on users testing development versions rather than released versions.

Martin Pitt (pitti) wrote on 2010-06-01:

#34

Chris, any chance to test the lucid-proposed package to see that it stops the HDD breakage?

red_hood (chris-red-hood) wrote on 2010-06-06:

#35

I installed pm-utils-powersave-policy from lucid-proposed and can confirm that S-ATA link reset does not occur anymore.

Martin Pitt (pitti) on 2010-06-08

tags:

added: verification-done
removed: verification-needed

Christian Reis (kiko) wrote on 2010-06-09:

#36

Agreed; just verified that this version works around the problem:

ii pm-utils-powersave-policy 0.3.1 lightweight power saving policy when on battery

Does an upstream bug need to be filed (or identified) in the kernel bugzilla?

Chase Douglas (chasedouglas) wrote on 2010-06-09:

#37

There have been reports in bug 528981, which may be a dupe of this bug, that the issue seems resolved from 2.6.33 and onward. It would be great if anyone here could test out a mainline kernel with the current version of pm-utils-powersave-policy to verify that it has been fixed. Mainline kernels may be found at http://kernel.ubuntu.com/~kernel-ppa/mainline/.

Thanks

Scott Testerman (scott-testerman) wrote on 2010-06-10:

#38

Please note that bug 528981 is in regards to a PATA controller. But yes, the problem has been fixed in 2.6.33+. The problem didn't manifest itself until sometime after 2.6.28, and by the time Karmic rolled around it was impossible to even complete an installation without such severe corruption that the installation failed. The default Lucid kernel is markedly better, but still will result in corruption.

H.i.M (hir-i-mogul-gmail) wrote on 2010-06-10:

#39

dmesg log file (53.7 KiB, text/plain)

Could somebody take a look at this dmesg log?
I'd like to know if this bug is the problem which affects me.
If not, I would report a new bug.

------
[ 279.189053] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x4080000 action 0xe frozen
[ 279.189063] ata1.00: irq_stat 0x00000040, connection status changed
[ 279.189071] ata1: SError: { 10B8B DevExch }
[ 279.189079] ata1.00: failed command: READ FPDMA QUEUED
[ 279.189093] ata1.00: cmd 60/00:00:10:6b:53/01:00:05:00:00/40 tag 0 ncq 131072 in
[ 279.189096]          res 40/00:20:70:1d:84/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
[ 279.189103] ata1.00: status: { DRDY }
[ 279.189113] ata1: hard resetting link
[ 280.430362] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 280.434063] ata1.00: configured for UDMA/133
[ 280.434088] ata1: EH complete
-----

Thanks H.i.M

Launchpad Janitor (janitor) wrote on 2010-06-11:

#40

This bug was fixed in the package pm-utils-powersave-policy - 0.3.1

---------------
pm-utils-powersave-policy (0.3.1) lucid-proposed; urgency=low

  [ Chase Douglas ]
  * Remove sata link power save policy due to data corruption potential.
    -LP: #539467

  [ Martin Pitt ]
  * debian/control: Drop Vcs-Bzr: header, switching to auto-import branch.
 -- Chase Douglas <email address hidden> Fri, 21 May 2010 17:16:50 +0200

Changed in pm-utils-powersave-policy (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Martin Pitt (pitti) wrote on 2010-06-11:

#41

Chase, should we also upload the p-u-p-p change to maverick, or do we expect a kernel-side solution for this?

Brian Rogers (brian-rogers) wrote on 2010-06-11:

#42

According to comment 38, this should already be fixed in the kernel in maverick.

Martin Pitt (pitti) wrote on 2010-06-11:

#43

Ah, closing the maverick linux task then, thanks for pointing out.

Changed in linux (Ubuntu):
status:	Triaged → Fix Released
Changed in pm-utils-powersave-policy (Ubuntu):
status:	In Progress → Won't Fix

Joshua Coombs (josh-coombs-gmail) wrote on 2010-06-14:

#44

I can confirm this is still occurring in Maverick as of this morning. The kernel fix doesn't appear to be catching all instances of the problem.

Marco (koansoftware) wrote on 2010-08-01:

#45

Same error here.
I upgraded this machine from 8.04 up to the latest 10.04.

[ 2860.060493] Restarting tasks ... done.
[ 2892.988087] ata3.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[ 2892.988213] ata3.00: failed command: READ FPDMA QUEUED
[ 2892.988307] ata3.00: cmd 60/70:00:63:a8:ba/00:00:08:00:00/40 tag 0 ncq 57344 in
[ 2892.988310]          res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 2892.988539] ata3.00: status: { DRDY }
[ 2892.988603] ata3.00: failed command: READ FPDMA QUEUED
[ 2892.988695] ata3.00: cmd 60/48:08:93:b0:ba/00:00:08:00:00/40 tag 1 ncq 36864 in
[ 2892.988698]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2892.988926] ata3.00: status: { DRDY }
[ 2892.988995] ata3: hard resetting link
[ 2893.308073] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2893.311007] ata3.00: _GTF unexpected object type 0x1
[ 2893.317617] ata3.00: _GTF unexpected object type 0x1
[ 2893.321210] ata3.00: configured for UDMA/133
[ 2893.321220] ata3.00: device reported invalid CHS sector 0
[ 2893.321227] ata3.00: device reported invalid CHS sector 0
[ 2893.321243] ata3: EH complete

Acer TravelMate 6292
Ubuntu 10.04
/dev/sda5 on / type ext3 (rw,relatime,errors=remount-ro)

Most of the errors aren't visible in dmesg but rather on a text console.
Please help.
TIA

Christian Reis (kiko) wrote on 2010-11-13:

#46

This seems to actually not be fixed for Maverick. I upgraded today, and looking through my dmesg I again see:
[ 2766.533670] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 2766.533675] ata3.00: irq_stat 0x40000008
[ 2766.533679] ata3.00: failed command: READ FPDMA QUEUED
[ 2766.533687] ata3.00: cmd 60/00:00:7a:f5:2a/01:00:0a:00:00/40 tag 0 ncq 131072 in
[ 2766.533689]          res 40/00:00:7a:f5:2a/00:00:0a:00:00/40 Emask 0x401 (device error) <F>
[ 2766.533692] ata3.00: status: { DRDY }
[ 2766.534991] ata3.00: configured for UDMA/100
[ 2766.535001] ata3: EH complete
[ 2766.576800] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 2766.576806] ata3.00: irq_stat 0x40000008
[ 2766.576810] ata3.00: failed command: READ FPDMA QUEUED
[ 2766.576818] ata3.00: cmd 60/00:00:7a:f5:2a/01:00:0a:00:00/40 tag 0 ncq 131072 in
[ 2766.576819]          res 40/00:00:7a:f5:2a/00:00:0a:00:00/40 Emask 0x401 (device error) <F>
[ 2766.576823] ata3.00: status: { DRDY }
[ 2766.578039] ata3.00: configured for UDMA/100
[ 2766.578052] ata3: EH complete
[ 2766.619225] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0

etc. Is this a separate issue, or the same problem occurring with 2.6.35? This is a Thinkpad x61s:

00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)

Andy Whitcroft (apw) wrote on 2011-01-04:

#47

OK, it seems this is not fixed in Maverick. As Natty is now open, the development task applies to Natty, so I have nominated this for Maverick so that we can fix it there. I have also reopened the Natty tasks as we do not know whether this is fixed there or not.

I suspect the expedient thing to do is pull forward the fix for pm-utils-powersave-policy to Maverick and fix it that way there. It would also be useful if someone who is affected by this issue could test a late Natty kernel and report back if that fixes things for Natty or not.

Changed in pm-utils-powersave-policy (Ubuntu):
status:	Won't Fix → New

Brian Murray (brian-murray) on 2011-01-04

Changed in pm-utils (Ubuntu Natty):
importance:	Undecided → High

Ubuntu Foundations Team Bug Bot (crichton) on 2011-01-04

tags:

added: regression-release
removed: regression-potential

Roman Yepishev (rye) wrote on 2011-01-05:

#48

Running Linux buzz 2.6.37-11-generic #25-Ubuntu SMP Tue Dec 21 23:42:56 UTC 2010 x86_64 GNU/Linux (latest natty kernel at the time of writing)
pm-utils:
Installed: 1.4.1-4

After unplugging the AC the pm-suspend log gets
"""
/usr/lib/pm-utils/power.d/readahead true: success.
Running hook /usr/lib/pm-utils/power.d/sata_alpm true:
Setting SATA APLM on host2 to min_power...Done.
Setting SATA APLM on host3 to min_power...Done.
Setting SATA APLM on host4 to min_power...Done.
Setting SATA APLM on host5 to min_power...Done.
"""
And subsequent usage of the system is rather hard. The HDD LED stays on for prolonged periods of time, dmesg hangs for some seconds (I believe the binary is being read from disk), then the following can be found in the output:
"""
[13870.046027] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x150000 action 0x6 frozen
[13870.046044] ata3: SError: { PHYRdyChg CommWake Dispar }
[13870.046052] ata3.00: failed command: SET FEATURES
[13870.046070] ata3.00: cmd ef/05:fe:00:00:00/00:00:00:00:00/40 tag 0
[13870.046073]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[13870.046081] ata3.00: status: { DRDY }
[13870.046094] ata3: hard resetting link
[13870.390116] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[13870.693978] ata3.00: configured for UDMA/133
[13870.694151] ata3: EH complete
"""

I have disabled the ALPM control by placing
"""
#!/bin/sh
export SATA_ALPM_ENABLE=false
"""
in /etc/pm/config.d/00-no-sata-alpm, but that is just a workaround; users will not be happy if this does not work out of the box.

Andy Whitcroft (apw) on 2011-01-05

Changed in linux (Ubuntu Natty):
importance:	Undecided → High
status:	Fix Released → New

Stefan Bader (smb) wrote on 2011-01-05:

#49

The safest thing going forward would be to set the default of SATA link power management to false. Of course this prevents seeing problems and further reports. As this is likely some problem which only happens on certain controllers, certain drives or the combination of both, it would be nice to have a conditional quirking mechanism (or those cases fixed if possible). That however requires knowledge about what is broken and what is not.

So for those who know their hardware to be broken in Natty (or current upstream), please add the output of "sudo lspci -vvvnn" (for the controller) and "sudo hdparm -i /dev/sd..." (for the drive) here. And we need to think of a way to summarize the info somewhere, so that it does not require reading through tons of comments each time...

voss749 (voss749) wrote on 2011-01-05:

#50

I'm using Mint 10, which is based on Maverick, and I'm experiencing the same problem, so it was NOT fixed in Maverick.
However, it was also affecting my previous Debian squeeze installation, so if you have a fix you might want to share it with the Debian folks. I'm hoping a Maverick fix will come soon, or else I'll go back to Mint 9, which is based on Lucid, and apply the Lucid fix.

Roman Yepishev (rye) wrote on 2011-01-13:

#51

lspci_-vvvnn.txt (29.5 KiB, text/plain)

Here is the info for my machine.
lspci -vvvn

Roman Yepishev (rye) wrote on 2011-01-13:

#52

hdparm -i /dev/sdb:

Model=WDC WD2500BEVS-22UST0, FwRev=01.01A01, SerialNo=WD-WXE108A79290
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

Bug Watch Updater (bug-watch-updater) on 2011-01-24

Changed in linux:
status:	Unknown → Confirmed

Martin Pitt (pitti) wrote on 2011-01-27:

#53

Closing invalid tasks. In lucid the hook was in pm-utils-powersave-policy, in maverick onwards it got merged into pm-utils itself.

So we should disable /usr/lib/pm-utils/power.d/sata_alpm entirely by default?

Changed in pm-utils-powersave-policy (Ubuntu Maverick):
status:	New → Invalid
Changed in pm-utils-powersave-policy (Ubuntu Natty):
status:	New → Invalid
Changed in pm-utils (Ubuntu Lucid):
status:	New → Invalid

Stefan Bader (smb) wrote on 2011-01-27:

#54

Yes, I would vote for it, since we cannot say for sure which hardware is safe and which is not. And disk corruption is quite a serious way to find out that it is not safe.

Stefan Bader (smb) wrote on 2011-01-27:

#55

And then it would be awesome if people having the problems could try out some recent mainline kernels (http://kernel.ubuntu.com/~kernel-ppa/mainline/) to see whether the problem may have been fixed in between.

Martin Pitt (pitti) wrote on 2011-02-01:

#56

Disabled by default in current Debian git packaging head now.

Changed in pm-utils (Ubuntu Natty):
status:	New → Fix Committed

Launchpad Janitor (janitor) wrote on 2011-02-01:

#57

This bug was fixed in the package pm-utils - 1.4.1-5

---------------
pm-utils (1.4.1-5) experimental; urgency=low

  * Add 13-49bluetooth-sync.patch: Wait for btusb module to get unused, so
    that you can remove it in SUSPEND_MODULES. (LP: #698331)
  * Add 14-disable-sata-alpm.patch: Disable SATA link power management by
    default, as it still causes disk errors and corruptions on many hardware.
    (LP: #539467)
 -- Martin Pitt <email address hidden> Tue, 01 Feb 2011 16:11:40 +0100

Changed in pm-utils (Ubuntu Natty):
status:	Fix Committed → Fix Released

Bug Watch Updater (bug-watch-updater) on 2011-02-03

Changed in linux:
importance:	Unknown → Medium

Jeremy Foshee (jeremyfoshee) wrote on 2011-02-16:

#58

Marked kernel tasks incomplete pending the results of response to Stefan's Comment #55.

~JFo

Changed in linux (Ubuntu Maverick):
status:	New → Incomplete
Changed in linux (Ubuntu Natty):
status:	New → Incomplete

Torsten Spindler (tspindler) wrote on 2011-02-17:

#59

For testing purposes I run 2.6.38-020638rc5-generic #201102160907 from the source Stefan gave. This is on a Lucid system with pm-utils 1.3.0-1ubuntu3 and pm-utils-powersave-policy 0.3.1 installed. Do I need to change something there to make my testing valid?

Stefan Bader (smb) wrote on 2011-02-17:

#60

As the pm-utils have the feature disabled, the test case would be:

ls -la /sys/block/sd?/device

Note the link points to h:b:d:l, where h is the host number. Then change into

cd /sys/class/scsi_host/host?/

In there is a link_power_management_policy file into which (as root) "min_power" and "max_performance" can be written. The latter should be there by default. So one would twiddle from max_performance to min_power and back, then do some read operations involving the disk. In my case I seem to get

ata3: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
[46446.498522] ata3: irq_stat 0x00400000, PHY RDY changed
[46446.498538] ata3: SError: { PHYRdyChg CommWake }
[46446.498559] ata3: hard resetting link
[46447.220211] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[46447.222302] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[46447.242143] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[46447.242603] ata3.00: configured for UDMA/133
[46447.242623] ata3: EH complete

once in dmesg and without any visible effects on the reading operations. In the bad case there would be hard errors (involving timeouts), repeating and with errors reported to the reading operation.
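
The lookup described above (disk, then h:b:d:l, then host number, then policy file) could be sketched roughly like this. The function names host_of_hbdl and policy_file_for_disk are hypothetical, and the optional sysfs root argument exists only so the function can be tried against a mock directory tree. It assumes readlink is available and that the device link ends in h:b:d:l, as noted above.

```shell
#!/bin/sh
# Extract the SCSI host number h from an h:b:d:l address such as "2:0:0:0".
host_of_hbdl() {
    printf '%s\n' "${1%%:*}"
}

# Print the path of the link power management policy file for a disk such
# as "sda". The second argument is the sysfs root and defaults to /sys.
policy_file_for_disk() {
    disk=$1
    sysroot=${2:-/sys}
    # The device link points to a path whose last component is h:b:d:l.
    hbdl=$(basename "$(readlink "$sysroot/block/$disk/device")")
    printf '%s\n' "$sysroot/class/scsi_host/host$(host_of_hbdl "$hbdl")/link_power_management_policy"
}
```

For example, cat "$(policy_file_for_disk sda)" would show the current policy for the first disk, and the same path is where "min_power" or "max_performance" would be written as root.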

Torsten Spindler (tspindler) wrote on 2011-02-17:

#61

I've run the following test:

while true
do
    cat link_power_management_policy
    echo max_performance | sudo tee link_power_management_policy
    sleep 3
    echo min_power | sudo tee link_power_management_policy
    sleep 3
done

While this loop is running, I've created some artificial fs activity:

while true
do
    cat /boot/vmlinuz-2.6.38-020638rc5-generic > /dev/null
    sleep 1
done

And also:

for n in $(seq 0 9)
do
    sudo find /usr -type f -iname \*$n\* -exec cat {} > /dev/null \;
done

I have found no signs of exception Emask in kern.log; the test was run
for more than 2 hours.
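
A check like the one above can be made repeatable with a small counter. This is only a sketch: the function name count_ata_exceptions is hypothetical, and the location /var/log/kern.log in the usage note is an assumption about where kern.log lives.

```shell
#!/bin/sh
# Hypothetical helper: count libata exception lines (for example
# "ata3: exception Emask ..." or "ata3.00: exception Emask ...") in a log.
count_ata_exceptions() {
    # grep -c prints the number of matching lines, 0 if there are none.
    # It exits non-zero when nothing matches, hence the "|| true".
    grep -c 'ata[0-9.]*: exception Emask' "$1" || true
}
```

Running count_ata_exceptions /var/log/kern.log before and after the loops and comparing the two numbers would show whether new exceptions appeared during the test.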

For completeness, uname -a:
Linux spitfire 2.6.38-020638rc5-generic #201102160907 SMP Wed Feb 16
10:18:56 UTC 2011 i686 GNU/Linux

Stefan Bader (smb) wrote on 2011-02-17:

#62

For completeness, what disk controller and brand/type of disk did you test with? As said above, this highly depends on the hardware involved.

Stefan Bader (smb) wrote on 2011-02-17:

#63

To heed my own advice: I do not see problems even in 10.04 with:
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)
and WD1600BEVT-2.

Torsten Spindler (tspindler) wrote on 2011-02-17:

#64

This one:

$ sudo lspci -vvnn | grep -i SATA
00:1f.2 SATA controller [0106]: Intel Corporation ICH9M/M-E SATA AHCI
Controller [8086:2929] (rev 03) (prog-if 01)
Capabilities: [a8] SATA HBA <?>

Chow Loong Jin (hyperair) wrote on 2011-02-17:

#65

On Thursday 17,February,2011 09:59 PM, Torsten Spindler wrote:
> This one:
>
> $ sudo lspci -vvnn | grep -i SATA
> 00:1f.2 SATA controller [0106]: Intel Corporation ICH9M/M-E SATA AHCI
> Controller [8086:2929] (rev 03) (prog-if 01)
> Capabilities: [a8] SATA HBA <?>
>

No issues with this one:-

00:1f.2 SATA controller [0106]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E)
SATA AHCI Controller [8086:2829] (rev 03) (prog-if 01 [AHCI 1.0]) with a
WDC_WD5000BEVT-80A0RT0 disk.

However, the controller appears as a SATA-IDE controller (8086:2828) that uses
the piix driver by default, and you need a custom kernel (AFAICT) to quirk the
8086:2828 controller into AHCI mode whereby it turns into 8086:2829, and uses
the ahci driver, allowing for the SATA link power management.

--
Kind regards,
Loong Jin

red_hood (chris-red-hood) wrote on 2011-02-17:

#66

My hardware details:

Lenovo T61, 7664-18G with BIOS Version 2.26

$ lspci -vvnn |grep -i sata
00:1f.2 SATA controller [0106]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller [8086:2829] (rev 03) (prog-if 01 [AHCI 1.0])

Hard drive is:

$ smartctl -a /dev/sda

Model Family: Western Digital Scorpio Blue Serial ATA family
Device Model: WDC WD5000BEVT-00ZAT0
Firmware Version: 01.01A01

Kernel is:

$ uname -a
Linux redlap 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux

When I do

$ echo "min_power" > /sys/class/scsi_host/host2/link_power_management_policy

there is no message from the kernel.

Switching back to

$ echo "max_performance" > /sys/class/scsi_host/host2/link_power_management_policy

the following messages appear:

[47694.108618] ata3: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
[47694.108626] ata3: irq_stat 0x00400000, PHY RDY changed
[47694.108635] ata3: SError: { PHYRdyChg CommWake }
[47694.108646] ata3: hard resetting link
[47694.850173] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[47694.853238] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[47694.853250] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[47694.853259] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[47694.860000] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[47694.860051] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[47694.860062] ata3.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[47694.862814] ata3.00: configured for UDMA/133
[47694.868797] ata3.00: configured for UDMA/133
[47694.868804] ata3: EH complete

Seems it's still the same behavior in Maverick as it was before in Lucid.

The original hard drive delivered with the Notebook was from Hitachi, so maybe there is some kind of incompatibility between the controller/BIOS and the WD drive. If I find some time, I'll insert the original one and repeat the test.
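
For anyone repeating this test, it may help to first see which policy each host currently uses. A minimal sketch, assuming the sysfs layout shown in the commands above; the function name show_lpm_policies and its optional directory argument are hypothetical:

```shell
#!/bin/sh
# List the SATA link power management policy of every SCSI host that
# exposes one. show_lpm_policies and its optional argument (an alternate
# directory, useful for dry runs) are assumptions of this sketch.
show_lpm_policies() {
    root="${1:-/sys/class/scsi_host}"
    for f in "$root"/host*/link_power_management_policy; do
        # Skip hosts without the attribute (and the unexpanded glob)
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$host" "$(cat "$f")"
    done
}
```

Running `show_lpm_policies` before and after each echo shows whether the write actually took effect on the host in question.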

Revision history for this message

Roman Yepishev (rye) wrote on 2011-02-18:

#67

uname -r: 2.6.38-020638rc5-generic

Model Family: Western Digital Scorpio Blue Serial ATA family
Device Model: WDC WD2500BEVS-22UST0
Serial Number: WD-WXE108A79290
Firmware Version: 01.01A01

This issue appears to be present again on kernels 3.13 (all), 3.16 (all), and 3.17 (all).

Upon shifting the SATA link power policy from min_power to max_performance, all of these kernels report some form of this error:

[   45.200582] ata3.00: exception Emask 0x10 SAct 0x8000 SErr 0x50000 action 0xe frozen
[   45.200586] ata3.00: irq_stat 0x00400000, PHY RDY changed
[   45.200589] ata3: SError: { PHYRdyChg CommWake }
[   45.200592] ata3.00: failed command: WRITE FPDMA QUEUED
[   45.200596] ata3.00: cmd 61/e8:78:00:3f:48/00:00:04:00:00/40 tag 15 ncq 118784 out
[   45.200596]          res 40/00:7c:00:3f:48/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
[   45.200597] ata3.00: status: { DRDY }
[   45.200601] ata3: hard resetting link
[   45.925051] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   45.925911] ata3.00: configured for UDMA/133
[   45.941016] ahci 0000:00:1f.2: port does not support device sleep
[   45.941029] ata3: EH complete

The current 3.13 kernel reports the most severe errors, including block write failures.
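
To compare kernels, it may be useful to count how often each port goes through a hard reset. The following is a rough sketch that reads kernel log text on stdin and matches the "hard resetting link" lines shown above; the function name count_link_resets is made up for illustration.

```shell
#!/bin/sh
# Count "hard resetting link" events per ATA port in kernel log text
# read from stdin. count_link_resets is a hypothetical helper name.
count_link_resets() {
    awk '/ata[0-9]+: hard resetting link/ {
        # Find the "ataN:" field; its position depends on timestamp padding
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+:$/) { sub(/:$/, "", $i); n[$i]++ }
    }
    END { for (p in n) printf "%s %d\n", p, n[p] }' | sort
}
```

A typical use would be `dmesg | count_link_resets` after switching the policy a few times on each kernel.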

The test machine is a Dell XPS 13 (9333) with BIOS A05.

[    2.288104] ata3.00: ATA-8: LITEONIT LMT-256L9M-11 MSATA 256GB, HM8110B, max UDMA/133
[    2.288554] scsi 2:0:0:0: Direct-Access     ATA      LITEONIT LMT-256 10B  PQ: 0 ANSI: 5

As this machine is brand new, it's possible that the hardware is actually failing. However, SMART doesn't indicate any problems with the block device:

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.17.0-031700-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     LITEONIT LMT-256L9M-11 MSATA 256GB
Serial Number:    TW0N42H75508548P1854
Firmware Version: HM8110B
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 4a
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Oct 10 13:39:25 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (   10) seconds.
Offline data collection
capabilities:                    (0x15) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       46
175 Program_Fail_Count_Chip 0x0003   100   100   000    Pre-fail  Always       -       0
176 Erase_Fail_Count_Chip   0x0003   100   100   000    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       1946
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0003   100   100   000    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   000    Pre-fail  Always       -       1216
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0003   100   100   000    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x0003   100   100   000    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0003   100   100   000    Pre-fail  Always       -       8704
242 Total_LBAs_Read         0x0003   100   100   000    Pre-fail  Always       -       1385

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Short offline       Completed without error       00%         0         -

Selective Self-tests/Logging not supported

Revision history for this message

cfaber (cfaber) wrote on 2014-10-10:

#85

One additional note: when shifting from max_performance to min_power, no errors are encountered. The errors only seem to appear when going from the low-power state to the high-performance state.
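
To reproduce the two transitions in a controlled way, a small wrapper along these lines could be used. It is a sketch only: set_lpm_policy and its optional third argument (an alternate directory for dry runs) are hypothetical, and the list of accepted names covers just the three policies known from the kernels discussed here (min_power, medium_power, max_performance); newer kernels may accept others.

```shell
#!/bin/sh
# Write a link power management policy for one SCSI host after checking
# the policy name. set_lpm_policy and the optional root argument are
# assumptions of this sketch. Writing to the real sysfs file needs root.
set_lpm_policy() {
    host="$1"
    policy="$2"
    root="${3:-/sys/class/scsi_host}"
    case "$policy" in
        min_power|medium_power|max_performance) ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    file="$root/$host/link_power_management_policy"
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    printf '%s\n' "$policy" > "$file"
}
```

As root, `set_lpm_policy host2 min_power` followed by `set_lpm_policy host2 max_performance` and a look at dmesg should show whether the link reset happens only on the second step.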

Revision history for this message

Wurlitzer (thomas-publique) wrote on 2015-03-12:

#86

Uninstalling the tlp package resolved this issue for me (Ubuntu 14.04).

Disabling laptop_mode also seemed to help:
echo 0 > /proc/sys/vm/laptop_mode
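
Before changing it, the current setting can be checked. A minimal sketch, assuming the /proc path from the command above (0 means laptop mode is off, any other value means it is on); the function name laptop_mode_status and its optional file argument are hypothetical:

```shell
#!/bin/sh
# Report whether laptop mode is enabled. laptop_mode_status and the
# optional file argument (for dry runs) are assumptions of this sketch.
laptop_mode_status() {
    file="${1:-/proc/sys/vm/laptop_mode}"
    [ -r "$file" ] || { echo "unknown"; return 1; }
    value=$(cat "$file")
    if [ "$value" -eq 0 ]; then
        echo "disabled"
    else
        echo "enabled ($value)"
    fi
}
```

Note that the echo above only lasts until the next reboot, or until a power management tool writes the value again.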

Changed in linux (Ubuntu Natty):
assignee:	Stefan Bader (stefan-bader-canonical) → nobody
milestone:	natty-updates → none
Changed in linux (Ubuntu):
assignee:	Stefan Bader (stefan-bader-canonical) → nobody
milestone:	natty-updates → none

Ubuntu
linux package

SATA link power management causes disk errors and corruption

Affects                             Status        Importance  Assigned to
Linux                               Expired       Medium      linux-kernel-bugs #14543
linux (Ubuntu)                      Invalid       High        Unassigned
  Lucid                             Won't Fix     Low         Unassigned
  Maverick                          Invalid       Medium      Unassigned
  Natty                             Invalid       High        Unassigned
pm-utils (Ubuntu)                   Fix Released  High        Unassigned
  Lucid                             Invalid       Undecided   Unassigned
  Maverick                          Invalid       Undecided   Unassigned
  Natty                             Fix Released  High        Unassigned
pm-utils-powersave-policy (Ubuntu)  Invalid       High        Unassigned
  Lucid                             Fix Released  Undecided   Unassigned
  Maverick                          Invalid       Undecided   Unassigned
  Natty                             Invalid       High        Unassigned

Changed in linux (Ubuntu Natty):
status:	Incomplete → Triaged

Changed in pm-utils-powersave-policy (Ubuntu Lucid):
assignee:	Chase Douglas (chasedouglas) → nobody
Changed in pm-utils-powersave-policy (Ubuntu Natty):
assignee:	Chase Douglas (chasedouglas) → nobody

Changed in linux (Ubuntu Lucid):
importance:	Undecided → Low
Changed in linux (Ubuntu Maverick):
importance:	Undecided → Medium

Changed in linux (Ubuntu Natty):
milestone:	none → natty-updates

Changed in pm-utils (Ubuntu Maverick):
status:	New → Confirmed

Changed in pm-utils (Ubuntu Maverick):
status:	Confirmed → Invalid
Changed in linux (Ubuntu Natty):
status:	Triaged → Invalid
Changed in linux (Ubuntu Maverick):
status:	Triaged → Invalid
Changed in linux (Ubuntu):
status:	Triaged → Invalid
