ATA failures when operating on battery power in Ubuntu 12.04

Bug #984308 reported by David Barker
This bug affects 1 person
Affects: Linux
Status: Confirmed
Importance: High

Bug Description

When operating on battery power (not AC), multiple ATA failures occur.
This issue was not present in Ubuntu 11.10 but is observable in Ubuntu 12.04 beta 2 and Xubuntu 12.04 beta 2. The issue also occurs in Ubuntu 12.04 daily (17/04/2012) with latest updates (via apt).

I have tried using the most recent (as of 17/04/2012) mainline kernel.

The issue occurs both when booting under battery power and when booting under AC and then switching to battery.

Under AC power, no issues are observed.

The system functions correctly for some time (up to a few minutes), then applications stop responding. The following appears in dmesg when the issue occurs, after which the system functions normally for another few minutes.

[ 238.848502] ata1.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x6 frozen
[ 238.848526] ata1.00: failed command: WRITE FPDMA QUEUED
[ 238.848548] ata1.00: cmd 61/00:00:3f:e1:00/04:00:19:00:00/40 tag 0 ncq 524288 out
[ 238.848552] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 238.848563] ata1.00: status: { DRDY }
[ 238.848571] ata1.00: failed command: WRITE FPDMA QUEUED
[ 238.848589] ata1.00: cmd 61/00:08:3f:e5:00/04:00:19:00:00/40 tag 1 ncq 524288 out
[ 238.848593] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 238.848602] ata1.00: status: { DRDY }
[ 238.848610] ata1.00: failed command: WRITE FPDMA QUEUED
[ 238.848628] ata1.00: cmd 61/00:10:3f:e9:00/04:00:19:00:00/40 tag 2 ncq 524288 out
[ 238.848632] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 238.848640] ata1.00: status: { DRDY }
[ 238.848648] ata1.00: failed command: WRITE FPDMA QUEUED
[ 238.848666] ata1.00: cmd 61/00:18:3f:ed:00/04:00:19:00:00/40 tag 3 ncq 524288 out
[ 238.848670] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 238.848678] ata1.00: status: { DRDY }
[ 238.848696] ata1: hard resetting link
[ 248.860425] ata1: softreset failed (device not ready)
[ 248.860450] ata1: hard resetting link
[ 258.872300] ata1: softreset failed (device not ready)
[ 258.872322] ata1: hard resetting link
[ 269.448629] ata1: link is slow to respond, please be patient (ready=0)
[ 270.232426] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 270.244974] ata1.00: configured for UDMA/133
[ 270.260214] ata1.00: device reported invalid CHS sector 0
[ 270.260239] ata1.00: device reported invalid CHS sector 0
[ 270.260256] ata1.00: device reported invalid CHS sector 0
[ 270.260272] ata1.00: device reported invalid CHS sector 0
[ 270.260297] ata1: EH complete
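
For anyone comparing their own logs against the excerpt above, a minimal sketch of a log summary follows. The function name `ata_eh_summary` is made up for illustration; it only counts the two message types quoted in this report and is not a general libata log parser.

```shell
# Hypothetical helper: summarise ATA error-handling events from kernel
# log text on stdin. It counts only the "failed command" and
# "hard resetting link" messages quoted in this report.
ata_eh_summary() {
    awk '
        /ata[0-9]+\.[0-9]+: failed command:/ { failed++ }
        /ata[0-9]+: hard resetting link/     { resets++ }
        END { printf "failed_commands=%d hard_resets=%d\n", failed, resets }
    '
}
```

Typical use would be `dmesg | ata_eh_summary`, run once on AC and once on battery to compare the counts.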

I tried exiting early from /usr/lib/pm-utils/power.d/sata_alpm, thinking this was a SATA ALPM issue triggered via pm-utils; however, this did not help.

I have tried changing the HDD and SATA cable to rule out a coincidental hardware failure; however, the issue persists. Switching back to Ubuntu 11.10 resolves the issue.

My SATA controller is as follows:
00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)

The machine in question is a Sony Vaio Y-Series VPCYB2M1E with AMD E-350.

*Edited for spelling

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-23-generic 3.2.0-23.36
ProcVersionSignature: Ubuntu 3.2.0-23.36-generic 3.2.14
Uname: Linux 3.2.0-23-generic x86_64
NonfreeKernelModules: fglrx
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu4
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 1: SB [HDA ATI SB], device 0: ALC269VB Analog [ALC269VB Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: david 1810 F.... pulseaudio
 /dev/snd/controlC0: david 1810 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'Generic'/'HD-Audio Generic at 0xf0244000 irq 42'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100200'
   Controls : 6
   Simple ctrls : 1
Card0.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Card1.Amixer.info:
 Card hw:1 'SB'/'HDA ATI SB at 0xf0240000 irq 16'
   Mixer name : 'Realtek ALC269VB'
   Components : 'HDA:10ec0269,104d5400,00100100'
   Controls : 18
   Simple ctrls : 10
Date: Tue Apr 17 20:21:25 2012
HibernationDevice: RESUME=UUID=52c3358d-d81b-4afb-8b8f-63a75dbe5126
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Beta amd64 (20120417)
MachineType: Sony Corporation VPCYB2M1E
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-23-generic root=UUID=73dfb856-315d-4755-b265-473cb92b4c88 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-23-generic N/A
 linux-backports-modules-3.2.0-23-generic N/A
 linux-firmware 1.79
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/02/2011
dmi.bios.vendor: Insyde Corp.
dmi.bios.version: R0161Z7
dmi.board.asset.tag: N/A
dmi.board.name: VAIO
dmi.board.vendor: Sony Corporation
dmi.board.version: N/A
dmi.chassis.asset.tag: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Sony Corporation
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnInsydeCorp.:bvrR0161Z7:bd06/02/2011:svnSonyCorporation:pnVPCYB2M1E:pvrC900XXT7:rvnSonyCorporation:rnVAIO:rvrN/A:cvnSonyCorporation:ct10:cvrN/A:
dmi.product.name: VPCYB2M1E
dmi.product.version: C900XXT7
dmi.sys.vendor: Sony Corporation

David Barker (dbark) wrote :

Additionally - I have tried specifying libata.force=noncq kernel option on boot to no effect.
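
When testing a boot option like this, it can help to confirm that the running kernel actually received it. The sketch below assumes a hypothetical helper name, `has_kernel_param`; it does a whole-word match against `/proc/cmdline`, or against a string passed as the second argument.

```shell
# Hypothetical helper: succeed if the given parameter appears as a whole
# word on a kernel command line. Reads /proc/cmdline unless a command
# line string is supplied as the second argument.
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *)            return 1 ;;
    esac
}
```

For example, `has_kernel_param libata.force=noncq && echo "option is active"` after rebooting.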

David Barker (dbark)
description: updated
Joseph Salisbury (jsalisbury) wrote :

So you've tested the latest v3.4 kernel[0] and it also has this bug? If so, would it be possible for you to open an upstream bug report at bugzilla.kernel.org [1]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report back the upstream bug number in this bug report. That will allow us to link this bug to the upstream report.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc3-precise/
[1] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: regression-release
tags: added: kernel-da-key
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
David Barker (dbark) wrote :

@Joseph:
Yes, I have tested with 3.4 rc3 and the issue persists.
I will try to open an upstream bug report shortly.

David Barker (dbark) wrote :

Upstream (kernel) issue number is 43117.
Please see https://bugzilla.kernel.org/show_bug.cgi?id=43117

Note: This is the first time I have reported a kernel bug and as such there is a high likelihood that it is miscategorised or otherwise poorly filed. Any advice or help would be appreciated.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Colin Ian King (colin-king) wrote :

@David, it would be useful to factor out all the pm-utils/power.d scripts, so on AC power can you run:

sudo pm-powersave true

and see if the problem is triggered by this.

David Barker (dbark) wrote :

@Colin,
Thanks for the suggestion - I actually tried this (along with "sudo pm-powersave false" when running on battery) to no avail.
I also suspected a pm-utils script.

I also tried copying the scripts from /usr/lib/pm-utils/power.d to /etc/pm/power.d/ and running "chmod a-x" on all of them to prevent them running (similar to the suggestion in https://help.ubuntu.com/community/PowerManagement/ReducedPower).
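
The copy-and-chmod step just described could be scripted roughly as follows. This is a sketch only: the function name `disable_power_hooks` is invented here, and the default paths are the two directories named in this comment.

```shell
# Hypothetical helper: shadow every hook in the source directory with a
# non-executable copy in the destination directory, so that pm-utils
# skips it. Defaults are the directories named in this report.
disable_power_hooks() {
    src=${1:-/usr/lib/pm-utils/power.d}
    dst=${2:-/etc/pm/power.d}
    mkdir -p "$dst" || return 1
    for hook in "$src"/*; do
        [ -f "$hook" ] || continue
        cp "$hook" "$dst/" && chmod a-x "$dst/$(basename "$hook")"
    done
}
```

Run as root with no arguments to use the default paths. Deleting the copies from /etc/pm/power.d should restore the stock behaviour.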

Sorry I forgot to mention this in my initial bug report!

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
VS (storvann) wrote :

I believe this might be related to what I'm seeing in bug 993507. Do the errors go away if you manually run "echo max_performance | sudo tee /sys/class/scsi_host/host*/link_power_management_policy" when on battery?
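
Before and after writing the policy, it may be useful to read back what each host currently reports. A minimal sketch, using a made-up function name `show_alpm_policy` and the sysfs path from the command above:

```shell
# Hypothetical helper: print "<host> <policy>" for each host that exposes
# link_power_management_policy. The base directory can be overridden.
show_alpm_policy() {
    base=${1:-/sys/class/scsi_host}
    for f in "$base"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

Calling `show_alpm_policy` on AC and again on battery shows whether the policy actually changes with the power source.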

David Barker (dbark) wrote :

@VS:
I tried but that had no effect. Thanks for the suggestion.

I've tried running Fedora 17 (Beta) with kernel 3.3.3 - this works fine with no issues.
That points to it being Ubuntu related as the issue is present in Ubuntu with kernel 3.2.0-24 AND 3.4 mainline, but not present in Fedora 17 with 3.3.3.
If it was a kernel issue I wouldn't expect it to skip a version.

David Barker (dbark) wrote :

Some further experimentation has revealed that the issue may be with the drive's Advanced Power Management (APM) function.

When operating on AC power, "hdparm -B /dev/sda" gives APM_level = 254
A few seconds after removing AC power, "hdparm -B /dev/sda" gives APM_level = 127

Running like this for a few minutes (generally anything between 30 seconds and 10 minutes) results in the above ATA errors.

Executing "hdparm -B 254 /dev/sda" after unplugging AC power appears to prevent the errors appearing.
This isn't ideal as it means the disk won't spin down. I think it may also have implications for disk head parking, though I don't know enough about this.

Plugging AC power back in then removing it sets APM_level back to 127, so in the absence of a fix for the ATA errors at APM_level 127, at the very least I need to find out how to prevent APM_level being set to 127 on removal of power.
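
To watch the APM level change without reading the hdparm output by eye, something like the sketch below might help. Both function names are invented for illustration. The spin-down ranges follow the hdparm man page description of -B, where levels 1 to 127 permit spin-down and 128 to 254 do not.

```shell
# Hypothetical helper: extract the numeric APM level from `hdparm -B`
# output on stdin. Prints nothing if no numeric level is reported.
apm_level() {
    sed -n 's/^[[:space:]]*APM_level[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Hypothetical helper: succeed if the given level is in the range that
# the hdparm man page describes as permitting spin-down (1 to 127).
apm_permits_spindown() {
    [ "$1" -ge 1 ] && [ "$1" -le 127 ]
}
```

For example, `sudo hdparm -B /dev/sda | apm_level` prints just the number, which can be logged at intervals while unplugging and replugging AC power.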

Again - this did not occur with Ubuntu 11.x or Fedora 16 & 17 Beta 2.
Today I tried mainline kernel 3.4.0 rc5 (dated May 1st 2012) which also exhibits the same symptoms.

Any suggestions?

David Barker (dbark) wrote :

Tracking it through more files than I can recall, I found that /usr/lib/pm-utils/power.d/95hdparm-apm calls /lib/hdparm/hdparm-functions which includes the following lines:

if hdparm_is_on_battery; then
                hdparm_set_option -B127

Changing -B127 to -B254 seems to prevent APM causing ATA resets on battery mode. This method allows the other hdparm values to be set properly.
I'm not sure of the wider implications of this or what has changed since Ubuntu 11.x (did it also set -B 127? did something else change which has meant -B 127 causes ATA resets on my hardware?).
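
An alternative to editing the packaged file, which a package update could overwrite, might be a small local pm-utils hook that re-applies the higher APM level. The sketch below is not a tested fix. The file name `99-apm-override` is hypothetical, and it assumes hooks run in name order so that it takes effect after 95hdparm-apm. pm-powersave passes "true" to hooks when on battery and "false" when on AC.

```shell
#!/bin/sh
# Hypothetical hook, e.g. saved as /etc/pm/power.d/99-apm-override
# (file name assumed, not part of any package) and made executable.
DISK=${DISK:-/dev/sda}            # device used in this report; adjust as needed
BATTERY_APM=${BATTERY_APM:-254}   # workaround value found in this report

# Map the pm-powersave argument to an APM level.
apm_for_state() {
    case "$1" in
        true)  echo "$BATTERY_APM" ;;  # on battery: override the stock 127
        false) echo 254 ;;             # on AC: the level observed in this report
        *)     return 1 ;;             # no or unknown argument: do nothing
    esac
}

# Only touch the disk when invoked with a recognised power state.
if level=$(apm_for_state "${1:-}"); then
    hdparm -B "$level" "$DISK"
fi
```

Keeping the battery level in a variable makes it easy to try intermediate values, for example `BATTERY_APM=192`, to see where the resets start.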

If anyone can help get my findings to the correct person/group/project to resolve this permanently I'd be most grateful.

David Barker (dbark)
no longer affects: linux (Ubuntu)