DMA timeouts on UDMA harddisks

Bug #25975 reported by Fabian Schindler
Affects: linux (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

This is a dual-boot system on my laptop (Issam Smartbook i1000c); the other OS
is Windows XP SP2 (sorry). Ubuntu 5.10 is installed on the laptop. From time to
time I experience temporary harddisk freezes or lockups. I checked whether the
Hitachi 40GB 2.5" hdd might be faulty: I tested the drive with Hitachi's system
tools from their home page, and it passed all integrity and health checks. I ran
health checks again under Ubuntu and the drive was reported healthy there too.
There are no lockups in Windows XP at all. The drive was also checked by an IT
expert at the university and no problems were found. But in /var/log/messages,
this error is reported repeatedly:

...
Nov 20 15:35:35 localhost kernel: [4294765.104000] hda: dma_timer_expiry: dma status == 0x20
Nov 20 15:35:35 localhost kernel: [4294765.104000] hda: DMA timeout retry
Nov 20 15:35:35 localhost kernel: [4294765.104000] hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Nov 20 15:35:35 localhost kernel: [4294765.104000]
Nov 20 15:35:35 localhost kernel: [4294765.104000] ide: failed opcode was: unknown
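
For readers following the log, the drive status byte can be decoded bit by bit. The sketch below assumes the standard ATA status register layout. The function name decode_ata_status is made up for illustration, and the flag names that do not appear in this report (DeviceFault, CorrectedError, Index, Error) are assumed from the naming used by the old IDE driver.

```python
# Assumed bit layout of the 8-bit ATA status register.
# Only DriveReady, SeekComplete, DataRequest and Busy appear in this report;
# the remaining names are assumptions based on the old IDE driver's labels.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the flag names set in an 8-bit ATA status value."""
    # While Busy is set the other bits are not meaningful, so only
    # "Busy" is returned in that case.
    if status & 0x80:
        return ["Busy"]
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


print(decode_ata_status(0x58))
```

Under these assumptions, 0x58 decodes to the same three flags the kernel printed: the drive is ready and still requesting a data transfer, and the Error bit is clear.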

The drive seems to be running in UDMA mode (udma5 is marked active below):

user@kopernikus:~$ sudo hdparm -i /dev/hda
/dev/hda:

 Model=HITACHI_DK23FB-40, FwRev=00M1A0A1, SerialNo=4CY679
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
 CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=78140160
 IORDY=yes, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-5 T13 1321D revision 3:

 * signifies the current active mode
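
When comparing several machines, the starred entry can be extracted from saved hdparm -i output automatically. This is a minimal sketch; the function name active_transfer_mode is hypothetical, and it assumes the output format shown above, where the active mode is prefixed with an asterisk and no space.

```python
import re


def active_transfer_mode(hdparm_i_output):
    """Return the mode marked with '*' in `hdparm -i` output, or None."""
    # The legend line "* signifies ..." has a space after the asterisk,
    # so it does not match this pattern.
    match = re.search(r"\*((?:pio|sdma|mdma|udma)\d+)", hdparm_i_output)
    return match.group(1) if match else None


sample = """
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5

 * signifies the current active mode
"""
print(active_transfer_mode(sample))
```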

The drive is NOT nearing death. That was checked already.
(I searched bugzilla but did not find similar reports, so I hope it is not a
duplicate entry.)
Thank you for your attention. :)

Revision history for this message
Adriaan Peeters (apeeters) wrote :

I have a similar problem on my Dell Latitude D505, except that the DMA status is
0x21 and the drive status is reported as Busy:

Aug 24 17:00:31 twiadria kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 24 17:00:46 twiadria kernel: hda: DMA timeout error
Aug 24 17:00:46 twiadria kernel: hda: dma timeout error: status=0xd0 { Busy }
Aug 24 17:00:46 twiadria kernel:
Aug 24 17:00:46 twiadria kernel: ide: failed opcode was: unknown
Aug 24 17:00:46 twiadria kernel: hda: DMA disabled
Aug 24 17:00:46 twiadria kernel: ide0: reset: success
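
The "dma status" byte in these messages is a different register from the drive status. The following sketch decodes it, assuming it is the bus-master IDE status register with the usual SFF-8038i bit layout. The labels in the table are descriptive names chosen here, not kernel identifiers.

```python
# Assumed layout of the bus-master IDE status register (SFF-8038i).
# Bits 3 and 4 are reserved and therefore not listed.
BMDMA_STATUS_BITS = [
    (0x01, "dma_active"),
    (0x02, "dma_error"),
    (0x04, "interrupt"),
    (0x20, "drive0_dma_capable"),
    (0x40, "drive1_dma_capable"),
    (0x80, "simplex_only"),
]


def decode_bmdma_status(status):
    """Return the labels of the bits set in a bus-master status byte."""
    return [name for mask, name in BMDMA_STATUS_BITS if status & mask]


for value in (0x20, 0x21):
    print(hex(value), decode_bmdma_status(value))
```

Under that assumption, 0x21 means the controller still considers the transfer active and has not seen an interrupt, while 0x20 (the value in the original report) shows the DMA engine idle. Neither value has the error bit set.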

I am running in udma5, but even switching to udma2 (using "hdparm -d1 -X66
/dev/hda") does not seem to fix the problem. I also ran the Dell Diagnostics,
but it finished without any problems.

This problem has also been reported for Debian [1] and Fedora [2], and I guess
for other distributions as well. Apparently no one has reported it upstream yet.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=321409
[2] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132584

/dev/hda:

 Model=IC25N030ATMR04-0, FwRev=MOAOAD0A, SerialNo=MRG2E0KBHZS3DJ
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=1740kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=58605120
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 3a:

Revision history for this message
Fabian Schindler (fabianmschindler) wrote :

I have now tested disabling DMA, and the resulting system performance, by
running the following command:
hdparm -d0 -B 255 /dev/hda

The result was a stable system. Speed slowdowns are minimal. Changing
/etc/hdparm.conf accordingly did not work around the problem, however: the
settings in hdparm.conf were ignored, so hdparm -d0 -B 255 /dev/hda has to be
run from a terminal after every login.

Revision history for this message
Adriaan Peeters (apeeters) wrote :

Created an attachment (id=5232)
hdparm output on Latitude D505

I do not think disabling DMA is acceptable. It reduces the read speed to
approximately 3 MB/s. Attached are some test results. As for the -B option:
disabling APM on a laptop does not seem very useful either.
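
For anyone who wants to reproduce this kind of read-speed comparison without hdparm, a rough sequential-read timing can be sketched as follows. The function names timed_read and mb_per_s are hypothetical, and this is not the method used for the attached results. The kernel page cache can serve repeated reads, so only the first run after boot, or a run on data that has not been read before, is meaningful.

```python
import time


def timed_read(path, chunk_bytes=1024 * 1024, max_bytes=64 * 1024 * 1024):
    """Read up to max_bytes sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering only; the kernel page
    # cache is still in play.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(chunk_bytes, max_bytes - total))
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def mb_per_s(bytes_read, seconds):
    """Convert a (bytes, seconds) measurement to MB/s (1 MB = 10**6 bytes)."""
    return bytes_read / 1e6 / seconds if seconds > 0 else float("inf")


# Example (reading the raw device usually requires root):
# n, secs = timed_read("/dev/hda")
# print(f"{mb_per_s(n, secs):.1f} MB/s")
```

Running it once with DMA enabled and once after hdparm -d0 should show the same kind of gap as described above, though the absolute numbers will differ from hdparm's own benchmark.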

Revision history for this message
Fabian Schindler (fabianmschindler) wrote :

I agree, disabling DMA is not really acceptable as a "solution". Running the
aforementioned command was the only way to stabilize my own system. The -B
option was needed to work around a power-management problem with the Hitachi
drive. Hitachi seems to have created a "buggy" harddisk for laptops again.

Revision history for this message
Ben Collins (ben-collins) wrote :

If possible, please upgrade to Dapper's 2.6.15-7 kernel. If you do not want to
upgrade to Dapper, then you can also wait for the Dapper Flight 2 CDs, which
are due out within the next few days.

Let me know if this bug still exists with this kernel.

Revision history for this message
Fabian Schindler (fabianmschindler) wrote :

Tried a system upgrade to Dapper. Success: 100% negative. The system broke
completely. Will try the Flight CDs once they are available.

Revision history for this message
Ben Collins (ben-collins) wrote :

(In reply to comment #6)
> Tried system upgrade to dapper. Success: 100% negative. System broke completely.
> Will try the Flight CD's once they are available.

http://cdimage.ubuntulinux.org/releases/dapper/flight-2/

Revision history for this message
Fabian Schindler (fabianmschindler) wrote :

Downloaded and tried to install Dapper Flight-2. New problems arise:

1. Most of the time the CD does not detect the hdd. When it does, it fails to
detect any existing partitions and always asks to format the whole drive (I
simply can't permit Dapper to do that, sorry), which it reports as 16 GB in
size although the disk is 40 GB.

2. The installer confuses the BIOS. After a reboot, no harddrive is shown in the
BIOS. I need to reboot a dozen times in order to get my hdd running again. (BIOS
auto-detection of devices is somehow affected by the hardware-detection menu in
Dapper.) I ran sanity checks on the drive with the Hitachi tools again, and it
is not a fault of the drive.

No idea which path to take now. :(

PS: I searched for this bug, and users of Slackware, Mandriva, Debian and others
report the same DMA timeouts, with both 2.4 series and 2.6 series kernels.

Revision history for this message
Ben Collins (ben-collins) wrote :

With the filesystem not seeing all of your drive, and then your BIOS not seeing
the drive at all, that is more of a hardware issue. Nothing in the system would
affect your BIOS like that.

Make sure your BIOS reads the drive correctly (correct heads/cylinders/sectors
and LBA settings). Make sure that Linux sees the same numbers as your BIOS reports.
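
To make the geometry comparison concrete, here is a small sketch that converts the CHS and LBA figures from the hdparm -i output in the original report into capacities. It assumes 512-byte sectors; the helper names are made up for illustration, and it does not attempt to explain the 16 GB figure shown by the installer.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def chs_sectors(cylinders, heads, sectors_per_track):
    """Total number of sectors addressable with a given CHS geometry."""
    return cylinders * heads * sectors_per_track


def sectors_to_gb(sectors):
    """Capacity in decimal gigabytes (10**9 bytes)."""
    return sectors * SECTOR_BYTES / 1e9


# Values copied from the hdparm -i output of the Hitachi drive.
raw_chs = chs_sectors(16383, 16, 63)   # RawCHS=16383/16/63
cur_chs = chs_sectors(4047, 16, 255)   # CurCHS=4047/16/255
lba = 78140160                         # LBAsects=78140160

for label, count in (("RawCHS", raw_chs), ("CurCHS", cur_chs), ("LBA", lba)):
    print(f"{label}: {count} sectors, {sectors_to_gb(count):.1f} GB")
```

Both CHS geometries cover only the first 8.5 GB or so of the disk, while the LBA sector count matches the 40 GB drive. If the BIOS and Linux disagree, comparing these three numbers on each side is a quick way to see which addressing scheme is in use.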

Revision history for this message
Fabian Schindler (fabianmschindler) wrote :

Did a BIOS upgrade. Harddrive is working now. The Dapper-CD seems to have been
simply a bad burn. Just installed the system and will test it under heavy
workload with the new Kernel. Will report back in a few days if the problem
still exists, but seems to be quite stable right now.

Thanks for your patience. :)

Revision history for this message
Ben Collins (ben-collins) wrote :

(In reply to comment #10)
> Did a BIOS upgrade. Harddrive is working now. The Dapper-CD seems to have been
> simply a bad burn. Just installed the system and will test it under heavy
> workload with the new Kernel. Will report back in a few days if the problem
> still exists, but seems to be quite stable right now.
>
> Thanks for your patience. :)

Excellent. I'm going to go ahead and close the bug, but feel free to reopen it
if you find the bug still exists.

Thanks!

Revision history for this message
Adriaan Peeters (apeeters) wrote :

> Excellent. I'm going to go ahead and close the bug, but feel free to reopen it
> if you find the bug still exists.

Unfortunately I have to reopen this bug. I tried Flight 2 and was able to
reproduce the error from my earlier comment 1.

Revision history for this message
frogzoo (frogzoo) wrote :

I was seeing timeouts but they seem to have gone away. I think it's a question of the hdparm settings being poorly configured.

These settings in /etc/hdparm.conf work fine for me:

command_line {
        hdparm -q -d1 -q -X udma5 -q -c3 -q -m 16 -q -W1 /dev/hda
}

Maybe the issue only shows up when multiple-sector I/O is not enabled (hdparm -m 16).

Revision history for this message
Ben Collins (ben-collins) wrote :

If the settings work OK, then maybe hdparm just needs to set things right by default for you.

Please file a bug against hdparm if you feel it's needed.

Changed in linux-source-2.6.15:
status: Unconfirmed → Rejected
Revision history for this message
Adriaan Peeters (apeeters) wrote :

These hdparm settings seem to decrease the occurrence of the timeouts, but they still occur on large data transfers.

Changed in linux-source-2.6.15:
status: Rejected → Confirmed
Revision history for this message
Adriaan Peeters (apeeters) wrote :

I was able to 'fix' this issue by replacing the hard disk.

While I was doing some heavy I/O on the harddisk, the DMA timeout occurred and the filesystem crashed badly! I decided to replace the harddisk (7200 rpm instead of 4800) and I have not observed the issue since.

Revision history for this message
Surricani (surricani) wrote :

I own an Acer Aspire 162LM laptop that suffers from this same bug.

Frequently, when I put a high I/O load on the disk, the system freezes for some seconds and in dmesg I see:

[59509.084000] hda: dma_timer_expiry: dma status == 0x21
[59519.084000] hda: DMA timeout error
[59519.084000] hda: dma timeout error: status=0xd0 { Busy }
[59519.084000] ide: failed opcode was: unknown
[59519.084000] hda: DMA disabled
[59519.132000] ide0: reset: success

and then the system goes back to working OK.
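
To see how often this happens, the kernel log can be scanned for the dma_timer_expiry message. The sketch below is only an illustration: the function name find_dma_timeouts is hypothetical, and the pattern is based solely on the log lines quoted in this report, so other kernel versions may word the message differently. It also expects each message on a single line.

```python
import re

# Pattern taken from the messages quoted in this report.
DMA_EXPIRY = re.compile(
    r"(hd[a-z]): dma_timer_expiry: dma status == (0x[0-9a-fA-F]+)"
)


def find_dma_timeouts(lines):
    """Return (device, dma_status) for every dma_timer_expiry line."""
    hits = []
    for line in lines:
        match = DMA_EXPIRY.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2), 16)))
    return hits


sample_log = [
    "[59509.084000] hda: dma_timer_expiry: dma status == 0x21",
    "[59519.084000] hda: DMA timeout error",
    "[59519.084000] hda: dma timeout error: status=0xd0 { Busy }",
    "[59519.132000] ide0: reset: success",
]
print(find_dma_timeouts(sample_log))
```

The same function can be fed the lines of /var/log/messages or the output of dmesg to count timeouts per device.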

I have this IDE controller and drive:

[ 16.977963] VP_IDE: IDE controller at PCI slot 0000:00:11.1
[ 16.978054] ACPI: Unable to derive IRQ for device 0000:00:11.1
[ 16.978057] ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
[ 16.978067] VP_IDE: chipset revision 6
[ 16.978069] VP_IDE: not 100% native mode: will probe irqs later
[ 16.978081] VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci0000:00:11.1
[ 16.978089] ide0: BM-DMA at 0x1c60-0x1c67, BIOS settings: hda:DMA, hdb:pio
[ 16.978101] ide1: BM-DMA at 0x1c68-0x1c6f, BIOS settings: hdc:DMA, hdd:pio
[ 16.978108] Probing IDE interface ide0...
[ 17.396066] hda: IC25N040ATMR04-0, ATA DISK drive
[ 17.931617] ieee1394: Host added: ID:BUS[0-00:1023] GUID[000ae4045210430c]
[ 18.067738] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

and Ubuntu 7.04 Feisty Fawn.

Revision history for this message
Marco Cimmino (cimmo) wrote :

I have the same problem. I have tried different solutions, including those mentioned here, but none of them worked.
Up to Feisty, the kernel option ide=nodma at least removed the error messages and sped up the boot, at the cost of decreased hard disk performance. Now, with Gutsy 7.10, this kernel parameter doesn't work anymore: the messages are shown anyway and the boot takes ages.

please help me
Marco

Revision history for this message
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
kernel-janitor (kernel-janitor) wrote :

This bug report was marked as Confirmed a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Unassigned from Ben Collins. Marked Invalid. If this is still being experienced in Karmic or Lucid, please open a new bug and post the apport data.

-JFo

Changed in linux (Ubuntu):
assignee: Ben Collins (ben-collins) → nobody
status: Incomplete → Invalid