2.6.24-19-generic libata soft resetting link every 30 secs when under heavy load

Bug #270794 reported by qwerty on 2008-09-16
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

lsb_release -rd :
Description: Ubuntu 8.04.1
Release: 8.04

uname -a :
Linux srv01 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux

The problem started with Ubuntu 8.04 (I didn't have problems with previous versions ... Ubuntu 7.04, Ubuntu 7.10).

I have a PC with 1 GB RAM and 2 hard disks (IDE, 40 GB, "QUANTUM FIREBALLlct20"; SATA, 120 GB, "SAMSUNG SP1213C"). My Linux system is on the SATA disk.

I have run diagnostic tests on both disks; both are OK.

The problem is that the SATA disk randomly freezes for approximately 30 seconds when I try to use it (e.g. copying files, launching programs). It seems that the problem occurs only when the IDE disk is being used by other programs.

The "/var/log/messages" file shows :

Sep 16 00:12:48 srv01 kernel: [38893.436115] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 16 00:12:48 srv01 kernel: [38893.436137] ata3: soft resetting link
Sep 16 00:12:48 srv01 kernel: [38893.608103] ata3.00: configured for UDMA/133
Sep 16 00:12:48 srv01 kernel: [38893.608118] ata3: EH complete
Sep 16 00:12:48 srv01 kernel: [38893.616780] sd 2:0:0:0: [sdb] 234493056 512-byte hardware sectors (120060 MB)
Sep 16 00:12:48 srv01 kernel: [38893.616801] sd 2:0:0:0: [sdb] Write Protect is off
Sep 16 00:12:48 srv01 kernel: [38893.616834] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 16 00:13:18 srv01 kernel: [38923.847114] ata3.00: limiting speed to UDMA/100:PIO4
Sep 16 00:13:18 srv01 kernel: [38923.847135] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 16 00:13:18 srv01 kernel: [38923.847153] ata3: soft resetting link
Sep 16 00:13:18 srv01 kernel: [38924.019129] ata3.00: configured for UDMA/100
Sep 16 00:13:18 srv01 kernel: [38924.019145] ata3: EH complete
Sep 16 00:13:18 srv01 kernel: [38924.039830] sd 2:0:0:0: [sdb] 234493056 512-byte hardware sectors (120060 MB)
Sep 16 00:13:18 srv01 kernel: [38924.054328] sd 2:0:0:0: [sdb] Write Protect is off
Sep 16 00:13:18 srv01 kernel: [38924.088830] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
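
To see how often these events occur, the log can be summarized with grep. The following is only a rough sketch: the function name "summarize_ata_events" is made up for illustration, and the patterns are taken from the messages quoted above, so they may need adjusting for other kernel versions. Note that the "Emask 0x4 (timeout)" lines do not name a port, so that count covers all ports.

```shell
#!/bin/sh
# Count libata error-handling events for one port in a kernel log.
# Usage: summarize_ata_events PORT [LOGFILE]
# PORT is a name such as "ata3"; LOGFILE defaults to /var/log/messages.
summarize_ata_events() {
    port="$1"
    log="${2:-/var/log/messages}"

    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the script going under "set -e".
    resets=$(grep -c "${port}: soft resetting link" "$log" || true)
    downgrades=$(grep -c "${port}\.[0-9]*: limiting speed to" "$log" || true)
    # The timeout result lines carry no port name, so this is a global count.
    timeouts=$(grep -c "Emask 0x4 (timeout)" "$log" || true)

    echo "resets=$resets downgrades=$downgrades timeouts=$timeouts"
}
```

Calling "summarize_ata_events ata3" after a heavy copy would then give a quick count of resets and speed downgrades without reading the whole file.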

The "/var/log/dmesg" file shows :

[ 18.822085] scsi0 : ata_piix
[ 18.823482] scsi1 : ata_piix
[ 18.824932] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[ 18.824936] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 19.026180] ata1.00: ATA-5: QUANTUM FIREBALLlct20 40, APL.0900, max UDMA/100
[ 19.026186] ata1.00: 78177792 sectors, multi 8: LBA
[ 19.026205] ata1.00: limited to UDMA/33 due to 40-wire cable
[ 19.042841] ata1.00: configured for UDMA/33
[ 19.516954] ata2.00: ATAPI: CREATIVE CD5233E, C2.05, max MWDMA2
[ 19.516988] ata2.01: ATAPI: HL-DT-ST GCE-8520B, 1.03, max UDMA/33
[ 19.681301] ata2.00: configured for MWDMA2
[ 19.852290] ata2.01: configured for UDMA/33
[ 19.852431] scsi 0:0:0:0: Direct-Access ATA QUANTUM FIREBALL APL. PQ: 0 ANSI: 5
[ 19.853250] scsi 1:0:0:0: CD-ROM CREATIVE CD5233E 2.05 PQ: 0 ANSI: 5
[ 19.853711] scsi 1:0:1:0: CD-ROM HL-DT-ST CD-RW GCE-8520B 1.03 PQ: 0 ANSI: 5
[ 19.853786] ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
[ 19.853807] ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 18
[ 19.853851] PCI: Setting latency timer of device 0000:00:1f.2 to 64
[ 19.854163] scsi2 : ata_piix
[ 19.854345] scsi3 : ata_piix
[ 19.855386] ata3: SATA max UDMA/133 cmd 0xec00 ctl 0xe800 bmdma 0xdc00 irq 18
[ 19.855390] ata4: SATA max UDMA/133 cmd 0xe400 ctl 0xe000 bmdma 0xdc08 irq 18
[ 20.016097] ata3.00: ATA-7: SAMSUNG SP1213C, SV100-34, max UDMA7
[ 20.016102] ata3.00: 234493056 sectors, multi 16: LBA48
[ 20.024089] ata3.00: configured for UDMA/133
[ 20.189814] scsi 2:0:0:0: Direct-Access ATA SAMSUNG SP1213C SV10 PQ: 0 ANSI: 5
[ 20.203485] Driver 'sd' needs updating - please use bus_type methods
[ 20.203589] sd 0:0:0:0: [sda] 78177792 512-byte hardware sectors (40027 MB)
[ 20.203609] sd 0:0:0:0: [sda] Write Protect is off
[ 20.203613] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 20.203640] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 20.203701] sd 0:0:0:0: [sda] 78177792 512-byte hardware sectors (40027 MB)
[ 20.203718] sd 0:0:0:0: [sda] Write Protect is off
[ 20.203721] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 20.203748] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 20.203752] sda:<4>Driver 'sr' needs updating - please use bus_type methods
[ 20.215538] sda1
[ 20.215620] sd 0:0:0:0: [sda] Attached SCSI disk
[ 20.215715] sd 2:0:0:0: [sdb] 234493056 512-byte hardware sectors (120060 MB)
[ 20.215734] sd 2:0:0:0: [sdb] Write Protect is off
[ 20.215738] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 20.215766] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 20.215823] sd 2:0:0:0: [sdb] 234493056 512-byte hardware sectors (120060 MB)
[ 20.215839] sd 2:0:0:0: [sdb] Write Protect is off
[ 20.215843] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 20.215869] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 20.215873] sdb:sr0: scsi3-mmc drive: 0x/56x cd/rw xa/form2 cdda tray
[ 20.219094] Uniform CD-ROM driver Revision: 3.20
[ 20.219146] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 20.222438] sdb1 sdb2 sdb3 sdb4 <sr1: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
[ 20.222771] sr 1:0:1:0: Attached scsi CD-ROM sr1
[ 20.236407] sdb5 >
[ 20.244017] sd 2:0:0:0: [sdb] Attached SCSI disk
[ 20.262538] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 20.262566] sr 1:0:0:0: Attached scsi generic sg1 type 5
[ 20.262593] sr 1:0:1:0: Attached scsi generic sg2 type 5
[ 20.262619] sd 2:0:0:0: Attached scsi generic sg3 type 0
[ 20.814795] Attempting manual resume
[ 20.814800] swsusp: Resume From Partition 8:19
[ 20.814802] PM: Checking swsusp image.
[ 20.814998] PM: Resume from disk failed.
[ 20.852639] kjournald starting. Commit interval 5 seconds
[ 20.852655] EXT3-fs: mounted filesystem with ordered data mode.

Running "hdparm /dev/sda" ... IDE disk :
/dev/sda:
 IO_support = 0 (default 16-bit)
 HDIO_GET_UNMASKINTR failed: Inappropriate ioctl for device
 HDIO_GET_DMA failed: Inappropriate ioctl for device
 HDIO_GET_KEEPSETTINGS failed: Inappropriate ioctl for device
 readonly = 0 (off)
 readahead = 256 (on)
 geometry = 4866/255/63, sectors = 78177792, start = 0

On previous versions of Ubuntu, "hdparm" never showed errors or warnings such as "HDIO_GET_UNMASKINTR failed".

---------------------------------

I guess it is a bug, but if it is not, does anyone know how to solve the problem?

Thanks,

Dimitrios Symeonidis (azimout) wrote :

First of all, don't worry about the error messages from hdparm. Hdparm is for IDE disks. Try the same thing with "sdparm" instead, and you'll see it works fine.

Then, the messages in your kernel log (dmesg) seem normal, so no clues from there.

Can you try and maybe identify in which cases the disk freezes? What exactly happens when it freezes? Does the whole system freeze, or just your transfer stops?

Have you considered the possibility of a failing disk? Try installing smartmontools, which will let you see the SMART messages from your disks...

Finally, just in case, try upgrading to the latest kernel and see if that fixes it...

good luck

qwerty (escalantea) wrote :

The mouse works and all active programs (using the IDE disk) continue working, but I can't launch new programs or switch between active programs (since my Linux system is on the frozen disk).

The last time the freezing occurred, I was trying to copy a large file (300 MB) from my IDE disk to the SATA disk. There were several "freezes", and once each "freeze" was over (30 seconds later), the "/var/log/messages" file showed that my SATA disk had been reconfigured: first UDMA/133, later UDMA/100 and finally UDMA/33.

I checked both disks with diagnostic tools (including a surface scan) and installed smartmontools to see the SMART log (no errors there). Both disks are OK.

Thanks,

Dimitrios Symeonidis (azimout) wrote :

Try upgrading to the latest kernel and see if that fixes the problem.
Also, please attach the output of "lspci -nn" and the file /var/log/messages, as file attachments, not inside the comment.
Thank you

qwerty (escalantea) wrote :

One more detail: my IDE disk has only 1 partition (NTFS). I have Windows XP there (which I haven't used in weeks).

I might have identified how the problem originated (I can't be sure, since it appears randomly). Several weeks ago I configured Azureus (torrent downloader) to save the downloaded files in a directory on the IDE disk.

It seems that the freezing occurs (randomly) when I try to access (edit, view, copy, etc.) the Azureus downloaded file(s) for the first time (there are no problems the second time). It doesn't matter whether Azureus is still running or not (I exited the Azureus program). By the way, the downloaded files are OK (Azureus verifies them after downloading and I re-check them with md5sum).

Could the problem be related to the NTFS read/write support? Azureus makes massive reads/writes to the IDE disk (NTFS).

As for myself, I re-configured Azureus to save the downloaded files on the SATA disk and I haven't had any more problems (at least for the last 2 days).

Thanks,

qwerty (escalantea) wrote :

Just in case, here is the /var/log/messages.

Thanks

Dimitrios Symeonidis (azimout) wrote :

Ok, so basically the IDE controller resets the link every 30 seconds when under heavy load. The fact that it's NTFS is probably irrelevant (though I'm not ruling it out).

I would look into the libata module...

Switching status from incomplete back to new

Dimitrios Symeonidis (azimout) wrote :

qwerty, is this still an issue for you? Which kernel version are you running now?
Have you looked into the possibility of a faulty (or failing) hard drive?

Changed in linux:
assignee: azimout → nobody
status: New → Incomplete
importance: Undecided → Medium

Ahmed Kotb (kotbcorp) wrote :

I have the same problem, but I have only one SATA hard disk (3 NTFS partitions and 1 ext3).
The problem happens whenever I do anything that touches the hard disk (either an NTFS partition or the ext3 partition).
I also noticed that when Ubuntu starts, the disk is configured for UDMA/133 and everything works perfectly, but after playing a video, using Firefox, or doing anything else that performs I/O, the speed drops step by step until it reaches UDMA/33, and then Ubuntu freezes completely.
I have had this problem since Ubuntu 8.04. Please help :(

qwerty (escalantea) wrote :

I'm using Ubuntu 8.10 (uname -r ... 2.6.27-9-generic) and the issue remains. The hard drive is fine, checked with disk utils from Samsung (it's a Samsung SP1213C disk), the SMART report log doesn't report any errors and just in case, fsck doesn't report errors either.

I've read that there are some problems with libata and Intel (http://linux-ata.org/faq.html), so I'm using "combined_mode=libata" (my mainboard is Intel) in my kernel boot options, and it seems to be helping (the disk freezing remains, but the disk performance no longer decreases).

Thanks.

Ahmed Kotb (kotbcorp) wrote :

I am not sure, but I think the 2.6.27-9-generic kernel has partially solved the problem.
What I mean is that I now get those errors rarely (during normal use, with no video files) and the performance doesn't decrease (it remains at UDMA/133).

Before this update I got a lot of errors even when just using Firefox, and these errors caused the disk to be reconfigured down to UDMA/33, after which the system froze.
Now I get those errors rarely and without the drop in disk performance, but playing any video file will still reliably make the system hang.

Dimitrios Symeonidis (azimout) wrote :

marking as triaged

Changed in linux:
status: Incomplete → Triaged

qwerty (escalantea) wrote :

I believe I've found the problem (it's been a week without the disk freezing, and I've been running a lot of tests under heavy load).

It seems that the problem is related to "pdflush". I guess the default parameters (the ones that came with my original Ubuntu installation, first 8.04 and later 8.10) could use some tuning.

I noticed (with "cat /proc/meminfo") that the "Dirty" value grows until it almost reaches 40000 kB (my PC has 1 GB of RAM) and stays there for too long before being cleaned. I guessed that, for some unknown reason, "pdflush" wasn't cleaning the "Dirty" area as frequently as needed.
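
For anyone who wants to watch the same counter, here is a small sketch that pulls a single field out of /proc/meminfo. The helper name "meminfo_kb" is hypothetical; it only assumes the usual "Name: value kB" layout of that file.

```shell
#!/bin/sh
# Print the value (in kB) of one field from a meminfo-style file.
# Usage: meminfo_kb FIELD [FILE]
# FIELD is e.g. "Dirty" or "Writeback"; FILE defaults to /proc/meminfo.
meminfo_kb() {
    field="$1"
    file="${2:-/proc/meminfo}"
    # Each line looks like "Dirty:    39876 kB"; match on the first column
    # and print the second.
    awk -v f="${field}:" '$1 == f { print $2 }' "$file"
}
```

Running "meminfo_kb Dirty" in a loop during a large copy (or simply "watch -n 1 grep Dirty /proc/meminfo") shows whether the value climbs and stays high before a freeze.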

So I modified the "/etc/sysctl.conf" in order to have "pdflush" activated more frequently (... added the following lines) :
vm.dirty_writeback_centisecs = 200
vm.dirty_expire_centisecs = 400

After a system reboot there was no more freezing (to be safe, I ran "fsck" beforehand to clean up any problems caused by the earlier freezes).

Note :
1. For info about "The Linux Page Cache and pdflush" I found this :
http://hi.baidu.com/pkubuntu/blog/item/d7413c01c8747b0b7bec2c9e.html

2. There is no need to alter the "/etc/sysctl.conf" file and reboot the system to change the "pdflush" parameters. For testing purposes, it can be done by issuing the following commands (as root):
echo 200 > /proc/sys/vm/dirty_writeback_centisecs
echo 400 > /proc/sys/vm/dirty_expire_centisecs

3. I've seen other bug reports in the forum related to SATA disks and freezes that might be solved by fine-tuning the "pdflush" parameters. Does this mean that the default "pdflush" parameters are not the best values for users with SATA disks?
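
The commands from note 2 can be wrapped so the old values are visible before anything is changed. This is only a sketch: the function names and the VM_DIR variable are made up for illustration (VM_DIR exists so the helpers can be tried against a scratch directory first). For comparison, the kernel documentation gives 500 and 3000 as the usual defaults for the two settings, though a distribution may ship different values.

```shell
#!/bin/sh
# Directory holding the VM tunables; override VM_DIR for a dry run.
VM_DIR="${VM_DIR:-/proc/sys/vm}"

# Print the current writeback-related settings that are readable.
show_writeback_params() {
    for name in dirty_writeback_centisecs dirty_expire_centisecs \
                dirty_background_ratio dirty_ratio; do
        if [ -r "$VM_DIR/$name" ]; then
            echo "$name = $(cat "$VM_DIR/$name")"
        fi
    done
}

# Set the flusher wake-up interval and the dirty-page expiry age,
# both in centiseconds. Writing to /proc/sys/vm requires root.
# Usage: set_writeback_params WRITEBACK_CS EXPIRE_CS
set_writeback_params() {
    writeback="$1"
    expire="$2"
    echo "$writeback" > "$VM_DIR/dirty_writeback_centisecs" &&
    echo "$expire" > "$VM_DIR/dirty_expire_centisecs"
}
```

A test run would be "show_writeback_params", then "set_writeback_params 200 400" as root, then "show_writeback_params" again. Values set this way are lost on reboot; once they are in "/etc/sysctl.conf", running "sysctl -p" as root applies that file without rebooting.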

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Triaged a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: Triaged → Incomplete

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired

SpmP (scarletpimpernal) wrote :

Well, I thought it was fixed with 2.6.34, but it was only better.
Changing dirty_writeback_centisecs etc. as per qwerty, with 2.6.34, seems to have stopped the ata resets etc. on a VIA EPIA-EN.
The disk wasn't locking up, just reduced to super slow mode, ~15mb/s.

so, thanks qwerty.
