SATA problems with 2.6.32 and nvidia MCP51 controller

Bug #573737 reported by szu
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: linux-image-2.6.32-21-generic

my SATA drive started behaving strangely after the lucid upgrade (which includes 2.6.32 kernel).

effects include:
#1 complete failures & system hangs:
end_request: I/O error, dev sda, sector 365467184 (repeat ad infinitum)

#2 transient problems (this seems to slow down the startup)
ata4: lost interrupt (Status 0x51)
ata4.01: qc timeout (cmd 0xa0)
sr 3:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
ata4: soft resetting link
ata4.00: configured for UDMA/66
ata4.01: configured for UDMA/33
ata4.00: qc timeout (cmd 0xa0)
ata4.00: TEST_UNIT_READY failed (err_mask=0x5)
ata4: soft resetting link
ata4.00: configured for UDMA/66
ata4.01: configured for UDMA/33
ata4.00: qc timeout (cmd 0xa0)
ata4.00: TEST_UNIT_READY failed (err_mask=0x5)
ata4.00: limiting speed to UDMA/66:PIO3
ata4: soft resetting link
ata4.00: configured for UDMA/66
ata4.01: configured for UDMA/33

#3 random glitches, like window decorator does not start with gnome session(!), usb mouse gets disconnected and reconnected again

#1 only happened once (and was really scary :-O)
#2 happens on every boot
#3 is completely random and i'm not sure if it is connected with SATA

booting with the previous kernel version (2.6.31) solves all the problems
(except that lucid can't show the graphics splash screen with the old kernel, but that's a different issue)

i won't run that kernel again (and risk filesystem corruption) so i don't have dmesg, but i'm attaching the /var/log/messages from the previous boot

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-21-generic 2.6.32-21.32
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.31-21.59-generic
Uname: Linux 2.6.31-21-generic i686
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.20.
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: sz 4625 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xf9ff8000 irq 22'
   Mixer name : 'Realtek ALC888'
   Components : 'HDA:10ec0888,14627346,00100001'
   Controls : 36
   Simple ctrls : 21
Date: Sun May 2 16:29:07 2010
HibernationDevice: RESUME=UUID=5689798a-0e20-4e2d-9265-ee4ee17af5fd
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.

 vboxnet0 no wireless extensions.
MachineType: MSI MS-7346
ProcCmdLine: root=UUID=86cdb03b-9986-458d-a8b0-bfe8a431199a ro quiet splash
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34
RfKill:

SourcePackage: linux
WpaSupplicantLog:

dmi.bios.date: 04/13/2007
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: V1.0
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: MS-7346
dmi.board.vendor: MSI
dmi.board.version: 1.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrV1.0:bd04/13/2007:svnMSI:pnMS-7346:pvr1.0:rvnMSI:rnMS-7346:rvr1.0:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: MS-7346
dmi.product.version: 1.0
dmi.sys.vendor: MSI

szu (szulat) wrote :

note:
all attachments except the first one were added automatically while running the good 2.6.31 kernel and may not reflect the relevant system state.

szu (szulat) wrote :

i tried another kernel today, and the randomly chosen "2.6.33.3-lucid" seems to fix the issue
(from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33.3-lucid/ )

Jeremy Foshee (jeremyfoshee) wrote :

Hi szu,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
szu (szulat) wrote :

i'm not sure if i understand the terminology here so please correct me if i am wrong: the KernelMainlineBuilds page pointed me to http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/
so i downloaded the following version: 2.6.34-999.201005041005
both test runs resulted in ata6 errors similar to 2.6.32

however, now i'm not so sure that 2.6.33 is any better. previously i only looked at the logs and they seemed okay because there were no errors - but now i see that the only thing my 2.6.33 boots produced in kern.log was:

May 4 14:42:14 ubuntu kernel: Kernel logging (proc) stopped.
May 4 14:43:31 ubuntu kernel: imklog: Cannot read proc file system, 1.

and there ARE ata-related errors in 2.6.33 dmesg (interestingly, they now refer to ata4 instead of ata6 and i only have one hard drive)

so the only thing i know for sure is:
- there were no such errors before upgrading to lucid (kernels up to 2.6.31)
- the only crash + I/O error, dev sda, happened in 2.6.32

do these error messages mean the sata driver is doing something to prevent something bad from happening? or do they indicate that something bad has happened already? no idea... i'm just reporting what i see.
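
For readers trying to interpret the numbers in these messages, here is a minimal Python sketch that names the bits in the "Status 0x51" byte and in "err_mask=0x5". It rests on assumptions: the status names follow the standard ATA status register layout, and the AC_ERR_* names are the libata error flags as they are believed to be defined in include/linux/libata.h, so they should be checked against the kernel source actually in use. The helper name decode_bits is hypothetical. Note also that cmd 0xa0 is the ATA PACKET command, which is used to talk to ATAPI devices such as optical drives. The sketch only labels bits; it cannot tell whether any data was damaged.

```python
# Assumed layout of the ATA status register (only the well-known bits are named).
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # bit 4; meaning depends on the command set
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error (CHK for ATAPI devices)
}

# Assumed libata error-mask flags (AC_ERR_* in include/linux/libata.h).
AC_ERR_BITS = {
    0x01: "AC_ERR_DEV",       # device reported an error
    0x02: "AC_ERR_HSM",       # host state machine violation
    0x04: "AC_ERR_TIMEOUT",   # command timed out
    0x08: "AC_ERR_MEDIA",     # media error
    0x10: "AC_ERR_ATA_BUS",   # ATA bus error
    0x20: "AC_ERR_HOST_BUS",  # host bus error
}


def decode_bits(value, names):
    """Return a label for every bit set in value, lowest bit first.

    Bits without a known name are reported as their hex value.
    """
    labels = []
    bit = 1
    while bit <= value:
        if value & bit:
            labels.append(names.get(bit, hex(bit)))
        bit <<= 1
    return labels


if __name__ == "__main__":
    print("Status 0x51 :", decode_bits(0x51, ATA_STATUS_BITS))
    print("Status 0x50 :", decode_bits(0x50, ATA_STATUS_BITS))
    print("err_mask 0x5:", decode_bits(0x5, AC_ERR_BITS))
```

Under these assumptions, err_mask=0x5 combines a device-reported error with a timeout, which matches the "qc timeout" lines that precede each TEST_UNIT_READY failure in the log.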

mbrudka (mbrudka) wrote :

I can confirm this. I also have various nvidia MCP51 SATA problems with 2.6.32-22 after upgrading to lucid. The disk checks out fine with fsck and badblocks when booted from a good old Feisty CD. If necessary I can provide full log data using apport-bug.

alf@all.de (alf-all) wrote :

I have similar problems with nVidia MCP55, which affect both SATA & PATA. See Bug #577785 .

mbrudka (mbrudka) wrote :

I propose changing the importance of this bug to critical and its status to confirmed. I/O errors currently make lucid completely unusable on my box. I even have trouble downgrading the kernel to 2.6.31 (is that possible?) because dpkg times out when accessing its databases in /var... As I can access and check my disk using previous ubuntu releases, I am almost sure that this bug was introduced in lucid and is not related to hardware failures.

mbrudka (mbrudka) wrote :

Sorry. My problem is not related to the upgrade to lucid. I tried to install 2.6.31 using the feisty CD with 2.6.20, and after some time similar (though different) ata errors were reported. It seems my disk is broken :( Please do not consider my comments relevant to #573737.

szu (szulat) wrote :

new discovery:
it seems that the instability is triggered by my IDE DVD drive (LITE-ON DVDRW SOHW-1653S)
SATA, IDE and USB are all controlled by the same chip (nvidia MCP51) so it kinda explains such a broad effect.
the machine worked perfectly for 3 weeks while the drive was removed. yesterday it went back in and the bug reappeared.

(of course the same drive worked flawlessly in older ubuntu versions and it still works in windows xp)

camden lindsay (camden-lindsay+launchpad) wrote :

I am also having problems with IDE from the MCP51 chipset.

My DL dvd rom drive will no longer record properly.

Will attach the log output.

camden lindsay (camden-lindsay+launchpad) wrote :

Interesting-- the other DVD recorder that I own doesn't exhibit this behavior. You may be able to chalk up my experience here to hardware incompatibilities or a bad dvd burner....

camden lindsay (camden-lindsay+launchpad) wrote :

Apparently i spoke too soon :(

Jul 31 12:02:56 core2buntu kernel: [ 47.710147] warning: `growisofs' uses 32-bit capabilities (legacy support in use)
Jul 31 12:06:15 core2buntu kernel: [ 246.040058] ata2: lost interrupt (Status 0x50)
Jul 31 12:06:46 core2buntu kernel: [ 277.040066] ata2: lost interrupt (Status 0x51)
Jul 31 12:06:51 core2buntu kernel: [ 282.040038] ata2.00: qc timeout (cmd 0xa0)
Jul 31 12:06:51 core2buntu kernel: [ 282.040055] sr 1:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jul 31 12:06:51 core2buntu kernel: [ 282.040105] ata2: soft resetting link
Jul 31 12:06:51 core2buntu kernel: [ 282.280428] ata2.00: configured for UDMA/33
Jul 31 12:06:51 core2buntu kernel: [ 282.320400] ata2.01: configured for UDMA/33
Jul 31 12:06:56 core2buntu kernel: [ 287.320035] ata2.00: qc timeout (cmd 0xa0)
Jul 31 12:06:56 core2buntu kernel: [ 287.320044] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
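
To compare kernels or drives more objectively than by eyeballing the log, the libata messages can be tallied per port. The following is a minimal Python sketch, not an established tool: the function name tally_ata_events, the regular expression, and the list of message prefixes are all hypothetical choices made for illustration, and the sample input is the log excerpt above.

```python
import re
from collections import Counter

# Matches a libata prefix such as "ata2:" or "ata2.00:" and captures the message text.
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.(\d+))?: (.+)$")

# Message prefixes treated as error events (an assumption; extend as needed).
EVENTS = (
    "lost interrupt",
    "qc timeout",
    "soft resetting link",
    "TEST_UNIT_READY failed",
)


def tally_ata_events(lines):
    """Count error events per (port, event) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match is None:
            continue  # not a libata message
        port, _device, message = match.groups()
        for event in EVENTS:
            if message.startswith(event):
                counts[(port, event)] += 1
    return counts


SAMPLE = """\
Jul 31 12:02:56 core2buntu kernel: [ 47.710147] warning: `growisofs' uses 32-bit capabilities (legacy support in use)
Jul 31 12:06:15 core2buntu kernel: [ 246.040058] ata2: lost interrupt (Status 0x50)
Jul 31 12:06:46 core2buntu kernel: [ 277.040066] ata2: lost interrupt (Status 0x51)
Jul 31 12:06:51 core2buntu kernel: [ 282.040038] ata2.00: qc timeout (cmd 0xa0)
Jul 31 12:06:51 core2buntu kernel: [ 282.040055] sr 1:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jul 31 12:06:51 core2buntu kernel: [ 282.040105] ata2: soft resetting link
Jul 31 12:06:51 core2buntu kernel: [ 282.280428] ata2.00: configured for UDMA/33
Jul 31 12:06:51 core2buntu kernel: [ 282.320400] ata2.01: configured for UDMA/33
Jul 31 12:06:56 core2buntu kernel: [ 287.320035] ata2.00: qc timeout (cmd 0xa0)
Jul 31 12:06:56 core2buntu kernel: [ 287.320044] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
"""

if __name__ == "__main__":
    for (port, event), count in sorted(tally_ata_events(SAMPLE.splitlines()).items()):
        print(f"{port}: {event} x{count}")
```

Running the same tally over the kernel log of a boot with the optical drive attached and one with it removed would give a simple side-by-side comparison.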

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu development release http://cdimage.ubuntu.com/daily-live/current/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired