Data Corruption on AMD64 SATA System with Breezy Badger

Bug #20988 reported by Linus van Pelt
Affects: linux-source-2.6.15 (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

Dear all,

Writing data to a SATA hard disk (Silicon Image 3512A onboard RAID) under Breezy
Badger (kernel version 2.6.12-7-amd64-k8, libata version 1.11 loaded; sata_sil
version 0.9) sometimes causes md5sum mismatches, especially when large files
are copied or moved.
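
For anyone trying to reproduce this, the copy-and-checksum test described above
can be sketched in Python. This is only a minimal sketch, not the reporter's
actual procedure: the helper names md5_of and copy_and_verify are hypothetical.
Note that a read-back right after the copy may be served from the page cache
rather than from the disk, so a match here does not prove the on-disk data is
intact.

```python
import hashlib
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and report whether both files have the same MD5 digest."""
    shutil.copyfile(src, dst)
    return md5_of(src) == md5_of(dst)
```

Running copy_and_verify repeatedly on a file of several hundred megabytes is
closer to the failing case than a single small copy, since the corruption is
reported as intermittent.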

The same problem occurs under Fedora Core 4 or SuSE 9.3 with different kernel
versions.

Memory and hard disk have been checked without finding any fault.

With the Hoary Hedgehog kernel 2.6.10-5-amd64-k8 everything works fine (libata
version 1.10 loaded; sata_sil version 0.8). Some x86 kernels (e.g. Knoppix) also
work without problems, as does Debian Sarge AMD64 (not really surprising,
is it?).

The modules loaded in Breezy do not seem to differ significantly from Hoary,
except for some versions (of course).
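
A comparison like the one described above could be done by saving the lsmod
output under each release and comparing the two. The sketch below assumes the
usual lsmod layout (one header line, then module name, size and use count per
line); the function names are hypothetical. lsmod does not print module
versions, so this only shows which modules are loaded on one side only or
differ in size.

```python
def loaded_modules(lsmod_output):
    """Parse lsmod text into {module name: size}, skipping the header line."""
    modules = {}
    for line in lsmod_output.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2:
            modules[fields[0]] = int(fields[1])
    return modules


def compare_modules(old_output, new_output):
    """Summarize differences between two saved lsmod outputs."""
    old = loaded_modules(old_output)
    new = loaded_modules(new_output)
    common = old.keys() & new.keys()
    return {
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
        "size_changed": sorted(m for m in common if old[m] != new[m]),
    }
```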

I guess I have tried nearly everything to figure out where the problem is
located, so I can give any additional information on request.

So maybe somebody out there has an idea and can help before I go mad.

Greetings.

System description:
  Shuttle XPC SN85Gv2
  Motherboard FN85
  NVIDIA nForce3

  AMD Athlon(tm) 64 Processor 2800+

  RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3512 (rev 1)
  Harddisk
  Vendor: ATA Model: SAMSUNG SP1614C Rev: SW10
  Type: Direct-Access ANSI SCSI revision: 05

Linus van Pelt (c-t-b) wrote :

Created an attachment (id=3560)
output of cat /proc/pci

Linus van Pelt (c-t-b) wrote :

Created an attachment (id=3561)
output of lsmod

Ben Collins (ben-collins) wrote :

I'd really be interested if a stock compile of 2.6.12 shows the problem or not.
I've been noticing a lot of issues lately with amd64's DMA (not just in SATA),
so my first guess is that, but I can't be sure. We also have a lot of libata
patches in current breezy kernels, which may also be the cause.

Linus van Pelt (c-t-b) wrote :

(In reply to comment #3)
> I'd really be interested if a stock compile of 2.6.12 shows the problem or not.
> I've been noticing a lot of issues lately with amd64's DMA (not just in SATA),
> so my first guess is that, but I can't be sure. We also have a lot of libata
> patches in current breezy kernels, which may also be the cause.

Dear Ben Collins,

I'm not totally sure what a 'stock compile' of the kernel means. If you give me
a hint, I'll try.

I don't know what the patches from libata 1.10 -> 1.11 in the Ubuntu kernel
contain. But I guess they might not be the cause, because the data corruption
also happens with the earlier libata version under other Linux flavors
(SuSE/Fedora).

Meanwhile, the DMA suggestion sounds plausible to me because I discovered a new
data corruption problem under Hoary while sending large files over a LAN (which
does not seem to be present in Breezy, but that is not conclusively tested yet,
so I will post details later).

Thanks and bye,
Christian

Ben Collins (ben-collins) wrote :

If possible, please upgrade to Dapper's 2.6.15-7 kernel. If you do not want to
upgrade to Dapper, then you can also wait for the Dapper Flight 2 CD's, which
are due out within the next few days.

Let me know if this bug still exists with this kernel.

Linus van Pelt (c-t-b) wrote :

(In reply to comment #5)
> If possible, please upgrade to Dapper's 2.6.15-7 kernel. If you do not want to
> upgrade to Dapper, then you can also wait for the Dapper Flight 2 CD's, which
> are due out within the next few days.
>
> Let me know if this bug still exists with this kernel.

I did the upgrade to Dapper's 2.6.15-7, 2.6.15-8, and 2.6.15-9 but the bug still
occurs.

Today I heard about the official release of kernel 2.6.15 and the recoding of
the sata and sil modules. Any hope for me?

Bye
Chris

Ben Collins (ben-collins) wrote :

(In reply to comment #6)
> (In reply to comment #5)
> > If possible, please upgrade to Dapper's 2.6.15-7 kernel. If you do not want to
> > upgrade to Dapper, then you can also wait for the Dapper Flight 2 CD's, which
> > are due out within the next few days.
> >
> > Let me know if this bug still exists with this kernel.
>
> I did the upgrade to Dapper's 2.6.15-7, 2.6.15-8, and 2.6.15-9 but the bug still
> occurs.
>
> Today I heard about the official release of kernel 2.6.15 and the recoding of
> the sata and sil modules. Any hope for me?

It's not likely to be any different. We are up to 2.6.15-11.16, so if you
upgrade to that (which is 2.6.15 final based), then you can find out. I'd also
be interested to know if you could boot a daily liveCD for i386 and see if it
exhibits the same problem (e.g. if this may be an amd64 kernel issue).

Carthik Sharma (carthik) wrote :

Thank you for reporting this bug and following up on it.

I am marking this bug Closed since there has been no response from you for over three months. We would like to fix all existing issues, but need feedback to help with debugging.

Should you still have a problem with the latest up to date Dapper kernel and packages, please reopen this bug. In this case, please answer the questions that have already been asked of you before and provide the log files and information required for debugging.

Changed in linux-source-2.6.15:
status: Needs Info → Rejected

towsonu2003 (towsonu2003) wrote :

The user gave new input through a new bug; I'm just quoting that info:
[quote]
Dear all,

Unfortunately you closed Bug #20988 due to inactivity. Sorry for that, but I was a little short of time. Nevertheless, the problem still exists and I will give you some more information today.

I tested the following kernels successfully:

Knoppix 4.0.1 Linux Knoppix 2.6.12 #2 SMP Tue Aug 9 23:20:52 CEST 2005 i686 GNU/Linux

KNOPPIX 4.0.2 Linux Knoppix 2.6.12 #2 SMP Tue Aug 9 23:20:52 CEST 2005 i686 GNU/Linux

Ubuntu 5.04 AMD64 Linux 2.6.10-6-amd64-generic #1 Mon Jan 16 19:04:15 UTC 2006 x86_64 GNU/Linux

Ubuntu 5.04 AMD64 Linux 2.6.10-6-amd64-generic #1 Fri Sep 15 12:24:24 UTC 2006 x86_64 GNU/Linux

I tested the following configurations without success:

Ubuntu 6.04 Linux 2.6.15-18-amd64-generic #1 SMP PREEMPT Thu Mar 9 14:37:22 UTC 2006 x86_64 GNU/Linux

Ubuntu 6.06 AMD64 Linux 2.6.15-26-amd64-k8 #1 SMP PREEMPT Fri Sep 8 20:14:40 UTC 2006 x86_64 GNU/Linux

Ubuntu 6.10 AMD64 Linux 2.6.17-10-generic #2 SMP Tue Sep 26 15:43:28 UTC 2006 x86_64 GNU/Linux

Whenever a precompiled kernel failed, a kernel built from source failed as well.

I also built a kernel from the sources at kernel.org (libata version 2.00; sata_sil 0000:01:07.0: version 2.0) without success:
kernel.org AMD64 Linux gecko 2.6.18 #1 SMP PREEMPT Tue Oct 3 16:58:22 CEST 2006 x86_64 GNU/Linux

Besides the working live CD systems mentioned above (Knoppix), I tested without success the live distribution of Ubuntu 5.10 (i386 and AMD64) and some other live distributions:

KANOTIX 2005-04 AMD64 Linux Kanotix 2.6.14-kanotix64-9 #1 SMP PREEMPT Wed Dec 28 10:20:07 GMT 2005 x86_64 GNU/Linux

KANOTIX 2005-04 386 Linux Kanotix 2.6.14-kanotix-9 #1 PREEMPT Wed Dec 28 10:17:53 CET 2005 i686 GNU/Linux

SuSE 9.3 Linux linux 2.6.11.4-20a-default #1 Wed Mar 23 21:52:37 UTC 2005 i686 athlon i386 GNU/Linux

Just to be sure that there is no hardware defect, I swapped the main memory and the hard disk, including cables, without success.

On the Internet I found other people with the same or similar problems, but in the end none of their hints helped. Some of them mentioned a timing problem that may be solved by BIOS settings, but my BIOS doesn't let me set any timings. Is there a software solution?

I hope I will be able to support you much more in the future and help solve this problem.

Best regards.

- - - - -
Here is a short recap of the earlier report:
Writing data to a SATA hard disk (Silicon Image 3512A onboard RAID) under Breezy
Badger (kernel version 2.6.12-7-amd64-k8, libata version 1.11 loaded; sata_sil
version 0.9) sometimes causes md5sum mismatches, especially when large files
are copied or moved.

The same problem occurs under Fedora Core 4 or SuSE 9.3 with different kernel
versions.

Memory and hard disk have been checked without finding any fault.

With the Hoary Hedgehog kernel 2.6.10-5-amd64-k8 everything works fine (libata
version 1.10 loaded; sata_sil version 0.8). Some x86 kernels (e.g. Knoppix) also
work without problems, as does Debian Sarge AMD64 (not really surprising,
is it?).

The modules loaded in Breezy do not seem to differ significantly from Hoary,
except for some versions (of course).

I guess I have tried n...


Changed in linux-source-2.6.15:
status: Rejected → Unconfirmed

Linus van Pelt (c-t-b) wrote :

Dear all,

Although I had checked all the computer hardware, including the main memory (original Infineon memory, tested more than once with memtest86+), I must report the following solution:

After I upgraded the main memory, all the problems mentioned disappeared.

Meanwhile, the 'problem' memory is running on another board without any trouble.

So I suspect that an incompatibility caused my problem.

Thanks to all
bye,
Chris

Andrew Ash (ash211) wrote :

It sounds like Linus (the original reporter) doesn't have this issue anymore. Can we close the bug, or is it still an issue with someone here?

Changed in linux-source-2.6.15:
status: New → Incomplete

Linus van Pelt (c-t-b) wrote : Re: [Bug 20988] Re: Data Corruption on AMD64 SATA System with Breezy Badger

Dear all,

from my point of view we can close the issue!

On 17.03.2008 19:04, Andrew Ash wrote:
> It sounds like Linus (the original reporter) doesn't have this issue
> anymore. Can we close the bug, or is it still an issue with someone
> here?
>
> ** Changed in: linux-source-2.6.15 (Ubuntu)
> Status: New => Incomplete

Andrew Ash (ash211) wrote :

Great! I'll close the bug then. If anyone else was having this issue, please open a new bug. Thanks!

Changed in linux-source-2.6.15:
assignee: ben-collins → nobody
status: Incomplete → Invalid