Areca RAID + Intel Core2Duo + XFS = corrupted files

Bug #88705 reported by Vladimir Fonov
Affects                      | Status       | Importance | Assigned to | Milestone
linux-source-2.6.17 (Ubuntu) | Won't Fix    | High       | Unassigned  |
linux-source-2.6.20 (Ubuntu) | Fix Released | Undecided  | Unassigned  |

Bug Description

Exists in Ubuntu Edgy with kernels 2.6.17-10-generic and 2.6.17-11-generic.
Appears when using an Areca ARC-1160 on an Intel DP965LT motherboard with an Intel Core2Duo 6400 CPU; verified on two identical machines.
Does not appear with the same software versions and the same RAID card on an nForce2-based motherboard with an Athlon XP CPU. It also does not appear with a build of the vanilla Linux kernel 2.6.20.

Steps to reproduce:

mkfs.xfs /dev/<raid>
mount /dev/<raid> /mnt/raid
cd /somewhere/else/
md5sum -b <some_big_file> > /somewhere/else/md5
cp <some_big_file> /mnt/raid/
cd /mnt/raid/
md5sum -c /somewhere/else/md5

The final md5sum -c reports that the check failed. Here <some_big_file> is an arbitrary file of about 1 GB or more.
However, if I format the file system as ext3, everything works as it should.

I see no errors or warnings in dmesg or syslog.
xfs_check also reports that everything is OK.

P.S. Smaller files are corrupted at random.
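The reproduction steps above can be wrapped in a small script. This is a minimal sketch, not part of the original report: the function name verify_copy is hypothetical, and it assumes a POSIX shell plus GNU coreutils md5sum. Reading the copy back immediately after cp may be served from the page cache instead of the array, so the sketch syncs and then tries to drop the caches through /proc/sys/vm/drop_caches (present since kernel 2.6.16, root only), silently skipping that step when it is not permitted; unmounting and remounting the array is an alternative.

```shell
#!/bin/sh
# Hypothetical helper: verify_copy SRC DEST_DIR
# Copies SRC into DEST_DIR and compares MD5 checksums of the source and the copy.
# Return status: 0 = checksums match, 1 = mismatch, 2 = missing source or copy error.
verify_copy() {
    vc_src=$1
    vc_dest_dir=$2
    [ -f "$vc_src" ] || { echo "no such file: $vc_src" >&2; return 2; }
    vc_name=$(basename "$vc_src")

    # Checksum the source; reading via stdin avoids md5sum's filename escaping.
    vc_before=$(md5sum < "$vc_src" | cut -d' ' -f1)

    cp "$vc_src" "$vc_dest_dir/" || return 2

    # Flush dirty pages, then try to drop the page cache so the copy is
    # read back from the array (needs root; skipped quietly otherwise).
    sync
    ( echo 3 > /proc/sys/vm/drop_caches ) 2>/dev/null || true

    vc_after=$(md5sum < "$vc_dest_dir/$vc_name" | cut -d' ' -f1)
    if [ "$vc_before" = "$vc_after" ]; then
        echo "OK: $vc_name"
        return 0
    fi
    echo "MISMATCH: $vc_name (source $vc_before, copy $vc_after)"
    return 1
}
```

For example, `verify_copy /somewhere/else/<some_big_file> /mnt/raid` prints OK or MISMATCH and sets the exit status accordingly, which makes it easy to run from a loop or a cron job.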

Vladimir Fonov (vladimir-fonov) wrote:

Original file

Vladimir Fonov (vladimir-fonov) wrote:

Example

description: updated
Changed in linux-source-2.6.17:
importance: Undecided → High
Jewel (ejewel) wrote:

I can confirm this bug. I have two identical servers:
AMD X2 3800+
4 GB RAM
Areca ARC-1220 8-port SATA RAID controller
5 × 500 GB drives in RAID 5

Both servers have a 1.9 TB partition using LVM + XFS.

Server #1 is running Dapper, 2.6.15-23-amd64-server.
Server #2 is running Edgy, 2.6.17-10-server (x86_64).

Copying a 1.3 GB file from the system drive (an IDE drive) to the RAID always results in silent corruption on Edgy. I copied the file 20 times under a variety of system and disk loads. The md5sum was different every single time!

Dapper does not have this problem.
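Because the corruption was intermittent (a different checksum on each of the 20 copies), a single copy may not be enough to show it. A minimal sketch of a repeated test, assuming a POSIX shell and GNU coreutils; the function name repeat_copy_check and its default of 20 runs are illustrative choices, not taken from the report. Each copy gets its own name, is read back after a sync and an attempted cache drop (root only), is compared with the source checksum, and is then deleted.

```shell
#!/bin/sh
# Hypothetical helper: repeat_copy_check SRC DEST_DIR [RUNS]
# Copies SRC into DEST_DIR RUNS times (default 20) and counts copies whose
# MD5 checksum differs from the source. Returns 0 only if every copy matched.
repeat_copy_check() {
    rp_src=$1
    rp_dest=$2
    rp_runs=${3:-20}
    [ -f "$rp_src" ] || { echo "no such file: $rp_src" >&2; return 2; }
    rp_ref=$(md5sum < "$rp_src" | cut -d' ' -f1)
    rp_bad=0
    rp_i=1
    while [ "$rp_i" -le "$rp_runs" ]; do
        rp_target="$rp_dest/copy.$rp_i"
        cp "$rp_src" "$rp_target" || return 2
        # Flush to disk and try to drop the page cache (root only) so the
        # checksum below reflects what was actually written to the array.
        sync
        ( echo 3 > /proc/sys/vm/drop_caches ) 2>/dev/null || true
        rp_sum=$(md5sum < "$rp_target" | cut -d' ' -f1)
        if [ "$rp_sum" != "$rp_ref" ]; then
            rp_bad=$((rp_bad + 1))
            echo "run $rp_i: MISMATCH ($rp_sum)"
        fi
        rm -f "$rp_target"
        rp_i=$((rp_i + 1))
    done
    echo "$rp_bad of $rp_runs copies differed from $rp_ref"
    [ "$rp_bad" -eq 0 ]
}
```

For example, `repeat_copy_check /path/to/file.iso /mnt/raid 20` (paths hypothetical) lists each mismatching checksum, so a run like the one described above would show a distinct value per copy.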

I will reboot into 2.6.17-11-server, do some more testing, and then upgrade to Feisty to see whether this problem still exists.

Changed in linux-source-2.6.17:
assignee: nobody → ubuntu-kernel-team
status: Unconfirmed → Confirmed
Jewel (ejewel) wrote:

The problem also existed in 2.6.17-11-server but does not happen in Feisty (2.6.20-13-server). Vladimir, are you running your servers in 64-bit or 32-bit mode? How much RAM do you have?

Vladimir Fonov (vladimir-fonov) wrote:

>are you running your servers in 64-bit or 32-bit mode? How much RAM do you have?
I am running in 32-bit mode, and I have 2 GB of RAM.

Pascal de Bruijn (pmjdebruijn) wrote:

This is most likely a kernel issue (obviously).

Please note that the Areca driver was only recently included in the mainline Linux kernel. I'm not exactly sure when, probably around 2.6.16 or 2.6.17.

My point is that the Areca driver in 2.6.17 probably isn't as mature as we'd want.

Upgrading to Feisty seems appropriate.

Leann Ogasawara (leannogasawara) wrote:

The Hardy Heron Alpha series, which contains an updated version of the kernel, was recently released. You can download and try the new Hardy Heron Alpha release from http://cdimage.ubuntu.com/releases/hardy/ . You should then be able to test the new kernel via the LiveCD. If you can, please verify whether this bug still exists and report back your results. General information regarding the release can also be found here: http://www.ubuntu.com/testing/ .

Also, we will keep this report open against the actively developed kernel, but the task against 2.6.17 will be closed. Thanks.

Changed in linux:
status: New → Incomplete
Changed in linux-source-2.6.17:
status: Confirmed → Won't Fix
stevecs (stevecs) wrote:

As a data point, I've used Ubuntu Edgy (desktop and server) with the Areca 1280ML card here on a 20 TiB array running JFS under a C2D processor (and now a C2Q) without any corruption problems. I've run complete md5sums against all data every two weeks with no issues. I wonder whether the cause is not the Areca driver but the XFS filesystem, or an interaction between the two. About 4 months ago I moved one system, with another heavily used 1280ML, to Feisty (2.6.20), and md5sums on its data (also on JFS) show no issues.
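The periodic whole-array check described above (md5sums of all data every two weeks) can be done with a checksum manifest. A minimal sketch, assuming GNU find, sort, xargs, and md5sum; make_manifest and verify_manifest are hypothetical names, not tools mentioned in the report. The manifest path should be absolute and lie outside the checked directory, because both functions cd into that directory, and a manifest stored inside it would be checksummed while it is still being written.

```shell
#!/bin/sh
# Hypothetical helpers for periodic integrity checks of a directory tree.

# make_manifest DIR MANIFEST: record the MD5 of every regular file under DIR.
# Paths are stored relative to DIR (./...) and sorted, so manifests diff cleanly.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum -b ) > "$2"
}

# verify_manifest DIR MANIFEST: re-read every listed file and compare checksums.
# --quiet suppresses the per-file "OK" lines, so only failures are printed.
# Returns non-zero if any listed file is missing or its checksum changed.
verify_manifest() {
    ( cd "$1" && md5sum -c --quiet "$2" )
}
```

For example, run `make_manifest /mnt/raid /root/raid.md5` once after writing the data, then `verify_manifest /mnt/raid /root/raid.md5` from a scheduled job (paths hypothetical). Files added after the manifest was made are not covered until it is regenerated.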

Vladimir Fonov (vladimir-fonov) wrote:

I've been using the vanilla Linux kernel 2.6.20 without problems; later I switched to Ubuntu 7.04 and changed the file system to ext3, also without any problems.

Leann Ogasawara (leannogasawara) wrote:

Thanks for the update Vladimir. I'm marking this "Fix Released" against 2.6.20.

Changed in linux:
status: Incomplete → Fix Released
Launchpad Janitor (janitor) wrote: Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.
