Ext3 filesystem corruption - data loss

Bug #53102 reported by Ryan T. Sammartino
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Ext3 partitions get corrupted when using Linux 2.6.x.

This problem has been reported with many 2.6.x configurations: Feisty and Linux 2.6.15-26-686 (this bug, #53102), Edgy and Linux 2.6.17 (bug #65815), Linux 2.6.20 (bug #118256), and Dapper and 2.6-amd64-generic (bug #69430). The Linux Kernel Mailing List and other distributions' lists also mention a similar bug.

Original description of this bug:

Binary package hint: linux-image-2.6.15-26-686

On an approximately weekly basis, at least one of my ext3 partitions on one of my many disks becomes corrupt, is remounted read-only, and requires repairs. This is an alarmingly high failure rate.

I was reluctant to log this bug but now that it has occurred on different disks on various occasions I can't dismiss it as a bad drive.

This began shortly after the upgrade to 2.6.15-26. Before that, no problems (that I can recall).
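
The symptom described above, an ext3 partition being remounted read-only after an error, can be checked for mechanically. The following is only a minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options). The function name and the sample text are illustrative and not part of the original report.

```python
def readonly_ext3_mounts(mounts_text):
    """Return mount points of ext3 filesystems whose options include 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Compare whole options so 'errors=remount-ro' is not mistaken for 'ro'.
        if fs_type == "ext3" and "ro" in options.split(","):
            found.append(mount_point)
    return found


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sdb1 /data ext3 ro 0 0
proc /proc proc rw 0 0
"""
print(readonly_ext3_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mounts instead of the sample string.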

Revision history for this message
Olivier Cortès (olive) wrote :

I suffer the same problem on my machine.
Sometimes the system won't umount / before reboot/halt,
Rarely the system freezes after a short period of time after resume from suspend-to-ram,
Rarely the system freezes, period.
In *any* case, I lose a bunch of data. Often this data ends up in /lost+found (between 250 MB and 1 GB of data) after an extremely long manual fsck (often many manual passes are needed for the FS to be totally clean), and I must reinstall from backups.

Distro: Ubuntu Dapper, up to date. This problem has been happening regularly since I installed my first Dapper (Flight2). On Breezy it was very occasional (it happened "only" 2 or 3 times).
HW: IBM Thinkpad T40p (this also happened on an IBM X31 when I owned it), PATA Samsung 120 GB drive (this has happened with 40 and 80 GB Samsung drives on both machines). Memory should not be the cause (memtest OK on both machines); the disks *could* be (though smartctl tests ran without any problems, and no SMART errors are logged on any drive).

Changed in linux-source-2.6.15:
status: Unconfirmed → Confirmed
Revision history for this message
Olivier Cortès (olive) wrote :

I feel like my problem could be related to #16610... But my disk is not the standard IBM one.
And when the machine freezes after resuming, I don't even have SysRq to dump anything...
I'm at a complete loss as to how to hunt down the source of this problem, and it happens every two weeks or so...

description: updated
Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote : Massive data loss

I had massive corruption on a new drive.

My new IDE ext3 drive had some problems, now it can't be mounted and fsck fails.

Drive was new, less than one month in use. I had Feisty that was just installed. Drive was using ext3. Currently, file rescue is a priority, but after that I might have more details.

Similar bugs:
Bug #65815 Ext3 corruption on a drive
Bug #53102 ext3 partitions are getting corrupt more often than they should [this one]
Bug #66032 fsck.ext3: Unable to resolve
Bug #118256 ext3 data corruption with kernel 2.6.20-16-generic

A bug can't get more serious than massive data loss. Because of its corruptive nature, backups may be affected too. I find it interesting that this confirmed bug is one year old, but its importance is "undecided".

I hope that somehow, there is a non-OS explanation to this bug...

Revision history for this message
Tero Karvinen (karvinen+launchpad) wrote : Re: ext3 partitions are getting corrupt more often than they should

I recovered a partition broken by this. It did not have any surface problems. It seems that only a couple of files were broken, though it's hard to know without comparing against backups. However, it did take a while to ddrescue and e2fsck a big disk... e2fsck from a live CD fixed the disk, even though fsck had not worked in the busybox shell earlier.
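
When repeating a recovery like the one above, it can help to record what each e2fsck pass reported. The sketch below decodes the exit status as a bit mask, following the values listed in the e2fsck manual page. The function and table names are illustrative assumptions, not something used in this thread.

```python
# Exit status bits as documented in the e2fsck(8) manual page.
E2FSCK_FLAGS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared library error",
}


def describe_e2fsck_status(status):
    """Translate an e2fsck exit status into a list of readable messages."""
    if status == 0:
        return ["no errors"]
    # The status is the sum of all conditions that apply, so test each bit.
    return [text for bit, text in sorted(E2FSCK_FLAGS.items()) if status & bit]


print(describe_e2fsck_status(5))
```

A status of 4 or more after a pass suggests the filesystem still needs attention, which matches the "many manual passes" experience reported earlier in this bug.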

Kernel seems to have other ext3 problems too: bug #109177 [Feisty] Kernel crashes in ext3 dx_probe, "BUG at fs/ext3/namei.c:384!".(+3 duplicates).

More similar bug reports:
Bug #69430 ext3 filesystem corruption?
Similar, but using RAID:
Bug #100126 Data corruption with ext3 in striped logical volume

A similar bug is mentioned in Linux Kernel Mailing List:
http://marc.info/?t=116550778700006&r=1&w=2

Linus 2006-12-28: "And I have a test-program that shows the corruption _much_ easier (at
least according to my own testing, and that of several reporters that back
me up), and that seems to show the corruption going way way back (ie going
back to Linux-2.6.5 at least, according to one tester)."
"So it just got a lot _easier_ to trigger in 2.6.19, but it's not a new
bug."
http://marc.info/?l=linux-kernel&m=116733254829725&w=2

Re: strange ext3 corruption problem on 2.6.x http://www.ussg.iu.edu/hypermail/linux/kernel/0403.1/1480.html

Gentoo-hardened: 2.6.19 file content corruption on ext3 http://osdir.com/ml/gentoo.hardened/2007-01/msg00066.html

description: updated
Revision history for this message
hardcorelinux (hardcorelinux) wrote :

I have a Compaq V3228AU laptop where a suspend will hose all the partitions. e2fsck reports the ext3 filesystem as corrupted, then deletes and rebuilds the journal for me. Some files get moved to lost+found, and sometimes the entire filesystem is hosed and I have to recover all my files.

Revision history for this message
hardcorelinux (hardcorelinux) wrote :

See the iommu=soft workaround discussed here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/203537
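
To confirm whether a boot parameter such as iommu=soft is actually in effect, the running kernel's command line can be inspected. This is a rough sketch, assuming the command line is a single space-separated string as exposed in /proc/cmdline. The helper name and the sample string are hypothetical.

```python
def has_kernel_param(cmdline, param):
    """Return True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "root=/dev/sda1 ro quiet splash iommu=soft\n"
print(has_kernel_param(SAMPLE_CMDLINE, "iommu=soft"))
```

On a running system the first argument would be the text read from /proc/cmdline.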

Revision history for this message
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Gralgrathor (xixulon) wrote :

Similar problems here:

Running
500 Gb SATA (WD)
500 Gb IDE (WD)
500 Gb IDE (WD)
200 Gb IDE (Maxtor)
200 Gb IDE (Maxtor)

On a 2.6.25-2-686 kernel in my box.

Recently, both 500 GB disks have begun to develop serious problems. SMART and badblocks report no serious problems with the hardware, yet I am losing data each day. Just the other day I lost an entire 375 GB volume of data. fsck managed to restore most files to the lost+found folder, but since all filenames were lost in the process, the data became useless to me.

I'll be awaiting the 2.6.27 upgrade; I sure hope it'll work, because having to restore hundreds of GB of data from backups is only fun so many times...

Revision history for this message
Pedro (pedro-zampa) wrote :

Hi all,
(sorry for my English)

I posted (in Spanish) about a similar problem with Ubuntu 8.10 (https://answers.launchpad.net/ubuntu/+question/51844).

I have a Phenom 9950, an Asus M3n78-VM motherboard, 4 GB of Supertalent DDR2 800 MHz RAM and a 160 GB IDE hard disk.
When I booted Ubuntu 8.10 (for amd64) for the second time, I had data loss on the ext3 partition. Additionally, I checked the installation CD and the RAM, and both are OK.
Also, when I work with Ubuntu, it freezes for a few seconds at random times.

Revision history for this message
hardcorelinux (hardcorelinux) wrote :

Folks, I can confirm that the data corruption issue on my V3228 is gone with the latest kernel update on Ibex.
Linux 2.6.27-8 #1 SMP Thu Nov 6 17:38:14 UTC 2008 x86_64 GNU/Linux

I no longer have to use the iommu=soft option in the boot parameters, and suspend/resume works flawlessly.

I still wonder though, what the actual fix was.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Thanks hardcorelinux,

We are keeping the Intrepid kernel up to date with the recent upstream stable kernel patch sets. It's likely that a patch which resolves your issue was pulled in.

Can anyone else confirm whether the newer 2.6.27-9 kernel, available as an update, or the 2.6.27-10 kernel in intrepid-proposed helps with the issue? Thanks.

Changed in linux:
status: Confirmed → Incomplete
Revision history for this message
Pedro (pedro-zampa) wrote : Re: [Bug 53102] Re: Ext3 filesystem corruption - data loss

In my case, the problems began with an old Sony IDE CD-RW drive connected to the
same cable as the HDD (also IDE).

When I removed the CD-RW, the problem went away.

Revision history for this message
Gralgrathor (xixulon) wrote :

Good morning,

I've replaced my ITE IDE controller card with a Brand X model, thrown away two IDE disks and swapped in some 1 TB SATA disks. So my config is now:

500 Gb SATA on mainboard
1 Tb SATA on mainboard
1 Tb SATA on mainboard
500 Gb IDE on PCI IDE controller
500 Gb IDE on PCI IDE controller
and of course the obligatory
CD/DVD RW on mainboard IDE.

The frequency of data loss seems to have lessened somewhat, but it is not down to zero yet.

Cheers!

Revision history for this message
Joel Goguen (jgoguen) wrote :

You reported this bug a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue for you. Can you try with the latest Ubuntu release? Thanks in advance.

Revision history for this message
Ryan T. Sammartino (ryan-sammartino) wrote :

8.10 has been very solid for me.

Revision history for this message
frodri (fernando-elec) wrote :

Thanks Joel,

I am happily using Ubuntu now without any further problems. The problem
with something like this is repeatability; there is no way to recreate
it AFAIK, but I thought I would report it due to the severity of the problem.

Fernando

--
Dr. Fernando Rodriguez <email address hidden>
Dept. of E&E Engineering voice: +44(0)141 330 4108
University of Glasgow FAX: +44(0)141 330 4907
Glasgow G12 8LT

Just put the carbon back where you found it.

Revision history for this message
Joel Goguen (jgoguen) wrote :

I completely understand that this sort of bug may not be easy to consistently reproduce. Given the seriousness of data loss and the apparent frequency it was occurring at, I wanted to make sure it wasn't still happening for anyone on this bug. Since it doesn't seem to be happening anymore, I'm going to close this bug. Feel free to reopen it if you experience this problem again. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again for your bug report, and don't hesitate to submit bug reports in the future!

Changed in linux:
status: Incomplete → Invalid