in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

Bug #453579 reported by Steve Langasek
This bug affects 25 people
Affects                   Status        Importance  Assigned to     Milestone
Linux                     Invalid       Undecided   Unassigned
Release Notes for Ubuntu  Fix Released  Undecided   Unassigned
linux (Ubuntu)            Invalid       Critical    Surbhi Palande
  Nominated for Jaunty by r12056
  Nominated for Lucid by r12056
Karmic                    Invalid       Critical    Unassigned

Bug Description

There are worrying reports of filesystem corruption on ext4 in karmic. Scott says:

12:36 < Keybuk> this whole ext4 thing is worrying me
12:36 < Keybuk> I just downloaded an iso image, md5sum didn't match
12:36 < Keybuk> downloaded it into an ext3 partition, matched just fine
12:59 < Keybuk> and I know mvo has seen bugs with corrupted .debs in /var/cache/apt/archives
12:59 < Keybuk> which seems to imply it's any file large enough to use lots of extents

I'm opening this bug report so that this bug gets tracked & triaged for karmic. If we're unable to isolate the issue, we should consider rolling back to ext3 as the default filesystem in the installer.

ProblemType: Bug
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: AD198x Analog [AD198x Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: vorlon 3350 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xee240000 irq 17'
   Mixer name : 'Analog Devices AD1981'
   Components : 'HDA:11d41981,17aa2025,00100200'
   Controls : 20
   Simple ctrls : 11
Date: Fri Oct 16 16:01:26 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=f108133c-6b9d-4d28-9058-0b3a0c5549b4
MachineType: LENOVO 6371CTO
Package: linux-image-2.6.31-14-generic 2.6.31-14.46
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: root=/dev/mapper/hostname-root ro
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-13.44-generic
RelatedPackageVersions: linux-firmware 1.22
SourcePackage: linux
Uname: Linux 2.6.31-13-generic x86_64
WpaSupplicantLog:

dmi.bios.date: 12/27/2006
dmi.bios.vendor: LENOVO
dmi.bios.version: 7IET23WW (1.04 )
dmi.board.name: 6371CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7IET23WW(1.04):bd12/27/2006:svnLENOVO:pn6371CTO:pvrThinkPadT60:rvnLENOVO:rn6371CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6371CTO
dmi.product.version: ThinkPad T60
dmi.sys.vendor: LENOVO


Steve Langasek (vorlon) wrote :
Changed in linux (Ubuntu):
importance: Undecided → Critical
milestone: none → ubuntu-9.10
Steve Langasek (vorlon) wrote :

There are several open bug reports upstream regarding ext4 corruption, but it's not clear which, if any, are related to the problems being observed.

http://bugzilla.kernel.org/show_bug.cgi?id=14354 is one bug that appears to be linked to the use of the DM layer - if you're following up to this bug report, please indicate whether your ext4 fs is sitting on top of a dm-crypt, LVM, or RAID device.

That bug also mentions using auto_da_alloc=0 as a boot option to work around; we should check whether that boot option makes a difference for users seeing this bug.

Changed in linux (Ubuntu Karmic):
status: New → Triaged
Steve Beattie (sbeattie) wrote : apport-collect data

AplayDevices: aplay: device_list:223: no soundcards found...
Architecture: amd64
ArecordDevices: arecord: device_list:223: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer', '/dev/sequencer2', '/dev/sequencer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CheckboxSubmission: 138b721e3738d95476954739cfd660dd
CheckboxSystem: 558fbfb2a1258711a37bb7e23c5d4e6e
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=15325e81-9f2d-4102-9742-a1a76b888317
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Shuttle Inc SA76
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=/dev/mapper/alyosha1-karmic_test ro quiet splash
ProcEnviron:
 SHELL=bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-14-generic N/A
 linux-firmware 1.23
RfKill:

Uname: Linux 2.6.31-14-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 05/04/2009
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: FA76
dmi.board.vendor: Shuttle Inc
dmi.board.version: V10
dmi.chassis.type: 3
dmi.chassis.vendor: Shuttle Inc
dmi.chassis.version: G2
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd05/04/2009:svnShuttleInc:pnSA76:pvrV10:rvnShuttleInc:rnFA76:rvrV10:cvnShuttleInc:ct3:cvrG2:
dmi.product.name: SA76
dmi.product.version: V10
dmi.sys.vendor: Shuttle Inc

Steve Beattie (sbeattie) wrote : AlsaDevices.txt
Steve Beattie (sbeattie) wrote : BootDmesg.txt
Steve Beattie (sbeattie) wrote : Card0.Amixer.info.txt
Steve Beattie (sbeattie) wrote : Card0.Amixer.values.txt
Steve Beattie (sbeattie) wrote : Card0.Codecs.codec.0.txt
Steve Beattie (sbeattie) wrote : CurrentDmesg.txt
Steve Beattie (sbeattie) wrote : Lspci.txt
Steve Beattie (sbeattie) wrote : Lsusb.txt
Steve Beattie (sbeattie) wrote : PciMultimedia.txt
Steve Beattie (sbeattie) wrote : ProcCpuinfo.txt
Steve Beattie (sbeattie) wrote : ProcInterrupts.txt
Steve Beattie (sbeattie) wrote : ProcModules.txt
Steve Beattie (sbeattie) wrote : UdevDb.txt
Steve Beattie (sbeattie) wrote : UdevLog.txt
Steve Beattie (sbeattie) wrote : WifiSyslog.txt
Steve Beattie (sbeattie) wrote : XsessionErrors.txt
tags: added: apport-collected
Steve Beattie (sbeattie) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I did a fresh install from the karmic alt amd64 CD build 20091016 onto ext4 on LVM. After the post-install update and the installation of a limited amount of additional software, I ran debsums -a on the system and noticed the following things:

- debsums claims that the following packages don't have an md5sums file at all: bogofilter, g++, binutils, installation-report, libgdbm3, liblockfile1, lockfile-progs, mawk, netbase, update-inetd, xorg, xserver-xorg-input-all, and xserver-xorg-video-all. All of these are supposed to have one (though 5 of them are 0 length), but they're all missing from the ext4/LVM install.

- the following files were reported as failing their debsums check:

/var/lib/gdm/.gconf.defaults/%gconf-tree.xml FAILED
/usr/share/applications/gpilotd-control-applet.desktop FAILED
/var/lib/openoffice/basis3.1/share/config/javasettingsunopkginstall.xml FAILED

The last is expected (I believe), but not the first two.

Steve Langasek (vorlon) wrote : Re: [Bug 453579] Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

On Sat, Oct 17, 2009 at 05:08:49PM -0000, Steve Beattie wrote:
> - the following files were reported as failing their debsums check:

> /var/lib/gdm/.gconf.defaults/%gconf-tree.xml FAILED
> /usr/share/applications/gpilotd-control-applet.desktop FAILED
> /var/lib/openoffice/basis3.1/share/config/javasettingsunopkginstall.xml FAILED

> the last is expected (I believe) but not the first two.

The second is a bug in gnome-pilot, I guess it hasn't been rebuilt since we
fixed the translations-stripped-after-debsums problem.

The first could have any number of other explanations besides filesystem
corruption.

The missing .md5sums files are interesting/worrying, though.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Colin Watson (cjwatson) wrote : Re: corruption of large files reported with linux 2.6.31-14.46 on ext4

I don't think the missing .md5sums files are intrinsically worrying. I've looked at several of them and they're genuinely missing. installation-report, for instance (for which I have the source to hand), just doesn't call dh_md5sums, and the same is true for a number of the other packages in the list.

Since md5sums files are created in debian/rules rather than by dpkg-deb, they're merely very widespread rather than actually universal ...

Scott James Remnant (Canonical) (canonical-scott) wrote :

I'm just using plain old ext4 on SSD

Scott James Remnant (Canonical) (canonical-scott) wrote :

Here's an example of what I mean:

warcraft scott% wget -q http://cdimages.ubuntu.com/ubuntu-moblin-remix/daily-live/current/karmic-moblin-remix-i386.iso
warcraft scott% md5sum karmic-moblin-remix-i386.iso
91e4f415767a45617f0cbfc5b0abd19c karmic-moblin-remix-i386.iso
warcraft root# sync
warcraft scott% md5sum karmic-moblin-remix-i386.iso
26c3177ae594a3713b0e318e12e91e1b karmic-moblin-remix-i386.iso

I assume the checksum changed because the file is no longer in the page cache.
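
The check above can be sketched as a small shell function. This is only an illustration, assuming GNU coreutils on Linux; the names recheck_md5 and DROP_CACHES are hypothetical and not from the report. It hashes the file, flushes dirty pages with sync, optionally drops the page cache (which needs root), and hashes again so the second read has to come from the disk.

```shell
#!/bin/sh
# Sketch: compare a file's md5sum before and after flushing caches.
# recheck_md5 and DROP_CACHES are hypothetical names used for illustration.
recheck_md5() {
    file=$1
    [ -r "$file" ] || return 2

    before=$(md5sum < "$file" | cut -d' ' -f1)

    # Write dirty pages out to the disk.
    sync

    # Dropping the page cache needs root, so it is opt-in: DROP_CACHES=1.
    if [ "${DROP_CACHES:-0}" = 1 ] && [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi

    after=$(md5sum < "$file" | cut -d' ' -f1)

    echo "before=$before after=$after"
    # Exit status 0 if both reads agree, 1 if they differ.
    [ "$before" = "$after" ]
}
```

For example, as root: DROP_CACHES=1 recheck_md5 karmic-moblin-remix-i386.iso. Without the cache drop, the second read may still be served from memory and hide a bad on-disk copy.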

Scott James Remnant (Canonical) (canonical-scott) wrote : apport-collect data

Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: scott 1791 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6ffc000 irq 21'
   Mixer name : 'SigmaTel STAC9228'
   Components : 'HDA:83847616,10280209,00100201'
   Controls : 29
   Simple ctrls : 19
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=4e4e4aa8-4e55-432a-a36c-2a4d1cc71f49
MachineType: Dell Inc. XPS M1330
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=UUID=bc91769a-258e-4e9a-89e1-e7fcb36520d7 ro quiet
ProcEnviron:
 LANG=en_GB.UTF-8
 PATH=(custom, user)
 SHELL=/bin/zsh
 LC_COLLATE=C
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-14-generic N/A
 linux-firmware 1.24
Uname: Linux 2.6.31-14-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 12/26/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0U8042
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd12/26/2008:svnDellInc.:pnXPSM1330:pvr:rvnDellInc.:rn0U8042:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: XPS M1330
dmi.sys.vendor: Dell Inc.

Scott James Remnant (Canonical) (canonical-scott) wrote : AlsaDevices.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : AplayDevices.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : ArecordDevices.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : BootDmesg.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : Card0.Amixer.values.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : Card0.Codecs.codec.0.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : CurrentDmesg.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : IwConfig.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : Lspci.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : Lsusb.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : PciMultimedia.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : ProcCpuinfo.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : ProcInterrupts.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : ProcModules.txt
Scott James Remnant (Canonical) (canonical-scott) wrote : RfKill.txt
Steve Langasek (vorlon)
Changed in linux (Ubuntu Karmic):
milestone: ubuntu-9.10 → karmic-updates
Steve Langasek (vorlon)
Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
Steve Langasek (vorlon)
Changed in ubuntu-release-notes:
status: New → Fix Released
Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
importance: Undecided → Unknown
status: New → Unknown
Changed in linux:
status: Unknown → Confirmed
Steve Langasek (vorlon)
Changed in linux:
importance: Unknown → Undecided
status: Confirmed → New
Pete Graner (pgraner)
Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
Steve Langasek (vorlon)
Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
summary: - corruption of large files reported with linux 2.6.31-14.46 on ext4
+ in-place corruption of large files *without fsck or reboot* reported
+ with linux 2.6.31-14.46 on ext4
tags: added: 2.6.31.8
138 comments hidden view all 218 comments
Goffi (goffi) wrote :

I ran memtest86+ for 9 hours, 10 passes, and still no errors...

Goffi (goffi) wrote :

My issue seems to be related not to ext4 or my memory but to my swap. I downloaded the same 30 MB files twice with swap disabled, and this time the md5sums matched.

In addition, I tried to fill my swap partition with zeros, and I got an error:

% sudo dd if=/dev/zero of=/dev/sda5 bs=1024
22293+0 records in
22293+0 records out
22828032 bytes (23 MB) copied, 16.7581 s, 1.4 MB/s
1331109+0 records in
1331109+0 records out
1363055616 bytes (1.4 GB) copied, 254.542 s, 5.4 MB/s
dd: writing `/dev/sda5': Input/output error
2931829+0 records in
2931828+0 records out
3002191872 bytes (3.0 GB) copied, 471.766 s, 6.4 MB/s
zsh: exit 1 sudo dd if=/dev/zero of=/dev/sda5 bs=1024

Is any check done on swap partitions? Can the kernel detect errors on them? Is there a way to avoid bad clusters on a swap partition?

I also had an issue (scrambled screen when booting) which disappeared, but I can't be sure it was solved by deactivating swap, as I tried several things at the same time (replacing kdm with gdm, removing the splash at boot, and maybe an upgrade solved the problem).

Goffi (goffi) wrote :

The tests on my partitions don't seem to find any problem:

% sudo smartctl -l selftest /dev/sda5
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1790

% sudo badblocks -sv /dev/sda5
Checking blocks 0 to 2931830
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.

Z149 (graphics149) wrote :

I am also having problems that are possibly ext4-related.
After clearing out some big files, the 'free space' reported for my ext4 / filesystem did not increase.
Emptying the wastebasket and other sensible checks have turned up no cause.
New files that I added, then deleted and emptied from the wastebasket, used up some of my last precious free space, and I've not got that space back.

The 'big' files include a couple of zipped backups of 0.1 and 0.2 GB, and a monstrous 3.1GB archive.
Nothing ordinary cleared the space. fsck did not help.
Could it be ext4?

============
Desktop Ubuntu 9 with linux 2.6.31-17
40GB IDE sata drive

Michael Lazarev (milaz) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

@Z149: try "Applications->Accessories->Disk Usage Analyzer", which
can also be run from the command line as "baobab". Push the "Scan Filesystem"
button to see where the space goes.

TragicWarrior (bryan-christ) wrote :

I have experienced data corruption on 2 different systems using ext4 on flash media. One of the drives was an Intel SSD and the other was a SanDisk Cruzer USB flash drive. I reproduced the problem several times with both of these drives on two different hardware systems. Here's how I reproduced the problem:

1. Install Ubuntu 9.10 32-bit on USB flash drive...
   - /boot ext2 500MB (primary)
   - swap 500MB (primary)
   - / ext4 rest-of-drive (primary)

2. Install latest updates with Update Manager.

3. Reboot and observe corruption.

I have repeated a similar experiment on Fedora 12 with no file-system corruption.

Changed in ubuntu-release-notes:
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
description: updated
Øyvind Stegard (oyvindstegard) wrote :

There's a fair number of ext4 fixes in the latest 2.6.31-18.55 kernel in karmic-proposed, according to the changelog. I suppose it would be worth testing with that kernel for the people who experience this bug.

Changed in ubuntu-release-notes:
status: Fix Released → Incomplete
Steve Langasek (vorlon) wrote :

This is not incomplete. The issue is documented in the release notes.

Changed in ubuntu-release-notes:
status: Incomplete → Fix Released
TragicWarrior (bryan-christ) wrote :

This is an unfortunate chicken-and-egg scenario. Assuming the latest kernel in karmic-proposed does fix the problem, how does one safely upgrade their system, since there is a likelihood the very update itself will get corrupted? The only certain solution would be to (gasp) re-master the Karmic ISO images with a point-release so that fresh installs are guaranteed usable.

Ryan C. Underwood (nemesis-icequake) wrote : Re: [Bug 453579] Re: in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4

Why would the kernel update get corrupted unless the archive or any of
the files it contains are several hundred megabytes in size?

--
Ryan C. Underwood, <email address hidden>

TragicWarrior (bryan-christ) wrote :

Ryan,

I believe the large file aspect of the bug is an incorrect characterization. If you take a look at comment #184, you will see that I have reproduced the bug on much smaller files.

Ryan C. Underwood (nemesis-icequake) wrote :

You did not say anything about reproducing the bug on smaller files. To my knowledge this would be the first report of a file smaller than 100MB being corrupted by this bug.

Steve Langasek (vorlon) wrote :

On Wed, Jan 27, 2010 at 07:49:06PM -0000, TragicWarrior wrote:
> I believe the large file aspect of the bug is an incorrect
> characterization. If you take a look at comment #184, you will see that
> I have reproduced the bug on much smaller files.

No, you have reproduced some *other* corruption problem that doesn't fit the
profile of the original bug report. Please file a separate bug.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

TragicWarrior (bryan-christ) wrote :

Steve,

The original posting mentions files that are 512MB (comment #53). Later it is assessed at 300MB (comment #89). Then it was whittled down to 120MB (comment #143). Then it went to 45MB (comment #174). I don't think it would be ideal to open a new bug since so much data has been captured here. Why not just re-characterize the bug to match the collected data? In my case, the largest file that I think would have come down in Update Manager would be OpenOffice ~100MB.

Steve Langasek (vorlon) wrote :

On Wed, Jan 27, 2010 at 08:29:43PM -0000, TragicWarrior wrote:

> Why not just re-characterize the bug to match the collected data?

Because the data is not related to the bug that was reported, and it's not
appropriate to hijack bug reports for unrelated issues.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

TragicWarrior (bryan-christ) wrote :

Steve, I would hardly call changing the description

from:
"in-place corruption of large files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4"

to:
"in-place corruption of files *without fsck or reboot* reported with linux 2.6.31-14.46 on ext4"

hijacking. It's not as if we are talking about night and day here. In this case, the original reporter simply didn't know the problem also manifests on smaller files (< 512MB). Perhaps it is easier to reproduce on larger files, but the evidence now shows that it is a problem on files of 45 MB and up.

Roland (roland1979) wrote :

I can confirm this bug with current karmic kernel:
 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 17:01:44 UTC 2009 x86_64 GNU/Linux

Steps to reproduce:

Download the same file with 2 different downloaders in parallel. I used Opera and wget.

wget http://ubuntu.intergenia.de/releases/karmic/ubuntu-9.10-desktop-i386.iso
Opera: saved http://ubuntu.intergenia.de/releases/karmic/ubuntu-9.10-desktop-i386.iso as ubuntu-9.10-desktop-i386.iso-opera

Results:

roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso
roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso-opera
3f979c279665cc7d6ead2c11b1060188 ubuntu-9.10-desktop-i386.iso-opera
roland@pdbxe100:~$ ls -l ubuntu-9.10-desktop-i386.iso*
-rw-r--r-- 1 roland roland 723488768 2009-10-28 22:14 ubuntu-9.10-desktop-i386.iso
-rw-r--r-- 1 roland roland 723488768 2010-01-28 15:35 ubuntu-9.10-desktop-i386.iso-opera
roland@pdbxe100:~$

Using cmp I found that there were NO differences?!
roland@pdbxe100:~$ cmp ubuntu-9.10-desktop-i386.iso ubuntu-9.10-desktop-i386.iso-opera

I wondered, and compared again via md5sum:
roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso
roland@pdbxe100:~$ md5sum ubuntu-9.10-desktop-i386.iso-opera
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso-opera
roland@pdbxe100:~$

So after accessing the files a second time, they seemed to have been synced, or flushed after a delay, or whatever.

These are my ext4 flags:
has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

I created the filesystem manually via mkfs.ext4 /dev/sda4.
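
To tell a one-off mismatch from an unstable read, the same file can be hashed several times and the distinct results counted. The function below is a minimal sketch, assuming GNU coreutils; count_distinct_md5 is a hypothetical name. A result greater than 1 means the file content, as read, changed between runs.

```shell
#!/bin/sh
# Sketch: hash one file several times and count the distinct md5sums seen.
# count_distinct_md5 is a hypothetical helper, not an existing tool.
count_distinct_md5() {
    file=$1
    runs=${2:-5}          # number of hashing passes, default 5
    [ -r "$file" ] || return 2

    i=0
    while [ "$i" -lt "$runs" ]; do
        md5sum < "$file" | cut -d' ' -f1
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

For example, count_distinct_md5 ubuntu-9.10-desktop-i386.iso-opera 10 prints 1 when every pass agrees. Repeated reads are usually served from the page cache, so this only exercises the disk if the cache is dropped between passes.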

Øyvind Stegard (oyvindstegard) wrote :

I've yet to see any feedback about the 2.6.31-18 kernel
(karmic-proposed) in this critical bug report, and I find that rather
strange. The proposed -18 kernel has been out for a while now and I count
80+ ext4 fixes in the changelog, including a fix for a data corruption
scenario.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/496816
http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.31.8 (upstream
stable release from which 2.6.31-18 has patches)
https://wiki.ubuntu.com/Testing/EnableProposed

Oliver Seemann (os-oebs) wrote :

I believe I must withdraw my bug report. I have a test case that reproduces the problem, but it does not seem to be related to ext4, as it turned out today.

I read about 2.6.31-18 here and updated yesterday, but I could reproduce the problem with the new kernel as well. Wondering about that, I created an XFS partition and repeated the test on it, and it also came out positive. So my problem is somewhere else.

The test copies 4 big files totaling 11 GB via NFS from an old Dapper box to a local partition. One of the files always ended up with a mismatching sha1 sum. The "cmp -l" output is always a single contiguous 128-byte block at some random offset. The values don't seem to be affected by single bit flips (110010 -> 101010, 11111101 -> 11010101, 10100 -> 101010, 1110 -> 11010111, 11110001 -> 11010010). Memtest86 also ran fine, at least on this box; I have not yet tested the Dapper one.
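
The "single contiguous block" observation can be checked mechanically by collapsing cmp -l output into runs of adjacent differing bytes. This is a sketch only, assuming POSIX cmp and awk; diff_runs is a hypothetical name. Each output line gives the 1-based offset of a run and its length in bytes.

```shell
#!/bin/sh
# Sketch: summarize "cmp -l" output as contiguous runs of differing bytes.
# diff_runs is a hypothetical helper. Output lines: "<start offset> <length>".
diff_runs() {
    cmp -l "$1" "$2" | awk '
        # First differing byte starts the first run.
        NR == 1        { start = $1; prev = $1; next }
        # Adjacent offset: extend the current run.
        $1 == prev + 1 { prev = $1; next }
        # Gap: print the finished run and start a new one.
                       { print start, prev - start + 1; start = $1; prev = $1 }
        END            { if (NR > 0) print start, prev - start + 1 }'
}
```

A corruption like the one described would show up as one line with length 128; many short runs would point at a different failure pattern.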

Surbhi Palande (csurbhi)
Changed in linux (Ubuntu):
assignee: nobody → Surbhi Palande (csurbhi)
Surbhi Palande (csurbhi) wrote :

@scott, do you still see this bug? I tested this by doing both an upgrade and a fresh install + updates and did not run into it. The md5sum works just fine. If this is still a problem, I will post a debug kernel, if you are willing to try it.

Surbhi Palande (csurbhi) wrote :

Can anyone else confirm that this is still a bug in Karmic which is reproducible by the following steps mentioned in the original report:

1) download an iso
2) compare the md5sum

Thanks !

Surbhi Palande (csurbhi) wrote :

Also, the result of this quick test from anyone who sees this bug would be appreciated. If you have an ext3 fs or any other fs on some partition (or a sufficiently large file which is formatted as a fs other than ext4), then please do the following:

A) ensure that your blocksize is 4096 bytes by looking at the output of dumpe2fs -h <partition which has ext4>
B) from the same output, see if you can find "extent" in the line which has "Filesystem features"
C) post the output of dumpe2fs -h <partition which has ext4>

If the blocksize is 4096 bytes, then:

1) download the iso on this ext3/other filesystem
2) dd if=<iso name> of=/dev/<ext*4* partition>/<some file name> bs=512MB count=1
3) the md5sum should be: faf49ac5a653e339f84a8dd0b7c047dc

(Note that bs=512MB writes 512000000 bytes. If you write 536870912 bytes (i.e. 4096 * 131072), then the md5sum should be this:
bcbc14f5bfc9229995afaf786bbb2445) Please report whether the md5sum matches or not. Thanks for your help :)
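
Steps A and B, and the byte counts in the note, can be sketched in shell. This is an illustration only; check_ext4_header is a hypothetical helper that reads dumpe2fs -h output on standard input, and it relies on the "Block size:" and "Filesystem features:" lines that dumpe2fs prints.

```shell
#!/bin/sh
# Sketch: pull the block size and the "extent" feature out of
# "dumpe2fs -h" output read from stdin. check_ext4_header is hypothetical.
check_ext4_header() {
    awk -F': *' '
        BEGIN                   { bs = "unknown"; extent = "no" }
        /^Block size:/          { bs = $2 }
        /^Filesystem features:/ { if ($2 ~ /(^| )extent( |$)/) extent = "yes" }
        END                     { printf "block_size=%s extent=%s\n", bs, extent }'
}

# The dd "MB" suffix is decimal, so bs=512MB is not a multiple of 4096.
dd_mb_bytes=$((512 * 1000 * 1000))      # 512000000 bytes
block_aligned_bytes=$((4096 * 131072))  # 536870912 bytes
```

Usage, with /dev/sdXN standing in for the ext4 partition: sudo dumpe2fs -h /dev/sdXN 2>/dev/null | check_ext4_header. The two byte counts explain why the note lists two different expected md5sums.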

Surbhi Palande (csurbhi) wrote :

Also forgot to mention that the above comments apply for the following iso image:
http://releases.ubuntu.com/karmic/ubuntu-9.10-desktop-i386.iso which originally has the md5sum as follows:
8790491bfa9d00f283ed9dd2d77b3906 (http://releases.ubuntu.com/karmic/MD5SUMS)
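
Rather than comparing the hash by eye, the published list can be fed to md5sum -c. The function below is a sketch, assuming GNU md5sum and an MD5SUMS file already downloaded into the same directory as the image; verify_from_list is a hypothetical name. It does not handle file names containing spaces.

```shell
#!/bin/sh
# Sketch: verify one file against an MD5SUMS-style list.
# verify_from_list is a hypothetical helper. Run it in the directory
# that holds both the list and the file to check.
verify_from_list() {
    list=$1
    name=$2
    # List lines look like "<hash> *<name>" or "<hash>  <name>".
    # Keep only the line for the requested file, then let md5sum check it.
    awk -v n="$name" '{ f = $2; sub(/^\*/, "", f); if (f == n) print }' "$list" |
        md5sum -c
}
```

For example: verify_from_list MD5SUMS ubuntu-9.10-desktop-i386.iso. The exit status is non-zero when the hash differs or the file is not in the list.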

Jordan (jordanu) wrote :

**WARNING** Do not run the dd command in comment #200 **WARNING**

The command should read "dd if=<iso name> of=/mountpoint/for/ext4/partition/filename bs=512MB count=1"

Pointing of= to anything in /dev is wrong, and you should always be very careful when using dd. Though unlikely, trying to follow the instructions in comment #200 as currently written could lead you to accidentally overwrite the beginning of your ext4 partition with the contents of the iso, making all of the files on that partition difficult to recover, and overwriting many of them permanently.

Surbhi Palande (csurbhi) wrote :

Please use the following safer dd command:

dd if=<iso name> of=/<mount-point-of-ext4-fs>/<some file name> bs=512MB count=1

Do avoid pointing of= at the /dev partition, as Jordan pointed out above.

Thanks Jordan :)
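
One way to guard against the mistake discussed above is a thin wrapper that refuses device paths before calling dd. This is only a sketch; copy_first_chunk is a hypothetical name, and the optional third argument (block size, default 512MB as in the test instructions) is an addition for illustration.

```shell
#!/bin/sh
# Sketch: copy the first block of a file with dd, refusing to write to
# anything under /dev or to a device node. copy_first_chunk is hypothetical.
copy_first_chunk() {
    src=$1
    dest=$2
    bs=${3:-512MB}    # dd block size; 512MB is decimal, i.e. 512000000 bytes

    case "$dest" in
        /dev/*)
            echo "refusing to write under /dev: $dest" >&2
            return 1
            ;;
    esac
    if [ -b "$dest" ] || [ -c "$dest" ]; then
        echo "refusing to write to a device node: $dest" >&2
        return 1
    fi

    dd if="$src" of="$dest" bs="$bs" count=1
}
```

For example: copy_first_chunk ubuntu-9.10-desktop-i386.iso /mnt/ext4/test.img, where /mnt/ext4 stands in for the mount point of the ext4 filesystem. Note that dd allocates a buffer of the full block size, so the default needs roughly 512 MB of free memory.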

Surbhi Palande (csurbhi) wrote :

@TragicWarrior, can you let me know if you encounter the bug with an iso image? Also, are you still encountering the bug of a corrupted update on a (I assume safe) reboot?

Miklos Juhasz (mjuhasz) wrote :

I have downloaded the iso and calculated the checksums with the current (2.6.31-19) and the proposed kernel (2.6.31-20) as well. Both of them matched.

$ wget http://ubuntu.intergenia.de/releases/karmic/ubuntu-9.10-desktop-i386.iso
$ md5sum ubuntu-9.10-desktop-i386.iso
8790491bfa9d00f283ed9dd2d77b3906 ubuntu-9.10-desktop-i386.iso

Scott James Remnant (Canonical) (canonical-scott) wrote :

I'm going to mark this bug as Invalid (I'm the original reporter).

I've not been able to replicate it on production hardware, and not been able to replicate it on the hardware where I was originally able to replicate it with karmic as it existed at release time.

Therefore I can only conclude that the problem was with faulty hardware, exacerbated by a kernel issue that was fixed before karmic was released.

If you are a user still experiencing problems with the ext4 (or any other) filesystem, including those resulting in fsck errors, then you don't have the same bug that I reported, so you should report a new bug. Don't reopen this one unless you've snuck into my house and stolen my laptop <g>

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux (Ubuntu Karmic):
status: Triaged → Invalid
Changed in linux:
status: New → Invalid
MillenniumBug (millenniumbug) wrote :

So the warning should be removed from the Release Notes...?
http://www.ubuntu.com/getubuntu/releasenotes/910

MillenniumBug (millenniumbug) wrote :

Seems to have been removed. Thank you, someone.

Changed in ubuntu-release-notes:
status: Fix Released → Incomplete
Changed in linux (Ubuntu Karmic):
status: Invalid → Incomplete
Changed in linux (Ubuntu):
status: Invalid → Confirmed
Changed in linux (Ubuntu Karmic):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Steve Langasek (vorlon) wrote :

Do not change the status of this bug.

Changed in ubuntu-release-notes:
status: Incomplete → Fix Released
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Karmic):
status: Confirmed → Invalid
DjznBR (djzn-br) wrote :

* * * I JUST HIT THIS BUG * * *

Yes, I just did it...

I bought a new Seagate HDD, part number ST3500418AS. I formatted it as ext4, with / (40GB), swap (5GB), /home (220GB) and an NTFS partition (220GB).

I installed Ubuntu 10.04 and installed all updates.

Then I downloaded the ISO for 10.04.1 via the Transmission BitTorrent client.
I burned the CD with Brasero.
On installation, it got stuck at the "Ubuntu" screen. I told it to check the CD and there were errors.

To my surprise, the MD5SUM of the ISO file on the ext4 partition did not match.
Then I started Transmission again and made it "RECHECK" the file, after which the file had the correct MD5SUM.

I believe I have hit this bug just now, because on my previous HDD I was using ext3 for my home partition and ext4 only for the root partition. Now, problems have arisen JUST a couple of minutes AFTER an ISO download onto a freshly formatted ext4 partition.

I would consider not marking this bug as invalid.

papukaija (papukaija) wrote :

@DjznBR: No, you haven't reproduced this bug. Lucid is using 2.6.32 while this bug is about kernel 2.6.31. In addition, did you read comment 206, especially this: "Therefore I can only conclude that the problem was with faulty hardware, exasperated by a kernel issue that was fixed before karmic was released."? Please open a new bug for your issue.

DjznBR (djzn-br) wrote :

I believe the bug title has been changed once or twice, but let me re-quote here what Scott reported:

"There are worrying reports of filesystem corruption on ext4 in karmic. Scott says:

12:36 < Keybuk> this whole ext4 thing is worrying me
12:36 < Keybuk> I just downloaded an iso image, md5sum didn't match
12:36 < Keybuk> downloaded it into an ext3 partition, matched just fine
12:59 < Keybuk> and I know mvo has seen bugs with corrupted .debs in /var/cache/apt/archives
12:59 < Keybuk> which seems to imply its any file large enough to use lots of extents"

Well, that's exactly what happened on a fresh Lucid install, using ext4 partition.

It may be neither an issue with ext4 itself, nor an issue with kernel version or patch.

I think this is related to the "Transmission" application, because reports are that the corruption takes place when torrents are downloaded, and this is exactly what happened. In some way it may be that Transmission is not handling ext4 well. And it's very subtle, since a "file recheck" on finished torrents may just reconstruct the proper MD5SUM.

Mackenzie Morgan (maco.m) wrote :

It's not Transmission's fault. I'm a KDE user (so, I use KTorrent),
and I was affected back when this bug was filed (no problems since
though).

Ben Lau (benlau) wrote :

The bug should also affect 10.04. I have a fresh install of 10.04 AMD64 (with data copied from my old hard disk, stored on /home). The result of md5sum on ubuntu-9.10-desktop-i386.iso changes every time I run the command:

$ md5sum ubuntu-9.10-desktop-i386.iso
adbe2aa291535c9bfb12f207d25659b5 ubuntu-9.10-desktop-i386.iso
$ md5sum ubuntu-9.10-desktop-i386.iso
735b22e87a77e5cb1b2a885264685280 ubuntu-9.10-desktop-i386.iso
$ md5sum ubuntu-9.10-desktop-i386.iso
9fb810608e96ba3642b1d19085164f33 ubuntu-9.10-desktop-i386.iso

$ uname -a
Linux benlau-desktop 2.6.32-24-generic #41-Ubuntu SMP Thu Aug 19 01:38:40 UTC 2010 x86_64 GNU/Linux
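
The repeated md5sum runs above can be approximated in a script that hashes the same file several times and checks whether the digests agree. This is a rough sketch under stated assumptions: `repeated_digests` and `reads_are_stable` are hypothetical names, and because later reads may be served from the page cache rather than from disk, the script only shows whether reads are consistent, not where any inconsistency originates.

```python
import hashlib


def repeated_digests(path, runs=3, chunk_size=1024 * 1024):
    """Hash the file at `path` `runs` times and return the list of hex MD5 digests."""
    digests = []
    for _ in range(runs):
        digest = hashlib.md5()
        # Reopen the file on every run so each digest comes from a fresh read pass.
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        digests.append(digest.hexdigest())
    return digests


def reads_are_stable(path, runs=3):
    """Return True when every read pass over the file produced the same digest."""
    return len(set(repeated_digests(path, runs))) == 1
```

For an unchanged file, `reads_are_stable("ubuntu-9.10-desktop-i386.iso")` would be expected to return `True`; a `False` result corresponds to the changing digests shown in the transcript above.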

papukaija (papukaija) wrote :

@Ben (and everyone else who thinks that he/she has reproduced this bug): No, you haven't reproduced this bug. Lucid is using 2.6.32 while this bug is about kernel 2.6.31. In addition, did you read comment 206, especially this: "Therefore I can only conclude that the problem was with faulty hardware, exasperated by a kernel issue that was fixed before karmic was released."? Please open a new bug for your issue.

era (era) wrote :

papukaija: could you please update the bug description to point to pertinent bugs for other kernel versions? I'm seeing what I suspect to be ext4 corruption on multi-CPU systems (I think all amd64) on various kernels, on both small and large files. Where and how should I report this? So far, this bug seems the closest match.

Other ext4 bugs I have looked at: bug #438379 (pretty exclusively about suspend/resume problem), bug #317781 (seems to focus on 0-byte files; certainly seems closer to what I am looking at in bug #582341).

papukaija (papukaija) wrote :

@era: Unfortunately I won't edit this bug's title nor reopen it, due to the reasons mentioned in comment 215, which refers to comment 206. You should report your issue to Launchpad against the linux package. You can do so by running the following command from a Terminal (Applications->Accessories->Terminal) and it will automatically gather and attach debug information to that report:

ubuntu-bug linux

Please try to provide as much information as possible in the bug description:

    1) The majority of kernel bugs are hardware specific so be sure to note what hardware/device is being used.
    2) Document any known steps to reproduce the bug.
    3) Also note whether the bug exists in previous kernel versions of Ubuntu or if it's a regression from previous kernel versions.
    4) Finally, it will help if you can test the latest development Ubuntu kernel version as well as the latest upstream mainline kernel[1].

More detailed instructions to file a bug are available at: https://help.ubuntu.com/community/ReportingBugs#How%20to%20report%20bugs

[1]: https://wiki.ubuntu.com/Kernel/MainlineBuilds

Thanks in advance.

Emanem (em4n3m) wrote :

I think I'm suffering from the same issue.
I was archiving my home directory into one tar file from one disk to another (all ext4) and then it got stuck. I had to restart the computer and eventually it proceeded, but I have to say, after I copy large files, the chances that I can't read/open other large files are high.

Basically as long as I don't copy/manipulate large files I don't have particular issues; as soon as I try to do such operations I have to restart my pc.

I'm using:
2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

I have to say I have a pretty vanilla Ubuntu, no customization. I suspect an ext4 issue because 6 months ago all my disks were ext2 and I never had an issue, but now it looks like one issue after another.
I did a memcheck and the memory definitely seems OK.

Unfortunately it's very hard to reproduce systematically.

Cheers

Displaying first 40 and last 40 of 218 comments.