ext4: panic working with large files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | | High | Tim Gardner |
Jaunty | | High | Tim Gardner |
Bug Description
When working on large files (> ~10GB) the file system can become fatally corrupted. The system crashes (freezes) and is then unable to reboot (Grub reports 'Error 2'). Booting from a live/recovery disk and running fsck on the corrupted filesystem yields multiple errors.
I have trashed two systems running Jaunty (Alpha 3 and Alpha 6) on an ext4 root file system. Both times I was manipulating/using large files. The first time occurred when I simply removed a 48GB file (the system froze), and the second when VMware was writing to a virtual disk (a large file). Both systems had all updates installed (2.6.27-11 kernel).
I've attached a screenshot of part of the ensuing fsck. This is after all(?) the master (global?) blocks have been declared invalid. If you can't see from the picture, at this stage fsck is reporting multiply-claimed blocks (claimed by the large files being used at the time, and by random smaller files).
The system was a new dual processor (Core Duo X9100) Thinkpad W500 running on a 2.5" SATA drive, 4GB core, Intel GPU.
suecom (allister-nowatt) wrote : | #1 |
affects: | ubuntu → linux (Ubuntu) |
Eric Shattow (eshattow) wrote : | #2 |
summary: |
- Ext4 file system fatel corruption
+ ext4: panic working with large files |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
status: | New → Triaged |
Leann Ogasawara (leannogasawara) wrote : | #3 |
Hi Guys,
Just wanted to also add a note that the kernel is expected to be frozen tomorrow for Jaunty's release. I've pinged the kernel team to see if they can get this pulled in time. If not, I suspect it should qualify for a Stable Release Update for Jaunty. Thanks.
Leann Ogasawara (leannogasawara) wrote : | #4 |
@suecom, I also notice that in your description you say you ran Jaunty Alpha 3 and Alpha 6 with all updates installed, yet you mention a 2.6.27-11 kernel. I assume that was a typo? Jaunty has a 2.6.28-based kernel.
Leann Ogasawara (leannogasawara) wrote : | #5 |
Hi Guys,
One of our kernel devs threw together a test kernel with this patch applied and uploaded it to his PPA:
https:/
It's package "linux - 2.6.28-
Eric Shattow (eshattow) wrote : | #6 |
I will build and test, but there is no use case to reproduce this. I've hit (what I think might be) this bug maybe 5 times in 2-3 months of heavy ext4 filesystem usage. There is usually file corruption afterward. My own use case is BitTorrent, so files are checksummed and lost data is thrown out. I don't know if there is a user behavior that would more quickly reproduce the bug described by the original poster.
Tim Gardner (timg-tpi) wrote : | #7 |
Bah - the PPA is having problems so I built locally and stashed test kernels at http://
Changed in linux (Ubuntu): | |
assignee: | nobody → Tim Gardner (timg-tpi) |
status: | Triaged → In Progress |
Eric Shattow (eshattow) wrote : | #8 |
No noticeable ext4-related problems with 2.6.28-11-generic #42~lp348836 SMP. I do not know if the OP's bug is fixed, only that ~lp348836 is working okay.
Tim Gardner (timg-tpi) wrote : | #9 |
@Eric - thanks for your response. I'll add this as an SRU request for the first upload after release.
Launchpad Janitor (janitor) wrote : | #10 |
This bug was fixed in the package linux - 2.6.28-11.42
---------------
linux (2.6.28-11.42) jaunty; urgency=low

  [ Tim Gardner ]

  * Enabled LPIA CONFIG_PACKET=y
    - LP: #362071

  [ Upstream Kernel Changes ]

  * ext4: fix bb_prealloc_list corruption due to wrong group locking
    - LP: #348836

 -- Stefan Bader <email address hidden>  Thu, 16 Apr 2009 08:10:55 +0200
Changed in linux (Ubuntu Jaunty): | |
status: | In Progress → Fix Released |
I am running kernel 2.6.28-11.42 generic amd64. I'm still able to crash my system with large files on ext4.
I used the following script to reproduce this:
while true; do dd if=/dev/zero of=zero bs=1M count=102400; dd if=zero of=/dev/null bs=1M; rm zero; done
The underlying device is an LVM logical volume on a VG that spans two disks.
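For anyone who wants to try the loop above with a fixed number of passes and a chosen file size, here is a minimal sketch of a bounded variant. The function name `stress_large_file` and its three arguments are assumptions made for illustration and are not part of the original one-liner; the write, read-back, and delete steps mirror it. Since the original loop reportedly crashes the machine, this should only be pointed at a scratch filesystem.

```shell
#!/bin/sh
# Bounded variant of the reproduction loop (a sketch, not the reporter's script).
# Hypothetical arguments:
#   $1 = directory on the filesystem under test
#   $2 = file size in MiB
#   $3 = number of write/read/delete passes
stress_large_file() {
    dir=$1
    size_mb=$2
    passes=$3
    file="$dir/zero"
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Write the file, read it back, then delete it, as in the original loop.
        dd if=/dev/zero of="$file" bs=1M count="$size_mb" 2>/dev/null || return 1
        dd if="$file" of=/dev/null bs=1M 2>/dev/null || return 1
        rm -- "$file" || return 1
        i=$((i + 1))
    done
    echo "completed $passes passes of ${size_mb} MiB"
}
```

Calling `stress_large_file /mnt/test 102400 10` would correspond to ten passes of the 100 GiB file used in the original loop; `/mnt/test` is a placeholder mount point.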
Changed in linux (Ubuntu Jaunty): | |
status: | Fix Released → New |
I noticed some other stuff:
[ 371.568931] EXT4-fs: barriers enabled
[ 371.569257] kjournald2 starting. Commit interval 5 seconds
[ 371.569824] EXT4 FS on dm-1, internal journal on dm-1:8
[ 371.569828] EXT4-fs: delayed allocation enabled
[ 371.569831] EXT4-fs: file extents enabled
[ 371.571151] EXT4-fs: mballoc enabled
[ 371.571157] EXT4-fs: mounted filesystem with ordered data mode.
[ 379.816940] JBD: barrier-based sync failed on dm-1:8 - disabling barriers
Barriers seem to be disabled.
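A quick way to check for this on another machine is to filter the kernel log for the same JBD message. The sketch below assumes the message wording is exactly as in the log above (other kernel versions may phrase it differently), and the function name `barriers_disabled_devices` is made up for illustration.

```shell
# Reads kernel log text on stdin and prints the device named in each
# "barrier-based sync failed ... disabling barriers" line.
# Assumes the message format shown in the log excerpt above.
barriers_disabled_devices() {
    sed -n 's/.*JBD: barrier-based sync failed on \([^ ]*\) - disabling barriers.*/\1/p'
}
```

It could be used as `dmesg | barriers_disabled_devices`; no output would mean no such message was found in the log.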
$ sudo lvdisplay --maps bulk/testvol
--- Logical volume ---
LV Name /dev/bulk/testvol
VG Name bulk
LV UUID q3e1GQ-
LV Write Access read/write
LV Status available
# open 0
LV Size 125.00 GB
Current LE 32000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1
--- Segments ---
Logical extent 0 to 31999:
Type linear
Physical volume /dev/sda1
Physical extents 102333 to 134332
The volume isn't spread across both disks.
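The same check can be done mechanically by counting the distinct "Physical volume" entries in the `lvdisplay --maps` output. This is only a sketch that assumes the output layout shown above; the function name `count_physical_volumes` is hypothetical.

```shell
# Reads `lvdisplay --maps` output on stdin and prints the number of
# distinct physical volumes listed in the segment map.
# Assumes each segment has a line of the form "Physical volume <device>".
count_physical_volumes() {
    awk '$1 == "Physical" && $2 == "volume" { print $3 }' | sort -u | wc -l | tr -d ' '
}
```

For example, `sudo lvdisplay --maps bulk/testvol | count_physical_volumes` would print the count for the volume above; a result greater than 1 would indicate the volume spans several devices.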
The same script running on a comparable ext3 filesystem, on the same disk, on the same machine, has had no problems.
Leann Ogasawara (leannogasawara) wrote : | #13 |
@ArbitraryConstant, it would be better if you opened a new bug for the issue you are seeing - https:/
Changed in linux (Ubuntu Jaunty): | |
status: | New → Fix Released |
This could be related to https://bugzilla.redhat.com/show_bug.cgi?id=490026 "EXT4 panic, list corruption in ext4_mb_new_inode_pa".
I occasionally experience a fatal panic when working with large amounts of data. The system hard-locks, and since I'm usually working in X11, I don't have access to the panic message to confirm. It does sound similar to the reported issue.
Please cherrypick http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d33a1976fbee1ee321d6f014333d8f03a39d526c to Ubuntu 2.6.28