Ubuntu

Soft lockups (freezes) when deleting files from ext4 partitions on 2.6.28

Reported by Ömer Fadıl USTA on 2009-02-18
This bug affects 98 people
Affects | Status | Importance | Assigned to | Milestone
Release Notes for Ubuntu | | Undecided | Unassigned |
linux (Ubuntu) | | Medium | Tim Gardner |
  Jaunty | | Medium | Tim Gardner |
  Karmic | | Medium | Tim Gardner |

Bug Description

[
Please read *all* previous comments before posting.

Mainline kernels are known to not experience this bug, although in general are not supported (i.e., using one is a workaround, but if they break other things you're generally out of luck).

Additional "me-too" comments aren't useful, feel free to select the "This bug affects me too" option and/or subscribe to this bug instead.
]

Binary package hint: linux-image-2.6.28-8-generic

I'm using Kubuntu 8.10 with all updates installed.

The system is a clean install formatted with ext4, running the 2.6.28-8 Linux kernel.

The system sometimes locks up and freezes all input, even the keyboard and mouse.
To reproduce the bug on a console (not Konsole), I closed X and kdm.

Then I compiled qt-copy on one console, ran svn up on KDE, and ran apt-get update on another
console to put the system under CPU load. After a while it happened again.

No keyboard response, no hard disk response: a total freeze.

I waited after the freeze, and about 4 minutes later this text appeared on screen:

BUG: soft lockup - CPU#0 stuck for 61s! [uic:5356]

After about 4 more minutes, an identical message appeared under it:

BUG: soft lockup - CPU#0 stuck for 61s! [uic:5356]
BUG: soft lockup - CPU#0 stuck for 61s! [uic:5356]

There are no hardware-related error records in /var/log/messages around the times of the freezes.

Also, for information: sometimes I get messages saying the disk is full, but
I'm sure it isn't, because df shows more than 7 GB of free space. I don't always get this error.
If a file triggers this error while I'm updating, I delete it and download a bigger file, and the system doesn't
complain that the disk is full. I think it is related to ext4.

But I'm not sure whether these two bugs are related.

Thanks

Martin Vysny (vyzivus) wrote :

I have exactly the same problem with 2.6.28-8.26. The problem started to appear only recently (2-4 days ago). It manifests only when deleting files; it never triggers when adding files. The problem occurs regardless of whether X is running. Interestingly, it occurs only on the 32-bit kernel; the 64-bit 2.6.28-8.26 does not seem to be affected.

Michał Zając (quintasan) wrote :

I've encountered it more than 7 times (3 times today).
The first time it happened while moving my /home (4 GB) to /mnt/Data; I had to reset the computer and lost some data (thankfully not very important). Today I tried to clean the pbuilder environment with "ARCH=amd64 DIST=jaunty sudo pbuilder --clean", and after restarting my .kde directory was gone.

It seems the freeze occurs when moving or deleting large amounts of data. Can anyone else confirm it?

Linux nightwalker 2.6.28-8-generic #26-Ubuntu SMP Wed Feb 25 04:27:53 UTC 2009 x86_64 GNU/Linux

dnyaga (daniel-nyaga) wrote :

I had experienced the same, and reported it at https://bugs.launchpad.net/ubuntu/+bug/334581. I have had to hard reset my computer four times today.

The circumstances were the same all 4 times: I was copying large directories between different ext4 partitions (using nautilus) when the system locked up. The directories in question have tens of thousands of small sized files. I am going to mark bug 334581 as a duplicate of this one so that we can focus our discussion and testing on one bug report.

dnyaga (daniel-nyaga) wrote :

There is an alarming frequency of kernel freezes when working with directories that contain lots of tiny files: see https://bugs.launchpad.net/ubuntu/+source/subversion/+bug/342164. That reporter's system froze and was hard reset, and ext4 had not written the newest file to disk.

Question: what is causing all these freezes?

Changed in linux:
status: New → Confirmed
Agent N2O (agentn2o) wrote :

I have experienced something similar to the first poster: last week I installed Ubuntu 9.04 alpha 5 on a newly formatted ext4 partition. While setting the system up and installing the latest package updates, I kept running into an error saying the drive was full (it was actually only 20% full, of 160 GB). I tried moving and deleting files off the drive, but nothing worked. Eventually a reboot solved this, but I don't know why.

I upgraded to kernel 2.6.28-9 (alpha 6) on Friday. This weekend I set about converting 2 x 1 TB data drives from ext3 to ext4, and initially all went well, but I wanted to get the full extent (no pun intended) of the ext4 file structure, so I was cutting and pasting data back and forth between the drives using Nautilus, and the OS kept freezing. Eventually I figured out that copying and pasting was fine but deleting was the culprit. I tried deleting in Nautilus and that hung the OS. I tried in a terminal: same thing. I booted into recovery mode, dropped to the root prompt, deleted these files, and got a series of these: "BUG: soft lockup - CPU#0 stuck for 61s!"

In the end I managed to completely clear one drive off so I reformatted it and then transferred everything back and then reformatted the other. Now both TB drives have "native" ext4 partitions and I can delete from those drives without hangs or freezes.

dnyaga (daniel-nyaga) wrote :

From Agent N2O's comments above, it appears that the freezing occurs where ext3 partitions were converted to ext4. I have three converted ext4 partitions and one fresh one, and will try to test that theory tonight.

To the other reporters: were your ext4 partitions new or converted?

dnyaga (daniel-nyaga) wrote :

Same behavior independently reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/340628

The reporter of bug 340628 provided a stack trace.

Agent N2O (agentn2o) wrote :

Forgot to mention that the freezing didn't happen ALL the time with the ext4 deletes. When I was cutting and pasting it would get a few mins in and freeze, and elsewise I was able to delete some files but others made it freeze. I suspect it may have been LARGE files but I do not have solid proof of that.

On Sun, 2009-03-15 at 20:23 +0000, Agent N2O wrote:
> Forgot to mention that the freezing didn't happen ALL the time with the
> ext4 deletes. When I was cutting and pasting it would get a few mins in
> and freeze, and elsewise I was able to delete some files but others made
> it freeze. I suspect it may have been LARGE files but I do not have
> solid proof of that.

In my bug, 340628, duped to this bug, the cause was almost certainly a
race. I had multiple deletes going on in the filesystem at the same
time.

FWIW, this is not a problem with 2.6.27-12 from Intrepid which I am
currently using with Jaunty due to this issue.

Agent N2O (agentn2o) wrote :

It looks like the converted vs native ext4 filesystem info I gave earlier was a RED HERRING! I just got another system freeze deleting files off my EXT4 partition that I had reformatted (using mkfs.ext4) yesterday. I have just dropped down to a root shell on the recovery mode to see if I can figure out which specific file (size?, type?) causes problems.

Agent N2O (agentn2o) wrote :

Well, I could not reproduce the latest system freeze. Certainly the frequency of the system freezing from EXT4 deletes is much, much lower on this new native EXT4 partition as opposed to the converted version. I am going to do some more spring cleaning to see if it will freeze up again.

Agent N2O (agentn2o) wrote :

3 more nautilus delete freezes to report (all from same "native" EXT4 partition):

1. deleted a folder with a bunch of video files totalling 7.5 GB
2. deleted 16 folders and files totalling 1.1 GB
3. deleted 6 folders and files totalling 1.6 GB

In all 3 cases it froze immediately after I said yes to the "are you sure prompt" and also I was able to carry out the exact same delete after the reboot, without issue.

dnyaga (daniel-nyaga) wrote :

The freezes I initially reported occurred when I was moving large folders between ext4 partitions (moves between partitions involve deletes). When I am doing this kind of re-organizing, I usually have several move operations going on concurrently. Could it be that the bug is triggered more easily when there are multiple delete/move operations going on concurrently?
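A move between two partitions is carried out as a copy followed by a delete, which is why cross-partition moves hit the same code path as rm. A minimal Python sketch of that mechanism, assuming a POSIX system; `move_file` is a hypothetical helper written for illustration (Python's own `shutil.move` applies the same rename-first, copy-then-remove fallback):

```python
import errno
import os
import shutil


def move_file(src: str, dst: str) -> str:
    """Move src to dst and report which strategy was used.

    Within one filesystem a move is a single rename(2). Across filesystems
    rename fails with EXDEV, so the data is copied and the source unlinked,
    which exercises the source filesystem's delete path.
    """
    try:
        os.rename(src, dst)            # same filesystem: metadata-only move
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:   # only "cross-device link" triggers the fallback
            raise
    shutil.copy2(src, dst)             # different filesystem: copy data and metadata...
    os.unlink(src)                     # ...then delete the source file
    return "copy+unlink"
```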

Last night I moved 70 GB of data between 2 ext4 partitions, all in one sequential operation. The computer did not freeze. I dropped one of the ext4 partitions, re-created it, then moved the data back. The machine still did not freeze.

This evening I will "manufacture" some data that I can afford to lose then move it helter skelter between several ext4 partitions, making sure that there is a large number of moves active at any particular time.
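A rough Python sketch of that "many deletes in flight at once" experiment, assuming a scratch ext4 directory whose contents you can afford to lose; the tree layout, counts and helper names are hypothetical, and it deletes within one filesystem rather than moving between partitions:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_tree(root: Path, n_files: int, size: int) -> Path:
    """Create a scratch directory holding n_files files of `size` bytes each."""
    root.mkdir(parents=True)
    data = b"\0" * size
    for i in range(n_files):
        (root / f"f{i:05d}").write_bytes(data)
    return root


def concurrent_delete(base: str, n_trees: int = 8, n_files: int = 1000,
                      size: int = 4096, workers: int = 8) -> int:
    """Build n_trees directories, then remove them all at once from a thread pool.

    Each shutil.rmtree issues its own unlink/rmdir system calls, and Python
    threads release the GIL during those calls, so several deletes can be in
    progress in the kernel at the same time.
    """
    trees = [make_tree(Path(base) / f"tree{t}", n_files, size) for t in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(shutil.rmtree, trees))  # list() re-raises any worker exception
    return len(trees)
```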

dnyaga (daniel-nyaga) wrote :

Just had another freeze. I was deleting a virtual machine snapshot (a relatively large file). When I rebooted, I was able to finish deleting it.

I can confirm that similar, regular freezing occurs on my machine, but only with the ext4 filesystem converted from ext3. Typically the freeze occurs under heavy disk activity; I believe that whenever it has happened, I had an rsync job traversing the whole of /home, which contains a large number of small files.

I managed to get a SysRq+L stack trace when the freeze occurred (photo attached; the machine was unresponsive, so I couldn't attach it as text). The trace is quite similar to the one reported in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/340628
It might be worth noting that the system did not respond to SysRq commands at first, only after a few minutes.

These freezes occur very frequently, typically within a few hours of uptime. This bug severely affects viability of using ext4 partitions (if the problem really has to do with ext4).

Probably unrelated information: freezes occur both with the non-free Nvidia driver and the free Xorg nv driver.

yaztromo (tromo) wrote :

Posting to confirm same bug. Happens when emptying lots of files from the recycle bin, or doing a big rm -r *

The message is something like "BUG: soft lockup - CPU#0 stuck for 61s!"

Xubuntu 9.04 and ext4 file system

yaztromo (tromo) wrote :

I should add that my file system is new, not converted from ext3.

I have experienced this daily, whenever I try to empty the Trash folder. There are several large files in the Trash. This is on an x86_64 system running Ubuntu, and as mentioned above, it happens both under GNOME and from the command line (using rm -rf).

Xavier Fung (xavier114fch) wrote :

The same thing happened to me when I used kdesvn-build to build KDE from SVN. Usually it truncates the .svn/entries file, just as reported before:

kde-devel@xavier:~$ cd kdesvn/kdesupport
kde-devel@xavier:~/kdesvn/kdesupport$ svn up
svn: Working copy '.' locked
svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for details)
kde-devel@xavier:~/kdesvn/kdesupport$ svn cleanup
svn: Can't read file 'soprano/includes/Error/.svn/entries': End of file found

The end result is a whole-system lockup that needs a hard reset.
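After such a reset, it can help to find every damaged working-copy directory before running svn cleanup. A small Python sketch, assuming the pre-1.7 Subversion layout (one .svn/entries file per directory) and assuming the crash left the file empty; partially truncated files would need a different check, and `find_empty_svn_entries` is a hypothetical helper:

```python
import os
from typing import List


def find_empty_svn_entries(working_copy: str) -> List[str]:
    """Return the paths of zero-length .svn/entries files under a working copy."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(working_copy):
        # Pre-1.7 Subversion keeps an "entries" file inside every .svn directory.
        if os.path.basename(dirpath) == ".svn" and "entries" in filenames:
            path = os.path.join(dirpath, "entries")
            if os.path.getsize(path) == 0:
                hits.append(path)
    return sorted(hits)
```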

Eric Sandeen (sandeen-ubuntu) wrote :

When it freezes, attaching the output of sysrq-w, either via

# echo w > /proc/sysrq-trigger
# dmesg > dmesg.txt

or doing the keyboard combination, would probably be helpful for getting to the bottom of what appears to be a deadlock.
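The two commands above can be wrapped in a small Python sketch, assuming it runs as root and that SysRq is enabled (the kernel.sysrq setting); `trigger_sysrq` and `save_kernel_log` are hypothetical helper names. Key `w` dumps tasks in uninterruptible (blocked) state, and `l`, as in the SysRq+L trace mentioned earlier, prints a backtrace for all active CPUs:

```python
import subprocess
from pathlib import Path

# SysRq keys useful for this bug:
#   "w" - dump tasks that are in uninterruptible (blocked) state
#   "l" - show a stack backtrace for all active CPUs
SYSRQ_KEYS = {"w", "l"}


def trigger_sysrq(key: str, trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Write one SysRq command key to the trigger file (root required)."""
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unsupported SysRq key: {key!r}")
    Path(trigger_path).write_text(key)


def save_kernel_log(out_path: str = "dmesg.txt") -> None:
    """Equivalent of `dmesg > dmesg.txt`."""
    log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    Path(out_path).write_text(log)
```

Typical use is `trigger_sysrq("w")` followed by `save_kernel_log()`. If the machine is completely frozen, no script can run, and only the keyboard combination (Alt+SysRq+W) remains.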

Andrius Štikonas (stikonas) wrote :

Vanilla kernel 2.6.29-rc8 works well for me. So either this problem was fixed in kernel 2.6.29-rc8, or the problem is caused by Ubuntu kernel patches.

yaztromo (tromo) wrote :

Reproducing this bug to get a trace corrupted my system so badly that not even an Ubuntu Jaunty CD will boot without locking the system hard. I'm now stuck on my laptop, since I have no way to rescue it!

Theodore Ts'o (tytso) wrote :

I'm not sure this patch will fix the problem (since I haven't been able to reproduce it yet), but it is at least plausible that this reported "brown paper bag" bug might be responsible for this failure mode.

I've also had one person (IRC handle SuperSquirrel) tell us on #ext4 that when he went to a stock 2.6.29 kernel, he could no longer reproduce the problem, which he could reproduce reliably before. If this is true, then the patch I've attached may not be the solution, and the problem may be caused by something else in the Ubuntu-specific kernel. (Although there was one person who reported a problem very similar to the one reported here on the linux-ext4 list who I don't think was using an Ubuntu kernel, so I'm not sure what to make of this "I went to stock 2.6.29 and it went away" report.)

The patch which I've attached fixes a real bug, and it will be headed to the stable kernel series as soon as it gets accepted upstream, and I'd strongly encourage Ubuntu to pick up this patch. Whether this patch fixes the rm -rf --> soft lockup problem is a different story.

Theodore Ts'o (tytso) wrote :

One more thing for folks who can reproduce this to test, from the #ext4 IRC channel:

(09:27:06 PM) SuperSquirrel: I am using ubuntu stock kernel on jaunty now and hasnt frozen since i turned app armor off.
(09:29:18 PM) SuperSquirrel: i have deleted 30000 Files in one directory

Can anyone else confirm that if they disable apparmor, the problem goes away?

Hello I am "SuperSquirrel" on the IRC.

I compiled a 2.6.29 kernel last night and my system has not hung yet. I have also tested the stock kernel in Ubuntu Jaunty alpha with AppArmor deleted, and my system has not hung yet either. So I think the problem lies somewhere in AppArmor, as an ext4 developer said on IRC yesterday.

yaztromo (tromo) wrote :

Simply unloading Apparmor service doesn't help. Is there a quick way to disable apparmor in the kernel too?

Vague guess but does this bug have any relevance? http://osdir.com/ml/file-systems.ext4/2008-01/msg00083.html

It seems to be something that was fixed in 2.6.29.

Theodore Ts'o (tytso) wrote :

@21:
>Vanilla kernel 2.6.29-rc8 works well for me. So either this problem was fixed in kernel >2.6.29-rc8, or the problem is caused by Ubuntu kernel patches.

Any chance you can try a vanilla 2.6.28 kernel and see if you can reproduce the problem there? Other very interesting test points would be 2.6.28-rc5, and 2.6.28-rc7. Potential fixes that might have fixed this are:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba4439165f0f0d25b2fe065cf0c1ff8130b802eb

and

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7ce9d5d1f3c8736511daa413c64985a05b2feee3

The first patch, which I suspect is more likely the fix, was merged into 2.6.28.8 and 2.6.28-rc6. The second patch was merged into 2.6.28-rc8 and isn't in a 2.6.28.y release yet, although it is in the for_stable branch of the ext4 git tree.

Hence it would be interesting to see if the problem is present in 2.6.28-rc5, and fixed in 2.6.28-rc6. (And thanks to whoever can do the test, since I haven't been able to figure out how to replicate it on my systems yet.)

Theodore Ts'o (tytso) wrote :

@26:
>Vague guess but does this bug have any relevance?
>http://osdir.com/ml/file-systems.ext4/2008-01/msg00083.html

I don't think so. The date on that is January 2008, and that patch was integrated long ago.

>It seems to be something that was fixed in 2.6.29.

So you've independently confirmed that it was fixed in stock 2.6.29? If so, then I think we have two people who have confirmed that it was fixed in 2.6.29, and one person who has reported it fixed in 2.6.28-rc8. (See my previous note for potential patches that might have fixed this issue.)

Theodore Ts'o (tytso) wrote :

Apparmor seems less likely to be the cause, as does any of Ubuntu's "sauce" patches. I have a report from someone who is using a completely stock kernel who has seen this bug on 2.6.28, 2.6.28.4, and 2.6.29-rc6 (which if confirmed rules out my "most likely fix" in comment #27 above). Since apparmor isn't in a stock mainstream kernel, it now looks like the problem may have been fixed sometime between 2.6.28-rc6 and 2.6.28-rc8.

(I would appreciate it if others could confirm this, though: while some people seem to be able to trigger this very easily, others seem to trigger it only about once a month. So if one of you Gentle Readers who can reliably reproduce this hang could check whether it is present in stock 2.6.29-rc6 but apparently fixed in 2.6.29-rc8, I would be most grateful for the independent confirmation.)

Thanks to all who have been helping to work this bug!

Just to make things absolutely clear:
the kernel versions you would like us to test are 2.6.29-rc6 and 2.6.29-rc8? There have also been numerous references to 2.6.28-rc kernels above, which has got me confused.

Theodore Ts'o (tytso) wrote :

@30: Gabriel,

Yes, that's correct; if you could test 2.6.29-rc6 and 2.6.29-rc8, I would be much obliged.

Sorry for the other references to other -rc kernels. I'm gathering information from other sources, including updates from this Launchpad comment stream, and each time I can get more information about "I can reproduce the problem on kernel <foo>" and "The problem seems to go away on kernel version <bar>", we get more information. The object here is to find out which patch actually solves the problem, so I can make a recommendation to the Ubuntu kernel devs to backport that individual patch --- since at this late date it is highly unlikely they will suddenly move Ubuntu Jaunty to use the just-released 2.6.29 kernel.

Thanks, regards,

yaztromo (tromo) wrote :

@Theodore,

I just tested 2.6.29-rc6 sourced from http://kernel.ubuntu.com/~kernel-ppa/mainline/

My usual test, deleting 40 GB of video files, which reliably crashed 2.6.28, hasn't crashed 2.6.29-rc6 yet. Since I may just have gotten lucky, I'll do some more testing tomorrow.

If I can't get rc6 to crash is there much point in testing rc8?

yaztromo (tromo) wrote :

Update: After doing even more testing this morning, I'm 99% sure 2.6.29-rc6 isn't affected by this bug.

dpr (dpr-aha) wrote :

Hi, I could not reproduce the bug in 2.6.29-rc6 or 2.6.29 final from the same source (http://kernel.ubuntu.com/~kernel-ppa/mainline/). But I can reproduce it in 2.6.28.9 as well as in the latest ubuntu kernel.

Theodore Ts'o (tytso) wrote :

Hmm. So two people have said they haven't been able to reproduce the bug in 2.6.29-rc6. Unfortunately, one poster on the linux-ext4 list claims that he experienced the problem (including getting his file system corrupted) while running that very version, 2.6.29-rc6.
I'll have to ask him to confirm this. Also, all of the most likely bug fixes in 2.6.29-rc6 were backported to 2.6.28.8 (and thus would have been in 2.6.28.9).

So we have some contradictory data out there. I'm not sure how to reconcile these reports.

Can those folks who say they aren't seeing a problem with 2.6.29-rc6 try with 2.6.29-rc4 and 2.6.29-rc5, to see if they can trigger the problem there?

yaztromo (tromo) wrote :

I haven't tried with rc5 but I can't trigger the lockup in rc3 or rc4 at all (after much trying too!). Going back to ubuntu 2.6.28 I can still trigger it almost immediately.

http://kernel.ubuntu.com/~kernel-ppa/mainline/ doesn't have any more built kernels lower than rc3 so I'm stuck unless someone can point me to a tutorial on compiling rc1 from source.

Andrius Štikonas (stikonas) wrote :

@36
download tarball from kernel.org
tar xf linux-*.tar.bz2
fakeroot make-kpkg --initrd linux_image

Andrius Štikonas (stikonas) wrote :

@36
I made a mistake in the instructions:
tar xf linux-2.6.29-rc*.tar.bz2
cd linux-2.6.29-rc*
make menuconfig
fakeroot make-kpkg --initrd kernel_image

I am now compiling rc2. Will tell the result in a few hours.

Theodore Ts'o (tytso) wrote :

@yaztromo,

Can you tell me what you do to try to reproduce the problem? As I mentioned, I haven't been able to reproduce it myself, so I've had to rely on other people's bug reports. If there's someone who is familiar with "git bisect", it would be really useful to run "git bisect start v2.6.29 v2.6.28 -- fs/ext4 fs/jbd2", reversing the sense of "git bisect good" and "git bisect bad" (i.e., if you can reproduce it, call it "git bisect good", and if you can't reproduce the soft lockup, call it "git bisect bad"). It would probably require half a dozen builds or so, but at the end it would point us at the patch which apparently fixed the bug. (There are 91 commits involving either the fs/ext4 or fs/jbd2 directories between .28 and .29, and log base 2 of 91 is about 6.5, so it will take approximately 7 git bisect tests to narrow things down to a single commit.)
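The arithmetic above is ordinary binary search: each test halves the remaining range, so 91 candidate commits need about ceil(log2 91) = 7 builds. A Python sketch of the reversed-sense search, assuming the hang reproduces on the old endpoint, does not reproduce on the new one, and stays fixed once fixed (the same monotonicity git bisect relies on); `reproduces` is a hypothetical stand-in for building a kernel and running the test by hand:

```python
import math
from typing import Callable, List, Tuple


def bisect_steps(n_candidates: int) -> int:
    """Approximate number of test builds to single out one of n candidate commits."""
    return math.ceil(math.log2(n_candidates))


def find_fixing_commit(commits: List[str],
                       reproduces: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search for the first commit on which the hang no longer reproduces.

    commits[0] must reproduce the bug (e.g. v2.6.28) and commits[-1] must not
    (e.g. v2.6.29). Returns the fixing commit and the number of tests run.
    """
    lo, hi = 0, len(commits) - 1      # invariant: lo reproduces, hi does not
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if reproduces(commits[mid]):  # "git bisect good" in the reversed sense
            lo = mid
        else:                         # "git bisect bad" in the reversed sense
            hi = mid
    return commits[hi], tests
```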

Again, this is mostly useful so we can tell the Ubuntu kernel devs which patch to backport for the official Ubuntu Jaunty kernel. (Fedora 11 is going to use 2.6.29, so they won't see this issue.) So unless someone can help me reproduce it on my test system (a netbook with 1 GB of RAM and a 5400 rpm drive, running Ubuntu 8.10 with an updated kernel), I will really need someone who can reproduce it and who knows how to drive git and do kernel builds from a git source tree to narrow this down.

  • hang.py (612 bytes, text/x-python)

I've been able to reproduce this consistently on my desktop (2.5gb
ram, amd@1.6ghz singlecore, 7200rpm drive) by writing half a meg to a
couple thousand different files sequentially, dropping the cache,
deleting them, and starting over. Usually the machine hardlocks
partway into the second cycle. Under 2.6.29, the test completes fine
with no intermittent hanging or otherwise. I haven't tried any other
kernels yet.

My laptop (1gb ram, intel@1.6ghz, 5400rpm drive) hangs intermittently
on the same workload, but doesn't hardlock consistently.
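A minimal Python sketch of the workload described above, reconstructed from the description rather than taken from the attached hang.py (whose contents aren't reproduced here); the file count, size and helper names are assumptions. Dropping the page cache requires root, and the sketch should only be pointed at a scratch ext4 directory, since on an affected kernel it may hard-lock the machine:

```python
import os
from pathlib import Path


def drop_page_cache() -> None:
    """Flush dirty data, then ask the kernel to drop clean caches (root only)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def stress_cycle(directory: str, n_files: int = 2000,
                 size: int = 512 * 1024, drop_caches: bool = True) -> int:
    """One write / drop-cache / delete round; returns the number of files deleted."""
    base = Path(directory)
    base.mkdir(parents=True, exist_ok=True)
    payload = os.urandom(size)            # half a megabyte per file by default
    paths = []
    for i in range(n_files):              # write the files sequentially
        p = base / f"file{i:05d}"
        p.write_bytes(payload)
        paths.append(p)
    if drop_caches:
        drop_page_cache()
    for p in paths:                       # then delete them all
        p.unlink()
    return len(paths)
```

For example, `for _ in range(5): stress_cycle("/mnt/scratch/hangtest")` repeats the cycle five times; `/mnt/scratch/hangtest` is a placeholder path.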

Tim Gardner (timg-tpi) on 2009-04-03
Changed in linux (Ubuntu):
assignee: nobody → timg-tpi
importance: Undecided → Medium
status: Confirmed → In Progress
affects: ubuntu-website → ubuntu-release-notes
Steve Langasek (vorlon) on 2009-04-16
Changed in ubuntu-release-notes:
status: New → Fix Released
Carey Underwood (cwillu) on 2009-05-03
description: updated
Changed in linux (Ubuntu Karmic):
status: In Progress → Fix Released
Stefan Bader (smb) on 2009-07-03
Changed in linux (Ubuntu Jaunty):
status: In Progress → Fix Committed
Martin Pitt (pitti) on 2009-07-08
tags: added: verification-needed

I am on .28-14;

I didn't know about the Python script; I'll try it after work.

Franz Dietzmann (tdk-le) wrote :

I just read through all the comments (I hope) and did not find this mentioned, so I thought it might be helpful.

I had the problem for a long time but didn't bother too much. Now it got annoying, and after some searching I installed mainline 2.6.30 to see if it would work.
As mentioned here before, it does, but unfortunately my UMTS didn't work anymore, so I just deleted my Trash and went back to .28.
After logging in, I found I had 10 GB more space on my home partition (the Trash only had ~1 GB in it). The partition is only 40 GB in total, so that's a lot. I checked whether something was missing but didn't find anything, which was strange.

I ran baobab just out of curiosity, and there I found 5 GB in ~/.local/share/Trash/expunged/
On closer inspection, these were all files I had supposedly deleted a long time ago, in deletions that were followed by a freeze. I have no idea how they got there, I'm just a user... but maybe that info can point someone in the right direction.
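To check how much space has piled up in a directory like that without a GUI tool such as baobab, a short Python sketch can total the file sizes; `tree_size` is a hypothetical helper, and it counts the apparent size of regular files rather than allocated blocks:

```python
import os
import stat


def tree_size(path: str) -> int:
    """Total apparent size in bytes of regular files under path (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # a missing path yields nothing
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

For example, `tree_size(os.path.expanduser("~/.local/share/Trash/expunged"))` reports the space held by that directory.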

On Mon, 13 Jul 2009, Franz Dietzmann wrote:

> I just read through all the comments (I hope), and did not find this
> mentioned, so I thought it might be helpful..
>
> I had the problem for a long time, but didn't bother too much. Now it got annoying and after some searching I installed mainline 2.6.30 to see if it would work.
> As has been mentioned here before it does, but unfortunatly my UMTS didn't work anymore, so I just deleted my Trash and went back to .28
> After logging in I found I had 10GB more space on my Home-Partition (the Trash only had ~1GB in it) The partition is only 40 GB total, so that's a lot. I checked if something was missing, but didn't find anything, which was strange.
>
> I ran baobab just out of curiosity and there I found 5GB in ~/.local/share/Trash/expunged/
> On closer inspection these were all files I supposedly deleted a long time ago, when the freeze appeared afterwards. I have no idea how they got there, I'm just a user...but maybe that info can point someone into the right direction.

I'm sure this is just one of the many ways to trigger this ext4 problem; still, it interested me, even if it isn't the cause of the bug.

http://ubuntuforums.org/showthread.php?t=1196171&page=2
Found this thread which seems to be same issue.

It appears that this is related to permissions/ownership, so presumably you deleted read-only files.

I can imagine that might happen if, for example, the files were copied off a CD and had default read-only permissions.

I'm surprised Nautilus doesn't handle this more gracefully.

Franz Dietzmann (tdk-le) wrote :

I highly doubt that it had something to do with permissions, as there were really all kinds of files (audio, video, documents..) from different sources (downloads, self-made..).

I didn't mean this to be a cause of the bug, but rather a result and maybe an indicator to where things might be going wrong.

Yep, I crashed using hang.py :

Linux lantea 2.6.28-14-generic #46-Ubuntu SMP Wed Jul 8 07:21:34 UTC 2009 i686 GNU/Linux

Filesystem Type Size Used Avail Use% Mounted on
/dev/sda5 ext4 90G 76G 9.9G 89% /

Didn't crash with 10GB of 100GB.

/dev/sda5 ext4 90G 82G 4.2G 96% /

Yep, crashed on round 3.

;-(
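The Use% figures above follow from df's formula, used / (used + available) rounded up, which is why 76G used with 9.9G available shows as 89% even though Size is 90G (the gap is most likely the blocks ext4 reserves for root by default). A small Python sketch of the same calculation, assuming Python 3.3+ for `shutil.disk_usage`; the helper names are hypothetical:

```python
import shutil


def df_use_percent(used: int, avail: int) -> int:
    """df-style Use%: used / (used + available), rounded up, in integer math."""
    total = used + avail
    if total == 0:
        return 0
    return (100 * used + total - 1) // total


def use_percent(path: str) -> int:
    """Use% of the filesystem containing path, as df would print it."""
    usage = shutil.disk_usage(path)   # .free is the space available to non-root users
    return df_use_percent(usage.used, usage.free)
```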

Hello,

I'm sorry to say that my system still hard-locks with the new 2.6.28-14
kernel in Jaunty when I rsync my home partition (ext3) with my backup
partition (ext4). It does not matter whether I use 'rsync -av --delete
...' or just 'rsync -av ...'; the latter just takes a little longer
for the freeze to happen. This is on an AMD Athlon 64 3700+ processor.

The weird thing is that I have access to another system with an Intel
quad-core CPU that is fully ext4 but runs without a hitch. I think that
suggests we are really running into a timing/race problem here.

Best regards,
Stephan

Luke Maurer (luke-maurer) wrote :

Huh. My system's also a single-core Athlon 64, and I'm getting it even worse (a single "rm" hangs). Is it possible that this is a race condition that's *more* likely on a single-core box? Seems like we've exhausted every other theory :-)

Jared Heath (jared-heath) wrote :

It happened very frequently on my dual-core x86-based system (I never got more than 5 single rm commands off without a hang before I went to the newer kernel), so it can certainly happen often on multi-core systems.

Your theory on race conditions is interesting though--it certainly exhibits the behavior of a race that goes infinite and does not get caught.

Apologies, this is a qualitative post, but now that people are talking
about different processors, I'll contribute some fluffy info.

I experienced many "freezes" per day on my Core Solo laptop when
doing "dangerous" operations (svn update, rsync, rm, etc.). Then I switched
to a Core 2 Duo, and when doing these same operations I got about the same
number of "freezes", only now they recovered faultlessly (so far...) after
a second or two.
After an upgrade to 2.6.30-020630-generic #020630 from the Ubuntu kernel-ppa
mainline builds (to solve unrelated HP laptop sound issues), I have not
experienced any more "freezes", temporary or otherwise.

c.

2009/7/16 Jared Heath <email address hidden>

> It happened very frequently on my Dual Core i86 based system (never got
> more than 5 single rm commands off without a hang before I went to the
> higher kernel) so it certanly can happen on multi-core systems often.
>
> Your theory on race conditions is interesting though--it certainly
> exhibits the behavior of a race that goes infinite and does not get
> caught.
>
>

Colin Sindle wrote:
> After an upgrade to 2.6.30-020630-generic #020630 from the Ubuntu Kernel-ppa
> mainline, (to solve unrelated HP laptop sound issues), I have not
> experienced any more "freezes" temporary, or otherwise.

I have now manually switched to the 2.6.30-020630 kernel as well, and the
freezes are gone...

Best regards,
Stephan

I'm going to reboot after installing 2.6.30 and the newer (185.18.14) NVidia drivers
(see https://bugs.launchpad.net/ubuntu/+source/nvidia-common/+bug/384639/comments/8
for how to do it with NVidia working ;-) ).

I'm hoping this will solve the crash problem. I would suggest others try the same, since the problem was
apparently solved and NOBODY decided to just backport the damn patches from the more recent kernels... I mean, it's been MONTHS, and I'm not running Linux to have random crashes.

Borph (borph) wrote :

Full acknowledgement!

I installed Kubuntu Jaunty fresh with native ext4 and an external backup drive, also ext4. I actually did this because of a system crash in which I lost my complete partition, so I want to have the backup system working before I proceed! But I was stuck because of this ext4 bug: the system froze very often!

I'm just a user and didn't want to experiment!! I read above that ext4 is not the default fs on Ubuntu; OK, but I really regret choosing it during the graphical installation! Sorry that I didn't read the full release notes; I had no idea it was that experimental!

Anyway, now I'm stuck, as I don't want to re-format my disks, especially not for an issue which doesn't occur in the mainline kernel. So I decided to tweak the system and get kernel 2.6.29 (2.6.30 seems to have other problems...), following:

http://www.ramoonus.nl/2009/03/24/linux-kernel-2629-installation-guide-for-ubuntu-and-debian-linux/

But this doesn't put it in GRUB, so you have to edit your menu.lst and run update-grub and update-initramfs.

Well, no crashes so far, even copying about 30 GB. I actually removed the "nodelalloc" mount option, and it's still stable so far.

I really recommend getting a newer kernel (>= .29), especially because this is just an Ubuntu problem and Ted Ts'o is probably busy fixing more important stuff :) But the Ubuntu guys should indeed provide an automatic update for the _really_ inexperienced people!
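Since the mainline reports in this thread point to 2.6.29 and later, a quick way to see which side of that line a running kernel is on is to parse its release string (`uname -r`, or `platform.release()` in Python). A hypothetical Python helper; note that a version number alone does not prove whether a particular Ubuntu kernel carries the fix:

```python
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, ...]:
    """Numeric prefix of a uname -r string, e.g. '2.6.28-14-generic' -> (2, 6, 28).

    Suffixes such as '-rc6' or '-generic' are ignored, so 2.6.29-rc6 counts as 2.6.29.
    """
    m = re.match(r"(\d+(?:\.\d+)*)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def at_least(release: str, minimum: Tuple[int, ...] = (2, 6, 29)) -> bool:
    """True if the release is numerically >= minimum (tuples compare element-wise)."""
    return kernel_version(release) >= minimum
```

For the running system, `at_least(platform.release())` answers the question.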

JoseStefan (josestefan) wrote :

I've also been using the Karmic kernels on Jaunty (and the new nvidia drivers) as suggested by martinm1000. Unfortunately, it seems to require also updating the graphics drivers, in my case nvidia.

I applied this temporary fix a while back, since this is taking so long to fix. I also vote for a backport as a temporary fix, instead of having inexperienced users jump through hoops. Most of the solutions posted so far seem to mess with your 3D acceleration, requiring either an update to the video drivers or a manual installation. That's another reason why I think a backport would be preferable.

I understand package policy would make it difficult for kernel 2.6.29 or newer to make it into jaunty. But isn't that what "jaunty-backports" is for? Using mainline kernels or getting karmic packages is not exactly a 1 click installation, and in fact could break your system. A backport on the other hand can be enabled using the GUI. And could provide an easier fix for those who need it.

The solution I adopted is very similar to having a backport:
1) Add a pin, by editing /etc/apt/preferences
Package: *
Pin: release a=karmic
Pin-Priority: 50

2) Append karmic to your sources.list:
deb http://us.archive.ubuntu.com/ubuntu/ karmic main restricted

3) Update your repositories.
sudo apt-get update

4) Use apt or synaptic to get the packages you want.
linux-image-2.6.31-3-generic
linux-headers-2.6.31-3-generic
linux-headers-2.6.31-3
nvidia-glx-180
nvidia-kernel-common
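Why a priority of 50 is a cautious choice: according to the apt_preferences(5) manual page, a priority between 0 and 100 means a version is installed only if no version of that package is installed yet, so packages you already have are not pulled up to Karmic, while packages only Karmic provides (such as the 2.6.31 kernel images) can still be installed. The Python sketch below renders the stanza and paraphrases the man page's priority bands; the function names are hypothetical and the man page remains the authority on exact behaviour.

```python
def pin_stanza(release: str, priority: int) -> str:
    """Render an /etc/apt/preferences stanza pinning every package from 'release'."""
    return f"Package: *\nPin: release a={release}\nPin-Priority: {priority}\n"


def describe_pin_priority(priority: int) -> str:
    """Paraphrase of the priority bands in apt_preferences(5)."""
    if priority >= 1000:
        return "installed even if this downgrades the package"
    if priority >= 990:
        return "installed even if not from the target release, unless the installed version is newer"
    if priority >= 500:
        return "installed unless a target-release version is available or the installed version is newer"
    if priority >= 100:
        return "installed unless a version from another distribution is available or the installed version is newer"
    if priority > 0:
        return "installed only if no version of the package is installed yet"
    if priority == 0:
        return "undefined behaviour, avoid"
    return "never installed"


if __name__ == "__main__":
    print(pin_stanza("karmic", 50), end="")
    print("Pin-Priority 50 means:", describe_pin_priority(50))
```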

2009/7/17 JoseStefan <email address hidden>:
> I've also been using the Karmic kernels on Jaunty (and the new nvidia
> drivers) as suggested by martinm1000. Unfortunately, it seems to require
> also updating the graphics drivers, in my case nvidia.

Because I'm using Nvidia too, and had read about some problems, I went
for 2.6.29, and it worked: I have 3D.

> I understand package policy would make it difficult for kernel 2.6.29 or
> newer to make it into jaunty. But isn't that what "jaunty-backports" is
> for? Using mainline kernels or getting karmic packages is not exactly a
> 1 click installation, and in fact could break your system. A backport on
> the other hand can be enabled using the GUI. And could provide an easier
> fix for those who need it.

I didn't even enable jaunty-proposed or jaunty-backports; I just
wanted a normal, failsafe Ubuntu. It took so much time to figure out
that ext4 was actually causing the trouble! There should be an update
even for users who get scared by the warning "if you enable
'proposed' or 'backports', your system may not be stable anymore!".

Your "pin" sounds promising; I will try it. But carefully, since the
system is currently working! :)

Peter

enb (elitenoobboy) wrote :

Updating the kernel fixed this for me. Thanks, JoseStefan, for the easy instructions. I think one of the hardest parts of troubleshooting this is that it only seems to happen on certain hardware configurations. At first I thought it was a hardware glitch of some kind, because it didn't happen on other computers with almost the same software setup.

Wei-Yee Chan (chanweiyee) wrote :

This sounds similar to a problem that I experienced yesterday.

I did a fresh installation of Ubuntu 9.04 recently and formatted every partition as ext4. Yesterday I was moving huge video files from my home directory to a removable USB hard disk (also formatted as ext4) when the system froze permanently (i.e. all hard disks stopped running completely). I tried the same with a couple of my other removable USB hard disks, and the same thing happened many times.

The problem can also be reproduced by copying or moving files within the same IDE hard disk. Just a while ago, the system froze when I emptied the Trash.

The computer also has Windows XP installed, and no such problem occurs when I run it.

However, as far as I know, I have not experienced any data loss.

With reference to a few of the comments above: I always have more than 40 GB free on every partition, so the lockups seem unrelated to the amount of free disk space.

getaceres (getaceres) wrote :

I installed the kernel from jaunty-proposed a few days ago, and since then I haven't had any hangs. My system seems much more stable now.

Keith Moyer (keithmoyer) wrote :

I have the -14 kernel, and I hit this bug again just last night (it actually caused me to lose a fair amount of data).

Are people still looking into this? By most accounts, the "fix committed" doesn't fix the problem.

Borph (borph) wrote :

@getaceres:
Which kernel version are you using exactly?
Mine is 2.6.29-020629-generic, manually installed. I would prefer a system with standard components, but I don't want to risk losing my data again.

Igor Tarasov (tarasov-igor) wrote :

I tried the latest kernel from proposed (2.6.28-15), but I had two lockups, although they may be harder to provoke. So the bug is not fixed; I am back on 2.6.29-02062906.

Xavier Guillot (valeryan-24) wrote :

Since the last updates it works better: I could permanently delete files in Nautilus without crashing.

But once while doing this I got a freeze, and twice more while copying or cut-and-pasting files (around 9 GB); the first time was on a partition with plenty of free space.

Since yesterday, because of this recurring problem (and the risk of losing important data), I have been running Karmic alpha 3...

Changed in linux (Ubuntu Jaunty):
status: Fix Committed → Confirmed
Steve Langasek (vorlon) on 2009-08-17
tags: added: verification-failed
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.28-15.48

---------------
linux (2.6.28-15.48) jaunty-proposed; urgency=low

  [ Andy Whitcroft ]

  * SAUCE: pnp: add PNP resource range checking function
    - LP: #349314
  * SAUCE: i915: enable MCHBAR if needed
    - LP: #349314

  [ Brad Figg ]

  * SAUCE: Add information to recognize Toshiba Satellite Pro M10 Alps
    Touchpad
    - LP: #330885

  [ Colin Ian King ]

  * Input: atkbd - add forced release keys quirk for Samsung Q45
    - LP: #347623

  [ Manoj Iyer ]

  * SAUCE: Added quirk to enable the installer to recognize NetXen NIC.
    - LP: #389603

  [ Stefan Bader ]

  * SAUCE: input: Blacklist digitizers from joydev.c
    - LP: #300143

  [ Tim Gardner ]

  * Revert "SAUCE: md: wait for possible pending deletes after stopping an
    array"
    - LP: #334994

  [ Upstream Kernel Changes ]

  * bonding: Fix updating of speed/duplex changes
    - LP: #371651
  * net: fix sctp breakage
    - LP: #371651
  * ipv6: don't use tw net when accounting for recycled tw
    - LP: #371651
  * ipv6: Plug sk_buff leak in ipv6_rcv (net/ipv6/ip6_input.c)
    - LP: #371651
  * netfilter: nf_conntrack_tcp: fix unaligned memory access in tcp_sack
    - LP: #371651
  * xfrm: spin_lock() should be spin_unlock() in xfrm_state.c
    - LP: #371651
  * bridge: bad error handling when adding invalid ether address
    - LP: #371651
  * bas_gigaset: correctly allocate USB interrupt transfer buffer
    - LP: #371651
  * USB: EHCI: add software retry for transaction errors
    - LP: #371651
  * USB: fix USB_STORAGE_CYPRESS_ATACB
    - LP: #371651
  * USB: usb-storage: increase max_sectors for tape drives
    - LP: #371651
  * USB: gadget: fix rndis regression
    - LP: #371651
  * USB: add quirk to avoid config and interface strings
    - LP: #371651
  * cifs: fix buffer format byte on NT Rename/hardlink
    - LP: #371651
  * b43: fix b43_plcp_get_bitrate_idx_ofdm return type
    - LP: #371651
  * Add a missing unlock_kernel() in raw_open()
    - LP: #371651
  * x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot
    - LP: #371651
  * security/smack: fix oops when setting a size 0 SMACK64 xattr
    - LP: #371651
  * x86, setup: mark %esi as clobbered in E820 BIOS call
    - LP: #371651
  * dock: fix dereference after kfree()
    - LP: #371651
  * mm: define a UNIQUE value for AS_UNEVICTABLE flag
    - LP: #371651
  * mm: do_xip_mapping_read: fix length calculation
    - LP: #371651
  * vfs: skip I_CLEAR state inodes
    - LP: #371651
  * net/netrom: Fix socket locking
    - LP: #371651
  * kprobes: Fix locking imbalance in kretprobes
    - LP: #371651
  * netfilter: {ip, ip6, arp}_tables: fix incorrect loop detection
    - LP: #371651
  * ALSA: hda - add missing comma in ad1884_slave_vols
    - LP: #371651
  * SCSI: libiscsi: fix iscsi pool error path
    - LP: #371651
  * SCSI: libiscsi: fix iscsi pool error path again
    - LP: #371651
  * posixtimers, sched: Fix posix clock monotonicity
    - LP: #371651
  * sched: do not count frozen tasks toward load
    - LP: #371651
  * spi: spi_write_then_read() bugfixes
    - LP: #371651
  * powerpc: Fix data-corrupting bug in __futex_atomic_op
    - LP...


Changed in linux (Ubuntu Jaunty):
status: Confirmed → Fix Released
Steve Langasek (vorlon) wrote :

Verification failed, but the patch doesn't appear to have introduced regressions, so the updated kernel has been published to jaunty-updates. Resetting for the next pass.

Changed in linux (Ubuntu Jaunty):
status: Fix Released → Confirmed
tags: removed: verification-failed
Phil Norbeck (ptn107) wrote :

I can reproduce this every single time when deleting large files from ext3 partitions as well as ext4. I have also noticed that it is easier to reproduce when the working partition is low on free space. In my case, though, each soft lockup instance in the log files has lines in common relating to eCryptfs. My other kernels, 2.6.29.6 and 2.6.30.5, do not have this problem.

Logs attached.

Ubuntu 9.04 x86_64
Linux phil-desktop 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC 2009 x86_64 GNU/Linux
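To check whether your own lockups share this eCryptfs pattern, a rough Python sketch that scans kernel log text for soft lockup reports (the kernel prints them as "BUG: soft lockup - CPU#N stuck for ...") and checks whether the following lines mention ecryptfs. The default path `/var/log/kern.log`, the 30-line window and the function name are assumptions; stack trace layout varies between kernel versions.

```python
import re
import sys
from typing import List, Tuple

# Matches the kernel's soft lockup warning line, case-insensitively.
LOCKUP = re.compile(r"soft lockup", re.IGNORECASE)


def lockups_mentioning(log_text: str, needle: str = "ecryptfs", window: int = 30) -> List[Tuple[int, bool]]:
    """Return (line_number, needle_seen) for every soft lockup report in log_text.

    Up to 'window' lines after each report (an assumed stack-trace length) are
    searched case-insensitively for 'needle', stopping early at the next report.
    """
    lines = log_text.splitlines()
    results: List[Tuple[int, bool]] = []
    for index, line in enumerate(lines):
        if not LOCKUP.search(line):
            continue
        seen = False
        for following in lines[index + 1 : index + 1 + window]:
            if LOCKUP.search(following):
                break  # the next report starts here
            if needle.lower() in following.lower():
                seen = True
                break
        results.append((index + 1, seen))
    return results


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    try:
        with open(path, errors="replace") as handle:
            text = handle.read()
    except OSError as error:
        print(f"could not read {path}: {error}")
    else:
        for line_no, seen in lockups_mentioning(text):
            print(f"line {line_no}: {'eCryptfs in trace' if seen else 'no eCryptfs'}")
```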

I'm running a fully updated Jaunty, and I am still experiencing lockups when deleting large files/directories. Any idea when a fix will be released for Jaunty?

Theodore Ts'o (tytso) wrote :

At this point, it seems pretty clear to me that no one is really working on this for Jaunty; if you must use Jaunty, the only thing I can suggest is to use a mainline kernel --- any mainline kernel, whether it is 2.6.28, 2.6.29, or 2.6.30 will work fine. The problem seems to be in Canonical's backports of patches to the 2.6.28 kernel, and the only people who could work on it are busy working on the Karmic release and/or the Karmic kernel. Those of us (like myself) who are working on the upstream ext4 are busy working on the latest set of improvements and bug fixes that will go into 2.6.31 or 2.6.32.

For those of you who need some proprietary drivers, I'm sorry to say, the only thing you can really do is wait for them to become ported to the Karmic kernel (or port them yourself).

Andrew Berry (andrewberry) wrote :

Is there a list somewhere of notable patches / features which Canonical has integrated into their kernel? I'd like to switch to a mainline kernel to avoid this bug (which is still affecting me), but want to be sure I'm not missing anything critical which Canonical has changed.

papukaija (papukaija) wrote :

Should we close this bug for Jaunty, as no one is working on it (see comment 256)?

Saivann Carignan (oxmosys) wrote :

No, Jaunty is still supported (it's still the latest release) and the bug is still confirmed, so closing it would be inappropriate. It also wouldn't help developers track the bug and work on it later.

tiagolp (tiagolp) wrote :

Mounting the ext4 filesystem with the mount options "sync,barrier=1" seems to solve the problem in my case (2.6.28-15-generic).

Logicwax (logicwax) wrote :

Thanks, tiagolp! I can confirm as well that mounting my native ext4 with the "sync,barrier=1" options in my fstab solves the problem on Jaunty.
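To confirm that options like these were actually applied after a remount or reboot, a minimal Python sketch that reads the options column of `/proc/mounts` for a given mount point. The mount point `/` and the function name are only examples, and the kernel may show options in a normalized form rather than exactly as written in fstab, so a missing entry is a reason to look further, not proof the option was ignored.

```python
from typing import Optional, Set


def mount_options(mounts_text: str, mount_point: str) -> Optional[Set[str]]:
    """Return the option set for mount_point from /proc/mounts-style text, or None.

    Each line is 'device mountpoint fstype options dump pass'; spaces inside
    paths are escaped as \\040, which is undone before comparing.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[1].replace("\\040", " ") == mount_point:
            # Later entries win: the last mount on a path is the visible one.
            found = set(fields[3].split(","))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            opts = mount_options(handle.read(), "/")
    except OSError as error:
        print(f"could not read /proc/mounts: {error}")
    else:
        print("options on /:", sorted(opts or []))
        print("sync present:", bool(opts and "sync" in opts))
```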

Logicwax (logicwax) wrote :

Actually, I'm sorry, I take that back. I was trying to rm -rf over 1.3 TB of data, made up of more than 17,000 subdirectories with a dozen or so files inside each.

I too had complete system lockups when I tried deleting them (moving and copying were fine).

I tried moving the directories in blocks of about 100 to another directory and then deleting those, but I had the same lockups.

The method that tiagolp proposed helped a lot, but didn't completely solve my problem. While I can now delete about 100 directories at a time, I still can't delete the entire 17k-directory tree without a full lockup.
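The batch-by-batch deletion described above can be scripted instead of done by hand. The sketch below is hypothetical: it removes the immediate subdirectories of a top directory in batches (100 by default, matching the manual attempt), calls `os.sync()` after each batch to flush pending writes, and pauses briefly. It only spreads the work out; as reported here, batching did not reliably prevent lockups on the affected 2.6.28-15 kernel, so treat it as a way to reduce load, not a fix.

```python
import os
import shutil
import time
from typing import List


def delete_in_batches(top: str, batch_size: int = 100, pause_s: float = 1.0) -> int:
    """Remove the subdirectories of 'top' batch by batch; return how many were removed.

    Hypothetical workaround sketch: flushes dirty data with os.sync() (Unix only)
    between batches and sleeps pause_s seconds. Plain files directly in 'top' are kept.
    """
    with os.scandir(top) as it:
        entries: List[str] = sorted(
            entry.path for entry in it if entry.is_dir(follow_symlinks=False)
        )
    removed = 0
    for start in range(0, len(entries), batch_size):
        for path in entries[start : start + batch_size]:
            shutil.rmtree(path)
            removed += 1
        os.sync()  # ask the kernel to write out pending data before the next batch
        time.sleep(pause_s)
    return removed
```

For example, `delete_in_batches("/data/old-tree")` (a hypothetical path) would remove that tree's subdirectories 100 at a time.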

For the record, I'm running Jaunty 32-bit, 2.6.28-15-generic, with native ext4 on an LVM volume spanned across two 1.5 TB SATA drives on a Silicon Image SATA PCI card.

Andrew Berry (andrewberry) wrote :

It seems to me that this is fixed in the patches committed from #418197. Can anyone else confirm? I was able to delete around 2.6 million links and files in a single rm -rf, which would previously cause a lockup in a minute or two.
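For anyone who wants to check a kernel before trusting it with real data, a hypothetical Python stress test in the spirit of this check: it fills a scratch directory with many small files plus one hard link each (Andrew doesn't say which kind of links he had), then removes the whole tree in one call and times it. The counts, the `STRESS_DIR` environment variable and the function names are assumptions; scale the counts up considerably for a realistic test. On an affected kernel this may freeze the machine, so use a scratch partition with nothing important on it. The script only measures the deletion; it does not detect a lockup itself.

```python
import os
import shutil
import tempfile
import time


def build_tree(root: str, dirs: int, files_per_dir: int) -> int:
    """Create dirs x files_per_dir 4 KiB files plus one hard link each; return entries made."""
    made = 0
    for d in range(dirs):
        sub = os.path.join(root, f"d{d:05d}")
        os.mkdir(sub)
        for f in range(files_per_dir):
            target = os.path.join(sub, f"f{f:03d}")
            with open(target, "wb") as handle:
                handle.write(b"\0" * 4096)
            os.link(target, target + ".link")  # hard link to the same inode
            made += 2
    return made


def timed_delete(root: str) -> float:
    """Delete the whole tree in one call (like a single rm -rf); return seconds taken."""
    start = time.monotonic()
    shutil.rmtree(root)
    return time.monotonic() - start


if __name__ == "__main__":
    # Assumption: STRESS_DIR points at the ext4 filesystem under test; otherwise the system temp dir is used.
    scratch = tempfile.mkdtemp(dir=os.environ.get("STRESS_DIR"))
    count = build_tree(scratch, dirs=50, files_per_dir=20)
    print(f"created {count} entries, deleted in {timed_delete(scratch):.2f}s")
```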

Andrew Berry (andrewberry) wrote :

Link since comments don't autolink to bug numbers: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/418197

Rene (g.xrc) wrote :

Since I upgraded to
Linux rgm 2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:49:34 UTC 2009 i686 GNU/
there have been no freezes when deleting big files (> 1 GB) together,
and no "BUG: soft locking - CPU#0 stuck for 61s! [uic: 5356]" messages.
Two PCs had the problem, and both are solved!
Previously I had to switch to a mainline kernel (I chose 2.6.30.6).
Thank you.

Changed in linux (Ubuntu Jaunty):
status: Confirmed → Fix Released
ViPeRaY (mail-erayyilmaz) wrote :

It seems the fix has been released, but I am still having this problem. I can copy large files (around 15-20 GB) to an NTFS hard drive without any problem. However, when I try to copy the same files to an internal hard drive that uses ext4, the system freezes. I am using Karmic with kernel 2.6.31-16-generic.

My question is: how do I get the fix? I get automatic updates, but do I have to install the fix manually? And where are the patch files located?

Thanks,

enb (elitenoobboy) wrote :

"However when I try to copy same files to an internal hard drive which uses ext4, the system freezes."

This would be a different bug, as this bug only occurs when removing files.

"My question is, how do I get the fix?"

It looks like the latest karmic kernel release is 2.6.31-17. You might want to try installing that.

If that doesn't work, and assuming that it really is a kernel problem and not caused by something else, you could try the 2.6.32 kernel from Lucid's repository. Since Lucid is still in alpha, though, it might be best to find out first whether the kernel really is the cause.

hoover (uwe-schuerkamp) wrote :

I have experienced a similar bug removing largish video files (about 4GB or so) from an internal SATA drive formatted with an xfs filesystem.

Sometimes when doing an "rm -rf" on a directory on that filesystem, the rm hangs and stays pegged at 100% CPU usage. Unlike other posters in this thread, I don't see any suspicious messages in dmesg about hangs or timeouts, and I'm usually able to "rm -rf" the directory in question from another terminal session without a hang.

The only thing that kills the rm is a reboot; kill -9, Ctrl-C and so on don't work on that process.

Please let me know if you need any further logs. I'm running kernel 2.6.32-27-generic #49-Ubuntu SMP Wed Dec 1 23:52:12 UTC 2010 i686 GNU/Linux on Linux Mint 10, which is based on Maverick 10.10.

reini (rrumberger) wrote :

Since this report is about ext4 and you're having problems with xfs, you really should open a separate report...
