Ubuntu 10.04 with ext4, fast RAM and fast HDs freezes at random

Bug #632346 reported by Christophe Van Reusel
This bug affects 1 person
Affects: e2fsprogs (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: e2fsprogs

Hello, I hope this is the right package to file this bug against. As you can see, I'm currently back on the ext3 file system.

But when I try to install 10.04 with the standard ext4, a freeze sometimes already occurs during the install process, while formatting the HD. Sometimes it works but takes very, very long to format.

Once installed (with ext4), everything seems to be fine, except that the whole system sometimes freezes instantly. It is very random: sometimes the PC works for a week without a freeze, on other days it freezes 3 times a day. I checked all the RAM, updated the BIOS and so on; despite good RAM tests I even tried other RAM. The problem is the same in all cases. The only option after a freeze is a hard reset. There is no log information about what went wrong. But one thing happened after each reset following a freeze. In dmesg I found:

[ 4.868112] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[ 4.868115] EXT4-fs (sda1): write access will be enabled during recovery
[ 6.110195] ieee1394: Host added: ID:BUS[0-00:1023] GUID[002b7aa800001fd0]
[ 7.182816] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 7.182821] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 16649995
[ 7.182868] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 16649877
[ 7.182879] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 15335437
[ 7.182886] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 15335435
[ 7.182891] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 15335434
[ 7.182896] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 15335433
[ 7.182901] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 15335432
[ 7.182905] EXT4-fs (sda1): 7 orphan inodes deleted
[ 7.182907] EXT4-fs (sda1): recovery complete

This was not there after a normal shutdown and restart.
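To compare boots after a freeze with boots after a clean shutdown, the recovery messages above can be picked out of saved dmesg output. The following is a minimal sketch, assuming the dmesg text is available as a string; the function name `unclean_mounts` is made up for this example and the sample lines are copied from the log above.

```python
import re

# Matches kernel lines such as:
# "[ 7.182905] EXT4-fs (sda1): 7 orphan inodes deleted"
SUMMARY_RE = re.compile(r"EXT4-fs \(([^)]+)\): (\d+) orphan inodes? deleted")
# Matches the line that announces journal recovery at mount time.
RECOVERY_RE = re.compile(r"EXT4-fs \(([^)]+)\): INFO: recovery required")


def unclean_mounts(dmesg_text):
    """Return {device: orphan_count} for devices that needed recovery."""
    result = {}
    for line in dmesg_text.splitlines():
        match = RECOVERY_RE.search(line)
        if match:
            # Recovery was needed; the orphan count may follow later.
            result.setdefault(match.group(1), 0)
        match = SUMMARY_RE.search(line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


sample = """\
[ 4.868112] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[ 7.182905] EXT4-fs (sda1): 7 orphan inodes deleted
[ 7.182907] EXT4-fs (sda1): recovery complete
"""
print(unclean_mounts(sample))  # {'sda1': 7}
```

An empty result for a given boot would match the observation that these messages are absent after a normal shutdown.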

If I lower the RAM speed in the BIOS, freezes are less frequent but still occur, especially during large data transfers.

When formatting with ext3 everything goes fine and a freeze never occurs. Looking around on the web I found similar problems on the Ubuntu forums. The common factor seems to be that they all occur when using fast RAM and fast HDs with the ext4 file system.

For me, the WD Raptor HDs seem to be affected by this problem.

It looks like there is a runtime problem somewhere when writing to and/or reading from ext4-formatted drives, which only occurs when using fast RAM in combination with fast HDs.

I even think it can be solved by changing a small rule somewhere in the ext4 source. But due to the small amount of info available after a freeze occurs, it will be very difficult to find.

The only thing I can deduce is that at each freeze some garbage was written to the ext4 HD.

If other people are affected by this bug, I suggest that you use ext3 for the moment.

Hoping that there is enough info here (I'm back to ext3 now).
gr christophe

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: e2fsprogs 1.41.11-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.32-24.42-generic 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Tue Sep 7 13:12:15 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100429)
ProcEnviron:
 LANG=nl_BE.utf8
 SHELL=/bin/bash
SourcePackage: e2fsprogs

Christophe Van Reusel (christophevr) wrote :

Oh yes, here is some extra info about the hardware used.

Gigabyte EP45 Extreme MB

Processor: Intel Core Duo 3333 MHz (not overclocked, but running at full speed)
RAM: 2 x 2048 MB DDR2-400 OCZ2RPR10662G / dual channel
CAS 5-5-5-15
RAM at max speed now runs perfectly under ext3

RAM test error-free even after 3 passes.

HD: 2 x WD 320 GB Raptors
HD: 1 x Seagate 1 TB

Just because I think this info may be useful.

gr. christophe

Theodore Ts'o (tytso) wrote :

If you're getting a freeze while you're formatting the disk, since mke2fs is a userspace program and the ext4 kernel code isn't running at all, and you are also sometimes getting a freeze when writing to the file system, the code which is in common in those two cases is the block device layer and the device driver itself. (And perhaps the disk controller hardware, I suppose.)

What would be really useful in debugging this would be to determine if there is any kind of crash dump before the system freezes. Unfortunately, if you have X running, the crash dump might not be visible, unless you can set up a serial console. Or if you could make sure you are in the VT console while trying to format the disk. At that point, even if you don't get a crash dump, we can use the magic sysrq key (alt-sysrq if you are using a VT console, or break if you are using a serial console) to get some information about the frozen system. See the Wikipedia article on "magic sysrq key" if you're not familiar with this option.

When debugging system freezes, I find the following most helpful: sysrq-p multiple times, recording the IP (instruction pointer, aka program counter on other systems) to see where the kernel is executing; sysrq-w, to see the list of blocked tasks; and sysrq-d, to see all currently taken locks. If it looks like the system is swapping its brains out, or is behaving as if it's running into memory problems, sysrq-m is also useful. I doubt this is happening in your case, though.
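As a rough illustration of the sysrq steps described above, here is a minimal Python sketch that sends the same commands through /proc/sysrq-trigger instead of the keyboard. It is only useful for rehearsing the procedure on a system that still responds (on a hard freeze no script will run), and it needs root. The helper names `debug_sequence` and `trigger_sysrq` are made up for this example. As far as I know, sysrq-d only produces output on kernels built with lock debugging.

```python
# Command letters of the Linux magic SysRq interface mentioned above.
SYSRQ_KEYS = {
    "p": "show current registers, including the instruction pointer",
    "w": "show blocked (uninterruptible) tasks",
    "d": "show all locks currently held",
    "m": "show memory information",
}


def debug_sequence(p_samples=3, include_memory=False):
    """Build the list of keys: several 'p' samples, then 'w' and 'd'."""
    keys = ["p"] * p_samples + ["w", "d"]
    if include_memory:
        keys.append("m")
    return keys


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write one sysrq command letter to the trigger file (needs root)."""
    if key not in SYSRQ_KEYS:
        raise ValueError("unsupported sysrq key: %r" % key)
    with open(trigger_path, "w") as trigger:
        trigger.write(key)


if __name__ == "__main__":
    # Dry run: only print what would be sent; nothing is written here.
    for key in debug_sequence():
        print(key, "->", SYSRQ_KEYS[key])
```

The output of each command ends up in the kernel log, so it is best read from a serial console or from dmesg afterwards.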

Note that some of these sysrq messages will print a *lot* of information, far more than can be held in a VT screen, or even the VT scrollback buffer. So setting up a serial console is also a good idea. There are a number of resources on the web which will give you instructions on making this work. Note that you need to modify the kernel boot command line to specify the serial console, since what's important for this case is not logging into the system via the serial console (which is what configuring inittab and/or getty will do) but rather letting kernel messages be sent to the serial console (which requires setting up the serial console via the kernel command line at boot time).
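A minimal sketch of the check implied above: whether the kernel command line (as shown in /proc/cmdline on a running system) actually contains a serial `console=` entry. The sample command line, including the `ttyS0,115200n8` value, is an assumed example and not taken from this report; the function name `serial_consoles` is made up for illustration.

```python
def serial_consoles(cmdline):
    """Return the values of all console= parameters that name a serial port."""
    consoles = []
    for token in cmdline.split():
        if token.startswith("console="):
            value = token[len("console="):]
            # ttyS0, ttyS1, ... are the usual names of the serial ports.
            if value.startswith("ttyS"):
                consoles.append(value)
    return consoles


# Hypothetical command line; on a live system read /proc/cmdline instead.
sample_cmdline = "root=/dev/sda1 ro console=tty0 console=ttyS0,115200n8"
print(serial_consoles(sample_cmdline))  # ['ttyS0,115200n8']
```

An empty list means kernel messages are not being sent to a serial port, whatever inittab or getty is configured to do.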

Hope this helps you gather the information you need....

Christophe Van Reusel (christophevr) wrote :

Hello, thanks for the reply. But wow, it's all Chinese to me. However, I'm going to look things up and find out what to study to get a bit of a clue about what you told me. :-)

The only thing to maybe add here:

The freeze during formatting actually happened when installing Ubuntu from the Ubuntu install CD (when using ext4). So yes, I used X, and indeed, during the formatting process the ext4 kernel code wasn't running.

After resetting, booting again from the CD and restarting the installation, I saw that the HD on which I was installing Ubuntu was indeed formatted as ext4. So I think the formatting process went fine, but the freeze occurred shortly after the formatting, when the installer started to write to the freshly formatted drive. But X still displayed the formatting step.

As far as I understand it, I'll have the guts to say that the problems arise when the ext4 kernel code starts to run and be used?
Somehow fast RAM and fast HDs are triggering a runtime error when using ext4. Of course it may also be related to specific hardware combinations.

I also noticed on several forums that this problem is not unique. Of course I had to rule out all freezes related to the use of X programs. But after sorting those out, the only common factors are:
fast HD, fast RAM, ext4. Mostly in server configurations and RAID drives (the reason why they use fast drives).
In all these situations everything was working fine using ext3, so the problems came only when using ext4.
And yes, all very random, not frequent, unpredictable, and without any debug info. So even if I used a serial console, I don't think I would get any more info, since the freeze is total. I even waited more than 2 hours with a frozen PC. The screen stayed on, showing the last opened application. The date and time stayed exactly at the moment it froze. Removing power from the screen and restoring it showed exactly the same screen. Pushing all the keys on the keyboard did not even produce an error beep from the PC speaker.

I've now been busy with this problem for about 2 months, switching several times to ext4 and back to ext3 and so on, just because I tried to get as much info as possible. But my PC is needed for a lot of other stuff.

So I hope that other people having similar issues with ext4 will give their info about it as well, because I think that will be the fastest way to track down the ext4 kernel problem or bug.

I'm almost sure it is in the ext4 kernel code itself (but I would not bet my head on it).

I filed it against the Ubuntu e2fsprogs package because I did not know exactly which package to use for a general ext4 problem.

But it would be nice if someone who has more of a clue about it than me moves it to the right place.

gr christophe

Theodore Ts'o (tytso) wrote :

>The freeze during formating was actually when installing ubuntu from the ubuntu install cd. (when using ext 4) . So yes I used X and indeed No during the formatting process the ext4 kernel code wasn't running.

I'm having a bit of a problem parsing this, but I think what you are saying is that the freeze didn't actually happen when the file system was really being __formatted__ (specifically, when mke2fs is running), but rather as it was unpacking packages and copying files into the file system. Is that correct? If correct, then "formatting" really is extremely misleading. Can you tell me exactly what part of the install process was happening when the freeze happened?

If in fact it was while the install process was copying files into the file system, then let me ask another question: are you willing to try using a newer, beta-test kernel? For example, can you try using the Ubuntu-lts-2.6.35-20.29 kernel? I believe the Ubuntu Kernel team has various newer kernels available here: https://launchpad.net/~kernel-ppa/+archive/ppa

There are indeed a bunch of bug fixes which aren't necessarily included in the 2.6.32-based kernel used by Ubuntu Lucid. So it might be worthwhile seeing whether it is fixed in 2.6.35.... Note that a bleeding edge kernel may not work well if you are using proprietary kernel modules. So if you are using an Nvidia graphics board, for example, you will need to update to a newer nvidia-current package (256.53), which will in turn require some new prerequisites. But if you are using open hardware, then it should be easy to try using a newer kernel.

Christophe Van Reusel (christophevr) wrote :

Hello, yes, I can't say where it hung during the installation, because the PC froze completely with the X screen on the formatting step. But yes, this is what I meant: the freeze didn't happen during the formatting process itself, but when it was just done and the installer loaded the ext4 code and started using it to write the kernel and so on to the HD.

Now why do I think that? To recover from the freeze I of course performed a hard reset and restarted the installation from scratch. On the menu where you choose the disk, I took a closer look and saw that the HD, in my case sda1, was indeed formatted as ext4. Had it crashed during the formatting, it should have shown up as an HD with unknown formatting.

But here again, it has nothing to do with the installer or the formatting program itself, since I always finally succeeded in getting a successful installation of Ubuntu 10.04. But once it was installed, I suffered from the random freezes on the ext4 file system: unpredictable and not very frequent.

When I install 10.04 but choose ext3 during the format step (of course with / as the mount point, and yes, I reserve between 4200 MB and 5000 MB for a swap partition, just a little more than my RAM capacity, which I format as an extended Linux swap partition), it goes like a train: +- 15 min for a full install from scratch, which is fast. Of course afterwards I need to update and so on, that's logical.

I'm willing to use another kernel, but it needs to have new ext4 kernel drivers as well, as I guess that is where the problem is.

And the whole PC is free hardware, except my graphics card, which is indeed NVIDIA. But this will not be a big problem, as it runs fine without the proprietary NVIDIA kernel modules (of course overlay, 2D, 3D and all that stuff will not work then, but the basic functions will). Afterwards I can still compile the NVIDIA proprietary kernel module, at least I think so? According to the info for Lucid, the current stable kernel does not support the basic compile programs provided by NVIDIA itself, so the only way to install them is through the non-free driver installation program from Ubuntu itself. For me they are currently: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 195.36.24

What I can try is basic Maverick. If I'm not wrong, it is at the beta release? If the ext4 core drivers are updated there, it's very useful and trying it will not be a waste of time.

To make some things clear (yes sorry, my English isn't good at all and my writing is even a disaster):

The problem is NOT an installation and NOT a formatting issue.
   But installation can be affected by this problem.
The problem occurs only on ext4-formatted HDs with fast RAM and fast HDs,
  so Ubuntu on ext3 is NOT affected.

The effects are random PC freezes, NOT frequent, absolutely unpredictable, varying from once a week to sometimes 3 times in a day.

Due to the sudden instant freeze, no debug or error report is made, as the PC fully locks up.
After restart (hard reset) THERE is ALWAYS a REPORT from the HD recovery about some wrong inodes etc.

Very little, I know, but these are facts.

Christophe Van Reusel (christophevr) wrote :

Hello,

Surfing the web, I found a lot of other problems concerning ext4. Somewhere I found that it is indeed Linux kernel related.
According to kernel.org, the problems are in the 2.6.31 and 2.6.32 kernels and should be gone as of 2.6.33, or for some hardware 2.6.34.

I still had one of my Raptors with a Windows installation which I actually never used, since the very few things I still need to do in Windows are all covered through a VMware XP installation. So I just reformatted it.

On that drive I installed the Ubuntu Maverick beta release, which is now running on 2.6.35-20.
Installation (ext4) went like a train: +- 12 minutes for a basic install. (A very small hiccup: the PC did not want to shut down after the install procedure, but it was not a freeze; after all it's still a beta release.)

So far it seems to run very well.

Of course the freezes before were random, from sometimes 3 a day to sometimes a week without a freeze.

My first impression is that the bug seems to be solved with this kernel and Maverick. However, I will monitor it closely, and I still have to reinstall data and configure my local servers.

If there is a problem or it freezes again, I'll let it be known here.

If it runs perfectly for a good week, I will let it be known as well, as then I guess this bug may be closed with the note:

If you run a 2.6.32 xxx kernel and encounter freezes with ext4, use ext3 or upgrade to kernel 2.6.35.

greetings christophe

Christophe Van Reusel (christophevr) wrote :

Sorry, freezes occur again :-(

even with Maverick 2.6.35-20

Again, the only report I found after startup is:

[ 4.875717] EXT4-fs (sdb1): INFO: recovery required on readonly filesystem
[ 4.875720] EXT4-fs (sdb1): write access will be enabled during recovery
[ 5.472499] EXT4-fs (sdb1): orphan cleanup on readonly fs
[ 5.472505] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 3146105
[ 5.472542] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 3146023
[ 5.472588] EXT4-fs (sdb1): 2 orphan inodes deleted
[ 5.472590] EXT4-fs (sdb1): recovery complete
[ 5.934470] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)

This time it was during a copy of a large amount of data from one HD to another using the Nautilus clipboard.

The total amount of data was 100 GB; the freeze occurred after 22 GB was done.

So the ext4 problem with some hardware is still present.

Christophe Van Reusel (christophevr) wrote :

This time better news :-) I think I found a way to make ext4 work freeze-free.

Somewhere I found that there are sometimes data losses using ext4 which can lead to an Xlock.

Adding nodelalloc to fstab solved the problem.
So I replaced my fstab line

UUID=4b636bec-6ab8-46e2-97a7-9ad8616520b3 / ext4 errors=remount-ro 0 1

with

UUID=4b636bec-6ab8-46e2-97a7-9ad8616520b3 / ext4 rw,relatime,errors=remount-ro,nodelalloc 0 1

So I added rw,relatime,nodelalloc.

Performance when the freezes occurred, before I modified the fstab line: 98 MB/s
Performance after I modified the fstab line: 89 MB/s

So I lost a bit of performance, but I have already copied over 300 GB and erased it, and no freeze occurred anymore.
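The fstab change above amounts to appending an option to the fourth field of the line. Here is a minimal sketch of that edit plus the throughput comparison, assuming a standard six-field fstab line; the function names `add_mount_option` and `percent_drop` are made up for this example, and the sample line and the MB/s figures are the ones quoted above.

```python
def add_mount_option(fstab_line, option):
    """Return the fstab line with `option` appended to the options field."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("not a complete fstab entry")
    options = fields[3].split(",")
    if option not in options:
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


def percent_drop(before, after):
    """Relative throughput loss in percent, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


original = ("UUID=4b636bec-6ab8-46e2-97a7-9ad8616520b3 / ext4 "
            "errors=remount-ro 0 1")
print(add_mount_option(original, "nodelalloc"))
print(percent_drop(98, 89))  # 9.2
```

With the numbers reported here, turning off delayed allocation cost roughly 9% of the copy throughput.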

gr christophe

Christophe Van Reusel (christophevr) wrote :

Hello,

Yes sorry, here I am again. Unfortunately, there are again freezes sometimes.

But this time I think I found ONE possible reason, I even think it's THE reason. But I will need some help.

The PC BIOS is set up to use AHCI on the SATA bus (native mode enabled as well).

But Ubuntu always uses pata_it8213 for my HDs.

Here are the SCSI drivers loaded, obtained with dmesg:

[ 0.542595] scsi0 : pata_it8213
[ 0.547194] Floppy drive(s): fd0 is 1.44M
[ 0.549205] scsi1 : pata_it8213
[ 0.549233] ata1: PATA max UDMA/66 cmd 0xd000 ctl 0xd100 bmdma 0xd400 irq 19
[ 0.549235] ata2: DUMMY
[ 0.549393] firewire_ohci 0000:04:06.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 0.563717] FDC 0 is a post-1991 82077
[ 0.601297] firewire_ohci: Added fw-ohci device 0000:04:06.0, OHCI v1.10, 4 IR + 8 IT contexts, quirks 0x2
[ 0.610082] scsi2 : ahci
[ 0.610144] scsi3 : ahci
[ 0.610188] scsi4 : ahci
[ 0.610221] scsi5 : ahci
[ 0.610266] scsi6 : ahci
[ 0.610313] scsi7 : ahci

When I don't use AHCI in the BIOS at startup, no freezing occurs anymore.

But sorry, the system, processor and RAM are built and assembled to use AHCI, which gives a very considerable increase in performance.

As said before, I'm almost sure that the freezing problems arise from a runtime fault somewhere.

The fact that the BIOS is set up for AHCI but Ubuntu finally loads PATA drivers instead of the correct ones for my HDs is not normal at all. I'm not a programmer, but in my opinion this can do nothing else than cause a bug somewhere in the end.

So here I need, yes sorry, HELP, as I have now been searching through all the how-tos for almost 8 hours in a row to try forcing the AHCI driver for my HDs instead of pata_it8213. Whatever I do, Ubuntu still loads pata_it8213, even if it's blacklisted.

If somebody knows how to do that, it would be nice. I'm sure this could at the same time rule out a lot of problems which seem to be caused by the use of ext4, while it's not ext4 but rather a conflicting driver issue. But due to the hard freeze (it is not only an X freeze but a real full computer lockup) no debug information about it is even possible.

So the HELP question: HOW TO FORCE a driver other than pata_it8213 on Ubuntu???

Christophe Van Reusel (christophevr) wrote :

Oh shit, wrong again: those PATA drivers seem to be fine, as scsi0 and scsi1 are my two DVD readers/writers.

So then my only question is: why does this happen?

This dmesg message:
[ 3.830357] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

When using AHCI, DPO and FUA should be enabled.

I just attached an attachement.txt with my dmesg from a single startup.
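The correction above can be cross-checked from dmesg itself: the first number in an `sd H:C:T:L` line is the SCSI host, and the `scsiN : driver` lines say which driver owns each host. The following is a minimal sketch of that lookup; the function names are made up for this example, and the sample lines are copied from the dmesg excerpts in this report.

```python
import re

# "scsi6 : ahci" -> host 6 is driven by ahci
HOST_RE = re.compile(r"\bscsi(\d+) : (\S+)")
# "sd 6:0:0:0: [sdc] ..." -> disk sdc sits on host 6
DISK_RE = re.compile(r"\bsd (\d+):\d+:\d+:\d+: \[(\w+)\]")


def host_drivers(dmesg_text):
    """Map SCSI host number to the driver name reported for it."""
    return {int(m.group(1)): m.group(2) for m in HOST_RE.finditer(dmesg_text)}


def disk_drivers(dmesg_text):
    """Map each sd disk name to the driver of the host it is attached to."""
    hosts = host_drivers(dmesg_text)
    disks = {}
    for m in DISK_RE.finditer(dmesg_text):
        disks[m.group(2)] = hosts.get(int(m.group(1)), "unknown")
    return disks


sample = """\
[ 0.542595] scsi0 : pata_it8213
[ 0.549205] scsi1 : pata_it8213
[ 0.610266] scsi6 : ahci
[ 3.830357] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
"""
print(disk_drivers(sample))  # {'sdc': 'ahci'}
```

For the lines quoted in this report, sdc is on host 6, which is an ahci host, consistent with the PATA hosts belonging to the DVD drives only.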

Christophe Van Reusel (christophevr) wrote :

Hello, of course I finally did not give up on ext4. I think I found the problem.

First, I now think the problem is not located in ext4 itself, but is triggered by it, as ext4 is built for and has very good read/write performance. As a matter of fact, it really uses the maximum capability of the hardware.

I did of course update my BIOS to the latest stable version (F8 BIOS version, Gigabyte EP45-Extreme MB).
But yes sorry, I was stupid enough to forget one very important thing after the BIOS upgrade, which is to reload the optimized BIOS defaults, which causes the BIOS setup to re-poll all your devices. After checking and thinking about what the hell had happened, I came upon that. I did that and then just changed some of my own settings again, like disabling USB legacy support and setting AHCI active in native mode.

Now the PC seems to run fine. Not a single freeze has happened, with a marvelous read/write performance of +- 93 MB/s for big files, 78 MB/s for a mix with smaller ones, and still 76 MB/s for a large number of small files.

I'm running kernel 2.6.35-21 (Maverick). In the meantime I had to customize my kernel to add OSS support, and the average kernel compile time is between 45 and 60 min, which seems to be very fast.

So one very important question: is it possible that this was the cause?

If so, it would be nice if a wiki page were made about this problem (and my own stupidity with the BIOS upgrade), so that other users having the same issues will not lose too much time and disturb the ext4 maintainers and developers with bugs which in the end are not really caused by ext4 itself.

Of course, please mark this bug as solved and closed as well.

Thanks

gr christophe
