Bug #346691 “2.6.28-11 causes massive data corruption on 64 bit ...” : Bugs : linux package : Ubuntu

Graziano (graziano-giuliani-gmail) wrote on 2009-03-22:

#1

portatile_lshw.txt (19.1 KiB, text/plain)

BoomSie (gideon-poort) wrote on 2009-03-23:

#2

lshw.txt (19.6 KiB, text/plain)

Same issue over here. On Saturday I decided to give Jaunty a try, after a colleague of mine warned me about the issues with ext4.

I made sure that the dist-upgrade would neither touch the filesystem NOR update it. Nevertheless:

* First boot OK, though nautilus & applets in gnome crashed a few times
* Reboot and everything was f*cked
* /sbin/init* was gone
* fsck'd both home & root partitions to recover
* last night it was up and running again, so I figured I could work 'normally' on my laptop again today
* this morning I booted and logged into gnome -> CRASH. Apparently some configuration issues; it turned out that my root filesystem was mounted read-only. So again, fsck and a shitload of files in lost+found now

I have a Compal as well btw. A JFL92 (see attachment)

Hope you guys figure out what's going on before the official release next month.

Cheers & keep up the good work

Graziano (graziano-giuliani-gmail) wrote on 2009-03-24:

#3

Something to do with this?

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-jaunty.git;a=commit;h=b29e79bf557ce777878518da154f4a0becb1de0e

Leann Ogasawara (leannogasawara) on 2009-03-24

Changed in linux (Ubuntu):
importance:	Undecided → High
status:	New → Triaged

BoomSie (gideon-poort) wrote on 2009-03-24:

#4

Yes Graziano, you hit the nail on the head I'm afraid.

I'm not really familiar with how those patches work or when to expect this update to be available in the Alpha/Beta repos, so I'll watch from a distance for the coming week(s).

(Unless someone can guarantee me that the fix is already there; then I'm MORE than happy to do a clean alpha 6 install to fiddle around some more with this stunning new release)

~..~
(oo) <<< MOOh

Graziano (graziano-giuliani-gmail) wrote on 2009-03-25:

#5

Well, after some days of work on the machine, I can say that at least jaunty alpha6 is stable (with all updates installed) IF I keep the 2.6.28-9 kernel running (I have modified the default target in grub menu.lst, just to be sure).
As of today, on the same system, starting with 2.6.28-11 leads to filesystem faults. And without a system rescue CD with filesystem repair tools at hand (and some luck regarding where the faults happened), the system becomes unusable.
Note that on a desktop system at home (AsRock eSata motherboard, E6400 Core processor) the 2.6.28-11 kernel does not have these problems (at least not so visibly: tracker sometimes segfaults, but the system does not crash and the filesystem is preserved).
It seems to be something triggered by the hardware, but latent in the code path.

I am completely with BoomSie: the official release cannot live with these problems around.
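
For anyone wanting to pin the older kernel the same way, here is a minimal sketch of how the right `default` number could be worked out. It relies on GRUB legacy counting `title` entries in menu.lst from 0. The sample menu text and the helper name `default_index_for` are made up for illustration, so check the result against the real /boot/grub/menu.lst before editing anything.

```python
def default_index_for(menu_lst_text, kernel_abi):
    """Return the 0-based index of the first `title` entry naming kernel_abi.

    GRUB legacy's `default N` selects the N-th `title` entry, counted from 0.
    Returns None when no entry matches.
    """
    index = -1
    for line in menu_lst_text.splitlines():
        stripped = line.strip()
        # Commented-out entries start with '#', so they are skipped here.
        if not stripped.startswith(("title ", "title\t")):
            continue
        index += 1
        if kernel_abi in stripped:
            return index
    return None


# Hypothetical menu.lst excerpt, not copied from a real system.
SAMPLE_MENU_LST = """\
default 0
title Ubuntu 9.04, kernel 2.6.28-11-generic
title Ubuntu 9.04, kernel 2.6.28-11-generic (recovery mode)
title Ubuntu 9.04, kernel 2.6.28-9-generic
title Ubuntu 9.04, kernel 2.6.28-9-generic (recovery mode)
title Ubuntu 9.04, memtest86+
"""

if __name__ == "__main__":
    print(default_index_for(SAMPLE_MENU_LST, "2.6.28-9-"))
```

The trailing dash in "2.6.28-9-" is deliberate: it keeps the search from matching an unrelated version that merely starts with the same digits.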

BoomSie (gideon-poort) wrote on 2009-03-28:

#6

Bug remains in the Beta release of Jaunty too. How can one tell whether this bug is 'fixed' in a release or not? (without rereading the entire changelog)
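
One rough way to check this without reading the changelog by eye: Ubuntu package changelogs normally reference the Launchpad bugs an upload closes as "LP: #NNNNNN", so a small script can scan for the bug number. The sketch below is only that, a sketch: the function name and the sample changelog (package "example-kernel", its versions and entries) are invented for illustration, and a missing reference does not prove the bug is unfixed.

```python
import re


def versions_closing_bug(changelog_text, bug_id):
    """Return the versions whose changelog entry references LP bug `bug_id`."""
    # Entry headers look like: "package (version) series; urgency=..."
    header = re.compile(r"^(\S+) \(([^)]+)\) ")
    bug_ref = re.compile(r"#%d\b" % bug_id)
    hits = []
    current = None
    for line in changelog_text.splitlines():
        match = header.match(line)
        if match:
            current = match.group(2)
        elif current and "LP:" in line and bug_ref.search(line):
            if current not in hits:
                hits.append(current)
    return hits


# Hypothetical changelog text, invented for this example.
SAMPLE_CHANGELOG = """\
example-kernel (1.0-2) jaunty; urgency=low

  * Fix data corruption on some laptops
    - LP: #346691

 -- Some Maintainer <maintainer@example.com>  Mon, 01 Jan 2001 00:00:00 +0000

example-kernel (1.0-1) jaunty; urgency=low

  * Unrelated change
    - LP: #111111

 -- Some Maintainer <maintainer@example.com>  Mon, 01 Jan 2001 00:00:00 +0000
"""

if __name__ == "__main__":
    print(versions_closing_bug(SAMPLE_CHANGELOG, 346691))
```

On an installed system the text to feed in would typically come from the package's changelog.Debian.gz under /usr/share/doc, which is an assumption about where the file lives rather than something stated in this report.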

Manoj Iyer (manjo) wrote on 2009-04-03:

#7

The patch mentioned above is already in the jaunty Kernel.

god-mok (god-mok) wrote on 2009-04-03:

#8

I got the same problem on my Clevo M57RU. As long as I boot the older 2.6.28-9 kernel everything is fine.

An hour ago I tried the 28-11 kernel and it worked for a few minutes. But when I tried to search with synaptic, it closed. In the terminal I only got a core dump. After that I couldn't do anything with apt-get or some other programs. My wlan and all other network connections were dead.

When I left it alone and looked for logs, the contents of my windows seemed to disappear. Under nautilus the borders were there, but the content, menus and buttons were gone. After I recovered everything I tried the same under KDE. The same happened to the system, but there the window contents didn't disappear.

The window contents didn't disappear every time I tried it. My system often froze before that point.

After the freeze the system can't mount my user partition or get any network to work. The log files didn't show anything strange, until I saw that the log entries were all from before the freeze. When I tried to run dpkg it told me that it could not write to the /var/cache/apt folder. I looked there, and everything was fine.

When I started gdm it seemed that X couldn't read the xorg.conf. So I looked into that, but nothing was wrong with it. I tried to run apt-get update as a test and it told me that the system seems to be in a read-only state.

No wonder that after the freeze all the logs were untouched.
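
To confirm a read-only remount like the one described above, the mount options in /proc/mounts can be inspected. This is a minimal sketch assuming the usual whitespace-separated layout (device, mount point, filesystem type, options, ...). The function name and the sample lines are made up for illustration.

```python
def read_only_mounts(proc_mounts_text):
    """Return (mount_point, fs_type) for every mount whose options include 'ro'."""
    result = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Compare whole options so 'errors=remount-ro' is not mistaken for 'ro'.
        if "ro" in options.split(","):
            result.append((mount_point, fs_type))
    return result


# Hypothetical /proc/mounts content, invented for this example.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 ro,errors=remount-ro 0 0
/dev/sda2 /home ext3 rw,errors=remount-ro 0 0
proc /proc proc rw,noexec,nosuid,nodev 0 0
"""

if __name__ == "__main__":
    for mount_point, fs_type in read_only_mounts(SAMPLE_MOUNTS):
        print(mount_point, fs_type)
```

On a live system the input would be the contents of /proc/mounts. A root filesystem showing up in the result would match the symptoms here: dpkg and apt-get unable to write, and logs that stop at the moment of the freeze.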

After that I tried to boot 28-9 but no luck. There the devil spread out his wings. Hope someone will look into that.

Oh yeah: after a fresh reinstall with the beta iso I got the same problem from the beginning without having done anything; I only looked at files with gedit and nautilus. Even stranger was that I could not see any driver under hardware-drivers, but it worked on the usb-stick and the livecd.

And sorry for my bad English :)

Manoj Iyer (manjo) on 2009-04-03

summary:

- jaunty kernel 2.6.28-11 kernel destroy system
+ jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

Graziano (graziano-giuliani-gmail) wrote on 2009-04-03: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#9

I suspected the patch mentioned above of causing a lot of trouble, but the problem NEEDS to be related to the HW configuration as well.

I tried again today with 2.6.28-11 and got EXACTLY the same behaviour as god-mok. I was really lucky to recover the filesystem and get back here. AGAIN: No problem with 2.6.28-9. The system gets unusable (actually the whole filesystem can be lost, and where You call it unusable, I prefer destroyed, as I had to recover from backups multiple times) after using 2.6.28-11 for more than a few minutes.

I repeat: It seems to be ALSO related to hardware, as I have tried the same scheme on multiple machines (fresh install with 2.6.28-9, then update to 2.6.28-11) and I had problems ONLY on my notebook.

Please god-mok, would You post the output of lshw, as I have not found complete specs of Your Clevo?
Looking at a hardware summary, it seems really similar to the COMPAL setup:

• 17" WSXGA active matrix glare TFT (1680*1050)
• incl. 1.3 MPix webcam
• nVIDIA GeForce 8800M GTX 512MB
• Core2Duo T9300 / 2.5 GHz 6MB/800 MHz
• 4096 MB (2x2048) SO-DIMM DDR2 800MHz
• 200 GB / 7200 rpm S-ATA
• DVD±R/±RW DL (Dual Layer) 8x/8x multi-standard burner
• Intel Wireless WiFi Link 4965AGN
• int. Bluetooth module

But I am really curious about the chipset used. Let us sort this trouble out!

Manoj Iyer (manjo) on 2009-04-03

Changed in linux (Ubuntu):
assignee:	nobody → manjo

god-mok (god-mok) wrote on 2009-04-03:

#10

lshw.txt (50.1 KiB, text/plain)

Sorry, totally forgot. Here is my lshw file.

At the moment I have no battery attached, but that doesn't matter, 'cause it was the same with it.
And yeah, our hardware seems very similar.
The only difference is that I do not have the Bluetooth module. Everything else is very similar to my hardware.

Oh yeah, and another thing: I tried to recover files with testdisk and some other programs, but most of it failed. I found some pics and some movie files (totally fragmented), so it was a total waste of time.

But before that, I checked the partitions like everyone else. I even tried both ext3 and ext4 on my home partition, but it was the same: many inode issues (199), and after that nothing changed. I reran the check and the same happened again, as nothing could be changed (read-only).
I checked the disk every time, and once something happened: fsck tried to change the filesystem from ext4 to ext2. I don't know why that happened, and it also didn't work, but I tried :)

Maybe the last point doesn't matter, but to me it was very strange...

Oh, and under the livecd I could not find the home partition with gparted, but the root partition was ok, and I could even mount it.

Well, I still wonder why the system thinks it is in a read-only state when I can manage, change and do everything with the files...

If you need any more specific hardware details, just ask. I will gladly look at my papers for them.

mirix (miromoman) wrote on 2009-04-06:

#11

Installing 9.04 beta (AMD) via the Live CD and then compiling the kernel to 2.6.29.1 (before doing any other update via Aptitude/Synaptic) renders a very stable and "performant" system.

god-mok (god-mok) wrote on 2009-04-06:

#12

@mirix: too bad. If someone knows what he has to do, then it's no problem, but I don't think that's the idea behind everything. As long as there is no support it's not such a good idea, right?

Today I had another crash. I reinstalled again, and after the reboot the bug appeared at the first boot, not, as usual, at the second one after the freeze. So once again I mounted my home partition manually and everything worked so far. I installed updates and drivers, added some repos to my sources list, and then there was the freeze again.

I thought it could be a problem with some new package, but it wasn't, because I reinstalled again, and after the first safe boot I didn't do anything. I let it run for almost 30 minutes, then I rebooted. Same problem, so I had to mount manually, but sometimes that didn't work.
There was a notice after the failed boot, something like "two files share same sector/inode" in a folder "/home/god-mok/???/a9...x86_64...". Too bad I have no "???" folder and I didn't memorize the whole number, so that's all for now. The number looked like an md5 hash until it reached x86_64. I have no idea where that came from after a fresh install with formatted partitions.

mirix (miromoman) wrote on 2009-04-07:

#13

@god-mok:

when you install a beta version you take some risks. as far as I know, the problem only happens when you upgrade the kernel. so if you stick to the kernel provided by the installation CD (2.6.28-9) until the problem is solved, you should be safe. I guess this is a problem specific to the Ubuntu kernel 2.6.28-11.

I upgraded to the latest Linux kernel (2.6.29.1) because I wanted to. but I was not recommending or even suggesting to anybody to do so.

cheers

Graziano (graziano-giuliani-gmail) wrote on 2009-04-13:

#14

Just to bump. The problem is still here. 2.6.28-9 is stable, 2.6.28-11 is unusable on this hardware. As all hardware is currently working, I have no problem living with -9, but I am worried about the problem.

Today I tried again with 2.6.28-11.41, only to have programs segfaulting after a minute or two, and a filesystem repair process at startup. I am on ext4 now, so I cannot report on jfs anymore.

I have read all patches introduced in the -9 -> -11 jump, and I was unable to find anything really interesting.
Jaunty will for sure ship with 2.6.28, and as we are now 10 days from release, the problem is better handled by some kernel people.

I decided to go unstable just to help, as I am a somewhat experienced user who knows how to recover from backups and how to live with system hiccups. If we are here to build a community system, someone has to test it and discover bugs so that inexperienced users can avoid them. Ubuntu was something about community, or am I missing something?

I have plenty of options to live on the bleeding edge; I am here just to help the Ubuntu project. I won't stay with Jaunty and will jump to Karmic as soon as it starts its development phase: lots of bug reporters will be available for the official release, and I am of much more help ahead of them.

Steffen Rusitschka (rusi) wrote on 2009-04-15:

#15

I'm also a bit concerned about the RC/final of Jaunty... Anyway, here's a short summary of all lshw.txt attached to this bug and its duplicate:

Common to all machines:
- Intel Core 2 Duo
- 4 GB RAM
- 4965 AG(N) WiFi
- Intel 965 Memory Controller
- Nvidia Graphics Card: 8x00M

I'm not sure if everyone is running a 64-bit version of Jaunty.

But: those hardware combinations are far from being exotic - almost all new Laptops have a similar configuration...

KJ (cortexbuster) wrote on 2009-04-15:

#16

I have exactly the same HW and I'm running the 64 bit Version of Jaunty.
And I experience also the same problems you all have.
I can't run 2.6.28-11 without severe data loss.

Louis-Dominique Dubeau (ldd) wrote on 2009-04-16:

#17

I'm experiencing the same problem on a Compal IFL90 (aka Sager NP2090). Running Jaunty 64-bit.

No data loss on my side but I've experienced some random kernel and process crashes. (Emacs works fine but a few hours later all executions of emacs result in automatic segfaults!)

Downgrading to 2.6.28-9 fixed this issue but created other issues. I'm now running into this bug:

https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/330814

Fixing it requires a kernel upgrade!

KJ (cortexbuster) wrote on 2009-04-18:

#18

the trouble continues with linux-image-2.6.28-11-generic (2.6.28-11.42)
filesystem access fails after a few minutes.

Kevin W. (eyecreate) wrote on 2009-04-18:

#19

lshw.txt (13.7 KiB, text/plain)

I am also getting the same things as people above. I am currently running off live cd and found this bug here, which hits it right on the mark for me. I have the exact system specs as the common list and only seemed to have the problem after kernel upgrade. Hope there is a fix for release.

Attached lshw just in case.

Kevin W. (eyecreate) wrote on 2009-04-18:

#20

I forgot to say on the above that I installed with the RC.

KJ (cortexbuster) wrote on 2009-04-18:

#21

since this is such a showstopper for everyone with this hardware config I'd really appreciate any dev comment.
is anybody working on the issue?
do you need more information?
what can be done to track down the error?
we're close to the official release. this could turn into a disaster for a lot of regular users upgrading to jaunty.
I'm concerned about the quality of ubuntu. it's a great os. I use it on a couple of servers as well as desktops / notebooks. but such a widespread error so close to a release can do the project real harm.
just my two cents.

Kevin W. (eyecreate) wrote on 2009-04-18:

#22

I just did a disk repair in order to boot up again. Because I'd rather not lose anything else, I'm going to use the ppa for .29 kernel until something better is shown here.

Kevin W. (eyecreate) wrote on 2009-04-18:

#23

I found out after rebooting to try and install the new kernel that my partition was too far gone to boot up anymore, so I had to reinstall. I also want to add that I found out that when I reinstalled Kubuntu RC for the fourth time that even if I don't upgrade to the latest kernel, it still messes things up. IDK what kernel is in the RC by default, but it seems I will for sure have to use a different kernel. Here goes a fifth time.

Kevin W. (eyecreate) wrote on 2009-04-18:

#24

Oooh, I found something else interesting out. It's too bad it's this close to release, but it seems the upgrade to the .29 kernel also fixed a bug I(and others) seemed to have about network manager not connecting to encrypted networks. So far, I am able to connect to my WEP wifi hotspot which I couldn't do on livecd or fresh install. I will try my university's WPA2 Enterprise connection next. I do wish this was the default kernel in Jaunty, because it'd make things work better and make life less difficult.

Graziano (graziano-giuliani-gmail) wrote on 2009-04-19:

#25

IIRC Jaunty will definitely ship with a 2.6.28 kernel.

https://lists.ubuntu.com/archives/kernel-team/2009-February/004321.html

Can confirm bug is not present using mainline 2.6.29.1, but this will be of some help with Karmic.
For people who do not know how to get mainline, here You can find "stock" linux kernels compiled for Ubuntu:

http://kernel.ubuntu.com/~kernel-ppa/mainline/

I do not recommend doing this in any way, but if You really have problems, this is the way to go. Remember that You will not get help from Ubuntu developers on kernel-related problems when using mainline!
If You want to use the Ubuntu kernel, keep reading this thread for possible solutions. As we are really close to release, I expect this bug to be tracked down by developers after April 24.

Kevin W. (eyecreate) wrote on 2009-04-19:

#26

Just to comment on my previous comment, it seems WPA2E still doesn't work.

Kevin W. (eyecreate) wrote on 2009-04-21:

#27

I have gotten WPA2E to work by using wicd as my network manager. Sounds like I have another bug to search for.

Steffen Rusitschka (rusi) wrote on 2009-04-24:

#28

Did anyone check whether the final version of Jaunty still has this issue?

Graziano (graziano-giuliani-gmail) wrote on 2009-04-24:

#29

AFAIK 24/04 is release date for jaunty and on my system the current ubuntu kernel version (not the one I am running with, but the one linked to the linux-image-generic) is 2.6.28-11.42. So the message from KJ above,

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/346691/comments/18

answers Your question. Last I personally tried is 11.41.

Ivo Smits (ivo-ufo-net) wrote on 2009-04-24:

#30

lshw.txt (18.3 KiB, text/plain)

Have seen the same problem here on my HP Compaq 8510w. While downloading updates (using the update manager) the filesystem suddenly locked up (switched to read-only mode). At that moment I had one virtual machine (VirtualBox) running as well, but I do not think that's related to the problem.

After trying ubuntu for the first time, a friend of mine told me about this problem (exactly the same hardware). I told him it couldn't be Ubuntu's fault and finally decided to try myself. My system worked fine for at least a week before it crashed, and fsck was able to repair the filesystem.

I have Ubuntu Jaunty amd64 running on a VMWare virtual machine and have not seen any problems there (yep.. the issue seems to be hardware-related...).

I'll attach my lshw result as well.

®om (rom1v) wrote on 2009-04-27:

#31

Maybe related : bug 350268

Siegfried Gevatter (rainct) wrote on 2009-04-27:

#32

Seems like it's the same problem I have... http://bloc.eurion.net/archives/2009/one-week-with-debian/

KJ (cortexbuster) wrote on 2009-04-27:

#33

as far as I see yes.
either the current 2.6.29 mainline kernel or 2.6.28-9 can help you out.

fuchur (ckellner-gmx) wrote on 2009-04-27:

#34

Ok, as this seems the original bug report:

Until this is fixed, I am thinking about using a new kernel, but

a) which problems could this cause - there must be a reason that kernels are never replaced during a release cycle

b) how would I do this - apt-get typically hangs after a few packages are downloaded, and the file system quickly goes read-only.
Could I install the new kernel on intrepid and then dist-upgrade to jaunty? What is the correct procedure?

c) If this bug was known on March 20, and confirmed some days before the release, why is there not at least some remark in the release notes?

giorgio130 (gm89) wrote on 2009-04-27:

#36

lshw.txt (12.8 KiB, text/plain)

same problem here on a compal jhl90. I managed to install 2.6.29 and it seems to run fine now.

str0g (buskol-waw-pl) wrote on 2009-04-28:

#37

lshw.txt (18.6 KiB, text/plain)

I have the same bug on my jhl90, but downgrading the kernel doesn't work :/

adamski (adam-hasselbalch) wrote on 2009-04-28:

#38

I am amazed that this bug is not marked as Critical!

Obviously, this affects a great deal of users in a way that is extremely destructive.

In my opinion, this bug alone renders Jaunty completely unfit for a production environment! There's no getting around that data loss and file system corruption due to a kernel error is absolutely and 100% unacceptable in a so-called "stable" release. The fact that it apparently happens on very common hardware does not help.

Sorry for the harsh words, but this is simply Not Good Enough!

str0g (buskol-waw-pl) wrote on 2009-04-28:

#39

I'll tell you why it isn't critical: my desktop (c2q, x38+ich9r, 4gb, 3 hdd) works great with it, my friend's laptop with an x2 and a desktop with an x2 also work. It's really hard to say what went wrong, but obviously developers should test the system on modern laptops to avoid this kind of problem...

I've installed kernel Linux lukasz-laptop 2.6.29-02062901-generic #02062901 SMP Fri Apr 3 13:36:07 UTC 2009 x86_64 GNU/Linux

and I have some minor issues, e.g. today's acpi update cannot be installed and there are some minor errors with the kernel header installation, but the system is stable and there is no data loss.

mirix (miromoman) wrote on 2009-04-28:

#40

I agree that it is not acceptable to release a so called "stable" version being aware of such a serious bug.

Fuchur:

I have compiled kernel 2.6.29.1 (2.6.29.2 is already available but I have not tested it) a few weeks ago and I have not had any issues since then.

You can download precompiled packages from the Ubuntu site (I do not have the URL) and install them with dpkg.

A few people describe easy ways to compile it from source:

http://izanbardprince.wordpress.com/2009/03/26/how-to-fix-ubuntu-jaunty-warning-hacks-ahead/

http://koroshiyaitchy.wordpress.com/2009/04/25/ubuntu-904-jaunty-jackalope-customised-for-performance-on-a-nexoc-osiris-e705iii-clevo-m57ru-laptop/

I followed these older instructions:

http://symbolik.wordpress.com/2007/11/10/vanilla-kernel-26231-on-gutsy-gibbon/

Just changing the obvious parts. I guess all three methods are actually the same. Just a few kernel configuration options change.

The only annoying and unresolved issue I have found this far is related to this:

http://ubuntuforums.org/showthread.php?p=3593262

I have followed the instructions on that how-to to no avail. I have also tried a Gentoo method with uvesafb with no better luck. However, regular Ubuntu installations also give similar problems if you install the proprietary NVIDIA or ATI drivers.

In fact, given the great deal of manual configuration I have ended up carrying out, I am seriously considering moving back to good old Debian, which is far more stable, faster and less buggy than Ubuntu. Ubuntu is more modern, but less so than, for instance, Fedora.

kikvors (kikvors) wrote on 2009-04-28:

#41

I can tell you that I lost two days work with this "bug". Next time I will not assume a final release is stable and wait before upgrading.

KJ (cortexbuster) wrote on 2009-04-28:

#42

fuchur:
actually I first installed 8.10 and performed a dist upgrade. after the first reboot I selected the old ubuntu 8.10 during the grub menu. once the system was up and running again I downloaded 2.6.28-9 debs from packages.ubuntu.com and installed it.

the strict ubuntu release cycle harms the renown of ubuntu. such bugs should simply delay a release. there are so many laptops out there which suffer from this bug.

®om (rom1v) wrote on 2009-04-29:

#43

I agree this bug should be critical.
It seems to affect only jaunty 64 bits, but it makes the system totally unusable.

Once resolved, a new .iso of Jaunty must be released (9.04.1), because it is not possible to install the current final release (which doesn't work) and then apt-get upgrade (which segfaults due to this kernel).

Jaunty alpha4 was more stable...

giorgio130 (gm89) wrote on 2009-04-29:

#44

@ ®om: let's hope it's solved before karmic....

Siegfried Gevatter (rainct) wrote on 2009-04-29: Re: [Bug 346691] Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#45

You can work around the apt-get segfault with "sudo rm /var/cache/apt/*.bin" and then install a different kernel.
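
Before deleting anything, it may be worth previewing what that wildcard matches. The sketch below only lists the files and removes nothing. The function name is made up for illustration, and the assumption that these are just regenerable apt caches (such as pkgcache.bin and srcpkgcache.bin) should be checked on your own system.

```python
import glob
import os


def apt_bin_caches(cache_dir="/var/cache/apt"):
    """Return the sorted paths that `rm <cache_dir>/*.bin` would match."""
    # Like the shell wildcard, this does not descend into subdirectories
    # such as archives/.
    return sorted(glob.glob(os.path.join(cache_dir, "*.bin")))


if __name__ == "__main__":
    for path in apt_bin_caches():
        print(path)
```

If the directory does not exist or holds no .bin files, the list is simply empty.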

®om (rom1v) wrote on 2009-04-30: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#46

But apt-get is not the only bug : many files could be corrupted... cf my duplicate of this bug : bug 350268
How to be sure there is no problem after installing a new kernel which fixes the problem?

giorgio130 (gm89) wrote on 2009-04-30:

#47

After you've installed kernel 2.6.29 or 2.6.28-9, run a fsck from a live cd.
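
When scripting that check, the exit status of fsck tells you whether anything was found. It is a sum of flag values documented in the fsck(8) manual page. The following is a small sketch for decoding it; the function name is made up, and it only interprets a number you already have, it does not run fsck itself.

```python
# Flag values as documented in fsck(8).
FSCK_FLAGS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(status):
    """Translate an fsck exit status into a list of human-readable conditions."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(FSCK_FLAGS.items()) if status & bit]


if __name__ == "__main__":
    for line in describe_fsck_status(5):
        print(line)
```

A status of 4 after the kernel switch would mean damage is still present on disk, so restoring from backup or repairing again would still be needed.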

str0g (buskol-waw-pl) wrote on 2009-05-02:

#48

2.6.29-02062901-generic
fsck found no error, works good for me.

KJ (cortexbuster) wrote on 2009-05-02:

#49

mainline kernel 2.6.29 indeed does not contain the bug anymore. but this is quite inconvenient for the regular user since the modules for proprietary drivers (nvidia) are missing...

Siegfried Gevatter (rainct) wrote on 2009-05-02: Re: [Bug 346691] Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#50

2009/5/2 KJ <email address hidden>:
> mainline kernel 2.6.29 indeed does not contain the bug anymore. but this
> is quite inconvenient for the regular user since the modules for
> proprietary drivers (nvidia) are missing...

They work perfectly here. If you install the new kernel using one of
the .deb's from kernel.ubuntu.com the nvidia drivers will
automatically get rebuilt for it.

KJ (cortexbuster) wrote on 2009-05-04: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#51

Indeed, Nvidia works in the current 2.6.29 perfectly. But VMWare Server won't compile the modules needed, even though 2.6.29 headers are installed and even though the gcc version it requests is installed...

Martin Peterka (martin-peterka) wrote on 2009-05-05:

#52

I had the same problem.
The system was running fine until I updated to jaunty (64bit). After the second reboot the auto fsck failed (on one of my ext3 partitions), so I repaired it manually. Some data was lost, so I bought an external usb harddisk and started a backup. The system froze and didn't boot anymore. I tried to repair it, without success. I reinstalled it (jaunty again, kernel 2.6.28-11), but the backup (on the external disk with vfat) was also corrupted. :(
dmesg was full of error messages about the filesystem, a lot of programs (e.g. sudo) stopped working, and the filesystem was remounted read-only.
When I booted the jaunty live cd, the output of fdisk -l /dev/sda was very strange. (Error messages about partitions which begin and end inside another partition. I don't have this output anymore - and if I did save it, it is lost.)
Using fdisk I recreated the partitions and installed jaunty again (with ext3 replaced by ext4). The filesystem was full of errors again. I found this bug, so I tried to install another kernel from kernel.ubuntu.com (2.6.30-020630rc4-generic). (That isn't easy if /var/lib/dpkg/available is also a corrupted file.)

ok, there isn't a lot of useful information in this post - maybe just the part about the fdisk output. I hope that helps.
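
In case someone else sees a partition table like that, here is a rough sketch of how overlapping entries could be spotted once the start and end values have been copied out of the fdisk listing by hand. The function name and the sample numbers are invented for illustration; they are not the values from my disk.

```python
def overlapping_partitions(partitions):
    """Return pairs of partition names whose [start, end] ranges overlap.

    `partitions` is a list of (name, start, end) tuples with inclusive ends.
    Leave out the extended partition itself: it legitimately contains the
    logical partitions and would be reported as overlapping them.
    """
    ordered = sorted(partitions, key=lambda p: p[1])
    overlaps = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so nothing later can overlap name_a
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical table, invented for this example.
SAMPLE_TABLE = [
    ("sda1", 1, 100),
    ("sda2", 90, 200),   # starts before sda1 ends
    ("sda3", 201, 300),
]

if __name__ == "__main__":
    print(overlapping_partitions(SAMPLE_TABLE))
```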

Lorant Nemeth (loci) wrote on 2009-05-05:

#53

On my T61p I got all sorts of filesystem problems just like Martin Peterka (no io errors in dmesg). Applications were crashing with memory corruption errors and so on, so I decided to reinstall the system from scratch. The install went fine, but after the second reboot I was dropped to the initrd prompt saying that init could not be found, although I was able to mount the rootfs manually. I'll do a new reinstall and check when things get screwed. I'll let you know about the result.

Lorant Nemeth (loci) wrote on 2009-05-05:

#54

Reinstall done:
- first boot ok
- install all upgrades
- reboot ok
- install nvidia 180 driver
- first reboot stuck before splash screen !?!
- second reboot successful
- apt-get install wireshark tshark vim-full compizconfig-settings-manager mc ----> complains about corrupted /var/lib/dpkg/status
- dmesg still doesn't show any io errors, but I'll check smart once I fix dpkg
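
A rough way to see how badly /var/lib/dpkg/status (or a backup copy of it) is damaged is to check that every stanza still carries its basic fields. This is only a sketch: it assumes the usual layout of blank-line separated stanzas with "Package:" and "Status:" fields, the function name is made up, and the sample text is invented. It will not catch every kind of corruption.

```python
import re


def broken_stanzas(status_text):
    """Return (stanza_number, missing_fields) for stanzas lacking basic fields."""
    broken = []
    stanzas = re.split(r"\n{2,}", status_text.strip())
    for number, stanza in enumerate(stanzas, start=1):
        # Field lines start in column 0; continuation lines start with whitespace.
        fields = {
            line.split(":", 1)[0]
            for line in stanza.splitlines()
            if line and not line[0].isspace() and ":" in line
        }
        missing = {"Package", "Status"} - fields
        if missing:
            broken.append((number, sorted(missing)))
    return broken


# Hypothetical status excerpt, invented for this example.
SAMPLE_STATUS = """\
Package: mc
Status: install ok installed
Version: 1.0

Package: menu
Version: 2.0
Description: example entry
 with a continuation line
"""

if __name__ == "__main__":
    print(broken_stanzas(SAMPLE_STATUS))
```

Running the same check on a candidate backup before copying it over the live file gives at least some indication of whether the backup is in better shape.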

Lorant Nemeth (loci) wrote on 2009-05-05:

#55

The saga continues:
- cp /var/lib/dpkg/status-old /var/lib/dpkg/status
- apt-get -f install runs fine
root@kolibri:~# apt-get install wireshark tshark vim-full compizconfig-settings-manager mc
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  libadns1 liblua5.1-0 libruby1.8 menu python-compizconfig tcl8.4 vim-gnome
  vim-gui-common vim-runtime wireshark-common
Suggested packages:
  adns-tools arj xpdf dbview odt2txt tclreadline cscope vim-doc ttf-dejavu
The following NEW packages will be installed:
  compizconfig-settings-manager libadns1 liblua5.1-0 libruby1.8 mc menu
  python-compizconfig tcl8.4 tshark vim-full vim-gnome vim-gui-common
  vim-runtime wireshark wireshark-common
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B/26.2MB of archives.
After this operation, 104MB of additional disk space will be used.
Do you want to continue [Y/n]? Y
Selecting previously deselected package libadns1.
(Reading database ... 101625 files and directories currently installed.)
Unpacking libadns1 (from .../libadns1_1.4-2_amd64.deb) ...
Selecting previously deselected package liblua5.1-0.
Unpacking liblua5.1-0 (from .../liblua5.1-0_5.1.4-2_amd64.deb) ...
Selecting previously deselected package libruby1.8.
Unpacking libruby1.8 (from .../libruby1.8_1.8.7.72-3_amd64.deb) ...
Selecting previously deselected package mc.
Unpacking mc (from .../mc_2%3a4.6.2~git20080311-4ubuntu1_amd64.deb) ...
Selecting previously deselected package menu.
Unpacking menu (from .../menu_2.1.41ubuntu1_amd64.deb) ...
Selecting previously deselected package tcl8.4.
Unpacking tcl8.4 (from .../tcl8.4_8.4.19-2_amd64.deb) ...
Selecting previously deselected package wireshark-common.
Unpacking wireshark-common (from .../wireshark-common_1.0.7-1ubuntu1_amd64.deb) ...
Selecting previously deselected package tshark.
Unpacking tshark (from .../tshark_1.0.7-1ubuntu1_amd64.deb) ...
Selecting previously deselected package vim-gui-common.
Unpacking vim-gui-common (from .../vim-gui-common_2%3a7.2.079-1ubuntu5_all.deb) ...
Selecting previously deselected package vim-runtime.
Unpacking vim-runtime (from .../vim-runtime_2%3a7.2.079-1ubuntu5_all.deb) ...
Adding `diversion of /usr/share/vim/vim72/doc/help.txt to /usr/share/vim/vim72/doc/help.txt.vim-tiny by vim-runtime'
Adding `diversion of /usr/share/vim/vim72/doc/tags to /usr/share/vim/vim72/doc/tags.vim-tiny by vim-runtime'
dpkg-deb: subprocess paste killed by signal (Broken pipe)
dpkg: error processing /var/cache/apt/archives/vim-runtime_2%3a7.2.079-1ubuntu5_all.deb (--unpack):
 short read in buffer_copy (backend dpkg-deb during `./usr/share/vim/vim72/doc/tags')
Selecting previously deselected package wireshark.
Unpacking wireshark (from .../wireshark_1.0.7-1ubuntu1_amd64.deb) ...
Selecting previously deselected package python-compizconfig.
Unpacking python-compizconfig (from .../python-compizconfig_0.8.2-0ubuntu1_amd64.deb) ...
Selecting previously deselected package compizconfig-settings-manager.
Unpacking compizconfig-settings-manager (from .../compizconfig-settings-manager_0.8.2-0ubuntu1_all.deb) ...
Selecting previously deselected package vim-gnome.
Unpacking vim-gnome (from .../vim-gnome_2%3a7.2.079-1ubuntu5_amd64.deb) ...
Selecting previously deselected package vim-full.
Unpacking vim-full (from .../vim-full_2%3a7.2.079-1ubuntu5_all.deb) ...
Processing triggers for man-db ...
Processing triggers for doc-base ...
Processing 1 added doc-base file(s)...
Registering documents with scrollkeeper...
Errors were encountered while processing:
 /var/cache/apt/archives/vim-runtime_2%3a7.2.079-1ubuntu5_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
- rm /var/cache/apt/archives/vim-runtime_2%3a7.2.079-1ubuntu5_all.deb
- root@kolibri:~# apt-get -f install
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following extra packages will be installed:
  vim-runtime
The following NEW packages will be installed:
  vim-runtime
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
14 not fully installed or removed.
Need to get 5809kB of archives.
After this operation, 25.1MB of additional disk space will be used.
Do you want to continue [Y/n]? Y
Get:1 http://hu.archive.ubuntu.com jaunty/main vim-runtime 2:7.2.079-1ubuntu5 [5809kB]
Fetched 5809kB in 3s (1726kB/s)       
Segmentation fault

The whole system is really unstable, and as the problem seems to be at the filesystem level I wouldn't use it in production until it is fixed. Strangely enough, I haven't seen this problem on 64-bit desktop systems, only on this 64-bit laptop.

chriz (christian-seipel) wrote on 2009-05-05:

#56

I will note here that faulty RAM modules can also result in corrupted files and file system errors. That was the case for me with Ubuntu 9.04 x64 and ext4: replacing the faulty modules stopped new file system errors from accumulating. Nevertheless, some errors remained, which in my opinion are related to this bug.

Edmundo (eantoranz) wrote on 2009-05-05: Re: [Bug 346691] Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#57

Could a memtest run detect a problem with RAM then?

On Tue, May 5, 2009 at 12:24 PM, chriz <email address hidden> wrote:
> I will note here that corrupted RAM modules also could result in
> corrupted files and file system errors. So in my case with Ubuntu 9.04
> x64 and ext4. Replacing the corrupted modules stopped getting more and
> more file system errors. But nevertheless were some errors remaining
> what is related to this bug in my opinion.
>

Graziano (graziano-giuliani-gmail) wrote on 2009-05-05: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#58

Not a memory problem here. The system is rock solid with either the 2.6.28-9 kernel OR 2.6.29 mainline. Definitely NOT a memory problem: it is just unusable with 2.6.28-11, which is the Jaunty final kernel.
I am now on Karmic devel, using 2.6.29-02062902-generic, and have NO problems on the same hardware.

chriz (christian-seipel) wrote on 2009-05-05: Re: [Bug 346691] Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#59

Yes. In my case memtest reported errors in my RAM modules.

Edmundo wrote:
> Could a memtest run detect a problem with RAM then?
>
> On Tue, May 5, 2009 at 12:24 PM, chriz <email address hidden> wrote:
>
>> I will note here that corrupted RAM modules also could result in
>> corrupted files and file system errors. So in my case with Ubuntu 9.04
>> x64 and ext4. Replacing the corrupted modules stopped getting more and
>> more file system errors. But nevertheless were some errors remaining
>> what is related to this bug in my opinion.
>>
>>
>
>

god-mok (god-mok) wrote on 2009-05-05: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#60

In my case there was no problem with memtest. I even ran it for half an hour without problems. Well, I have Jaunty on another notebook, and there it does the job right. In the end I installed the 2.6.29 kernel, and everything works fine. Too bad the 30rc4 won't install because it fails to build the nvidia parts. There is a trick to get around it, but... well, it is a little bit dirty ;)

Edmundo (eantoranz) wrote on 2009-05-05: Re: [Bug 346691] Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#61

I C.... well.... over here, it's a 64 bit laptop AMD based... I'm
using ext3 partitions with a .28 kernel. It's stable. As a matter of
fact, now that I think about it.... why am I associated in this bug?
:-D Anyway.

Lorant Nemeth (loci) wrote on 2009-05-06: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#62

Hi, I ran memtest and it reported no errors. The very same laptop is stable with no errors if booted from an external drive with Intrepid.

Claudio Schnell da Silva (claudio-schnelldasilva) wrote on 2009-05-07:

#63

lshw.txt (18.4 KiB, text/plain)

Good morning,

my FSC notebook has the same problems described above. It was upgraded to Jaunty (stable) from Intrepid Ibex. After 2 days it didn't boot anymore. An attempt to save the data was only partially successful; a lot of data is missing :-(

I did some reinstalls with several filesystem types (reiser, ext4, ext3), always with the same effect: some reboots are OK, then suddenly at runtime the filesystem goes read-only. The system does not react anymore. Only a hard reset helps. After this the filesystem is broken.

As it is my production notebook I reinstalled it with Intrepid Ibex and everything is running fine :-(((

Attached is the output of lshw.

It looks strange to me that Canonical has not fixed this bug in all this time. I have been appreciating their work for years and many releases, since 4.07 I think.

Best regards,

Claudio

Graziano (graziano-giuliani-gmail) wrote on 2009-05-07:

#64

Just for the Karmic developers: the bug is no longer present in Ubuntu 2.6.30-2.3-generic.

Radoslav Georgiev (valsodarg) wrote on 2009-05-08:

#65

I was also a victim of this bug, in which I experienced the following:
1. My applications started receiving segfaults. I tried to strace them and found that the segfaults were mostly due to problems with files (corrupted files).
2. Then my administrative applications started segfaulting, and after that my kernel wouldn't boot as a result of a heavily corrupted file system. I ran fsck.ext3 on my boot partition and all the files were corrupted.

To solve:
Reinstalled ubuntu from the live CD and updated the kernel to the 2.6.29-020629. So far I have had no problem.
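
For anyone who wants to find out which installed files were damaged before reinstalling, a rough sketch follows. It assumes a Debian-style system where packages ship checksum lists such as /var/lib/dpkg/info/<package>.md5sums with paths relative to /. The function name verify_md5_list is made up for illustration and is not something used in this thread.

```shell
# Hypothetical helper: check files against a list in `md5sum` output format.
#   $1: the checksum list
#   $2: directory that the paths in the list are relative to
# Prints only the files that fail; returns non-zero if any file fails.
verify_md5_list() {
    # The list is opened before the cd, so a relative $1 still works.
    ( cd "$2" && md5sum -c --quiet ) < "$1"
}

# Possible use on an installed system (package name is only an example):
#   verify_md5_list /var/lib/dpkg/info/coreutils.md5sums /
```

A clean result only says the files match the recorded checksums at the moment they were read; on a kernel that corrupts data it is worth running this from a known-good kernel or a live CD.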

I am a little surprised that my brother, who has a 32-bit system, does not have this problem with the 2.6.28-11 kernel, and it seems that only 64-bit systems are affected. Can anyone with a 32-bit system confirm this? I also believe that it has to do with how files are written/updated, as many of the errors that fsck reported were:
1. Inode referenced in future data
2. Inode reference count too high
3. Unattached Inodes

Another note is that none of my boot files (as my grub is on its own partition) were corrupted.
System specs:
64 bit, Intel Core 2 duo, 4 GB ram, 320 GB WD hdd (2.5" inch)
Multiboot Setup

Radoslav Georgiev (valsodarg) wrote on 2009-05-08:

#66

Also I forgot to mention all partitions were ext3 formatted during installation.

Petter Eklund (denbevingadebaevern) wrote on 2009-05-10:

#67

Were there any changes in the ata_piix driver (the driver used by the 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller, which 5 of the people suffering from this bug have in their machines according to the posted lshw output) between 2.6.28-9 and 2.6.28-11?

Martin Peterka (martin-peterka) wrote on 2009-05-10:

#68

lshw output from affected system (15.5 KiB, text/plain)

I can confirm that the affected system has an "82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller".
On another computer without that IDE controller, 2.6.28-11 is running fine.

Julius Schwartzenberg (jschwart) wrote on 2009-05-10:

#69

I've got an ICH9M/M-E here, so it seems both 8 & 9 are affected.

str0g (buskol-waw-pl) wrote on 2009-05-10:

#70

Weird. If it's a controller error, why doesn't SMART show any errors on ICH9? :P

Petter Eklund (denbevingadebaevern) wrote on 2009-05-10:

#71

If it is an error in the drivers (i.e. software), why should it show up in a hardware test?
From what information I've got it looks like something was broken in the ata_piix kernel module between versions 2.6.28-9 and 2.6.28-11 and then subsequently fixed (it could also be an error somewhere else that triggers erroneous behaviour in the ata_piix driver).

Julius Schwartzenberg (jschwart) wrote on 2009-05-10:

#72

Here is the whole changelog:
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_2.6.28-11.42/changelog

It seems there have been one or two changes to ata_piix. Maybe the problem could be related to some of the libata changes.

KJ (cortexbuster) wrote on 2009-05-11:

#73

My failing system also contains a 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller
Just for your information.

str0g (buskol-waw-pl) wrote on 2009-05-12:

#74

If it's a controller driver error, why are there no errors on my Foxconn X38 ICH9R?

Lorant Nemeth (loci) wrote on 2009-05-12:

#75

Hi,

I have ICH8 as well:

00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller (rev 03)
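
To check a machine for the controllers that several reporters mention, something along these lines could be used. This is only a sketch: the function name find_ich_sata is hypothetical, and the pattern is an assumption based on the device strings quoted in this thread, so other chipsets named later in the report will not match.

```shell
# Reads `lspci` output on stdin and prints any ICH8M/ICH9M SATA controller
# lines. Exit status is 0 if at least one line matched, 1 otherwise.
find_ich_sata() {
    grep -E 'ICH(8|9)M.*SATA'
}

# Typical use on a live system:
#   lspci | find_ich_sata && echo "controller mentioned in this report is present"
```

A match does not prove the machine is affected; it only shows the hardware that the reports above have in common.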

giorgio130 (gm89) wrote on 2009-05-12:

#76

Has anyone tested 2.6.28-12? It is in the "proposed" repository.

adamski (adam-hasselbalch) wrote on 2009-05-12:

#77

I have not tested 2.6.28-12, but the changelog does not mention this bug being closed.

Changelog is at https://launchpad.net/ubuntu/jaunty/+source/linux/2.6.28-12.43

PsYcHoK9 (psychok9) wrote on 2009-05-12:

#78

I have the same problem with ext4, Jaunty 9.04 x64 and kernel 2.6.28-11.
Sometimes, without a power failure or crash, applications on Jaunty disappear or become invisible.
Last time I started GParted and it couldn't find ntfsprogs, although it is installed!
Sometimes the icon of an application disappears from the GNOME menu...
Sometimes aMule configuration files disappear...

josh04 (josh04) wrote on 2009-05-13:

#79

Confirming that I too have an ICH8M/M-E and had the problem. I upgraded to 2.6.29 and it's not gotten any worse, but I need to fsck.

Lorant Nemeth (loci) wrote on 2009-05-14:

#80

Those who have upgraded to 2.6.29: did you use a vanilla kernel, or did you use some Ubuntu repo?

My only problem with the kernel upgrade is, that first you'd need to install ubuntu from the CD with the same buggy kernel, and only after that you can install the new kernel. This means that by the end of the installation you can't tell what got correctly written to HDD and what did not.

kikvors (kikvors) wrote on 2009-05-14:

#81

I used the Ubuntu repo. The system was stable enough with the buggy kernel to upgrade to the newer one. After that I ran fsck in case there were any errors. So far no problems with the newer kernel.

str0g (buskol-waw-pl) wrote on 2009-05-14:

#82

http://kernel.ubuntu.com/~kernel-ppa/mainline/

After installing the system, just run
sudo dpkg -i generic.deb all.deb image.deb
and there won't be any other errors, trust me ;-)

PsYcHoK9 (psychok9) wrote on 2009-05-14:

#83

lshw.txt (20.1 KiB, text/plain)

This is my lshw.

2.6.30rc5 (vanilla) doesn't seem to be affected.
I've downloaded it from here: http://kernel.ubuntu.com/~kernel-ppa/mainline/

MiceX (rsmad) wrote on 2009-05-18:

#84

lshw.txt (19.4 KiB, text/plain)

lshw of my system. Affected with x64 and not affected with x32.

Daniel J Blueman (danielblueman) wrote on 2009-05-19:

#85

This bug has been open for two months, and was fixed upstream some time ago, but not in Ubuntu.

As a consequence, 2.6.28-11-generic and 2.6.28-11-server, the kernels on the production release media, deliver silent and show-stopping data corruption, to the extent that the kernel kills user processes accessing parts of the filesystem corrupted in a particular way. That's as "game over" as it can get, and damaging for Ubuntu's image.

On the upside, there are a bunch of related fixes upstream that would come in if the Ubuntu kernel were rebased to a newer point release, and the patch posted two months ago [1] is probably the fix as well.

What else do we need?

[1] http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-jaunty.git;a=commit;h=b29e79bf557ce777878518da154f4a0becb1de0e

Johan Sköld (johan-skold) wrote on 2009-05-19: Re: [Bug 346691] Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#86

We need it to be made as an official upgrade, is what. For a user to
have to go through "hmm, all my programs just spontaneously crash", go
search for ubuntu bugs, and read through 40 comments, just to get a
fix is asking too much, especially considering a large group of users
barely know how to do anything other than open their email app and
surf the web. Seeing as this is such a wide-spread problem and a
kernel upgrade fixes it in what seems to be 100% of the cases, a
kernel upgrade should really be put into the live repository.

Lorant Nemeth (loci) wrote on 2009-05-20: Re: jaunty kernel 2.6.28-11 kernel update renders the system un-usable.

#87

The official installation media has to be updated as well. I wouldn't trust a system which was installed with a kernel that includes such a problem. (I'm surprised no one faced any installation problems so far....)

Daniel J Blueman (danielblueman) wrote on 2009-05-20:

#88

As a workaround to avoid silent data corruption, boot with 'maxcpus=1' until the kernel is patched.

Lorant Nemeth (loci) wrote on 2009-05-20:

#89

maxcpus=1 does not help.

I just reinstalled my laptop from the minicd. I booted the installer with maxcpus=1 and double-checked that the kernel would still boot with that option after the install. I installed a few packages and the nvidia restricted driver, then rebooted, and /sbin/init was gone, so the next bootup failed.
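
One way to confirm that a boot really picked up the option is to look at the kernel command line. The snippet below is a sketch only: the function name has_maxcpus1 is hypothetical, and it merely verifies that the parameter is active, not that it prevents the corruption.

```shell
# Returns 0 if the kernel command line contains maxcpus=1 as a separate word.
#   $1: command line text; if omitted, /proc/cmdline is read instead.
has_maxcpus1() {
    cmdline=${1-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" maxcpus=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_maxcpus1; then
    echo "booted with maxcpus=1"
else
    echo "maxcpus=1 is not on the kernel command line"
fi
```

Counting the processor entries in /proc/cpuinfo is a second check that only one CPU was brought up.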

KJ (cortexbuster) wrote on 2009-05-21:

#90

2.6.28-12-generic #43-Ubuntu is out there, and maybe this one finally fixes the bug. I'm already testing it as I write these lines.

KJ (cortexbuster) wrote on 2009-05-21:

#91

I did some testing and to me it seems to be working. My partitions are not in ro mode and none of my apps are segfaulting.
If it continues to work without a hassle I think it's time for a 9.04.2!

KJ (cortexbuster) wrote on 2009-05-21:

#92

2.6.28-12-generic #43-Ubuntu did it again, but this time it took significantly longer...
The filesystem crashed and I had to recover it using a live CD....
narf.
So no bug fix for me yet, except installing the 2.6.29 kernel from the PPA...

Lorant Nemeth (loci) wrote on 2009-05-22:

#93

I think this bug deserves a "Critical" flag. It's not only causing instability, but it causes data loss too.

Claudio Moretti (flyingstar16) on 2009-05-22

summary:

- jaunty kernel 2.6.28-11 kernel update renders the system un-usable.
+ jaunty kernel 2.6.28-11 kernel update makes the system unusable.

kikvors (kikvors) wrote on 2009-05-22: Re: jaunty kernel 2.6.28-11 kernel update makes the system unusable.

#94

I am surprised this is not yet labeled as "Critical". I lost some of my data and belief in Ubuntu with it.

zielgruppe (pajoma-gmx) wrote on 2009-05-22:

#95

I didn't really lose my faith in Linux through this, but I now have around 6GB in my lost+found directory due to this bug. I was running kernel v2.6.30-rc4, which solved my issues. I just switched back for a few minutes to try out my new Wacom tablet, since the open source driver apparently does not support 2.6.30 yet. Well, these few minutes destroyed several days of work :(

sam tygier (samtygier) wrote on 2009-05-22:

#96

Sounds reasonable for this to be labelled critical. It's causing data loss for several people. Any news on a fix, Manoj?

Changed in linux (Ubuntu):
importance:	High → Critical

Mathieu Marquer (slasher-fun) on 2009-05-23

tags:	added: amd64
summary:

- jaunty kernel 2.6.28-11 kernel update makes the system unusable.
+ 2.6.28-11 causes data corruption with ICH8 on 64 bits installations

Dmitry Diskin (diskin) wrote on 2009-05-23: Re: 2.6.28-11 causes data corruption with ICH8 on 64 bits installations

#97

lshw output (22.6 KiB, text/plain)

Wanted to add that I experienced the same bug on a laptop with ICH9 (?): "ICH9M/M-E 2 port SATA IDE Controller"
(full lshw output attached).

giorgio130 (gm89) wrote on 2009-05-23:

#98

I also have an ICH9. Current description is reductive.

Mathieu Marquer (slasher-fun) on 2009-05-23

summary:

- 2.6.28-11 causes data corruption with ICH8 on 64 bits installations
+ 2.6.28-11 causes massive data corruption with ICH8/ICH9 on 64 bits
+ installations

annunaki2k2 (russell-knighton) wrote on 2009-05-24: Re: 2.6.28-11 causes massive data corruption with ICH8/ICH9 on 64 bits installations

#99

Same problem on my brother's Samsung X22; lost 3 (re)installations now (including data!!). Had to resort to re-installing 8.10.

00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller (rev 03)

Waiting for a fix...

Hospik (jmhospers) wrote on 2009-05-25:

#100

I suffer from this problem on a HP/Compaq 8510w with ICH8. Instead of upgrading to a 2.6.29 kernel I'm currently running with 8.10 stable kernel (2.6.27-11) from the Intrepid repos. Which is fine for me. (Will not help you if you want to run ext4).

Still waiting for a kernel upgrade to solve the issue...

Manoj Iyer (manjo) wrote on 2009-05-26:

#101

Can someone try a mainline kernel and report if it is still broken?

http://kernel.ubuntu.com/~kernel-ppa/mainline/

Dmitry Diskin (diskin) wrote on 2009-05-26:

#102

> Can someone try a mainline kernel and report if it is still broken?

It was already mentioned here, that 2.6.29 from mainline is fine. I'm running 2.6.29-02062902-generic successfully.

Steffen Rusitschka (rusi) wrote on 2009-05-26:

#103

I can also confirm that the mainline kernel 2.6.29 from the PPA works fine here.

Markus Bachmayr (markus-bachmayr) wrote on 2009-05-26:

#104

lshw.txt (17.2 KiB, text/plain)

I observed a milder version of the symptoms with a Dell Latitude D830, which also has ICH8, using ext3.
The problem became noticeable only when fsck found about 30 errors (duplicated blocks, wrong block sizes, inode inconsistencies...) that needed to be repaired in manual mode during the first routine check after the upgrade to Jaunty. Apparently this concerned only a few files. Otherwise the system had seemed stable.

nbp (nobradpitt) wrote on 2009-05-29:

#105

I'm also confirming that 2.6.29-02062902-generic is running successfully (finally!).

Daniel J Blueman (danielblueman) wrote on 2009-05-30:

#106

I believe this is the patch fixing this issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7ce9d5d1f3c8736511daa413c64985a05b2feee3

Julius Schwartzenberg (jschwart) wrote on 2009-05-30:

#107

Why? It's a patch for ext4, but this problem hasn't got anything to do with ext4 at all.

quixote (commer-greenglim) wrote on 2009-06-01:

#108

As one of the regular users who's been KO'ed by this bug (and who's pretty much frothing that this bug was *known* prior to the final release and nobody bothered to put out a huge general HEADS UP about it ... but that's another whole issue) --

could somebody post easy instructions on how to install the kernel that seems to solve the problem? At least then us poor schmoes could get some use out of our jaunty installs. I'll be glad to put the workaround in the "known jaunty bugs and workarounds" sticky on ubuntuforums.

For instance, would something along these lines be right:
[CODE]wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.29.3/
sudo dpkg -i linux-image-2.6.29-and-then-what-goes-here??.deb
sudo dpkg -i linux-headers-2.6.29-and-then-what-goes-here??.deb linux-headers-2.6.29-and-then-what-goes-here??_all.deb[/CODE]

(I gather the 29 kernel would be the preferred solution? Yes? No?)

Which commands -- or which further commands -- would one need?

I understand that this is not standard practice, yadda, yadda, yadda. Having constant filesystem corruption is worse, though, so please give us a fix!

quixote (commer-greenglim) wrote on 2009-06-01:

#109

Or at least a workaround!

Daniel J Blueman (danielblueman) wrote on 2009-06-01:

#110

I previously experienced massive ext4 inode bitmap corruption on an x86-64 Opteron w/ a CK804 chipset while performing a large rsync, so the ext4 corruption issue is *not specific* to ICH8/9.

A number of the reports (including duplicates) mention ext4, and the original report in this LP entry mentions JFS. Are we over-merging bug reports?

Also, it's crucial to understand if the ext4 corruption has been observed on i386 systems or not...please mention if you've seen ext4 corruption on i386.

@quixote workaround is:

$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.29.4/linux-image-2.6.29-02062904-generic_2.6.29-02062904_amd64.deb
$ sudo dpkg -i linux-image-2.6.29-02062904-generic_2.6.29-02062904_amd64.deb
(or _i386.deb if on 32bit)
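
After installing the package and rebooting, it is worth confirming which kernel actually came up, since grub may still default to the old one. A small sketch of such a check follows; the function name is_reported_kernel is invented for illustration, and the pattern covers only the 2.6.28-11 builds named in this report.

```shell
# Returns 0 if the given kernel release string is a 2.6.28-11 build
# (e.g. 2.6.28-11-generic or 2.6.28-11-server), 1 otherwise.
#   $1: release string; if omitted, the running kernel (`uname -r`) is used.
is_reported_kernel() {
    release=${1-$(uname -r)}
    case $release in
        2.6.28-11-*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_reported_kernel; then
    echo "still running 2.6.28-11: reboot and select the new kernel in grub"
else
    echo "running $(uname -r)"
fi
```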

summary:

- 2.6.28-11 causes massive data corruption with ICH8/ICH9 on 64 bits
- installations
+ 2.6.28-11 causes massive data corruption on 64 bit installations

Julius Schwartzenberg (jschwart) wrote on 2009-06-01:

#111

This bug also occurs with ext3 and other filesystems. If someone experiences ext4 filesystem corruption on i386, it is not related to this problem. Maybe some bug reports were over merged.

This bug is specifically about the corruption that occurs on mainly Compal notebooks with the AMD64 version of Ubuntu, regardless of filesystem.

quixote (commer-greenglim) wrote on 2009-06-01:

#112

@Daniel J. Blueman: Thanks! I'll try that out today.

My system specs, btw, are: Core 2 Duo P8400 2.26GHz, 4GB RAM, Intel GMA X4500 graphics, 64-bit Jaunty and Intrepid dual boot, and ext3 filesystem. The laptop is an "MSI 1223" (i.e. no-name brand, I guess.)

Daniel J Blueman (danielblueman) wrote on 2009-06-04:

#113

The filesystem corruption I was experiencing with the stock Jaunty kernel (2.6.28-11-server) on an x86-64 system was down to the nVidia CK804 PCIe chipset corrupting data on PCIe read completions from the SATA controller's DMA engine. I have observed this with a PCIe bus analyser.

On a system without the CK804 (or MCP55 or related) chipset, I find the stock Jaunty kernel solid.

Lesley Lutomski (lutomski) wrote on 2009-06-06:

#114

This is my first report, so I apologise if it's incorrect, and for my lack of technical knowledge.

I upgraded my desktop from Hardy to Jaunty, and after a week, the problems described above began - random system freezes, followed by X server error and running fsck, which reported various inode errors. I originally thought the problem was to do with the nVidia drivers, but I removed them completely several days ago and the freezes still occurred. Yesterday's crash has left me unable to boot up at all - error messages about missing files, then kernel panic. (I have more details, if needed.)

I am running 32bit Ubuntu, with ext3. I have a Core Duo 3.4GHz processor, nVidia GeForce 7900 GS graphics card and 2Gb RAM.

giorgio130 (gm89) wrote on 2009-06-06:

#115

@Lesley:
I think your report is important. Until now, it happened always on x86_64 installations... attach the output of your lshw.

Lesley Lutomski (lutomski) wrote on 2009-06-06:

#116

lshw_060609 (17.7 KiB, text/plain)

Attached as requested. I'm going to have to do a complete reinstall (of 8.10) to get the system up and running again; is there anything else you need before I overwrite the disk?

giorgio130 (gm89) wrote on 2009-06-06:

#117

I think this is enough... Anyway, as stated before, if you want to keep using 9.04 you can always use a more recent kernel, or, as you said you came from hardy, the old one that shipped with it, which should be still selectable in grub.

Richard Huddleston (rhuddusa) wrote on 2009-06-09:

#118

lspci (9.9 KiB, text/plain)

The issue is on 64bit kernels 2.6.29-02062904-generic AND kernel 2.6.28-11-server.

I believe I'm seeing the same issue on ICH10 AND SiI 3132 ... separate software raid (raid1 and raid5) on both controllers are suffering. Everything was OK until a couple of days ago, issue only came up when I started playing with larger files. Issue manifests itself on XFS, ext4, and directly on software raid device, and directly against disk.

I've run memtest 6+ passes with no errors, in addition to badblocks on each disk and the combined arrays. No SMART issues on the disks.

My test is basically creating files (11 gigs in size) with /dev/random and dd, and then reading back X bytes with md5sum.
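
A minimal sketch of this kind of write-and-read-back test, under some assumptions: it uses /dev/urandom rather than /dev/random (which can block while waiting for entropy), the function name readback_test and its arguments are hypothetical, and it checksums the whole file rather than the first X bytes. The file has to be larger than RAM, or the page cache dropped between passes, for the reads to actually come from the disk.

```shell
# Write a file of random data, then checksum it several times.
#   $1: file to create   $2: size in MiB   $3: number of checksum passes
# Every checksum of an unchanged file must be identical, so any difference
# means the data changed somewhere between the disk and md5sum.
readback_test() {
    file=$1; size_mb=$2; passes=$3
    dd if=/dev/urandom of="$file" bs=1M count="$size_mb" 2>/dev/null || return 2
    first=$(md5sum < "$file")
    i=1
    while [ "$i" -lt "$passes" ]; do
        if [ "$(md5sum < "$file")" != "$first" ]; then
            echo "MISMATCH on pass $((i + 1))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "OK: $passes identical checksums"
}

# Example (path and sizes are placeholders):
#   readback_test /mnt/array/bigfile 11264 5
```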

Daniel J Blueman (danielblueman) wrote on 2009-06-09:

#119

To build a better understanding of the mechanism, it's worthwhile finding out:
- is there sufficient cooling for the southbridge and northbridge?
- are you running the latest BIOS?
- are you running the vendor's validated BIOS defaults?
- is the power supply of reasonable quality/spec?
-> for lower ripple and supply rails within tolerances
- the output from 'lspci'
- importantly: can the corruption be provoked in MS Windows?

I've experienced two cases where bad memory was exposed through fast I/O (in an HPC environment) but memtest didn't detect any issues, so there is still a small chance.

Graziano (graziano-giuliani-gmail) wrote on 2009-06-10:

#120

lspci.txt (17.1 KiB, text/plain)

The first question above is not applicable in my case: my laptop was rock solid with Ubuntu 8.10 and is rock solid with development Karmic. I cannot report about Windows, and IMHO such a question shouldn't even be asked. No settings have been modified/altered, and the BIOS is vendor default.
Attached my lspci output.

Graziano (graziano-giuliani-gmail) wrote on 2009-06-10:

#121

lspci.txt (35.1 KiB, text/plain)

ACH SO.... Use this, I was normal user.....

Lorant Nemeth (loci) wrote on 2009-06-12:

#122

I thought a short howto would be good for some people, about how to get Jaunty running on the affected systems. This is for paranoid people like me, who don't trust a system installed with an installer running the buggy kernel or booted up with it once (and have no intention to create/wait for a new installer). Note: the headers might not be needed on some systems, but they cannot hurt (and for the nvidia driver they will be needed).

- save all your data if possible
- install 64 bit intrepid
- boot up system
- upgrade to latest packages
- upgrade to jaunty, but do not restart system after upgrade finished
- wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.29.4/linux-headers-2.6.29-02062904-generic_2.6.29-02062904_amd64.deb
- wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.29.4/linux-headers-2.6.29-02062904_2.6.29-02062904_all.deb
- wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.29.4/linux-image-2.6.29-02062904-generic_2.6.29-02062904_amd64.deb
- dpkg -i linux*2.6.29-02062904*deb
- reboot
- make sure to select the above kernel (most probably it will be the default, but it is worth checking)
- install the nvidia driver if needed (it's ok if you installed it on intrepid, it will be "recompiled" automatically when you install the new kernel)

This way your system should not get corrupted, as you'll never run the buggy kernel. Of course this way you'll download most of the packages 3 times (intrepid version, intrepid update version, jaunty version). In case there's an updated install medium, please let us know, because that would make all this stuff unnecessary.
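
To double-check which kernel is actually running after the reboot, a small sketch. It assumes the affected builds report a release string starting with 2.6.28-11- (as in 2.6.28-11-generic or 2.6.28-11-server); the function name is made up for illustration.

```shell
# Succeeds (exit status 0) if the given kernel release string belongs
# to the 2.6.28-11 series, fails otherwise.
is_affected_kernel() {  # usage: is_affected_kernel "$(uname -r)"
    case "$1" in
        2.6.28-11-*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

For example: is_affected_kernel "$(uname -r)" && echo "still on 2.6.28-11, do not use this boot". The check only flags the 2.6.28-11 series; whether later 2.6.28 builds are safe is a separate question that this sketch does not answer.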

Have a nice day! Loci

Richard Huddleston (rhuddusa) wrote on 2009-06-13:

#123

Well, I'm still seeing my large file data corruption issue on mainline kernel 2.6.30-020630-generic

md1 : active raid5 sdc1[0] sdf1[3] sdd1[1] sdg1[4] sde1[2]
1953535744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md0 : active raid1 sda1[0] sdb1[1]
78123968 blocks [2/2] [UU]

md0 is 2 sata disks on intel ich10 chipset

md1 is 5 sata disks on a Silicon Image, Inc. SiI 3132

pv -L 35M bigfile | md5sum
46a2c9e932c3feb32dc3edcd60a81d98 -
pv -L 35M bigfile | md5sum
24287e601aa23abb7b380c6577750bb0 -
pv -L 20M bigfile | md5sum
c65b3137a83c00d0bd20fd95c1ee2e88 -
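
The three checksums above all come from the same file, so they should have been identical. A small sketch that automates this comparison by hashing a file several times and counting how many different checksums came out. The function name is hypothetical, and the pv -L rate limiting is left out for brevity.

```shell
# Hash FILE RUNS times and print the number of distinct checksums seen.
# A healthy read path gives 1; anything higher means the reads disagreed.
count_distinct_hashes() {  # usage: count_distinct_hashes FILE RUNS
    for i in $(seq "$2"); do
        md5sum < "$1"
    done | sort -u | wc -l
}
```

For example, count_distinct_hashes bigfile 3 corresponds to the three runs shown above, which would report 3 instead of 1.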

I've gone in and checked the temperatures of all the components I can, and they are well under the max temps stated in the system's tech specs (using both the in-BIOS diagnostics and lm-sensors). The power supply is 300W and only consuming 100W ... the 5 sata disks are in an external chassis with a separate power supply.

I've also rotated the disk, rebuilt the raid arrays, and tried downgrading the disks from 3.0 to 1.5 through jumper changes (and verified speed with dmesg). i've also removed a sata cd drive which was on ich10 controller.

i also thought this might have something to do with pci latency, as linux was setting everything to 64 latency (seen in dmesg), but my bios was set to 32 (don't know if that was actually an issue though). I've changed the bios values from 32 to 64 and all the way up to 248 (or something like that). still no change. i'm currently running at 64

i've also tried throttling down read speeds with pv -L rate.

interesting note ... i burned a new ubuntu install cd, and did a media test of it on the cd drive (before removed) and the test failed ... but performing on another box said the media was good. don't know if it was a bad cd drive or another manifestation of this issue.

my lspci is attached on a previous comment

i have not tested my system under 32 bit mode because of the install media verification issue.

Petter Eklund (denbevingadebaevern) wrote on 2009-06-13:

#124

Richard Huddleston, do you have a "Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller" (or another controller that uses the ata_piix kernel module)? If not, this bug is probably not the cause of your problems.

Richard Huddleston (rhuddusa) wrote on 2009-06-13:

#125

I'm using the Intel board DQ45CB which has the Q45 Chipset with the 82801JO I/O Controller Hub, the information is available in my lspci from my first post. I don't know the differences between the 82801JO and the 82801HBM/HEM controllers.

Petter Eklund (denbevingadebaevern) wrote on 2009-06-13:

#126

It seems like you've got a SATA controller using the ata_piix kernel module, which I believe is common to all people suffering from this bug. My personal experience (and other people are reporting similar observations in earlier comments to this bug) is that upgrading to a mainline kernel of a later version than the one shipped with jaunty will fix this issue. I am however not using RAID, which could explain the difference.

mirix (miromoman) wrote on 2009-06-14:

#127

I cannot believe this bug is still open.

To all the people doing hardware diagnostics: This is not a hardware issue. As far as I can tell, it exclusively affects the Ubuntu kernel 2.6.28-11. Any other kernel/distro I have tried (older or newer) works perfectly. I am running 2.6.30 from kernel.org now. No problem.

David Birch (david-birch) wrote on 2009-06-15:

#128

This is the worst ubuntu bug i have ever struck - i just discovered my backup of my root part was no good due to being made from a usb boot of 9.04, and now my real root partition is gone too. this bites big time. I have requested some update to the release notes...

I feel pretty stuffed here now as to whether any of my data touched since upgrade is any good...

Daniel J Blueman (danielblueman) wrote on 2009-06-15:

#129

If this truly is a software bug, our best shot at addressing this bug is via bisection, from lack of specific knowledge.

There are enough reports that we can consider the ubuntu kernel 2.6.28-9(.31?) good, and we know (at least) ubuntu kernel 2.6.28-11.42 is bad. We can rebuild intermediate kernels from [http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-jaunty.git;a=tags].

First, it may be easier to see if we can bisect using the mainline kernels, since they are prebuilt at [http://kernel.ubuntu.com/~kernel-ppa/mainline/]. We get the ubuntu to mainline mapping from [http://kernel.ubuntu.com/~kernel-ppa/info/kernel-version-map.html].

-> ubuntu 2.6.28-9.31 is based on mainline 2.6.28.7 ("A")
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.28.7/linux-image-2.6.28-02062807-generic_2.6.28-02062807_amd64.deb

-> ubuntu 2.6.28-11.42 is based on mainline 2.6.28.9 ("Z")
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.28.9/linux-image-2.6.28-02062809-generic_2.6.28-02062809_amd64.deb

** Can at least two people who can reproduce this issue with 2.6.28-11.42, double-check they can/can't reproduce it with kernel Z please? ** We'll know if it's ubuntu patches/configuration, or upstream then.
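
The bisection itself is just repeated halving of the range between a known-good and a known-bad build. A minimal sketch, assuming the candidate builds are kept in an ordered list and addressed by position; bisect_next is a hypothetical helper for illustration, not an existing tool.

```shell
# Print the position to test next, halfway between the last known-good
# and the first known-bad position. Returns non-zero (and prints nothing)
# when the two are adjacent, i.e. the first bad build has been found.
bisect_next() {  # usage: bisect_next GOOD_POS BAD_POS
    if [ $(( $2 - $1 )) -le 1 ]; then
        return 1
    fi
    echo $(( ($1 + $2) / 2 ))
}
```

With nine builds where position 1 is "A" (good) and position 9 is "Z" (bad), bisect_next 1 9 says to test build 5. If that one turns out bad, the next call is bisect_next 1 5, and so on until the function returns non-zero.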

quixote (commer-greenglim) wrote on 2009-06-15:

#130

fwiw, since I moved to kernel 2.6.29.02062904 (which is a bit later than either "A" or "Z"?) I haven't had any problems. No crashes, no instability, even after repeated suspends.

Richard Huddleston (rhuddusa) wrote on 2009-06-17:

#131

I saw my data corruption issues on kernels:
ubuntu kernel 2.6.28-11-server
mainline kernel 2.6.29-02062904-generic
mainline kernel 2.6.30-020630-generic

i've seen issue on both SATA and USB attached disks. also, doing MD5sum checks of the install cd fails on that machine ... but same disk does not fail on other machines.

unfortunately, in attempting to diagnose my problem, i upgraded to a new intel bios released on 6/10 ... and it bricked my board. All attempts at loading a recovery bios have failed. i'm working with intel to get a new board. perhaps i had a bad board all along ?

Bobby (robert-rankin-jr) wrote on 2009-06-18:

#132

Am installing Kernel "Z" right now, will report back in the morning after leaving it running all night. Also, if there's still any doubt, it is a 64bit only issue. I installed 32bit ubuntu on the same computer and it works perfectly.

Bobby (robert-rankin-jr) wrote on 2009-06-18:

#133

02062809 "Z" seems to work for me. It's not usable in my case because the nVidia driver won't work, but that applies to any replacement kernels. Still, no filesystem errors here.

Siegfried Gevatter (rainct) wrote on 2009-06-18: Re: [Bug 346691] Re: 2.6.28-11 causes massive data corruption on 64 bit installations

#134

2009/6/18 Bobby <email address hidden>:
> 02062809 "Z" seems to work for me. It's not usable in my case because
> the nVidia driver won't work, but that applies to any replacement
> kernels. Still, no filesystem errors here.

Not sure what that "kernel Z" is but I'm using one of the vanilla
kernels provided by the Kernel Team and it works great (including the
NVIDIA kernel module, which is rebuilt through DKMS).

Bobby (robert-rankin-jr) wrote on 2009-06-18:

#135

It was just what Daniel asked to be tested in order to identify the problem. Never mind, though, installed the headers and everything works perfectly. No FS issues, and graphics are fine. Hope that helps.

Sebastian Kuzminsky (seb-highlab) wrote on 2009-06-18:

#136

lshw.txt (17.0 KiB, text/plain)

This bug bit me on a Dell Latitude D830 laptop (64-bit Core2 Duo CPU, ICH8 SATA controller).

Laptop had been very stable while running Intrepid for months.

Installed the Jaunty CD (9.04, from April 2009), which has the now-infamous 2.6.28-11 kernel. Got lots of silent filesystem corruption within days. :-(

Upgraded to linux-generic 2.6.28.13.17, problem appears to have gone away, though my disk is still corrupted :-(

I wish there was a note in the Jaunty release notes about this! And a new Jaunty install CD with the fixed kernel. I guess it's back to Intrepid and my most recent backup for me.

mecat (habdankm) wrote on 2009-06-18:

#137

notebook asus f6a
Linux 2.6.28-11
DISTRIB_DESCRIPTION="Ubuntu 9.04"
Intel(R) Core(TM)2 Duo CPU P8400
ICH9M/M-E 2 port SATA IDE Controller
i tried many configurations of filesystems.
i have lost whole lvm volumes (root xfs on logical volume). i tried to use ext4 on a raw partition with the same result - all data was lost.
xfs on a raw partition was the only filesystem configuration which was able to recover after a system crash. the system usually hangs after 10 minutes.
now i am testing 2.6.28-02062809-generic

mecat (habdankm) wrote on 2009-06-18:

#138

after the latest updates on
notebook asus f6a
Linux 2.6.27-14-generic x86_64
DISTRIB_DESCRIPTION="Ubuntu 8.10"
Intel(R) Core(TM)2 Duo CPU P8400
ICH9M/M-E 2 port SATA IDE Controller
i have the same problem as above. Maybe this information can be useful - on both versions of ubuntu i have big problems with acpi on the new kernel (the fn key to set brightness does not work, and i am unable to set brightness via proc-fs: echo 81 > /proc/acpi/video/VGA/LCDD/brightness).
On the older kernel version 2.6.27-9 there was no problem with acpi/brightness or file system corruption.

on ubuntu 9.04 with the 2.6.28-02062809-generic kernel on x86_64 the system still has not hung (2 hours later:-).

Tim McCormack (phyzome) wrote on 2009-06-19:

#139

The important question here is not "How do we fix this bug?" but "How do we prevent this sort of bug from occurring again?" Please take the time to read this excellent article about how software is written for NASA's shuttles, paying close attention to the part about "The Process": http://www.fastcompany.com/node/28121/print

The Ubuntu team's process failed to stop a major data corruption bug from being released, even though it was known about ahead of time. I'm not assigning blame to any one person; we are all affected by the process. So, what part of the process failed *us*? What needs to change about the way software is released in the Ubuntu project?

Lorant Nemeth (loci) wrote on 2009-06-19:

#140

Let's not go too deep into this NASA thingy... They are in a much easier situation from a HW platform point of view. I doubt there are too many kinds of space shuttles they have to support :) On the other hand, I agree that these kinds of problems should be discovered earlier; maybe gathering and registering testers with different kinds of HW would increase HW coverage and prevent such severe problems.

Bobby (robert-rankin-jr) wrote on 2009-06-19:

#141

While I agree that that discussion needs to take place, I really don't think that this is the place for that discussion. Maybe a forum topic would be more appropriate. Just my opinion.

Julius Schwartzenberg (jschwart) wrote on 2009-06-19:

#142

It seems I have found a solution which should work for at least Compal notebooks. I found this based on information I found in one of the duplicate bugs. It seems at least the ICH9 controllers have two modes of operation. One is called IDE compatible or no-AHCI mode and is enabled by default. This uses the ata_piix driver. If you manage to switch the controller to AHCI mode it should use the ahci driver instead, which does not seem to have any problems.

You can check what mode your controller is in using lspci. It will show either IDE or AHCI depending on what mode is set. You can also check the dmesg output for ata_piix or ahci.
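
A minimal sketch of that check as a shell function that reads lspci output on stdin. The function name is made up, and the matching assumes the SATA controller line contains the words SATA and either AHCI or IDE, which may not hold for every chipset description.

```shell
# Classify the SATA controller mode from `lspci` output on stdin.
# Prints AHCI, IDE, or unknown (when no matching SATA line is found).
sata_mode() {  # usage: lspci | sata_mode
    awk '
        /SATA/ && /AHCI/ { print "AHCI"; found = 1; exit }
        /SATA/ && /IDE/  { print "IDE";  found = 1; exit }
        END { if (!found) print "unknown" }
    '
}
```

Besides lspci | sata_mode, running lspci -k shows the kernel driver in use for each device, and dmesg | grep -i -e ata_piix -e ahci shows which of the two drivers claimed the controller at boot.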

Switching the controller to AHCI mode can be quite tricky. I was able to do it using a DOS program from Compal's site. Here are the instructions:
1. Make sure syslinux is installed
2. Get https://haar.student.utwente.nl/~julius/ahci.dsk.gz
3. Gunzip the archive and put it in /boot
4. Add the following lines to /boot/grub/menu.lst (copy the root line from your existing Ubuntu item; if your /boot is on a different partition, copy memdisk to /boot and put /boot/memdisk on the kernel line):
title           FreeDOS AHCI switch disk
root            (hd0,1)
kernel          /usr/lib/syslinux/memdisk
initrd          /boot/ahci.dsk
5. Boot the new boot item from grub (just press enter on the time & date prompt, there is no config.sys)
6. Run ahci_en

This should enable AHCI mode. You can verify this by running lspci or checking the dmesg output. I built this bootdisk myself based on FreeDOS and the program from Compal's website. You can also disable AHCI again with this disk.
I was able to do a clean Jaunty AMD64 install, do an update and run quite a few applications after I had done this. fsck doesn't find any problems at all after this and there were no strange crashes anymore.

I suspect that mainly Compal notebooks come with this mode set to no-AHCI, which is why there are so many reports of this problem with Compal notebooks.
For others without Compal notebooks with this problem, verify your hardware is in no-AHCI mode and try to switch it to AHCI mode through a similar tool. My bootdisk probably won't work.

I understand that AHCI mode also gives better performance, so this should be an optimal solution. The problem with the ata_piix driver in this kernel should still be considered a bug though.

giorgio130 (gm89) wrote on 2009-06-20:

#143

Has anyone tried the above solution? I'd try it but this is my only machine, and I'd like to preserve it from such corruption... :) Moreover, is this supposed to work with a Compal jhl90?

mirix (miromoman) wrote on 2009-06-20:

#144

A new kernel, 2.6.28-13, is available from the update repositories. Now I guess that when someone installs Ubuntu from the CD image, the kernel will be updated from 2.6.28-9 (which is known to be free from this bug) to 2.6.28-13 (which I hope will be bug-free as well). So, practically speaking, the problem should be solved, even if the bug remains.

The problem is that this bug was first reported when Jaunty was in alpha stage, then in beta, then RC and then with the stable version, and it took months for the Ubuntu developers to release a new kernel. The situation is particularly absurd because the problematic kernel was not the one released with the iso image, but an update.

I think that the huge delay that the Ubuntu community needed to react to such a critical bug, should make us reconsider the efficiency of the current structures and pipelines.

Julius Schwartzenberg (jschwart) wrote on 2009-06-20:

#145

2.6.28-13 is not likely to be bugfree. I checked the changelog and there is no change since the 2.6.28-11 kernel which seems to be related to this problem.

giorgio130, I suspect it will work on your system; I have a jhl91 myself, which appears to be very similar. In any case I would suggest enabling AHCI mode, because it should work better anyway.

giorgio130 (gm89) wrote on 2009-06-20:

#146

@Julius:
Well, it worked and even quite painlessly. I'll try the buggy kernel as soon as I've the time to make a decent backup. However, I don't think I'll switch back to 2.6.28.

quixote (commer-greenglim) wrote on 2009-06-20:

#147

This is just to second (and third and fourth and fifteenth!) the comments by mirix and Tim McCormack above.

It is essential that Ubuntu's processes for software release are able to prevent such appalling disasters in the future. It is a measure of how much goodwill Ubuntu has in the community that nobody took this story and ran with it. That goodwill won't last if this sort of thing happens.

I'm not saying the devs have to be perfect and never make a mistake.

What I'm saying is that when a potentially data-destroying bug is KNOWN to be present before the release, there need to be methods in place to prevent the problem.

mecat (habdankm) wrote on 2009-06-21:

#148

on ubuntu 9.04 with 2.6.28-02062809-generic kernel on x86_64 i do not have any problems with fs. I am using:
product: ICH9M/M-E 2 port SATA IDE Controller
configuration: driver=ata_piix latency=0
Is it possible to switch IDE mode to AHCI without reinstalling the system (i tried with windows and i wasn't able to boot it any more, even after switching back to piix)?
On this kernel, there are still a lot of bugs with ACPI (e.g. brightness/buttons/hdd suspending/power management) which weren't present on older kernels.

Julius Schwartzenberg (jschwart) wrote on 2009-06-21:

#149

Switching shouldn't cause any problems for installed systems. The kernel will automatically use the correct driver. At least it worked fine for me (also with my already installed system) :)

mecat (habdankm) wrote on 2009-06-21:

#150

You were right - it works in this mode. I will try new kernel.

mecat (habdankm) wrote on 2009-06-22:

#151

Linux inf16 2.6.28-13-generic #44-Ubuntu SMP Tue Jun 2 07:55:09 UTC 2009 x86_64 GNU/Linux
In AHCI mode system works correctly.

Richard Hansen (rhansen) wrote on 2009-06-25:

#152

2.6.28-13.44 (amd64, core i7, AHCI mode) does NOT work for me -- I also experienced filesystem corruption.

Theodore Ts'o (lead ext4 developer) has some patches in the 'for-stable-2.6.28' branch of his git repository (see http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=shortlog;h=for-stable-2.6.28) that are NOT integrated into 2.6.28-13.44. One of these is a fix for a filesystem corruption bug (see http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commit;h=16cb5dd9f53e569130584696909d423b6fe38c1e) which may or may not be related to what everyone is reporting.

According to the commit message, this bug has been in ext4 for a very long time, but people seem to only be experiencing this bug in 2.6.28-11 and 2.6.28-13. Also, this bug is not specific to 64-bit systems yet people seem to only be having problems on 64-bit platforms. However, because it's a concurrency bug, it could be that the bug is only tickled under certain circumstances and some change between 2.6.28-9 and 2.6.28-11 is causing the bug to be tickled much more regularly on 64-bit systems.

Thus, I'm hopeful that this patch will fix the problem for everyone, but I'm not confident. This patch is scheduled to be included in the next Jaunty kernel release (see bug #389555).

In the meantime, I added all of tytso's patches to 13.44 and uploaded the resulting package to my PPA. I basically took the difference between git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git ('for-stable-2.6.28' branch) and git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.28.y.git ('master' branch) and applied it to git://kernel.ubuntu.com/ubuntu/ubuntu-jaunty.git ('Ubuntu-2.6.28-13.44' tag). If you're feeling adventurous and want to try it out, go to https://launchpad.net/~a7x/+archive/ext4fixes and follow the install instructions.
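
Roughly, the "take the difference and apply it" step could be sketched as below. This is only an illustration under assumptions: it supposes all three trees have already been fetched into one local repository, the function name is made up, and it is not necessarily how the PPA package was actually produced.

```shell
# Apply the difference between BASE_REF and FIX_REF to the currently
# checked-out tree, staging the result in the index.
# Must be run inside the target checkout with a clean working tree.
apply_branch_diff() {  # usage: apply_branch_diff BASE_REF FIX_REF
    git diff "$1" "$2" | git apply --index
}
```

With the Ubuntu-2.6.28-13.44 tag checked out, this would be called with the stable 'master' branch as BASE_REF and the 'for-stable-2.6.28' branch as FIX_REF, under whatever remote names were chosen when fetching. git apply refuses the whole patch if any hunk does not fit, so conflicts with Ubuntu's own changes would have to be resolved by hand.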

giorgio130 (gm89) wrote on 2009-06-25:

#153

@a7x: this is not related only to ext4. Corruption comes with ext3, reiserfs as well.

feanorknd (regino-m) wrote on 2009-06-26:

#154

@giorgio130: Ok.. there is ext3, reiserfs, etc, problems.... but for now, there are fixes/patches created by lead developer of ext4 due to tested data corruption problems, available to fix current 2.6.28 kernel at jaunty, and directly applied to 2.6.29 and 2.6.30 kernels. So......... there is a ext4 related data corruption problem, and there are patches for it.. why not include them into a new 2.6.28 release????

Thanks.

Julius Schwartzenberg (jschwart) wrote on 2009-06-26:

#155

@feanorknd: These patches may all be valid, but this is unrelated to this bug. This bug is about data corruption that occurs on ext3, reiserfs and other non-ext4 filesystems. Anyone with ext4 problems should not post here if they cannot reproduce this with ext3. Do a proper search for the appropriate bug or create a new bug.

Graziano (graziano-giuliani-gmail) wrote on 2009-06-26:

#156

What we know:

*) The bug is NOT FS related: we have reports so far for jfs, xfs, reiserfs, ext3 and ext4
*) Apart from one doubtful case, it is x86_64 related, but concurrency is not the issue (it persists even when going UP).
*) It is FOR SURE hardware related, the ata_piix driver being the most probable candidate to host the bug.
*) It is critical for a lot of people, as it can trigger complete loss of data stored on disk

About the method:

-) I can be considered responsible for not pointing out the problem clearly at first. I apologize. My faults are:
*) I named ext4 in my first post, at a moment when lots of bugs were being reported and solved on that filesystem, and this for sure lowered the perceived importance of the bug.
*) I at first titled the bug "kernel 2.6.28-11 kernel destroy system", which was for sure an overshoot I apologize for, and which made developers think I was a N00b playing with his shiny new toy.
-) I nevertheless consider the forced inclusion of the ext4 filesystem in jaunty the root problem here: it triggered lots of bugs in a critical timeframe (just weeks before the release), and the patches to make it work in a kernel not ready for it possibly triggered bugs in other drivers considered stable.

just my 2€c

Manoj Iyer (manjo) wrote on 2009-07-09:

#157

Is this still an issue with Jaunty? iirc this bug was opened early in the dev stage of jaunty; Jaunty proposed is at 2.6.28-14.46. Can someone verify that this is fixed in Jaunty so that I can close this bug as fix-released?

Thanks a ton

Julius Schwartzenberg (jschwart) wrote on 2009-07-09:

#158

Probably the bug is still there as long as there are no ata-piix related changes in the changelog. Other than that it appears data corruption bugs and system stability aren't very high priority with the Ubuntu devs :S
Even with my controller in AHCI mode, I've experienced several different types of crashes & hangs with the AMD64 kernel (no data corruption anymore though!). I'll be trying out Karmic soon on my hardware, I guess we should hope Karmic won't be such a catastrophe as Jaunty.

Richard Hansen (rhansen) wrote on 2009-07-10:

#159

It may not be the ata_piix driver -- I set my controller to AHCI mode before I even installed Jaunty, yet I experienced filesystem corruption. My corruption could have been caused by an unrelated bug, however.

I haven't had any problems since I upgraded to a backported Karmic kernel. If anyone else wants to try the Karmic kernel, you can get it from my PPA at <https://launchpad.net/~a7x/+archive/kbp>. I will try to keep it up-to-date as newer versions of the Karmic kernel are released.

Hospik (jmhospers) wrote on 2009-07-10:

#160

I have not dared to try the 2.6.28-14 kernel with the ata_piix driver because I did not see any related changes in the changelog either. However, after switching my HP/Compaq 8510w (through the BIOS) to AHCI mode (or native mode as HP calls it) I did upgrade to 2.6.28-14 and have not noticed any crashes/corruption since!

Currently running an up-to-date Jaunty...

Dan Halbert (dhalbert) wrote on 2009-07-13:

#161

@Manoj (2009-07-09): I believe this bug is still present in 2.6.28-13.44, the latest released kernel. Is there any reason to expect it is fixed in 2.6.28-14.46 (in jaunty proposed)? Do you have a specific patch in mind?

A colleague had these symptoms on a Dell E6500 (has ICH10 and Nvidia graphics). The problem is not as pervasive as that of some of the original reporters, but there has been significant filesystem corruption every few days. We have updated his kernel to a 2.6.29 kernel just today, and hope to see an improvement.

Zakhar (alainb06) wrote on 2009-07-13:

#162

This bug is really a show-stopper.
I can't believe the Ubuntu team wants to close this report even if the bug is still there (see 5 posts above)

This has stopped me from upgrading to Jaunty, and I bet this bug will stay uncorrected up to Karmic, where it will (hopefully) disappear with the .30 kernel.
Ubuntu team, if people stopped complaining and reporting about this bug, it is simply because they lost hope that someone really knows what's going on here and will have time to fix it. I don't blame anyone, I know such bugs are hard to fix... if you would just admit it, we will simply forget this doomed version of Ubuntu.
Isn't there a status: "we won't correct that" (although it is said to be critical!)?

Such bugs really tear down the good image of Ubuntu... especially when you know the bug was ("supposedly") corrected upstream.
I hope you will consider delaying the next release if such critical bugs happen again... a good thing for that would be to release at the beginning of the month, which would give some time to handle such things.

kikvors (kikvors) wrote on 2009-07-13:

#163

If you add up the amount of time lost by people having lost data due to file corruption with this bug, it will probably outweigh the effort needed to fix this.

Johan Sköld (johan-skold) wrote on 2009-07-22:

#164

Note that the bug has already been fixed in later kernel versions, it's just
the ISO that hasn't been updated. Even though it really should be updated,
the bug will not persist 'til Karmic.

quixote (commer-greenglim) wrote on 2009-07-22:

#165

Just two things for what it's worth:

Fact: I've been using the 2.6.29-04 kernel with up to date 64-bit Jaunty on ext3 for a couple of months now with no problems at all.

Opinion: I am still horrified, appalled, perplexed, and angry that there are neither any warnings on LiveCD iso downloads for 64-bit, nor an update for the kernel shipped with the iso. I'm starting to get the impression that whoever makes those decisions at ubuntu doesn't think it's very important if I lose data. I know I'm repeating myself, and it amazes me that in this community that should even be necessary, but that is Not Good.

mosgjig (mosgjig) wrote on 2009-07-23:

#166

I came across this issue and found that the solution proposed by Lorant Nemeth on comment #122 worked, though with a slight twist. I was unable to install a fresh copy of intrepid because the liveCD could not mount the swap (too lazy to investigate after dealing with this mess), therefore I just installed the liveCD Jaunty 64bit and followed the instructions. So far so good, installed this morning at work and been gradually re-installing all kinds of apps and goodies with frequent reboots.

My specs

Asus M51Sn
4GB Ram
Intel Core2 Duo T8300 @ 2.4GHz
GeForce 9500m GS

Following the instructions, went from Jaunty amd64 live cd with kernel 2.6.28-11-generic to .28-13-generic to .29-02062904-generic before rebooting from install.

If ya don't hear from me in a couple of days, then take it as A solution to this ridiculous bug that ate 5 hrs of me-life. But seriously, other than these minor glitches, good work on the dist, hope to find some time and contribute some code one of these days.

Godspeed!

Thomas Aaron (tom-system76) wrote on 2009-08-05:

#167

Could we please get an update on the prospects for fixing this bug?
It's been about two weeks since the above post.
Is it fixed in the *-14 kernel?

This thing is wreaking havoc on a lot of our older systems, and possibly a couple of our newer ones. Not only is it destroying data, it's destroying profit.

If there is any information we can add to help, please let us know.

Best Regards,
Tom
System76
<email address hidden>

Revision history for this message

swordthower (mnrjj) wrote on 2009-08-17:

#168

I have successfully applied the fix in #122 as well. I have an ASUS N80Vb laptop. Everything seems to be working, and I have had no crashes or fs corruption after several reboots.

Fingers crossed...

Revision history for this message

mirix (miromoman) wrote on 2009-08-17:

#169

The bug is fixed in Karmic Koala alpha 4 (kernel 2.6.31 RC5) and the 2.6.30 family is also bug-free in this respect.

Paradoxically, Koala seems faster and less bloated than Jackalope ;-)

Revision history for this message

SecuGuru (christopherthe1) wrote on 2009-08-26:

#170

This bug still exists in the 2.6.28-14 amd64 kernel...interestingly, it didn't manifest in my system until I upgraded my RAM from 2GB to 4GB.

My system was down for mobo RMA (bad voltage reg) for the last 3 weeks. Was running fine since Jaunty first went live when I disassembled for RMA, root filesystem on ext3 partition.

The first indication of a problem came after I booted and let the update manager run...installation of the 2.6.28-15 kernel image keeps failing due to corrupted tarfile errors. Repeatedly tried to download it...some succeed but throw corruption error on unpacking, other attempts fail outright with 'package checksum mismatch.' I pulled it down manually via wget (~24MB), but the md5sum didn't match the published value for the package. Then I re-ran it and got a different value! And kept getting different values on subsequent md5sum runs.

My system is dual-boot with XP, so I switch to Windows and wget the .deb package again onto my NTFS partition. This time, md5sum returns the published value. Reboot using Ubuntu 8.10 (Intrepid, 32-bit) live CD and mount the NTFS partition read-only. I ran fsck on the ext3 partition, but it aborts as clean...so I force fsck and all checks OK. I run md5sum again on the package, it returns the expected value. I mount the ext3 partition and do a 'cp -av' to copy it to /var/cache/apt/archives for the update manager.

Here's where it gets fun. After the copy, I check the md5sum, and it's wrong. So, I check the md5sum on the original copy of the file on the read-only mounted NTFS partition...it's wrong too! WTF?

I reboot to WinXP and check the md5sum on the packages...the copy on the NTFS partition that was downloaded under XP returns the correct MD5 value, but the copy I made to the ext3 partition under the Intrepid Live CD is wrong. (I mount my ext3 partitions in WinXP using an ext2/3 volume manager) I delete the ext3 copy and use 'copy /b /v' to copy the package again from NTFS to ext3. This time, under WinXP the md5sum returns the correct value for both copies.

SUMMARY:
Problem doesn't seem to manifest until 4G RAM installed
Problem exists in both 32-bit and 64-bit kernels (errors under both Ubuntu 8.10 i386 Live CD and 9.04 AMD64 HDD installation currently on 2.6.28-14 kernel)
Problem DOES NOT manifest under WinXP boot, making it very unlikely hardware is the cause

SYSTEM SPECS:
Asus A8N-E mainboard (nForce4 Ultra)
Opteron 185 CPU (dual-core)
4GB Patriot DDR-400 (PC3200) SDRAM, CAS 2
Maxtor 1TB SATA-II HDD
**p1 = 250GB, NTFS
**p2 = 8GB, ext3
**p3 = 680GB, ext3
**p4 = 2GB, swap
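
For anyone who wants to repeat the copy test from the comment above without typing md5sum by hand each time, here is a minimal sketch in Python. It hashes the source, copies it, then hashes both the copy and the source again, so you can see at which step the checksum stops matching the published value. The function names and the dictionary keys are made up for illustration, and `expected_md5` is whatever value the package archive publishes for your file. It is only a convenience wrapper, not a diagnosis of the bug.

```python
import hashlib
import shutil


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination, expected_md5):
    """Copy a file and report at which step the checksum matches the expected value.

    Hypothetical helper: the three keys mirror the three md5sum runs
    described in the comment (before the copy, on the copy, on the
    original again).
    """
    before = md5_of(source)
    shutil.copyfile(source, destination)
    return {
        "source_before_copy": before == expected_md5,
        "destination": md5_of(destination) == expected_md5,
        "source_after_copy": md5_of(source) == expected_md5,
    }
```

On a healthy system all three values should be True. A False for "source_after_copy" on a read-only mount would match the behaviour reported above, where the untouched original suddenly hashed differently.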

Revision history for this message

SecuGuru (christopherthe1) wrote on 2009-08-26:

#171

Addendum to previous comment's System Specs:

512MB nVidia 9800GT graphics card

Revision history for this message

SecuGuru (christopherthe1) wrote on 2009-08-26:

#172

Installed 2.6.29 kernel per workaround suggestion (http://ubuntuforums.org/showpost.php?p=7382178&postcount=29) to no avail.

Data corruption appears to manifest only in files 8MB or larger. Attempting to update package ia32-libs via update manager results in failed download (hash mismatch). Using wget to pull the package manually results in different MD5 sums each time.

Same file downloaded under WinXP checks out with the correct MD5 sum every time.
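
The "different MD5 sum each time" symptom can be checked in a loop. The sketch below, in Python, hashes the same file several times and counts how many distinct digests come back. The names `repeated_digests` and `is_stable` and the default of 5 runs are arbitrary choices for illustration. One caveat, stated as an assumption: repeated reads may be served from the page cache rather than the disk, so a stable result does not prove the disk path is fine, while an unstable result does show that something is corrupting the data between storage and the hashing process.

```python
import hashlib
from collections import Counter


def repeated_digests(path, runs=5, chunk_size=1024 * 1024):
    """Hash the same file `runs` times and count how often each digest appears."""
    seen = Counter()
    for _ in range(runs):
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        seen[digest.hexdigest()] += 1
    return seen


def is_stable(path, runs=5):
    """True when every run produced the same digest."""
    return len(repeated_digests(path, runs)) == 1
```

Running `is_stable` on files just below and just above 8MB would be one way to test the size threshold mentioned above.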

Revision history for this message

chastell (chastell) wrote on 2009-08-27:

#173

Thanks for the detailed testing, SecuGuru. Can you try with 2.6.30 mainline kernel?
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.30.5/

Revision history for this message

Daniel J Blueman (danielblueman) wrote on 2009-08-27:

#174

SecuGuru: can you try to reproduce the problem, booting separately with 'iommu=soft', 'iommu=off', 'mem=2G' please?

Each time, it's worthwhile catching the IOMMU settings with 'dmesg | grep -i iommu' after bootup.

Dr Emixam (dr.emixam) on 2009-10-12

Changed in linux (Ubuntu):
status:	Triaged → In Progress
status:	In Progress → Confirmed

Revision history for this message

Zakhar (alainb06) wrote on 2009-12-18:

#175

I withdraw from this list. As I forecast 5 months ago (post 162), this bug is still unfixed and now Karmic is out. So I'm no longer waiting for a fix, and I'm skipping directly to Karmic 64, which is an awesome version.

Keep up the good work!

Revision history for this message

Dmitry Diskin (diskin) wrote on 2009-12-18:

#176

So, none of the kernel updates for Jaunty fixed it? Scary. I moved to a mainline kernel, since I was not able to work on my new laptop because of that bug. And I'm still on mainline, now 2.6.31-02063107-generic. I do not see a way to test other kernels, because that could trash my system. If only I had a spare system to dual boot...

Revision history for this message

raketenman (sesselastronaut) wrote on 2010-01-15:

#177

I can confirm this bug with a 2.6.31-9-rt kernel.
My dmesg:
[27216.779223] EXT4-fs error (device sda3): ext4_add_entry: bad entry in directory #859924: directory entry across blocks - offset=0, inode=3633236108, rec_len=180364, name_len=142
[27216.779231] Aborting journal on device sda3:8.
[27216.779448] EXT4-fs (sda3): Remounting filesystem read-only
[27216.780388] EXT4-fs error (device sda3) in ext4_delete_inode: Journal has aborted
[27216.780393] EXT4-fs error (device sda3) in ext4_create: IO failure
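
When comparing reports like this one across machines, it can be handy to pull the device, the kernel function and the message out of each "EXT4-fs error" line. The Python sketch below does that with a regular expression. The pattern is an assumption based only on the two line shapes visible in the dmesg excerpt above ("(device X): func: msg" and "(device X) in func: msg"), so other kernel versions may print lines it does not match. The name `parse_ext4_errors` is made up for illustration.

```python
import re

# Assumed pattern, derived only from the lines quoted above.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\)(?::| in) "
    r"(?P<function>\w+): (?P<message>.*)"
)


def parse_ext4_errors(log_text):
    """Return one dict per 'EXT4-fs error' line found in `log_text`."""
    errors = []
    for line in log_text.splitlines():
        match = EXT4_ERROR.search(line)
        if match:
            errors.append(match.groupdict())
    return errors
```

Lines such as "Aborting journal" or "Remounting filesystem read-only" are deliberately skipped, since they are consequences rather than the error itself.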

Revision history for this message

Daniel J Blueman (danielblueman) wrote on 2010-01-15:

#178

@raketenman, please post this, along with system details, to <email address hidden>; posting here isn't going to help.

Revision history for this message

raketenman (sesselastronaut) wrote on 2010-01-15:

#179

lshw.txt (18.6 KiB, text/plain)

Attached is the lshw output for the 2.6.31-9-rt kernel.

Revision history for this message

raketenman (sesselastronaut) wrote on 2010-01-15:

#180

Thanks for the hint, Daniel. The mail is on its way!

Revision history for this message

Marco Pallante (marco-pallante) wrote on 2010-02-12:

#181

lshw-marco.txt (12.8 KiB, text/plain)

Hi everybody. I'm coming here after a lot of searching about fs corruption in Ubuntu. The original poster's description seems to apply very well to my situation.

"Suddenly", already running apps start to segfault, while newly started ones usually report some error with shared objects (missing, not loadable due to header problems... I can't recall the exact messages). I got no data loss, maybe because when these errors start to show I shut down the system as quickly as possible. Usually shutdowns fail and sometimes I'm able to see the EXT3-fs error messages (similar to those in #177).

These are the BAD NEWS. Ubuntu is 9.10 Karmic, and:
$ uname -a
Linux frank 2.6.31-20-generic #57-Ubuntu SMP Mon Feb 8 09:05:19 UTC 2010 i686 GNU/Linux

My system is a 32-bit, 6-year-old Acer laptop. I can't remember when the bug first showed up, but it was certainly there with release 2.6.31-16 (or -15).

Now the GOOD NEWS (I hope). It seems I'm able to REPRODUCE IT!

Some weeks ago I tried to run AC3D 6.5, a (non-free) 3D modeler, and after some minutes exploring it, closed it and got a strange error: "Unable to save configuration file in xxx" (or similar). Very peculiar, but the system seemed OK, so I forgot about it and went on; after a few minutes errors were so frequent that I had to shut down the system and fsck the disk from a Live USB.

I tried that software several more times, always getting the same problem, even after kernel upgrades. So I gave up on it and decided to give K-3D a try. I downloaded and started it. After the splash screen showed, the software had a segmentation fault AND the fs got corrupted once again. After another kernel upgrade, I repeated the test and got the same system failure. Then I tried Blender and... surprise, I experienced the bug once again. I tried another (let's say it) OpenGL (non-free) program, which I had started and operated successfully some months ago, and the bug was there.

From my limited experience, I can conclude that any time I start some OpenGL 3D software, this bug shows up. Otherwise, I can keep my system up for days without any problem. Please note that I have Compiz disabled, because it has some glitches with Java Swing applications, and no screensaver running. Moreover, if I remember correctly (I must check), I had a mesa or radeon driver update in the past few months, between the last working execution of an OpenGL app and the first appearance of the bug.

I'm going to test with some other 3D app just to see whether that path goes anywhere.

I hope this report will help you.

Bye,
Marco