hardy random freeze

Bug #227882 reported by Joe
Affects: linux (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I posted this problem in bug number 204996, but my hardware differs from the original poster's, so I am reporting it separately.
I have a desktop computer with the following hardware:
Gigabyte 7ZXE Mobo
AMD Athlon XP 2000+
1024 MB RAM
ATI Radeon 8500
SMC1255TX PCI Ethernet Adapter
WDC WD2500JB, ATA Hard drive
SONY DVD RW DRU-720A, ATAPI CD/DVD-ROM
SONY CD-RW CRX320E, ATAPI CD/DVD-ROM

This computer runs fine with Feisty Fawn (Kubuntu 7.04). It is my personal web server, so I leave it on for months at a time, only shutting it down to swap hardware. It freezes within minutes when booted from the Ubuntu Hardy LiveCD or from a Hardy hard disk install. By "freeze" I mean that it is possible to move the mouse pointer but not to use the keyboard. CTRL-ALT-DEL and CTRL-ALT-Backspace do nothing.
There does not seem to be any pattern to the crashes. Once it will freeze while I'm browsing with Firefox. The next time while I'm trying to read log files. The next time while it's sitting there doing nothing.
I posted my dmesg, lspci and version here:
https://bugs.launchpad.net/ubuntu/+bug/204996/comments/110
I posted the logs from a crash here:
https://bugs.launchpad.net/ubuntu/+bug/204996/comments/118
Here are the events leading up to one crash (ten minutes of uptime):
https://bugs.launchpad.net/ubuntu/+bug/204996/comments/131
And here is the next boot (58 seconds of uptime!):
https://bugs.launchpad.net/ubuntu/+bug/204996/comments/132

Revision history for this message
Joe (fullmitten) wrote :

I installed Fedora 9 on this same machine (on a different 40 GB hard drive) and have had no problem with freezes in over an hour of uptime. This is much better than the ~10 minute uptime I was getting with Hardy.
I know it uses a newer kernel and patches (2.6.25-14.fc9.i686 vs. 2.6.24-4.6-generic for Ubuntu Hardy) so perhaps the fix is in there somewhere?

geofs (geof) wrote :

I also experience random freezes, mostly while using Firefox 3 beta 5 but also while playing videos (AVI, DVD, Flash...). When a freeze occurs, I get the same symptoms the original post describes. I noticed something interesting: when the system freezes while I am using Firefox 3 beta 5, the little animated "loading" icon in the upper right corner stops, but while I move the mouse pointer the icon animates again! Another thing: when I hit Ctrl+Alt+Backspace, nothing happens, as mentioned above. But after that, when I hit SysRq+S, X "initiates" its shutdown and all that remains is a black screen and the mouse cursor. A last observation: I use Yakuake (a terminal that scrolls down from the top of the screen when you hit a key, usually F12), and when I hit F12 the terminal scrolls down but I can't type anything in it. I'm using an up-to-date Kubuntu 8.04 with KDE 3.5.9 (and some KDE 4 packages).

Joe (fullmitten) wrote :

I saw that the original bug was fixed with the 2.6.25 kernel being prepared for Intrepid Ibex 8.10. I followed the instructions posted in this comment:
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/192
I've got over 6 hours of uptime in Hardy with the 2.6.25 kernel, so it appears that whatever the problem is, it is fixed in the newer kernel.
Fedora 9 is also using a 2.6.25 kernel and I'm not seeing the lockup with it either.

geofs (geof) wrote :

Thanks for pointing that out. I'll try 2.6.25 ASAP.

Sergio Callegari (callegar) wrote :

Hi,

I am also experiencing random freezes with kernel 2.6.24-18 on a rather old

asus a7v333 MB
AMD athlon CPU
1 GB memory
Nvidia GeForce2 (MX400)
ATA HD (with LVM)
wired internet

i.e. a hardware set rather similar to the one above.

I must say that I am using
- nvidia-glx
- smart link modules (slusb)
- Virtualbox (sun edition)

that may all interact with the kernel.

In any case, I would tend to rule out wireless and SATA as the cause of the freezes, as has been suggested, although they may make the bug more evident on some hardware. I wonder whether the problem only affects 32-bit users (this would explain why a large part of the kernel team and of server users have not noticed it).

I am currently trying the -rt kernel as suggested by some. I do not yet feel confident enough to say that it solves the problem. But it certainly does not allow the slmodem drivers to be compiled as modules.

The "intrepid" kernels pointed to by https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/204996/comments/192 do not seem to exist.

Sergio Callegari (callegar) wrote :

Apparently the -rt kernel eliminates the freeze for me. However, it performs very poorly (non-smooth scrolling, etc.) and it does not allow some modules to compile (e.g. slmodem).

Could someone please be so kind as to either:

- backport 2.6.25 to hardy (provided 2.6.25 does not freeze), since the intrepid kernel advertised above is sadly unavailable; or, even better,
- make the latest gutsy kernel (which is known to work just fine) available and maintained (from a security point of view) in the hardy repo.

In any case, please provide a working generic kernel for hardy asap: the supposedly long term support distribution has now been showing a critical bug for more than a month.

Sergio Callegari (callegar) wrote :

2.6.24-19 in the hardy proposed repo _does not_ fix the issue

Sergio Callegari (callegar) wrote :

2.6.24-19 as recently provided through the hardy updates _does not_ fix the issue. Using the slower -rt kernel remains the only solution to the freezes and lockups.

Sergio Callegari (callegar) wrote :

In the end I resorted to experimenting myself with vanilla 2.6.25.9.

Spent a night compiling it and the nvidia stuff.

It works without any problems and it is stable.

Hence:

- either 2.6.24 was a very poor choice for Hardy because the upstream 2.6.24 is the cause of the random hard lockups;
- or the upstream 2.6.24 is fine and the ubuntu patches break it.

In either case I believe that this situation is getting weird:

1) Everybody at this point knows that the hardy kernels cause random freezes to many users;
2) Everybody knows that these freezes are subtle, as they may happen only every few hours, so after setting up a system it is impossible to say for sure whether it is affected by the bug without many hours of testing;
3) Everybody is avoiding upgrades from gutsy to hardy just because of that. Note that even if the risk of being hit by the bug is low (say 1%), there is a multiplicative effect here: 50% of potential users will stay away from hardy just because of the risk, just as with insurance, where if some accident statistically hits 1% of the population, more than 50% decide to buy a policy.
4) The solution of the problem is known since vanilla 2.6.25 fixes the bug.
5) But Ubuntu in 2 months (33% of the release cycle time of hardy itself) has deliberately released no fix since
- it is forbidden by the policy to release 2.6.25 for hardy
- no one seems to know how to patch 2.6.24 to make it behave correctly like 2.6.25

So the policy seems to work very badly here: backporting the fix to 2.6.24 seems much more troublesome than releasing 2.6.25.
It looks like the meaning of "long term support" is that the policy allows support to arrive only in the long term.

The only way out of this apparent deadlock is for some Ubuntu kernel developer to independently deliver a 2.6.25 kernel with the ubuntu modules and restricted drivers through a PPA, so that we can bypass the policy and have a fixed and reasonably maintained kernel for hardy.

Joe (fullmitten) wrote :

Intrepid Ibex Alpha 1 (http://www.ubuntu.com/testing/intrepid/alpha1) appears to work fine on my hardware (Ubuntu Server install).
12 hours of uptime so far with no crashes.

Sergio Callegari (callegar) wrote :

2.6.24.19.36 as recently pushed through the security channel has not fixed the freeze issue (and quite reasonably, since it was not its purpose I believe).

Ubuntu bug https://bugs.launchpad.net/ubuntu/+bug/1 possibly aggravating.

Specifically, solution step #3 in bug 1 gets severely hindered by the requirement to compile your own kernel to achieve stability on some PCs.

Joe (fullmitten) wrote :

Further testing with Intrepid gave me random locks, though it is much more stable than Hardy. Hardy will lock up unattended. Intrepid requires user activity before it will lock up, and even then it will run for an hour or so.
I tried swapping out my video card, ATI Radeon 8500, with an ATI Rage 128 (don't laugh, it's the only spare I've got) and got 80 minutes of uptime with Hardy and 2.5 hours in Intrepid with that combination of hardware.
Perhaps I'm seeing a problem with the video driver, like the one described in Bug #141551. I still don't see any errors in the logs.
Any pointers as to how to narrow down the problem to the card, driver or kernel?
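
Since the logs show no errors, one low-tech way to compare card, driver and kernel combinations might be to record how long each one stays up before it locks. The following is only a sketch: the file name `heartbeat.log`, the 10-second interval and the function names are all hypothetical, and it assumes the disk still accepts writes right up to the freeze. It appends a timestamp at a fixed interval and forces it to disk, so that after a hard lockup the last line shows roughly when the machine stopped.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp to `path` and force it onto the disk."""
    if now is None:
        now = datetime.now(timezone.utc)
    with open(path, "a", encoding="utf-8") as f:
        f.write(now.isoformat() + "\n")
        f.flush()
        os.fsync(f.fileno())  # so the line survives a hard lockup


def last_heartbeat(path):
    """Return the last timestamp in `path`, or None if missing or empty."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return datetime.fromisoformat(lines[-1]) if lines else None


def run(path="heartbeat.log", interval=10):
    """Write a heartbeat every `interval` seconds until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Calling `run()` right after boot and then reading `last_heartbeat("heartbeat.log")` from a stable install after the next lockup would give an approximate time of the freeze for each hardware and kernel combination tested.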

Joe (fullmitten) wrote :

Just in case the post above is not clear, I did not get any lockups using the ATI Rage 128 card. I used each distribution for the specified amount of time, then rebooted back to my stable Kubuntu 7.04 when I was finished.

Sergio Callegari (callegar) wrote :

I do not want to say it too loud (yet), but 2.6.24-20-generic from the proposed updates seems to finally fix the freeze for me!

I am sure that it was something subtle, so my compliments to whoever finally got it!!!

Thanks

Sergio

Joe (fullmitten) wrote :

I tested the 2.6.24-20-generic kernel and it does not solve my hang.
Interesting tidbit: I could not update the kernel using my ATI 8500 video card. Not enough uptime. Change to my Rage 128 and voilà! No problem with hangs and I can update the kernel.

Joe (fullmitten) wrote :

If I use my ATI 8500 card and turn off DRI in xorg.conf, I don't have any problems with hangs (Intrepid uptime over 8 hours!), so my problem is probably similar to Bug #141551.
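
The comment above does not say how DRI was turned off. As a rough sketch for anyone comparing configurations, the check below assumes one of two spellings, `Option "DRI" "off"` in the Device section or `Disable "dri"` in the Module section. Both spellings and the function name `dri_disabled` are assumptions for illustration, not taken from this report. It only inspects the config text; the Xorg log remains the authority on whether direct rendering was actually enabled.

```python
import re

# Assumed ways of switching DRI off; the thread does not say which was used.
_OPTION_OFF = re.compile(r'^\s*Option\s+"DRI"\s+"(off|false|no|0)"', re.IGNORECASE)
_MODULE_DISABLE = re.compile(r'^\s*Disable\s+"dri"', re.IGNORECASE)


def dri_disabled(conf_text):
    """Return True if any uncommented line matches one of the assumed spellings."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]  # ignore everything after a comment marker
        if _OPTION_OFF.search(line) or _MODULE_DISABLE.search(line):
            return True
    return False
```

For example, `dri_disabled(open("/etc/X11/xorg.conf").read())` would report whether the file on a given install contains either line.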
