linux freezes completely since 2.6.28-7

Bug #338645 reported by Martin
This bug affects 2 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Won't Fix   Undecided    Unassigned
  Nominated for Jaunty by Aurius Bendikas Chang
xorg (Ubuntu)    Invalid     Undecided    Unassigned
  Nominated for Jaunty by Aurius Bendikas Chang

Bug Description

Binary package hint: linux-image

Since linux-image-2.6.28-7-generic my system often completely freezes:
- screen is completely frozen, even the mouse
- audio playback stops
- keyboard is unresponsive (Num Lock doesn't change the LED state)
- sysrq doesn't work

In /etc/sysctl.conf I have:
kernel.panic_on_oops = 1
kernel.panic = 10

but after the freeze the computer never reboots. I see nothing unusual in dmesg or in the other logs.
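
A minimal sketch for double-checking that the two settings above are actually live in the running kernel, assuming a Linux system where sysctl keys are exposed under /proc/sys. The helper names (parse_sysctl_conf, proc_path, running_value) are made up for illustration and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Return {key: value} for the 'key = value' lines of sysctl.conf-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' carry no settings.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a key such as kernel.panic to its file under /proc/sys."""
    return Path("/proc/sys", *key.split("."))


def running_value(key):
    """Read the live value of a key, or None if it cannot be read."""
    try:
        return proc_path(key).read_text().strip()
    except OSError:
        return None


# The two lines quoted from /etc/sysctl.conf in this report.
wanted = parse_sysctl_conf("kernel.panic_on_oops = 1\nkernel.panic = 10\n")
for key, value in wanted.items():
    print(key, "configured:", value, "running:", running_value(key))
```

If the running value differs from the configured one, the file was probably not applied at boot. Even when both match, a hang that never reaches an oops or panic will not trigger the automatic reboot.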
The freeze occurs when the computer is under heavy load, mostly CPU load. It happens almost exclusively when I run Eclipse and it starts resolving Ivy dependencies across more than 100 projects, or when I resolve dependencies and compile all projects with an ant script; this takes about 10 minutes and sometimes finishes without freezing the computer. I also tried the cpustress utilities and memtest86+, but no freeze occurred. I watched the CPU temperature during those critical operations and it stayed normal. With linux-image-2.6.28-6-generic (2.6.28-6.17) there are no freezes.

My computer is an IBM ThinkPad Z60m. I am running the latest Kubuntu Jaunty (updated 2009-03-06) with the open-source ATI drivers and ext4. The freeze occurs regardless of whether desktop effects are on or off.

Please tell me what I should configure to get a kernel dump after the crash.

Attaching relevant files. Thanks in advance.

Martin (martin-zdila) wrote :

... please ignore the previously attached dmesg; I must reboot into the problematic kernel first. I will do that later and also upload other information.

Martin (martin-zdila) wrote :

I booted into the problematic kernel just a while ago and deleted about 4 GB of JPEGs from my HDD. No other CPU or disk operations were running. After a second or so the computer froze. I am attaching more or less relevant files, collected after a fresh reboot into the problematic kernel. I will also try to run some HDD/filesystem (ext4) stress program, if I find one.

Martin (martin-zdila) wrote :

Today I tried linux-image-2.6.28-9-generic and my system froze again while I was starting Eclipse and running apt-get update, at the moment I switched windows to Akregator. CPU and HDD I/O were under full load.

In a different scenario, I booted my system with init=/bin/bash and ran: "stress --hdd 4 --timeout 120s". No other user tasks were running. The first run of stress was OK, but the second froze the system again. Could it be an ext4 problem? I really can't tell, and I must stay with linux-image-2.6.28-6-generic.

Is there anything else I could try to help you find the bug? Thanks in advance.

Martin (martin-zdila) wrote :

... and after booting my system with the forced filesystem check and starting KDE, all my Plasma settings are lost. The same goes for Akregator. I often lose settings after such a freeze. It is very annoying.

Martin (martin-zdila) wrote :

Today I booted into the latest kernel by accident. I worked for a couple of hours on tasks that require high CPU load but not high HDD load. Later I ran Eclipse, and while it was building my projects (100+) the computer hard-froze again :'-(. BTW I update my system about twice a day because I want to be on the bleeding edge ;-). Please help me to at least get some kernel messages after the freeze. Thanks in advance.

Martin (martin-zdila) wrote :

Same issue with linux-image-2.6.28-10-generic. Have you stopped supporting the ThinkPad Z60m?

Wesley Velroij (velroy1) wrote :

Same issue here, but lately it also happens when there is no high CPU load. I heard it could be an ext4 problem, but I guess I should test that.

Martin (martin-zdila) wrote :

Installed kernel 2.6.29 from
http://www.ramoonus.nl/2009/03/24/linux-kernel-2629-installation-guide-for-ubuntu-and-debian-linux/ and it seems to be stable so far, even after running some stress tests and the eclipse with my projects.

Andres Mujica (andres.mujica) wrote :

Martin, is it possible for you to access your system via ssh from a different machine when it freezes?

I wonder if the freeze is at all levels or only in your X session.

Maybe it is something between the kernel and X.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
importance: High → Undecided
Andres Mujica (andres.mujica) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please attach your X server configuration file (/etc/X11/xorg.conf) and X server log file (/var/log/Xorg.0.log) to the bug report as individual uncompressed file attachments using the "Attachment:" box below. Could you please also try to run without any /etc/X11/xorg.conf and let Xorg autodetect your display and video card? When you do please also attach the /var/log/Xorg.0.log from this attempt. Thanks in advance.

Also,

What you have described is a generic freeze. It could be caused by any number of things, and you need to take some additional steps to provide a complete report. "randomly" is not specific enough to be able to analyze the problem.

When did you upgrade to Jaunty? When did you first notice the freezes occurring?

How frequently do the freezes occur? How many per day would you say you experience?

List the applications you typically have open at the time of the freeze.

Describe example activities you were performing for a few of the recent freezes. You mentioned it occurs when you run new applications, which applications were these? In what way were they "new"?

For more tips on troubleshooting freeze bugs, please refer to these links:

  https://wiki.ubuntu.com/X/Troubleshooting/Freeze
  https://wiki.ubuntu.com/X/Bugs/IntelDriver

Changed in xorg (Ubuntu):
status: New → Incomplete
Martin (martin-zdila) wrote :

Thanks for the first sign of interest in this bug. All the questions you are asking are already answered in my previous comments, but I'll answer them again here anyway:

- I can try to access the computer over ssh after a freeze, but I give it almost no chance of succeeding. When the freeze occurs, the mouse and audio stop responding immediately, together with the HDD activity, all in the same millisecond. It is not just X.
- I don't think the X server is causing the freeze, because it freezes even without the X server running (see comment 7).
- I upgraded to Jaunty when, afaik, Alpha 3 was released.
- The first freeze I noticed is described in the bug description. I update daily, and the problems started with kernel version linux-image-2.6.28-7-generic. Therefore the bug was very probably introduced in exactly that minor version (7).
- About frequency: when I do nothing CPU- or HDD-intensive, the computer can run for hours or days without a problem. Once I start using 100% CPU/HDD, I estimate a 50% chance of a freeze during each minute (1 min = 50%, 2 min = 75%, etc.).
- The list of applications has also been mentioned: the configurations ranged from the lightest one, booting directly with init=/bin/bash, to a fresh boot into KDE and starting Eclipse (described several times in my posts).
- Example activities are running the "stress" program or starting Eclipse.
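
The frequency estimate in the list above (50% after one minute, 75% after two) is what you get if each minute under full load is treated as an independent 50% chance of a freeze. A small sketch under that assumption; the function name and the independence assumption are illustrative, not something measured in this report.

```python
def freeze_probability(minutes, per_minute=0.5):
    """Chance of at least one freeze within `minutes` of full load.

    Assumes every minute is an independent trial with the same
    per-minute freeze chance (an assumption, not a measurement).
    """
    return 1.0 - (1.0 - per_minute) ** minutes


for n in (1, 2, 5, 10):
    print(f"{n:2d} min: {freeze_probability(n):.1%}")
```

Under this model a 10-minute build would almost never finish, so the observation that the Ivy resolve and compile run sometimes completes suggests the real per-minute rate is lower or not constant.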

Thanks for the links, but from my conclusion described in the comments it is not related to the X. I strongly suggest to check the changes between linux-image-2.6.28-6-generic and linux-image-2.6.28-7-generic, mostly those related to I/O, EXT4, HDD...

Bryce Harrington (bryce) wrote : Re: [Bug 338645] Re: linux freezes completely since 2.6.28-7

On Sat, Apr 11, 2009 at 08:29:47PM -0000, Martin wrote:
> Thanks for the links, but from my conclusion described in the comments
> it is not related to the X. I strongly suggest to check the changes
> between linux-image-2.6.28-6-generic and linux-image-2.6.28-7-generic,
> mostly those related to I/O, EXT4, HDD...

I would agree, from the answers you've provided it does not fit with the
profile of an X freeze, and it seems much more likely to be a kernel bug.

Changed in xorg (Ubuntu):
status: Incomplete → Invalid
tags: added: regression-potential
Leann Ogasawara (leannogasawara) wrote :

Hi Martin,

This sounds like it might be related to bug 330824 and bug 348836, for which one of our kernel devs has built a test kernel with a backported ext4 patch:

http://kernel.ubuntu.com/~rtg/2.6.28-lp348836

It would be great if you could test and confirm whether that test kernel resolves the freeze you are seeing here. Please let us know your results. Thanks.

Changed in linux (Ubuntu):
status: New → Incomplete
Martin (martin-zdila) wrote :

I installed and booted linux-image-2.6.28-11-generic_2.6.28-11.42_i386.deb. Then I ran "stress -d 3 --timeout 120" twice while building my more than 100 projects in Eclipse. Everything went fine.

At the end of the day I suspended my notebook. I resumed it the next day and it froze while running updatedb at the same time as an update in aptitude :-(.

Is there some way to get the oops message? I'll try some more stress runs with that kernel booted with init=/bin/bash and will report more results. Now I am running debian's linux-image-2.6.29-020629-generic without freezing problems.

Leann Ogasawara (leannogasawara) wrote :

Hi Martin,

Try to see whether any oops or kernel panic was logged in /var/log/kern.log. Additionally, the kernel team has started packaging upstream mainline kernel builds for testing. You may want to give the 2.6.29 build a try - https://wiki.ubuntu.com/KernelMainlineBuilds. It would be good to know if this exists in the upstream kernel as well. Thanks.
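
One way to do that check is a keyword scan of the log. This is a minimal sketch, assuming the log lives at /var/log/kern.log and is readable by the current user; the marker list is illustrative and far from exhaustive, the function names are made up for this example, and the sample lines are invented, not taken from any log attached to this bug.

```python
import re

# Strings that commonly appear in kernel oops and panic reports.
# Illustrative only; a real oops may use other wording.
MARKERS = re.compile(r"Oops|Kernel panic|BUG:|Call Trace", re.IGNORECASE)


def find_kernel_errors(lines):
    """Return (line_number, text) pairs for lines that contain a marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if MARKERS.search(line)
    ]


def scan_log(path="/var/log/kern.log"):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_kernel_errors(handle)


# Invented sample lines, only to show the shape of the result.
sample = [
    "kernel: usb 1-1: new high speed USB device\n",
    "kernel: Kernel panic - not syncing: Fatal exception\n",
]
for number, text in find_kernel_errors(sample):
    print(number, text)
```

Calling scan_log() with no arguments checks the default path. An empty result after a hard freeze is consistent with what is reported here: the machine may stop before anything reaches the disk.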

fixture (universald) wrote :

Same problem here. ThinkPad X61. The computer freezes during high disk activity. It's probably not ext4, though, because I had the same problem with an ext3/ReiserFS setup. In fact, I did a clean ext4 install to test my original theory that this was a ReiserFS problem.

Sometimes you can recover from the freeze by waiting a bit and gently tapping the power button, which can bring up the GNOME shutdown panel. Even after recovery, the computer quickly becomes unusable and loaded with some kind of processing. If you check powertop, you find the processor mysteriously in C0 80 or 90% of the time, with no program attributable to the activity.

No weird messages or oops in kernel log. But I do have a theory about this being a problem with Ath5k.

fixture (universald) wrote :

I installed ssh-server to get to the bottom of this. When the computer froze, I ssh'ed into it. Surprise! It worked.

It turns out that this is not a kernel freeze. The mouse moves, but Caps Lock does not respond. In the ssh session, I went into top and htop. Nothing suspicious at all; in fact, there's no CPU usage at all! I don't even know what to go on.

Then I theorized that this is a graphics hardware problem. I went to test this out by restarting GDM. This little bit of kernel log is what you get:

[ 723.628131] [drm:i915_gem_idle] *ERROR* hardware wedged
[ 723.640543] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[ 791.378469] [drm:i915_gem_entervt_ioctl] *ERROR* Reenabling wedged hardware, good luck
[ 835.163015] ata1.00: configured for UDMA/133
[ 835.163023] ata1: EH complete
[ 835.163166] sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors: (320 GB/298 GiB)
[ 835.163204] sd 0:0:0:0: [sda] Write Protect is off
[ 835.163210] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 835.163265] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
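
To pull only the graphics driver errors out of an excerpt like the one above, the `[drm:function] *ERROR*` lines can be matched with a regular expression. A sketch, assuming the line layout shown in this excerpt (bracketed timestamp, then `[drm:...]`, then `*ERROR*`); the names DRM_ERROR and drm_errors are invented for this example.

```python
import re

# Layout assumed from the excerpt: "[ 723.628131] [drm:func] *ERROR* message"
DRM_ERROR = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:(\w+)\]\s+\*ERROR\*\s+(.*)$"
)


def drm_errors(lines):
    """Return (seconds_since_boot, function, message) for each DRM error line."""
    found = []
    for line in lines:
        match = DRM_ERROR.match(line.strip())
        if match:
            found.append((float(match.group(1)), match.group(2), match.group(3)))
    return found


# Four lines copied from the kernel log excerpt in this comment.
EXCERPT = """\
[ 723.628131] [drm:i915_gem_idle] *ERROR* hardware wedged
[ 723.640543] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[ 791.378469] [drm:i915_gem_entervt_ioctl] *ERROR* Reenabling wedged hardware, good luck
[ 835.163015] ata1.00: configured for UDMA/133
"""

for stamp, function, message in drm_errors(EXCERPT.splitlines()):
    print(f"{stamp:10.3f}  {function}: {message}")
```

The function names in the matched lines all start with i915, which points at the Intel graphics driver rather than at the disk or filesystem.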

Martin (martin-zdila) wrote :

fixture, you seem to have a different problem. My freezes occurred even without X running, and when I was using X, the mouse and sound stopped too (immediately).

BTW I have been running the latest Jaunty 2.6.28-11-generic for a couple of days and it seems to be stable for me. But I haven't run any stress tests yet.

Martin (martin-zdila) wrote :

Frozen again, just after starting the Apache Felix OSGi framework, which was caching 150 bundles. This is the only operation I know of that makes music playback choppy for a couple of seconds even on the stable kernels.

fixture (universald) wrote :

Point taken, but still check this. I have to say, with this bug, Jaunty is not fit for release. I don't know how Ubuntu managed to make the release progressively less stable. This thing worked better in Feb!

http://www.nabble.com/X-freezes-with--intel-driver-on-Jaunty-td22996863.html

fixture (universald) wrote :

xorg.log.old from time of freeze

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X(xorg_backtrace+0x26) [0x4f1b66]
1: /usr/bin/X(mieqEnqueue+0x359) [0x4d28a9]
2: /usr/bin/X(xf86PostKeyboardEvent+0x96) [0x495fa6]
3: /usr/lib/xorg/modules/input//evdev_drv.so [0x7fabb1b4dc35]
4: /usr/bin/X [0x485be5]
5: /usr/bin/X [0x476f77]
6: /lib/libpthread.so.0 [0x7fabc6ce8080]
7: /lib/libc.so.6(ioctl+0x7) [0x7fabc50f9cd7]
8: /usr/lib/libdrm_intel.so.1(drm_intel_gem_bo_start_gtt_access+0x4d) [0x7fabc30673bd]
9: /usr/lib/dri/i965_dri.so(intelFinish+0x41) [0x7fabb2458241]
10: /usr/lib/xorg/modules/extensions//libglx.so [0x7fabc3d66805]
11: /usr/lib/xorg/modules/extensions//libglx.so [0x7fabc3d659d2]
12: /usr/lib/xorg/modules/extensions//libglx.so [0x7fabc3d69de2]
13: /usr/bin/X(Dispatch+0x364) [0x44e304]
14: /usr/bin/X(main+0x3bd) [0x433d8d]
15: /lib/libc.so.6(__libc_start_main+0xe6) [0x7fabc503a5a6]
16: /usr/bin/X [0x433219]
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] EQ overflowing. The ser...


Aurius Bendikas Chang (aurius-bendikas) wrote :

I have exactly the same issue as Martin on my T42 laptop.

RAM: 2GB
VIDEO: RADEON 9600
DISK: 60GB LVM2-EXT4
KERNEL: 2.6.28-11-generic

Unfortunately I cannot reproduce the freezes; they are quite random. I get no response from the machine. The only thing that blinks is the Bluetooth indicator when I move the Bluetooth mouse :). I am not using any fancy things. I turned off compiz and switched off the Bluetooth mouse. I am also posting this comment for the second time, because the machine froze while I was reporting the first one :( It is impossible to reboot with Alt+PrtSc + (R S E I U B O). Only holding the power button for 4 seconds brings it down.

I have a similar configuration on my desktop machine (RADEON 9600, LVM2 + EXT4, 2.6.28-11-generic) and it is quite stable. That is why I decided to upgrade my most important workhorse. Now it is almost impossible to work.

I am a Java developer and, like Martin, I do lots of Eclipse stuff :D But I cannot even get to that, because I need to set up WebSphere Portal and this is failing for some other reasons I will not cover in this bug :(. Please help make my laptop stable; I need it for work.

fixture: I would suggest you open another bug for your issue, because it is obvious that Martin and I are seeing something different from you.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Aurius Bendikas Chang (aurius-bendikas) wrote :

And I also forgot to mention that there is nothing in the logs and that I am obviously using open-source ATI driver :)

tags: added: jaunty
Aurius Bendikas Chang (aurius-bendikas) wrote :

Some more updates.

My colleague has the same T42. He also has Jaunty with LVM2, but instead of EXT4 he has ReiserFS on root. We generate almost the same usage patterns on our laptops. He claims his system is quite stable.

However he has EXT4 on home. He admitted he had one or two freezes this month.

I am generating a lot of traffic on my root partition (setting up development environment). And I already had 5 freezes in 3 hours.

I would like to state that I am not aware of any data loss during those freezes, so it might be a duplicate of bug 330824.

Also I have installed Jaunty on 2009-04-26, thinking that EXT4 will improve the performance of my development environment :)

Steve Beattie (sbeattie) wrote :

Is everyone who is seeing this lockup using ext4? fixture reported that he wasn't, but his turned out to be an X freeze issue rather than the kernel.

tags: added: regression-release
removed: regression-potential
Aurius Bendikas Chang (aurius-bendikas) wrote :

I have tried the http://kernel.ubuntu.com/~rtg/2.6.28-lp348836 kernel. It has NOT solved the problem. I am also convinced that what I am seeing is bug 330824, because the freeze happens seconds after deleting a bunch of files. I can delete files without a hang, but if a hang happens, it is always after a deletion.

gpk (gpk-kochanski) wrote :

I am seeing freezes on a 9.04 system. It's updated on a daily basis, and
freezes have been occurring for a while. I initially went to 9.04 a day or
two after the Beta version came out and I did not see any freezes at
first. Some time later (before the final release), I started seeing freezes.

There is nothing in /var/log/syslog related to the freeze. The last entry
is often several minutes before (and uninteresting). Then, log entries
from the boot sequence appear.

Unlike fixture, I *cannot* ssh into the machine when it is frozen.

The machine is an old, generic P4 machine, 2.6 GHz. Memtest86 reports
no problems, and it runs Windows XP perfectly. (So, it's probably not
a hardware fault.) Graphics is a Nvidia card. I have ext4 filesystems
mounted, but only _readonly_. To the best of my knowledge they are
unused, anyway. The computer also has a mount of a remote disk
via NFS4.

I see no strong correlation with any particular program. However, freezes
are more likely when there is a lot of activity on the desktop. Interestingly,
though, I can run a rsync backup that creates large amounts of network,
disk, and CPU activity for 20 min without causing a freeze. (So, freezes
do not seem strongly correlated with CPU, disk, and network activity.)

FYI, I have an identically (well, as near as I can manage) configured
4-processor AMD phenom box, and it doesn't freeze.

Aurius Bendikas Chang (aurius-bendikas) wrote :

gpk:

I still suspect this might be ext4 related. Could you remove the ext4 volumes from fstab (since they are unused) and work for a day stressing the system?

Having another system with ext4 that is stable does not prove anything yet. You see, I also have a desktop configured the same as my laptop, and it has never frozen. This is because I only use that machine to watch videos and listen to mp3s (a kind of media centre). I have had no time to simulate my laptop usage patterns on the desktop. Not all of them will be possible with only 512 MB RAM on the desktop. But once I do it, I will report the results.

I am avoiding removing large numbers of files from ext4, and I have managed to keep the system stable for days, except once when removing the virtualbox package with synaptic :)

Some tasks, like upgrading IBM WebSphere Portal, always freeze my system. So I am quite limited in the amount of work I can do on my laptop.

Saivann Carignan (oxmosys) wrote :

That really looks like a duplicate of bug 330824. Martin has an ext4 filesystem, the freeze happens randomly, and the 2.6.29 kernel fixes the issue (in the same way that 2.6.30 does not have this bug in Karmic).

Martin: it would be interesting if you could delete a huge number of files and report whether the bug can be reproduced while deleting them. If it can be reproduced that way, can you verify whether 2.6.29 or 2.6.30 fixes the bug for you too? Most of the time, this has been the easiest way to reproduce the bug, which does not seem to happen on all hardware.

gpk (gpk-kochanski) wrote :

Freezes are much rarer (or perhaps gone) as of about 1 June. However, there is a chance it was disk-related (i.e. hardware, not software). Around 1 June, I had a hard disk failure, replaced a disk and restored from back-up. So, it is possible that the old disk was having sporadic errors that somehow caused the freeze.

I should point out that my ext4 file systems are only used for backup purposes, and therefore are only mounted for a short time every day. Freezes had occurred when the ext4 partitions were not mounted.

Leann Ogasawara (leannogasawara) wrote :

Setting to Incomplete for now until we hear back from Martin re https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/338645/comments/33 . And again, would be great to know if this remains with the latest mainline kernel build, currently 2.6.30-rc8 as of this posting. See https://wiki.ubuntu.com/KernelMainlineBuilds . Thanks.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
gpk (gpk-kochanski) wrote :

Ignore my comments on this bug. My PC developed intermittent memory errors, so it is not to be trusted.

tags: added: ext4
Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Jim Lieb (lieb) wrote :

This bug report was marked as Incomplete a while ago and has not had any feedback to provide the requested information. As a result this bug is being closed. Please reopen if this is still an issue in the latest Karmic 9.10 Beta release http://cdimage.ubuntu.com/releases/karmic/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix