RAID array causing "BUG: soft lockup" errors/system freeze

Bug #312163 reported by Chris Osgood
This bug affects 2 people
Affects          Status         Importance   Assigned to      Milestone
linux (Ubuntu)   Fix Released   High         Andy Whitcroft
Intrepid         Fix Released   High         Andy Whitcroft

Bug Description

Running Ubuntu 8.10 Intrepid Desktop 64-bit with all current updates.
Kernel is 2.6.27-9-generic #1 SMP Thu Nov 20 22:15:32 UTC 2008 x86_64 GNU/Linux.
System is Abit IP35-Pro motherboard, Q9550 CPU, 8 GB RAM, 4 Western Digital WD6401AALS-00L3B2 drives (640 GB, SATA).
MemTest and drive tests return no errors. Using a RAID5 array across the 4 drives with XFS and Ext3 as the file systems.

It seems that when the system is under heavy I/O load on the RAID array it eventually freezes (requiring a hard reboot). How long the errors take to show up is inconsistent, and the specific applications causing the I/O load don't seem to matter, but the error always shows up eventually. It has already happened dozens of times after starting a load on the machine.

Examples of scenarios where the bug has appeared:
* Running Bonnie++ benchmarks. I have seen this kill the system within 30 seconds. Other times it causes no problems.

* Running a large SQLite import while simultaneously tar/gzip'ing a large directory structure.

* Running a large MySQL import while tar/gzip'ing a large directory structure.

* Multiple tar/gzip processes running at the same time.

The specific process that gets the "BUG: soft lockup" error varies. I have seen it lock up in kswapd, pdflush, gzip, sqlite3, and bonnie++ (see attached logs).

I have attached some of my stack traces. They repeat many times and I have only included the first distinct events before my system crashed.
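
Because the process named in each report varies, it can help to tally the "BUG: soft lockup" lines in a kernel log by process name. The following is only a minimal sketch: the function name and the regular expression are illustrative assumptions, and the sample lines are copied from logs quoted in comments on this report.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   BUG: soft lockup - CPU#3 stuck for 61s! [iozone:24558]
# The process name may itself contain ':' so the pid is taken
# from the digits after the last colon inside the brackets.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def count_lockups_by_process(log_text):
    """Return a Counter mapping process name to number of soft lockup reports."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


# Sample lines taken from logs quoted in this report.
sample_log = """\
[553927.303743] BUG: soft lockup - CPU#3 stuck for 61s! [iozone:24558]
Feb 20 10:07:20 dev kernel: [933002.012531] Machine check events logged
Feb 20 11:23:39 dev kernel: [937580.632507] BUG: soft lockup - CPU#1 stuck for 61s! [kswapd1:185]
"""

for comm, hits in count_lockups_by_process(sample_log).items():
    print(comm, hits)
```

To run it against a real log, read the file (for example /var/log/kern.log) into a string and pass that string to the function.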

Revision history for this message
Chris Osgood (ubuntu-functionalfuture) wrote :

I meant to note that the RAID5 array is a software RAID array created with:
mdadm --create /dev/md1 --chunk=256 --level=5 --raid-devices=4 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2
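
For reference, the stripe geometry implied by that command can be worked out with a little arithmetic. This is a rough sketch, not output from mdadm: the function names are made up for illustration, it assumes the --chunk value is in KiB (mdadm's default unit), and the capacity helper ignores metadata overhead.

```python
def raid5_geometry(raid_devices, chunk_kib):
    """Return per-stripe layout figures for an md RAID5 array."""
    if raid_devices < 2:
        raise ValueError("RAID5 needs at least 2 member devices")
    # In each stripe, one chunk holds parity and the rest hold data.
    data_devices = raid_devices - 1
    return {
        "data_devices": data_devices,
        "full_stripe_data_kib": data_devices * chunk_kib,
    }


def raid5_usable_bytes(raid_devices, member_bytes):
    """Approximate usable capacity: one member's worth of space goes to parity."""
    return (raid_devices - 1) * member_bytes


# Values from the mdadm command above: --raid-devices=4 --chunk=256
geometry = raid5_geometry(raid_devices=4, chunk_kib=256)
print(geometry)
```

With 4 devices and 256 KiB chunks, each full stripe carries 3 data chunks (768 KiB of data) plus one parity chunk.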

This same system was running XFS on a single drive without problems so it may be some sort of RAID/XFS interaction or bandwidth issue as the performance is much higher than previously.

Chris Osgood (ubuntu-functionalfuture) wrote :

After more testing I have modified the original description because originally I was thinking it just affected XFS but now I don't think that is the case. I just got another crash when using Ext3 so now I'm not sure where the problem might be. It still looks like it probably has something to do with the filesystem layers though.

In this latest case all I did was start a 'grep' on a file and the CPU went to 100% and the process froze completely (could not be killed even with kill -KILL). A few seconds later I started getting the "BUG: soft lockup" errors.

I have attached the latest error.

description: updated
bexamous (bexamous) wrote :

I've been seeing this same problem.

I have two arrays, one raid0 and the other raid5, both with xfs on them. If I copy 10GB files between the two, there's a 50% chance it'll pause and end with cpu soft lockup messages.

Unsure if this is related:
http://www.nabble.com/Re:-BUG:-soft-lockup---is-this-XFS-problem--td21148012.html

Chris Osgood (ubuntu-functionalfuture) wrote :

I built the current Jaunty kernel (2.6.28-4-generic) and tried that but it has the same problem.

It seems the error happens much more often when using XFS but I'm not sure if it's related to XFS or not. I have found that running very long, large SQLite operations is the best way to recreate the error (disk I/O is very high). Sometimes it can take hours of hammering the disks before the error appears (a race condition, I'm guessing).

Often it hard-locks the OS and the RAID arrays have to be resynced (which takes several hours) after reset/reboot.

Chris Osgood (ubuntu-functionalfuture) wrote :

I'm still testing, but I have been using the kernel from 8.04 Hardy (2.6.24-23-generic) and the problem has not appeared for several days now. I'm still running Intrepid but with the older kernel.

If this is true then this bug must have been introduced somewhere between 2.6.24 and 2.6.27 (and as I mentioned 2.6.28 is affected as well).

DaveAbrahams (boostpro) wrote :

I've been seeing the same problem from doing an iozone test on an mdraid 5 array. This keeps appearing in my dmesg:

[553927.303743] BUG: soft lockup - CPU#3 stuck for 61s! [iozone:24558]
[553927.303755] Modules linked in: wmi video output sbs sbshc pci_slot battery ac bonding iptable_filter ip_tables x_tables parport_pc lp parport loop evdev pcspkr snd_intel8x0 snd_ac97_codec shpchp pci_hotplug k8temp isp1760 ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc cfi_cmdset_0002 cfi_util container jedec_probe cfi_probe gen_probe button ck804xrom mtd chipreg i2c_nforce2 map_funcs i2c_core ipv6 xfs sr_mod cdrom pata_amd sata_nv sd_mod crc_t10dif sg usb_storage libusual pata_acpi sata_sil24 ata_generic libata scsi_mod ehci_hcd forcedeth ohci_hcd dock usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
[553927.303755] CPU 3:
[553927.303755] Modules linked in: wmi video output sbs sbshc pci_slot battery ac bonding iptable_filter ip_tables x_tables parport_pc lp parport loop evdev pcspkr snd_intel8x0 snd_ac97_codec shpchp pci_hotplug k8temp isp1760 ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc cfi_cmdset_0002 cfi_util container jedec_probe cfi_probe gen_probe button ck804xrom mtd chipreg i2c_nforce2 map_funcs i2c_core ipv6 xfs sr_mod cdrom pata_amd sata_nv sd_mod crc_t10dif sg usb_storage libusual pata_acpi sata_sil24 ata_generic libata scsi_mod ehci_hcd forcedeth ohci_hcd dock usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
[553927.303755] Pid: 24558, comm: iozone Not tainted 2.6.27-11-server #1
[553927.303755] RIP: 0010:[<ffffffff802abea6>] [<ffffffff802abea6>] find_get_pages+0x66/0x110
[553927.303755] RSP: 0018:ffff88015ccfd368 EFLAGS: 00000246
[553927.303755] RAX: ffff880048befce0 RBX: ffff88015ccfd3a8 RCX: 0000000000000003
[553927.303755] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffe20002055800
[553927.303755] RBP: ffff88015ccfd318 R08: ffffe20000bdafc8 R09: 0000000000000008
[553927.303755] R10: 000000000000003e R11: 000000000015dcbf R12: ffffffff802b6ba9
[553927.303755] R13: ffff8801e1450a10 R14: ffffe20000bdae40 R15: 0000000000000282
[553927.303755] FS: 00007f4b4d13c6e0(0000) GS:ffff88025fa2c700(0000) knlGS:0000000000000000
[553927.303755] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[553927.303755] CR2: 00007fa32cff2e40 CR3: 000000015f036000 CR4: 00000000000006e0
[553927.303755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[553927.303755] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[553927.303755]
[553927.303755] Call Trace:
[553927.303755] [<ffffffff802abe83>] ? find_get_pages+0x43/0x110
[553927.303755] [<ffffffff802b6a04>] ? pagevec_lookup+0x24/0x30
[553927.303755] [<ffffffffa026801d>] ? xfs_cluster_write+0xad/0x180 [xfs]
[553927.303755] [<ffffffffa0268588>] ? xfs_page_state_convert+0x498/0x760 [xfs]
[553927.303755] [<ffffffffa02689b1>] ? xfs_vm_writepage+0x71/0x120 [xfs]
[553927.303755] [<ffffffff802b92f4>] ? pageout+0x124/0x27...


stephen mulcahy (stephen-mulcahy) wrote :

Hi,

I'm seeing a similar problem on my system with frequent soft lockup messages and machine check events logged.

Distributor ID: Ubuntu
Description: Ubuntu 8.10
Release: 8.10
Codename: intrepid

Errors with linux-image-2.6.27-11-server (64-bit). It's not clear if there are specific activities triggering the bug. Rolling back to Hardy kernel (linux-image-2.6.24-23-server) seems to have addressed the problem - no error messages in last 12 hours of heavy system activity.

System has 2 x Western Digital WDC WD10EADS-00L5B1 drives (1TB) in RAID1 config (for both root partition and swap partition).

smulcahy@dev:/var/log$ sudo mdadm --misc -D /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Mon Feb 9 11:59:43 2009
     Raid Level : raid1
     Array Size : 974808064 (929.65 GiB 998.20 GB)
  Used Dev Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Feb 24 12:18:54 2009
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : c8cf85b7:a96d5ddf:b32eea70:c5f0704e
         Events : 0.25

    Number Major Minor RaidDevice State
       0 8 2 0 active sync /dev/sda2
       1 8 18 1 active sync /dev/sdb2

smulcahy@dev:/var/log$ sudo mdadm --misc -D /dev/md2
/dev/md2:
        Version : 00.90
  Creation Time : Mon Feb 9 12:29:38 2009
     Raid Level : raid1
     Array Size : 1951744 (1906.32 MiB 1998.59 MB)
  Used Dev Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Feb 24 12:08:02 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 8957636b:79f79a80:9069ef6b:43c30843
         Events : 0.8

    Number Major Minor RaidDevice State
       0 8 1 0 active sync /dev/sda1
       1 8 17 1 active sync /dev/sdb1
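
The "Key : Value" fields in `mdadm -D` output like the above can be pulled into a dictionary, for example to compare the State and Failed Devices fields across arrays after a crash. This is a minimal sketch with a hypothetical helper name; it relies only on the layout visible in the output quoted here and ignores the per-device table.

```python
def parse_mdadm_detail(text):
    """Parse 'mdadm -D' style text into {array_device: {field: value}}."""
    arrays = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            # Header line such as '/dev/md1:' starts a new array section.
            current = arrays.setdefault(line[:-1], {})
        elif current is not None and " : " in line:
            # Split on the first ' : ' only, so times and UUIDs stay intact.
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return arrays


# Excerpt copied from the output quoted in this comment.
sample_detail = """\
/dev/md1:
        Version : 00.90
  Creation Time : Mon Feb 9 11:59:43 2009
     Raid Level : raid1
          State : active
 Failed Devices : 0
           UUID : c8cf85b7:a96d5ddf:b32eea70:c5f0704e
/dev/md2:
     Raid Level : raid1
          State : clean
 Failed Devices : 0
"""

for device, fields in parse_mdadm_detail(sample_detail).items():
    print(device, fields["State"], fields["Failed Devices"])
```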

Log file excerpt,

Feb 20 10:07:20 dev kernel: [933002.012531] Machine check events logged
Feb 20 11:23:39 dev kernel: [937580.632507] BUG: soft lockup - CPU#1 stuck for 61s! [kswapd1:185]
Feb 20 11:23:39 dev kernel: [937580.632507] Modules linked in: af_packet ipv6 iptable_filter ip_tables x_tables parport_pc lp parport loop joydev evdev pcspkr container isp1760 amd_rng button i2c_amd756 k8temp shpchp i2c_amd8111 pci_hotplug i2c_core ext3 jbd mbcache sr_mod cdrom pata_acpi sd_mod crc_t10dif sg usbhid hid ata_generic tg3 libphy sata_sil ohci_hcd usbcore pata_amd libata scsi_mod dock raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Feb 20 11:23:39 dev kernel: [937580.632507] CPU 1:
Feb 20 11:23:39 dev kernel: [937580.632507] Modules linked in: af_packet ipv6 iptable_filter ip_tables x_tables parport_pc lp parport loop joydev evdev pcspkr container isp1760 amd_rng button i2c_amd756 k8temp shpchp i2c_amd8111 pci_hotplug i2c_core ext3 jbd mbcache sr_mod c...


stephen mulcahy (stephen-mulcahy) wrote :

We're still seeing occasional machine check events but nothing logged in /var/log/mcelog

Feb 25 00:10:46 dev kernel: [110217.794081] Machine check events logged
Feb 25 01:13:16 dev kernel: [113958.998576] Machine check events logged

and no further soft lockups.

Chris Osgood (ubuntu-functionalfuture) wrote :

I have been running for over a month now on the older Hardy kernel (2.6.24-23-generic x86_64) and have not seen this issue at all. So it appears it's almost definitely something broken in the newer kernels. At this point it's unclear if it is an Ubuntu or Linux kernel problem.

With that said, I found this:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e8c82c2e23e3527e0c9dc195e432c16784d270fa

Has anyone tried this simple patch against the Ubuntu kernel?

See also:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=805
http://lkml.org/lkml/2009/1/5/307

dee (boardwize) wrote :

I've been having the same problem today after adding 2x1TB drives to my Ubuntu x64 system - configured as stripe using mdadm.
Random lockups when copying to and reading from raid. Occurs with xfs, reiser and ext3.
Will post logs next week.

dee (boardwize) wrote :

I've downgraded to the 2.6.24-16-generic kernel (hardy) and the problem seems to have been resolved. This seems to be a problem with the later kernels.

Chris Osgood (ubuntu-functionalfuture) wrote :

The patch that I mentioned above is in the 2.6.28-11 kernel used in Jaunty so it might work also. I'm not sure when I will get a chance to test it though.

affects: mdadm (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
tags: added: regression-release
Leann Ogasawara (leannogasawara) wrote :

Chris, it would be great if you could let us know your results if you're either able to test the patch or a newer 2.6.28 based Jaunty kernel. Thanks in advance.

Chris Osgood (ubuntu-functionalfuture) wrote :

I spent the last few days testing on Jaunty and it seems to be fixed there. Kernel is 2.6.28-11.42 x86_64.

I will have more confidence after a few weeks of testing but I think it looks good so far.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Are the following bugs about the same issue?

They all are about soft lockup using xfs on Intrepid:

268215
289158
312163
348218

Thanks.

Andy Whitcroft (apw) wrote :

It seems that the upstream fix mentioned above has made it into the Intrepid kernel and is currently in the kernels in Intrepid -proposed. If those who are affected by this could test the kernels from -proposed and report back, that would be helpful. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
status: Triaged → Incomplete
nutznboltz (nutznboltz-deactivatedaccount) wrote :

I'm testing linux-image-2.6.27-14-server on amd64 KVM this weekend.

/etc/apt/sources.list contains:

deb http://archive.ubuntu.com/ubuntu/ intrepid-proposed main

/etc/apt/preferences contains:

Package: *
Pin: release a=intrepid-security
Pin-Priority: 990

Package: *
Pin: release a=intrepid-updates
Pin-Priority: 900

Package: *
Pin: release a=intrepid-proposed
Pin-Priority: 400

Then install the new kernel with:

sudo aptitude -t intrepid-proposed install linux-image-2.6.27-14-server linux-image-server

and maybe if you have the need for the kernel headers as well:

sudo aptitude -t intrepid-proposed install linux-headers-2.6.27-14-server

The base header package will be pulled in automatically.

Reboot. Compile sys_basher from source code here: http://www.polybus.com/sys_basher_web/

Let two instances of "sys_basher -d -ho 9999" run over the weekend in GNU screen. One on NFS and the other on xfs.

Deepak Sarda (antrix) wrote :

It is possible that I am facing this issue too. I am running Jaunty (kernel 2.6.28-11-generic). I have a mdadm based raid too.

Here's the dump from /var/log/kern.log that I also posted in: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/367389/comments/2

==================================================================
Apr 28 14:37:32 cellar kernel: [67396.314920] BUG: unable to handle kernel paging request at 01040424
Apr 28 14:37:32 cellar kernel: [67396.314929] IP: [<c02c85cb>] prio_tree_insert+0x14b/0x290
Apr 28 14:37:32 cellar kernel: [67396.314940] *pde = 00000000
Apr 28 14:37:32 cellar kernel: [67396.314945] Oops: 0000 [#1] SMP
Apr 28 14:37:32 cellar kernel: [67396.314949] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb1/idVendor
Apr 28 14:37:32 cellar kernel: [67396.314955] Dumping ftrace buffer:
Apr 28 14:37:32 cellar kernel: [67396.314960] (ftrace buffer empty)
Apr 28 14:37:32 cellar kernel: [67396.314962] Modules linked in: bridge stp bnep vboxnetflt vboxdrv input_polldev video output reiserfs lp snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device iTCO_wdt iTCO_vendor_support snd psmouse soundcore ppdev serio_raw pcspkr usblp snd_page_alloc btusb nvidia(P) intel_agp parport_pc parport agpgart usbhid r8169 mii raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor [last unloaded: uinput]
Apr 28 14:37:32 cellar kernel: [67396.315019]
Apr 28 14:37:32 cellar kernel: [67396.315024] Pid: 27554, comm: kio_http_cache_ Tainted: P (2.6.28-11-generic #42-Ubuntu) MS-7176
Apr 28 14:37:32 cellar kernel: [67396.315027] EIP: 0060:[<c02c85cb>] EFLAGS: 00010202 CPU: 0
Apr 28 14:37:32 cellar kernel: [67396.315031] EIP is at prio_tree_insert+0x14b/0x290
Apr 28 14:37:32 cellar kernel: [67396.315034] EAX: ee5b6e3c EBX: 01040404 ECX: f197634c EDX: 010403e0
Apr 28 14:37:32 cellar kernel: [67396.315037] ESI: 00000020 EDI: 00000001 EBP: f110deb8 ESP: f110de90
Apr 28 14:37:32 cellar kernel: [67396.315039] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Apr 28 14:37:32 cellar kernel: [67396.315043] Process kio_http_cache_ (pid: 27554, ti=f110c000 task=f1114b60 task.ti=f110c000)
Apr 28 14:37:32 cellar kernel: [67396.315045] Stack:
Apr 28 14:37:32 cellar kernel: [67396.315047] f267a3ec f197634c 00000116 00000118 000433a8 00000000 ee5b6e3c f267a3ec
Apr 28 14:37:32 cellar kernel: [67396.315056] f267a3c8 ee4e5800 f110dec8 c019bb92 f267a3c8 f1976334 f110ded8 c01a3cc9
Apr 28 14:37:32 cellar kernel: [67396.315065] f267a3c8 f1976334 f110def8 c01a4373 ee8559c0 ee8559b8 f0227aa8 ee8559b8
Apr 28 14:37:32 cellar kernel: [67396.315074] Call Trace:
Apr 28 14:37:32 cellar kernel: [67396.315077] [<c019bb92>] ? vma_prio_tree_insert+0x22/0xc0
Apr 28 14:37:32 cellar kernel: [67396.315084] [<c01a3cc9>] ? __vma_link_file+0x49/0x80
Apr 28 14:37:32 cellar kernel: [67396.315088] [<c01a4373>] ? vma_link+0x63/0x90
Apr 28 14:37:32 cellar kernel: [67396.315092] [<c01a5a49>] ? mmap_region+0x489/0x4e0
Apr 28 14:37:32 cellar kernel: [67396.315097] [<c01a5d09>] ? do_mm...

Andy Whitcroft (apw) wrote :

@Deepak -- could you try the kernel in proposed as the -11 kernels do not contain the fix, but the -14 should. Please report back here.

Deepak Sarda (antrix) wrote :

@Andy: I added the jaunty-proposed repo (as in https://wiki.ubuntu.com/Testing/EnableProposed) but didn't get any new kernel. I tried both with and without pinning the repos as suggested in 'selective upgrading from -proposed' but still no new kernel.

Meanwhile, I am attaching some more kernel logs with lots and lots of oops from early this morning. The system was reporting cpu lockups for almost two hours before I noticed it. When I got to the system, enough applications were misbehaving (error on quit, do not start, can't be killed, etc.) that I had to reboot. Of course, the reboot got stuck while leaving X and I then had to power-cycle.

tom (thomas-gutzler) wrote :

I've been having similar problems for quite some time (http://oss.sgi.com/archives/xfs/2009-03/msg00062.html). Lockups affecting random parts of the system. Most of the time, it wouldn't even reboot and I had to pull the plug. I'm running intrepid 2.6.27-11-server x86_64 on an Intel Core2 Duo 2.66GHz with an Adaptec 31605 hardware raid-5 (10x500GB); stripe size 256K.
I have attached a log file (xfs_oops.txt) and just upgraded to the -14-server kernel hoping this will solve the problem.

Andy Whitcroft (apw) wrote :

@Deepak -- sorry, I misread your version number; Jaunty has no -proposed and the main kernels have the fix to this bug. Looking at your log fragment, it looks like a different trace. Could you file that one as a new bug? When you do, could you check back before the first occurrence of the bug in your logs and pick up, say, 10 lines before it? I am expecting to see a panic or similar before the first one.

Andy Whitcroft (apw) wrote :

@tom -- we would expect that to be gone with -14 kernels, if you could report back if your testing is successful that would be helpful.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

linux-image-2.6.27-14-server on amd64 KVM survived 6 days 22 hours including 13 hours of non-stop sys_basher stress testing on xfs and nfs file systems simultaneously. The VM went down when the KVM host running 2.6.27-11 had issues.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

The issue appears to be a pagecache bug that has been in Intrepid since ALPHA versions. High levels of I/O trigger it so RAID is especially vulnerable.

Check include/linux/radix-tree.h for the patch mentioned at the end of http://lkml.org/lkml/2009/1/5/307 to see whether your Intrepid is affected. You only need to install the kernel headers, not the complete source, in order to check this. The 2.6.27-14 kernel has that patch.
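
As a quicker first check than reading the header, the kernel release string can be compared against the versions reported in this bug. The sketch below is an assumption-laden summary of this thread only: the function name is hypothetical, 2.6.24 is treated as unaffected, 2.6.27 is treated as fixed from -14 onward, 2.6.28 is treated as affected at -4 and fixed at -11, and everything else is reported as unknown.

```python
import platform
import re

# Ubuntu release strings look like '2.6.27-11-server' or '2.6.28-11-generic'.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def classify_kernel(release):
    """Classify a kernel release using only the reports made in this bug."""
    match = RELEASE_RE.match(release)
    if not match:
        return "unknown"
    major, minor, patch, abi = (int(part) for part in match.groups())
    base = (major, minor, patch)
    if base == (2, 6, 24):
        return "not affected"  # Hardy kernels ran cleanly for reporters here
    if base == (2, 6, 27):
        # Assumption: every Intrepid build before -14 lacks the patch.
        return "fixed" if abi >= 14 else "affected"
    if base == (2, 6, 28):
        if abi >= 11:
            return "fixed"  # Jaunty 2.6.28-11 carries the patch
        if abi <= 4:
            return "affected"  # 2.6.28-4 was reported to lock up
    return "unknown"  # not covered by any report in this thread


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", classify_kernel(running))
```

An "unknown" result only means the version was not discussed here; checking the header as described above remains the more direct test.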

nutznboltz (nutznboltz-deactivatedaccount) wrote :

We have deployed 2.6.27-14 on the KVM host. As a caveat we had to add "rootdelay=200" to the kernel options as the FC attached storage didn't configure quickly enough for the root fs to be found.

This machine is hosting 14 VMs so its I/O load can be very heavy at times.

My potentially related bug list has expanded:

bug 268215
bug 289158
bug 300329
bug 303064
bug 312163
bug 348218

tom (thomas-gutzler) wrote :

@andy: Not so much luck on my side.
System was running for less than a day before it crashed again. I've attached a new log file. Looks very similar.

tom (thomas-gutzler) wrote :

Is there anything I can do to help solve this problem?

Andy Whitcroft (apw) wrote :

@tom -- that appears to be a different symptom, certainly from the one originally reported on this bug and the one that the patch I was referring to was supposed to fix. Let's get a separate bug filed for that one, please. We can dup it to this one should it later prove to be the same.

Andy Whitcroft (apw) wrote :

Now that karmic is open the primary task applies there. The fix relating to this issue is already there so closing Fix Released. We continue to monitor Intrepid expecting that the fix is there, in -14 and later.

Changed in linux (Ubuntu Intrepid):
assignee: nobody → Andy Whitcroft (apw)
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Chris Osgood (ubuntu-functionalfuture) wrote :

As far as I can tell this issue is fixed in Jaunty. I have been really hammering the system for several weeks and have not seen the original problem.

I believe some of the issues posted here lately are not related to the original bug. The stack traces look different.
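
One rough way to compare traces is to pull out the function named on the instruction-pointer line of each report. The sketch below uses a hypothetical helper and matches only the "RIP:", "EIP is at" and "IP:" line formats that appear in the logs quoted in this bug; it is a triage aid, not a substitute for reading the full call trace.

```python
import re

# Matches 'symbol+0xoffset/0xsize', e.g. find_get_pages+0x66/0x110
SYMBOL_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

# Markers of lines that report where the CPU was executing.
IP_MARKERS = ("RIP:", "EIP is at", " IP:")


def faulting_symbols(log_text):
    """Return the distinct function names found on instruction-pointer lines."""
    symbols = []
    for line in log_text.splitlines():
        if any(marker in line for marker in IP_MARKERS):
            match = SYMBOL_RE.search(line)
            if match and match.group(1) not in symbols:
                symbols.append(match.group(1))
    return symbols


# Lines copied from two different traces quoted in this bug.
sample_traces = """\
[553927.303755] RIP: 0010:[<ffffffff802abea6>] [<ffffffff802abea6>] find_get_pages+0x66/0x110
[553927.303755] [<ffffffff802abe83>] ? find_get_pages+0x43/0x110
Apr 28 14:37:32 cellar kernel: [67396.314929] IP: [<c02c85cb>] prio_tree_insert+0x14b/0x290
Apr 28 14:37:32 cellar kernel: [67396.315027] EIP: 0060:[<c02c85cb>] EFLAGS: 00010202 CPU: 0
Apr 28 14:37:32 cellar kernel: [67396.315031] EIP is at prio_tree_insert+0x14b/0x290
"""

print(faulting_symbols(sample_traces))
```

Here the iozone soft lockup sits in find_get_pages while the later oops sits in prio_tree_insert, which is the kind of difference referred to above.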

Andy Whitcroft (apw)
Changed in linux (Ubuntu Intrepid):
importance: Undecided → High
Andy Whitcroft (apw) wrote :

We believe that the original bug here is definitely fixed by the change to the lockless pagecache carried in the -14.18 kernel, which is now in Intrepid -updates. Closing this one Fix Released for Intrepid. If you do not have the original "soft lockup" style error then please file a new bug.

Changed in linux (Ubuntu Intrepid):
milestone: none → intrepid-updates
status: Incomplete → Fix Released
Deepak Sarda (antrix) wrote :

@Andy after almost a week of no-crashes, my computer froze last night. Attached is the extract from kernel log. As you asked, I looked for any 'Panic' lines before the crash but there aren't any.. even for the earlier reported crashes.

In this crash, the first bug reported is "BUG: unable to handle kernel paging request at 0cabf6f5" in process 'apt-config'. After a minute of this occurring, I see several "BUG: soft lockup - CPU#0 stuck for 61s!" errors. There is also something related to "ata1.00: limiting speed to UDMA/44:PIO4" etc., which I don't understand.

Is this crash still related to this particular bug report? If not, can you help me identify some other bug against which I can file this? I am sorry but I really don't have the knowledge to look through the filed bugs and correctly identify which one to report this against!

I have also installed the kerneloops package and it seems to be submitting all the oops. But I can't figure out where they end up on the kernel oops site. How do I get the URL of my submitted oops!?

tom (thomas-gutzler) wrote :

I noticed that "Pid: x, comm: indexer Not tainted 2.6.27-y-server" appeared in all my log files. indexer is part of mnogosearch which I use to search through pdf files. I had built version 3.3.7 on a 2.6.27-9-generic kernel a while ago.
I downloaded the new version of mnogosearch and rebuilt it, and the system has run fine since.
Sorry for misleading
