Kernel panic - not syncing

Bug #1024309 reported by Dmitri Minaev
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone:

Bug Description

Since May, some of my servers have been crashing from time to time. This one crashed today. No information is available in syslog or the other log files. The CPU load was not very high, but a lot of swap was in use, even though the disks were not very active.

Here's the OCRed screenshot of the hanging system:

[5288936.542226] [<ffffffff81011e9b>] cpu_idle+0xeb/0x110
[5288936.542229] [<ffffffff81552b33>] start_secondary+0xa8/0xaa
[5288936.542231] Code: 06 89 85 c0 fe ff ff c7 85 c4 fe ff ff 01 00 00 00 e9 9'
fb ff ff 90 48 8b 95 e0 fe ff ff 48 8b 45 a8 8b 72 08 48 e1 e0 0a 31 d2 <48> f’
f6 48 8b 75 b0 48 89 45 a0 31 c0 48 85 f6 74 0c 48 8b 45
[5288936.542248] RIP [<ffffffff810562a4>] find_busiest_group+0x634/0x8f0
[5288936.542251] RSP <ffff88101cce3c58>
[5288936.542253] ---[ end trace 17d6ca884388b3c0 ]---
[5288936.542255] Kernel panic - not syncing: attempted to kill the idle task!
[5288936.542257] Pid: 0, comm: swapper Tainted: G D 2.6.32-24-server #
-Ubuntu
[5288936.542259] Call Trace:
[5288936.542261] [<ffffffff81557dda>] panic+0x78/0x137
[5288936.542266] [<ffffffff8106b53a>] do_exit+0x35a/0x380
[5288936.542269] [<ffffffff8155bc40>] oops_end+0xb0/0xf0
[5288936.542271] [<ffffffff8101711b>] die+0x5b/0x90
[5288936.542273] [<ffffffff8155b514>] do_trap+0xc4/0x170
[5288936.542276] [<ffffffff810150af>] do_divide_error+0x8f/0xb0
[5288936.542279] [<ffffffff810562a4>] ? find_busiest_group+0x634/0x8f0
[5288936.542282] [<ffffffff812ba38c>] ? put_dec+0x10c/0x110
[5288936.542284] [<ffffffff812ba67e>] ? number+0x2ee/0x320
[5288936.542286] [<ffffffff812b8ece>] ? __rb_erase_color+0x1be/0x1d0
[5288936.542289] [<ffffffff81013f1b>] divide_error+0x1b/0x20
[5288936.542292] [<ffffffff810562a4>] ? find_busiest_group+0x634/0x8f0
[5288936.542295] [<ffffffff81055ea4>] ? find_busiest_group+0x234/0x8f0
[5288936.542299] [<ffffffff8105c8a8>] load_balance_newidle+0xa8/0x310
[5288936.542302] [<ffffffff81558a8a>] thread_return+0x352/0x418
[5288936.542305] [<ffffffff81011e9b>] cpu_idle+0xeb/0x110
[5288936.542307] [<ffffffff81552b33>] start_secondary+0xa8/0xaa
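
Because the trace above came from OCR, individual digits may be wrong. One way to sanity-check a frame is to subtract the offset from the address: every frame that names the same symbol should yield the same start address. The sketch below is only an illustration of that check. The regular expression, the function names, and the choice of sample lines are assumptions made for this example, not part of any kernel or apport tool.

```python
import re

# Matches frames such as:
#   [<ffffffff810562a4>] ? find_busiest_group+0x634/0x8f0
# The "?" marks a frame the kernel's unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<maybe>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Return a dict describing one call-trace frame, or None if the
    line does not parse (for example because OCR damaged it)."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    addr = int(match.group("addr"), 16)
    offset = int(match.group("off"), 16)
    return {
        "symbol": match.group("sym"),
        "start": addr - offset,  # implied start address of the function
        "size": int(match.group("size"), 16),
        "reliable": match.group("maybe") is None,
    }


def symbol_starts(lines):
    """Map each symbol to the set of start addresses implied by its frames.
    More than one start address for a symbol suggests a misread digit."""
    starts = {}
    for line in lines:
        frame = parse_frame(line)
        if frame is not None:
            starts.setdefault(frame["symbol"], set()).add(frame["start"])
    return starts


# Three frames copied from the trace above.
SAMPLE = [
    "[5288936.542292] [<ffffffff810562a4>] ? find_busiest_group+0x634/0x8f0",
    "[5288936.542295] [<ffffffff81055ea4>] ? find_busiest_group+0x234/0x8f0",
    "[5288936.542299] [<ffffffff8105c8a8>] load_balance_newidle+0xa8/0x310",
]

if __name__ == "__main__":
    for symbol, addrs in symbol_starts(SAMPLE).items():
        print(symbol, sorted(hex(a) for a in addrs))
```

Both find_busiest_group frames imply the same start address, so those two lines agree with each other. A line that fails to parse at all, such as one where a zero was read as the letter O, comes back as None and can be inspected by hand.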

---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg:

DistroRelease: Ubuntu 10.04
Frequency: Once every few days.
InstallationMedia: Ubuntu-Server 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.2)
MachineType: DEPO Computers X8DT3
Package: linux (not installed)
PciMultimedia:

ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-41-server root=/dev/mapper/S--WEB--01-root ro quiet
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-41.88-server 2.6.32.59+drm33.24
Regression: No
Reproducible: No
Tags: lucid needs-upstream-testing
Uname: Linux 2.6.32-41-server x86_64
UserGroups:

dmi.bios.date: 09/14/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2.0a
dmi.board.asset.tag: 1234567890
dmi.board.name: X8DT3
dmi.board.vendor: Supermicro
dmi.board.version: 2.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2.0a:bd09/14/2010:svnDEPOComputers:pnX8DT3:pvr1234567890:rvnSupermicro:rnX8DT3:rvr2.0:cvnSupermicro:ct17:cvr1234567890:
dmi.product.name: X8DT3
dmi.product.version: 1234567890
dmi.sys.vendor: DEPO Computers

Revision history for this message
Dmitri Minaev (minaev) wrote :

On July 13, when the system crashed, it was running kernel 2.6.32-24-server, as the screenshot above shows. Apport was launched under the new kernel 2.6.32-41-server, which had been installed on April 23 without rebooting the server.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1024309

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: lucid
Revision history for this message
Dmitri Minaev (minaev) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Dmitri Minaev (minaev) wrote : Lspci.txt

apport information

Revision history for this message
Dmitri Minaev (minaev) wrote : Lsusb.txt

apport information

Revision history for this message
Dmitri Minaev (minaev) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Dmitri Minaev (minaev) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Dmitri Minaev (minaev) wrote : ProcModules.txt

apport information

Revision history for this message
Dmitri Minaev (minaev) wrote : UdevDb.txt

apport information

Revision history for this message
Dmitri Minaev (minaev) wrote : UdevLog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
Revision history for this message
Luis Henriques (henrix) wrote :

Dmitri, I'm not sure I've understood correctly: the issue occurred under kernel 2.6.32-24, correct? This is a very old version, and it would be great if you could reproduce it in more recent versions (such as 2.6.32-41).

From the git log, I can see at least one commit that _could_ have fixed this issue; I'm referring to upstream commit aa483808516ca5cacfa0e5849691f64fec25828e, which was released in Lucid kernel 2.6.32-30.59.

Revision history for this message
Dmitri Minaev (minaev) wrote :

That is correct, Luis. I have upgraded the kernel to 2.6.32-41. Unfortunately, I cannot reproduce the bug, and by the time I see it again, 2.6.32-41 may be outdated, too :).

Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Dmitri Minaev (minaev) wrote :

Here is another OCRed screenful of messages, from a different server that suffered the same problem a day earlier. This server is an HP ProLiant DL180 G6 with two Intel Xeon X5650 CPUs (just like the first server!). I also attach the result of 'ubuntu-bug linux' from this server.

[5225927.742220] Call Trace:
[5225927.755435] <IRQ> [<ffffffff81557dda>] panic+0x78/0x137
[5225927.769348] [<ffffffff8155bc7a>] oops_end+0xea/0xf0
[5225927.783026] [<ffffffff8101711b>] die+0x5b/0x90
[5225927.796585] [<ffffffff8155b514>] do_trap+0xc4/0x170
[5225927.810147] [<ffffffff810150af>] do_divide_error+0x8f/0xb0
[5225927.823744] [<ffffffff810562a4>] ? find_busiest_group+0x634/0x8f0
[5225927.837567] [<ffffffffa00f0185>] ? ipt_do_table+0x295/0x678 [ip_tables]
[5225927.851521] [<ffffffff8105b230>] ? default_wake_function+0x0/0x20
[5225927.865460] [<ffffffff81013f1b>] divide_error+0x1b/0x20
[5225927.879315] [<ffffffff810562a4>] ? find_busiest_group+0x634/0x8f0
[5225927.893496] [<ffffffff8105c22e>] load_balance+0xae/0x410
[5225927.907535] [<ffffffff8105c629>] rebalance_domains+0x99/0x180
[5225927.921614] [<ffffffff8105c759>] run_rebalance_domains+0x49/0xf0
[5225927.935682] [<ffffffff8108f393>] ? ktime_get+0x63/0xe0
[5225927.949579] [<ffffffff8106e317>] __do_softirq+0xb7/0x1e0
[5225927.963615] [<ffffffff8109431a>] ? tick_program_event+0x2a/0x30
[5225927.977748] [<ffffffff810142ec>] call_softirq+0x1c/0x30
[5225927.991829] [<ffffffff81015cb5>] do_softirq+0x65/0xa0
[5225928.005829] [<ffffffff8106e1b5>] irq_exit+0x85/0x90
[5225928.019422] [<ffffffff8155fe11>] smp_apic_timer_interrupt+0x71/0x90
[5225928.033146] [<ffffffff81013cb3>] apic_timer_interrupt+0x13/0x20
[5225928.045831] <EOI> [<ffffffff8130f443>] ? acpi_idle_enter_bm+0x28a/0x2be
[5225928.058796] [<ffffffff8130f43c>] ? acpi_idle_enter_bm+0x283/0x2be
[5225928.071497] [<ffffffff8144e007>] ? cpuidle_idle_call+0xa7/0x140
[5225928.084184] [<ffffffff81011e63>] ? cpu_idle+0xb3/0x110
[5225928.096631] [<ffffffff81552b33>] ? start_secondary+0xa8/0xaa

Revision history for this message
Dmitri Minaev (minaev) wrote :

I wonder if this bug could be related to https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/614853 and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/824304.

The same division by zero in find_busiest_group.
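
For readers unfamiliar with this failure mode, here is a deliberately simplified model of it. It is not the kernel's code. It assumes, purely for illustration, that the faulting computation divides a scaled load by a per-group power value; the names SCALE, average_load and average_load_guarded are hypothetical. In Python the unguarded version raises ZeroDivisionError, which plays the role of the CPU's divide error trap seen in the traces.

```python
# Hypothetical fixed-point scale factor, chosen only for this illustration.
SCALE = 1024


def average_load(total_load, cpu_power):
    """Unguarded integer division: fails when cpu_power is zero,
    analogous to the divide error reported in the traces."""
    return (total_load * SCALE) // cpu_power


def average_load_guarded(total_load, cpu_power):
    """Guarded variant: clamp the divisor to at least 1 so that a
    zero power value cannot cause a division by zero."""
    return (total_load * SCALE) // max(cpu_power, 1)


if __name__ == "__main__":
    print(average_load(2048, 1024))
    try:
        average_load(5, 0)
    except ZeroDivisionError as exc:
        print("unguarded division failed:", exc)
    print(average_load_guarded(5, 0))
```

Whether a guard of this kind is what any actual kernel fix does is not established in this report; the sketch only shows why a zero divisor in such a calculation is fatal when the division happens in kernel context.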

Also, I don't think I mentioned this before: both servers were running the same software, the Sphinx search engine (http://sphinxsearch.com).

Revision history for this message
penalvch (penalvch) wrote :

Dmitri Minaev, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest server release of Ubuntu? ISO images are available from http://releases.ubuntu.com/raring/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: bios-outdated-r2.1 needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired