Kernel Panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.

Bug #1318551 reported by Florian Engelmann on 2014-05-12
This bug affects 10 people
Affects: linux (Ubuntu)
Importance: High
Assigned to: Rafael David Tinoco

Bug Description

Ubuntu Server 14.04 amd64
Linux global04-jobs2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

HW: HP DL380p Gen8 / Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz / 256GB

global04-jobs2 login: [203930.116834] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.
[203930.116834]
[203930.174171] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-24-generic #46-Ubuntu
[203930.211766] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
[203930.249588] 0000b9792e2c349e ffff881fbfa06dd0 ffffffff81715a64 ffffffffa02c02d8
[203930.286130] ffff881fbfa06e48 ffffffff8170ec65 0000000000000008 ffff881fbfa06e58
[203930.322243] ffff881fbfa06df8 0000000000000000 ffffc90029274072 0000000000000001
[203930.358126] Call Trace:
[203930.370201] <NMI> [<ffffffff81715a64>] dump_stack+0x45/0x56
[203930.407285] [<ffffffff8170ec65>] panic+0xc8/0x1d7
[203930.430907] [<ffffffffa02bf8fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[203930.461560] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
[203930.487228] [<ffffffff8171f108>] nmi_handle.isra.3+0x88/0x180
[203930.516179] [<ffffffff8171f3bd>] do_nmi+0x1bd/0x340
[203930.540958] [<ffffffff8171e571>] end_repeat_nmi+0x1e/0x2e
[203930.567543] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203930.593928] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203930.619941] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203930.646100] <<EOE>> [<ffffffff815c9570>] cpuidle_enter_state+0x40/0xc0
[203930.678933] [<ffffffff815c96a9>] cpuidle_idle_call+0xb9/0x1f0
[203930.707409] [<ffffffff8101ceae>] arch_cpu_idle+0xe/0x30
[203930.734325] [<ffffffff810beb85>] cpu_startup_entry+0xc5/0x290
[203930.763715] [<ffffffff81703f37>] rest_init+0x77/0x80
[203930.789115] [<ffffffff81d34f70>] start_kernel+0x438/0x443
[203930.815958] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[203930.844461] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
[203930.874616] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[203930.907482] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[203930.942571] ERST: [Firmware Warn]: Firmware does not respond in time.
[203930.983822] ERST: [Firmware Warn]: Firmware does not respond in time.
[203931.026586] ERST: [Firmware Warn]: Firmware does not respond in time.
[203931.068394] ERST: [Firmware Warn]: Firmware does not respond in time.
[203931.110558] ERST: [Firmware Warn]: Firmware does not respond in time.
[203931.151923] ERST: [Firmware Warn]: Firmware does not respond in time.
[203931.189421] ------------[ cut here ]------------
[203931.212832] WARNING: CPU: 0 PID: 0 at /build/buildd/linux-3.13.0/kernel/rcu/tree.c:508 rcu_eqs_exit_common.isra.48+0x110/0x120()
[203931.269279] Modules linked in: veth bridge bonding dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison libcrc32c gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul sb_edac glue_helper 8021q ablk_helper cryptd hpwdt hpilo edac_core ioatdma lpc_ich psmouse garp serio_raw stp mrp llc acpi_power_meter ipmi_si mac_hid lp parport igb i2c_algo_bit tg3 dca ptp hpsa pps_core
[203931.488907] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-24-generic #46-Ubuntu
[203931.526932] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
[203931.559891] 0000000000000009 ffff881fbfa03ed0 ffffffff81715a64 0000000000000000
[203931.597473] ffff881fbfa03f08 ffffffff810676bd 0000000000000001 0000000000000046
[203931.634669] 0000000000000000 0000000000000000 ffffffff81c93398 ffff881fbfa03f18
[203931.696007] Call Trace:
[203931.708590] <IRQ> [<ffffffff81715a64>] dump_stack+0x45/0x56
[203931.737818] [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
[203931.767951] [<ffffffff8106779a>] warn_slowpath_null+0x1a/0x20
[203931.797160] [<ffffffff810c8720>] rcu_eqs_exit_common.isra.48+0x110/0x120
[203931.831002] [<ffffffff810caf05>] rcu_irq_enter+0x75/0xa0
[203931.857924] [<ffffffff8106cea7>] irq_enter+0x17/0xa0
[203931.883727] [<ffffffff8109894e>] scheduler_ipi+0x4e/0x1d0
[203931.911444] [<ffffffff810404ca>] smp_reschedule_interrupt+0x2a/0x30
[203931.943438] [<ffffffff8172781d>] reschedule_interrupt+0x6d/0x80
[203931.973619] <EOI> <NMI> [<ffffffff8170ed33>] ? panic+0x196/0x1d7
[203932.005056] [<ffffffffa02bf8fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[203932.036609] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
[203932.063827] [<ffffffff8171f108>] nmi_handle.isra.3+0x88/0x180
[203932.093283] [<ffffffff8171f3bd>] do_nmi+0x1bd/0x340
[203932.118263] [<ffffffff8171e571>] end_repeat_nmi+0x1e/0x2e
[203932.145761] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203932.172752] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203932.199896] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203932.227251] <<EOE>> [<ffffffff815c9570>] cpuidle_enter_state+0x40/0xc0
[203932.260781] [<ffffffff815c96a9>] cpuidle_idle_call+0xb9/0x1f0
[203932.290037] [<ffffffff8101ceae>] arch_cpu_idle+0xe/0x30
[203932.316383] [<ffffffff810beb85>] cpu_startup_entry+0xc5/0x290
[203932.345426] [<ffffffff81703f37>] rest_init+0x77/0x80
[203932.370610] [<ffffffff81d34f70>] start_kernel+0x438/0x443
[203932.398445] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[203932.428250] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
[203932.459464] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[203932.492000] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[203932.522502] ---[ end trace 1b2caf07f75276b5 ]---
[203932.546584] ------------[ cut here ]------------
[203932.570277] WARNING: CPU: 0 PID: 0 at /build/buildd/linux-3.13.0/kernel/rcu/tree.c:388 rcu_eqs_enter_common.isra.47+0x210/0x220()
[203932.629000] Modules linked in: veth bridge bonding dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison libcrc32c gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul sb_edac glue_helper 8021q ablk_helper cryptd hpwdt hpilo edac_core ioatdma lpc_ich psmouse garp serio_raw stp mrp llc acpi_power_meter ipmi_si mac_hid lp parport igb i2c_algo_bit tg3 dca ptp hpsa pps_core
[203932.840564] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.13.0-24-generic #46-Ubuntu
[203932.882979] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
[203932.915861] 0000000000000009 ffff881fbfa03ec0 ffffffff81715a64 0000000000000000
[203932.953260] ffff881fbfa03ef8 ffffffff810676bd ffff881fbfa0e600 0000000000000001
[203932.990936] ffff881fbfa0e600 0000000000000000 0000000000000000 ffff881fbfa03f08
[203933.028255] Call Trace:
[203933.041524] <IRQ> [<ffffffff81715a64>] dump_stack+0x45/0x56
[203933.069392] [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
[203933.099631] [<ffffffff8106779a>] warn_slowpath_null+0x1a/0x20
[203933.129192] [<ffffffff810c80d0>] rcu_eqs_enter_common.isra.47+0x210/0x220
[203933.163019] [<ffffffff810ca35d>] rcu_irq_exit+0x6d/0xa0
[203933.189393] [<ffffffff8106cf9b>] irq_exit+0x6b/0x110
[203933.214830] [<ffffffff810989ae>] scheduler_ipi+0xae/0x1d0
[203933.242468] [<ffffffff810404ca>] smp_reschedule_interrupt+0x2a/0x30
[203933.274067] [<ffffffff8172781d>] reschedule_interrupt+0x6d/0x80
[203933.304620] <EOI> <NMI> [<ffffffff8170ed33>] ? panic+0x196/0x1d7
[203933.336114] [<ffffffffa02bf8fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[203933.368295] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
[203933.394723] [<ffffffff8171f108>] nmi_handle.isra.3+0x88/0x180
[203933.423870] [<ffffffff8171f3bd>] do_nmi+0x1bd/0x340
[203933.448658] [<ffffffff8171e571>] end_repeat_nmi+0x1e/0x2e
[203933.476063] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203933.502937] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203933.530842] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203933.558576] <<EOE>> [<ffffffff815c9570>] cpuidle_enter_state+0x40/0xc0
[203933.592501] [<ffffffff815c96a9>] cpuidle_idle_call+0xb9/0x1f0
[203933.622438] [<ffffffff8101ceae>] arch_cpu_idle+0xe/0x30
[203933.649036] [<ffffffff810beb85>] cpu_startup_entry+0xc5/0x290
[203933.678640] [<ffffffff81703f37>] rest_init+0x77/0x80
[203933.704425] [<ffffffff81d34f70>] start_kernel+0x438/0x443
[203933.732692] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[203933.762571] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
[203933.794029] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[203933.826003] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[203933.856852] ---[ end trace 1b2caf07f75276b6 ]---
[203933.879841] ------------[ cut here ]------------
[203933.903067] WARNING: CPU: 0 PID: 0 at /build/buildd/linux-3.13.0/arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
[203933.962151] Modules linked in: veth bridge bonding dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison libcrc32c gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul sb_edac glue_helper 8021q ablk_helper cryptd hpwdt hpilo edac_core ioatdma lpc_ich psmouse garp serio_raw stp mrp llc acpi_power_meter ipmi_si mac_hid lp parport igb i2c_algo_bit tg3 dca ptp hpsa pps_core
[203934.173264] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.13.0-24-generic #46-Ubuntu
[203934.216377] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
[203934.249431] 0000000000000009 ffff881fbfa03d90 ffffffff81715a64 0000000000000000
[203934.287108] ffff881fbfa03dc8 ffffffff810676bd 0000000000000001 ffff881fbfa14440
[203934.324901] 000000010308f9c3 0000000000000000 ffff881fbfa34440 ffff881fbfa03dd8
[203934.362475] Call Trace:
[203934.375125] <IRQ> [<ffffffff81715a64>] dump_stack+0x45/0x56
[203934.404219] [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
[203934.434625] [<ffffffff8106779a>] warn_slowpath_null+0x1a/0x20
[203934.464213] [<ffffffff8104023d>] native_smp_send_reschedule+0x5d/0x60
[203934.497051] [<ffffffff810a7ffa>] trigger_load_balance+0x16a/0x1e0
[203934.527951] [<ffffffff810992b4>] scheduler_tick+0xa4/0xf0
[203934.555529] [<ffffffff81076230>] update_process_times+0x60/0x70
[203934.585948] [<ffffffff810d5be5>] tick_sched_handle.isra.17+0x25/0x60
[203934.619005] [<ffffffff810d5c61>] tick_sched_timer+0x41/0x60
[203934.647109] [<ffffffff8108e537>] __run_hrtimer+0x77/0x1d0
[203934.674547] [<ffffffff810d5c20>] ? tick_sched_handle.isra.17+0x60/0x60
[203934.707593] [<ffffffff8108ed3f>] hrtimer_interrupt+0xef/0x230
[203934.737132] [<ffffffff81043087>] local_apic_timer_interrupt+0x37/0x60
[203934.769972] [<ffffffff817287ff>] smp_apic_timer_interrupt+0x3f/0x60
[203934.801850] [<ffffffff8172719d>] apic_timer_interrupt+0x6d/0x80
[203934.831941] <EOI> <NMI> [<ffffffff8170ed33>] ? panic+0x196/0x1d7
[203934.864271] [<ffffffffa02bf8fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[203934.896292] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
[203934.923132] [<ffffffff8171f108>] nmi_handle.isra.3+0x88/0x180
[203934.952363] [<ffffffff8171f3bd>] do_nmi+0x1bd/0x340
[203934.977834] [<ffffffff8171e571>] end_repeat_nmi+0x1e/0x2e
[203935.005817] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203935.033184] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203935.060362] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[203935.087458] <<EOE>> [<ffffffff815c9570>] cpuidle_enter_state+0x40/0xc0
[203935.123200] [<ffffffff815c96a9>] cpuidle_idle_call+0xb9/0x1f0
[203935.152205] [<ffffffff8101ceae>] arch_cpu_idle+0xe/0x30
[203935.177872] [<ffffffff810beb85>] cpu_startup_entry+0xc5/0x290
[203935.207919] [<ffffffff81703f37>] rest_init+0x77/0x80
[203935.233858] [<ffffffff81d34f70>] start_kernel+0x438/0x443
[203935.261678] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[203935.293191] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
[203935.324069] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[203935.357138] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[203935.388293] ---[ end trace 1b2caf07f75276b7 ]---

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-24-generic 3.13.0-24.46
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 12 10:13 seq
 crw-rw---- 1 root audio 116, 33 May 12 10:13 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Mon May 12 10:50:00 2014
HibernationDevice: RESUME=/dev/mapper/system-swap
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL380p Gen8
PciMultimedia:

ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-24-generic root=/dev/mapper/system-root ro console=tty0 console=tty1 console=ttyS0,115200n8 swapaccount=1 net.ifnames=1 biosdevname=0
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-24-generic N/A
 linux-backports-modules-3.13.0-24-generic N/A
 linux-firmware 1.127
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/10/2014
dmi.bios.vendor: HP
dmi.bios.version: P70
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP70:bd02/10/2014:svnHP:pnProLiantDL380pGen8:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL380p Gen8
dmi.sys.vendor: HP

Florian Engelmann (engelmann) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Can you take a picture of the panic and attach it to the bug report?

Changed in linux (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-rc5-utopic/

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Claudio Kuenzler (napsty) wrote :

Two of our servers have crashed within 9 days because of this bug.
Both are HP DL380 Gen8 machines running Ubuntu 14.04 LTS with kernel 3.13.0-24-generic.

Copied the panic from ILO's text console:

[1863292.758326] [<ffffffff8108ed3f>] hrtimer_interrupt+0xef/0x230
[1863292.786466] [<ffffffff81043087>] local_apic_timer_interrupt+0x37/0x60
[1863292.818554] [<ffffffff8172887f>] smp_apic_timer_interrupt+0x3f/0x60
[1863292.850351] [<ffffffff8172721d>] apic_timer_interrupt+0x6d/0x80
[1863292.879596] <EOI> <NMI> [<ffffffff8170ed93>] ? panic+0x196/0x1d7
[1863292.910248] [<ffffffffa01208fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[1863292.941162] [<ffffffff8101b7d9>] ? sched_clock+0x9/0x10
[1863292.968075] [<ffffffff8171f188>] nmi_handle.isra.3+0x88/0x180
[1863292.996345] [<ffffffff8171f43d>] do_nmi+0x1bd/0x340
[1863293.020585] [<ffffffff8171e5f1>] end_repeat_nmi+0x1e/0x2e
[1863293.046908] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[1863293.073359] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[1863293.099900] [<ffffffff813dfd78>] ? intel_idle+0xd8/0x140
[1863293.125429] <<EOE>> [<ffffffff815c95d0>] cpuidle_enter_state+0x40/0xc0
[1863293.157562] [<ffffffff815c9709>] cpuidle_idle_call+0xb9/0x1f0
[1863293.185072] [<ffffffff8101ceae>] arch_cpu_idle+0xe/0x30
[1863293.212075] [<ffffffff810beb85>] cpu_startup_entry+0xc5/0x290
[1863293.239492] [<ffffffff81703f97>] rest_init+0x77/0x80
[1863293.263435] [<ffffffff81d34f70>] start_kernel+0x438/0x443
[1863293.289440] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[1863293.317249] [<ffffffff81d34120>] ? early_idt_handlers+0x120/0x120
[1863293.348279] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[1863293.378575] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[1863293.407399] ---[ end trace de17fe13514d8515 ]---

Changed in linux (Ubuntu):
status: Expired → Confirmed
falstaff (falstaff) wrote :

We see this or a similar issue on a DL360 G7. As visible in the stack trace, we are using VirtualBox. However, I think this is not related, since we used the same version on an earlier kernel. We also upgraded to the latest VirtualBox version after the last crash, but the system has crashed again since then.

[459191.570170] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.
[459191.570170]
[459191.585348] CPU: 4 PID: 4245 Comm: EMT-0 Tainted: GF IO 3.13.0-30-generic #55-Ubuntu
[459191.592975] Hardware name: HP ProLiant DL360 G7, BIOS P68 01/28/2011
[459191.600518] 0001a1a1de6c68e8 ffff880c1f886dd0 ffffffff8171a324 ffffffffa00502d8
[459191.607889] ffff880c1f886e48 ffffffff81713525 0000000000000008 ffff880c1f886e58
[459191.615247] ffff880c1f886df8 0000000000000000 ffffc9000625e072 0000000000000000
[459191.622757] Call Trace:
[459191.629819] <NMI> [<ffffffff8171a324>] dump_stack+0x45/0x56
[459191.636964] [<ffffffff81713525>] panic+0xc8/0x1d7
[459191.643974] [<ffffffffa004f8fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[459191.650902] [<ffffffff8101b7c9>] ? sched_clock+0x9/0x10
[459191.657646] [<ffffffff817239c8>] nmi_handle.isra.3+0x88/0x180
[459191.664304] [<ffffffff81723c5e>] do_nmi+0x19e/0x340
[459191.670833] [<ffffffff81722e31>] end_repeat_nmi+0x1e/0x2e
[459191.677272] <<EOE>> [<ffffffff810aa7b8>] ? __wake_up_common+0x58/0x90
[459191.683699] [<ffffffffa0638a7d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
[459191.690002] [<ffffffffa062db97>] ? supdrvIOCtlFast+0x77/0xa0 [vboxdrv]
[459191.696204] [<ffffffffa062a3f9>] ? VBoxDrvLinuxIOCtl_4_3_12+0x49/0x1e0 [vboxdrv]
[459191.702403] [<ffffffff811cfa20>] ? do_vfs_ioctl+0x2e0/0x4c0
[459191.708419] [<ffffffff8109dd94>] ? vtime_account_user+0x54/0x60
[459191.714335] [<ffffffff811cfc81>] ? SyS_ioctl+0x81/0xa0
[459191.720137] [<ffffffff8172aeff>] ? tracesys+0xe1/0xe6
[459191.725830] drm_kms_helper: panic occurred, switching back to text console

We're not using virtualbox. What I traced back was that it always crashes in two functions.

update_cfs_shares() + pick_next_task_fair()

I think it has to do with a compiler bug (gcc 4.9.0).
Are newer Ubuntu kernels perhaps built with that broken version?
I installed another kernel yesterday (pf-kernel 3.15-pf4). No crashes anymore.

So probably it has to do with this: https://lkml.org/lkml/2014/7/24/584

Btw. the kernel I'm currently using can be found at the end of the page http://big-bum.uni.cx/ftp.html

linux-image-3.15.0-pf4+_3.15.0-pf4+-10.00.Custom_amd64.deb

However ftp seems to be down currently...

falstaff (falstaff) wrote :

The Kernel installed in our environment is compiled using GCC 4.8.2 (check dmesg | head). Hence I don't think this is related.
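The compiler check mentioned above can also be scripted. This is a minimal sketch, assuming the kernel banner contains a "gcc version X.Y.Z" string, as the 3.13 banners do; the function name kernel_gcc_version is made up for illustration.

```shell
#!/bin/sh
# Extract the GCC version from a kernel version banner.
# kernel_gcc_version is a hypothetical helper name, not an existing tool.
kernel_gcc_version() {
    # Print only the dotted number that follows "gcc version ".
    printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}
```

On a running system the banner can be read from /proc/version, for example: kernel_gcc_version "$(cat /proc/version)". If the banner has no "gcc version" string, the function prints nothing.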

We had the kernel panic again last week (see attachment). We added the hpwdt module to the blacklist, which hopefully works around this problem. We also updated the BIOS, but we are not sure whether this alone would have solved the problem. Since it's a production server, I cannot do any further tests to help solve this problem, sorry.

Marco Nenciarini (mnencia) wrote :

I have an HP ProLiant DL380p Gen8 with 256 GB of RAM. The 14.04 installation works, but after the reboot it crashes at almost every boot attempt.

I tried reinstalling it and it started. I upgraded the system to the latest kernel available, and after the reboot it no longer boots, neither with the original kernel nor with the latest one.

Recovery from the minimal ISO works, but as soon as I boot into the normal system it crashes.

I've also tried the recovery mode and init=/bin/bash, and both work until I add some "concurrency" to the system, then it hangs again.

Attached is a screenshot of one of the hangs during early boot.

I've already contacted HP and they changed the system main board, we installed the system again and it is still crashing.

Dave Richardson (pudnik019) wrote :

We are also seeing this bug on our HP DL380p Gen8's. Running:

3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

whoopn (whoopn) wrote :

So this seems to be related to the BIOS/OS CPU powerstate control. Specifically PPC in the BIOS appears to be the issue.

Here is a link to the HP advisory as it relates to a similar VMware issue (in actuality, it's not a VMware issue): http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_ba847bafb2a2d782fcbb0710b053ce01=wsrp-navigationalState%3DdocId%253Demr_na-c03543898-4%257CdocLocale%253D%257CcalledBy%253D&javax.portlet.tpst=ba847bafb2a2d782fcbb0710b053ce01&ac.admitted=1408113098173.876444892.492883150

Now hilariously this SAME issue occurs on most ANY HP motherboard after a certain date (for instance, my issue is with a workstation).

If you have a workstation, go into the BIOS and disable ANYTHING related to power regulation and let it run in turbo mode. If C-states are an option, disable them.

This has resolved it for me for 2 days thus far, which is quite an accomplishment with how often it was crashing before.

Marco Nenciarini (mnencia) wrote :

I can confirm that after installing the kernel at

ftp://big-bum.uni.cx/pf-kernel/amd64/linux-image-3.15.0-pf4%2B_3.15.0-pf4%2B-10.00.Custom_amd64.deb

the server worked for days without problem (however it's not in production, as I'm a bit scared of using such a kernel in production)

I think that this issue is unrelated to the gcc bug linked upthread. Probably the linked kernel has some option that either disables "Processor Clocking Control" or "Collaborative Power Control" or makes it work.

Marco Nenciarini (mnencia) wrote :

I've tried following the guide on HP site and disabled "Collaborative Power Control" option in BIOS, but the issue with the stock kernel persists. So it isn't the root cause.

Marco Nenciarini (mnencia) wrote :

I've also tried to recompile the kernel disabling CONFIG_X86_PCC_CPUFREQ, but the resulting kernel is still crashing on boot.

Marco Nenciarini (mnencia) wrote :

I've bisected the differences between the Ubuntu stock kernel configuration and the pf4+ kernel configuration, and I've found that the error is triggered by the CONFIG_SCHED_AUTOGROUP setting. Fortunately it can be disabled at boot time by adding the "noautogroup" parameter to the kernel command line in grub.

With "noautogroup", the Ubuntu stock kernel 3.13.0-35-generic boots without any problem. I've tried several times.
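The first step of such a configuration bisect is listing the options that differ between the two kernels. A minimal sketch, assuming both configs are available as plain text files; config_diff is a hypothetical helper, not the procedure actually used in this report.

```shell
#!/bin/sh
# List CONFIG_ options that differ between two kernel config files.
# "# CONFIG_X is not set" lines are ignored, so an option that is enabled in
# only one file shows up as a one-sided difference.
config_diff() {
    cfg_a=$(mktemp) || return 1
    cfg_b=$(mktemp) || { rm -f "$cfg_a"; return 1; }
    grep '^CONFIG_' "$1" | LC_ALL=C sort > "$cfg_a"
    grep '^CONFIG_' "$2" | LC_ALL=C sort > "$cfg_b"
    # "<" marks lines only in the first file, ">" lines only in the second.
    diff "$cfg_a" "$cfg_b" | grep '^[<>]' || true
    rm -f "$cfg_a" "$cfg_b"
}
```

Ubuntu installs the stock config as /boot/config-$(uname -r). The path of the second config depends on how the other kernel was packaged, so it has to be supplied by hand. Each differing option can then be toggled in turn to narrow down the trigger.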

Another update on this to track down the problem further:

I've tested several kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline

I installed every final release in turn. The last one that works without this problem seems to be http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7.10-raring/

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13.28-raring/ failed for me as well.

So the problem seems to have been introduced between these two versions.

I will try now some other 3.8.x to see if it appeared there from beginning.

Tested now http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-raring/

It's also crashing. So seems that this bug / feature :-) has been introduced in 3.8.0

Magesh GV (magesh-gv) wrote :

We started running into this issue on Hewlett-Packard ProLiant DL380 G6, BIOS P62 after moving to Ubuntu 14.04 Server edition from Ubuntu 13.04 Server

Magesh GV (magesh-gv) wrote :

This may have been introduced by this fix as it is seen only on HP Servers:

      x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-raring/CHANGES

For me this issue occurs also with Intel + Gigabyte Mainboard, so I can't confirm it's HP related only.

Andy Whitcroft (apw) wrote :

@Michael -- if 3.7.0 is ok and 3.8.0 is not, the next logical step is to try out the v3.8-rcN candidates and see which of those introduced it.

Marco Nenciarini (mnencia) wrote :

Wouldn't it be faster to git bisect between the two releases?

I've just tested the rc1 http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc1-raring/ and it's also crashing

Esel (glumpad) wrote :

I have the same error on an HP ProLiant G6 ML330 with Intel® Xeon(R) CPU E5504 @ 2.00GHz × 4 and 10GB RAM. It also has 2 new 1TB WD1002F9YZ disks, configured as a software RAID with partitions for swap (10GB), system (100GB, ext4) and data (890GB, ext4).

Esel (glumpad) wrote :

If any additional information is needed, please feel free to ask me for it!

Changed in linux (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
Esel (glumpad) wrote :

Any solution in sight? Otherwise I'd have to change distro and I don't want to.

Marco Nenciarini (mnencia) wrote :

Setting the "noautogroup" option at boot time normally solves the issue. Doesn't it work for you?

P.S. The autogroup feature is pretty useless on a server, so it's safe to disable it.

Chris J Arges (arges) on 2014-12-01
tags: added: cts kernel-key
Chris J Arges (arges) wrote :

In order to better debug firmware issues can those affected by this issue do the following:

1) sudo apt-get install fwts
2) sudo fwts
3) append 'results.log' to this bug

If that doesn't work you can also just produce an acpidump using the following:

1) sudo acpidump > acpidump.log
2) append 'acpidump.log' to this bug

Thanks!

Rafael David Tinoco (inaddy) wrote :

For Proliant Servers and X2APIC observations please follow discussion in:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1398497

Thank you

Rafael David Tinoco (inaddy) wrote :

Whenever facing NMIs on Proliant Servers check NMI code under ILO:

Translate the code:

00h (0x00000000) No source found
01h (0x00000001) Uncorrectable Memory Error
1Bh (0x0000001B) ASR NMI
20h (0x00000020) PCI Parity Error
27h (0x00000027) NMI Button Press
28h (0x00000028) SB_BUS_NMI
29h (0x00000029) ILO Doorbell NMI
2Ah (0x0000002A) ILO IOP NMI
2Bh (0x0000002B) ILO Watchdog NMI
2Ch (0x0000002C) Proc Throt NMI
2Dh (0x0000002D) Front Side Bus NMI
2Fh (0x0000002F) PCI Express Error
30h (0x00000030) DMA controller NMI
31h (0x00000031) Hypertransport/CSI Error

If you are getting something like:

"76 | Critical | System Error | 03/12/2015 12:42 | 03/12/2015 12:07 | 2 | An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)"

You are facing an ILO Watchdog NMI, meaning that the ILO watchdog countdown was started and has not been updated for some time.
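The lookup can be scripted as follows. This is a minimal sketch: the code-to-name mapping is copied from the table above, while the function names decode_nmi_code and decode_iml_line are hypothetical helpers and not part of any HP tool.

```shell
#!/bin/sh
# Translate an NMI system error code into the source name from the table above.
# Accepts forms such as 0x0000002B, 2Bh or 2b.
decode_nmi_code() {
    # Strip an optional 0x prefix and h suffix, then upper-case the hex digits.
    code=$(printf '%s' "$1" | sed -e 's/^0[xX]//' -e 's/[hH]$//' | tr 'a-f' 'A-F')
    # Drop leading zeros; the all-zero code becomes a single 0.
    code=$(printf '%s' "$code" | sed -e 's/^0*//')
    [ -n "$code" ] || code=0
    case "$code" in
        0)  echo "No source found" ;;
        1)  echo "Uncorrectable Memory Error" ;;
        1B) echo "ASR NMI" ;;
        20) echo "PCI Parity Error" ;;
        27) echo "NMI Button Press" ;;
        28) echo "SB_BUS_NMI" ;;
        29) echo "ILO Doorbell NMI" ;;
        2A) echo "ILO IOP NMI" ;;
        2B) echo "ILO Watchdog NMI" ;;
        2C) echo "Proc Throt NMI" ;;
        2D) echo "Front Side Bus NMI" ;;
        2F) echo "PCI Express Error" ;;
        30) echo "DMA controller NMI" ;;
        31) echo "Hypertransport/CSI Error" ;;
        *)  echo "Unknown code"; return 1 ;;
    esac
}

# Decode the first system error code found in an Integrated Management Log line.
decode_iml_line() {
    iml_code=$(printf '%s\n' "$1" |
        sed -n 's/.*System error code \(0x[0-9A-Fa-f]*\).*/\1/p')
    [ -n "$iml_code" ] || return 1
    decode_nmi_code "$iml_code"
}
```

Passing the log entry quoted above to decode_iml_line yields "ILO Watchdog NMI". Codes that are not in the table return a non-zero status.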

HPWDT starts the ILO Watchdog countdown whenever /dev/watchdog is opened (as corosync/pacemaker do, for example), and ILO will send an NMI once the watchdog counter has reached zero (for example, because the ILO timer was not being updated properly).

Workaround (other than using the HP-ASRD daemon that frequently updates the counter) is to blacklist hpwdt module:

# echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
# update-initramfs -k all -u
# update-grub
# reboot
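After the reboot it is worth confirming that the blacklist entry exists and that the module is really gone. A minimal sketch: the helper names is_blacklisted and is_loaded are hypothetical, and the file paths are passed in explicitly so the functions can be tried on copies.

```shell
#!/bin/sh
# Succeed if the given modprobe.d file contains a "blacklist <module>" line.
is_blacklisted() {
    bl_module=$1
    bl_conf=$2
    [ -f "$bl_conf" ] || return 1
    grep -Eq "^[[:space:]]*blacklist[[:space:]]+${bl_module}[[:space:]]*(#.*)?\$" "$bl_conf"
}

# Succeed if the module is listed in a /proc/modules-style file
# (defaults to /proc/modules itself).
is_loaded() {
    ld_module=$1
    ld_listing=${2:-/proc/modules}
    grep -q "^${ld_module} " "$ld_listing"
}
```

For the workaround above, "is_blacklisted hpwdt /etc/modprobe.d/blacklist-hp.conf" should succeed and "is_loaded hpwdt" should fail once the system is back up. A blacklist entry only stops automatic loading, so anything that loads the module explicitly would still bring it back.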

Give feedback please.

Rafael David Tinoco (inaddy) wrote :

Check case:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837

If you ever face NMIs on Proliant Servers for no apparent reason.

Maxim Doucet (maximdoucet) wrote :

Wow.

I just spent the last 6 hours trying to solve this problem on a supermicro machine (mother board is Supermicro X9SRL-F http://www.supermicro.com/products/motherboard/Xeon/C600/X9SRL-F.cfm) before stumbling upon this ticket. Now you know that not only HP machines are concerned.

Adding the "noautogroup" by editing GRUB at boot time solved the problem for me.

The system is ubuntu 14.04.2 with a 3.13.0-53 though I experienced the issue with previous 3.13.0 kernels from the original installation of the server back in September 2014. Note that even if I am on 14.04.2, I don't use the newest LTS kernel stack (https://wiki.ubuntu.com/Kernel/LTSEnablementStack) from 14.04.2 (it means that I installed from a 14.04.1 media, the mini.iso to be exact, then upgraded).

I fixed the issue permanently with the following workaround:
- modify the following line in "/etc/default/grub" to add "noautogroup":

  GRUB_CMDLINE_LINUX_DEFAULT="splash quiet noautogroup"

- launch "sudo update-grub" to apply the configuration
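The edit to /etc/default/grub described above can be made idempotent with a small helper. This is a sketch only, assuming the file holds a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; add_kernel_param is a hypothetical name, and the target file is passed explicitly so it can be tested on a copy first.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is
# already present. Usage: add_kernel_param noautogroup /path/to/grub-defaults
add_kernel_param() {
    param=$1
    file=$2
    # Nothing to do if the parameter is already on the line.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the parameter before the closing quote, then drop the leading
    # space that appears when the original value was empty.
    sed -e "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" \
        -e 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="\) /\1/' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

After running "sudo update-grub" and rebooting, /proc/cmdline should contain "noautogroup". On kernels built with CONFIG_SCHED_AUTOGROUP, /proc/sys/kernel/sched_autogroup_enabled is then expected to read 0.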

I thought I was going crazy: during those 6 hours trying to fix the bug, I eventually ended up doing a fresh install (with the mini.iso though, which means that I got all last upgrades, including newer 3.13 kernel), but the problem was still there.

I seriously hope this will be fixed!

Best regards,
Maxim
CIO @ Fix Studio

Rafael David Tinoco (inaddy) wrote :

Maxim,

Good to know you are able to reproduce this behaviour and that you found a workaround. When you say "noautogroup" fixed the "problem", was your problem kernel panics due to NMIs? Could you share the stack trace and/or kernel panic output so I can take a look? Are you using the Intel 26xx v2 CPU series? Do you mind providing me with a sosreport? Can you check whether this behaviour also happens with the 3.16 and/or 3.19 kernels?

Thank you

Rafael

Rafael David Tinoco (inaddy) wrote :

Maxim,

Also, if possible, could you follow "http://www.inaddy.org/mini-howtos/dumps/using-ubuntu-crash-dump-with-kdump" instructions, enabling kdump, and send me the core dump from /var/crash ? It looks like there are 2 bugs being commented on this case.

Thank you

Rafael David Tinoco (inaddy) wrote :
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Invalid
tags: removed: kernel-key