2015-03-16 21:05:25 |
Rafael David Tinoco |
bug |
|
|
added bug |
2015-03-16 21:05:33 |
Rafael David Tinoco |
tags |
|
cts |
|
2015-03-16 21:06:21 |
Rafael David Tinoco |
linux (Ubuntu): assignee |
|
Rafael David Tinoco (inaddy) |
|
2015-03-16 21:06:24 |
Rafael David Tinoco |
linux (Ubuntu): assignee |
Rafael David Tinoco (inaddy) |
|
|
2015-03-16 21:07:07 |
Rafael David Tinoco |
linux (Ubuntu): status |
New |
Incomplete |
|
2015-03-16 21:07:09 |
Rafael David Tinoco |
linux (Ubuntu): status |
Incomplete |
Confirmed |
|
2015-03-16 21:11:41 |
Rafael David Tinoco |
summary |
HP ProLiant Servers should not have HPWDT module loaded automatically |
HP ProLiant Servers - Kernel Panic NMI - DL360 & DL380 - HPWDT module loaded |
|
2015-03-16 21:11:49 |
Rafael David Tinoco |
summary |
HP ProLiant Servers - Kernel Panic NMI - DL360 & DL380 - HPWDT module loaded |
HP ProLiant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT module loaded |
|
2015-03-16 21:14:56 |
Rafael David Tinoco |
description |
Several situations were brought to my attention in which users were facing kernel panics while the machine was apparently idle:
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
#1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
#2 [ffff88085fc05da0] panic at ffffffff8175b3f2
#3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
#4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
#5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
#6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
#7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
#8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
[exception RIP: native_safe_halt+6]
RIP: ffffffff81055186 RSP: ffffffff81c03e90 RFLAGS: 00000246
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RDX: ffffffff81c03e90 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff81055186 R8: ffffffff81055186 R9: 0000000000000018
R10: ffffffff81c03e90 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
#10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
#11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
#12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
#13 [ffffffff81c03f40] rest_init at ffffffff81751a37
#14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
#15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
#16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733
OR
PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
#1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
#2 [ffff880fffa07d80] panic at ffffffff81730335
#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
#4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
#5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
#6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
#7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
[exception RIP: intel_idle+204]
RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
#9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
After investigating all the idle situations and various kernel dump files, in which most of the CPUs were either MWAITing or "relaxing", we discovered that HPWDT was loaded and that corosync was opening the /dev/watchdog file, which started the iLO watchdog timer, and was not refreshing it as frequently as iLO expected.
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for all Ubuntu versions. |
Several situations were brought to my attention in which users were facing kernel panics while the machine was apparently idle (on some HP ProLiant servers such as the DL360 and DL380).
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
#1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
#2 [ffff88085fc05da0] panic at ffffffff8175b3f2
#3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
#4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
#5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
#6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
#7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
#8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
[exception RIP: native_safe_halt+6]
RIP: ffffffff81055186 RSP: ffffffff81c03e90 RFLAGS: 00000246
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RDX: ffffffff81c03e90 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff81055186 R8: ffffffff81055186 R9: 0000000000000018
R10: ffffffff81c03e90 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
#10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
#11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
#12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
#13 [ffffffff81c03f40] rest_init at ffffffff81751a37
#14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
#15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
#16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733
OR
PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
#1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
#2 [ffff880fffa07d80] panic at ffffffff81730335
#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
#4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
#5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
#6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
#7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
[exception RIP: intel_idle+204]
RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
#9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
After investigating all the idle situations and various kernel dump files, in which most of the CPUs were either MWAITing or "relaxing", we discovered that HPWDT was loaded and that corosync was opening the /dev/watchdog file, which started the iLO watchdog timer, and was not refreshing it as frequently as iLO expected.
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for all Ubuntu versions. |
|
2015-03-16 21:16:54 |
Rafael David Tinoco |
description |
Several situations were brought to my attention in which users were facing kernel panics while the machine was apparently idle (on some HP ProLiant servers such as the DL360 and DL380).
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
#1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
#2 [ffff88085fc05da0] panic at ffffffff8175b3f2
#3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
#4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
#5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
#6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
#7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
#8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
[exception RIP: native_safe_halt+6]
RIP: ffffffff81055186 RSP: ffffffff81c03e90 RFLAGS: 00000246
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RDX: ffffffff81c03e90 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff81055186 R8: ffffffff81055186 R9: 0000000000000018
R10: ffffffff81c03e90 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
#10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
#11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
#12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
#13 [ffffffff81c03f40] rest_init at ffffffff81751a37
#14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
#15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
#16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733
OR
PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
#1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
#2 [ffff880fffa07d80] panic at ffffffff81730335
#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
#4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
#5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
#6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
#7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
[exception RIP: intel_idle+204]
RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
#9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
After investigating all the idle situations and various kernel dump files, in which most of the CPUs were either MWAITing or "relaxing", we discovered that HPWDT was loaded and that corosync was opening the /dev/watchdog file, which started the iLO watchdog timer, and was not refreshing it as frequently as iLO expected.
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for all Ubuntu versions. |
Several situations were brought to my attention in which users were facing kernel panics while the machine was apparently idle (on some HP ProLiant servers such as the DL360 and DL380).
ILO:
"76 Critical System Error 03/12/2015 12:42 03/12/2015 12:07 2 An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)"
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
#1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
#2 [ffff88085fc05da0] panic at ffffffff8175b3f2
#3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
#4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
#5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
#6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
#7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
#8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
[exception RIP: native_safe_halt+6]
RIP: ffffffff81055186 RSP: ffffffff81c03e90 RFLAGS: 00000246
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RDX: ffffffff81c03e90 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff81055186 R8: ffffffff81055186 R9: 0000000000000018
R10: ffffffff81c03e90 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
#10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
#11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
#12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
#13 [ffffffff81c03f40] rest_init at ffffffff81751a37
#14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
#15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
#16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733
OR
PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
#1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
#2 [ffff880fffa07d80] panic at ffffffff81730335
#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
#4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
#5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
#6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
#7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
[exception RIP: intel_idle+204]
RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
#9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
After investigating all the idle situations and various kernel dump files, in which most of the CPUs were either MWAITing or "relaxing", we discovered that HPWDT was loaded and that corosync was opening the /dev/watchdog file, which started the iLO watchdog timer, and was not refreshing it as frequently as iLO expected.
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for all Ubuntu versions. |
|
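Note on the description above: it attributes the panics to a process (corosync) holding /dev/watchdog open without refreshing the timer as often as iLO expected. The following is a minimal sketch of that keepalive contract, assuming the standard Linux watchdog character-device behaviour (opening the device arms the timer, any write refreshes it, and writing the character V before closing asks the driver to disarm it when the driver permits that). The function names, the interval and the beat count are hypothetical, and this is not how corosync itself is implemented.

```python
import time


def keep_watchdog_alive(dev, interval_s, beats, sleep=time.sleep):
    """Refresh an already-open watchdog device `beats` times.

    `dev` is any binary file-like object; on real hardware it would be
    the opened /dev/watchdog device (assumption, not taken from the bug).
    """
    for _ in range(beats):
        dev.write(b"\0")   # any write counts as a keepalive
        dev.flush()        # make sure the byte reaches the driver now
        sleep(interval_s)  # must stay well below the watchdog timeout


def request_disarm(dev):
    """Write the 'magic close' character so a later close() can stop the timer.

    Drivers configured to never stop once armed will ignore this request.
    """
    dev.write(b"V")
    dev.flush()
```

On real hardware `dev` would come from something like `open("/dev/watchdog", "wb", buffering=0)`, which needs root and arms the timer immediately. If the loop stalls for longer than the timeout, the hardware fires, which matches the NMI path through `hpwdt_pretimeout` seen in the second trace.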
2015-03-17 11:06:17 |
Andy Whitcroft |
linux (Ubuntu): status |
Confirmed |
In Progress |
|
2015-03-17 11:06:19 |
Andy Whitcroft |
linux (Ubuntu): importance |
Undecided |
High |
|
2015-03-17 11:06:21 |
Andy Whitcroft |
linux (Ubuntu): assignee |
|
Andy Whitcroft (apw) |
|
2015-03-17 11:06:23 |
Andy Whitcroft |
linux (Ubuntu): milestone |
|
ubuntu-15.03 |
|
2015-03-17 11:09:45 |
Adam Conrad |
nominated for series |
|
Ubuntu Utopic |
|
2015-03-17 11:09:45 |
Adam Conrad |
bug task added |
|
linux (Ubuntu Utopic) |
|
2015-03-17 11:09:45 |
Adam Conrad |
nominated for series |
|
Ubuntu Trusty |
|
2015-03-17 11:09:45 |
Adam Conrad |
bug task added |
|
linux (Ubuntu Trusty) |
|
2015-03-17 11:09:45 |
Adam Conrad |
nominated for series |
|
Ubuntu Precise |
|
2015-03-17 11:09:45 |
Adam Conrad |
bug task added |
|
linux (Ubuntu Precise) |
|
2015-03-17 12:16:22 |
Andy Whitcroft |
linux (Ubuntu Precise): status |
New |
In Progress |
|
2015-03-17 12:16:24 |
Andy Whitcroft |
linux (Ubuntu Trusty): status |
New |
In Progress |
|
2015-03-17 12:16:26 |
Andy Whitcroft |
linux (Ubuntu Utopic): status |
New |
In Progress |
|
2015-03-17 12:16:29 |
Andy Whitcroft |
linux (Ubuntu Precise): importance |
Undecided |
High |
|
2015-03-17 12:16:31 |
Andy Whitcroft |
linux (Ubuntu Trusty): importance |
Undecided |
High |
|
2015-03-17 12:16:33 |
Andy Whitcroft |
linux (Ubuntu Utopic): importance |
Undecided |
High |
|
2015-03-17 12:16:36 |
Andy Whitcroft |
linux (Ubuntu Precise): assignee |
|
Andy Whitcroft (apw) |
|
2015-03-17 12:16:38 |
Andy Whitcroft |
linux (Ubuntu Trusty): assignee |
|
Andy Whitcroft (apw) |
|
2015-03-17 12:16:40 |
Andy Whitcroft |
linux (Ubuntu Utopic): assignee |
|
Andy Whitcroft (apw) |
|
2015-03-17 12:18:32 |
Andy Whitcroft |
linux (Ubuntu): status |
In Progress |
Fix Committed |
|
2015-03-17 15:54:01 |
Micheal Waltz |
bug |
|
|
added subscriber Micheal Waltz |
2015-03-18 13:42:33 |
Brad Figg |
linux (Ubuntu Utopic): status |
In Progress |
Fix Committed |
|
2015-03-18 13:42:37 |
Brad Figg |
linux (Ubuntu Trusty): status |
In Progress |
Fix Committed |
|
2015-03-18 13:42:40 |
Brad Figg |
linux (Ubuntu Precise): status |
In Progress |
Fix Committed |
|
2015-03-24 02:28:01 |
Launchpad Janitor |
linux (Ubuntu): status |
Fix Committed |
Fix Released |
|
2015-03-26 03:49:45 |
Dave Leaver |
bug |
|
|
added subscriber Dave Leaver |
2015-03-26 17:39:40 |
Brad Figg |
tags |
cts |
cts verification-needed-precise |
|
2015-03-26 17:39:54 |
Brad Figg |
tags |
cts verification-needed-precise |
cts verification-needed-precise verification-needed-trusty |
|
2015-03-26 17:40:29 |
Brad Figg |
tags |
cts verification-needed-precise verification-needed-trusty |
cts verification-needed-precise verification-needed-trusty verification-needed-utopic |
|
2015-04-07 15:18:55 |
Rafael David Tinoco |
tags |
cts verification-needed-precise verification-needed-trusty verification-needed-utopic |
cts verification-done |
|
2015-04-08 15:40:29 |
Launchpad Janitor |
linux (Ubuntu Trusty): status |
Fix Committed |
Fix Released |
|
2015-04-08 15:40:29 |
Launchpad Janitor |
cve linked |
|
2015-1421 |
|
2015-04-08 15:40:29 |
Launchpad Janitor |
cve linked |
|
2015-1465 |
|
2015-04-08 15:40:29 |
Launchpad Janitor |
cve linked |
|
2015-1593 |
|
2015-04-08 15:40:29 |
Launchpad Janitor |
cve linked |
|
2015-2041 |
|
2015-04-08 15:40:29 |
Launchpad Janitor |
cve linked |
|
2015-2042 |
|
2015-04-08 16:13:01 |
Launchpad Janitor |
linux (Ubuntu Precise): status |
Fix Committed |
Fix Released |
|
2015-04-09 03:41:06 |
Launchpad Janitor |
linux (Ubuntu Utopic): status |
Fix Committed |
Fix Released |
|
2015-05-04 14:40:09 |
Rafael David Tinoco |
marked as duplicate |
|
1417580 |
|