Trusty + Intel E5-26xx + NMI handler (perf_event_nmi_handler) took too long to run

Bug #1416414 reported by Rafael David Tinoco
Affects         Status         Importance   Assigned to           Milestone
linux (Ubuntu)  Fix Released   Undecided    Rafael David Tinoco

Bug Description

The following case was brought to my attention:

Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013
Kernel: 3.13.0-34

Stack trace:

[2189823.168958] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 882.406 msecs
[2189823.168974] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.

[2189823.184283] CPU: 0 PID: 60396 Comm: ceph-osd Not tainted 3.13.0-34-generic #60-Ubuntu
[2189823.194371] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013
[2189823.202794] 0007c7a1f01b74a3 ffff88081fa06dd0 ffffffff8171bd94 ffffffffa01672d8
[2189823.212421] ffff88081fa06e48 ffffffff81714f95 0000000000000008 ffff88081fa06e58
[2189823.221889] ffff88081fa06df8 ffffffff81c1c4c0 ffffc90006278072 0000000000000001
[2189823.231361] Call Trace:
[2189823.234597] <NMI> [<ffffffff8171bd94>] dump_stack+0x45/0x56
[2189823.241996] [<ffffffff81714f95>] panic+0xc8/0x1d7
[2189823.248152] [<ffffffffa01668fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[2189823.256251] [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10
[2189823.263054] [<ffffffff81725448>] nmi_handle.isra.3+0x88/0x180
[2189823.270500] [<ffffffff817256fd>] do_nmi+0x1bd/0x340
[2189823.276867] [<ffffffff817248b1>] end_repeat_nmi+0x1e/0x2e
[2189823.283888] [<ffffffff810d7bf0>] ? futex_wait_queue_me+0x140/0x140
[2189823.291874] [<ffffffff810d7bf0>] ? futex_wait_queue_me+0x140/0x140
[2189823.299966] [<ffffffff810d7bf0>] ? futex_wait_queue_me+0x140/0x140

Tags: cts
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

After analyzing the incomplete dump we got for this particular case, reviewing kernel code changes and Intel errata, and talking with HP ROM engineers (to whom we also provided the errata), we believe* that this stack trace was triggered by the following microcode problem:

###

http://www.intel.com.br/content/dam/www/public/us/en/documents/specification-updates/xeon-e7-v2-spec-update.pdf (Intel® Xeon® Processor E7 v2 Product Family Specification Update January 2015)

CF140 Performance Monitoring IA32_PERF_GLOBAL_STATUS.CondChgd Bit Not Cleared by Reset

Problem: The IA32_PERF_GLOBAL_STATUS MSR (38EH) should be cleared by reset. Due to this erratum, CondChgd (bit 63) of the IA32_PERF_GLOBAL_STATUS MSR may not be cleared.

Implication: When this erratum occurs, performance monitoring software may behave unexpectedly.
Workaround: It is possible for the BIOS to contain a workaround for this erratum. --> HP is probably working on this.

###

* We say "believe" because we cannot check the PMU registers in the core dump we got, but everything points in that direction.

This matters because, on x86 Linux, the NMI (non-maskable interrupt) watchdog (the hard lockup detector) relies on PMU (performance counter) registers to indicate what was responsible for generating an NMI.

Note: our intention when talking to HP was to make sure their power management firmware was not touching those registers. They said the firmware only reads registers, and that there is no such thing as a "clear after read" when reads are made by firmware.

The NMI handler (the kernel function responsible for handling NMIs) identifies what was responsible for an NMI by looking at the PMU registers. Because of the erratum, bit 63 (CondChgd) may not be cleared when the CPU is reset, which makes the NMI handler misbehave: it tries to handle NMIs that should not be handled by this particular kernel code.
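The effect of the stuck bit can be sketched as below. This is only an illustration of the idea behind the fix ("ignore CondChgd bit"), not the kernel code: it models the status word as a plain integer and does not read any real MSR. The names `COND_CHGD`, `effective_status` and `has_pmu_work` are hypothetical; the MSR address (38EH) and the bit position (63) come from the erratum text.

```python
# Illustrative model only: no real MSR access happens here.
IA32_PERF_GLOBAL_STATUS = 0x38E   # MSR address quoted in the erratum
COND_CHGD = 1 << 63               # CondChgd, bit 63 of that MSR
MASK_64 = 0xFFFFFFFFFFFFFFFF      # keep results within 64 bits


def effective_status(raw_status: int) -> int:
    """Return the status word with the CondChgd bit masked out."""
    return raw_status & ~COND_CHGD & MASK_64


def has_pmu_work(raw_status: int) -> bool:
    """True if any bit other than CondChgd is set in the status word."""
    return effective_status(raw_status) != 0


# A status word where only the stale CondChgd bit is set.
stale_only = COND_CHGD
# A status word with the stale bit plus a real counter-0 overflow (bit 0).
stale_plus_overflow = COND_CHGD | 0x1
```

With the mask applied, `stale_only` is treated as "nothing to do", while `stale_plus_overflow` still reports the genuine overflow in bit 0.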

This was seen recently by a kernel developer in the following commit:

commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f

It is also described in the Intel errata document (above).

The following commit is applied in the Trusty kernel from version 3.13.0-35 up to the latest one:

inaddy@workstation:/kernel/ubuntu-trusty$ git tag --contains=ffb4bbaa2bf1ad9d79cf4d62d625499a7271f88e
Ubuntu-3.13.0-35.61
...
Ubuntu-3.13.0-45.74

The user was running kernel 3.13.0-34, which does not contain this fix.
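A quick way to check a host against the tag listing above is to compare the ABI number in the kernel release string. The sketch below is a hypothetical helper, assuming a Trusty release string of the form `3.13.0-NN-flavour` (as in `3.13.0-34-generic` from the stack trace); the function name `trusty_has_fix` is not from any existing tool.

```python
import re

# First Trusty ABI carrying the fix, per the tag listing: Ubuntu-3.13.0-35.61
FIRST_FIXED_ABI = 35


def trusty_has_fix(release: str) -> bool:
    """Return True if a Trusty 3.13.0-NN release string is at ABI 35 or later.

    Raises ValueError for strings that are not 3.13.0-NN kernels, because
    this sketch only knows about the Trusty 3.13 series.
    """
    match = re.match(r"3\.13\.0-(\d+)", release)
    if match is None:
        raise ValueError(f"not a Trusty 3.13.0 release string: {release!r}")
    return int(match.group(1)) >= FIRST_FIXED_ABI
```

On a live system the release string can be obtained with `platform.release()` (the same value `uname -r` prints) and passed to `trusty_has_fix`.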

STEP 1) Upgrade all HP ProLiant servers to the latest Ubuntu Trusty kernel version.

STEP 2)

Together with HP we concluded that, for now, the best option for HP ProLiant servers is to use the following kernel command line:

" ... intremap=no_x2apic_optout intel_idle.max_cstate=0 nmi_watchdog=0 ..."

intremap=no_x2apic_optout -> tells the OS that, although the firmware asks the kernel to opt out of using x2apic, it may use it anyway (Gen8 and later support that feature and benefit from the advantages of x2apic over xapic, such as support for more CPUs and IRQ remapping).

intel_idle.max_cstate=0 -> tells the OS to disable the intel_idle module and activate the acpi_idle module. (HP relies heavily on ACPI for its firmware power management features, and intel_idle might put CPUs in a deeper state than the firmware would like, causing higher latencies and NMIs.)

nmi_watchdog=0 -> tells the OS to use HP w...
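To verify that a booted system actually carries the three parameters from STEP 2, the kernel command line can be parsed and compared. This is a minimal sketch under stated assumptions: the helper names `parse_cmdline` and `missing_params` are hypothetical, the comparison is an exact string match on each value, and quoting inside the command line is not handled.

```python
# Parameters recommended in STEP 2 of this report.
RECOMMENDED = {
    "intremap": "no_x2apic_optout",
    "intel_idle.max_cstate": "0",
    "nmi_watchdog": "0",
}


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str) -> list:
    """Return the recommended 'name=value' settings absent from cmdline."""
    have = parse_cmdline(cmdline)
    return [
        f"{name}={value}"
        for name, value in RECOMMENDED.items()
        if have.get(name) != value
    ]
```

On a running system the input would typically be the contents of `/proc/cmdline`, for example `missing_params(open("/proc/cmdline").read())`; an empty list means all three settings are present.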


Changed in linux (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
status: New → Fix Released
tags: added: cts
Changed in linux (Ubuntu):
status: Fix Released → In Progress
Rafael David Tinoco (rafaeldtinoco) wrote :

We may consider this as "Fix Released" for trusty since Ubuntu-3.13.0-35.61.

Rafael David Tinoco (rafaeldtinoco) wrote :

It also affects E5 v2:

http://www.intel.com.br/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v2-spec-update.pdf

CA156 Performance Monitoring IA32_PERF_GLOBAL_STATUS.CondChgd Bit Not Cleared by Reset

Problem: The IA32_PERF_GLOBAL_STATUS MSR (38EH) should be cleared by reset. Due to this erratum, CondChgd (bit 63) of the IA32_PERF_GLOBAL_STATUS MSR may not be cleared.

Implication: When this erratum occurs, performance monitoring software may behave unexpectedly.
Workaround: It is possible for the BIOS to contain a workaround for this erratum.
Status: For the affected steppings, see the Summary Tables of Changes.

Precise 3.2 might need to cherry-pick the fix:

commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f

Checking...

Rafael David Tinoco (rafaeldtinoco) wrote :

Author: HATAYAMA Daisuke <email address hidden>
Date: Wed Jun 25 10:09:07 2014 +0900

    perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

    BugLink: http://bugs.launchpad.net/bugs/1355293

    commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream.

From Ubuntu-3.2.0-68.102 to latest.

Anyone using 3.2.0-68 and newer has the workaround already.

Rafael David Tinoco (rafaeldtinoco) wrote :

For more information on HP ProLiant server peculiarities, known problems, and solutions, please see:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1417580

Changed in linux (Ubuntu):
status: In Progress → Fix Released