[Bug] Spurious PEBS NMI triggered by non-precise events

Bug #1559901 reported by XiongZhang
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
intel            Fix Released   Undecided    Unassigned
linux (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

The following perf command readily triggers the PEBS warning or a spurious NMI error on Skylake, Broadwell, and Haswell platforms:
sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a
The NMI watchdog must also be enabled to reproduce the issue.
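Because the reproducer depends on the NMI watchdog, it may help to confirm its state before running perf. The following is a minimal sketch, assuming the standard `/proc/sys/kernel/nmi_watchdog` sysctl (0 = disabled, 1 = enabled). The function name `nmi_watchdog_status` and its optional path argument (there only so it can be exercised against a scratch file) are hypothetical.

```shell
# Hypothetical helper: report whether the NMI watchdog (hard lockup detector) is on.
# Reads the kernel.nmi_watchdog sysctl; an optional first argument overrides the path.
nmi_watchdog_status() {
    wd_file=${1:-/proc/sys/kernel/nmi_watchdog}
    # $(...) strips the trailing newline, so an enabled watchdog compares equal to "1".
    if [ "$(cat "$wd_file" 2>/dev/null)" = "1" ]; then
        echo enabled
    else
        # A value of 0, or a missing/unreadable file, is reported as disabled.
        echo disabled
    fi
}

nmi_watchdog_status
# If this prints "disabled", enable the watchdog before running the perf command:
#   sudo sysctl -w kernel.nmi_watchdog=1
```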
Here is the dump.
[ 113.452176] Call Trace:
[ 113.452178] <NMI> [<ffffffff813c3a2e>] dump_stack+0x63/0x85
[ 113.452188] [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
[ 113.452190] [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
[ 113.452193] [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
[ 113.452197] [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
[ 113.452204] [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
[ 113.452208] [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
[ 113.452211] [<ffffffff811f2f11>] ? unmap_kernel_range_noflush+0x11/0x20
[ 113.452216] [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
[ 113.452218] [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
[ 113.452224] [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
[ 113.452230] [<ffffffff810310b9>] nmi_handle+0x69/0x120
[ 113.452233] [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
[ 113.452236] [<ffffffff810317f2>] do_nmi+0xe2/0x130
[ 113.452240] [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
[ 113.452243] [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
[ 113.452246] [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
[ 113.452249] [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
[ 113.452250] <<EOE>> <IRQ> [<ffffffff81006df8>] ? x86_perf_event_set_period+0xd8/0x180
[ 113.452255] [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
[ 113.452258] [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
[ 113.452263] [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
[ 113.452267] [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
[ 113.452270] [<ffffffff8119c970>] ? __perf_install_in_context+0xc0/0xc0
[ 113.452273] [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
[ 113.452276] [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
[ 113.452278] [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0
[ 113.452283] [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
[ 113.452286] [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
[ 113.452290] [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
[ 113.452291] <EOI> [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0
[ 113.452298] [<ffffffff81123de5>] ? smp_call_function_single+0xd5/0x130
[ 113.452300] [<ffffffff81123ddb>] ? smp_call_function_single+0xcb/0x130
[ 113.452303] [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0
[ 113.452306] [<ffffffff8119765a>] event_function_call+0x10a/0x120
[ 113.452308] [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
[ 113.452311] [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
[ 113.452313] [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
[ 113.452315] [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
[ 113.452318] [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
[ 113.452320] [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
[ 113.452322] [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
[ 113.452326] [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
[ 113.452330] [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
[ 113.452334] [<ffffffff81036d29>] ? sched_clock+0x9/0x10
[ 113.452336] [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
[ 113.452338] [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 113.452340] --[ end trace aef202839fe9a71d ]--
[ 113.452611] Uhhuh. NMI received for unknown reason 2d on CPU 2.
[ 113.453880] Do you have a strange power saving mode enabled?

XiongZhang (xiong-y-zhang) wrote :

A commit from v4.6 fixes this issue; please backport it to 16.04:
c3d266c perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi

Tim Gardner (timg-tpi) wrote :

I think this will have to wait for a 16.10 HWE kernel. There are simply too many prerequisite commits to backport at this late stage.

Leann Ogasawara (leannogasawara) wrote :

The v4.8-based kernel has been uploaded to the Yakkety 16.10 archive. I'm setting this to Fix Released.

git describe --contains c3d266c
v4.6-rc1~165^2~8

commit c3d266c8a9838cc141b69548bc3b1b18808ae8c4
Author: Kan Liang <email address hidden>
Date: Thu Mar 3 18:07:28 2016 -0500

    perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi

information type: Proprietary → Public
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in intel:
status: New → Fix Released