[P9,Power NV][WSP][Ubuntu 1804] : "Kernel access of bad area " when grouping different pmu events using perf fuzzer . (perf:)

Bug #1746225 reported by bugproxy
This bug affects 1 person
Affects                            Status         Importance   Assigned to             Milestone
The Ubuntu-power-systems project   Fix Released   Critical     Canonical Kernel Team
linux (Ubuntu)                     Fix Released   Critical     Joseph Salisbury
  Artful                           Fix Released   Critical     Joseph Salisbury
  Bionic                           Fix Released   Critical     Joseph Salisbury

Bug Description

== SRU Justification ==
Because of this bug, running the perf fuzzer crashes the kernel and the system
reboots, leaving the call trace shown in the original description. The crash is
triggered by grouping events from different PMUs and is fixed by mainline
commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11.

Commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11 is in mainline as of v4.15-rc5.

== Fix ==
commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11
Author: Ravi Bangoria <email address hidden>
Date: Thu Nov 30 14:03:22 2017 +0530

    powerpc/perf: Fix oops when grouping different pmu events

== Regression Potential ==
Low. This fix is specific to powerpc.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Original Bug Description ==
== Comment: #0 - Shriya R. Kulkarni <email address hidden> - 2018-01-30 03:24:47 ==
Problem Description :
==============
Running the perf fuzzer crashed the kernel and the system rebooted; the call trace is shown below. The crash is caused by grouping events from different PMUs.

Machine details :
==========
OS : Ubuntu 1804
uname -r : 4.13.0-25-generic
system : Witherspoon + DD2.1
perf -v : perf version 4.13.13

ltc-wspoon12 login: [78592.995848] Unable to handle kernel paging request for instruction fetch
[78592.995914] Faulting instruction address: 0x00000000
[78592.995950] Oops: Kernel access of bad area, sig: 11 [#1]
[78592.995982] SMP NR_CPUS=2048
[78592.995985] NUMA
[78592.996011] PowerNV
[78592.996045] Modules linked in: vmx_crypto idt_89hpesx crct10dif_vpmsum at24 ofpart uio_pdrv_genirq uio cmdlinepart powernv_flash mtd ibmpowernv opal_prd ipmi_powernv ipmi_devintf ipmi_msghandler sch_fq_codel ip_tables x_tables autofs4 nouveau lpfc ast i2c_algo_bit crc32c_vpmsum ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm mlx5_core nvmet_fc nvmet tg3 nvme_fc nvme_fabrics ahci nvme_core libahci mlxfw devlink scsi_transport_fc
[78592.996367] CPU: 69 PID: 6010 Comm: perf_fuzzer Tainted: G W 4.13.0-25-generic #29-Ubuntu
[78592.996422] task: c000003f77b5b500 task.stack: c000003d0b0c8000
[78592.996462] NIP: 0000000000000000 LR: c0000000000e9b1c CTR: 0000000000000000
[78592.996509] REGS: c000003d0b0cb780 TRAP: 0400 Tainted: G W (4.13.0-25-generic)
[78592.996562] MSR: 9000000040009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[78592.996588] CR: 48002874 XER: 00000000
[78592.996642] CFAR: c0000000000e9b18 SOFTE: 1
[78592.996642] GPR00: c0000000000eb128 c000003d0b0cba00 c0000000015f6200 0000000000000000
[78592.996642] GPR04: c000003d0b0cbba0 c000003d0b0cbc20 0000000000000002 c000000001596b10
[78592.996642] GPR08: 0000000000000002 0000000000000000 c000000001596b10 c000003fecad0028
[78592.996642] GPR12: 0000000000000000 c000000007a8d480 0000000000000000 0000000000000000
[78592.996642] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[78592.996642] GPR20: 0000000000000001 c000003d0b0cbc1c c000003d0b0cbc24 c000003d0b0cbb98
[78592.996642] GPR24: c000003d0b0cbba0 c000003d0b0cbc20 0000000000001555 c000003fefeb4ea0
[78592.996642] GPR28: c000003d0b0cbc20 0000000000000002 0000000000003000 c000003fefeb5190
[78592.997170] NIP [0000000000000000] (null)
[78592.997208] LR [c0000000000e9b1c] power_check_constraints+0x13c/0x5a0
[78592.997247] Call Trace:
[78592.997267] [c000003d0b0cba00] [c000003d0b0cbaa0] 0xc000003d0b0cbaa0 (unreliable)
[78592.997321] [c000003d0b0cbb80] [c0000000000eb128] power_pmu_event_init+0x298/0x6a0
[78592.997373] [c000003d0b0cbc70] [c00000000029e6b4] perf_try_init_event+0xd4/0x120
[78592.997424] [c000003d0b0cbcb0] [c0000000002a1038] perf_event_alloc.part.23+0x7b8/0xb90
[78592.997475] [c000003d0b0cbd30] [c0000000002aa0dc] SyS_perf_event_open+0x69c/0xfa0
[78592.997527] [c000003d0b0cbe30] [c00000000000b184] system_call+0x58/0x6c
[78592.997568] Instruction dump:
[78592.997597] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[78592.997664] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[78592.997733] ---[ end trace 57fb7542c4083583 ]---
[78594.008780]
[78594.008932] Sending IPI to other CPUs
[78773.335857584,5] OPAL: Switch to big-endian OS
[78594.01029
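
In the trace above NIP and CTR are both 0 and the fault is an instruction fetch, which is consistent with a call through a NULL function pointer; LR then names the caller, here power_check_constraints+0x13c. As a small sketch for pulling that caller out of a saved console log (the function name oops_caller is made up for illustration, and the pattern assumes the powerpc "LR [address] symbol" line format seen in this trace):

```shell
# Hypothetical helper: read an oops log on stdin and print the symbol
# recorded on the "LR [address] symbol+offset/length" line.
# The register-dump line "NIP: ... LR: ... CTR: ..." is not matched,
# because there LR is followed by a colon rather than " [".
oops_caller() {
    sed -n 's/.*LR \[[0-9a-f]*] \(.*\)$/\1/p'
}

# Example (console.log is a placeholder file name):
#   oops_caller < console.log
```

For the log in this report the helper would print the power_check_constraints entry from the LR line.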

Steps to reproduce :
============

#! /bin/bash
set -x
git clone https://github.com/deater/perf_event_tests.git
cd perf_event_tests/include
mkdir asm
cd asm
wget http://9.114.13.132/repo/shriya/perf_regs.h
cd ../../lib
make
sleep 10
cd ../fuzzer
make
sleep 10

echo 0 > /proc/sys/kernel/nmi_watchdog
echo 2 > /proc/sys/kernel/perf_event_paranoid
echo 100000 > /proc/sys/kernel/perf_event_max_sample_rate
./perf_fuzzer -r 1492143527
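
The script above overwrites three sysctls before starting the fuzzer. A minimal sketch for snapshotting and restoring them around a run follows; the function names and the SYSCTL_DIR override are assumptions added for illustration and are not part of the original reproducer:

```shell
# SYSCTL_DIR is a hypothetical override so the helpers can be tried
# against a scratch directory without root; it defaults to the real path.
SYSCTL_DIR=${SYSCTL_DIR:-/proc/sys/kernel}
KNOBS="nmi_watchdog perf_event_paranoid perf_event_max_sample_rate"

# $1: file that receives one "name value" pair per readable knob.
save_knobs() {
    for k in $KNOBS; do
        if [ -r "$SYSCTL_DIR/$k" ]; then
            printf '%s %s\n' "$k" "$(cat "$SYSCTL_DIR/$k")"
        fi
    done > "$1"
}

# $1: file previously written by save_knobs; writes each value back.
restore_knobs() {
    while read -r k v; do
        printf '%s\n' "$v" > "$SYSCTL_DIR/$k"
    done < "$1"
}

# Example (as root): save_knobs /tmp/knobs; run the fuzzer; restore_knobs /tmp/knobs
```

On the affected kernel the machine reboots before a restore could run, so this is only useful on a kernel that survives the fuzzer.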

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-164107 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
tags: added: triage-g
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

tags: added: kernel-da-key
Changed in kernel-package (Ubuntu):
importance: Undecided → Critical
affects: kernel-package (Ubuntu) → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-01-31 02:45 EDT-------
(In reply to comment #5)

This patch should fix the issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5aa04b3eb6fca63d2e9827be656dcadc26d54e11

Please cherry pick

Changed in linux (Ubuntu):
status: New → In Progress
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Joseph Salisbury (jsalisbury)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 5aa04b3eb6fca. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746225

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.
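
As a rough sketch of that install step: the helper below checks that a download directory holds both package kinds before printing the dpkg command. The function names and the file-name patterns are assumptions about how the test packages are named, not something stated in this report:

```shell
# Succeeds if at least one of the given paths exists (i.e. the glob matched).
has_match() {
    for f in "$@"; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# $1: directory holding the .deb files downloaded from the test-kernel URL.
# Prints the install command; it does not install anything itself.
check_test_kernel_debs() {
    has_match "$1"/linux-image-[0-9]*.deb ||
        { echo "missing linux-image .deb" >&2; return 1; }
    has_match "$1"/linux-image-extra-*.deb ||
        { echo "missing linux-image-extra .deb" >&2; return 1; }
    echo "both packages found; install with:"
    echo "  sudo dpkg -i $1/linux-image-[0-9]*.deb $1/linux-image-extra-*.deb"
}

# Example: check_test_kernel_debs ~/lp1746225
```

After installing, reboot into the new kernel and confirm with uname -r before rerunning the fuzzer.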

Thanks in advance!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-02 02:08 EDT-------
(In reply to comment #7)

Issue is resolved.

Verified the kernel provided in http://kernel.ubuntu.com/~jsalisbury/lp1746225 fixes the issue.

Joseph Salisbury (jsalisbury) wrote :
description: updated
Seth Forshee (sforshee)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-12 07:37 EDT-------
Any update on this?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-13 04:06 EDT-------
Hi,
Any updates on this defect? In which build will the fix be available?

Frank Heimes (fheimes) wrote :

According to comment #5, the SRU request has already been submitted and is in progress.
It will normally go out with the next SRU cycle, but that may be held up a bit by the Spectre/Meltdown work.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-22 03:43 EDT-------
Hi,
Can you please let us know which build is expected to contain the fix?

Thanks

Manoj Iyer (manjo) wrote :

This should be available in Bionic now; the patch was applied to Bionic earlier this month. It should follow in Artful in a couple of weeks.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-27 00:25 EDT-------
Hi,
I am unable to hit the above call trace, so the issue is resolved.

Verified on kernel :

root@ltc-wcwsp1:~/shriya/kernel# uname -a
Linux ltc-wcwsp1 4.15.0-10-generic #11 SMP Fri Feb 23 01:59:38 EST 2018 ppc64le ppc64le ppc64le GNU/Linux

Perf fuzzer :
=======
+ ./perf_fuzzer -r 1492143527
Using user-specified random seed of 1492143527

*** perf_fuzzer 0.32-rc0 *** by Vince Weaver

Linux version 4.15.0-10-generic ppc64le
Processor: ppc64le UNKNOWN

Watchdog enabled with timeout 60s
Will auto-exit if signal storm detected
Seeding RNG with supplied seed 1492143527

To reproduce, try:
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 2 > /proc/sys/kernel/perf_event_paranoid
echo 100000 > /proc/sys/kernel/perf_event_max_sample_rate
./perf_fuzzer -r 1492143527

Fuzzing the following syscalls: mmap perf_event_open close read write ioctl fork prctl poll
Also attempting the following: signal-handler-on-overflow busy-instruction-loop accessing-perf-proc-and-sys-files trashing-the-mmap-page

Pid=4506, sleeping 1s

==================================================
Starting fuzzing at 2018-02-26 21:36:45
==================================================
Cannot open /sys/kernel/tracing/kprobe_events
Iteration 10000, 98728 syscalls in 5.32 s (18.554 k syscalls/s)
Open attempts: 90726 Successful: 917 Currently open: 54
ENOENT : 446
E2BIG : 7816
EBADF : 8522
EINVAL : 72718
ENOSPC : 81
EOVERFLOW : 2
EOPNOTSUPP : 224
Trinity Type (Normal 345/22780)(Sampling 28/22630)(Global 488/22586)(Random 56/22730)
Type (Hardware 246/12690)(software 362/11897)(tracepoint 58/11874)(Cache 55/11258)(cpu 136/11638)(breakpoint 33/11690)(nest_capp0_imc 1/357)(nest_capp1_imc 1/365)(core_imc 0/460)(nest_mba0_imc 1/378)(nest_mba1_imc 0/352)(nest_mba2_imc 0/362)(nest_mba3_imc 1/371)(nest_mba4_imc 1/357)(nest_mba5_imc 0/320)(nest_mba6_imc 1/358)(nest_mba7_imc 0/439)(nest_mcs01_imc 0/359)(nest_mcs23_imc 2/361)(nest_nvlink0_imc 19/14840)
Close: 863/863 Successful
Read: 824/912 Successful
Write: 0/855 Successful
Ioctl: 371/924 Successful: (ENABLE 91/91)(DISABLE 81/81)(REFRESH 9/78)(RESET 84/84)(PERIOD 2/75)(SET_OUTPUT 8/80)(SET_FILTER 0/92)(ID 87/87)(SET_BPF 0/91)(PAUSE_OUTPUT 9/79)(#10 0/0)(#11 0/0)(#12 0/0)(#13 0/0)(#14 0/0)(>14 0/86)
Mmap: 662/1060 Successful: (MMAP 662/1060)(TRASH 144/156)(READ 133/143)(UNMAP 653/988)(AUX 0/169)(AUX_READ 0/0)
Prctl: 929/929 Successful
Fork: 471/471 Successful
Poll: 895/921 Successful
Access: 0/0 Successful
Overflows: 0 Recursive: 0
SIGIOs due to RT signal queue full: 0
Memset of mmap#18 at (nil) caused segfault!
MAP_SHARED MAP_DENYWRITE MAP_FIXED MAP_NONBLOCK MAP_STACK PROT_READ PROT_WRITE size 27661 fd 47
event: cpu=4 pid=-1 group_fd=-1 flags=9
memset(&pe[47],0,sizeof(struct perf_event_attr));
pe[47].type=PERF_TYPE_SOFTWARE;
pe[47].size=64;
pe[47].config=PERF_COUNT_SW_PAGE_FAULTS;
pe[47].sample_type=0; /* 0 */
pe[47].read_format=PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_GROUP; /* a */
pe[47].disabled=1;
pe[47].inherit=1;
pe[47].pinned=1;
pe[47].precise_ip=0; /* arbitrary skid */
pe[47].wakeup_events=0;
pe[47].bp_type=HW_BREAKPOINT_EMPTY;

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!
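
As a hedged sketch of what enabling -proposed amounts to, the helper below prints the apt source line for a given release and architecture. The function name is illustrative, and the mirror choice (archive.ubuntu.com for amd64/i386, ports.ubuntu.com for other architectures such as ppc64el) is an assumption about the standard Ubuntu archives rather than something stated in this report:

```shell
# $1: release codename (e.g. artful), $2: dpkg architecture (e.g. ppc64el).
# Prints the sources.list line; it does not modify the system.
proposed_line() {
    case "$2" in
        amd64|i386) mirror=http://archive.ubuntu.com/ubuntu/ ;;
        *)          mirror=http://ports.ubuntu.com/ubuntu-ports/ ;;
    esac
    echo "deb $mirror $1-proposed restricted main multiverse universe"
}

# Example (as root; the target file name is a placeholder):
#   proposed_line artful ppc64el > /etc/apt/sources.list.d/proposed.list
#   apt-get update
```

The wiki page linked above remains the authoritative description, including how to pull only selected packages from -proposed.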

tags: added: verification-needed-artful
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-19 07:13 EDT-------
The bug was raised against Bionic and the tester has verified the fix. Please close.

Manoj Iyer (manjo)
tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-38.43

---------------
linux (4.13.0-38.43) artful; urgency=medium

  * linux: 4.13.0-38.43 -proposed tracker (LP: #1755762)

  * Servers going OOM after updating kernel from 4.10 to 4.13 (LP: #1748408)
    - i40e: Fix memory leak related filter programming status
    - i40e: Add programming descriptors to cleaned_count

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * fails to dump with latest kpti fixes (LP: #1750021)
    - kdump: write correct address of mem_section into vmcoreinfo

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
    - ALSA: hda - Fix headset mic detection problem for two Dell machines
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * CIFS SMB2/SMB3 does not work for domain based DFS (LP: #1747572)
    - CIFS: make IPC a regular tcon
    - CIFS: use tcon_ipc instead of use_ipc parameter of SMB2_ioctl
    - CIFS: dump IPC tcon in debug proc file

  * i2c-thunderx: erroneous error message "unhandled state: 0" (LP: #1754076)
    - i2c: octeon: Prevent error message on bus error

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * EDAC, sb_edac: Backport 1 patch to Ubuntu 17.10 (Fix missing DIMM sysfs
    entries with KNL SNC2/SNC4 mode) (LP: #1743856)
    - EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode

  * [regression] Colour banding and artefacts appear system-wide on an Asus
    Zenbook UX303LA with Intel HD 4400 graphics (LP: #1749420)
    - drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA

  * DVB Card with SAA7146 chipset not working (LP: #1742316)
    - vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * support thunderx2 vendor pmu events (LP: #1747523)
    - perf pmu: Extract function to get JSON alias map
    - perf pmu: Pass pmu as a parameter to get_cpuid_str()
    - perf tools arm64: Add support for get_cpuid_str function.
    - perf pmu: Add helper function is_pmu_core to detect PMU CORE devices
    - perf vendor events arm64: Add ThunderX2 implementation defined pmu core
      events
    - perf pmu: Add check for valid cpuid in perf_pmu__find_map()

  * lpfc.ko module doesn't work (LP: #1746970)
    - scsi: lpfc: Fix loop mode target discovery

  * Ubuntu 17.10 crashes on vmalloc.c (LP: #1739498)
    - powerpc/mm/book3s64: Make KERN_IO_START a variable
    - powerpc/mm/slb: Move comment next to the code it's referring to
    - powerpc/mm/hash64: Make vmalloc 56T on hash

  * ethtool -p fails to light NIC LED on HiSilicon D05 systems (LP: #1748567)
    - net...
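
The changelog above names 4.13.0-38.43 as the Artful package in which this bug was fixed. A rough way to check a machine against that is sketched below, assuming the ABI part of `uname -r` (4.13.0-NN) is enough to tell releases apart and that GNU `sort -V` is available; the function names are illustrative only:

```shell
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strips the flavour suffix from a `uname -r` string,
# e.g. 4.13.0-25-generic becomes 4.13.0-25.
strip_flavour() {
    printf '%s\n' "${1%-*}"
}

# $1: kernel release string as printed by `uname -r`.
# Succeeds when the ABI is at or past 4.13.0-38.
artful_has_fix() {
    version_ge "$(strip_flavour "$1")" "4.13.0-38"
}

# Example: artful_has_fix "$(uname -r)" && echo "fix should be present"
```

The kernel from the original report, 4.13.0-25-generic, fails this check, while a 4.13.0-38-generic kernel passes it.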

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released