[P9, Power NV][WSP][Ubuntu 1804]: "Kernel access of bad area" when grouping different PMU events using perf fuzzer (perf)

Bug #1746225 reported by bugproxy on 2018-01-30
This bug affects 1 person
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   Critical    Canonical Kernel Team
linux (Ubuntu)                     Critical    Joseph Salisbury
  Artful                           Critical    Joseph Salisbury
  Bionic                           Critical    Joseph Salisbury

Bug Description

== SRU Justification ==
Because of this bug, running perf fuzzer crashes the kernel and the system
reboots, producing the call trace shown in this bug. The crash is caused by
grouping different PMU events and is fixed by mainline commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11.

Commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11 is in mainline as of v4.15-rc5.

== Fix ==
commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11
Author: Ravi Bangoria <email address hidden>
Date: Thu Nov 30 14:03:22 2017 +0530

    powerpc/perf: Fix oops when grouping different pmu events
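
To make "grouping different PMU events" concrete: an event group can contain members that belong to different PMUs, for example a core "cpu" event next to a nest_mba0_imc event (both PMU names appear in the fuzzer output later in this bug). The sketch below is a simplified model in Python, not the kernel patch. The Event class, the event names, and collect_same_pmu() are hypothetical; it only shows the general idea of counting the group members that share the new event's PMU.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a perf event: a name plus the PMU that owns it."""
    name: str
    pmu: str


def collect_same_pmu(leader: Event, siblings: List[Event], new_event: Event) -> List[Event]:
    """Return the existing group members on new_event's PMU, followed by new_event."""
    group = [leader, *siblings]
    return [e for e in group if e.pmu == new_event.pmu] + [new_event]


# A mixed group: a core PMU leader with a sibling from a different PMU.
leader = Event("cycles", pmu="cpu")
sibling = Event("nest_mba0_event", pmu="nest_mba0_imc")
new = Event("instructions", pmu="cpu")

collected = collect_same_pmu(leader, [sibling], new)
print([e.name for e in collected])
```

Only the two "cpu" events are collected; the event from the other PMU is left out of the constraint check for the core PMU.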

== Regression Potential ==
Low. This fix is specific to powerpc.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Original Bug Description ==
== Comment: #0 - Shriya R. Kulkarni <email address hidden> - 2018-01-30 03:24:47 ==
Problem Description :
==============
Perf fuzzer crashed the kernel and the system rebooted; the call trace is shown below. The crash is caused by grouping different PMU events.

Machine details :
==========
OS : Ubuntu 1804
uname -r : 4.13.0-25-generic
system : Witherspoon + DD2.1
perf -v : perf version 4.13.13

ltc-wspoon12 login: [78592.995848] Unable to handle kernel paging request for instruction fetch
[78592.995914] Faulting instruction address: 0x00000000
[78592.995950] Oops: Kernel access of bad area, sig: 11 [#1]
[78592.995982] SMP NR_CPUS=2048
[78592.995985] NUMA
[78592.996011] PowerNV
[78592.996045] Modules linked in: vmx_crypto idt_89hpesx crct10dif_vpmsum at24 ofpart uio_pdrv_genirq uio cmdlinepart powernv_flash mtd ibmpowernv opal_prd ipmi_powernv ipmi_devintf ipmi_msghandler sch_fq_codel ip_tables x_tables autofs4 nouveau lpfc ast i2c_algo_bit crc32c_vpmsum ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm mlx5_core nvmet_fc nvmet tg3 nvme_fc nvme_fabrics ahci nvme_core libahci mlxfw devlink scsi_transport_fc
[78592.996367] CPU: 69 PID: 6010 Comm: perf_fuzzer Tainted: G W 4.13.0-25-generic #29-Ubuntu
[78592.996422] task: c000003f77b5b500 task.stack: c000003d0b0c8000
[78592.996462] NIP: 0000000000000000 LR: c0000000000e9b1c CTR: 0000000000000000
[78592.996509] REGS: c000003d0b0cb780 TRAP: 0400 Tainted: G W (4.13.0-25-generic)
[78592.996562] MSR: 9000000040009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[78592.996588] CR: 48002874 XER: 00000000
[78592.996642] CFAR: c0000000000e9b18 SOFTE: 1
[78592.996642] GPR00: c0000000000eb128 c000003d0b0cba00 c0000000015f6200 0000000000000000
[78592.996642] GPR04: c000003d0b0cbba0 c000003d0b0cbc20 0000000000000002 c000000001596b10
[78592.996642] GPR08: 0000000000000002 0000000000000000 c000000001596b10 c000003fecad0028
[78592.996642] GPR12: 0000000000000000 c000000007a8d480 0000000000000000 0000000000000000
[78592.996642] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[78592.996642] GPR20: 0000000000000001 c000003d0b0cbc1c c000003d0b0cbc24 c000003d0b0cbb98
[78592.996642] GPR24: c000003d0b0cbba0 c000003d0b0cbc20 0000000000001555 c000003fefeb4ea0
[78592.996642] GPR28: c000003d0b0cbc20 0000000000000002 0000000000003000 c000003fefeb5190
[78592.997170] NIP [0000000000000000] (null)
[78592.997208] LR [c0000000000e9b1c] power_check_constraints+0x13c/0x5a0
[78592.997247] Call Trace:
[78592.997267] [c000003d0b0cba00] [c000003d0b0cbaa0] 0xc000003d0b0cbaa0 (unreliable)
[78592.997321] [c000003d0b0cbb80] [c0000000000eb128] power_pmu_event_init+0x298/0x6a0
[78592.997373] [c000003d0b0cbc70] [c00000000029e6b4] perf_try_init_event+0xd4/0x120
[78592.997424] [c000003d0b0cbcb0] [c0000000002a1038] perf_event_alloc.part.23+0x7b8/0xb90
[78592.997475] [c000003d0b0cbd30] [c0000000002aa0dc] SyS_perf_event_open+0x69c/0xfa0
[78592.997527] [c000003d0b0cbe30] [c00000000000b184] system_call+0x58/0x6c
[78592.997568] Instruction dump:
[78592.997597] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[78592.997664] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[78592.997733] ---[ end trace 57fb7542c4083583 ]---
[78594.008780]
[78594.008932] Sending IP[78773.335857584,5] OPAL: Switch to big-endian OS
I to other CPUs
[78594.01029
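
When triaging a trace like the one above, the two most useful fields are NIP (the faulting instruction address) and LR (the return address). Here NIP is 0 and LR points into power_check_constraints, which usually suggests an indirect call through a null pointer, although the trace alone does not prove that. The following is a minimal sketch for pulling those fields out of such a log. summarize_oops() and OOPS_EXCERPT are names made up for this example, and the regular expressions only assume the line formats seen in this report.

```python
import re

# Three lines copied from the oops in this bug report.
OOPS_EXCERPT = """\
[78592.996462] NIP: 0000000000000000 LR: c0000000000e9b1c CTR: 0000000000000000
[78592.997170] NIP [0000000000000000] (null)
[78592.997208] LR [c0000000000e9b1c] power_check_constraints+0x13c/0x5a0
"""


def summarize_oops(log: str) -> dict:
    """Extract the NIP value and the symbolized LR from a powerpc oops log."""
    nip = re.search(r"NIP:\s*([0-9a-f]{16})", log)
    lr = re.search(r"LR \[([0-9a-f]{16})\]\s+(\S+)\+0x([0-9a-f]+)/0x[0-9a-f]+", log)
    if nip is None or lr is None:
        raise ValueError("no NIP/LR lines found")
    return {
        "nip": int(nip.group(1), 16),
        "lr": int(lr.group(1), 16),
        "lr_symbol": lr.group(2),
        "lr_offset": int(lr.group(3), 16),
    }


summary = summarize_oops(OOPS_EXCERPT)
if summary["nip"] == 0:
    print("instruction fetch from address 0, called from", summary["lr_symbol"])
```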

Steps to reproduce :
============

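#!/bin/bash
# Build perf_event_tests and run perf_fuzzer with the seed that triggers the oops.
set -x
git clone https://github.com/deater/perf_event_tests.git
cd perf_event_tests/include
mkdir asm
cd asm
wget http://9.114.13.132/repo/shriya/perf_regs.h
cd ../../lib
make
sleep 10
cd ../fuzzer
make
sleep 10

# Kernel settings used for the fuzzer run (writing to /proc/sys requires root).
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 2 > /proc/sys/kernel/perf_event_paranoid
echo 100000 > /proc/sys/kernel/perf_event_max_sample_rate
# -r runs the fuzzer with a user-specified random seed.
./perf_fuzzer -r 1492143527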

bugproxy (bugproxy) on 2018-01-30
tags: added: architecture-ppc64le bugnameltc-164107 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
tags: added: triage-g
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

tags: added: kernel-da-key
Changed in kernel-package (Ubuntu):
importance: Undecided → Critical
affects: kernel-package (Ubuntu) → linux (Ubuntu)

------- Comment From <email address hidden> 2018-01-31 02:45 EDT-------
(In reply to comment #5)

This patch should fix the issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5aa04b3eb6fca63d2e9827be656dcadc26d54e11

Please cherry-pick it.

Changed in linux (Ubuntu):
status: New → In Progress
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in ubuntu-power-systems:
status: New → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 5aa04b3eb6fca. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746225

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-02 02:08 EDT-------
(In reply to comment #7)

Issue is resolved.

Verified the kernel provided in http://kernel.ubuntu.com/~jsalisbury/lp1746225 fixes the issue.

Seth Forshee (sforshee) on 2018-02-03
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-12 07:37 EDT-------
Any update on this?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-13 04:06 EDT-------
Hi,
Any updates on this defect? In which build will the fix be available?

Frank Heimes (frank-heimes) wrote :

According to comment #5, the SRU request has already been submitted and is in progress.
It will usually go out with the next SRU cycle, but that might be held up a bit by the Spectre/Meltdown activities.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-22 03:43 EDT-------
Hi,
Can you please let us know in which build we should expect the fix?

Thanks

Manoj Iyer (manjo) wrote :

This should be available in Bionic now; the patch was applied to Bionic earlier this month. It should be available in Artful in a couple of weeks.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-27 00:25 EDT-------
Hi,
I was unable to hit the above call trace, so the issue is resolved.

Verified on kernel :

root@ltc-wcwsp1:~/shriya/kernel# uname -a
Linux ltc-wcwsp1 4.15.0-10-generic #11 SMP Fri Feb 23 01:59:38 EST 2018 ppc64le ppc64le ppc64le GNU/Linux

Perf fuzzer :
=======
+ ./perf_fuzzer -r 1492143527
Using user-specified random seed of 1492143527

*** perf_fuzzer 0.32-rc0 *** by Vince Weaver

Linux version 4.15.0-10-generic ppc64le
Processor: ppc64le UNKNOWN

Watchdog enabled with timeout 60s
Will auto-exit if signal storm detected
Seeding RNG with supplied seed 1492143527

To reproduce, try:
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 2 > /proc/sys/kernel/perf_event_paranoid
echo 100000 > /proc/sys/kernel/perf_event_max_sample_rate
./perf_fuzzer -r 1492143527

Fuzzing the following syscalls: mmap perf_event_open close read write ioctl fork prctl poll
Also attempting the following: signal-handler-on-overflow busy-instruction-loop accessing-perf-proc-and-sys-files trashing-the-mmap-page

Pid=4506, sleeping 1s

==================================================
Starting fuzzing at 2018-02-26 21:36:45
==================================================
Cannot open /sys/kernel/tracing/kprobe_events
Iteration 10000, 98728 syscalls in 5.32 s (18.554 k syscalls/s)
Open attempts: 90726 Successful: 917 Currently open: 54
ENOENT : 446
E2BIG : 7816
EBADF : 8522
EINVAL : 72718
ENOSPC : 81
EOVERFLOW : 2
EOPNOTSUPP : 224
Trinity Type (Normal 345/22780)(Sampling 28/22630)(Global 488/22586)(Random 56/22730)
Type (Hardware 246/12690)(software 362/11897)(tracepoint 58/11874)(Cache 55/11258)(cpu 136/11638)(breakpoint 33/11690)(nest_capp0_imc 1/357)(nest_capp1_imc 1/365)(core_imc 0/460)(nest_mba0_imc 1/378)(nest_mba1_imc 0/352)(nest_mba2_imc 0/362)(nest_mba3_imc 1/371)(nest_mba4_imc 1/357)(nest_mba5_imc 0/320)(nest_mba6_imc 1/358)(nest_mba7_imc 0/439)(nest_mcs01_imc 0/359)(nest_mcs23_imc 2/361)(nest_nvlink0_imc 19/14840)
Close: 863/863 Successful
Read: 824/912 Successful
Write: 0/855 Successful
Ioctl: 371/924 Successful: (ENABLE 91/91)(DISABLE 81/81)(REFRESH 9/78)(RESET 84/84)(PERIOD 2/75)(SET_OUTPUT 8/80)(SET_FILTER 0/92)(ID 87/87)(SET_BPF 0/91)(PAUSE_OUTPUT 9/79)(#10 0/0)(#11 0/0)(#12 0/0)(#13 0/0)(#14 0/0)(>14 0/86)
Mmap: 662/1060 Successful: (MMAP 662/1060)(TRASH 144/156)(READ 133/143)(UNMAP 653/988)(AUX 0/169)(AUX_READ 0/0)
Prctl: 929/929 Successful
Fork: 471/471 Successful
Poll: 895/921 Successful
Access: 0/0 Successful
Overflows: 0 Recursive: 0
SIGIOs due to RT signal queue full: 0
Memset of mmap#18 at (nil) caused segfault!
MAP_SHARED MAP_DENYWRITE MAP_FIXED MAP_NONBLOCK MAP_STACK PROT_READ PROT_WRITE size 27661 fd 47
event: cpu=4 pid=-1 group_fd=-1 flags=9
memset(&pe[47],0,sizeof(struct perf_event_attr));
pe[47].type=PERF_TYPE_SOFTWARE;
pe[47].size=64;
pe[47].config=PERF_COUNT_SW_PAGE_FAULTS;
pe[47].sample_type=0; /* 0 */
pe[47].read_format=PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_GROUP; /* a */
pe[47].disabled=1;
pe[47].inherit=1;
pe[47].pinned=1;
pe[47].precise_ip=0; /* arbitrary skid */
pe[47].wakeup_events=0;
pe[47].bp_type=HW_BREAKPOINT_EMPTY;
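
As a quick sanity check of the summary above, the per-errno counts should add up to the number of failed perf_event_open attempts. The short sketch below redoes that arithmetic with the numbers copied from this run; the variable names are made up for the example.

```python
# Numbers copied from the perf_fuzzer summary in this comment.
open_attempts = 90726
successful = 917
errors = {
    "ENOENT": 446,
    "E2BIG": 7816,
    "EBADF": 8522,
    "EINVAL": 72718,
    "ENOSPC": 81,
    "EOVERFLOW": 2,
    "EOPNOTSUPP": 224,
}

failed = open_attempts - successful
accounted = sum(errors.values())
einval_share = errors["EINVAL"] / failed

print(f"failed opens: {failed}, accounted for by errno: {accounted}")
print(f"EINVAL share of failures: {einval_share:.1%}")
```

Every failed open is accounted for by one of the listed errno values, and roughly four out of five failures are EINVAL, which is expected when a fuzzer feeds mostly invalid attribute combinations to the syscall.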

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Manoj Iyer (manjo) on 2018-03-05
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-19 07:13 EDT-------
The bug was raised against Bionic, and the tester has verified the fix. Please close.

Manoj Iyer (manjo) on 2018-03-26
tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-38.43

---------------
linux (4.13.0-38.43) artful; urgency=medium

  * linux: 4.13.0-38.43 -proposed tracker (LP: #1755762)

  * Servers going OOM after updating kernel from 4.10 to 4.13 (LP: #1748408)
    - i40e: Fix memory leak related filter programming status
    - i40e: Add programming descriptors to cleaned_count

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * fails to dump with latest kpti fixes (LP: #1750021)
    - kdump: write correct address of mem_section into vmcoreinfo

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
    - ALSA: hda - Fix headset mic detection problem for two Dell machines
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * CIFS SMB2/SMB3 does not work for domain based DFS (LP: #1747572)
    - CIFS: make IPC a regular tcon
    - CIFS: use tcon_ipc instead of use_ipc parameter of SMB2_ioctl
    - CIFS: dump IPC tcon in debug proc file

  * i2c-thunderx: erroneous error message "unhandled state: 0" (LP: #1754076)
    - i2c: octeon: Prevent error message on bus error

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * EDAC, sb_edac: Backport 1 patch to Ubuntu 17.10 (Fix missing DIMM sysfs
    entries with KNL SNC2/SNC4 mode) (LP: #1743856)
    - EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode

  * [regression] Colour banding and artefacts appear system-wide on an Asus
    Zenbook UX303LA with Intel HD 4400 graphics (LP: #1749420)
    - drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA

  * DVB Card with SAA7146 chipset not working (LP: #1742316)
    - vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * support thunderx2 vendor pmu events (LP: #1747523)
    - perf pmu: Extract function to get JSON alias map
    - perf pmu: Pass pmu as a parameter to get_cpuid_str()
    - perf tools arm64: Add support for get_cpuid_str function.
    - perf pmu: Add helper function is_pmu_core to detect PMU CORE devices
    - perf vendor events arm64: Add ThunderX2 implementation defined pmu core
      events
    - perf pmu: Add check for valid cpuid in perf_pmu__find_map()

  * lpfc.ko module doesn't work (LP: #1746970)
    - scsi: lpfc: Fix loop mode target discovery

  * Ubuntu 17.10 crashes on vmalloc.c (LP: #1739498)
    - powerpc/mm/book3s64: Make KERN_IO_START a variable
    - powerpc/mm/slb: Move comment next to the code it's referring to
    - powerpc/mm/hash64: Make vmalloc 56T on hash

  * ethtool -p fails to light NIC LED on HiSilicon D05 systems (LP: #1748567)
    - net...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
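
To check whether a given machine already runs a kernel with the fix, the release string from uname -r can be compared against the versions named in this bug: 4.13.0-38.43 for Artful, and v4.15-rc5 for mainline. The sketch below is only a rough helper under those assumptions. has_fix() and ARTFUL_FIXED_ABI are hypothetical names, and the function returns None whenever this bug report does not say enough to decide (for example 4.14 kernels or 4.15 release candidates).

```python
import re
from typing import Optional

# From the changelog in this bug: fixed in linux 4.13.0-38.43 (Artful).
ARTFUL_FIXED_ABI = 38


def has_fix(release: str) -> Optional[bool]:
    """Return True/False if the release is known to have the fix, else None."""
    if "rc" in release:
        return None  # release candidates are not covered by this sketch
    m = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if m is None:
        return None
    major, minor = int(m.group(1)), int(m.group(2))
    abi = int(m.group(4)) if m.group(4) else None
    if (major, minor) >= (4, 15):
        return True  # the commit is in mainline as of v4.15-rc5
    if (major, minor) == (4, 13) and abi is not None:
        return abi >= ARTFUL_FIXED_ABI  # Ubuntu Artful ABI numbering assumed
    return None


for rel in ("4.13.0-25-generic", "4.13.0-38-generic", "4.15.0-10-generic"):
    print(rel, has_fix(rel))
```

On a live system, platform.release() from the Python standard library returns the same string as uname -r and can be passed to has_fix() directly.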