[P9, Power NV][ WSP][Ubuntu 16.04.03] : perf hw breakpoint command results in call traces and system goes for reboot.

Bug #1706033 reported by bugproxy
This bug affects 1 person
Affects                            Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project   Fix Released  High        Canonical Kernel Team
linux (Ubuntu)                     Fix Released  High        Joseph Salisbury
  Zesty                            Fix Released  Undecided   Unassigned

Bug Description

== Comment: #0 - Shriya R. Kulkarni <> - 2017-06-14 04:38:16 ==
Problem Description :
=============

While running perftool-testsuite, the perf hardware breakpoint test fails. It results in kernel call traces, and the system then reboots.

Machine details :
==========
System : POWER9 (P9), Witherspoon (WSP), bare metal
OS : Ubuntu 16.04.03
uname -a : Linux ltc-wspoon3 4.10.0-23-generic #25~16.04.1-Ubuntu SMP Fri Jun 9 10:43:34 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Steps to reproduce:
============
1. Install perf.
2. Clone perftool-testsuite:
    https://github.com/rfmvh/perftool-testsuite
3. Run make.
4. The test fails at the following step, a call trace is printed, and the system reboots:
    -- [ FAIL ] -- perf_stat :: test_hw_breakpoints :: kspace address execution mem:0xc00000000035c020:x (command exitcode + output regexp parsing)
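Step 4 drives perf with a hardware breakpoint event written as mem:<addr>[/len][:access], where the access letters are r, w and x (execute). As a minimal sketch assuming only that syntax, the Python below builds such event strings. bp_event is a hypothetical helper, not part of perf or perftool-testsuite, and it checks only that the access letters are valid, not which combinations the kernel accepts.

```python
def bp_event(addr, access="rw", length=None):
    """Build a perf hardware-breakpoint event string: mem:<addr>[/len][:access].

    Hypothetical helper for illustration. Only the letters of `access` are
    checked; whether the kernel accepts a given combination is not.
    """
    if not access or not set(access) <= set("rwx"):
        raise ValueError("access must combine the letters r, w and x: %r" % access)
    spec = "mem:0x%x" % addr
    if length is not None:
        spec += "/%d" % length  # optional breakpoint length in bytes
    return spec + ":" + access


# The kernel-space execute breakpoint from the failing test step.
print(bp_event(0xc00000000035c020, access="x"))  # mem:0xc00000000035c020:x
# The read/write breakpoint on vm_dirty_ratio used later in this report.
print(bp_event(0xc0000000014591e0))  # mem:0xc0000000014591e0:rw
```

The default access of "rw" mirrors the read/write breakpoint used in the upstream testing further down.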

Call traces :
=======

ubuntu@ltc-wspoon3:~$ [1602513.518414] Unable to handle kernel paging request for data at address 0xc00000000135d3b8
[1602513.518553] Faulting instruction address: 0xc0000000002869bc
[1602513.518694] Oops: Kernel access of bad area, sig: 11 [#1]
[1602513.518782] SMP NR_CPUS=2048
[1602513.518784] NUMA
[1602513.518842] PowerNV
[1602513.518922] Modules linked in: vmx_crypto ofpart ipmi_powernv cmdlinepart ipmi_devintf powernv_flash ipmi_msghandler ibmpowernv opal_prd mtd at24 nvmem_core uio_pdrv_genirq uio autofs4 ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_vpmsum ttm drm tg3 ahci libahci
[1602513.519399] CPU: 27 PID: 4069 Comm: sysctl Not tainted 4.10.0-22-generic #24
[1602513.519524] task: c000203968c42c00 task.stack: c000203965710000
[1602513.519624] NIP: c0000000002869bc LR: c0000000003f7348 CTR: c000000000286990
[1602513.519747] REGS: c000203965713a40 TRAP: 0300 Not tainted (4.10.0-22-generic)
[1602513.519876] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[1602513.519889] CR: 22002448 XER: 00000000
[1602513.520058] CFAR: c0000000003f7344 DAR: c00000000135d3b8 DSISR: 00400000 SOFTE: 1
[1602513.520058] GPR00: c0000000003f7348 c000203965713cc0 c00000000145d100 c00000000134af00
[1602513.520058] GPR04: 0000000000000000 000000004ee50300 c000203965713d20 c000203965713e00
[1602513.520058] GPR08: 0000000000000000 c00000000135d100 0000000000000000 c000000000b71020
[1602513.520058] GPR12: c000000000286990 c000000007b4f300 0000000000000000 0000000000000000
[1602513.520058] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[1602513.520058] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[1602513.520058] GPR24: 00003fffc542f5a0 0000000000000400 c000203965713e00 000000004ee50300
[1602513.520058] GPR28: c00000000134af00 0000000000000000 c000003fee038800 0000000000000000
[1602513.521280] NIP [c0000000002869bc] dirty_ratio_handler+0x2c/0x90
[1602513.521374] LR [c0000000003f7348] proc_sys_call_handler+0x138/0x1c0
[1602513.521481] Call Trace:
[1602513.521526] [c000203965713cc0] [c000203965713d00] 0xc000203965713d00 (unreliable)
[1602513.521655] [c000203965713d00] [c0000000003f7348] proc_sys_call_handler+0x138/0x1c0
[1602513.521797] [c000203965713d70] [c0000000003436ec] __vfs_read+0x3c/0x70
[1602513.521907] [c000203965713d90] [c00000000034516c] vfs_read+0xbc/0x1b0
[1602513.522016] [c000203965713de0] [c000000000346dd8] SyS_read+0x68/0x110
[1602513.522112] [c000203965713e30] [c00000000000b184] system_call+0x38/0xe0
[1602513.522243] Instruction dump:
[1602513.522303] 60420000 3c4c011d 38426770 7c0802a6 60000000 7c0802a6 fbc1fff0 fbe1fff8
[1602513.522445] f8010010 f821ffc1 3d22fff0 7c9f2378 <ebc902ba> 4be66da9 60000000 3d22fff0
[1602513.522564] ---[ end trace 17c76e13e641d3c6 ]---
[1602513.522657]
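The register dump is consistent with the data breakpoint itself being what faulted. The bracketed word in the instruction dump, ebc902ba, decodes as the DS-form load lwa r30,0x2b8(r9); GPR09 holds c00000000135d100, so the load's effective address is c00000000135d3b8, exactly the DAR, and DSISR 00400000 is the bit the powerpc kernel headers call DSISR_DABRMATCH. That fits the fix later identified in this bug, "powerpc/64s: Handle data breakpoints in Radix mode". A minimal Python sketch of that check, assuming the Power ISA DS-form layout and handling only the three opcode-58 loads (ld, ldu, lwa); the function names are hypothetical:

```python
DSISR_DABRMATCH = 0x00400000  # "DABR match" bit, as named in the powerpc kernel headers

DS_LOADS = {0: "ld", 1: "ldu", 2: "lwa"}  # primary opcode 58, chosen by the 2-bit XO field


def decode_ds_load(insn):
    """Decode an opcode-58 DS-form load into (mnemonic, rt, ra, displacement)."""
    if insn >> 26 != 58:
        raise ValueError("not an opcode-58 load: 0x%08x" % insn)
    xo = insn & 0x3
    if xo not in DS_LOADS:
        raise ValueError("reserved XO value %d" % xo)
    rt = (insn >> 21) & 0x1F
    ra = (insn >> 16) & 0x1F
    disp = insn & 0xFFFC  # DS field with its two implied zero bits
    if disp & 0x8000:  # sign-extend the 16-bit displacement
        disp -= 0x10000
    return DS_LOADS[xo], rt, ra, disp


def explain_oops(insn, gprs, dar, dsisr):
    """Return (disassembly, EA == DAR, DABR-match bit set) for one oops."""
    mnem, rt, ra, disp = decode_ds_load(insn)
    base = gprs[ra] if ra else 0  # RA = 0 means a zero base, not r0
    ea = (base + disp) & 0xFFFFFFFFFFFFFFFF
    text = "%s r%d,%#x(r%d) -> EA 0x%016x" % (mnem, rt, disp, ra, ea)
    return text, ea == dar, bool(dsisr & DSISR_DABRMATCH)


# Values copied from the first oops (4.10.0-22-generic); only r9 is used.
print(explain_oops(0xEBC902BA, {9: 0xC00000000135D100},
                   dar=0xC00000000135D3B8, dsisr=0x00400000))
# ('lwa r30,0x2b8(r9) -> EA 0xc00000000135d3b8', True, True)
```

Feeding in the second oops below (ebc9033a, r9 = c0000000013ad100, DAR c0000000013ad438) gives lwa r30,0x338(r9) and the same two matches.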

The system then reboots:

After booting back into Ubuntu, I see a series of call traces:

Ubuntu 16.04.2 LTS ltc-wspoon3 hvc0

ltc-wspoon3 login: [ 3476.626263] Unable to handle kernel paging request for data at address 0xc0000000013ad438
[ 3476.626422] Faulting instruction address: 0xc00000000029a140
[ 3476.626537] Oops: Kernel access of bad area, sig: 11 [#1]
[ 3476.626615] SMP NR_CPUS=2048
[ 3476.626616] NUMA
[ 3476.626673] PowerNV
[ 3476.626746] Modules linked in: ipmi_powernv at24 ipmi_devintf nvmem_core ipmi_msghandler ofpart cmdlinepart powernv_flash mtd opal_prd vmx_crypto ibmpowernv uio_pdrv_genirq uio autofs4 ast i2c_algo_bit ttm crc32c_vpmsum drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops tg3 drm ahci libahci
[ 3476.627220] CPU: 28 PID: 4529 Comm: sysctl Not tainted 4.10.0-23-generic #25~16.04.1-Ubuntu
[ 3476.627339] task: c000203968ceec00 task.stack: c000203968d10000
[ 3476.627428] NIP: c00000000029a140 LR: c0000000004133a8 CTR: c00000000029a110
[ 3476.627554] REGS: c000203968d13a50 TRAP: 0300 Not tainted (4.10.0-23-generic)
[ 3476.627675] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 3476.627689] CR: 22002448 XER: 00000000
[ 3476.627844] CFAR: c0000000004133a4 DAR: c0000000013ad438 DSISR: 00400000 SOFTE: 1
[ 3476.627844] GPR00: c0000000004133a8 c000203968d13cd0 c0000000014ad100 c00000000139af78
[ 3476.627844] GPR04: 0000000000000000 000000003b440300 c000203968d13d30 c000203968d13e00
[ 3476.627844] GPR08: 0000000000000000 c0000000013ad100 0000000000000000 c000000000bc10a8
[ 3476.627844] GPR12: c00000000029a110 c000000007b4fc00 0000000000000000 0000000000000000
[ 3476.627844] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3476.627844] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[ 3476.627844] GPR24: 00003fffd410aa70 0000000000000400 c000203968d13e00 000000003b440300
[ 3476.627844] GPR28: c00000000139af78 0000000000000000 c000003fee038800 0000000000000000
[ 3476.629067] NIP [c00000000029a140] dirty_ratio_handler+0x30/0x90
[ 3476.629177] LR [c0000000004133a8] proc_sys_call_handler+0x138/0x170
[ 3476.629283] Call Trace:
[ 3476.629330] [c000203968d13cd0] [c000203968d13d10] 0xc000203968d13d10 (unreliable)
[ 3476.629462] [c000203968d13d10] [c0000000004133a8] proc_sys_call_handler+0x138/0x170
[ 3476.629600] [c000203968d13d80] [c00000000035a4f0] __vfs_read+0x40/0x80
[ 3476.629711] [c000203968d13da0] [c00000000035c0d8] vfs_read+0xb8/0x1a0
[ 3476.629823] [c000203968d13de0] [c00000000035ddec] SyS_read+0x6c/0x110
[ 3476.629938] [c000203968d13e30] [c00000000000b184] system_call+0x38/0xe0
[ 3476.630050] Instruction dump:
[ 3476.630110] 3c4c0121 38422ff0 7c0802a6 f8010010 60000000 7c0802a6 fbc1fff0 fbe1fff8
[ 3476.630250] f8010010 f821ffc1 3d22fff0 7c9f2378 <ebc9033a> 4be5bbc5 60000000 3d22fff0
[ 3476.630396] ---[ end trace 10b22aebb5b2bf8d ]---
[ 3477.238492]
[ 3477.238534] Sending IPI to other CPUs
[ 3477.239615] IPI complete
[ 3477.240827] kexec: waiting for cpu 5 (physical 49) to ente

The call traces are attached as logs.

== Comment: #4 - Shriya R. Kulkarni <> - 2017-07-10 13:16:10 ==
The issue is fixed in the upstream kernel.

Here is the testing done on the upstream kernel (linux-next, next-20170710):

Testing :
======
1. root@ltc-boston27:~/linux-next-next-20170710/tools/perf# cat /proc/kallsyms | grep -P vm_dirty_ratio
c0000000014591e0 D vm_dirty_ratio

./perf stat -e mem:0xc0000000014591e0:rw -x';' -- sysctl vm.dirty_ratio > /dev/null
3;;mem:0xc0000000014591e0:rw;1126624;100.00;;;;

2. root@ltc-boston27:~/linux-next-next-20170710/tools/perf# cat /proc/kallsyms | grep pid_max
c000000001413bfc D pid_max
c000000001413c00 D pid_max_max
c000000001413c04 D pid_max_min
root@ltc-boston27:~/linux-next-next-20170710/tools/perf# ./perf record -a -e mem:0xc000000001413bfc -g
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.161 MB perf.data (6 samples) ]

root@ltc-boston27:~/linux-next-next-20170710/tools/perf# dmesg -c
root@ltc-boston27:~/linux-next-next-20170710/tools/perf#
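Both steps above follow the same pattern: look up a data symbol's address in /proc/kallsyms, then place a mem: breakpoint on it. A minimal Python sketch of that lookup and of splitting a perf stat -x';' line, with hypothetical helper names. It assumes the kallsyms line layout '<address> <type> <name> [module]' and the leading perf stat CSV fields described in perf-stat(1): counter value, unit, event name, counter run time, and percentage of measurement time. An exact name match avoids also picking up pid_max_max and pid_max_min the way grep pid_max does; note too that /proc/kallsyms may show zero addresses unless read with sufficient privileges.

```python
def kallsyms_lookup(text, name):
    """Return the address of symbol `name` from /proc/kallsyms-style text.

    Each line reads '<hex address> <type> <name> [module]'; the name must
    match exactly, so 'pid_max' does not also match 'pid_max_max'.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    raise KeyError(name)


def _num(s):
    """Parse an int or float field; None for empty or '<not counted>'-style text."""
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return None


def parse_perf_stat_csv(line, sep=";"):
    """Split one 'perf stat -x' output line into its leading fields."""
    fields = line.split(sep)
    return {
        "value": _num(fields[0]),
        "unit": fields[1],
        "event": fields[2],
        "run_time": _num(fields[3]),
        "percent": _num(fields[4]),
    }


sample = """c000000001413bfc D pid_max
c000000001413c00 D pid_max_max
c000000001413c04 D pid_max_min"""
addr = kallsyms_lookup(sample, "pid_max")
print("mem:0x%x" % addr)  # mem:0xc000000001413bfc

print(parse_perf_stat_csv("3;;mem:0xc0000000014591e0:rw;1126624;100.00;;;;"))
```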

== Comment: #6 - Shriya R. Kulkarni <> - 2017-07-21 01:59:26 ==

The following patch fixes the issue:
Patch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d89ba5353f301971dd7d2f9fdf25c4432728f38e

bugproxy (bugproxy) wrote : Call traces.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-155724 severity-high targetmilestone-inin16043
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with commit d89ba5353f301971dd7. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1706033/

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Manoj Iyer (manjo) wrote :

Breno,

Could you please test with the kernel provided to you by the kernel team and report back here?

Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Breno Leitão (breno-leitao)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-27 02:52 EDT-------
Verified on Witherspoon + DD1.0: the fix resolves the issue
=====================

Installed the test kernel from: http://kernel.ubuntu.com/~jsalisbury/lp1706033/

root@ltc-wspoon3:~/linux-next-next-20170926/tools/perf# cat /proc/kallsyms | grep pid_max
c000000001328994 D pid_max
c000000001328998 D pid_max_max
c00000000132899c D pid_max_min
root@ltc-wspoon3:~/linux-next-next-20170926/tools/perf# ./perf record -a -e mem:0xc000000001328994 -g
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.157 MB perf.data (64 samples) ]

root@ltc-wspoon3:~/linux-next-next-20170926/tools/perf# uname -a
Linux ltc-wspoon3 4.10.0-28-generic #32~lp1706033 SMP Thu Jul 27 17:28:10 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

System details :
=========
Machine : Witherspoon + DD1.0
OS : Ubuntu 16.04.03

Joseph Salisbury (jsalisbury) wrote :

I submitted a Stable Release Update (SRU) request for Zesty.

Changed in linux (Ubuntu Zesty):
status: New → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-18 20:11 EDT-------
Hi Kleber,

Is the verification for Zesty (17.04) a requirement for a 16.04 SRU? This is the first time Zesty has been mentioned in this bug. Thanks.

Hi Shriya,

If Kleber states we do need a verification on 17.04 as well, would you be able to do it? Thanks.

Po-Hsu Lin (cypressyew) wrote :

Hi chavez,

Because the system in the bug report was running the 4.10 kernel (the Zesty kernel), verification was requested for Zesty.

In your case, I think it's OK to verify this with the updated 4.10 kernel for Xenial, aka the Zesty-HWE kernel (4.10.0-38.42~16.04.1).

Thanks
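According to the changelog below, this bug was closed by linux 4.10.0-38.42, and the comment above points to the matching Xenial HWE build, 4.10.0-38.42~16.04.1. A rough Python sketch for telling from a 4.10 'uname -r' string whether a kernel is at or past that ABI, assuming the usual '4.10.0-<ABI>-<flavour>' form; has_fix is a hypothetical helper. It only compares the ABI number, so it cannot recognize one-off test builds: the '4.10.0-28-generic #32~lp1706033' kernel verified above carries the fix yet reports ABI 28.

```python
import re

FIXED_ABI = 38  # from linux 4.10.0-38.42, the upload that closed LP: #1706033


def has_fix(release):
    """Return True if a 4.10 Ubuntu 'uname -r' string is at or past FIXED_ABI.

    Compares only the ABI number, so one-off test kernels built on an
    older ABI (such as the lp1706033 test build) are reported as False.
    """
    m = re.fullmatch(r"4\.10\.0-(\d+)-\w+", release)
    if m is None:
        raise ValueError("not a 4.10.0 Ubuntu release string: %r" % release)
    return int(m.group(1)) >= FIXED_ABI


for rel in ("4.10.0-22-generic", "4.10.0-23-generic", "4.10.0-38-generic"):
    print(rel, has_fix(rel))
```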

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-38.42

---------------
linux (4.10.0-38.42) zesty; urgency=low

  * linux: 4.10.0-38.42 -proposed tracker (LP: #1722330)

  * Controller lockup detected on ProLiant DL380 Gen9 with P440 Controller
    (LP: #1720359)
    - scsi: hpsa: limit transfer length to 1MB

  * [Dell Docking IE][0bda:8153] Realtek USB Ethernet leads to system hang
    (LP: #1720977)
    - r8152: fix the list rx_done may be used without initialization

  * Touchpad not detected in Lenovo X1 Yoga / Yoga 720-15IKB (LP: #1700657)
    - mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices

  * Add installer support for Broadcom BCM573xx network drivers. (LP: #1720466)
    - d-i: Add bnxt_en to nic-modules.

  * CVE-2017-1000252
    - KVM: VMX: Do not BUG() on out-of-bounds guest IRQ

  * CVE-2017-10663
    - f2fs: sanity check checkpoint segno and blkoff

  * xfstest sanity checks on seek operations fails (LP: #1696049)
    - xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()

  * [P9, Power NV][ WSP][Ubuntu 16.04.03] : perf hw breakpoint command results
    in call traces and system goes for reboot. (LP: #1706033)
    - powerpc/64s: Handle data breakpoints in Radix mode

  * 5U84 - ses driver isn't binding right - cannot blink lights on 1 of the 2
    5u84 (LP: #1693369)
    - scsi: ses: do not add a device to an enclosure if enclosure_add_links()
      fails.

  * Vlun resize request could fail with cxlflash driver (LP: #1713575)
    - scsi: cxlflash: Fix vlun resize failure in the shrink path

  * More migrations with constant load (LP: #1713576)
    - sched/fair: Prefer sibiling only if local group is under-utilized

  * New PMU fixes for marked events. (LP: #1716491)
    - powerpc/perf: POWER9 PMU stops after idle workaround

  * CVE-2017-14340
    - xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present

  * [Zesty][Yakkety] rtl8192e bug fixes (LP: #1698470)
    - staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
    - staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
    - staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
    - staging: rtl8192e: GetTs Fix invalid TID 7 warning.

  * Stranded with ENODEV after mdadm --readonly (LP: #1706243)
    - md: MD_CLOSING needs to be cleared after called md_set_readonly or
      do_md_stop

  * multipath -ll is not showing the disks which are actually multipath
    (LP: #1718397)
    - fs: aio: fix the increment of aio-nr and counting against aio-max-nr

  * ETPS/2 Elantech Touchpad inconsistently detected (Gigabyte P57W laptop)
    (LP: #1594214)
    - Input: i8042 - add Gigabyte P57 to the keyboard reset table

  * CVE-2017-10911
    - xen-blkback: don't leak stack data via response ring

  * CVE-2017-11176
    - mqueue: fix a use-after-free in sys_mq_notify()

  * implement 'complain mode' in seccomp for developer mode with snaps
    (LP: #1567597)
    - Revert "UBUNTU: SAUCE: seccomp: log actions even when audit is disabled"
    - seccomp: Provide matching filter for introspection
    - seccomp: Sysctl to display available actions
    - seccomp: Operation for checking if an a...


Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Incomplete → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: Breno Leitão (breno-leitao) → Canonical Kernel Team (canonical-kernel-team)
bugproxy (bugproxy)
tags: added: verification-done-zesty
removed: verification-needed-zesty
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Brad Figg (brad-figg)
tags: added: cscc