[P9, Power NV][ WSP][Ubuntu 16.04.03] : perf hw breakpoint command results in call traces and system goes for reboot.

Bug #1706033 reported by bugproxy
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   High        Canonical Kernel Team
linux (Ubuntu)                     High        Joseph Salisbury
Zesty                              Undecided   Unassigned

Bug Description

== Comment: #0 - Shriya R. Kulkarni <> - 2017-06-14 04:38:16 ==
Problem Description :
=============

While running the perftool-testsuite, the perf hardware breakpoint test fails and produces call traces, after which the system reboots.

Machine details :
==========
System : P9 , WSP , Bare metal.
OS : Ubuntu 16.04.03
uname -a : Linux ltc-wspoon3 4.10.0-23-generic #25~16.04.1-Ubuntu SMP Fri Jun 9 10:43:34 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Steps to reproduce:
============
1. Install perf.
2. git clone perftool-testsuite.
    https://github.com/rfmvh/perftool-testsuite
3. Do make.
4. The test fails at this step: -- [ FAIL ] -- perf_stat :: test_hw_breakpoints :: kspace address execution mem:0xc00000000035c020:x (command exitcode + output regexp parsing)
A call trace is seen and the system reboots.
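
The failing test name in step 4 points at a perf hardware breakpoint event of the form mem:<address>:<access>. A minimal sketch of how such an event string could be assembled is shown here. The helper name hw_bp_event is hypothetical and is not part of perf or of the test suite, and the workload in the commented example is only a placeholder.

```shell
# Hypothetical helper: build a perf hardware-breakpoint event string.
# access must be one of r, w, rw or x, as accepted by perf's mem: events.
hw_bp_event() {
    addr=$1
    access=$2
    case $addr in
        0x*) ;;
        *) echo "address must start with 0x: $addr" >&2; return 1 ;;
    esac
    case $access in
        r|w|rw|x) ;;
        *) echo "access must be r, w, rw or x: $access" >&2; return 1 ;;
    esac
    printf 'mem:%s:%s\n' "$addr" "$access"
}

# Example (placeholder workload; on an affected kernel this may crash the
# machine, as described in this report):
#   perf stat -e "$(hw_bp_event 0xc00000000035c020 x)" -- sleep 1
```

Validating the access type up front keeps a typo from reaching perf, which matters when the command is being run on a machine that may reboot.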

Call traces :
=======

ubuntu@ltc-wspoon3:~$ [1602513.518414] Unable to handle kernel paging request for data at address 0xc00000000135d3b8
[1602513.518553] Faulting instruction address: 0xc0000000002869bc
[1602513.518694] Oops: Kernel access of bad area, sig: 11 [#1]
[1602513.518782] SMP NR_CPUS=2048
[1602513.518784] NUMA
[1602513.518842] PowerNV
[1602513.518922] Modules linked in: vmx_crypto ofpart ipmi_powernv cmdlinepart ipmi_devintf powernv_flash ipmi_msghandler ibmpowernv opal_prd mtd at24 nvmem_core uio_pdrv_genirq uio autofs4 ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_vpmsum ttm drm tg3 ahci libahci
[1602513.519399] CPU: 27 PID: 4069 Comm: sysctl Not tainted 4.10.0-22-generic #24
[1602513.519524] task: c000203968c42c00 task.stack: c000203965710000
[1602513.519624] NIP: c0000000002869bc LR: c0000000003f7348 CTR: c000000000286990
[1602513.519747] REGS: c000203965713a40 TRAP: 0300 Not tainted (4.10.0-22-generic)
[1602513.519876] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[1602513.519889] CR: 22002448 XER: 00000000
[1602513.520058] CFAR: c0000000003f7344 DAR: c00000000135d3b8 DSISR: 00400000 SOFTE: 1
[1602513.520058] GPR00: c0000000003f7348 c000203965713cc0 c00000000145d100 c00000000134af00
[1602513.520058] GPR04: 0000000000000000 000000004ee50300 c000203965713d20 c000203965713e00
[1602513.520058] GPR08: 0000000000000000 c00000000135d100 0000000000000000 c000000000b71020
[1602513.520058] GPR12: c000000000286990 c000000007b4f300 0000000000000000 0000000000000000
[1602513.520058] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[1602513.520058] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[1602513.520058] GPR24: 00003fffc542f5a0 0000000000000400 c000203965713e00 000000004ee50300
[1602513.520058] GPR28: c00000000134af00 0000000000000000 c000003fee038800 0000000000000000
[1602513.521280] NIP [c0000000002869bc] dirty_ratio_handler+0x2c/0x90
[1602513.521374] LR [c0000000003f7348] proc_sys_call_handler+0x138/0x1c0
[1602513.521481] Call Trace:
[1602513.521526] [c000203965713cc0] [c000203965713d00] 0xc000203965713d00 (unreliable)
[1602513.521655] [c000203965713d00] [c0000000003f7348] proc_sys_call_handler+0x138/0x1c0
[1602513.521797] [c000203965713d70] [c0000000003436ec] __vfs_read+0x3c/0x70
[1602513.521907] [c000203965713d90] [c00000000034516c] vfs_read+0xbc/0x1b0
[1602513.522016] [c000203965713de0] [c000000000346dd8] SyS_read+0x68/0x110
[1602513.522112] [c000203965713e30] [c00000000000b184] system_call+0x38/0xe0
[1602513.522243] Instruction dump:
[1602513.522303] 60420000 3c4c011d 38426770 7c0802a6 60000000 7c0802a6 fbc1fff0 fbe1fff8
[1602513.522445] f8010010 f821ffc1 3d22fff0 7c9f2378 <ebc902ba> 4be66da9 60000000 3d22fff0
[1602513.522564] ---[ end trace 17c76e13e641d3c6 ]---
[1602513.522657]

The system then reboots.

After booting back into Ubuntu, I see a series of call traces:

Ubuntu 16.04.2 LTS ltc-wspoon3 hvc0

ltc-wspoon3 login: [ 3476.626263] Unable to handle kernel paging request for data at address 0xc0000000013ad438
[ 3476.626422] Faulting instruction address: 0xc00000000029a140
[ 3476.626537] Oops: Kernel access of bad area, sig: 11 [#1]
[ 3476.626615] SMP NR_CPUS=2048
[ 3476.626616] NUMA
[ 3476.626673] PowerNV
[ 3476.626746] Modules linked in: ipmi_powernv at24 ipmi_devintf nvmem_core ipmi_msghandler ofpart cmdlinepart powernv_flash mtd opal_prd vmx_crypto ibmpowernv uio_pdrv_genirq uio autofs4 ast i2c_algo_bit ttm crc32c_vpmsum drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops tg3 drm ahci libahci
[ 3476.627220] CPU: 28 PID: 4529 Comm: sysctl Not tainted 4.10.0-23-generic #25~16.04.1-Ubuntu
[ 3476.627339] task: c000203968ceec00 task.stack: c000203968d10000
[ 3476.627428] NIP: c00000000029a140 LR: c0000000004133a8 CTR: c00000000029a110
[ 3476.627554] REGS: c000203968d13a50 TRAP: 0300 Not tainted (4.10.0-23-generic)
[ 3476.627675] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 3476.627689] CR: 22002448 XER: 00000000
[ 3476.627844] CFAR: c0000000004133a4 DAR: c0000000013ad438 DSISR: 00400000 SOFTE: 1
[ 3476.627844] GPR00: c0000000004133a8 c000203968d13cd0 c0000000014ad100 c00000000139af78
[ 3476.627844] GPR04: 0000000000000000 000000003b440300 c000203968d13d30 c000203968d13e00
[ 3476.627844] GPR08: 0000000000000000 c0000000013ad100 0000000000000000 c000000000bc10a8
[ 3476.627844] GPR12: c00000000029a110 c000000007b4fc00 0000000000000000 0000000000000000
[ 3476.627844] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3476.627844] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[ 3476.627844] GPR24: 00003fffd410aa70 0000000000000400 c000203968d13e00 000000003b440300
[ 3476.627844] GPR28: c00000000139af78 0000000000000000 c000003fee038800 0000000000000000
[ 3476.629067] NIP [c00000000029a140] dirty_ratio_handler+0x30/0x90
[ 3476.629177] LR [c0000000004133a8] proc_sys_call_handler+0x138/0x170
[ 3476.629283] Call Trace:
[ 3476.629330] [c000203968d13cd0] [c000203968d13d10] 0xc000203968d13d10 (unreliable)
[ 3476.629462] [c000203968d13d10] [c0000000004133a8] proc_sys_call_handler+0x138/0x170
[ 3476.629600] [c000203968d13d80] [c00000000035a4f0] __vfs_read+0x40/0x80
[ 3476.629711] [c000203968d13da0] [c00000000035c0d8] vfs_read+0xb8/0x1a0
[ 3476.629823] [c000203968d13de0] [c00000000035ddec] SyS_read+0x6c/0x110
[ 3476.629938] [c000203968d13e30] [c00000000000b184] system_call+0x38/0xe0
[ 3476.630050] Instruction dump:
[ 3476.630110] 3c4c0121 38422ff0 7c0802a6 f8010010 60000000 7c0802a6 fbc1fff0 fbe1fff8
[ 3476.630250] f8010010 f821ffc1 3d22fff0 7c9f2378 <ebc9033a> 4be5bbc5 60000000 3d22fff0
[ 3476.630396] ---[ end trace 10b22aebb5b2bf8d ]---
[ 3477.238492]
[ 3477.238534] Sending IPI to other CPUs
[ 3477.239615] IPI complete
[ 3477.240827] kexec: waiting for cpu 5 (physical 49) to ente

Attaching call traces in logs.
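
Both traces above fault in the same function, which is easy to miss in a long log. As a rough sketch, the faulting symbol can be pulled out of a saved kernel log by matching the "NIP [address] symbol+offset/length" line. The helper name oops_nip_symbols is hypothetical.

```shell
# Hypothetical helper: read kernel log text on stdin and print the faulting
# function from each "NIP [addr] symbol+off/len" line.
# Exits non-zero when no such line is found.
oops_nip_symbols() {
    sed -n 's/.*NIP \[[0-9a-f]*\] \([A-Za-z0-9_.]*\)+0x.*/\1/p' | grep .
}

# Example:
#   dmesg | oops_nip_symbols
```

The register summary line that starts with "NIP:" is deliberately not matched, since it carries only the raw address and no symbol name.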

== Comment: #4 - Shriya R. Kulkarni <> - 2017-07-10 13:16:10 ==
The issue is fixed in the upstream kernel.

Here is the testing done on the upstream kernel:

Testing :
======
1. root@ltc-boston27:~/linux-next-next-20170710/tools/perf# cat /proc/kallsyms | grep -P vm_dirty_ratio
c0000000014591e0 D vm_dirty_ratio

./perf stat -e mem:0xc0000000014591e0:rw -x';' -- sysctl vm.dirty_ratio > /dev/null
3;;mem:0xc0000000014591e0:rw;1126624;100.00;;;;

2. root@ltc-boston27:~/linux-next-next-20170710/tools/perf# cat /proc/kallsyms | grep pid_max
c000000001413bfc D pid_max
c000000001413c00 D pid_max_max
c000000001413c04 D pid_max_min
root@ltc-boston27:~/linux-next-next-20170710/tools/perf# ./perf record -a -e mem:0xc000000001413bfc -g
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.161 MB perf.data (6 samples) ]

root@ltc-boston27:~/linux-next-next-20170710/tools/perf# dmesg -c
root@ltc-boston27:~/linux-next-next-20170710/tools/perf#
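
The lookup step in the testing above can be sketched as a small helper. This is only an illustration: the name sym_addr is hypothetical, and it assumes the usual kallsyms layout of address, type and symbol name on each line.

```shell
# Hypothetical helper: print the address of an exact symbol name from a
# kallsyms-format file (address, type, name per line).
sym_addr() {
    sym=$1
    file=${2:-/proc/kallsyms}
    awk -v s="$sym" '$3 == s { print "0x" $1; exit }' "$file"
}

# Example, mirroring the commands above (run as root, since /proc/kallsyms
# may show zeroed addresses to unprivileged users):
#   addr=$(sym_addr vm_dirty_ratio)
#   perf stat -e "mem:${addr}:rw" -x';' -- sysctl vm.dirty_ratio > /dev/null
```

Matching the name field exactly avoids the extra pid_max_max and pid_max_min hits that a plain grep for pid_max returns.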

== Comment: #6 - Shriya R. Kulkarni <> - 2017-07-21 01:59:26 ==

The following patch fixes the issue:
Patch : https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d89ba5353f301971dd7d2f9fdf25c4432728f38e

bugproxy (bugproxy) wrote : Call traces.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-155724 severity-high targetmilestone-inin16043
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with commit d89ba5353f301971dd7. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1706033/

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Manoj Iyer (manjo) wrote :

Breno,

Could you please test with the kernel provided to you by the kernel team and report back here?

Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Breno Leitão (breno-leitao)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-27 02:52 EDT-------
Verified on Witherspoon + DD1.0 : The fix was able to resolve the issue
=====================

Installed the kernel : http://kernel.ubuntu.com/~jsalisbury/lp1706033/

root@ltc-wspoon3:~/linux-next-next-20170926/tools/perf# cat /proc/kallsyms | grep pid_max
c000000001328994 D pid_max
c000000001328998 D pid_max_max
c00000000132899c D pid_max_min
root@ltc-wspoon3:~/linux-next-next-20170926/tools/perf# ./perf record -a -e mem:0xc000000001328994 -g
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.157 MB perf.data (64 samples) ]

root@ltc-wspoon3:~/linux-next-next-20170926/tools/perf# uname -a
Linux ltc-wspoon3 4.10.0-28-generic #32~lp1706033 SMP Thu Jul 27 17:28:10 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

System details :
=========
Machine : Witherspoon + DD1.0
OS : Ubuntu 16.04.03

Joseph Salisbury (jsalisbury) wrote :

I submitted a Zesty SRU request.

Changed in linux (Ubuntu Zesty):
status: New → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-10-18 20:11 EDT-------
Hi Kleber,

Is the verification for Zesty (17.04) a requirement for a 16.04 SRU? This is the first time Zesty has been mentioned in this bug. Thanks.

Hi Shriya,

If Kleber states we do need a verification on 17.04 as well, would you be able to do it? Thanks.

Po-Hsu Lin (cypressyew) wrote :

Hi chavez,

The system mentioned in the bug report was running the 4.10 kernel (the Zesty kernel), which is why verification was requested for Zesty.

In your case, I think it's ok to verify this with the updated 4.10 kernel for Xenial, aka the Zesty-HWE kernel (4.10.0-38.42~16.04.1).

Thanks
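
Before verifying, it can help to confirm that the running kernel is at least the build named above. The following is a rough sketch only: the helper name kernel_at_least is hypothetical, it assumes a sort that supports -V (GNU coreutils does), and it only strips the -generic flavour suffix. It compares archive version numbers, so it says nothing about locally built test kernels.

```shell
# Hypothetical helper: succeed if kernel release $1 (as printed by uname -r)
# is at least the version-ABI string $2, for example 4.10.0-38.
kernel_at_least() {
    have=${1%-generic}   # drop the flavour suffix: 4.10.0-38-generic -> 4.10.0-38
    want=$2
    # The lower of the two versions sorts first; if that is $want, we are new enough.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Example:
#   kernel_at_least "$(uname -r)" 4.10.0-38 && echo "running 4.10.0-38 or later"
```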

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-38.42

---------------
linux (4.10.0-38.42) zesty; urgency=low

  * linux: 4.10.0-38.42 -proposed tracker (LP: #1722330)

  * Controller lockup detected on ProLiant DL380 Gen9 with P440 Controller
    (LP: #1720359)
    - scsi: hpsa: limit transfer length to 1MB

  * [Dell Docking IE][0bda:8153] Realtek USB Ethernet leads to system hang
    (LP: #1720977)
    - r8152: fix the list rx_done may be used without initialization

  * Touchpad not detected in Lenovo X1 Yoga / Yoga 720-15IKB (LP: #1700657)
    - mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices

  * Add installer support for Broadcom BCM573xx network drivers. (LP: #1720466)
    - d-i: Add bnxt_en to nic-modules.

  * CVE-2017-1000252
    - KVM: VMX: Do not BUG() on out-of-bounds guest IRQ

  * CVE-2017-10663
    - f2fs: sanity check checkpoint segno and blkoff

  * xfstest sanity checks on seek operations fails (LP: #1696049)
    - xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()

  * [P9, Power NV][ WSP][Ubuntu 16.04.03] : perf hw breakpoint command results
    in call traces and system goes for reboot. (LP: #1706033)
    - powerpc/64s: Handle data breakpoints in Radix mode

  * 5U84 - ses driver isn't binding right - cannot blink lights on 1 of the 2
    5u84 (LP: #1693369)
    - scsi: ses: do not add a device to an enclosure if enclosure_add_links()
      fails.

  * Vlun resize request could fail with cxlflash driver (LP: #1713575)
    - scsi: cxlflash: Fix vlun resize failure in the shrink path

  * More migrations with constant load (LP: #1713576)
    - sched/fair: Prefer sibiling only if local group is under-utilized

  * New PMU fixes for marked events. (LP: #1716491)
    - powerpc/perf: POWER9 PMU stops after idle workaround

  * CVE-2017-14340
    - xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present

  * [Zesty][Yakkety] rtl8192e bug fixes (LP: #1698470)
    - staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
    - staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
    - staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
    - staging: rtl8192e: GetTs Fix invalid TID 7 warning.

  * Stranded with ENODEV after mdadm --readonly (LP: #1706243)
    - md: MD_CLOSING needs to be cleared after called md_set_readonly or
      do_md_stop

  * multipath -ll is not showing the disks which are actually multipath
    (LP: #1718397)
    - fs: aio: fix the increment of aio-nr and counting against aio-max-nr

  * ETPS/2 Elantech Touchpad inconsistently detected (Gigabyte P57W laptop)
    (LP: #1594214)
    - Input: i8042 - add Gigabyte P57 to the keyboard reset table

  * CVE-2017-10911
    - xen-blkback: don't leak stack data via response ring

  * CVE-2017-11176
    - mqueue: fix a use-after-free in sys_mq_notify()

  * implement 'complain mode' in seccomp for developer mode with snaps
    (LP: #1567597)
    - Revert "UBUNTU: SAUCE: seccomp: log actions even when audit is disabled"
    - seccomp: Provide matching filter for introspection
    - seccomp: Sysctl to display available actions
    - seccomp: Operation for checking if an a...


Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Incomplete → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: Breno Leitão (breno-leitao) → Canonical Kernel Team (canonical-kernel-team)
bugproxy (bugproxy)
tags: added: verification-done-zesty
removed: verification-needed-zesty
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Brad Figg (brad-figg)
tags: added: cscc