Ubuntu 17.04: CAPI: call trace seen while injecting an error into the CAPI card.

Bug #1694485 reported by bugproxy
This bug affects 1 person
Affects                           Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Fix Released  Medium      Canonical Kernel Team
linux (Ubuntu)                    Fix Released  Medium      Joseph Salisbury
linux (Ubuntu Zesty)              Won't Fix     Medium      Joseph Salisbury
linux (Ubuntu Artful)             Fix Released  Medium      Joseph Salisbury

Bug Description

== Comment: #0 - SUDEESH JOHN - 2017-03-18 13:55:03 ==
---Problem Description---
A call trace is seen while injecting an error into the CAPI card.

" WARNING: CPU: 31 PID: 491 at /build/linux-VtwHOM/linux-4.10.0/drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x68/0x90 [cxl] "

---uname output---
Linux freak 4.10.0-13-generic #15-Ubuntu SMP Thu Mar 9 20:27:28 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = PowerNV 8247-21L

---Steps to Reproduce---
1. echo 10000 > /sys/kernel/debug/powerpc/eeh_max_freezes
2. echo 1 > /sys/class/cxl/card0/perst_reloads_same_image
3. echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct_outbound
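
The three steps above can be scripted. The following is a minimal Python sketch, not part of the original report: the helper names `injection_plan` and `run_plan` are hypothetical, and the PHB directory (`PCI0000` here, `PCI0001` on the other machine that appears later in this report) is a parameter because it depends on which PHB hosts the CAPI card. Writing for real needs root on a PowerNV system with debugfs mounted, so the sketch defaults to a dry run that only prints the planned writes.

```python
#!/usr/bin/env python3
"""Sketch of the reproduction steps as a script (helper names are hypothetical)."""


def injection_plan(phb="PCI0000", card="card0"):
    """Return the (path, value) writes from the steps to reproduce, in order."""
    return [
        # Step 1: raise the EEH freeze limit.
        ("/sys/kernel/debug/powerpc/eeh_max_freezes", "10000"),
        # Step 2: set perst_reloads_same_image on the cxl card.
        (f"/sys/class/cxl/{card}/perst_reloads_same_image", "1"),
        # Step 3: inject the outbound error on the PHB that hosts the card.
        (f"/sys/kernel/debug/powerpc/{phb}/err_injct_outbound", "0x8000000000000000"),
    ]


def run_plan(plan, dry_run=True):
    """Print each write as an echo command, or perform it when dry_run is False."""
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as f:
                f.write(value + "\n")


if __name__ == "__main__":
    run_plan(injection_plan())
```

Calling `run_plan(injection_plan("PCI0001"), dry_run=False)` would perform the writes against the second machine's PHB; keeping the plan as plain data makes it easy to review before anything touches sysfs or debugfs.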

---The complete call trace ---

Mar 18 14:39:09 freak kernel: [ 289.675421] ------------[ cut here ]------------
Mar 18 14:39:09 freak kernel: [ 289.675431] WARNING: CPU: 5 PID: 491 at /build/linux-VtwHOM/linux-4.10.0/drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x68/0x90 [cxl]
Mar 18 14:39:09 freak kernel: [ 289.675432] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel powernv_rng vmx_crypto ibmpowernv leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear ses enclosure scsi_transport_sas bnx2x mlx5_core tg3 cxl mdio ipr libcrc32c devlink crc32c_vpmsum pnv_php
Mar 18 14:39:09 freak kernel: [ 289.675490] CPU: 5 PID: 491 Comm: eehd Not tainted 4.10.0-13-generic #15-Ubuntu
Mar 18 14:39:09 freak kernel: [ 289.675492] task: c0000003bfbfde00 task.stack: c0000003bfc5c000
Mar 18 14:39:09 freak kernel: [ 289.675493] NIP: d000000005cc0ca0 LR: d000000005cc0c9c CTR: c000000000605aa0
Mar 18 14:39:09 freak kernel: [ 289.675495] REGS: c0000003bfc5f6a0 TRAP: 0700 Not tainted (4.10.0-13-generic)
Mar 18 14:39:09 freak kernel: [ 289.675496] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
Mar 18 14:39:09 freak kernel: [ 289.675504] CR: 28008282 XER: 20000000
Mar 18 14:39:09 freak kernel: [ 289.675504] CFAR: c000000000b568dc SOFTE: 1
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR00: d000000005cc0c9c c0000003bfc5f920 d000000005cf2d88 000000000000002f
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR04: 0000000000000001 00000000000003fd 0000000063206576 0000000000000000
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR08: c0000000015dc700 0000000000000000 0000000000000000 0000000000000001
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR12: 0000000000008800 c00000000fb82d00 c000000000108c88 c0000003c51f9f00
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d53990
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR24: c000000000d53968 c0000000014a4330 c0000003ab8fa800 c0000003bd2c20c0
Mar 18 14:39:09 freak kernel: [ 289.675504] GPR28: c0000003c5051098 0000000000000000 c0000003ab8fa800 0000000000000000
Mar 18 14:39:09 freak kernel: [ 289.675535] NIP [d000000005cc0ca0] cxl_adapter_context_unlock+0x68/0x90 [cxl]
Mar 18 14:39:09 freak kernel: [ 289.675540] LR [d000000005cc0c9c] cxl_adapter_context_unlock+0x64/0x90 [cxl]
Mar 18 14:39:09 freak kernel: [ 289.675541] Call Trace:
Mar 18 14:39:09 freak kernel: [ 289.675547] [c0000003bfc5f920] [d000000005cc0c9c] cxl_adapter_context_unlock+0x64/0x90 [cxl] (unreliable)
Mar 18 14:39:09 freak kernel: [ 289.675556] [c0000003bfc5f980] [d000000005cd022c] cxl_configure_adapter+0x954/0x990 [cxl]
Mar 18 14:39:09 freak kernel: [ 289.675563] [c0000003bfc5fa30] [d000000005cd02c0] cxl_pci_slot_reset+0x58/0x240 [cxl]
Mar 18 14:39:09 freak kernel: [ 289.675568] [c0000003bfc5fae0] [c00000000003b0d4] eeh_report_reset+0x154/0x190
Mar 18 14:39:09 freak kernel: [ 289.675571] [c0000003bfc5fb20] [c000000000039428] eeh_pe_dev_traverse+0x98/0x170
Mar 18 14:39:09 freak kernel: [ 289.675574] [c0000003bfc5fbb0] [c00000000003b81c] eeh_handle_normal_event+0x3ec/0x540
Mar 18 14:39:09 freak kernel: [ 289.675577] [c0000003bfc5fc60] [c00000000003bbd4] eeh_handle_event+0x174/0x360
Mar 18 14:39:09 freak kernel: [ 289.675580] [c0000003bfc5fd10] [c00000000003bfa8] eeh_event_handler+0x1e8/0x1f0
Mar 18 14:39:09 freak kernel: [ 289.675583] [c0000003bfc5fdc0] [c000000000108dd4] kthread+0x154/0x1a0
Mar 18 14:39:09 freak kernel: [ 289.675586] [c0000003bfc5fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
Mar 18 14:39:09 freak kernel: [ 289.675588] Instruction dump:
Mar 18 14:39:09 freak kernel: [ 289.675590] 2f84ffff 4d9e0020 7c0802a6 f8010010 f821ffa1 39200000 7c8407b4 912303d0
Mar 18 14:39:09 freak kernel: [ 289.675596] 3d220000 e8698070 4801f159 e8410018 <0fe00000> 38210060 e8010010 7c0803a6
Mar 18 14:39:09 freak kernel: [ 289.675602] ---[ end trace 113989c345fee0d3 ]---
Mar 18 14:39:09 freak kernel: [ 289.675642] cxl afu0.0: Activating AFU directed mode
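
When validating a test kernel, it can help to check a saved log for this specific warning automatically. Below is a small Python sketch under stated assumptions: the regular expression only targets the WARNING header format shown in the trace above, and the names `WARNING_RE` and `find_warnings` are hypothetical helpers, not part of any existing tool.

```python
import re

# Matches the header line of a kernel WARNING splat in the format shown above.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at (?P<src>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def find_warnings(log_text, func=None):
    """Return one dict per WARNING header, optionally filtered by function name."""
    hits = []
    for m in WARNING_RE.finditer(log_text):
        if func is None or m.group("func") == func:
            hits.append(m.groupdict())
    return hits


# Header line copied from the trace in this report.
sample = (
    "Mar 18 14:39:09 freak kernel: [ 289.675431] WARNING: CPU: 5 PID: 491 at "
    "/build/linux-VtwHOM/linux-4.10.0/drivers/misc/cxl/main.c:325 "
    "cxl_adapter_context_unlock+0x68/0x90 [cxl]\n"
)
hits = find_warnings(sample, func="cxl_adapter_context_unlock")
print(len(hits), hits[0]["src"].rsplit("/", 1)[-1], hits[0]["line"])
# prints: 1 main.c 325
```

An empty result for `func="cxl_adapter_context_unlock"` on a log captured after error injection would suggest the warning is gone, although it says nothing about other traces in the same log.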

== Comment: #2 - Vaibhav Jain - 2017-03-20 05:00:20 ==
I have sent a fix to the ppc-dev list for review: https://patchwork.ozlabs.org/patch/740876/

== Comment: #3 - Vaibhav Jain - 2017-05-16 01:56:32 ==
The patch has been merged to mainline as commit ea9a26d117cf0637c71d3e0076f4a124bf5859df ('cxl: Force context lock during EEH flow').
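
To illustrate what the commit title refers to, here is a toy Python model of an adapter context counter. It is a sketch built on assumptions, not the kernel code: the class `AdapterContexts` and its methods are hypothetical, and the convention that -1 means "locked" is an assumption chosen to be consistent with the message "Adapter context unlocked with 0 active contexts" that appears in the dmesg output in this report. The only point is that unlocking a counter that was never locked trips the warning, which matches a reset path (cxl_pci_slot_reset calling cxl_configure_adapter in the trace) that unlocks without an earlier lock in the EEH flow.

```python
import warnings


class AdapterContexts:
    """Toy model: count >= 0 is the number of active contexts, -1 means locked."""

    def __init__(self):
        self.count = 0

    def lock(self):
        """Take the lock only if no contexts are active; return True on success."""
        if self.count == 0:
            self.count = -1
            return True
        return False

    def unlock(self):
        """Release the lock; warn and force a reset if it was not actually held."""
        previous = self.count
        self.count = 0
        if previous != -1:
            warnings.warn(f"Adapter context unlocked with {previous} active contexts")


# Unbalanced path, resembling the trace: unlock without a matching lock.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    adapter = AdapterContexts()
    adapter.unlock()
unbalanced = [str(w.message) for w in caught]

# Balanced path: lock first, unlock afterwards; no warning is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    adapter = AdapterContexts()
    adapter.lock()
    adapter.unlock()
balanced = [str(w.message) for w in caught]

print(unbalanced)
print(balanced)
```

In this model the first list holds one warning message and the second is empty, which is the behaviour change the fix is expected to produce; the actual locking points in the driver are defined by the commit itself.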

bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-152708 severity-low targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
bugproxy (bugproxy)
tags: added: targetmilestone-inin1704
removed: targetmilestone-inin---
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo)
tags: added: ubuntu-17.04
Changed in linux (Ubuntu):
status: New → Triaged
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a Zesty test kernel with a pick of commit ea9a26d117c. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1694485/

Can you test this kernel and see if it resolves this bug?

Manoj Iyer (manjo)
tags: added: triage-g
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-01 10:07 EDT-------
Sudeesh, Vaibhav,

Canonical is waiting for you to test the kernel provided above.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-01 23:37 EDT-------
(In reply to comment #11)
> Sudeesh, Vaibhav,
>
> Canonical is waiting for you to test the kernel provided above.

Unable to validate this because of the issue reported in https://bugzilla.linux.ibm.com/show_bug.cgi?id=156746.

root@ltc84-pkvm1:~# echo 10000 > /sys/kernel/debug/powerpc/eeh_max_freezes
root@ltc84-pkvm1:~# echo 1 > /sys/class/cxl/card0/perst_reloads_same_image
root@ltc84-pkvm1:~# lspci | grep acc
0001:01:00.0 Processing accelerators: IBM Device 0477 (rev 01)
0002:00:00.0 Processing accelerators: IBM Device 4350 (rev 0a)
root@ltc84-pkvm1:~# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound

--- Console Log ---
Ubuntu 17.04 ltc84-pkvm1 hvc0

ltc84-pkvm1 login: 3.68562|Ignoring boot flags, incorrect version 0x0
3.76138|ISTEP 6. 3
4.20761|ISTEP 6. 4
4.20823|ISTEP 6. 5
44.57717|HWAS|PRESENT> DIMM[03]=AAAAAAAAAAAAAAAA
44.57718|HWAS|PRESENT> Membuf[04]=CCCC000000000000
44.57718|HWAS|PRESENT> Proc[05]=C000000000000000
44.66118|ISTEP 6. 6
45.62939|================================================
45.62939|Error reported by unknown (0xE500)
45.62939| <none>
45.62939| ModuleId 0x0b unknown
45.62939| ReasonCode 0xe540 unknown
45.62940| UserData1 unknown : 0x0005000000000101
45.62940| UserData2 unknown : 0x4241000500000000
45.62940|User Data Section 0, type UD
45.62940| Subsection type 0x06
45.62941| ComponentId errl (0x0100)
45.62941| CALLOUT
45.62941| PROCEDURE ERROR
45.62941| Procedure: 16
45.62941|User Data Section 1, type UD
45.62942| Subsection type 0x04
45.62942| ComponentId errl (0x0100)
45.62942|User Data Section 2, type UD
45.62942| Subsection type 0x06
45.62942| ComponentId errl (0x0100)
45.62943| CALLOUT
45.62943| HW CALLOUT
45.62943| Reporting CPU ID: 15
45.62945| Called out entity:
45.62945|User Data Section 3, type UD
45.62945| Subsection type 0x33
45.62946| ComponentId unknown (0xe500)
45.62946|User Data Section 4, type UD
45.62946| Subsection type 0x01
45.62946| ComponentId unknown (0xe500)
45.62947| STRING
45.62947|
45.62947|User Data Section 5, type UD
45.62947| Subsection type 0x15
45.62947| ComponentId hb-trace (0x3100)
45.62947|User Data Section 6, type UD
45.62948| Subsection type 0x03
45.62948| ComponentId errl (0x0100)
45.62948|User Data Section 7, type UD
45.62948| Subsection type 0x01
45.62949| ComponentId errl (0x0100)
45.62949| STRING
45.62949| Hostboot Build ID: hostboot-2eb7706-740e8ce/hbicore.bin
45.62949|User Data Section 8, type UD
45.62949| Subsection type 0x04
45.62950| ComponentId errl (0x0100)
45.62950|================================================
52.30974|ISTEP 6. 7
53.96080|ISTEP 6. 8
54.00225|ISTEP 6. 9
57.60161|ISTEP 6.10
<snip>

------- Comment From <email address hidden> 2017-08-02 00:42 EDT-------
https://bugzilla.linux.ibm.com/show_bug.cgi?id=154870 has to be fixed before this bug can be validated.

Manoj Iyer (manjo)
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Andrew Cloke (andrew-cloke) wrote :

Moving to "Incomplete"; please set the bug status back when you are able to validate.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-01 02:01 EDT-------
The reported trace is no longer seen, but I see a different trace in dmesg:

root@ltc84-pkvm1:~# echo 10000 > /sys/kernel/debug/powerpc/eeh_max_freezes
root@ltc84-pkvm1:~# echo 1 > /sys/class/cxl/card0/perst_reloads_same_image
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~# lspci | grep acc
0001:01:00.0 Processing accelerators: IBM Device 0477 (rev 01)
0002:00:00.0 Processing accelerators: IBM Device 4350 (rev 0a)
root@ltc84-pkvm1:~# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~# uname -a
Linux ltc84-pkvm1 4.10.0-40-generic #44-Ubuntu SMP Thu Nov 9 14:48:23 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ltc84-pkvm1:~#

root@ltc84-pkvm1:~# dmesg
<snip>
[ 123.426172] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 123.573736] Ebtables v2.0 registered
[ 123.964678] virbr0: port 1(virbr0-nic) entered blocking state
[ 123.964682] virbr0: port 1(virbr0-nic) entered disabled state
[ 123.964870] device virbr0-nic entered promiscuous mode
[ 124.298173] virbr0: port 1(virbr0-nic) entered blocking state
[ 124.298176] virbr0: port 1(virbr0-nic) entered listening state
[ 124.372069] virbr0: port 1(virbr0-nic) entered disabled state
[ 171.671205] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 171.671211] Error detail: Unknown
[ 171.671214] HMER: 8040000000000000
[ 171.671218] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 171.671220] Error detail: Unknown
[ 171.671223] HMER: 8040000000000000
[ 171.671382] EEH: Fenced PHB#1 detected, location: N/A
[ 171.672512] EEH: This PCI device has failed 1 times in the last hour
[ 171.672513] EEH: Notify device drivers to shutdown
[ 171.672522] cxl afu0.0: Deactivating AFU directed mode
[ 171.672660] cxl afu0.0: PSL Purge called with link down, ignoring
[ 171.673304] EEH: Collect temporary log
[ 171.673306] PHB3 PHB#1 Diag-data (Version: 1)
[ 171.673307] brdgCtl: 0000ffff
[ 171.673309] UtlSts: 00200000 00000000 00000000
[ 171.673311] RootSts: ffffffff ffffffff ffffffff ffffffff 0000ffff
[ 171.673312] RootErrSts: ffffffff ffffffff ffffffff
[ 171.673313] RootErrLog: ffffffff ffffffff ffffffff ffffffff
[ 171.673314] RootErrLog1: ffffffff 0000000000000000 0000000000000000
[ 171.673316] nFir: 0000809000000000 0030006e00000000 0000800000000000
[ 171.673317] PhbSts: 0000001800000000 0000001800000000
[ 171.673318] Lem: 8000020000800000 40018e2400022482 8000000000000000
[ 171.673320] OutErr: 8000002000000000 8000000000000000 1210026000020003 0000400000000000
[ 171.673321] InBErr: 0000000040000000 0000000040000000 0000080000000000 000c104010010000
[ 171.673323] EEH: Reset without hotplug activity
[ 176.174078] EEH: Notify device drivers the completion of reset
[ 176.174089] cxl-pci 0001:01:00.0: enabling device (0140 -> 0142)
[ 176.174404] pci 0001:01 : [PE# 00] Switching PHB to CXL
[ 176.174505] pci 0001:01 : [PE# 00] Switching PHB to CXL
[ 176.186032] Adapter context unlocked with 0 active contexts
[ 176.186109] ------------[ cut here ]------------
[ 176.186120...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-01 03:46 EDT-------
(In reply to comment #16)
> The reported trace is no longer seen, but I see a different trace in dmesg:

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-01 04:00 EDT-------
I tried testing the given kernel (linux-image-4.10.0-26-generic_4.10.0-26.30~lp1694485_ppc64el.deb). Unfortunately, I am still hitting the issue mentioned in comment #12 (BZ156746 and BZ154870). That fix went into the 4.10-30 kernel, so validating this issue requires a new kernel rebuilt with both fixes.

Joseph Salisbury (jsalisbury) wrote :

I built a 4.10.0-40 based test kernel with a pick of commit:
ea9a26d cxl: Force context lock during EEH flow

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1694485/

Can you test this kernel and see if it resolves this bug?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-04 03:43 EDT-------
The reported issue is no longer seen with the given kernel.

root@ltc84-pkvm1:~# lspci | grep acc
0001:01:00.0 Processing accelerators: IBM Device 0477 (rev 01)
0002:00:00.0 Processing accelerators: IBM Device 4350 (rev 0a)
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~# echo 10000 > /sys/kernel/debug/powerpc/eeh_max_freezes
root@ltc84-pkvm1:~# echo 1 > /sys/class/cxl/card0/perst_reloads_same_image
root@ltc84-pkvm1:~# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~#
root@ltc84-pkvm1:~# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
root@ltc84-pkvm1:~# dpkg -l | grep linux-im
rc linux-image-4.10.0-26-generic 4.10.0-26.30~lp1694485 ppc64el Linux kernel image for version 4.10.0 on PowerPC 64el SMP
ii linux-image-4.10.0-40-generic 4.10.0-40.44~lp1694485 ppc64el Linux kernel image for version 4.10.0 on PowerPC 64el SMP
rc linux-image-extra-4.10.0-26-generic 4.10.0-26.30~lp1694485 ppc64el Linux kernel extra modules for version 4.10.0 on PowerPC 64el SMP
rc linux-image-extra-4.10.0-40-generic 4.10.0-40.44~lp1694485 ppc64el Linux kernel extra modules for version 4.10.0 on PowerPC 64el SMP
root@ltc84-pkvm1:~# uname -a
Linux ltc84-pkvm1 4.10.0-40-generic #44~lp1694485 SMP Sat Dec 2 20:43:42 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ltc84-pkvm1:~#

root@ltc84-pkvm1:~# dmesg
[ 115.720740] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 115.720747] EEH: Fenced PHB#1 detected, location: N/A
[ 115.721905] EEH: This PCI device has failed 1 times in the last hour
[ 115.721906] EEH: Notify device drivers to shutdown
[ 115.721916] cxl afu0.0: Deactivating AFU directed mode
[ 115.722170] cxl afu0.0: PSL Purge called with link down, ignoring
[ 115.722585] Error detail: Unknown
[ 115.722586] HMER: 8040000000000000
[ 115.722588] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 115.722588] Error detail: Unknown
[ 115.722589] HMER: 8040000000000000
[ 115.722682] EEH: Collect temporary log
[ 115.722684] PHB3 PHB#1 Diag-data (Version: 1)
[ 115.722686] brdgCtl: 0000ffff
[ 115.722687] UtlSts: 00200000 00000000 00000000
[ 115.722689] RootSts: ffffffff ffffffff ffffffff ffffffff 0000ffff
[ 115.722690] RootErrSts: ffffffff ffffffff ffffffff
[ 115.722691] RootErrLog: ffffffff ffffffff ffffffff ffffffff
[ 115.722693] RootErrLog1: ffffffff 0000000000000000 0000000000000000
[ 115.722694] nFir: 0000809000000000 0030006e00000000 0000800000000000
[ 115.722695] PhbSts: 0000001800000000 0000001800000000
[ 115.722697] Lem: 8000020000800000 40018e2400022482 8000000000000000
[ 115.722699] OutErr: 8000002000000000 8000000000000000 1210066000020003 0000c00000000000
[ 115.722700] InBErr: 0000000040000000 0000000040000000 0000080000000000 000c104010010000
[ 115.722702] EEH: Reset without hotplug activity
[ 120.232880] EEH: Notify device drivers the completion of reset
[ 120.232891] cxl-pci 0001:01:00.0: enabling dev...


Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in ubuntu-power-systems:
status: Incomplete → In Progress
Manoj Iyer (manjo) wrote :

Since the 4.10 kernel was replaced by the 4.13 linux-hwe kernel, would it make sense to resubmit these patches for 4.13 (Artful)? Would you (jsalisbury) be able to add a target series for Artful for tracking?

Stefan Bader (smb)
Changed in linux (Ubuntu Zesty):
status: In Progress → Won't Fix
Manoj Iyer (manjo)
tags: added: triage-a
removed: triage-g
Changed in linux (Ubuntu Artful):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
status: In Progress → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: In Progress → Fix Released
Frank Heimes (fheimes)
tags: added: triage-g
removed: triage-a
bugproxy (bugproxy)
tags: added: targetmilestone-inin1710
removed: targetmilestone-inin1704