Ubuntu
linux package

Crash@pcibios_set_pcie_reset_state+0x118/0x280 in capiredp01 with latest level - 160823-GA3-FlashGT

Bug #1645826 reported by bugproxy on 2016-11-29

This bug affects 1 person

Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Xenial           Fix Released   Undecided    Tim Gardner

Bug Description

== Comment: #26 - Andrew Donnellan - 2016-11-24 19:55:52 ==
Ubuntu kernel team, please apply the following fixup to the Xenial kernel tree.

--------------------------------------------------------------

From 631804b1548b035cada4b2c14ab708310a8aa607 Mon Sep 17 00:00:00 2001
From: Gavin Shan <email address hidden>
Date: Mon, 12 Sep 2016 10:50:16 +1000
Subject: [PATCH] powerpc/eeh: Remove EEH_PE_PRI_BUS in full hotplug recovery

commit 59ae8c6d5b45 ("powerpc/eeh: Fix invalid cached PE primary
bus") was wrongly backporting upstream commit a3aa256b7258: It
should clear the PE's flag (EEH_PE_PRI_BUS) in full hotplug instead
of partial hotplug scenario.

This fixes the issue by clearing EEH_PE_PRI_BUS in full hotplug
scenario only.

Fixes: 59ae8c6d5b45 ("powerpc/eeh: Fix invalid cached PE primary bus")
Signed-off-by: Gavin Shan <email address hidden>
---
arch/powerpc/kernel/eeh_driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index c453b53..829ab8e 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -630,13 +630,13 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		 * rebuilt when adding PCI devices.
 		 */
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
+		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
 		pcibios_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
 		pcibios_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
--
2.1.0
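
For readers who do not have eeh_driver.c open, the effect of the one-line move in the patch above can be modelled in a few lines of standalone C. This is only a sketch: the struct, the enum, the flag bit values and the demo_* function names are all hypothetical stand-ins, and the condition guarding the first (full hotplug) branch is an assumption, since the hunk does not show it. Only the placement of the EEH_PE_PRI_BUS clear is taken from the diff.

```c
/* Illustrative flag bits only; NOT the kernel's real EEH_PE_* values. */
#define DEMO_PE_KEEP    (1u << 0)
#define DEMO_PE_PRI_BUS (1u << 1)

/* Hypothetical stand-in for struct eeh_pe: just the state bitmask. */
struct demo_pe {
	unsigned int state;
};

/* Which recovery branch is taken (assumed naming, not from the kernel). */
enum demo_hotplug {
	DEMO_HOTPLUG_NONE,
	DEMO_HOTPLUG_FULL,
	DEMO_HOTPLUG_PARTIAL
};

/* Flag handling as in the faulty backport: PRI_BUS cleared on partial hotplug. */
static void demo_reset_flags_unpatched(struct demo_pe *pe, enum demo_hotplug kind)
{
	if (kind == DEMO_HOTPLUG_FULL) {
		/* cached primary bus flag is (wrongly) left set here */
	} else if (kind == DEMO_HOTPLUG_PARTIAL) {
		pe->state &= ~DEMO_PE_PRI_BUS;
	}
	pe->state &= ~DEMO_PE_KEEP;	/* cleared in every case */
}

/* Flag handling after the fixup: PRI_BUS cleared on full hotplug only. */
static void demo_reset_flags_patched(struct demo_pe *pe, enum demo_hotplug kind)
{
	if (kind == DEMO_HOTPLUG_FULL) {
		pe->state &= ~DEMO_PE_PRI_BUS;
	} else if (kind == DEMO_HOTPLUG_PARTIAL) {
		/* cached primary bus flag is left untouched */
	}
	pe->state &= ~DEMO_PE_KEEP;	/* cleared in every case */
}
```

Starting from a PE with both flags set, the patched variant leaves no flags set after a full hotplug and leaves only DEMO_PE_PRI_BUS set after a partial one. The unpatched variant does the opposite for the primary-bus flag, which is the behaviour the commit message describes as a wrong backport.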

Historical context:
==== State: Open by: ukrishn on 08 September 2016 18:15:32 ====

This seems to be easily recreatable. Mike Vageline just hit the issue by doing a couple of PERSTs on a FlashGT card.

Here is his note:
I had downloaded 0908, then perst, modprobe'd to verify 0908, then rmmod, then perst to factory, modprobe'd, verified it was 0903, rmmod, then perst again to user... xmon

p8tul12-lp1 login: [ 647.501340] Fatal Hypervisor Maintenance interrupt [Recovered]
[ 647.501348] EEH: Fenced PHB#2 detected, location: N/A
[ 647.501528] Error detail: Malfunction Alert
[ 647.501590] HMER: 8040000000000000
[ 647.501637] Unknown Core check stop.
[ 647.502584] Fatal Hypervisor Maintenance interrupt [Recovered]
[ 647.502588] Error detail: Malfunction Alert
[ 647.502590] HMER: 8040000000000000
[ 647.502591] Unknown Core check stop.
[ 665.369299] PCI: Memory resource 0 not set for host bridge /pciex@3fffe40400000/pci@0/device@0 (domain 5)
[ 676.293638] Back level AFU, please upgrade. AFU version 160903N0 interface version 0xffffffffffffffff
[ 676.293842] cxlflash 0005:00:00.0: cxlflash_probe: call to init_afu failed rc=-22!
[ 704.863543] Unable to handle kernel paging request for data at address 0x00000110
[ 704.863673] Faulting instruction address: 0xc000000000083e08
cpu 0x2: Vector: 300 (Data Access) at [c000000f01cbf7d0]
    pc: c000000000083e08: pnv_eeh_reset+0x68/0x170
    lr: c000000000083df8: pnv_eeh_reset+0x58/0x170
    sp: c000000f01cbfa50
   msr: 9000000000009033
   dar: 110
dsisr: 40000000
  current = 0xc000000f014bc8e0
  paca = 0xc000000007b41300 softe: 0 irq_happened: 0x01
    pid = 10688, comm = sh
enter ? for help
[c000000f01cbfad0] c000000000038bb8 pcibios_set_pcie_reset_state+0x118/0x280
[c000000f01cbfb50] c0000000005e9450 pci_set_pcie_reset_state+0x30/0x50
[c000000f01cbfb80] d000000007c9f7bc cxl_pci_reset+0x5c/0xc0 [cxl]
[c000000f01cbfbf0] d000000007c992a4 reset_adapter_store+0x84/0x120 [cxl]
[c000000f01cbfc80] c0000000006d2378 dev_attr_store+0x68/0xa0
[c000000f01cbfcc0] c000000000398290 sysfs_kf_write+0x80/0xb0
[c000000f01cbfd00] c0000000003971a8 kernfs_fop_write+0x188/0x200
[c000000f01cbfd50] c0000000002e1a6c __vfs_write+0x6c/0xe0
[c000000f01cbfd90] c0000000002e27a0 vfs_write+0xc0/0x230
[c000000f01cbfde0] c0000000002e37dc SyS_write+0x6c/0x110
[c000000f01cbfe30] c000000000009204 system_call+0x38/0xb4
--- Exception: c01 (System Call) at 00003fff9c610eb8
SP (3fffdeaa0480) is in userspace
2:mon>

==== State: Open by: ukrishn on 09 September 2016 13:11:49 ====

2:mon> e
cpu 0x2: Vector: 300 (Data Access) at [c000000f01cbf7d0]
    pc: c000000000083e08: pnv_eeh_reset+0x68/0x170
    lr: c000000000083df8: pnv_eeh_reset+0x58/0x170
    sp: c000000f01cbfa50
   msr: 9000000000009033
   dar: 110
dsisr: 40000000
  current = 0xc000000f014bc8e0
  paca = 0xc000000007b41300 softe: 0 irq_happened: 0x01
    pid = 10688, comm = sh
2:mon>

c000000000083df4 4bfb6f25 bl c00000000003ad18 # eeh_pe_bus_get+0x8/0xe0
c000000000083df8 60000000 nop
c000000000083dfc e9230010 ld r9,16(r3)
c000000000083e00 2fa90000 cmpdi cr7,r9,0
c000000000083e04 419e00dc beq cr7,c000000000083ee0 # pnv_eeh_reset+0x140/0x170
c000000000083e08 e9290010 ld r9,16(r9)

R03 = c0000007f7db4800
R09 = 0000000000000100

2:mon> d c0000007f7db4800
c0000007f7db4800 00f8dbf7070000c0 0000000000000000 |................|
c0000007f7db4810 0001000000000000 <<<<< This should have either been a null
or a valid parent pointer.
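
The arithmetic behind the dump above can be checked with a small standalone C sketch. It assumes that the xmon "d" output lists bytes in memory order and that the kernel is little-endian (the bug is tagged architecture-ppc64le). The demo_* names are hypothetical helpers, not kernel functions. Under those assumptions the doubleword at R03+0x10 decodes to 0x100, which matches R09, and the faulting "ld r9,16(r9)" then touches 0x100 + 16 = 0x110, which matches the DAR.

```c
#include <stdint.h>

/* Bytes copied from the xmon dump at c0000007f7db4800 and ...4810. */
static const uint8_t demo_dump_00[8] = {
	0x00, 0xf8, 0xdb, 0xf7, 0x07, 0x00, 0x00, 0xc0
};
static const uint8_t demo_dump_10[8] = {
	0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Assemble a 64-bit value from 8 bytes stored least-significant byte first. */
static uint64_t demo_le64(const uint8_t b[8])
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Effective address of a "ld rX,off(rBase)" instruction. */
static uint64_t demo_ld_address(uint64_t base, int16_t off)
{
	return base + (uint64_t)(int64_t)off;
}
```

With these helpers, demo_le64(demo_dump_10) gives 0x100 and demo_ld_address(0x100, 16) gives 0x110. The first doubleword decodes to 0xc0000007f7dbf800, which looks like a normal kernel pointer, so only the field at offset 16 holds a value that is neither null nor a plausible pointer.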

As Andrew suspected, this could be memory corruption, and the problem seems to
be easily recreatable on the Ubuntu 4.4.0-36 Xenial kernel. So far, the scenario has
been repeated PERST with unload and reload of the cxlflash driver:

1. unload cxlflash
2. PERST
3. modprobe cxlflash

The problem seems to be hit when the above 3 steps are repeated, especially
after a new AFU image install.

bugproxy (bugproxy) wrote on 2016-11-29: Patch v2

Patch v2 (1.5 KiB, text/plain)

Default Comment by Bridge

tags:	added: architecture-ppc64le bugnameltc-145701 severity-critical targetmilestone-inin1604
Changed in ubuntu:
assignee:	nobody → Taco Screen team (taco-screen-team)
affects:	ubuntu → linux (Ubuntu)

Tim Gardner (timg-tpi) wrote on 2016-11-29:

https://lists.ubuntu.com/archives/kernel-team/2016-November/081180.html

Changed in linux (Ubuntu Xenial):
assignee:	nobody → Tim Gardner (timg-tpi)
status:	New → In Progress
Changed in linux (Ubuntu):
assignee:	Taco Screen team (taco-screen-team) → nobody
status:	New → Fix Released

bugproxy (bugproxy) wrote on 2016-11-30: Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-29 23:36 EDT-------
Historical context:
==== State: Assigned by: cde00 on 29 November 2016 22:36:16 ====

Luis Henriques (henrix) on 2016-12-08

Changed in linux (Ubuntu Xenial):
status:	In Progress → Fix Committed

Luis Henriques (henrix) on 2017-01-11

Changed in linux (Ubuntu Xenial):
status:	Fix Committed → Fix Released

bugproxy (bugproxy) wrote on 2017-05-09:

------- Comment From <email address hidden> 2017-05-09 16:39 EDT-------
This CMVC defect is being cancelled by the CDE Bridge because the corresponding CQ Defect [SW364404] was transferred out of the bridge domain.
Here are the additional details:
New Subsystem = ppc_triage
New Release = unspecified
New Component = ubuntu_linux
New OwnerInfo = Chavez, Luciano (<email address hidden>)
To continue tracking this issue, please follow CQ defect [SW364404].

*** This bug has been marked as a duplicate of bug 134013 ***
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-lts-xenial/linux-lts-xenial_4.4.0-34.53~14.04.1/changelog

The Ubuntu 4.4.0-35-generic changelog shows the following entries about the fix for this bug,
but the reporter still hit this bug on 16.04.1.

Hi Andrew,

Can you please take a look?

-----------------------------------------

[ Tim Gardner ]

* Release Tracking Bug
- LP: #1546283

* Naples/Zen, NTB Driver (LP: #1542071)
- [Config] CONFIG_NTB_AMD=m
- NTB: Add support for AMD PCI-Express Non-Transparent Bridge

* [Hyper-V] kernel panic occurs when installing Ubuntu Server x32 (LP: #1495983)
- SAUCE: storvsc: use small sg_tablesize on x86

* Enable arm64 emulation of removed ARMv7 instructions (LP: #1545542)
- [Config] CONFIG_ARMV8_DEPRECATED=y

* Surelock-GA2:kernel panic/ exception @ pcibios_set_pcie_reset_state+0x118/0x280 + cxl_reset+0x5c/0xc0 (LP: #1545037)
- powerpc/eeh: Fix stale cached primary bus
----------------------------------

#=#=# 2016-08-30 02:56:17 (CDT) #=#=#
New OwnerInfo = [<email address hidden>]
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#
1) Has the original issue here been resolved? If so, then please retitle the bug to reflect the current issue at hand (and in future open a separate defect). If not - then please reassign back to Surelock team and open a separate defect or reopen 134013.

2) I can't access the logs on 9.114.84.102, connection refused. Please either give me access instructions or put the logs somewhere more accessible.

I have just checked 9.114.84.102 and it is up; the connection may have failed at firewall authentication.

But I have copied the logs to GSA:

[ pradghos @ aixbase @ /gsa/dubgsa/home/p/r/pradghos/web/public/SW364404 ]
$ls -l
total 5492
-rwxrwxrwx 1 pradghos pradghos 2726194 Aug 31 05:35 cxlffdc.05_43_29_Aug_29_2016.tgz
-rwxrwxrwx 1 pradghos pradghos 81946 Aug 31 05:35 kern.buff.log
[ pradghos @ aixbase @ /gsa/dubgsa/home/p/r/pradghos/web/public/SW364404 ]
$

After enabling XMON, we are seeing the crash, but we are not sure whether it is the same problem as the earlier hang.
However, we were doing much the same thing in both cases: updating the AFU image if it was not up to date and reloading it (PERST).

I would probably change the abstract to indicate the current crash for now.

Thanks!
Pradipta

#=#=# 2016-08-31 06:05:14 (CDT) #=#=#
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

Another test, Cflash_FVT_Suite.E_CAPI_LINK_DOWN, is where I have observed this crash in the latest GT build.
Adding Russell Currey to the Cc list - there's a decent chance this is a bug in generic EEH code if it's anything like the last time we saw this signature.

Pradipta, can you retry this defect with the latest AFU image from Todd ?

I am planning to retry the test on September 9, 2016 India local time.

Thanks!
Pradipta
I suspect this bug is a pure kernel bug which probably isn't influenced by the AFU.

How reproducible is this? These memory corruption issues are very difficult to track down and we need to prioritise accordingly.

Andrew,

#=#=# 2016-09-08 18:15:54 (CDT) #=#=#
subscribe - Vageline, Michael p. (mpvageli@us.ibm.com)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

Thanks Uma and Mike V for the recreate. Do I still need to attempt another recreate?
Right now I have hit another issue, defect#SW359608, on cougarp01 and am waiting for capiredp01 to become available.

Thanks!
Pradipta

2:mon> e
cpu 0x2: Vector: 300 (Data Access) at [c000000f01cbf7d0]
pc: c000000000083e08: pnv_eeh_reset+0x68/0x170
lr: c000000000083df8: pnv_eeh_reset+0x58/0x170
sp: c000000f01cbfa50
msr: 9000000000009033
dar: 110
dsisr: 40000000
current = 0xc000000f014bc8e0
paca    = 0xc000000007b41300   softe: 0        irq_happened: 0x01
pid   = 10688, comm = sh
2:mon>

c000000000083df4  4bfb6f25      bl      c00000000003ad18        # eeh_pe_bus_get+0x8/0xe0
c000000000083df8  60000000      nop
c000000000083dfc  e9230010      ld      r9,16(r3)
c000000000083e00  2fa90000      cmpdi   cr7,r9,0
c000000000083e04  419e00dc      beq     cr7,c000000000083ee0    # pnv_eeh_reset+0x140/0x170
c000000000083e08  e9290010      ld      r9,16(r9)

R03 = c0000007f7db4800
R09 = 0000000000000100

2:mon> d c0000007f7db4800
c0000007f7db4800 00f8dbf7070000c0 0000000000000000  |................|

I was looking at the Ubuntu changelog for the recent kernels. This specific fix seems related to this problem.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1603449

This went into 4.4.0-35.54, which is the kernel where we started seeing this crash. I'm wondering if this patch exposes a new timing issue. I would like Gavin to look at this problem, so I will add him to the subscribers.

#=#=# 2016-09-09 15:19:07 (CDT) #=#=#
subscribe - Shan, Guo wen (gwshan@au1.ibm.com)
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

From Gavin (patch attached to bug as 112119)

-----------------------------------------------------

Hi Uma, it seems commit bf513d68cfbe ("powerpc/eeh: Fix invalid cached PE primary bus") didn't correctly backport
upstream commit a3aa256b7258 ("powerpc/eeh: Fix invalid cached PE primary bus"). I think we have two
options: (A) revert commit bf513d68cfbe and provide a correct backport; (B) add an additional fix
to sort it out.

The attached fix is (B). Please have a try and let me know the result. Please share the machine access
info if it doesn't help.

Thanks,
Gavin

Uma, any updates on this?

Andrew,

I was not able to reliably recreate the crash with or without the fix that Gavin provided, so I asked him to go ahead with the backport request for his corrected patch to Ubuntu. I have not heard back from him. If the patch had issues, I would recommend not waiting for a recreate from our end. Please get the corrected patch pulled into Ubuntu.

Thanks !

The attached patch, which fixes the issue introduced by the backport of upstream commit a3aa256b725 ("powerpc/eeh: Fix invalid cached PE primary bus"), is not in the latest ubuntu-xenial kernel. Who can help push the fix to the ubuntu-xenial kernel?

Created mirror request (28119) Canonical Launchpad.

Information on this bug will potentially be exposed to the public. Before you proceed, please make sure you read Content Guidelines for LTC Bugzilla : Confidential vs. Non-confidential[1].

[1] - ftp://ausgsa.ibm.com/projects/l/ltc/ToolsInfrastructure/ProjectStatus/Bugzilla/Bugzilla_Content_Education_v2.pdf

Requesting mirroring to Canonical Launchpad to apply backport fix patch

The bug is ready to be mirrored to:

Distro:    Canonical Launchpad.
Project:   ubuntu
Package:   linux


bugproxy (bugproxy) wrote on 2019-12-03:

------- Comment From <email address hidden> 2019-12-03 02:35 EDT-------
<Note by aixsuper, 2019/12/03 01:22:52 seq: 16 rel: 0 action: modify>
Changing originator of Defect from pradghos to krupa
