[LTCTest][OPAL][OP930] Machine hangs after injecting the Machine Check Error

Bug #1848127 reported by bugproxy on 2019-10-15
This bug affects 1 person
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   Critical    Canonical Kernel Team
linux (Ubuntu)                     Critical    Manoj Iyer
  Bionic                           Critical    Manoj Iyer
  Disco                            Critical    Manoj Iyer
  Eoan                             Critical    Manoj Iyer

Bug Description

[IMPACT]
The MCE test renders the system unresponsive on P9 OpenPOWER hardware (Witherspoon).

[TEST]
A test kernel is available in ppa:ubuntu-power-triage/lp1848127. Please see the [OTHER] section for test details and comment #7 for results with the PPA kernel.

[FIX]
IBM has identified the following patch that fixes this issue:
commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <email address hidden>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

[REGRESSION POTENTIAL]
The patch applies only to the powerpc architecture and is limited in scope to MCE handling for huge pages. It does not touch any generic code. Any regression would be limited to powerpc MCE handling.

[OTHER]
== Comment: #0 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2019-05-07 23:31:20 ==
Install the latest OP930 firmware images, built from the upstream op-build git tree, on a P9 OpenPOWER machine.

root@witherspoon:~# cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.3"
VERSION_ID="ibm-v2.3-476-g2d622cb-r32-0-g9973ab0"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.3"
BUILD_ID="ibm-v2.3-476-g2d622cb-r32"
root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
 open-power-witherspoon-v2.3-rc2-58-g59fd0743
        buildroot-2019.02.2-17-g93b841d204
        skiboot-v6.3-rc2
        hostboot-19a436e
        occ-58e422d
        linux-5.0.9-openpower1-p3a4d5a4
        petitboot-v1.10.3
        machine-xml-a6f4df3
        hostboot-binaries-hw043019a.940
        capp-ucode-p9-dd2-v4
        sbe-249671d
        hcode-hw040319a.940

Then enable sw xstop manually using the following commands:

root@ltc-wspoon11:~# nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
root@ltc-wspoon11:~# nvram -p ibm,skiboot --print-config
"ibm,skiboot" Partition
--------------------------
experimental-fast-reset=1
snarf-mode=noooooo
opal-sw-xstop=enable
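
Before injecting the error it can help to confirm that the setting was actually stored. The following is a minimal sketch, assuming the `--print-config` output has the key=value shape shown above; `parse_nvram_config` is a hypothetical helper and not part of the nvram tool.

```python
def parse_nvram_config(text):
    """Collect key=value lines from `nvram --print-config` style output."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Header and separator lines contain no "=", so they are skipped.
        if sep and key:
            config[key] = value
    return config


# Sample copied from the transcript above.
sample = '''"ibm,skiboot" Partition
--------------------------
experimental-fast-reset=1
snarf-mode=noooooo
opal-sw-xstop=enable
'''
print(parse_nvram_config(sample).get("opal-sw-xstop"))  # prints enable
```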

Then, from the Linux host, inject the MCE UE error on the machine as follows:

root@ltc-wspoon11:~# ./probe_cpus.sh -L
CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs: 0 1 2 3
CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs: 4 5 6 7
CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs: 8 9 10 11
CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs: 12 13 14 15
CHIP ID: 0 CORE ID: 6 THREADS: 4 CPUs: 16 17 18 19
CHIP ID: 0 CORE ID: 7 THREADS: 4 CPUs: 20 21 22 23
CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs: 24 25 26 27
CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs: 28 29 30 31
CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs: 32 33 34 35
CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs: 36 37 38 39
CHIP ID: 0 CORE ID: 12 THREADS: 4 CPUs: 40 41 42 43
CHIP ID: 0 CORE ID: 13 THREADS: 4 CPUs: 44 45 46 47
CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs: 48 49 50 51
CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs: 52 53 54 55
CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs: 56 57 58 59
CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs: 60 61 62 63
CHIP ID: 0 CORE ID: 20 THREADS: 4 CPUs: 64 65 66 67
CHIP ID: 0 CORE ID: 21 THREADS: 4 CPUs: 68 69 70 71
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 72 73 74 75
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 76 77 78 79
CHIP ID: 8 CORE ID: 8 THREADS: 4 CPUs: 80 81 82 83
CHIP ID: 8 CORE ID: 9 THREADS: 4 CPUs: 84 85 86 87
CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs: 88 89 90 91
CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs: 92 93 94 95
CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs: 112 113 114 115
CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs: 116 117 118 119
CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs: 120 121 122 123
CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs: 124 125 126 127
CHIP ID: 8 CORE ID: 20 THREADS: 4 CPUs: 128 129 130 131
CHIP ID: 8 CORE ID: 21 THREADS: 4 CPUs: 132 133 134 135
CHIP ID: 8 CORE ID: 22 THREADS: 4 CPUs: 136 137 138 139
CHIP ID: 8 CORE ID: 23 THREADS: 4 CPUs: 140 141 142 143

-----------------------------
p[0]
   eq[0,1,2,3,4,5]
   ex[0,1,3,4,5,6,8,9,10]
    c[0,1,2,3,6,7,8,9,10,11,12,13,16,17,18,19,20,21]
p[8]
   eq[1,2,3,4,5]
   ex[3,4,5,6,7,8,9,10,11]
    c[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
-----------------------------

----------Processor Layout-------------------
p[0]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |EX-0 C0 | |EX-4 C8 | |EX-8 C16|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-0 C1 | |EX-4 C9 | |EX-8 C17|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1 C2 | |EX-5 C10| |EX-9 C18|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1 C3 | |EX-5 C11| |EX-9 C19|
        +-----------+ +-----------+ +-----------+

        +---EQ01----+ +---EQ03----+ +---EQ05----+
        | | |EX-6 C12| |EX-10 C20|
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-6 C13| |EX-10 C21|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3 C6 | | | | |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3 C7 | | | | |
        +-----------+ +-----------+ +-----------+

p[8]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        | | |EX-4 C8 | |EX-8 C16|
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-4 C9 | |EX-8 C17|
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-5 C10| |EX-9 C18|
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-5 C11| |EX-9 C19|
        +-----------+ +-----------+ +-----------+

        +---EQ01----+ +---EQ03----+ +---EQ05----+
        | | |EX-6 C12| |EX-10 C20|
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-6 C13| |EX-10 C21|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3 C6 | |EX-7 C14| |EX-11 C22|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3 C7 | |EX-7 C15| |EX-11 C23|
        +-----------+ +-----------+ +-----------+
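
The CHIP ID / CORE ID listing above is what ties a physical core to the logical CPUs the workload runs on. The following is a rough sketch of turning that listing into a lookup table, assuming the line format stays exactly as printed here; `parse_probe_cpus` is a hypothetical helper, and probe_cpus.sh itself is not shown in this report.

```python
import re

# Matches lines such as: CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
LINE_RE = re.compile(r"CHIP ID: (\d+) CORE ID: (\d+) THREADS: (\d+) CPUs: ([\d ]+)")


def parse_probe_cpus(text):
    """Map (chip_id, core_id) to the list of logical CPU numbers."""
    topology = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            chip, core = int(match.group(1)), int(match.group(2))
            topology[(chip, core)] = [int(cpu) for cpu in match.group(4).split()]
    return topology


# Two lines copied from the listing above.
sample = """\
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
"""
print(parse_probe_cpus(sample)[(8, 15)])  # prints [108, 109, 110, 111]
```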

root@ltc-wspoon11:~# ./statedisable.sh
./statedisable.sh: line 10: /sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: /sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory

root@ltc-wspoon11:~# cpupower idle-info
CPUidle driver: powernv_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 7
Available idle states: snooze stop0_lite stop0 stop1 stop2 stop4 stop5
snooze (DISABLED) :
Flags/Description: snooze
Latency: 0
Usage: 81861
Duration: 29748269
stop0_lite (DISABLED) :
Flags/Description: stop0_lite
Latency: 1
Usage: 70
Duration: 1982345
stop0 (DISABLED) :
Flags/Description: stop0
Latency: 2
Usage: 274
Duration: 125896
stop1 (DISABLED) :
Flags/Description: stop1
Latency: 5
Usage: 36
Duration: 4922
stop2 (DISABLED) :
Flags/Description: stop2
Latency: 10
Usage: 3745
Duration: 88300041
stop4 (DISABLED) :
Flags/Description: stop4
Latency: 100
Usage: 65
Duration: 1048951
stop5 (DISABLED) :
Flags/Description: stop5
Latency: 200
Usage: 30377
Duration: 61977191643
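
The two statedisable.sh errors are consistent with the cpupower output: only seven idle states (state0 through state6) exist on this machine, so state7 and state8 have no sysfs entries. The following is a minimal sketch of disabling only the states that are present, assuming the usual cpuidle sysfs layout; `disable_idle_states` and its `sysfs_root` parameter are hypothetical, and this is not the content of statedisable.sh.

```python
import glob
import os


def disable_idle_states(sysfs_root="/sys/devices/system/cpu"):
    """Write 1 to every cpuidle 'disable' file that actually exists.

    Globbing avoids hard-coded state numbers, so a machine with fewer
    idle states does not produce "No such file or directory" errors.
    Returns the list of files that were written.
    """
    pattern = os.path.join(
        sysfs_root, "cpu[0-9]*", "cpuidle", "state[0-9]*", "disable"
    )
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "w") as handle:
            handle.write("1")
    return paths
```

On a real system this needs root, and calling `disable_idle_states()` with no argument targets the live sysfs tree; passing a different `sysfs_root` allows a dry run against a copy of the directory layout.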

root@ltc-wspoon11:~# ./run_workload.sh

root@ltc-wspoon11:~# ./scom_addr_p9.sh 0x1001080c 15
EQ[ 3]: 0x1301080c
EX[ 7]: 0x13010c0c
 C[15]: 0x3f01080c
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/getscom -c 0x8 0x13010c0c
0000000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000
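
The scom_addr_p9.sh output above places core 15 in EX 7 and EQ 3, which matches the processor layout printed by probe_cpus.sh (two cores per EX, four cores per EQ). The sketch below reproduces only that index mapping, as inferred from the output in this report; it does not attempt the SCOM address translation itself, and `core_to_ex_eq` is a hypothetical helper.

```python
def core_to_ex_eq(core_id):
    """Return (ex, eq) for a POWER9 core index.

    Assumption taken from the layout diagram in this report:
    each EX holds two cores and each EQ holds four cores.
    """
    ex = core_id // 2
    eq = core_id // 4
    return ex, eq


print(core_to_ex_eq(15))  # prints (7, 3)
```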

After injecting the machine check error, the host Linux stops responding to ping and console access to the machine is lost.

However, the OpenBMC shell and GUI still show the host in the Running state.

== Comment: #1 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2019-05-07 23:33:31 ==
The machine is installed with the Ubuntu 18.04 Linux OS.

root@ltc-wspoon11:~# uname -a
Linux ltc-wspoon11 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:26:19 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon11:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@ltc-wspoon11:~# cat /proc/cpuinfo | tail
cpu : POWER9, altivec supported
clock : 2300.000000MHz
revision : 2.3 (pvr 004e 1203)

timebase : 512000000
platform : PowerNV
model : 8335-GTH
machine : PowerNV 8335-GTH
firmware : OPAL
MMU : Radix

root@ltc-wspoon11:~# lsmcode
Version of System Firmware :
 Product Name : OpenPOWER Firmware
 Product Version : witherspoon-v2.3-rc2-58-g59fd0743
 Product Extra : skiboot-v6.3-rc2
 Product Extra : bmc-firmware-version-2.03
 Product Extra : occ-58e422d
 Product Extra : hostboot-19a436e
 Product Extra : buildroot-2019.02.2-17-g93b841d204
 Product Extra : capp-ucode-p9-dd2-v4
 Product Extra : machine-xml-a6f4df3
 Product Extra : hostboot-binaries-hw043019a.940
 Product Extra : sbe-249671d
 Product Extra : hcode-hw040319a.940
 Product Extra : petitboot-v1.10.3
 Product Extra : linux-5.0.9-openpower1-p3a4d5a4

== Comment: #3 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2019-05-07 23:42:35 ==

I quickly tested MCE on an op930 build (IBM-witherspoon-ibm-OP9-v2.2-3.5) with 4.15.0-47-generic and found no hang. On further investigation, the hang is seen with kernel version 4.15.0-48-generic and above. It looks like changes that went into 4.15.0-48-generic are causing the hang. Still investigating.

== Comment: #9 - Application Cdeadmin <email address hidden> - 2019-05-22 06:45:07 ==
==== State: Working by: jayeshp on 22 May 2019 06:37:27 ====

Any update?

== Comment: #11 - MAHESH J. SALGAONKAR <email address hidden> - 2019-09-19 04:44:01 ==
The hang should go away with the patch below.

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <email address hidden>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

    The current code would fail on huge pages addresses, since the shift would
    be incorrect. Use the correct page shift value returned by
    __find_linux_pte() to get the correct physical address. The code is more
    generic and can handle both regular and compound pages.

    Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
    Signed-off-by: Balbir Singh <email address hidden>
    [<email address hidden>: Fixup pseries_do_memory_failure()]
    Signed-off-by: Reza Arbab <email address hidden>
    Tested-by: Mahesh Salgaonkar <email address hidden>
    Signed-off-by: Santosh Sivaraj <email address hidden>
    Cc: <email address hidden> # v4.15+
    Signed-off-by: Michael Ellerman <email address hidden>
    Link: https://<email address hidden>
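
The commit message says the page shift returned by the page table walk must be used to get the correct physical address. The following is an illustrative sketch of that arithmetic only, not the kernel code: `addr_to_pfn` here is a hypothetical Python function, and PAGE_SHIFT = 16 (64 KiB base pages) is an assumption. It shows why the shift matters: for a huge page, the page table entry holds the frame number of the start of the huge page, so the offset of the faulting address inside the huge page has to be folded in to find the base page that was actually hit.

```python
PAGE_SHIFT = 16  # assumption: 64 KiB base pages
PAGE_SIZE = 1 << PAGE_SHIFT


def addr_to_pfn(base_pfn, addr, shift):
    """Return the PFN of the base-size page that contains addr.

    base_pfn: frame number stored in the (possibly huge) page table entry
    addr:     effective address that took the error
    shift:    log2 of the mapping size reported by the page table walk
    """
    if shift <= PAGE_SHIFT:
        # Regular page: the entry already names the right frame.
        return base_pfn
    # Huge page: keep the address bits that select a base page
    # inside the huge page and merge them into the frame number.
    rpn_mask = (1 << shift) - PAGE_SIZE
    return base_pfn | ((addr & rpn_mask) >> PAGE_SHIFT)


# 2 MiB huge page (shift 21) whose first base page is frame 0x1000.
print(hex(addr_to_pfn(0x1000, 0x7FFF00123456, 21)))  # prints 0x1012
# Same entry treated as a regular page: the offset is ignored.
print(hex(addr_to_pfn(0x1000, 0x7FFF00123456, PAGE_SHIFT)))  # prints 0x1000
```

Ignoring the shift would report frame 0x1000 for every address inside the huge page, which is the kind of mismatch the commit message describes.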

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-177462 severity-critical targetmilestone-inin18042

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Frank Heimes (fheimes) on 2019-10-15
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Frank Heimes (fheimes) wrote :

Since the mentioned patch "powerpc/mce: Fix MCE handling for huge pages" landed upstream in kernel v5.4-rc1, I think it needs to be applied to Eoan, Disco and Bionic.

Manoj Iyer (manjo) on 2019-10-15
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
Manoj Iyer (manjo) on 2019-10-15
Changed in linux (Ubuntu Eoan):
assignee: Canonical Kernel Team (canonical-kernel-team) → Manoj Iyer (manjo)
Manoj Iyer (manjo) wrote :

Please test with the eoan kernel in this PPA https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127/ and report back here. After you have successfully verified this kernel I will submit the SRU for Eoan.

------- Comment From <email address hidden> 2019-10-16 02:17 EDT-------
I have installed the eoan kernel in this PPA https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127/+packages and tested again.

root@ltc-wspoon11:~# sudo add-apt-repository ppa:ubuntu-power-triage/lp1848127

More info: https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127
Press [ENTER] to continue or Ctrl-c to cancel adding it.

Get:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Ign:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Hit:4 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:6 http://us.ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Ign:7 http://ddebs.ubuntu.com bionic InRelease
Ign:8 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic InRelease
Hit:9 http://us.ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Ign:10 http://ddebs.ubuntu.com bionic-updates InRelease
Err:11 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic Release
404 Not Found [IP: 91.189.95.83 80]
Ign:12 http://ddebs.ubuntu.com bionic-proposed InRelease
Hit:13 http://ddebs.ubuntu.com bionic Release
Hit:15 http://ddebs.ubuntu.com bionic-updates Release
Hit:17 http://ddebs.ubuntu.com bionic-proposed Release
Reading package lists... Done
E: The repository 'http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

root@ltc-wspoon11:~# sudo apt-get update
Get:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Ign:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Hit:4 http://us.ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Ign:6 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic InRelease
Get:7 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease [88.7 kB]
Hit:8 http://us.ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Ign:9 http://ddebs.ubuntu.com bionic InRelease
Err:10 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic Release
404 Not Found [IP: 91.189.95.83 80]
Ign:11 http://ddebs.ubuntu.com bionic-updates InRelease
Ign:12 http://ddebs.ubuntu.com bionic-proposed InRelease
Hit:13 http://ddebs.ubuntu.com bionic Release
Hit:15 http://ddebs.ubuntu.com bionic-updates Release
Hit:17 http://ddebs.ubuntu.com bionic-proposed Release
Reading package lists... Done
E: The repository 'http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by defa...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-10-16 02:27 EDT-------
Then executed the test again with the eoan kernel installed on the system.

root@ltc-wspoon11:~# uname -a
Linux ltc-wspoon11 5.3.0-18-generic #19~lp1848127+build.2-Ubuntu SMP Tue Oct 15 22:29:26 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

root@ltc-wspoon11:~# ./probe_cpus.sh -L
CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs: 0 1 2 3
CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs: 4 5 6 7
CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs: 8 9 10 11
CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs: 12 13 14 15
CHIP ID: 0 CORE ID: 6 THREADS: 4 CPUs: 16 17 18 19
CHIP ID: 0 CORE ID: 7 THREADS: 4 CPUs: 20 21 22 23
CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs: 24 25 26 27
CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs: 28 29 30 31
CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs: 32 33 34 35
CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs: 36 37 38 39
CHIP ID: 0 CORE ID: 12 THREADS: 4 CPUs: 40 41 42 43
CHIP ID: 0 CORE ID: 13 THREADS: 4 CPUs: 44 45 46 47
CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs: 48 49 50 51
CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs: 52 53 54 55
CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs: 56 57 58 59
CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs: 60 61 62 63
CHIP ID: 0 CORE ID: 20 THREADS: 4 CPUs: 64 65 66 67
CHIP ID: 0 CORE ID: 21 THREADS: 4 CPUs: 68 69 70 71
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 72 73 74 75
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 76 77 78 79
CHIP ID: 8 CORE ID: 8 THREADS: 4 CPUs: 80 81 82 83
CHIP ID: 8 CORE ID: 9 THREADS: 4 CPUs: 84 85 86 87
CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs: 88 89 90 91
CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs: 92 93 94 95
CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs: 112 113 114 115
CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs: 116 117 118 119
CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs: 120 121 122 123
CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs: 124 125 126 127
CHIP ID: 8 CORE ID: 20 THREADS: 4 CPUs: 128 129 130 131
CHIP ID: 8 CORE ID: 21 THREADS: 4 CPUs: 132 133 134 135
CHIP ID: 8 CORE ID: 22 THREADS: 4 CPUs: 136 137 138 139
CHIP ID: 8 CORE ID: 23 THREADS: 4 CPUs: 140 141 142 143

-----------------------------
p[0]
eq[0,1,2,3,4,5]
ex[0,1,3,4,5,6,8,9,10]
c[0,1,2,3,6,7,8,9,10,11,12,13,16,17,18,19,20,21]
p[8]
eq[1,2,3,4,5]
ex[3,4,5,6,7,8,9,10,11]
c[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
-----------------------------

----------Processor Layout-------------------
p[0]
+---EQ00----+ +---EQ02----+ +---EQ04----+
|EX-0 C0 | |EX-4 C8 | |EX-8 C16|
+ - - - - - + + - - - - - + + - - - - - +
|EX-0 C1 | |EX-4 C9 | |EX-8 C17|
+ - - - - - + + - - - - - + + - - - - - +
|EX-1 C2 | |EX-5 C10| |EX-9 C18|
+ - - - - - + + - - - - - + + - - - - - +
|EX-1 C3 | |EX-5 C11| |EX-9 C19|
+-----------+ +-----------+ +-----------+

+---EQ01----+ +---EQ03----+ +---EQ05----+
| | |EX-6 C12| |EX-10 C20|
+ - - - - - + + - - - - - + + - - - - - +
| | |EX-6 C13| |EX-10 C21|
+ - - - - - + + - -...

Manoj Iyer (manjo) on 2019-10-16
description: updated
Changed in linux (Ubuntu Eoan):
status: New → In Progress

This patch has already been applied to Eoan for the following
upstream stable update:

Eoan update: v5.3.6 upstream stable release
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1848039

Thanks,
Kleber

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-10-18 12:22 EDT-------
== Comment: #9 - Application Cdeadmin <email address hidden> - 2019-05-22 06:45:07 ==
==== State: Working by: cde00 on 18 October 2019 11:22:30 ====

Manoj Iyer (manjo) on 2019-10-21
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Andrew Cloke (andrew-cloke) wrote :

Next steps: wait for the patch to land in eoan-updates, and then propose the SRU for Bionic and Disco.

Manoj Iyer (manjo) wrote :

IBM,

Please test the Bionic *and* Disco kernels available in PPA: https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127

Post the test output from *both* the kernels to this report. Once I get positive test results I can SRU the patches.

Changed in linux (Ubuntu Disco):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Changed in linux (Ubuntu Bionic):
assignee: nobody → Manoj Iyer (manjo)
Changed in linux (Ubuntu Disco):
assignee: nobody → Manoj Iyer (manjo)
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
Changed in linux (Ubuntu Disco):
importance: Undecided → Critical
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-11-06 01:01 EDT-------
I have installed both the Bionic *and* Disco kernels available in PPA: https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127

Then executed the MCE UE tests again on the machine with both the kernels.

root@ltc-wspoon4:~# apt-get install linux-image-unsigned-5.0.0-33-generic/disco
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '5.0.0-33.35~lp1848127+build.1' (lp1848127:19.04/disco [ppc64el]) for 'linux-image-unsigned-5.0.0-33-generic'
The following additional packages will be installed:
linux-modules-5.0.0-33-generic
Suggested packages:
fdutils linux-doc-5.0.0 | linux-source-5.0.0 linux-headers-5.0.0-33-generic
The following NEW packages will be installed:
linux-image-unsigned-5.0.0-33-generic linux-modules-5.0.0-33-generic
0 upgraded, 2 newly installed, 0 to remove and 3 not upgraded.
Need to get 20.7 MB of archives.
After this operation, 106 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu disco/main ppc64el linux-modules-5.0.0-33-generic ppc64el 5.0.0-33.35~lp1848127+build.1 [14.0 MB]
Get:2 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu disco/main ppc64el linux-image-unsigned-5.0.0-33-generic ppc64el 5.0.0-33.35~lp1848127+build.1 [6,748 kB]
Fetched 20.7 MB in 13s (1,546 kB/s)
Selecting previously unselected package linux-modules-5.0.0-33-generic.
(Reading database ... 71699 files and directories currently installed.)
Preparing to unpack .../linux-modules-5.0.0-33-generic_5.0.0-33.35~lp1848127+build.1_ppc64el.deb ...
Unpacking linux-modules-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
Selecting previously unselected package linux-image-unsigned-5.0.0-33-generic.
Preparing to unpack .../linux-image-unsigned-5.0.0-33-generic_5.0.0-33.35~lp1848127+build.1_ppc64el.deb ...
Unpacking linux-image-unsigned-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
Setting up linux-modules-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
Setting up linux-image-unsigned-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
I: /boot/vmlinux is now a symlink to vmlinux-5.0.0-33-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.0.0-33-generic
Processing triggers for linux-image-unsigned-5.0.0-33-generic (5.0.0-33.35~lp1848127+build.1) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.0.0-33-generic
cryptsetup: WARNING: The initramfs image may not contain cryptsetup binaries
nor crypto modules. If that's on purpose, you may want to uninstall the
'cryptsetup-initramfs' package in order to disable the cryptsetup initramfs
integration and avoid this warning.
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinux-5.0.0-33-generic
Found initrd image: /boot/initrd.img-5.0.0-33-generic
Found linux image: /boot/vmlinux-5.0.0-32-generic
Found ini...

Changed in ubuntu-power-systems:
status: Incomplete → In Progress
Changed in linux (Ubuntu Disco):
status: Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
status: Incomplete → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-11-06 01:44 EDT-------
root@ltc-wspoon11:~# add-apt-repository ppa:ubuntu-power-triage/lp1848127

More info: https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127
Press [ENTER] to continue or Ctrl-c to cancel adding it.

Get:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Ign:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Hit:4 http://us.ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Hit:6 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Ign:7 http://ddebs.ubuntu.com bionic InRelease
Hit:8 http://us.ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Ign:9 http://ddebs.ubuntu.com bionic-updates InRelease
Hit:10 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic InRelease
Ign:11 http://ddebs.ubuntu.com bionic-proposed InRelease
Hit:12 http://ddebs.ubuntu.com bionic Release
Hit:14 http://ddebs.ubuntu.com bionic-updates Release
Hit:16 http://ddebs.ubuntu.com bionic-proposed Release
Reading package lists... Done
root@ltc-wspoon11:~# apt-get update
Get:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Ign:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67 InRelease
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67 Release [574 B]
Hit:4 http://us.ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Ign:6 http://ddebs.ubuntu.com bionic InRelease
Hit:7 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Hit:8 http://us.ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Hit:9 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic InRelease
Ign:10 http://ddebs.ubuntu.com bionic-updates InRelease
Ign:11 http://ddebs.ubuntu.com bionic-proposed InRelease
Hit:12 http://ddebs.ubuntu.com bionic Release
Hit:14 http://ddebs.ubuntu.com bionic-updates Release
Hit:16 http://ddebs.ubuntu.com bionic-proposed Release
Reading package lists... Done

root@ltc-wspoon11:~# apt-get install linux-image-unsigned-4.15.0-68-generic/bionic
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '4.15.0-68.77~lp1848127+build.1' (lp1848127:18.04/bionic [ppc64el]) for 'linux-image-unsigned-4.15.0-68-generic'
The following additional packages will be installed:
linux-modules-4.15.0-68-generic
Suggested packages:
fdutils linux-doc-4.15.0 | linux-source-4.15.0 linux-headers-4.15.0-68-generic
The following NEW packages will be installed:
linux-image-unsigned-4.15.0-68-generic linux-modules-4.15.0-68-generic
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 18.6 MB of archives.
After this operation, 92.8 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic/main ppc64el linux-modules-4.15.0-68-generic ppc64el 4.15...

Manoj Iyer (manjo) wrote :
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Manoj Iyer (manjo) on 2019-12-10
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released