Activity log for bug #1848127

Date Who What changed Old value New value Message
2019-10-15 05:39:16 bugproxy bug added bug
2019-10-15 05:39:18 bugproxy tags architecture-ppc64le bugnameltc-177462 severity-critical targetmilestone-inin18042
2019-10-15 05:39:20 bugproxy attachment added MyFFDC logs are attached https://bugs.launchpad.net/bugs/1848127/+attachment/5297218/+files/20190508043654984221_Myffdc.zip
2019-10-15 05:39:22 bugproxy attachment added Backtrace from BMC using pdbg. https://bugs.launchpad.net/bugs/1848127/+attachment/5297219/+files/pdbg_backtrace.txt
2019-10-15 05:39:24 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2019-10-15 05:39:28 bugproxy affects ubuntu kernel-package (Ubuntu)
2019-10-15 05:59:25 Frank Heimes affects kernel-package (Ubuntu) linux (Ubuntu)
2019-10-15 05:59:52 Frank Heimes bug task added ubuntu-power-systems
2019-10-15 06:00:05 Frank Heimes ubuntu-power-systems: status New Triaged
2019-10-15 06:00:12 Frank Heimes ubuntu-power-systems: importance Undecided Critical
2019-10-15 06:00:29 Frank Heimes ubuntu-power-systems: assignee Canonical Kernel Team (canonical-kernel-team)
2019-10-15 13:10:55 Manoj Iyer linux (Ubuntu): assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) Canonical Kernel Team (canonical-kernel-team)
2019-10-15 13:10:57 Manoj Iyer linux (Ubuntu): importance Undecided Critical
2019-10-15 13:24:53 Manoj Iyer nominated for series Ubuntu Eoan
2019-10-15 13:24:53 Manoj Iyer bug task added linux (Ubuntu Eoan)
2019-10-15 13:27:16 Manoj Iyer nominated for series Ubuntu Bionic
2019-10-15 13:27:16 Manoj Iyer bug task added linux (Ubuntu Bionic)
2019-10-15 13:27:16 Manoj Iyer nominated for series Ubuntu Disco
2019-10-15 13:27:16 Manoj Iyer bug task added linux (Ubuntu Disco)
2019-10-15 17:23:48 Manoj Iyer linux (Ubuntu Eoan): assignee Canonical Kernel Team (canonical-kernel-team) Manoj Iyer (manjo)
2019-10-16 19:43:39 Manoj Iyer description

Old value:

== Comment: #0 - PAVAMAN SUBRAMANIYAM <pavsubra@in.ibm.com> - 2019-05-07 23:31:20 ==
Install a P9 Open Power Hardware with the latest OP930 Firmware images built from the upstream op-build git tree.

root@witherspoon:~# cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.3"
VERSION_ID="ibm-v2.3-476-g2d622cb-r32-0-g9973ab0"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.3"
BUILD_ID="ibm-v2.3-476-g2d622cb-r32"

root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
open-power-witherspoon-v2.3-rc2-58-g59fd0743
        buildroot-2019.02.2-17-g93b841d204
        skiboot-v6.3-rc2
        hostboot-19a436e
        occ-58e422d
        linux-5.0.9-openpower1-p3a4d5a4
        petitboot-v1.10.3
        machine-xml-a6f4df3
        hostboot-binaries-hw043019a.940
        capp-ucode-p9-dd2-v4
        sbe-249671d
        hcode-hw040319a.940

Then enable sw xstop manually by using below command:

root@ltc-wspoon11:~# nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
root@ltc-wspoon11:~# nvram -p ibm,skiboot --print-config
"ibm,skiboot" Partition
--------------------------
experimental-fast-reset=1
snarf-mode=noooooo
opal-sw-xstop=enable

Then from the Linux HOST injected the MCE UE Error on the machine as follows:

root@ltc-wspoon11:~# ./probe_cpus.sh -L
CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs: 0 1 2 3
CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs: 4 5 6 7
CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs: 8 9 10 11
CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs: 12 13 14 15
CHIP ID: 0 CORE ID: 6 THREADS: 4 CPUs: 16 17 18 19
CHIP ID: 0 CORE ID: 7 THREADS: 4 CPUs: 20 21 22 23
CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs: 24 25 26 27
CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs: 28 29 30 31
CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs: 32 33 34 35
CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs: 36 37 38 39
CHIP ID: 0 CORE ID: 12 THREADS: 4 CPUs: 40 41 42 43
CHIP ID: 0 CORE ID: 13 THREADS: 4 CPUs: 44 45 46 47
CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs: 48 49 50 51
CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs: 52 53 54 55
CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs: 56 57 58 59
CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs: 60 61 62 63
CHIP ID: 0 CORE ID: 20 THREADS: 4 CPUs: 64 65 66 67
CHIP ID: 0 CORE ID: 21 THREADS: 4 CPUs: 68 69 70 71
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 72 73 74 75
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 76 77 78 79
CHIP ID: 8 CORE ID: 8 THREADS: 4 CPUs: 80 81 82 83
CHIP ID: 8 CORE ID: 9 THREADS: 4 CPUs: 84 85 86 87
CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs: 88 89 90 91
CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs: 92 93 94 95
CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs: 112 113 114 115
CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs: 116 117 118 119
CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs: 120 121 122 123
CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs: 124 125 126 127
CHIP ID: 8 CORE ID: 20 THREADS: 4 CPUs: 128 129 130 131
CHIP ID: 8 CORE ID: 21 THREADS: 4 CPUs: 132 133 134 135
CHIP ID: 8 CORE ID: 22 THREADS: 4 CPUs: 136 137 138 139
CHIP ID: 8 CORE ID: 23 THREADS: 4 CPUs: 140 141 142 143
-----------------------------
p[0]    eq[0,1,2,3,4,5]    ex[0,1,3,4,5,6,8,9,10]     c[0,1,2,3,6,7,8,9,10,11,12,13,16,17,18,19,20,21]
p[8]    eq[1,2,3,4,5]    ex[3,4,5,6,7,8,9,10,11]     c[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
-----------------------------
----------Processor Layout-------------------
p[0]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |EX-0    C0 | |EX-4    C8 | |EX-8    C16|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-0    C1 | |EX-4    C9 | |EX-8    C17|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1    C2 | |EX-5    C10| |EX-9    C18|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1    C3 | |EX-5    C11| |EX-9    C19|
        +-----------+ +-----------+ +-----------+
        +---EQ01----+ +---EQ03----+ +---EQ05----+
        |           | |EX-6    C12| |EX-10   C20|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-6    C13| |EX-10   C21|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C6 | |           | |           |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C7 | |           | |           |
        +-----------+ +-----------+ +-----------+
p[8]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |           | |EX-4    C8 | |EX-8    C16|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-4    C9 | |EX-8    C17|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-5    C10| |EX-9    C18|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-5    C11| |EX-9    C19|
        +-----------+ +-----------+ +-----------+
        +---EQ01----+ +---EQ03----+ +---EQ05----+
        |           | |EX-6    C12| |EX-10   C20|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-6    C13| |EX-10   C21|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C6 | |EX-7    C14| |EX-11   C22|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C7 | |EX-7    C15| |EX-11   C23|
        +-----------+ +-----------+ +-----------+

root@ltc-wspoon11:~# ./statedisable.sh
./statedisable.sh: line 10: /sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: /sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory

root@ltc-wspoon11:~# cpupower idle-info
CPUidle driver: powernv_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 7
Available idle states: snooze stop0_lite stop0 stop1 stop2 stop4 stop5
snooze (DISABLED) :
Flags/Description: snooze
Latency: 0
Usage: 81861
Duration: 29748269
stop0_lite (DISABLED) :
Flags/Description: stop0_lite
Latency: 1
Usage: 70
Duration: 1982345
stop0 (DISABLED) :
Flags/Description: stop0
Latency: 2
Usage: 274
Duration: 125896
stop1 (DISABLED) :
Flags/Description: stop1
Latency: 5
Usage: 36
Duration: 4922
stop2 (DISABLED) :
Flags/Description: stop2
Latency: 10
Usage: 3745
Duration: 88300041
stop4 (DISABLED) :
Flags/Description: stop4
Latency: 100
Usage: 65
Duration: 1048951
stop5 (DISABLED) :
Flags/Description: stop5
Latency: 200
Usage: 30377
Duration: 61977191643

root@ltc-wspoon11:~# ./run_workload.sh

root@ltc-wspoon11:~# ./scom_addr_p9.sh 0x1001080c 15
EQ[ 3]: 0x1301080c
EX[ 7]: 0x13010c0c
 C[15]: 0x3f01080c

root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/getscom -c 0x8 0x13010c0c
0000000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000

After injecting the Machine check error, the HOST Linux stops pinging and the console access to the machine also gets lost. But still the Open BMC shell and GUI still shows that the HOST is in Running state.

== Comment: #1 - PAVAMAN SUBRAMANIYAM <pavsubra@in.ibm.com> - 2019-05-07 23:33:31 ==
The machine is installed with the Ubuntu 18.04 Linux OS.

root@ltc-wspoon11:~# uname -a
Linux ltc-wspoon11 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:26:19 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

root@ltc-wspoon11:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

root@ltc-wspoon11:~# cat /proc/cpuinfo | tail
cpu : POWER9, altivec supported
clock : 2300.000000MHz
revision : 2.3 (pvr 004e 1203)
timebase : 512000000
platform : PowerNV
model : 8335-GTH
machine : PowerNV 8335-GTH
firmware : OPAL
MMU : Radix

root@ltc-wspoon11:~# lsmcode
Version of System Firmware :
 Product Name : OpenPOWER Firmware
 Product Version : witherspoon-v2.3-rc2-58-g59fd0743
 Product Extra : skiboot-v6.3-rc2
 Product Extra : bmc-firmware-version-2.03
 Product Extra : occ-58e422d
 Product Extra : hostboot-19a436e
 Product Extra : buildroot-2019.02.2-17-g93b841d204
 Product Extra : capp-ucode-p9-dd2-v4
 Product Extra : machine-xml-a6f4df3
 Product Extra : hostboot-binaries-hw043019a.940
 Product Extra : sbe-249671d
 Product Extra : hcode-hw040319a.940
 Product Extra : petitboot-v1.10.3
 Product Extra : linux-5.0.9-openpower1-p3a4d5a4

== Comment: #3 - PAVAMAN SUBRAMANIYAM <pavsubra@in.ibm.com> - 2019-05-07 23:42:35 ==
I quickly tested MCE on op930 build ( IBM-witherspoon-ibm-OP9-v2.2-3.5) with 4.15.0-47-generic and found no hang. But on further investigation I see that the hang issue is seen from kernel version 4.15.0-48-generic and above. Looks like changes that gone in 4.15.0-48-generic version causing the hang issue. Still investigating....

== Comment: #9 - Application Cdeadmin <cdeadmin@us.ibm.com> - 2019-05-22 06:45:07 ==
==== State: Working by: jayeshp on 22 May 2019 06:37:27 ====
Any update?

== Comment: #11 - MAHESH J. SALGAONKAR <mahesh.salgaonkar@in.ibm.com> - 2019-09-19 04:44:01 ==
The hang issues should go away with below patch.

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <bsingharora@gmail.com>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

    The current code would fail on huge pages addresses, since the shift would
    be incorrect. Use the correct page shift value returned by
    __find_linux_pte() to get the correct physical address. The code is more
    generic and can handle both regular and compound pages.

    Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    [arbab@linux.ibm.com: Fixup pseries_do_memory_failure()]
    Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
    Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
    Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
    Cc: stable@vger.kernel.org # v4.15+
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org

New value:

[IMPACT]
MCE test renders the system unresponsive on P9 open power hardware (Witherspoon)

[TEST]
A test kernel is available in ppa:ubuntu-power-triage/lp1848127. Please see the [OTHER] section for test details and comment #7 for results with the PPA kernel.

[FIX]
IBM has identified the following patch that fixes this issue:

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <bsingharora@gmail.com>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

[REGRESSION POTENTIAL]
The patch is applicable to the powerpc architecture and limited in scope to MCE handling for huge pages. Patch does not touch any generic code. Regression if any is limited to powerpc MCE handling.

[OTHER]
== Comment: #0 - PAVAMAN SUBRAMANIYAM <pavsubra@in.ibm.com> - 2019-05-07 23:31:20 ==
Install a P9 Open Power Hardware with the latest OP930 Firmware images built from the upstream op-build git tree.

root@witherspoon:~# cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.3"
VERSION_ID="ibm-v2.3-476-g2d622cb-r32-0-g9973ab0"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.3"
BUILD_ID="ibm-v2.3-476-g2d622cb-r32"

root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
open-power-witherspoon-v2.3-rc2-58-g59fd0743
        buildroot-2019.02.2-17-g93b841d204
        skiboot-v6.3-rc2
        hostboot-19a436e
        occ-58e422d
        linux-5.0.9-openpower1-p3a4d5a4
        petitboot-v1.10.3
        machine-xml-a6f4df3
        hostboot-binaries-hw043019a.940
        capp-ucode-p9-dd2-v4
        sbe-249671d
        hcode-hw040319a.940

Then enable sw xstop manually by using below command:

root@ltc-wspoon11:~# nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
root@ltc-wspoon11:~# nvram -p ibm,skiboot --print-config
"ibm,skiboot" Partition
--------------------------
experimental-fast-reset=1
snarf-mode=noooooo
opal-sw-xstop=enable

Then from the Linux HOST injected the MCE UE Error on the machine as follows:

root@ltc-wspoon11:~# ./probe_cpus.sh -L
CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs: 0 1 2 3
CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs: 4 5 6 7
CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs: 8 9 10 11
CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs: 12 13 14 15
CHIP ID: 0 CORE ID: 6 THREADS: 4 CPUs: 16 17 18 19
CHIP ID: 0 CORE ID: 7 THREADS: 4 CPUs: 20 21 22 23
CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs: 24 25 26 27
CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs: 28 29 30 31
CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs: 32 33 34 35
CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs: 36 37 38 39
CHIP ID: 0 CORE ID: 12 THREADS: 4 CPUs: 40 41 42 43
CHIP ID: 0 CORE ID: 13 THREADS: 4 CPUs: 44 45 46 47
CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs: 48 49 50 51
CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs: 52 53 54 55
CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs: 56 57 58 59
CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs: 60 61 62 63
CHIP ID: 0 CORE ID: 20 THREADS: 4 CPUs: 64 65 66 67
CHIP ID: 0 CORE ID: 21 THREADS: 4 CPUs: 68 69 70 71
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 72 73 74 75
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 76 77 78 79
CHIP ID: 8 CORE ID: 8 THREADS: 4 CPUs: 80 81 82 83
CHIP ID: 8 CORE ID: 9 THREADS: 4 CPUs: 84 85 86 87
CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs: 88 89 90 91
CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs: 92 93 94 95
CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 108 109 110 111
CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs: 112 113 114 115
CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs: 116 117 118 119
CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs: 120 121 122 123
CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs: 124 125 126 127
CHIP ID: 8 CORE ID: 20 THREADS: 4 CPUs: 128 129 130 131
CHIP ID: 8 CORE ID: 21 THREADS: 4 CPUs: 132 133 134 135
CHIP ID: 8 CORE ID: 22 THREADS: 4 CPUs: 136 137 138 139
CHIP ID: 8 CORE ID: 23 THREADS: 4 CPUs: 140 141 142 143
-----------------------------
p[0]    eq[0,1,2,3,4,5]    ex[0,1,3,4,5,6,8,9,10]     c[0,1,2,3,6,7,8,9,10,11,12,13,16,17,18,19,20,21]
p[8]    eq[1,2,3,4,5]    ex[3,4,5,6,7,8,9,10,11]     c[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
-----------------------------
----------Processor Layout-------------------
p[0]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |EX-0    C0 | |EX-4    C8 | |EX-8    C16|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-0    C1 | |EX-4    C9 | |EX-8    C17|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1    C2 | |EX-5    C10| |EX-9    C18|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1    C3 | |EX-5    C11| |EX-9    C19|
        +-----------+ +-----------+ +-----------+
        +---EQ01----+ +---EQ03----+ +---EQ05----+
        |           | |EX-6    C12| |EX-10   C20|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-6    C13| |EX-10   C21|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C6 | |           | |           |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C7 | |           | |           |
        +-----------+ +-----------+ +-----------+
p[8]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |           | |EX-4    C8 | |EX-8    C16|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-4    C9 | |EX-8    C17|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-5    C10| |EX-9    C18|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-5    C11| |EX-9    C19|
        +-----------+ +-----------+ +-----------+
        +---EQ01----+ +---EQ03----+ +---EQ05----+
        |           | |EX-6    C12| |EX-10   C20|
        + - - - - - + + - - - - - + + - - - - - +
        |           | |EX-6    C13| |EX-10   C21|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C6 | |EX-7    C14| |EX-11   C22|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3    C7 | |EX-7    C15| |EX-11   C23|
        +-----------+ +-----------+ +-----------+

root@ltc-wspoon11:~# ./statedisable.sh
./statedisable.sh: line 10: /sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: /sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory

root@ltc-wspoon11:~# cpupower idle-info
CPUidle driver: powernv_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 7
Available idle states: snooze stop0_lite stop0 stop1 stop2 stop4 stop5
snooze (DISABLED) :
Flags/Description: snooze
Latency: 0
Usage: 81861
Duration: 29748269
stop0_lite (DISABLED) :
Flags/Description: stop0_lite
Latency: 1
Usage: 70
Duration: 1982345
stop0 (DISABLED) :
Flags/Description: stop0
Latency: 2
Usage: 274
Duration: 125896
stop1 (DISABLED) :
Flags/Description: stop1
Latency: 5
Usage: 36
Duration: 4922
stop2 (DISABLED) :
Flags/Description: stop2
Latency: 10
Usage: 3745
Duration: 88300041
stop4 (DISABLED) :
Flags/Description: stop4
Latency: 100
Usage: 65
Duration: 1048951
stop5 (DISABLED) :
Flags/Description: stop5
Latency: 200
Usage: 30377
Duration: 61977191643

root@ltc-wspoon11:~# ./run_workload.sh

root@ltc-wspoon11:~# ./scom_addr_p9.sh 0x1001080c 15
EQ[ 3]: 0x1301080c
EX[ 7]: 0x13010c0c
 C[15]: 0x3f01080c

root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/getscom -c 0x8 0x13010c0c
0000000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000
root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
0c00000000000000

After injecting the Machine check error, the HOST Linux stops pinging and the console access to the machine also gets lost. But still the Open BMC shell and GUI still shows that the HOST is in Running state.

== Comment: #1 - PAVAMAN SUBRAMANIYAM <pavsubra@in.ibm.com> - 2019-05-07 23:33:31 ==
The machine is installed with the Ubuntu 18.04 Linux OS.

root@ltc-wspoon11:~# uname -a
Linux ltc-wspoon11 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:26:19 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

root@ltc-wspoon11:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

root@ltc-wspoon11:~# cat /proc/cpuinfo | tail
cpu : POWER9, altivec supported
clock : 2300.000000MHz
revision : 2.3 (pvr 004e 1203)
timebase : 512000000
platform : PowerNV
model : 8335-GTH
machine : PowerNV 8335-GTH
firmware : OPAL
MMU : Radix

root@ltc-wspoon11:~# lsmcode
Version of System Firmware :
 Product Name : OpenPOWER Firmware
 Product Version : witherspoon-v2.3-rc2-58-g59fd0743
 Product Extra : skiboot-v6.3-rc2
 Product Extra : bmc-firmware-version-2.03
 Product Extra : occ-58e422d
 Product Extra : hostboot-19a436e
 Product Extra : buildroot-2019.02.2-17-g93b841d204
 Product Extra : capp-ucode-p9-dd2-v4
 Product Extra : machine-xml-a6f4df3
 Product Extra : hostboot-binaries-hw043019a.940
 Product Extra : sbe-249671d
 Product Extra : hcode-hw040319a.940
 Product Extra : petitboot-v1.10.3
 Product Extra : linux-5.0.9-openpower1-p3a4d5a4

== Comment: #3 - PAVAMAN SUBRAMANIYAM <pavsubra@in.ibm.com> - 2019-05-07 23:42:35 ==
I quickly tested MCE on op930 build ( IBM-witherspoon-ibm-OP9-v2.2-3.5) with 4.15.0-47-generic and found no hang. But on further investigation I see that the hang issue is seen from kernel version 4.15.0-48-generic and above. Looks like changes that gone in 4.15.0-48-generic version causing the hang issue. Still investigating....

== Comment: #9 - Application Cdeadmin <cdeadmin@us.ibm.com> - 2019-05-22 06:45:07 ==
==== State: Working by: jayeshp on 22 May 2019 06:37:27 ====
Any update?

== Comment: #11 - MAHESH J. SALGAONKAR <mahesh.salgaonkar@in.ibm.com> - 2019-09-19 04:44:01 ==
The hang issues should go away with below patch.

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
Author: Balbir Singh <bsingharora@gmail.com>
Date: Tue Aug 20 13:43:47 2019 +0530

    powerpc/mce: Fix MCE handling for huge pages

    The current code would fail on huge pages addresses, since the shift would
    be incorrect. Use the correct page shift value returned by
    __find_linux_pte() to get the correct physical address. The code is more
    generic and can handle both regular and compound pages.

    Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    [arbab@linux.ibm.com: Fixup pseries_do_memory_failure()]
    Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
    Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
    Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
    Cc: stable@vger.kernel.org # v4.15+
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org
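
The commit message quoted in the description says the old code used the wrong shift for huge-page addresses. The following is only a rough Python illustration of that arithmetic, not the kernel code: the function names, the 64 KiB base page size, and the 2 MiB huge-page example are all assumptions chosen for the sketch. It shows why the mapping's own page shift is needed to find the base-page frame that actually backs a faulting address.

```python
# Illustrative sketch only; names and sizes are assumptions, not kernel APIs.
PAGE_SHIFT = 16                 # assumption: 64 KiB base pages
PAGE_SIZE = 1 << PAGE_SHIFT


def pfn_for_address(mapping_base_pfn, mapping_shift, addr):
    """Return the base-page frame number that backs addr.

    mapping_base_pfn: frame number of the first base page of the mapping
    mapping_shift:    log2 of the mapping size (PAGE_SHIFT for a regular page,
                      larger for a huge page)
    """
    if mapping_shift <= PAGE_SHIFT:
        # Regular page: the mapping is exactly one base page.
        return mapping_base_pfn
    # Huge page: keep the address bits that select a base page inside the
    # huge mapping, i.e. bits [PAGE_SHIFT, mapping_shift).
    offset_mask = (1 << mapping_shift) - PAGE_SIZE
    return mapping_base_pfn + ((addr & offset_mask) >> PAGE_SHIFT)


def physical_address(mapping_base_pfn, mapping_shift, addr):
    """Combine the frame number with the offset inside the base page."""
    pfn = pfn_for_address(mapping_base_pfn, mapping_shift, addr)
    return (pfn << PAGE_SHIFT) | (addr & (PAGE_SIZE - 1))


if __name__ == "__main__":
    addr = 0x7FFF00123456
    # Treating the mapping as a regular page ignores the in-huge-page offset.
    print(hex(physical_address(0x1000, PAGE_SHIFT, addr)))
    # Using the huge-page shift (2 MiB here) selects the correct base page.
    print(hex(physical_address(0x1000, 21, addr)))
```

With a shift of 21 the two results differ by the offset of the address inside the huge mapping, which is the kind of discrepancy the commit message describes.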
2019-10-17 11:14:03 Kleber Sacilotto de Souza linux (Ubuntu Eoan): status New In Progress
2019-10-21 13:37:23 Manoj Iyer ubuntu-power-systems: status Triaged In Progress
2019-10-28 22:06:35 Manoj Iyer linux (Ubuntu Disco): status New Incomplete
2019-10-28 22:06:38 Manoj Iyer linux (Ubuntu Bionic): status New Incomplete
2019-10-28 22:06:41 Manoj Iyer ubuntu-power-systems: status In Progress Incomplete
2019-10-28 22:06:44 Manoj Iyer linux (Ubuntu Bionic): assignee Manoj Iyer (manjo)
2019-10-28 22:06:46 Manoj Iyer linux (Ubuntu Disco): assignee Manoj Iyer (manjo)
2019-10-28 22:06:52 Manoj Iyer linux (Ubuntu Bionic): importance Undecided Critical
2019-10-28 22:06:53 Manoj Iyer linux (Ubuntu Disco): importance Undecided Critical
2019-11-06 07:20:42 Andrew Cloke ubuntu-power-systems: status Incomplete In Progress
2019-11-06 07:20:48 Andrew Cloke linux (Ubuntu Disco): status Incomplete In Progress
2019-11-06 07:20:50 Andrew Cloke linux (Ubuntu Bionic): status Incomplete In Progress
2019-11-12 17:54:32 Manoj Iyer linux (Ubuntu Bionic): status In Progress Fix Committed
2019-11-12 17:54:35 Manoj Iyer linux (Ubuntu Disco): status In Progress Fix Committed
2019-11-12 17:54:38 Manoj Iyer linux (Ubuntu Eoan): status In Progress Fix Committed
2019-11-12 17:54:44 Manoj Iyer ubuntu-power-systems: status In Progress Fix Committed
2019-11-12 17:54:46 Manoj Iyer linux (Ubuntu): status In Progress Fix Committed
2019-12-02 14:47:26 Patricia Domingues linux (Ubuntu Bionic): status Fix Committed Fix Released
2019-12-02 14:47:32 Patricia Domingues linux (Ubuntu Disco): status Fix Committed Fix Released
2019-12-02 14:51:41 Patricia Domingues linux (Ubuntu Eoan): status Fix Committed Fix Released
2019-12-10 01:06:22 Manoj Iyer linux (Ubuntu): status Fix Committed Fix Released
2019-12-10 01:06:37 Manoj Iyer ubuntu-power-systems: status Fix Committed Fix Released