[LTCTest][OPAL][OP910.20] WARNING: CPU: 97 PID: 11965 at /build/linux-0zaMZw/linux-4.15.0/kernel/sched/core.c:1189 set_task_cpu+0x240/0x250

Bug #1774964 reported by bugproxy
Affects                           Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Fix Released  High        Canonical Kernel Team
linux (Ubuntu)                    Fix Released  High        Joseph Salisbury
Bionic                            Fix Released  High        Joseph Salisbury

Bug Description

== SRU Justification ==
IBM reports seeing the following during their testing:
WARNING: CPU: 97 PID: 11965 at /build/linux-0zaMZw/linux-4.15.0/kernel/sched/core.c:1189 set_task_cpu+0x240/0x250

This is a regression and was introduced by the following two commits in
v4.15-rc1:
01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors")
ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")

This regression is fixed by commit 75ecfb49516c in v4.17-rc3. The
commit was also cc'd to upstream stable, but it is being SRU'd to get
the fix into Ubuntu without waiting for it to come down via stable
updates.

== Fix ==
75ecfb49516c ("powerpc/mce: Fix a bug where mce loops on memory UE.")

== Regression Potential ==
Low. Limited to powerpc. The commit was also cc'd to upstream stable,
so it will receive additional upstream stable review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Original Bug Descriptions ==
== Comment: #0 - PAVAMAN SUBRAMANIYAM <> - 2018-04-25 01:59:10 ==
---Problem Description---
WARNING: CPU: 97 PID: 11965 at /build/linux-0zaMZw/linux-4.15.0/kernel/sched/core.c:1189 set_task_cpu+0x240/0x250
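
For triage it can help to pull the CPU, PID, source location and symbol out of the WARNING header above. The following is only a minimal sketch: the regular expression and the `parse_warning` helper are illustrative names that are not part of the original report, and the pattern is assumed to match headers of the exact shape shown here.

```python
import re

# Matches "WARNING: CPU: N PID: N at file:line symbol+0xoff/0xsize".
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_warning(text):
    """Return the fields of the first kernel WARNING header in text, or None."""
    m = WARN_RE.search(text)
    if m is None:
        return None
    return {
        "cpu": int(m["cpu"]),
        "pid": int(m["pid"]),
        "file": m["file"],
        "line": int(m["line"]),
        "symbol": m["symbol"],
        "offset": int(m["offset"], 16),
        "size": int(m["size"], 16),
    }


sample = (
    "WARNING: CPU: 97 PID: 11965 at /build/linux-0zaMZw/linux-4.15.0/"
    "kernel/sched/core.c:1189 set_task_cpu+0x240/0x250"
)
info = parse_warning(sample)
print(info["cpu"], info["pid"], info["symbol"], info["line"])
```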

---uname output---
Linux ltc-wspoon8 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = P9

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Install the latest OP910.20 firmware images on P9 OpenPOWER hardware.

root@witherspoon:~# cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.0"
VERSION_ID="ibm-v2.0-0-r46-0-gbed584c"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.0"
BUILD_ID="ibm-v2.0-0-r46"
root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
open-power-witherspoon-v1.21.2-251-ge2e9363-dirty
        buildroot-2017.11-5-g65679be
        skiboot-v5.10.3-op910-1-p240231e
        hostboot-0aa5bed
        linux-4.14.24-openpower1-p3e84190
        petitboot-v1.6.6-pd7224b4
        machine-xml-22224af
        occ-8c5b727
        hostboot-binaries-9bd4056
        capp-ucode-p9-dd2-v3
        sbe-7e02c23

We then installed Ubuntu 18.04 on the machine.

root@ltc-wspoon8:~# uname -a
Linux ltc-wspoon8 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon8:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@ltc-wspoon8:~# cat /proc/cpuinfo | tail
cpu : POWER9, altivec supported
clock : 2300.000000MHz
revision : 2.1 (pvr 004e 1201)

timebase : 512000000
platform : PowerNV
model : 8335-GTC........
machine : PowerNV 8335-GTC........
firmware : OPAL
MMU : Radix
root@ltc-wspoon8:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.15.0-20-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-20-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="root=UUID=a2cd572c-9047-4f0a-843b-6996fae3e999 ro quiet splash nr_cpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

root@ltc-wspoon8:~# ps -ef | grep opal
root 880 2 0 01:25 ? 00:00:00 [kopald]
root 3604 1 2 01:25 ? 00:00:03 /usr/sbin/opal-prd
root 4858 4278 0 01:28 pts/0 00:00:00 grep --color=auto opal

root@ltc-wspoon8:~# service opal-prd status
● opal-prd.service - OPAL PRD daemon
   Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-04-25 01:25:48 CDT; 2min 43s ago
     Docs: man:opal-prd(8)
 Main PID: 3604 (opal-prd)
    Tasks: 1 (limit: 22118)
   CGroup: /system.slice/opal-prd.service
           └─3604 /usr/sbin/opal-prd

Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: IMAGE: hbrt_init complete, version 0290000000000000
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: hservices_init done
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: calling enable_attns
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: ATTN_SLOW:I>>>ATTN_RT::enableAttns
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: ATTN_SLOW:I>Service::enableAttns() enter
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: ATTN_SLOW:I>Service::enableAttns() exit
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: ATTN_SLOW:I><<ATTN_RT::enableAttns rc: 0
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: calling get_ipoll_events
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: HBRT: enabling IPOLL events 0x5b90000000000000
Apr 25 01:25:52 ltc-wspoon8 opal-prd[3604]: FW: writing init message

We then inject a machine check memory UE error using the scom utilities.

root@ltc-wspoon8:~# ./probe_cpus.sh -L
CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs: 0 1 2 3
CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs: 4 5 6 7
CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs: 8 9 10 11
CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs: 12 13 14 15
CHIP ID: 0 CORE ID: 4 THREADS: 4 CPUs: 16 17 18 19
CHIP ID: 0 CORE ID: 5 THREADS: 4 CPUs: 20 21 22 23
CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs: 24 25 26 27
CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs: 28 29 30 31
CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs: 32 33 34 35
CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs: 36 37 38 39
CHIP ID: 0 CORE ID: 14 THREADS: 4 CPUs: 40 41 42 43
CHIP ID: 0 CORE ID: 15 THREADS: 4 CPUs: 44 45 46 47
CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs: 48 49 50 51
CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs: 52 53 54 55
CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs: 56 57 58 59
CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs: 60 61 62 63
CHIP ID: 0 CORE ID: 22 THREADS: 4 CPUs: 64 65 66 67
CHIP ID: 0 CORE ID: 23 THREADS: 4 CPUs: 68 69 70 71
CHIP ID: 8 CORE ID: 0 THREADS: 4 CPUs: 72 73 74 75
CHIP ID: 8 CORE ID: 1 THREADS: 4 CPUs: 76 77 78 79
CHIP ID: 8 CORE ID: 2 THREADS: 4 CPUs: 80 81 82 83
CHIP ID: 8 CORE ID: 3 THREADS: 4 CPUs: 84 85 86 87
CHIP ID: 8 CORE ID: 4 THREADS: 4 CPUs: 88 89 90 91
CHIP ID: 8 CORE ID: 5 THREADS: 4 CPUs: 92 93 94 95
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs: 104 105 106 107
CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs: 108 109 110 111
CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs: 112 113 114 115
CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs: 116 117 118 119
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 120 121 122 123
CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs: 124 125 126 127
CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs: 128 129 130 131
CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs: 132 133 134 135
CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs: 136 137 138 139
CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs: 140 141 142 143
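
To relate a CPU number from the kernel log back to a chip and core, the listing above can be turned into a lookup table. This is a rough sketch that assumes the line format printed by `probe_cpus.sh -L` in this report; the `cpu_to_core` helper is a hypothetical name, and only three lines of the listing are repeated as sample input.

```python
import re

# One listing line: "CHIP ID: c CORE ID: n THREADS: t CPUs: a b c d".
LINE_RE = re.compile(r"CHIP ID: (\d+) CORE ID: (\d+) THREADS: (\d+) CPUs: ([\d ]+)")


def cpu_to_core(listing):
    """Map each logical CPU number to its (chip id, core id) pair."""
    mapping = {}
    for line in listing.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip blank lines or anything that is not a listing row
        chip, core = int(m.group(1)), int(m.group(2))
        for cpu in m.group(4).split():
            mapping[int(cpu)] = (chip, core)
    return mapping


listing = """\
CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs: 96 97 98 99
CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs: 100 101 102 103
CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs: 120 121 122 123
"""
topology = cpu_to_core(listing)
print(topology[97])   # CPU named in the WARNING
print(topology[123])  # CPU named in the first machine check
```

With the rows above, CPU 97 resolves to chip 8, core 6 and CPU 123 to chip 8, core 14.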

-----------------------------
p[0]
   eq[0,1,2,3,4,5]
   ex[0,1,2,4,5,7,8,9,11]
    c[0,1,2,3,4,5,8,9,10,11,14,15,16,17,18,19,22,23]
p[8]
   eq[0,1,2,3,4]
   ex[0,1,2,3,5,6,7,8,9]
    c[0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17,18,19]
-----------------------------

----------Processor Layout-------------------
p[0]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |EX-0 C0 | |EX-4 C8 | |EX-8 C16|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-0 C1 | |EX-4 C9 | |EX-8 C17|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1 C2 | |EX-5 C10| |EX-9 C18|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1 C3 | |EX-5 C11| |EX-9 C19|
        +-----------+ +-----------+ +-----------+

        +---EQ01----+ +---EQ03----+ +---EQ05----+
        |EX-2 C4 | | | | |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-2 C5 | | | | |
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-7 C14| |EX-11 C22|
        + - - - - - + + - - - - - + + - - - - - +
        | | |EX-7 C15| |EX-11 C23|
        +-----------+ +-----------+ +-----------+

p[8]
        +---EQ00----+ +---EQ02----+ +---EQ04----+
        |EX-0 C0 | | | |EX-8 C16|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-0 C1 | | | |EX-8 C17|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1 C2 | |EX-5 C10| |EX-9 C18|
        + - - - - - + + - - - - - + + - - - - - +
        |EX-1 C3 | |EX-5 C11| |EX-9 C19|
        +-----------+ +-----------+ +-----------+

        +---EQ01----+ +---EQ03----+ +---EQ05----+
        |EX-2 C4 | |EX-6 C12| | |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-2 C5 | |EX-6 C13| | |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3 C6 | |EX-7 C14| | |
        + - - - - - + + - - - - - + + - - - - - +
        |EX-3 C7 | |EX-7 C15| | |
        +-----------+ +-----------+ +-----------+
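
The layout above suggests a simple numbering rule: each EX holds two consecutive cores and each EQ holds four. The sketch below encodes that rule as an assumption inferred from the diagram and from the `scom_addr_p9.sh` output in this report, not from the script's source; `core_to_ex_eq` is a hypothetical helper.

```python
def core_to_ex_eq(core):
    """Return (EX index, EQ index) for a core, assuming 2 cores per EX and 4 per EQ."""
    if core < 0:
        raise ValueError("core number must be non-negative")
    return core // 2, core // 4


# Core 7 is the injection target used with scom_addr_p9.sh in this report.
for core in (7, 14, 22):
    ex, eq = core_to_ex_eq(core)
    print(f"C{core}: EX-{ex} EQ{eq:02d}")
```

For core 7 this gives EX 3 and EQ 1, which agrees with the `EQ[ 1]` and `EX[ 3]` lines printed by the address script.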

root@ltc-wspoon8:~# ./statedisable.sh
./statedisable.sh: line 10: /sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: /sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory

root@ltc-wspoon8:~# ./run_workload.sh

root@ltc-wspoon8:~# ./scom_addr_p9.sh 0x1001080c 7
EQ[ 1]: 0x1101080c
EX[ 3]: 0x11010c0c
 C[ 7]: 0x3701080c
root@ltc-wspoon8:~# ./skiboot/external/xscom-utils/getscom -c 0x8 0x11010c0c
0000000000000000

root@ltc-wspoon8:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x11010c0c 0c00000000000000
0c00000000000000

The kernel prints the following call traces, and the expected "MCE recovered" messages do not appear.

Ubuntu 18.04 LTS ltc-wspoon8 hvc0

ltc-wspoon8 login: [ 191.741142] Severe Machine check interrupt [Not recovered]
[ 191.741160] NIP [c000000000181b08]: osq_lock+0xb8/0x210
[ 191.741161] Initiator: CPU
[ 191.741163] Error type: UE [Load/Store]
[ 191.741166] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 191.741172] CPU: 123 PID: 11888 Comm: find Tainted: G M 4.15.0-20-generic #21-Ubuntu
[ 191.741174] NIP: c000000000181b08 LR: c000000000cfa740 CTR: c000000000497f90
[ 191.741177] REGS: c000000007963d80 TRAP: 0200 Tainted: G M (4.15.0-20-generic)
[ 191.741178] MSR: 9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002882 XER: 00000000
[ 191.741188] CFAR: c000000000181b54 DAR: 00002018faf69194 DSISR: 00008000 SOFTE: 1
[ 191.741188] GPR00: c000000000cfa740 c000201857d47a30 c0000000016eae00 c0000000015c6b2c
[ 191.741188] GPR04: 0000000000000000 0000000000000000 c0000000017807c0 c000000007a20000
[ 191.741188] GPR08: c0002018faf69180 c0002018fb5e9180 c0002018faf69180 0000000000000000
[ 191.741188] GPR12: 0000000084002888 c000000007a74900 00000d02693c2b80 0000000000000000
[ 191.741188] GPR16: 0000000000000000 ffffffffffffff9c 00007fffc6e73f68 00000d02693e9510
[ 191.741188] GPR20: 0000000000000001 0000000000000000 fffffffffffffff6 0000000000000000
[ 191.741188] GPR24: c000201857d47c90 c0002018cdad201c fffffffffffff000 0000000000000004
[ 191.741188] GPR28: 0000000000000002 c0000000015c6b2c 0000000000000001 c0000000015c6b20
[ 191.741219] NIP [c000000000181b08] osq_lock+0xb8/0x210
[ 191.741224] LR [c000000000cfa740] __mutex_lock.isra.0+0x440/0x6e0
[ 191.741225] Call Trace:
[ 191.741229] [c000201857d47a30] [c000000000cfa338] __mutex_lock.isra.0+0x38/0x6e0 (unreliable)
[ 191.741234] [c000201857d47ac0] [c000000000497fe0] kernfs_iop_permission+0x50/0xb0
[ 191.741238] [c000201857d47b00] [c0000000003e43f4] __inode_permission+0x1a4/0x270
[ 191.741241] [c000201857d47b50] [c0000000003e8bcc] link_path_walk+0x62c/0x6c0
[ 191.741243] [c000201857d47bf0] [c0000000003eacbc] path_openat+0xac/0x3e0
[ 191.741247] [c000201857d47c70] [c0000000003ec570] do_filp_open+0x80/0x120
[ 191.741253] [c000201857d47da0] [c0000000003cfae8] do_sys_open+0x248/0x3f0
[ 191.741257] [c000201857d47e30] [c00000000000b184] system_call+0x58/0x6c
[ 191.741259] Instruction dump:
[ 191.741261] 81490010 2faa0000 409e0160 782a0464 e94a0080 714a0004 40820068 3cc20009
[ 191.741267] 38c659c0 60420000 e9490008 e8e60000 <814a0014> 394affff 7d4a07b4 1d4a0b00
[ 191.743669] Severe Machine check interrupt [Recovered]
[ 191.743706] NIP [c000000000181b3c]: osq_lock+0xec/0x210
[ 191.743740] Initiator: CPU
[ 191.743766] Error type: UE [Load/Store]
[ 191.743811] WARNING: CPU: 97 PID: 11965 at /build/linux-0zaMZw/linux-4.15.0/kernel/sched/core.c:1189 set_task_cpu+0x240/0x250
[ 191.743885] Modules linked in: binfmt_misc ofpart cmdlinepart idt_89hpesx at24 opal_prd powernv_flash ipmi_powernv ipmi_devintf mtd vmx_crypto uio_pdrv_genirq ipmi_msghandler uio ibmpowernv sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_core nouveau ast i2c_algo_bit mlx5_core ttm drm_kms_helper syscopyarea sysfillrect uas sysimgblt fb_sys_fops usb_storage ahci mlxfw crct10dif_vpmsum crc32c_vpmsum drm tg3 libahci devlink
[ 191.744292] CPU: 97 PID: 11965 Comm: find Tainted: G M 4.15.0-20-generic #21-Ubuntu
[ 191.744350] NIP: c00000000014d6e0 LR: c00000000014e30c CTR: c00000000015a240
[ 191.744401] REGS: c00020185d9eb1e0 TRAP: 0700 Tainted: G M (4.15.0-20-generic)
[ 191.744458] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28008284 XER: 00000000
[ 191.744516] CFAR: c00000000014d54c SOFTE: 0
[ 191.744516] GPR00: c00000000014e30c c00020185d9eb460 c0000000016eae00 c000001f14647300
[ 191.744516] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 191.744516] GPR08: c000000001721ee0 0000000000000000 0000000000000000 9000000000001003
[ 191.744516] GPR12: 0000000028008224 c000000007a62b00 000003b558292b80 0000000000000000
[ 191.744516] GPR16: 0000000000000000 ffffffffffffff9c 00007fffdde92a38 000003b5582bbef0
[ 191.744516] GPR20: 0000000000000001 0000000000000000 fffffffffffffff6 c00020185d9eb5e0
[ 191.744516] GPR24: c000001f14647728 c00000000171dd78 c0000000011d8580 0000000000000000
[ 191.744516] GPR28: 0000000000000004 0000000000000000 0000000000000000 c000001f14647300
[ 191.749630] NIP [c00000000014d6e0] set_task_cpu+0x240/0x250
[ 191.749709] LR [c00000000014e30c] try_to_wake_up+0x1bc/0x660
[ 191.749804] Call Trace:
[ 191.749846] [c00020185d9eb460] [c0000000011d8580] runqueues+0x0/0xc00 (unreliable)
[ 191.749943] [c00020185d9eb4a0] [c00000000014e30c] try_to_wake_up+0x1bc/0x660
[ 191.750072] [c00020185d9eb520] [c0000000001725d8] autoremove_wake_function+0x28/0x70
[ 191.750199] [c00020185d9eb550] [c000000000171b60] __wake_up_common+0xd0/0x200
[ 191.750316] [c00020185d9eb5c0] [c000000000171d4c] __wake_up_common_lock+0xbc/0x110
[ 191.750444] [c00020185d9eb650] [c00000000018ea40] wake_up_klogd_work_func+0x60/0xc0
[ 191.750573] [c00020185d9eb680] [c000000000295d10] irq_work_run_list+0xb0/0x100
[ 191.750713] [c00020185d9eb6c0] [c000000000024ab4] __timer_interrupt+0x254/0x260
[ 191.750841] [c00020185d9eb710] [c000000000024d08] timer_interrupt+0x98/0xe0
[ 191.750949] [c00020185d9eb740] [c000000000009014] decrementer_common+0x114/0x120
[ 191.751079] --- interrupt: 901 at osq_lock+0xec/0x210
[ 191.751079] LR = __mutex_lock.isra.0+0x440/0x6e0
[ 191.751255] [c00020185d9eba30] [c000000000cfa338] __mutex_lock.isra.0+0x38/0x6e0 (unreliable)
[ 191.751402] [c00020185d9ebac0] [c000000000497fe0] kernfs_iop_permission+0x50/0xb0
[ 191.751530] [c00020185d9ebb00] [c0000000003e43f4] __inode_permission+0x1a4/0x270
[ 191.751658] [c00020185d9ebb50] [c0000000003e8bcc] link_path_walk+0x62c/0x6c0
[ 191.751785] [c00020185d9ebbf0] [c0000000003eacbc] path_openat+0xac/0x3e0
[ 191.751894] [c00020185d9ebc70] [c0000000003ec570] do_filp_open+0x80/0x120
[ 191.752003] [c00020185d9ebda0] [c0000000003cfae8] do_sys_open+0x248/0x3f0
[ 191.752112] [c00020185d9ebe30] [c00000000000b184] system_call+0x58/0x6c
[ 191.752229] Instruction dump:
[ 191.752299] 7faa3670 7d4a0194 57a706be 7d4a07b4 794a1f24 7d28502a 7d293c36 71290001
[ 191.752441] 4082fe80 60000000 60000000 60420000 <0fe00000> 4bfffe6c 60000000 60420000
[ 191.752584] ---[ end trace 032f502244013ba3 ]---
[ 309.237017153,0] OPAL: Reboot requested due to Platform error.
[ 309.237089038,3] OPAL: Reboot requested due to Platform error.[ 309.237145569,5] Software initiated checkstop disabled.
[ 309.237200666,5] OPAL: Reboot request...
[ 309.247531874,5] Unable to log error
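
When comparing runs before and after a fix, it may be convenient to list each machine check in a console capture together with its reported disposition. The following is a minimal sketch under the assumption that the log uses the "Severe Machine check interrupt [...]" wording seen above; `machine_check_events` is an illustrative helper, not an existing tool.

```python
import re

# Captures the timestamp and the bracketed disposition of each machine check line.
MCE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] Severe Machine check interrupt \[(Recovered|Not recovered)\]"
)


def machine_check_events(log):
    """Return a list of (timestamp string, recovered flag) for each machine check."""
    return [(ts, status == "Recovered") for ts, status in MCE_RE.findall(log)]


log = """\
ltc-wspoon8 login: [ 191.741142] Severe Machine check interrupt [Not recovered]
[ 191.741163] Error type: UE [Load/Store]
[ 191.743669] Severe Machine check interrupt [Recovered]
"""
for ts, recovered in machine_check_events(log):
    print(ts, "recovered" if recovered else "NOT recovered")
```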

== Comment: #1 - PAVAMAN SUBRAMANIYAM <> - 2018-04-25 02:03:31 ==
I discussed this bug with Mahesh, and he suggested trying the patch that has been posted upstream at the link below:

http://patchwork.ozlabs.org/patch/902735/

== Comment: #8 - PAVAMAN SUBRAMANIYAM <> - 2018-06-01 03:09:16 ==
Can the upstream patch http://patchwork.ozlabs.org/patch/902735/ be merged into the Ubuntu 18.04 release?

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-167176 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
tags: added: p9 triage-g
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 75ecfb49516c53. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1774964

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
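
The rule in the note above can be written down as a small helper. This is only a sketch of that rule: `packages_for_test_kernel` is a hypothetical name, and the version string is assumed to start with `major.minor` as in `4.15.0-22-generic`.

```python
def packages_for_test_kernel(version):
    """Return the .deb package name prefixes to install for a test kernel version."""
    # Only the leading "major.minor" part decides which package set applies.
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]


print(packages_for_test_kernel("4.15.0-22-generic"))
print(packages_for_test_kernel("4.13.0-45-generic"))
```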

Thanks in advance!

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-06-05 01:26 EDT-------
I have downloaded the test kernel from the link and installed it.

root@ltc-wspoon8:~# dpkg -i linux-modules-4.15.0-22-generic_4.15.0-22.25~lp1774964_ppc64el.deb
Selecting previously unselected package linux-modules-4.15.0-22-generic.
(Reading database ... 74250 files and directories currently installed.)
Preparing to unpack linux-modules-4.15.0-22-generic_4.15.0-22.25~lp1774964_ppc64el.deb ...
Unpacking linux-modules-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...
Setting up linux-modules-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...

root@ltc-wspoon8:~# dpkg -i linux-image-unsigned-4.15.0-22-generic_4.15.0-22.25~lp1774964_ppc64el.deb
Selecting previously unselected package linux-image-unsigned-4.15.0-22-generic.
(Reading database ... 80003 files and directories currently installed.)
Preparing to unpack linux-image-unsigned-4.15.0-22-generic_4.15.0-22.25~lp1774964_ppc64el.deb ...
Unpacking linux-image-unsigned-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...
Setting up linux-image-unsigned-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...
I: /boot/vmlinux is now a symlink to vmlinux-4.15.0-22-generic
I: /boot/initrd.img is now a symlink to initrd.img-4.15.0-22-generic
Processing triggers for linux-image-unsigned-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-22-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/kdump-tools:
kdump-tools: Generating /var/lib/kdump/initrd.img-4.15.0-22-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-22-generic
Found initrd image: /boot/initrd.img-4.15.0-22-generic
Found linux image: /boot/vmlinux-4.15.0-20-generic
Found initrd image: /boot/initrd.img-4.15.0-20-generic
done

root@ltc-wspoon8:~# dpkg -i linux-modules-extra-4.15.0-22-generic_4.15.0-22.25~lp1774964_ppc64el.deb
(Reading database ... 80006 files and directories currently installed.)
Preparing to unpack linux-modules-extra-4.15.0-22-generic_4.15.0-22.25~lp1774964_ppc64el.deb ...
Unpacking linux-modules-extra-4.15.0-22-generic (4.15.0-22.25~lp1774964) over (4.15.0-22.25~lp1774964) ...
Setting up linux-modules-extra-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...
Processing triggers for linux-image-unsigned-4.15.0-22-generic (4.15.0-22.25~lp1774964) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-22-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/kdump-tools:
kdump-tools: Generating /var/lib/kdump/initrd.img-4.15.0-22-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-22-generic
Found initrd image: /boot/initrd.img-4.15.0-22-generic
Found linux image: /boot/vmlinux-4.15.0-20-generic
Found initrd image: /boot/initrd.img-4.15.0-20-generic
d...


no longer affects: linux (Ubuntu Cosmic)
Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Joseph Salisbury (jsalisbury) wrote :

The commit that fixes this bug is in Ubuntu-4.15.0-24 as commit:
9b185721376e powerpc/mce: Fix a bug where mce loops on memory UE.

Changing status to "Fix Released"
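
To check whether a given Bionic archive kernel should already carry the fix, the ABI number in the `uname -r` string can be compared against 24. This is a hedged sketch: `bionic_has_fix` is a hypothetical helper, it only understands `4.15.0-N[-flavour]` release strings, and it does not apply to one-off test builds such as the 4.15.0-22 test kernel from this bug, which carried the patch under a lower ABI number.

```python
import re

FIXED_ABI = 24  # per the comment above, the fix is in Ubuntu-4.15.0-24


def bionic_has_fix(release):
    """Return True/False for 4.15.0-N archive kernels, None for anything else."""
    m = re.fullmatch(r"4\.15\.0-(\d+)(?:-\w+)?", release)
    if m is None:
        return None  # not a 4.15.0 release string; this check cannot tell
    return int(m.group(1)) >= FIXED_ABI


for release in ("4.15.0-20-generic", "4.15.0-24-generic", "4.17.0"):
    print(release, bionic_has_fix(release))
```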

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc