[OPAL] Assert fail: core/mem_region.c:447:lock_held_by_me(&region->free_list_lock)

Bug #1762913 reported by bugproxy on 2018-04-11
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
The Ubuntu-power-systems project | | High | Canonical Kernel Team |
linux (Ubuntu) | | High | Joseph Salisbury |
Bionic | | High | Joseph Salisbury |

Bug Description

== Comment: #0 - Application Cdeadmin <email address hidden> - 2018-04-03 05:20:56 ==

== Comment: #1 - Application Cdeadmin <> - 2018-04-05 01:30:56 ==
------- Comment From pridhiviraj 2018-04-05 01:30:25 EDT -------
The distro needs commit 47712a921bb781caf69fca9eae43be19968816cb backported, as suggested by Nick Piggin.

== Comment: #2 - PAWAN K. SINGH <> - 2018-04-06 03:35:49 ==
The commit in question is:
>>>

From 47712a921bb781caf69fca9eae43be19968816cb Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <email address hidden>
Date: Wed, 17 Jan 2018 22:47:22 +1000
Subject: [PATCH] powerpc/watchdog: remove arch_trigger_cpumask_backtrace

The powerpc NMI IPIs may not be recoverable if they are taken in
some sections of code, and also there have been and still are issues
with taking NMIs (in KVM guest code, in firmware, etc) which makes them
a bit dangerous to use.

Generic code like softlockup detector and rcu stall detectors really
hammer on trigger_*_backtrace, which has lead to further problems
because we've implemented it with the NMI.

So stop providing NMI backtraces for now. Importantly, the powerpc code
uses NMI IPIs in crash/debug, and the SMP hardlockup watchdog. So if the
softlockup and rcu hang detection traces are not being printed because
the CPU is stuck with interrupts off, then the hard lockup watchdog
should get it with the NMI IPI.

Fixes: 2104180a5369 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <email address hidden>
Signed-off-by: Michael Ellerman <email address hidden>
---
 arch/powerpc/include/asm/nmi.h | 4 ----
 arch/powerpc/kernel/watchdog.c | 22 ----------------------
 2 files changed, 26 deletions(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index e97f586..9c80939 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -4,10 +4,6 @@

 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
-extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
- bool exclude_self);
-#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
-
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
 #endif
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 87da80c..3963baa 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -393,25 +393,3 @@ int __init watchdog_nmi_probe(void)
  }
  return 0;
 }
-
-static void handle_backtrace_ipi(struct pt_regs *regs)
-{
- nmi_cpu_backtrace(regs);
-}
-
-static void raise_backtrace_ipi(cpumask_t *mask)
-{
- unsigned int cpu;
-
- for_each_cpu(cpu, mask) {
- if (cpu == smp_processor_id())
- handle_backtrace_ipi(NULL);
- else
- smp_send_nmi_ipi(cpu, handle_backtrace_ipi, 1000000);
- }
-}
-
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
-{
- nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
-}
--
2.7.4

>>>
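
For anyone wondering what callers do once the arch hook is removed: as far as I recall, the generic include/linux/nmi.h checks whether arch_trigger_cpumask_backtrace is defined as a macro and, if it is not, makes trigger_all_cpu_backtrace() return false so that callers fall back to a non-NMI report. Below is a minimal userspace sketch of that pattern, not kernel code. report_stall() and the stall_report enum are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/*
 * Stand-in for arch/powerpc/include/asm/nmi.h after the patch: the line
 *   #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 * is gone, so the macro is left undefined in this sketch.
 */

#ifdef arch_trigger_cpumask_backtrace
/* Arch provides an NMI backtrace hook: use it and report success. */
static inline bool trigger_all_cpu_backtrace(void)
{
	arch_trigger_cpumask_backtrace();
	return true;
}
#else
/* No arch hook: tell the caller that no NMI backtrace was sent. */
static inline bool trigger_all_cpu_backtrace(void)
{
	return false;
}
#endif

/* Hypothetical result type for the illustration below. */
enum stall_report {
	STALL_NMI_BACKTRACE,	/* all CPUs were asked for a backtrace via NMI */
	STALL_LOCAL_DUMP	/* only the detecting CPU dumps its own stack */
};

/* Hypothetical caller, loosely modelled on a stall detector. */
static inline enum stall_report report_stall(void)
{
	if (trigger_all_cpu_backtrace())
		return STALL_NMI_BACKTRACE;
	return STALL_LOCAL_DUMP;
}
```

With the macro undefined, report_stall() takes the local-dump path, which matches the intent stated in the commit message: the softlockup and RCU stall detectors stop sending NMI IPIs, and the hard lockup watchdog remains the path that uses them.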


tags: added: architecture-ppc64le bugnameltc-166371 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

The Bionic request has been sent. However, because Artful is a stable release, it requires testing before submitting a SRU request.

I built an Artful test kernel with commit 47712a921bb781caf69fca9eae43be19968816cb. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1762913

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Seth Forshee (sforshee) on 2018-04-12
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

------- Comment From <email address hidden> 2018-04-17 15:21 EDT-------
(In reply to comment #9)
> The Bionic request has been sent. However, because Artful is a stable
> release, it requires testing before submitting a SRU request.
>
> I built an Artful test kernel with commit
> 47712a921bb781caf69fca9eae43be19968816cb. The test kernel can be downloaded
> from:
> http://kernel.ubuntu.com/~jsalisbury/lp1762913
>
> Can you test this kernel and see if it resolves this bug?
>
> Note, to test this kernel, you need to install both the linux-image and
> linux-image-extra .deb packages.
>
> Thanks in advance!

Hi
I have tested the above Artful test kernel, and it works fine. However, the developer mentioned that the fix is really needed in the 4.15 Bionic kernel, which has support for sending NMI IPIs; that support is what causes this bug (the r13 clobber bug). The 4.13 kernel has no such support, so the fix will not have any impact there.

Thanks
Pridhiviraj

Joseph Salisbury (jsalisbury) wrote :

I removed the Artful bug task per comment #4.

no longer affects: linux (Ubuntu Artful)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-19.20

---------------
linux (4.15.0-19.20) bionic; urgency=medium

  * linux: 4.15.0-19.20 -proposed tracker (LP: #1766021)

  * Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers (LP: #1765232)
    - Revert "blk-mq: simplify queue mapping & schedule with each possisble CPU"
    - Revert "genirq/affinity: assign vectors to all possible CPUs"

linux (4.15.0-18.19) bionic; urgency=medium

  * linux: 4.15.0-18.19 -proposed tracker (LP: #1765490)

  * [regression] Ubuntu 18.04:[4.15.0-17-generic #18] KVM Guest Kernel:
    meltdown: rfi/fallback displacement flush not enabled bydefault (kvm)
    (LP: #1765429)
    - powerpc/pseries: Fix clearing of security feature flags

  * signing: only install a signed kernel (LP: #1764794)
    - [Packaging] update to Debian like control scripts
    - [Packaging] switch to triggers for postinst.d postrm.d handling
    - [Packaging] signing -- switch to raw-signing tarballs
    - [Packaging] signing -- switch to linux-image as signed when available
    - [Config] signing -- enable Opal signing for ppc64el
    - [Packaging] printenv -- add signing options

  * [18.04 FEAT] Sign POWER host/NV kernels (LP: #1696154)
    - [Packaging] signing -- add support for signing Opal kernel binaries

  * Please cherrypick s390 unwind fix (LP: #1765083)
    - s390/compat: fix setup_frame32

  * Ubuntu 18.04 installer does not detect any IPR based HDD/RAID array [S822L]
    [ipr] (LP: #1751813)
    - d-i: move ipr to storage-core-modules on ppc64el

  * drivers/gpu/drm/bridge/adv7511/adv7511.ko missing (LP: #1764816)
    - SAUCE: (no-up) rename the adv7511 drm driver to adv7511_drm

  * Miscellaneous Ubuntu changes
    - [Packaging] Add linux-oem to rebuild test blacklist.

linux (4.15.0-17.18) bionic; urgency=medium

  * linux: 4.15.0-17.18 -proposed tracker (LP: #1764498)

  * Eventual OOM with profile reloads (LP: #1750594)
    - SAUCE: apparmor: fix memory leak when duplicate profile load

linux (4.15.0-16.17) bionic; urgency=medium

  * linux: 4.15.0-16.17 -proposed tracker (LP: #1763785)

  * [18.04] [bug] CFL-S(CNP)/CNL GPIO testing failed (LP: #1757346)
    - [Config]: Set CONFIG_PINCTRL_CANNONLAKE=y

  * [Ubuntu 18.04] USB Type-C test failed on GLK (LP: #1758797)
    - SAUCE: usb: typec: ucsi: Increase command completion timeout value

  * Fix trying to "push" an already active pool VP (LP: #1763386)
    - SAUCE: powerpc/xive: Fix trying to "push" an already active pool VP

  * hisi_sas: Revert and replace SAUCE patches w/ upstream (LP: #1762824)
    - Revert "UBUNTU: SAUCE: scsi: hisi_sas: export device table of v3 hw to
      userspace"
    - Revert "UBUNTU: SAUCE: scsi: hisi_sas: config for hip08 ES"
    - scsi: hisi_sas: modify some register config for hip08
    - scsi: hisi_sas: add v3 hw MODULE_DEVICE_TABLE()

  * Realtek card reader - RTS5243 [VEN_10EC&DEV_5260] (LP: #1737673)
    - misc: rtsx: Move Realtek Card Reader Driver to misc
    - updateconfigs for Realtek Card Reader Driver
    - misc: rtsx: Add support for RTS5260
    - misc: rtsx: Fix symbol clashes

  * Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in
    ./include/linux/net_dim.h (LP: #1...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-23 09:11 EDT-------
Is there a patch available for this issue?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-23 09:21 EDT-------
If the comment about the fix (**bug was fixed in the package linux - 4.15.0-19.20**) refers to the skiboot kernel, then the fix should already be available in the latest pnor 518. The skiboot kernel level for pnor 518 is **linux-4.16.7-openpower2-p3436ea7**.
Is there another fix we need in order to close this issue?

------- Comment From <email address hidden> 2018-05-23 14:18 EDT-------
Tested with latest kernel, works fine. So closing it. Thanks.
