Watchdog CPU:19 Hard LOCKUP when kernel crash was triggered

Bug #1790636 reported by bugproxy on 2018-09-04
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Critical
Canonical Kernel Team
linux (Ubuntu)
Critical
Canonical Kernel Team
Bionic
Critical
Unassigned

Bug Description

== Comment: #0 - Michael Ranweiler - 2018-09-04 01:38:39 ==

Problem:
-----------
Kernel panics and Hardlockup messages when kernel crash was triggered to test kdump.

There are two patches for a problem during a kernel crash:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de6e5d38417e6cdb005843db420a2974993d36ff

And:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0beffc4f4c658fde86d52c837e784326e9cc875

Only the first was marked for stable (and is currently present in 18.04.1), but both should really be included to fix the problems in the commit log. Can you include the second one, too?

bugproxy (bugproxy) on 2018-09-04
tags: added: architecture-ppc64le bugnameltc-171059 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Manoj Iyer (manjo) on 2018-09-04
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
Manoj Iyer (manjo) wrote :

The bug description says that the 1st patch is in bionic (18.04.1) but I was not able to find that commit in bionic (master and master-next). The two patches are in 4.17 and 4.18 stable trees.

-- linux-4.17.y --
ab830707c443 powerpc: smp_send_stop do not offline stopped CPUs

-- linux-4.18.y --
c0beffc4f4c6 powerpc/powernv: Fix opal_event_shutdown() called with interrupts disabled

Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Critical
Manoj Iyer (manjo) wrote :

http://kernel.ubuntu.com/~kamal/lp1790636-lp1790602-ppc/

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
* If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
* If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: New → In Progress
Mike Ranweiler (mranweil) wrote :

Manoj - you are correct, neither is in, and having both in would be very helpful. I confused that with a different commit.

tags: added: kernel-da-key
Manoj Iyer (manjo) on 2018-09-10
Changed in linux (Ubuntu):
status: New → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed

------- Comment From <email address hidden> 2018-09-12 02:26 EDT-------
This bug is difficult to recreate - I don't have a number for about how often it happens. I ran through about 10 cycles (sysrq c) and I haven't been able to recreate it on the test kernel.

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Mike Ranweiler (mranweil) wrote :

I tried the -proposed kernel and it looks good - no regressions and after another 10 runs no recreates of the problem. Thank you.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :
Download full text (23.5 KiB)

This bug was fixed in the package linux - 4.15.0-36.39

---------------
linux (4.15.0-36.39) bionic; urgency=medium

  * CVE-2018-14633
    - iscsi target: Use hex2bin instead of a re-implementation

  * CVE-2018-17182
    - mm: get rid of vmacache_flush_all() entirely

linux (4.15.0-35.38) bionic; urgency=medium

  * linux: 4.15.0-35.38 -proposed tracker (LP: #1791719)

  * device hotplug of vfio devices can lead to deadlock in vfio_pci_release
    (LP: #1792099)
    - SAUCE: vfio -- release device lock before userspace requests

  * L1TF mitigation not effective in some CPU and RAM combinations
    (LP: #1788563)
    - x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
    - x86/speculation/l1tf: Fix off-by-one error when warning that system has too
      much RAM
    - x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+

  * CVE-2018-15594
    - x86/paravirt: Fix spectre-v2 mitigations for paravirt guests

  * CVE-2017-5715 (Spectre v2 s390x)
    - KVM: s390: implement CPU model only facilities
    - s390: detect etoken facility
    - KVM: s390: add etoken support for guests
    - s390/lib: use expoline for all bcr instructions
    - s390: fix br_r1_trampoline for machines without exrl
    - SAUCE: s390: use expoline thunks for all branches generated by the BPF JIT

  * Ubuntu18.04.1: cpuidle: powernv: Fix promotion from snooze if next state
    disabled (performance) (LP: #1790602)
    - cpuidle: powernv: Fix promotion from snooze if next state disabled

  * Watchdog CPU:19 Hard LOCKUP when kernel crash was triggered (LP: #1790636)
    - powerpc: hard disable irqs in smp_send_stop loop
    - powerpc: Fix deadlock with multiple calls to smp_send_stop
    - powerpc: smp_send_stop do not offline stopped CPUs
    - powerpc/powernv: Fix opal_event_shutdown() called with interrupts disabled

  * Security fix: check if IOMMU page is contained in the pinned physical page
    (LP: #1785675)
    - vfio/spapr: Use IOMMU pageshift rather than pagesize
    - KVM: PPC: Check if IOMMU page is contained in the pinned physical page

  * Missing Intel GPU pci-id's (LP: #1789924)
    - drm/i915/kbl: Add KBL GT2 sku
    - drm/i915/whl: Introducing Whiskey Lake platform
    - drm/i915/aml: Introducing Amber Lake platform
    - drm/i915/cfl: Add a new CFL PCI ID.

  * CVE-2018-15572
    - x86/speculation: Protect against userspace-userspace spectreRSB

  * Support Power Management for Thunderbolt Controller (LP: #1789358)
    - thunderbolt: Handle NULL boot ACL entries properly
    - thunderbolt: Notify userspace when boot_acl is changed
    - thunderbolt: Use 64-bit DMA mask if supported by the platform
    - thunderbolt: Do not unnecessarily call ICM get route
    - thunderbolt: No need to take tb->lock in domain suspend/complete
    - thunderbolt: Use correct ICM commands in system suspend
    - thunderbolt: Add support for runtime PM

  * random oopses on s390 systems using NVMe devices (LP: #1790480)
    - s390/pci: fix out of bounds access during irq setup

  * [Bionic] Spectre v4 mitigation (Speculative Store Bypass Disable) support
    for arm64 using SMC firmware call to set a hardware chicken bit
    (LP: #1787993) // CVE-2018...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers