Offlined CPUs of a core fail to come up online on POWER9 DD1 (Ubuntu 17.04)

Bug #1685792 reported by bugproxy
This bug affects 1 person
Affects                            Status        Importance  Assigned to       Milestone
The Ubuntu-power-systems project   Fix Released  High        Unassigned
linux (Ubuntu)                     Fix Released  High        Joseph Salisbury
Zesty                              Fix Released  High        Joseph Salisbury

Bug Description

== Comment: #0 - Ranjal G. Shenoy <email address hidden> - 2017-04-20 08:29:13 ==
Power9 DD1 has a hardware issue: in a core whose threads have all been offlined, any CPU that is subsequently onlined comes up with an incorrect PACA, which results in a crash.
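
For anyone trying to reproduce this, the sequence can be sketched in shell using the standard sysfs hotplug files. This is only a sketch under stated assumptions: the helper names are made up for illustration, the core is assumed to have four consecutively numbered threads (CPUs 8-11 in the example), and `CPU_ROOT` is an illustrative override so the logic can be dry-run against a scratch directory. On an unfixed DD1 kernel the final online step is the one expected to crash.

```shell
#!/bin/sh
# Sketch: offline every hardware thread of one core, then bring one back.
# CPU_ROOT is a hypothetical override; the default is the real sysfs location.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# set_cpu_online CPU STATE: write 0 (offline) or 1 (online) to the CPU's
# hotplug control file.
set_cpu_online() {
    printf '%s\n' "$2" > "$CPU_ROOT/cpu$1/online"
}

# offline_core FIRST COUNT: offline COUNT consecutive CPUs starting at FIRST.
# Assumes the threads of a core are numbered consecutively.
offline_core() {
    cpu=$1
    last=$(( $1 + $2 - 1 ))
    while [ "$cpu" -le "$last" ]; do
        set_cpu_online "$cpu" 0
        cpu=$(( cpu + 1 ))
    done
}

# Example (needs root on real hardware; CPUs 8-11 are an assumed core):
#   offline_core 8 4
#   set_cpu_online 8 1
#   cat "$CPU_ROOT/cpu8/online"
```

The actual thread grouping on a given machine can be read from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` before choosing the CPU range.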

The following fixes sent upstream need to be applied to the 17.04 kernel to get CPU-Hotplug working correctly on a Power9 DD1 system.

1) powerpc/linux.git next, commit a7cd88da9704 ("powernv: Move CPU-Offline idle state invocation from smp.c to idle.c").

2) powerpc/linux.git next, commit 900612315788 ("powerpc/powernv/smp: Add busy-wait loop as fall back for CPU-Hotplug").

3) powerpc/linux.git, commit f3b3f28493d932 ("powerpc/powernv/idle: Don't override default/deepest directly in kernel").

4) powerpc/linux.git next, commit 17ed4c8f81da ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1").

These patches have been backported onto the 17.04 Kernel tagged Ubuntu-4.10.0-19.21.
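
One way to check whether a given kernel tree already carries all four fixes is to search the commit subjects rather than the hashes, since backported commits get new hashes. The sketch below assumes the backports keep the tail of the upstream subject line; `missing_subjects` is a made-up helper name, and it reads subjects on stdin so it can be fed from `git log`.

```shell
#!/bin/sh
# missing_subjects: read commit subject lines on stdin and print every
# expected fix whose subject text is not found among them.
# Matching on the tail of each subject is an assumption, chosen so that both
# the upstream titles and the shorter titles of the attached patches match.
missing_subjects() {
    log=$(cat)
    while IFS= read -r subject; do
        printf '%s\n' "$log" | grep -qF -- "$subject" || printf '%s\n' "$subject"
    done <<'EOF'
Move CPU-Offline idle state invocation from smp.c to idle.c
Add busy-wait loop as fall back for CPU-Hotplug
Don't override default/deepest directly in kernel
Recover correct PACA on wakeup from a stop on P9 DD1
EOF
}

# Example, run from the top of a kernel git tree:
#   git log --format=%s -- arch/powerpc | missing_subjects
```

Empty output means all four subjects were found; any line printed names a fix that still appears to be missing.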

== Comment: #1 - Ranjal G. Shenoy <email address hidden> - 2017-04-20 08:29:54 ==

== Comment: #2 - Ranjal G. Shenoy <email address hidden> - 2017-04-20 08:30:37 ==

== Comment: #3 - Ranjal G. Shenoy <email address hidden> - 2017-04-20 08:31:15 ==

Revision history for this message
bugproxy (bugproxy) wrote : [1/4] powernv: Move CPU-Offline idle state invocation from smp.c to idle.c

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-153589 severity-high targetmilestone-inin1704
bugproxy (bugproxy) wrote : [2/4] powernv:smp: Add busy-wait loop as fall back for CPU-Hotplug

Default Comment by Bridge

bugproxy (bugproxy) wrote : [3/4]: powernv:idle: Don't override default/deepest directly in kernel

Default Comment by Bridge

bugproxy (bugproxy) wrote : [4/4] powernv: Recover correct PACA on wakeup from a stop on P9 DD1

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
tags: added: kernel-da-key
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with these four patches. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1685792/

Manoj Iyer (manjo)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
importance: Undecided → High
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-05-04 09:01 EDT-------
(In reply to comment #10)
> I built a test kernel with these four patches. The test kernel can be
> downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1685792/

I have verified that CPU-Hotplug functions correctly with this kernel on a machine booted with upstream skiboot.

It can be observed that the CPUs were put into stop1 on a hotplug.
============================================================
root@ubuntu:/home/ego# dmesg | grep -i "deep" | grep stop
[ 0.702036] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300331,mask=0x00000000003003ff
[ 0.702185] cpuidle-powernv: Requested Level (RL) value of first deep stop = 0xf
============================================================
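
The psscr value in the log can be decoded by hand to confirm the stop level. The sketch below assumes the PSSCR field layout as I recall it from the powerpc kernel headers (RL in bits 0-3, MTL in bits 4-7, TR in bits 8-9, EC at bit 20, ESL at bit 21, counting from the least significant bit); `decode_psscr` is a made-up helper name. Under that layout the logged mask 0x3003ff covers exactly these five fields.

```shell
#!/bin/sh
# decode_psscr VALUE: print selected PSSCR fields of a stop-state request.
# The bit positions are an assumed layout, not taken from this bug report.
decode_psscr() {
    v=$(( $1 ))
    printf 'RL=%d MTL=%d TR=%d EC=%d ESL=%d\n' \
        $(( v & 0xF )) \
        $(( (v >> 4) & 0xF )) \
        $(( (v >> 8) & 0x3 )) \
        $(( (v >> 20) & 0x1 )) \
        $(( (v >> 21) & 0x1 ))
}

# Value reported as "Deepest stop" in the dmesg output above.
decode_psscr 0x0000000000300331
```

For the logged value this gives a Requested Level of 1, which matches the observation that the CPUs are put into stop1.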

Changed in linux (Ubuntu):
status: New → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I'd like to submit an SRU for these four patches. However, the fourth patch attached to this bug seems to differ from the version that landed in mainline.

The fourth patch posted in comment #4 does apply cleanly to Ubuntu 17.04. However, a cherry pick of 17ed4c8f81da from upstream 4.12-rc1 fails. The cherry pick fails because your patch has:

+ OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);

However, the mainline version has changed to:
+ DEFINE(PACA_SIBLING_PACA_PTRS,

Did you perform a backport of the fourth patch in the version you attached? If so, we can apply it to 17.04 that way, but I just want to confirm.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-05-16 02:51 EDT-------
Hello,

(In reply to comment #12)
> I'd like to submit an SRU for these four patches. However, the fourth
> patch attached to this bug seems to differ from the version that landed in mainline.
>
> The fourth patch posted in comment #4 does apply cleanly to Ubuntu 17.04.
> However, a cherry pick of 17ed4c8f81da from upstream 4.12-rc1 fails. The
> cherry pick fails because your patch has:
>
> + OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
>
> However, the mainline version has changed to:
> + DEFINE(PACA_SIBLING_PACA_PTRS,
>
> Did you perform a backport of the fourth patch in the version you attached?
> If so, we can apply it to 17.04 that way, but I just want to confirm.

Yes, the attached patches are backports. The fourth patch in particular had to be backported, because the upstream commit did not apply cleanly for exactly the reason you describe.

Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Manoj Iyer (manjo)
tags: added: ubuntu-17.04
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-06-19 14:12 EDT-------
Making the comment public:

------

Hi Kleber,

As mentioned in Comment #11, I have verified that the fixes integrated into the kernel provided in Comment #10 fix the problem.

Is there anything else that you would like me to verify?

tags: added: verification-done-zesty
removed: verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-26.30

---------------
linux (4.10.0-26.30) zesty; urgency=low

  * linux: 4.10.0-26.30 -proposed tracker (LP: #1700528)

  * CVE-2017-1000364
    - Revert "UBUNTU: SAUCE: mm: Only expand stack if guard area is hit"
    - Revert "mm: do not collapse stack gap into THP"
    - Revert "mm: enlarge stack guard gap"
    - mm: larger stack guard gap, between vmas
    - mm: fix new crash in unmapped_area_topdown()
    - Allow stack to grow up to address space limit

linux (4.10.0-25.29) zesty; urgency=low

  * linux: 4.10.0-25.29 -proposed tracker (LP: #1699028)

  * CVE-2017-1000364
    - SAUCE: mm: Only expand stack if guard area is hit

  * CVE-2017-9074
    - ipv6: Prevent overrun when parsing v6 header options
    - ipv6: Check ip6_find_1stfragopt() return value properly.

  * [Zesty] QDF2400 ARM64 server - NMI watchdog: BUG: soft lockup - CPU#8 stuck
    for 22s! (LP: #1680549)
    - iommu/dma: Stop getting dma_32bit_pfn wrong
    - iommu/dma: Implement PCI allocation optimisation
    - iommu/dma: Convert to address-based allocation
    - iommu/dma: Clean up MSI IOVA allocation
    - iommu/dma: Plumb in the per-CPU IOVA caches
    - iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range

  * Zesty update to 4.10.17 stable release (LP: #1692898)
    - xen: adjust early dom0 p2m handling to xen hypervisor behavior
    - target: Fix compare_and_write_callback handling for non GOOD status
    - target/fileio: Fix zero-length READ and WRITE handling
    - iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
    - usb: xhci: bInterval quirk for TI TUSB73x0
    - usb: host: xhci: print correct command ring address
    - USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit
    - USB: Proper handling of Race Condition when two USB class drivers try to
      call init_usb_class simultaneously
    - USB: Revert "cdc-wdm: fix "out-of-sync" due to missing notifications"
    - staging: vt6656: use off stack for in buffer USB transfers.
    - staging: vt6656: use off stack for out buffer USB transfers.
    - staging: gdm724x: gdm_mux: fix use-after-free on module unload
    - staging: wilc1000: Fix problem with wrong vif index
    - staging: comedi: jr3_pci: fix possible null pointer dereference
    - staging: comedi: jr3_pci: cope with jiffies wraparound
    - usb: misc: add missing continue in switch
    - usb: gadget: legacy gadgets are optional
    - usb: Make sure usb/phy/of gets built-in
    - usb: hub: Fix error loop seen after hub communication errors
    - usb: hub: Do not attempt to autosuspend disconnected devices
    - x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
    - selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug
    - x86, pmem: Fix cache flushing for iovec write < 8 bytes
    - um: Fix PTRACE_POKEUSER on x86_64
    - perf/x86: Fix Broadwell-EP DRAM RAPL events
    - KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
    - KVM: arm/arm64: fix races in kvm_psci_vcpu_on
    - arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses
    - block: fix blk_integrity_register to use templ...
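
To confirm that a machine is running at least the 4.10.0-26.30 package named above, the Ubuntu version signature can be compared against it. This is a rough sketch: both helper names are made up, the comparison relies on GNU `sort -V`, and the signature is assumed to have the form "Ubuntu 4.10.0-26.30-generic 4.10.17".

```shell
#!/bin/sh
# version_ge A B: succeed when version A sorts at or after version B.
# Relies on the GNU sort -V extension for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# package_version_from_signature SIG: take the second word of the signature
# and drop the trailing flavour suffix, e.g. "-generic".
package_version_from_signature() {
    set -- $1
    printf '%s\n' "${2%-*}"
}

# Example on an Ubuntu system:
#   running=$(package_version_from_signature "$(cat /proc/version_signature)")
#   version_ge "$running" 4.10.0-26.30 && echo "fixed package or newer"
```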

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
importance: Undecided → High
status: New → Fix Released
Brad Figg (brad-figg)
tags: added: cscc