Power9 DD 2.2 needs HMI fixup backport of upstream patch(d075745d893c78730e4a3b7a60fca23c2f764081) into kernel

Bug #1751834 reported by bugproxy on 2018-02-26
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Critical
Canonical Kernel Team
linux (Ubuntu)
Critical
Seth Forshee

Bug Description

== Comment: #0 - Satheesh Rajendran <email address hidden> - 2018-02-23 06:55:23 ==
---Problem Description---
Power9 DD 2.2 needs HMI fixup backport of upstream patch(d075745d893c78730e4a3b7a60fca23c2f764081) into kernel

---uname output---
4.15.0-10-generic

Machine Type = power9 boston 2.2 (pvr 004e 1202)

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 commit d075745d893c78730e4a3b7a60fca23c2f764081
Author: Paul Mackerras <email address hidden>
Date: Wed Jan 17 20:51:13 2018 +1100

    KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9

Dependancy commit, looks like already present.
commit 5080332c2c893118dbc18755f35c8b0131cf0fc4
Author: Michael Neuling <email address hidden>
Date: Fri Sep 15 15:25:48 2017 +1000

    powerpc/64s: Add workaround for P9 vector CI load issue

Contact Information = <email address hidden>

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for <email address hidden>:
-Attach sysctl -a output output to the bug.

CVE References

bugproxy (bugproxy) on 2018-02-26
tags: added: architecture-ppc64le bugnameltc-165074 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Andrew Cloke (andrew-cloke) wrote :

When you say "Power9 DD 2.2 needs HMI fixup backport of..." in the original bug description. Is this patchset required to boot P9 DD2.2?

Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g

------- Comment From <email address hidden> 2018-02-27 00:21 EDT-------
(In reply to comment #5)
> When you say "Power9 DD 2.2 needs HMI fixup backport of..." in the original
> bug description. Is this patchset required to boot P9 DD2.2?

We need this patch for KVM guest testing on DD 2.2 revision. Good explanation on why this is required is in commit pointed.

==========================================================_
Hypervisor maintenance interrupts (HMIs) are generated by various
causes, signalled by bits in the hypervisor maintenance exception
register (HMER). In most cases calling OPAL to handle the interrupt
is the correct thing to do, but the "debug trigger" HMIs signalled by
PPC bit 17 (bit 46) of HMER are used to invoke software workarounds
for hardware bugs, and OPAL does not have any code to handle this
cause. The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips
to work around a hardware bug in executing vector load instructions to
cache inhibited memory. In POWER9 DD2.2 chips, it is generated when
conditions are detected relating to threads being in TM (transactional
memory) suspended mode when the core SMT configuration needs to be
reconfigured.

The kernel currently has code to detect the vector CI load condition,
but only when the HMI occurs in the host, not when it occurs in a
guest. If a HMI occurs in the guest, it is always passed to OPAL, and
then we always re-sync the timebase, because the HMI cause might have
been a timebase error, for which OPAL would re-sync the timebase, thus
removing the timebase offset which KVM applied for the guest. Since
we don't know what OPAL did, we don't know whether to subtract the
timebase offset from the timebase, so instead we re-sync the timebase.

This adds code to determine explicitly what the cause of a debug
trigger HMI will be. This is based on a new device-tree property
under the CPU nodes called ibm,hmi-special-triggers, if it is
present, or otherwise based on the PVR (processor version register).
The handling of debug trigger HMIs is pulled out into a separate
function which can be called from the KVM guest exit code. If this
function handles and clears the HMI, and no other HMI causes remain,
then we skip calling OPAL and we proceed to subtract the guest
timebase offset from the timebase.

The overall handling for HMIs that occur in the host (i.e. not in a
KVM guest) is largely unchanged, except that we now don't set the flag
for the vector CI load workaround on DD2.2 processors.

This also removes a BUG_ON in the KVM code. BUG_ON is generally not
useful in KVM guest entry/exit code since it is difficult to handle
the resulting trap gracefully.
==========================================================_

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-02-27 01:52 EDT-------
Ubuntu,

we need this patch to be in bionic kernel asap. Without which we are having hardlock issues with KVM on DD 2.2.

Changed in ubuntu-power-systems:
status: New → Triaged
tags: added: kernel-key
Seth Forshee (sforshee) on 2018-02-27
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Seth Forshee (sforshee)
importance: Undecided → Medium
status: New → In Progress
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: Triaged → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-05 23:31 EDT-------
Any updates on which build would this patch be included? this is currently gating KVM testing on DD 2.2.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-06 09:43 EDT-------
This will be added to canonical bug scrub list for 3/08.

Manoj Iyer (manjo) on 2018-03-06
Changed in ubuntu-power-systems:
status: Fix Committed → Triaged
tags: added: triage-r
removed: triage-g
Manoj Iyer (manjo) on 2018-03-06
Changed in linux (Ubuntu):
importance: Medium → Critical
Diane Brent (drbrent) wrote :

Manoj, Frank - what is availability outlook for this?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-08 10:20 EDT-------
target availability is week of 3/11.

Manoj Iyer (manjo) on 2018-03-12
Changed in ubuntu-power-systems:
status: Triaged → Fix Committed
tags: added: triage-g
removed: triage-r
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-12 23:52 EDT-------
Tested on `4.15.0-12-generic #13-Ubuntu` and found KVM tests are running fine.

This bz can be closed.

Regards,
-Satheesh.

Launchpad Janitor (janitor) wrote :
Download full text (40.0 KiB)

This bug was fixed in the package linux - 4.15.0-12.13

---------------
linux (4.15.0-12.13) bionic; urgency=medium

  * linux: 4.15.0-12.13 -proposed tracker (LP: #1754059)

  * CONFIG_EFI=y on armhf (LP: #1726362)
    - [Config] CONFIG_EFI=y on armhf, reconcile secureboot EFI settings

  * ppc64el: Support firmware disable of RFI flush (LP: #1751994)
    - powerpc/pseries: Support firmware disable of RFI flush
    - powerpc/powernv: Support firmware disable of RFI flush

  * [Feature] CFL/CNL (PCH:CNP-H): New GPIO Commit added (GPIO Driver needed)
    (LP: #1751714)
    - gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation
    - pinctrl: intel: Allow custom GPIO base for pad groups
    - pinctrl: cannonlake: Align GPIO number space with Windows

  * [Feature] Add xHCI debug device support in the driver (LP: #1730832)
    - usb: xhci: Make some static functions global
    - usb: xhci: Add DbC support in xHCI driver
    - [Config] USB_XHCI_DBGCAP=y for commit mainline dfba2174dc42.

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * [Feature] [Graphics]Whiskey Lake (Coffelake-U 4+2) new PCI Device ID adds
    (LP: #1742561)
    - drm/i915/cfl: Adding more Coffee Lake PCI IDs.

  * [Bug] [USB Function][CFL-CNL PCH]Stall Error and USB Transaction Error in
    trace, Disable of device-initiated U1/U2 failed and rebind failed: -517
    during suspend/resume with usb storage. (LP: #1730599)
    - usb: Don't print a warning if interface driver rebind is deferred at resume

  * retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
    - [Packaging] retpoline -- elide %cs:0xNNNN constants on i386
    - [Config] retpoline -- clean up i386 retpoline files

  * hisilicon hibmc regression due to ea642c3216cb ("drm/ttm: add io_mem_pfn
    callback") (LP: #1738334)
    - drm/ttm: add ttm_bo_io_mem_pfn to check io_mem_pfn

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * [18.04 FEAT] Automatically detect layer2 setting in the qeth device driver
    (LP: #1747639)
    - s390/diag: add diag26c support for VNIC info
    - s390/qeth: support early setup for z/VM NICs

  * Bionic update to v4.15.7 stable release (LP: #1752317)
    - netfilter: drop outermost socket lock in getsockopt()
    - arm64: mm: don't write garbage into TTBR1_EL1 register
    - kconfig.h: Include compiler types to avoid missed struct attributes
    - MIPS: boot: Define __ASSEMBLY__ for its.S build
    - xtensa: fix high memory/reserved memory collision
    - scsi: ibmvfc: fix misde...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Manoj Iyer (manjo) on 2018-03-19
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers