TM (Hardware Transactional Memory) instructions halt application on baremetal POWER9 DD2.1

Bug #1799388 reported by bugproxy
This bug affects 1 person
Affects                           Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Fix Released  Medium      Canonical Kernel Team
linux (Ubuntu)                    Fix Released  Medium      Canonical Kernel Team
  Bionic                          Fix Released  Medium      Canonical Kernel Team
  Cosmic                          Fix Released  Medium      Canonical Kernel Team

Bug Description

== Comment: #0 - Michael Ranweiler <email address hidden> - 2018-10-15 08:55:55 ==
+++ This bug was initially created as a clone of Bug #167756 +++

---Problem Description---
TM (Hardware Transactional Memory) instructions halt application on baremetal POWER9 DD2.1

Contact Information = <email address hidden>

---Additional Hardware Info---
POWER9 DD2.1 (pvr 004e 1201) baremetal. Witherspoon.

Machine Type = POWER9 DD2.1 (pvr 004e 1201) baremetal

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Currently, TM (Hardware Transactional Memory) is not working as expected on baremetal POWER9 DD2.1 (pvr 004e 1201) with the latest firmware [1] (firmware level [2] was also tested with the same result) when running Linux from Linus' upstream tree [3] or the powerpc/next branch [4]. Once any TM instruction is executed in a process, the process hangs and has to be killed.

A simple test case is to execute 'tend.', which, according to POWER8 behaviour and ISA v3.00, "in Non-transactional state is treated as a no-op." [5]. The instruction in the test case below should therefore be treated as a plain 'nop' and the process should terminate normally. Instead, the process hangs and never terminates:

root@io83:~/gromero# cat t.c
int main() { asm ("tend.;"); }
root@io83:~/gromero# make t
make: 't' is up to date.
root@io83:~/gromero# ./t
^C
<CTRL-C was pressed to kill the process since PC got stuck at 'tend.' instruction forever>

Ubuntu stock kernel 4.15.0-20-generic was also tested with the same result.

I confirmed with Erich Hauptli (FW team) that FW stack levels [1] contain all the TM fixes we have at the moment.

Thus, this issue affects any application that uses TM on a baremetal POWER9 DD2.1 (pvr 004e 1201).

[1]
IBM-witherspoon-ibm-OP9_v1.19_1.160
    op-build-v1.21.2-255-g6ad1636-dirty
    buildroot-2017.11-5-g65679be
    skiboot-v5.10.5-op910-1
    hostboot-ed53939
    linux-4.14.24-openpower1-paed97e8
    petitboot-v1.6.6-p41a158a
    machine-xml-22224af
    occ-ef5d466
    hostboot-binaries-6a92b6d
    capp-ucode-p9-dd2-v3
    sbe-7e02c23

[2]
open-power-witherspoon-v1.22-82-gebe1295-dirty
 buildroot-2018.02.1-6-ga8d11267c2
 skiboot-v6.0-rc1
 hostboot-d9bf361-p6755b85
 occ-f741c41
 linux-4.16.7-openpower1-p945838d
 petitboot-v1.7.1-pd695626
 machine-xml-7cd20a6
 hostboot-binaries-53aece6
 capp-ucode-p9-dd2-v4
 sbe-8e0105e
 hcode-hw050318a.op920

[3] https://github.com/torvalds/linux.git (master)
[4] https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git (next)
[5] Power ISA - Book II, p. 894

Stack trace output:
 no

Oops output:
 no

Userspace tool common name: any tool using TM instruction set

Userspace rpm: not relevant

The userspace tool has the following bit modes: 64-bit

System Dump Info:
  The system is not configured to capture a system dump.

Userspace tool obtained from project website: na

*Additional Instructions for <email address hidden>:
-Attach sysctl -a output to the bug.
-Attach ltrace and strace of userspace application.

The following patch fixes this issue:

http://patchwork.ozlabs.org/patch/968375/

Author: Michael Neuling <email address hidden>
Date: Tue Sep 11 13:07:56 2018 +1000

    powerpc/tm: Fix HFSCR bit for no suspend case

    Currently on P9N DD2.1 we end up taking infinite TM facility
    unavailable exceptions on the first TM usage by userspace.

    In the special case of TM no suspend (P9N DD2.1), Linux is told TM is
    off via CPU dt-ftrs but told to (partially) use it via
    OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] will be off from
    dt-ftrs but we need to turn it on for the no suspend case.

    This patch fixes this by enabling HFSCR TM in this case.

    Cc: <email address hidden> # 4.15+
    Signed-off-by: Michael Neuling <email address hidden>

== Comment: #2 - Michael Ranweiler <email address hidden> - 2018-10-15 13:52:13 ==
This is in the powerpc -next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=dd9a8c5a87395b6f05552c3b44e42fdc95760552

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-172351 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Frank Heimes (fheimes)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit dd9a8c5a87395b6f0555. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1799388

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Andrew Cloke (andrew-cloke) wrote :

Marking as incomplete while awaiting feedback from IBM on the test kernel.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in linux (Ubuntu Bionic):
status: In Progress → Incomplete
Changed in linux (Ubuntu Cosmic):
status: In Progress → Incomplete
Andrew Cloke (andrew-cloke) wrote :

Lowering to medium priority due to lack of activity.

Changed in ubuntu-power-systems:
importance: High → Medium
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-01-04 01:24 EDT-------
Michael,

Can you please verify the above test kernel and update this bug with your results ASAP?

Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Cosmic):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo) wrote :

This patch was picked up in Cosmic http://bugs.launchpad.net/bugs/1810820 via updates, and should be available in the latest bionic-hwe kernel. Could you please test with the bionic-hwe kernel to see if this is still an issue?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-02-04 01:35 EDT-------
I ran the 4.18.0-14 kernel on my dd2.2 hardware (two boxes) and ran some tests. I didn't see any regressions from this; it looks good.

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Incomplete → Fix Released
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Bionic):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Cosmic):
status: Incomplete → Fix Released
Brad Figg (brad-figg)
tags: added: cscc