[LTC Test] Ubuntu 18.04: tm_sigreturn failed on P8 compat mode 16.04.04 guest

Bug #1771439 reported by bugproxy
Affects                            Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project   Fix Released  Medium      Canonical Kernel Team
linux (Ubuntu)                     Fix Released  Medium      Joseph Salisbury
Xenial                             Fix Released  Medium      Joseph Salisbury

Bug Description

== SRU Justification ==
IBM is seeing tm_sigreturn test failures on P8 and P9 hosts. The bad thing
exception is being raised when executing the following line:

   c00000000004fcfc: b0 04 03 e8 ld r0,1200(r3)
-> c00000000004fd00: a6 23 02 7c mtspr 130,r0

This mtspr restores TEXASR (SPR 130) in the thread.

ISA says "These registers can be written only when in Non-transactional
state" and the MSR is set to be transactional (suspended):

MSR: 8000000300201033 [ME][RI][IR][DR][LE][SF][HTM][TSU]

That explains why they are getting the "bad thing exception". A mtspr is
being called with a transaction suspended.
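
The MSR reading above can be checked by hand. The following is a minimal sketch, assuming the bit positions the kernel defines in arch/powerpc/include/asm/reg.h (MSR_TM_LG, MSR_TS_S_LG, MSR_TS_T_LG); the enum and helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Bit positions assumed to match arch/powerpc/include/asm/reg.h. */
#define MSR_TM_LG    32  /* Transactional Memory available */
#define MSR_TS_S_LG  33  /* transaction state: suspended */
#define MSR_TS_T_LG  34  /* transaction state: transactional */

/* Hypothetical names for the three values of the two-bit MSR[TS] field. */
enum msr_ts { MSR_TS_NONE = 0, MSR_TS_SUSP = 1, MSR_TS_TRANS = 2 };

/* Extract MSR[TS]: bit 33 is the low bit, bit 34 the high bit. */
static inline enum msr_ts msr_ts_field(uint64_t msr)
{
    return (enum msr_ts)((msr >> MSR_TS_S_LG) & 0x3);
}

/* Non-zero when MSR[TM] is set, i.e. TM is available to the thread. */
static inline int msr_tm_available(uint64_t msr)
{
    return (int)((msr >> MSR_TM_LG) & 0x1);
}
```

With these helpers, 0x8000000300201033 decodes to TM available with a suspended transaction, which matches the [HTM][TSU] flags in the dump.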

This test failure is fixed by upstream commit 78a3e8889b4b, which is in
mainline as of 4.8-rc5.

== Fix ==
78a3e8889b4b ("powerpc: signals: Discard transaction state from signal frames")

== Regression Potential ==
Low. Specific to powerpc.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

This test fails in the same way on a P8 host, so it has nothing to do with P9.

Many TM bugs have been fixed upstream since 4.4. I would suggest starting with commit 044215d145a7 ("powerpc/tm: Fix illegal TM state in signal handler", 2017-08-22) and seeing whether that helps.

The bad thing exception is being raised when executing the following line:

   c00000000004fcfc: b0 04 03 e8 ld r0,1200(r3)
-> c00000000004fd00: a6 23 02 7c mtspr 130,r0

This mtspr restores TEXASR (SPR 130) in the thread.
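
To confirm that the faulting bytes really encode a write to SPR 130, the instruction word can be decoded by hand. This is a rough sketch, assuming the usual XFX-form layout for mtspr (primary opcode 31, extended opcode 467, SPR number stored with its two 5-bit halves swapped); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded XFX-form fields. */
struct xfx_fields {
    unsigned opcode;  /* primary opcode, expected 31 for mtspr */
    unsigned rs;      /* source GPR number */
    unsigned spr;     /* SPR number after un-swapping the halves */
    unsigned xo;      /* extended opcode, expected 467 for mtspr */
};

/* Build the 32-bit word from the four bytes of a little-endian dump. */
static inline uint32_t le_bytes_to_word(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Split an instruction word into its XFX-form fields. */
static inline struct xfx_fields decode_xfx(uint32_t insn)
{
    struct xfx_fields f;
    uint32_t raw = (insn >> 11) & 0x3ff;  /* 10-bit swapped SPR field */

    f.opcode = insn >> 26;
    f.rs = (insn >> 21) & 0x1f;
    f.spr = ((raw & 0x1f) << 5) | (raw >> 5);
    f.xo = (insn >> 1) & 0x3ff;
    return f;
}
```

Feeding it the bytes a6 23 02 7c yields the word 0x7c0223a6, the same word marked as faulting in the upstream oops instruction dump quoted in the commit message.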

ISA says "These registers can be written only when in Non-transactional state" and the MSR is set to be transactional (suspended):

MSR: 8000000300201033 [ME][RI][IR][DR][LE][SF][HTM][TSU]

That explains why we are getting the "bad thing exception". A mtspr is being called with a transaction suspended.

I think we need the following commit to have this fixed:

commit 78a3e8889b4b6b99775ed954696ff3e017f5d19b
Author: Cyril Bur <email address hidden>
Date: Tue Aug 23 10:46:17 2016 +1000

    powerpc: signals: Discard transaction state from signal frames

    Userspace can begin and suspend a transaction within the signal
    handler which means they might enter sys_rt_sigreturn() with the
    processor in suspended state.

    sys_rt_sigreturn() wants to restore process context (which may have
    been in a transaction before signal delivery). To do this it must
    restore TM SPRS. To achieve this, any transaction initiated within the
    signal frame must be discarded in order to be able to restore TM SPRs
    as TM SPRs can only be manipulated non-transactionally.
    From the PowerPC ISA:
      TM Bad Thing Exception [Category: Transactional Memory]
       An attempt is made to execute a mtspr targeting a TM register in
       other than Non-transactional state.

    Not doing so results in a TM Bad Thing:
    [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
    [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
    [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
    [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
    [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
     nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
     xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
     ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
     uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
     scsi_transport_sas bnx2x ipr mdio libcrc32c
    [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
    [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
    [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
    [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0)
    [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000
    [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
    GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
    GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
    GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
    GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
    GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
    [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
    [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
    [12045.223630] Call Trace:
    [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
    [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
    [12045.223806] Instruction dump:
    [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
    [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
    [12045.224074] ---[ end trace cb8002ee240bae76 ]---

    It isn't clear exactly if there is really a use case for userspace
    returning with a suspended transaction, however, doing so doesn't (on
    its own) constitute a bad frame. As such, this patch simply discards
    the transactional state of the context calling the sigreturn and
    continues.

    Reported-by: Laurent Dufour <email address hidden>
    Signed-off-by: Cyril Bur <email address hidden>
    Tested-by: Laurent Dufour <email address hidden>
    Reviewed-by: Laurent Dufour <email address hidden>
    Acked-by: Simon Guo <email address hidden>
    Signed-off-by: Benjamin Herrenschmidt <email address hidden>

diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
index ba0a2a4..e32fdbb 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -167,6 +167,8 @@ signal will be rolled back anyway.
 For signals taken in non-TM or suspended mode, we use the
 normal/non-checkpointed stack pointer.

+Any transaction initiated inside a sighandler and suspended on return
+from the sighandler to the kernel will get reclaimed and discarded.

 Failure cause codes used by kernel
 ==================================
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index b6aa378..a7daf74 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1226,7 +1226,21 @@ long sys_rt_sigreturn(int r3, int r4, int r5, int r6, int r7, int r8,
   (regs->gpr[1] + __SIGNAL_FRAMESIZE + 16);
  if (!access_ok(VERIFY_READ, rt_sf, sizeof(*rt_sf)))
   goto bad;
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ /*
+ * If there is a transactional state then throw it away.
+ * The purpose of a sigreturn is to destroy all traces of the
+ * signal frame, this includes any transactional state created
+ * within in. We only check for suspended as we can never be
+ * active in the kernel, we are active, there is nothing better to
+ * do than go ahead and Bad Thing later.
+ * The cause is not important as there will never be a
+ * recheckpoint so it's not user visible.
+ */
+ if (MSR_TM_SUSPENDED(mfmsr()))
+  tm_reclaim_current(0);
+
  if (__get_user(tmp, &rt_sf->uc.uc_link))
   goto bad;
  uc_transact = (struct ucontext __user *)(uintptr_t)tmp;
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 7e49984..70409bb 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -676,7 +676,21 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, unsigned long r5,
  if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
   goto badframe;
  set_current_blocked(&set);
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ /*
+ * If there is a transactional state then throw it away.
+ * The purpose of a sigreturn is to destroy all traces of the
+ * signal frame, this includes any transactional state created
+ * within in. We only check for suspended as we can never be
+ * active in the kernel, we are active, there is nothing better to
+ * do than go ahead and Bad Thing later.
+ * The cause is not important as there will never be a
+ * recheckpoint so it's not user visible.
+ */
+ if (MSR_TM_SUSPENDED(mfmsr()))
+  tm_reclaim_current(0);
+
  if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
   goto badframe;
  if (MSR_TM_ACTIVE(msr)) {
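
The effect of the patch can be illustrated with a small user-space toy model. This is only a sketch of the control flow described in the commit message, not kernel code: every name below is hypothetical, and the "Bad Thing" is modelled as a flag rather than a real exception.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum model_ts { MODEL_TS_NONE, MODEL_TS_SUSPENDED };

struct model_cpu {
    enum model_ts ts;  /* stands in for MSR[TS] */
    bool bad_thing;    /* set when a TM SPR write would have trapped */
};

/* Stands in for tm_restore_sprs(): TM SPR writes trap unless non-transactional. */
static inline void model_restore_tm_sprs(struct model_cpu *cpu)
{
    if (cpu->ts != MODEL_TS_NONE)
        cpu->bad_thing = true;
}

/* Stands in for tm_reclaim_current(0): discard the transaction. */
static inline void model_reclaim(struct model_cpu *cpu)
{
    cpu->ts = MODEL_TS_NONE;
}

/* Sigreturn path; `patched` enables the check the commit adds. */
static inline bool model_sigreturn(struct model_cpu *cpu, bool patched)
{
    if (patched && cpu->ts == MODEL_TS_SUSPENDED)
        model_reclaim(cpu);
    model_restore_tm_sprs(cpu);
    return !cpu->bad_thing;  /* true when no Bad Thing was raised */
}
```

In this model, entering model_sigreturn() with a suspended transaction fails when patched is false and succeeds when it is true, which mirrors the before and after behaviour reported in this bug.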

== Breno Leitao <email address hidden> ==
That is exactly the commit id that solves the problem.

I was able to cherry-pick 78a3e8889b4b6b99775ed954696ff3e017f5d19b on top of Ubuntu-4.4.0-124.148, and the test now passes with no kernel messages logged.

1604 ? sudo dmesg -c > /dev/null
1604 ? ./tm-sigreturn
test: tm_sigreturn
tags: git_version:v4.17-rc5-0-g67b8d5c
success: tm_sigreturn
1604 ? dmesg
1604 ?

bugproxy (bugproxy) wrote : guest sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043
bugproxy (bugproxy) wrote : host sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
tags: added: triage-g
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 78a3e8889b4b6b99775ed954696ff3e017f5d19b. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1771439

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.

Thanks in advance!

Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-05-17 00:01 EDT-------
Tested with `4.4.0-124-generic #149~lp1771439`:

selftests: tm-resched-dscr
========================================
test: tm_resched_dscr
tags: git_version:unknown
Binding to cpu 8
main test running as pid 2792
Check DSCR TM context switch: OK
success: tm_resched_dscr
ok 1..1 selftests: tm-resched-dscr [PASS]
selftests: tm-syscall
========================================
test: tm_syscall
tags: git_version:unknown
Testing transactional syscalls for 10 seconds...
5464615 active and suspended transactions behaved correctly.
(There were 1565 transaction retries.)
success: tm_syscall
ok 1..2 selftests: tm-syscall [PASS]
selftests: tm-signal-msr-resv
========================================
test: tm_signal_msr_resv
tags: git_version:unknown
success: tm_signal_msr_resv
ok 1..3 selftests: tm-signal-msr-resv [PASS]
selftests: tm-signal-stack
========================================
test: tm_signal_stack
tags: git_version:unknown
success: tm_signal_stack
ok 1..4 selftests: tm-signal-stack [PASS]
selftests: tm-vmxcopy
========================================
test: tm_vmxcopy
tags: git_version:unknown
success: tm_vmxcopy
ok 1..5 selftests: tm-vmxcopy [PASS]
selftests: tm-fork
========================================
test: tm_fork
tags: git_version:unknown
success: tm_fork
ok 1..6 selftests: tm-fork [PASS]
selftests: tm-tar
========================================
Starting, 10000 loops
test: tm_tar
tags: git_version:unknown
success: tm_tar
ok 1..7 selftests: tm-tar [PASS]
selftests: tm-tmspr
========================================
test: tm_tmspr
tags: git_version:unknown
success: tm_tmspr
ok 1..8 selftests: tm-tmspr [PASS]
selftests: tm-vmx-unavail
========================================
test: tm_vmx_unavail_test
tags: git_version:unknown
success: tm_vmx_unavail_test
ok 1..9 selftests: tm-vmx-unavail [PASS]
selftests: tm-unavailable
========================================
test: tm_unavailable_test
tags: git_version:unknown
Checking if FP/VEC registers are sane after a FP unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VEC unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VSX unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
result: success
success: tm_unavailable_test
ok 1..10 selftests: tm-unavailable [PASS]
selftests: tm-trap
========================================
test: tm_trap_test
tags: git_version:unknown
Little-Endian machine detected. Checking if endianness flips inadvertently on trap in TM... no.
success: tm_trap_test
ok 1..11 selftests: tm-trap [PASS]
selftests: tm-signal-context-chk-gpr
========================================
test: tm_signal_context_chk_gpr
tags: git_version:unknown
success: tm_signal_context_chk_gpr
ok 1..12...


Joseph Salisbury (jsalisbury) wrote :
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy)
tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-128.154

---------------
linux (4.4.0-128.154) xenial; urgency=medium

  * linux: 4.4.0-128.154 -proposed tracker (LP: #1772960)

  * CVE-2018-3639 (x86)
    - x86/cpu: Make alternative_msr_write work for 32-bit code
    - x86/bugs: Fix the parameters alignment and missing void
    - KVM: SVM: Move spec control call after restore of GS
    - x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
    - x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
    - x86/cpufeatures: Disentangle SSBD enumeration
    - x86/cpu/AMD: Fix erratum 1076 (CPB bit)
    - x86/cpufeatures: Add FEATURE_ZEN
    - x86/speculation: Handle HT correctly on AMD
    - x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
    - x86/speculation: Add virtualized speculative store bypass disable support
    - x86/speculation: Rework speculative_store_bypass_update()
    - x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
    - x86/bugs: Expose x86_spec_ctrl_base directly
    - x86/bugs: Remove x86_spec_ctrl_set()
    - x86/bugs: Rework spec_ctrl base and mask logic
    - x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
    - KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
    - x86/bugs: Rename SSBD_NO to SSB_NO
    - KVM: VMX: Expose SSBD properly to guests.

  * [i915_bpo] Fix flickering issue after panel change (LP: #1770565)
    - drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
    - drm/i915: Name the "iboost bit"
    - drm/i915: Program iboost settings for HDMI/DVI on SKL
    - drm/i915: Move bxt_ddi_vswing_sequence() call into intel_ddi_pre_enable()
      for HDMI
    - drm/i915: Explicitly use ddi buf trans entry 9 for hdmi
    - drm/i915: Split DP/eDP/FDI and HDMI/DVI DDI buffer programming apart
    - drm/i915: Get the iboost setting based on the port type
    - drm/i915: Simplify intel_ddi_get_encoder_port()
    - drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
    - drm/i915: KBL - Recommended buffer translation programming for DisplayPort
    - drm/i915: Ignore OpRegion panel type except on select machines

  * [SRU][Bionic/Artful] fix false positives in W+X checking (LP: #1769696)
    - init: fix false positives in W+X checking

  * [Ubuntu 16.04] kernel: fix rwlock implementation (LP: #1761674)
    - SAUCE: (no-up) s390: fix rwlock implementation

  * linux < 4.11: unable to use netfilter logging from non-init namespaces
    (LP: #1766573)
    - netfilter: allow logging from non-init namespaces

  * [LTC Test] Ubuntu 18.04: tm_sigreturn failed on P8 compat mode 16.04.04
    guest (LP: #1771439)
    - powerpc: signals: Discard transaction state from signal frames

  * QCA9377 requires more IRAM banks for its new firmware (LP: #1748345)
    - ath10k: update the IRAM bank number for QCA9377

  * i915/kbl_dmc_ver1.bin failed with error -2 package 1.157.17 kernel
    4.4.0-116-generic (LP: #1752536)
    - ubuntu: i915_bpo - Add MODULE_FIRMWARE for Geminilake's DMC

  * Xenial update to 4.4.131 stable release (LP: #1768825)
    - ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS
    - ext4: set h_journal if there is a failure...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Brad Figg (brad-figg)
tags: added: cscc