Activity log for bug #1771439

Date Who What changed Old value New value Message
2018-05-15 21:40:36 bugproxy bug added bug
2018-05-15 21:40:38 bugproxy tags architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043
2018-05-15 21:40:56 bugproxy attachment added guest sosreport https://bugs.launchpad.net/bugs/1771439/+attachment/5139985/+files/sosreport-ubuntu1604-20180510221859.tar.xz
2018-05-15 21:41:20 bugproxy attachment added host sosreport https://bugs.launchpad.net/bugs/1771439/+attachment/5139986/+files/sosreport-ltc-boston17-20180510231913.tar.xz
2018-05-15 21:41:21 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2018-05-15 21:41:24 bugproxy affects ubuntu linux (Ubuntu)
2018-05-16 05:18:24 Frank Heimes tags architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 triage-g
2018-05-16 05:18:59 Frank Heimes bug task added ubuntu-power-systems
2018-05-16 05:19:05 Frank Heimes ubuntu-power-systems: status New Triaged
2018-05-16 05:19:16 Frank Heimes ubuntu-power-systems: importance Undecided Medium
2018-05-16 05:19:36 Frank Heimes ubuntu-power-systems: assignee Canonical Kernel Team (canonical-kernel-team)
2018-05-16 19:58:29 Joseph Salisbury linux (Ubuntu): importance Undecided Medium
2018-05-16 19:58:32 Joseph Salisbury linux (Ubuntu): status New Triaged
2018-05-16 20:13:18 Joseph Salisbury linux (Ubuntu): assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) Joseph Salisbury (jsalisbury)
2018-05-16 20:13:22 Joseph Salisbury linux (Ubuntu): status Triaged In Progress
2018-05-16 20:13:27 Joseph Salisbury nominated for series Ubuntu Xenial
2018-05-16 20:13:27 Joseph Salisbury bug task added linux (Ubuntu Xenial)
2018-05-16 20:13:34 Joseph Salisbury linux (Ubuntu Xenial): status New In Progress
2018-05-16 20:13:38 Joseph Salisbury linux (Ubuntu Xenial): importance Undecided Medium
2018-05-16 20:13:45 Joseph Salisbury linux (Ubuntu Xenial): assignee Joseph Salisbury (jsalisbury)
2018-05-17 04:17:04 Frank Heimes ubuntu-power-systems: status Triaged In Progress
2018-05-18 14:26:04 Joseph Salisbury description This test fails in the same way on a P8 host, so it is nothing to do with P9. There have been many TM bugs fixed upstream since 4.4. I would suggest starting with commit 044215d145a7 ("powerpc/tm: Fix illegal TM state in signal handler", 2017-08-22) and see if that helps. The bad thing exception is being raised when executing the following line: c00000000004fcfc: b0 04 03 e8 ld r0,1200(r3) -> c00000000004fd00: a6 23 02 7c mtspr 130,r0 Which is basically restoring TEXASR in the thread. ISA says "These registers can be written only when in Non-transactional state" and the MSR is set to be transactional (suspended): MSR: 8000000300201033 [ME][RI][IR][DR][LE][SF][HTM][TSU] That explains why we are getting the "bad thing exception". A mtspr is being called with a transaction suspended. I think we need the following commit to have this fixed: commit 78a3e8889b4b6b99775ed954696ff3e017f5d19b Author: Cyril Bur <cyrilbur@gmail.com> Date: Tue Aug 23 10:46:17 2016 +1000 powerpc: signals: Discard transaction state from signal frames Userspace can begin and suspend a transaction within the signal handler which means they might enter sys_rt_sigreturn() with the processor in suspended state. sys_rt_sigreturn() wants to restore process context (which may have been in a transaction before signal delivery). To do this it must restore TM SPRs. To achieve this, any transaction initiated within the signal frame must be discarded in order to be able to restore TM SPRs as TM SPRs can only be manipulated non-transactionally. From the PowerPC ISA: TM Bad Thing Exception [Category: Transactional Memory] An attempt is made to execute a mtspr targeting a TM register in other than Non-transactional state. 
Not doing so results in a TM Bad Thing: [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable] [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033) [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1] [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34 [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000 [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000 [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0) [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000 [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033 GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0 GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000 GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000 GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0 [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0 [12045.223630] Call 
Trace: [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0 [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108 [12045.223806] Instruction dump: [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 [12045.224074] ---[ end trace cb8002ee240bae76 ]--- It isn't clear exactly if there is really a use case for userspace returning with a suspended transaction, however, doing so doesn't (on its own) constitute a bad frame. As such, this patch simply discards the transactional state of the context calling the sigreturn and continues. Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Acked-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index ba0a2a4..e32fdbb 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -167,6 +167,8 @@ signal will be rolled back anyway. For signals taken in non-TM or suspended mode, we use the normal/non-checkpointed stack pointer. +Any transaction initiated inside a sighandler and suspended on return +from the sighandler to the kernel will get reclaimed and discarded. 
Failure cause codes used by kernel ================================== diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index b6aa378..a7daf74 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1226,7 +1226,21 @@ long sys_rt_sigreturn(int r3, int r4, int r5, int r6, int r7, int r8, (regs->gpr[1] + __SIGNAL_FRAMESIZE + 16); if (!access_ok(VERIFY_READ, rt_sf, sizeof(*rt_sf))) goto bad; + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * If there is a transactional state then throw it away. + * The purpose of a sigreturn is to destroy all traces of the + * signal frame, this includes any transactional state created + * within in. We only check for suspended as we can never be + * active in the kernel, we are active, there is nothing better to + * do than go ahead and Bad Thing later. + * The cause is not important as there will never be a + * recheckpoint so it's not user visible. + */ + if (MSR_TM_SUSPENDED(mfmsr())) + tm_reclaim_current(0); + if (__get_user(tmp, &rt_sf->uc.uc_link)) goto bad; uc_transact = (struct ucontext __user *)(uintptr_t)tmp; diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 7e49984..70409bb 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -676,7 +676,21 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, unsigned long r5, if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set))) goto badframe; set_current_blocked(&set); + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * If there is a transactional state then throw it away. + * The purpose of a sigreturn is to destroy all traces of the + * signal frame, this includes any transactional state created + * within in. We only check for suspended as we can never be + * active in the kernel, we are active, there is nothing better to + * do than go ahead and Bad Thing later. 
+ * The cause is not important as there will never be a + * recheckpoint so it's not user visible. + */ + if (MSR_TM_SUSPENDED(mfmsr())) + tm_reclaim_current(0); + if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) goto badframe; if (MSR_TM_ACTIVE(msr)) { == Breno Leitao <brenohl@br.ibm.com> == That is exactly the commit id that solves the problem. I was able to cherry pick 78a3e8889b4b6b99775ed954696ff3e017f5d19b on top of Ubuntu-4.4.0-124.148 and now the code works fine. 1604 ? sudo dmesg -c > /dev/null 1604 ? ./tm-sigreturn test: tm_sigreturn tags: git_version:v4.17-rc5-0-g67b8d5c success: tm_sigreturn 1604 ? dmesg 1604 ? == SRU Justification == IBM is seeing tm_sigreturn test failures on P8 and P9 hosts. The bad thing exception is being raised when executing the following line: c00000000004fcfc: b0 04 03 e8 ld r0,1200(r3) -> c00000000004fd00: a6 23 02 7c mtspr 130,r0 Which is basically restoring TEXASR in the thread. ISA says "These registers can be written only when in Non-transactional state" and the MSR is set to be transactional (suspended): MSR: 8000000300201033 [ME][RI][IR][DR][LE][SF][HTM][TSU] That explains why they are getting the "bad thing exception". A mtspr is being called with a transaction suspended. This test failure is fixed by upstream commit 78a3e8889b4b. Upstream commit 78a3e8889b4b is in mainline as of 4.8-rc5. == Fix == 78a3e8889b4b ("powerpc: signals: Discard transaction state from signal frames") == Regression Potential == Low. Specific to powerpc. == Test Case == A test kernel was built with this patch and tested by the original bug reporter. The bug reporter states the test kernel resolved the bug. This test fails in the same way on a P8 host, so it is nothing to do with P9. There have been many TM bugs fixed upstream since 4.4. I would suggest starting with commit 044215d145a7 ("powerpc/tm: Fix illegal TM state in signal handler", 2017-08-22) and see if that helps. 
The bad thing exception is being raised when executing the following line:          c00000000004fcfc: b0 04 03 e8 ld r0,1200(r3)   -> c00000000004fd00: a6 23 02 7c mtspr 130,r0 Which is basically restoring TEXASR in the thread. ISA says "These registers can be written only when in Non-transactional state" and the MSR is set to be transactional (suspended): MSR: 8000000300201033 [ME][RI][IR][DR][LE][SF][HTM][TSU] That explains why we are getting the "bad thing exception". A mtspr is being called with a transaction suspended. I think we need the following commit to have this fixed: commit 78a3e8889b4b6b99775ed954696ff3e017f5d19b Author: Cyril Bur <cyrilbur@gmail.com> Date: Tue Aug 23 10:46:17 2016 +1000     powerpc: signals: Discard transaction state from signal frames     Userspace can begin and suspend a transaction within the signal     handler which means they might enter sys_rt_sigreturn() with the     processor in suspended state.     sys_rt_sigreturn() wants to restore process context (which may have     been in a transaction before signal delivery). To do this it must     restore TM SPRs. To achieve this, any transaction initiated within the     signal frame must be discarded in order to be able to restore TM SPRs     as TM SPRs can only be manipulated non-transactionally.     From the PowerPC ISA:       TM Bad Thing Exception [Category: Transactional Memory]        An attempt is made to execute a mtspr targeting a TM register in        other than Non-transactional state.     
Not doing so results in a TM Bad Thing:     [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]     [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)     [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]     [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV     [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE      nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4      xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter      ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm      uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure      scsi_transport_sas bnx2x ipr mdio libcrc32c     [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34     [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000     [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000     [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0)     [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000     [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033     GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0     GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000     GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000     GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000     GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000     GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000     GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000     GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0     [12045.223535] NIP [c000000000050a40] 
tm_restore_sprs+0xc/0x1c     [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0     [12045.223630] Call Trace:     [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0     [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108     [12045.223806] Instruction dump:     [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8     [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020     [12045.224074] ---[ end trace cb8002ee240bae76 ]---     It isn't clear exactly if there is really a use case for userspace     returning with a suspended transaction, however, doing so doesn't (on     its own) constitute a bad frame. As such, this patch simply discards     the transactional state of the context calling the sigreturn and     continues.     Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>     Signed-off-by: Cyril Bur <cyrilbur@gmail.com>     Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>     Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>     Acked-by: Simon Guo <wei.guo.simon@gmail.com>     Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index ba0a2a4..e32fdbb 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -167,6 +167,8 @@ signal will be rolled back anyway.  For signals taken in non-TM or suspended mode, we use the  normal/non-checkpointed stack pointer. +Any transaction initiated inside a sighandler and suspended on return +from the sighandler to the kernel will get reclaimed and discarded.  
Failure cause codes used by kernel  ================================== diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index b6aa378..a7daf74 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1226,7 +1226,21 @@ long sys_rt_sigreturn(int r3, int r4, int r5, int r6, int r7, int r8,    (regs->gpr[1] + __SIGNAL_FRAMESIZE + 16);   if (!access_ok(VERIFY_READ, rt_sf, sizeof(*rt_sf)))    goto bad; +  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * If there is a transactional state then throw it away. + * The purpose of a sigreturn is to destroy all traces of the + * signal frame, this includes any transactional state created + * within in. We only check for suspended as we can never be + * active in the kernel, we are active, there is nothing better to + * do than go ahead and Bad Thing later. + * The cause is not important as there will never be a + * recheckpoint so it's not user visible. + */ + if (MSR_TM_SUSPENDED(mfmsr())) + tm_reclaim_current(0); +   if (__get_user(tmp, &rt_sf->uc.uc_link))    goto bad;   uc_transact = (struct ucontext __user *)(uintptr_t)tmp; diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 7e49984..70409bb 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -676,7 +676,21 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, unsigned long r5,   if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))    goto badframe;   set_current_blocked(&set); +  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * If there is a transactional state then throw it away. + * The purpose of a sigreturn is to destroy all traces of the + * signal frame, this includes any transactional state created + * within in. We only check for suspended as we can never be + * active in the kernel, we are active, there is nothing better to + * do than go ahead and Bad Thing later. 
+ * The cause is not important as there will never be a + * recheckpoint so it's not user visible. + */ + if (MSR_TM_SUSPENDED(mfmsr())) + tm_reclaim_current(0); +   if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))    goto badframe;   if (MSR_TM_ACTIVE(msr)) { == Breno Leitao <brenohl@br.ibm.com> == That is exactly the commit id that solves the problem. I was able to cherry pick 78a3e8889b4b6b99775ed954696ff3e017f5d19b on top of Ubuntu-4.4.0-124.148 and now the code works fine. 1604 ? sudo dmesg -c > /dev/null 1604 ? ./tm-sigreturn test: tm_sigreturn tags: git_version:v4.17-rc5-0-g67b8d5c success: tm_sigreturn 1604 ? dmesg 1604 ?
2018-05-23 15:09:43 Stefan Bader linux (Ubuntu Xenial): status In Progress Fix Committed
2018-05-23 16:22:51 Frank Heimes ubuntu-power-systems: status In Progress Fix Committed
2018-05-28 14:03:55 Brad Figg tags architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 triage-g architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 triage-g verification-needed-xenial
2018-05-28 16:29:48 bugproxy tags architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 triage-g verification-needed-xenial architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 triage-g verification-done-xenial
2018-06-11 15:09:13 Launchpad Janitor linux (Ubuntu Xenial): status Fix Committed Fix Released
2018-06-11 15:09:13 Launchpad Janitor cve linked 2017-5715
2018-06-11 15:09:13 Launchpad Janitor cve linked 2017-5753
2018-06-11 15:09:13 Launchpad Janitor cve linked 2018-3639
2018-06-11 15:09:13 Launchpad Janitor cve linked 2018-8087
2018-06-18 14:22:33 Manoj Iyer ubuntu-power-systems: status Fix Committed Fix Released
2018-07-19 19:10:09 Joseph Salisbury linux (Ubuntu): status In Progress Fix Released
2019-07-24 21:09:28 Brad Figg tags architecture-ppc64le bugnameltc-167739 severity-medium targetmilestone-inin16043 triage-g verification-done-xenial architecture-ppc64le bugnameltc-167739 cscc severity-medium targetmilestone-inin16043 triage-g verification-done-xenial
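
Note on the MSR values quoted in the description change of 2018-05-18: the diagnosis rests on reading the transaction-state bits out of the MSR. The following is a minimal user-space sketch in C of that decoding, not kernel code. The bit positions (32, 33, 34) follow the MSR_TM_LG, MSR_TS_S_LG and MSR_TS_T_LG definitions in the Linux powerpc headers; the enum and the function names msr_tm_state, msr_tm_enabled and tm_spr_write_allowed are hypothetical and introduced only for illustration.

```c
#include <stdint.h>

/* Bit positions of the TM-related MSR fields, matching MSR_TM_LG,
 * MSR_TS_S_LG and MSR_TS_T_LG in the Linux powerpc headers. */
#define MSR_TM_BIT   32 /* TM facility available */
#define MSR_TS_S_BIT 33 /* transaction state: suspended */
#define MSR_TS_T_BIT 34 /* transaction state: transactional */

/* Hypothetical names, used only in this sketch. */
enum tm_state {
    TM_NONTRANSACTIONAL,
    TM_SUSPENDED,
    TM_TRANSACTIONAL,
    TM_RESERVED /* both TS bits set */
};

/* Decode the two-bit transaction state field of an MSR value. */
static enum tm_state msr_tm_state(uint64_t msr)
{
    int suspended     = (int)((msr >> MSR_TS_S_BIT) & 1);
    int transactional = (int)((msr >> MSR_TS_T_BIT) & 1);

    if (suspended && transactional)
        return TM_RESERVED;
    if (transactional)
        return TM_TRANSACTIONAL;
    if (suspended)
        return TM_SUSPENDED;
    return TM_NONTRANSACTIONAL;
}

/* Report whether the TM facility bit is set in an MSR value. */
static int msr_tm_enabled(uint64_t msr)
{
    return (int)((msr >> MSR_TM_BIT) & 1);
}

/* Per the ISA text quoted in the description, a mtspr to a TM register
 * is only legal in Non-transactional state. */
static int tm_spr_write_allowed(uint64_t msr)
{
    return msr_tm_state(msr) == TM_NONTRANSACTIONAL;
}
```

Applied to the reported value 0x8000000300201033, msr_tm_state() yields TM_SUSPENDED and tm_spr_write_allowed() yields 0, which matches the [HTM][TSU] annotation and the resulting TM Bad Thing exception. The fix in commit 78a3e8889b4b performs the equivalent check with MSR_TM_SUSPENDED(mfmsr()) and calls tm_reclaim_current(0) before the TM SPRs are restored.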