[LTC Test] Ubuntu 18.04: tm_trap_test failed on P8 compat mode guest

Bug #1762928 reported by bugproxy on 2018-04-11
This bug affects 1 person
Affects                             Importance   Assigned to
The Ubuntu-power-systems project    Critical     Canonical Kernel Team
linux (Ubuntu)                      Critical     Joseph Salisbury
linux (Ubuntu Bionic)               Critical     Joseph Salisbury

Bug Description

---Problem Description---
tm_trap_test fails on P8 compat mode guests (both the 16.04.04 daily build and bionic) on a P9 host running the bionic final beta.

Contact Information = <email address hidden>

---uname output---

16.04.04 Guest running in P8compat mode:

Linux guest 4.15.0-15-generic #16~16.04.1-Ubuntu SMP Thu Apr 5 12:18:22 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

18.04 guest running in P8compat mode:
Linux ubuntu 4.15.0-15-generic #16-Ubuntu SMP Wed Apr 4 13:57:51 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = boston-LC

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Log into the P9 Ubuntu 18.04 host and open the console of guest `srikanth_ubuntu160404`.
2. Run the TM selftests:
   git clone --depth 1 https://github.com/torvalds/linux.git
   cd linux/tools/testing/selftests/powerpc/
   git log --oneline -1
   make
   make -C tm run_tests

One of the TM tests fails, as shown below:

selftests: tm-trap
========================================
test: tm_trap_test
tags: git_version:c18bb39
Little-Endian machine detected. Checking if endianness flips inadvertently on trap in TM... yes!
failure: tm_trap_test
not ok 1..11 selftests: tm-trap [FAIL]
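
The failing test can be pulled out of a longer selftest log by filtering on the `failure:` lines shown above. A minimal sketch, assuming the log format matches this output; the function name `failed_selftests` is hypothetical and not part of the kernel tree:

```shell
# failed_selftests (hypothetical helper): read selftest output on stdin
# and print the name of each test that logged a "failure: <name>" line.
failed_selftests() {
    sed -n 's/^failure: //p'
}
```

For example, `make -C tm run_tests 2>&1 | failed_selftests` would list only the names of the failing tests.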

Expected result: All TM selftests should pass on the P8 compat guest, given that the TM workaround patches are in the latest Ubuntu Bionic host kernel.

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for <email address hidden>:
-Attach sysctl -a output to the bug.

== Comment: #2 - SRIKANTH AITHAL <email address hidden> - 2018-04-10 01:32:00 ==

== Comment: #3 - SRIKANTH AITHAL <email address hidden> - 2018-04-10 01:33:41 ==

== Comment: #4 - SRIKANTH AITHAL <email address hidden> - 2018-04-10 01:34:02 ==

Please pick

 `1c200e63d055 ("powerpc/tm: Fix endianness flip on trap")`

i.e. the commit that the test was written for.

tags: added: architecture-ppc64le bugnameltc-166570 severity-critical targetmilestone-inin---

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
affects: kernel-package (Ubuntu) → linux (Ubuntu)
tags: added: triage-g
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → Critical
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
bugproxy (bugproxy) on 2018-04-11
tags: added: targetmilestone-inin1804
removed: targetmilestone-inin---
Seth Forshee (sforshee) on 2018-04-12
Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
status: Confirmed → Fix Committed

------- Comment From <email address hidden> 2018-04-16 05:47 EDT-------
When are we getting patches included ?

Manoj Iyer (manjo) on 2018-04-16
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-22 08:44 EDT-------
(In reply to comment #15)
> When are we getting patches included ?

I just checked and this patch is already included in Ubuntu's tree starting at kernel Ubuntu-4.15.0-17.18, which is already on the proposed archive.

$ git tag --contains 46064a7b8975e551d59803cbaa73223fb38655ed
Ubuntu-4.15.0-16.17
Ubuntu-4.15.0-17.18
Ubuntu-4.15.0-18.19
Ubuntu-4.15.0-19.20
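
A tag list like the one above can be reduced to the earliest tagged release that carries the commit. This is a sketch only, assuming GNU `sort` with `-V` (version sort) is available; the helper name `earliest_tag` is hypothetical:

```shell
# earliest_tag (hypothetical helper): read tag names on stdin, one per
# line, and print the one that sorts lowest in version order.
# Assumes GNU sort, which provides -V.
earliest_tag() {
    sort -V | head -n 1
}
```

Usage: `git tag --contains 46064a7b8975e551d59803cbaa73223fb38655ed | earliest_tag`.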

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-19.20

---------------
linux (4.15.0-19.20) bionic; urgency=medium

  * linux: 4.15.0-19.20 -proposed tracker (LP: #1766021)

  * Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers (LP: #1765232)
    - Revert "blk-mq: simplify queue mapping & schedule with each possisble CPU"
    - Revert "genirq/affinity: assign vectors to all possible CPUs"

linux (4.15.0-18.19) bionic; urgency=medium

  * linux: 4.15.0-18.19 -proposed tracker (LP: #1765490)

  * [regression] Ubuntu 18.04:[4.15.0-17-generic #18] KVM Guest Kernel:
    meltdown: rfi/fallback displacement flush not enabled bydefault (kvm)
    (LP: #1765429)
    - powerpc/pseries: Fix clearing of security feature flags

  * signing: only install a signed kernel (LP: #1764794)
    - [Packaging] update to Debian like control scripts
    - [Packaging] switch to triggers for postinst.d postrm.d handling
    - [Packaging] signing -- switch to raw-signing tarballs
    - [Packaging] signing -- switch to linux-image as signed when available
    - [Config] signing -- enable Opal signing for ppc64el
    - [Packaging] printenv -- add signing options

  * [18.04 FEAT] Sign POWER host/NV kernels (LP: #1696154)
    - [Packaging] signing -- add support for signing Opal kernel binaries

  * Please cherrypick s390 unwind fix (LP: #1765083)
    - s390/compat: fix setup_frame32

  * Ubuntu 18.04 installer does not detect any IPR based HDD/RAID array [S822L]
    [ipr] (LP: #1751813)
    - d-i: move ipr to storage-core-modules on ppc64el

  * drivers/gpu/drm/bridge/adv7511/adv7511.ko missing (LP: #1764816)
    - SAUCE: (no-up) rename the adv7511 drm driver to adv7511_drm

  * Miscellaneous Ubuntu changes
    - [Packaging] Add linux-oem to rebuild test blacklist.

linux (4.15.0-17.18) bionic; urgency=medium

  * linux: 4.15.0-17.18 -proposed tracker (LP: #1764498)

  * Eventual OOM with profile reloads (LP: #1750594)
    - SAUCE: apparmor: fix memory leak when duplicate profile load

linux (4.15.0-16.17) bionic; urgency=medium

  * linux: 4.15.0-16.17 -proposed tracker (LP: #1763785)

  * [18.04] [bug] CFL-S(CNP)/CNL GPIO testing failed (LP: #1757346)
    - [Config]: Set CONFIG_PINCTRL_CANNONLAKE=y

  * [Ubuntu 18.04] USB Type-C test failed on GLK (LP: #1758797)
    - SAUCE: usb: typec: ucsi: Increase command completion timeout value

  * Fix trying to "push" an already active pool VP (LP: #1763386)
    - SAUCE: powerpc/xive: Fix trying to "push" an already active pool VP

  * hisi_sas: Revert and replace SAUCE patches w/ upstream (LP: #1762824)
    - Revert "UBUNTU: SAUCE: scsi: hisi_sas: export device table of v3 hw to
      userspace"
    - Revert "UBUNTU: SAUCE: scsi: hisi_sas: config for hip08 ES"
    - scsi: hisi_sas: modify some register config for hip08
    - scsi: hisi_sas: add v3 hw MODULE_DEVICE_TABLE()

  * Realtek card reader - RTS5243 [VEN_10EC&DEV_5260] (LP: #1737673)
    - misc: rtsx: Move Realtek Card Reader Driver to misc
    - updateconfigs for Realtek Card Reader Driver
    - misc: rtsx: Add support for RTS5260
    - misc: rtsx: Fix symbol clashes

  * Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in
    ./include/linux/net_dim.h (LP: #1...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Breno Leitão (breno-leitao) wrote :

Joseph, Thanks for the fix.

We would like the fix backported to the 16.04 kernels as well, since the problem is present there too. Could you please also target this bug against artful and xenial?

Thanks

bugproxy (bugproxy) on 2018-04-25
tags: added: severity-medium
removed: severity-critical
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Artful):
importance: Undecided → Critical
Changed in linux (Ubuntu Xenial):
importance: Undecided → Critical
Changed in linux (Ubuntu Artful):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built Artful and Xenial test kernels with the back port. The test kernels can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1762928/xenial
http://kernel.ubuntu.com/~jsalisbury/lp1762928/artful

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

------- Comment From <email address hidden> 2018-04-26 03:37 EDT-------
(In reply to comment #26)
> I built Artful and Xenial test kernels with the back port. The test kernels
> can be downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1762928/xenial
> http://kernel.ubuntu.com/~jsalisbury/lp1762928/artful
>
> Can you test this kernel and see if it resolves this bug?
>
> Note, to test this kernel, you need to install both the linux-image and
> linux-image-extra .deb packages.
>
> Thanks in advance!

Thanks for the packages.

1. artful: tested on the latest Ubuntu 18.04 host; the result is PASS.
2. xenial: the link http://kernel.ubuntu.com/~jsalisbury/lp1762928/xenial does not have any ppc64el packages.
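
A quick check for packages of the right architecture in a download directory, before attempting an install. This sketch assumes the usual Debian file naming (`name_version_arch.deb`); `has_arch_debs` is a hypothetical helper, not an existing tool:

```shell
# has_arch_debs (hypothetical helper): read file names on stdin and
# succeed if at least one ends in "_<arch>.deb".
# $1: architecture string as used in package file names, e.g. ppc64el
has_arch_debs() {
    grep -q "_${1}\.deb\$"
}
```

Usage: `ls *.deb | has_arch_debs ppc64el && echo "ppc64el packages present"`.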

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-26 04:04 EDT-------
In addition to the above, I see a new failure on artful:

========================================
test: tm_unavailable_test
tags: git_version:69bfd470
Checking if FP/VEC registers are sane after a FP unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC corrupted! high = 0 low = 0
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC corrupted! high = 0 low = 0
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VEC unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP corrupted! high = 0 low = 0 VEC ok
If MSR.FP=1 MSR.VEC=0: FP corrupted! high = 0x2000 low = 0 VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VSX unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP corrupted! high = 0x2000 low = 0 VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC corrupted! high = 0 low = 0
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
result: failed!
failure: tm_unavailable_test
not ok 1..10 selftests: tm-unavailable [FAIL]
selftests: tm-trap
========================================

whereas tm_trap is a PASS.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-26 08:10 EDT-------
(In reply to comment #28)
> Adding to above.. in artful I see a new failure:

Please open a different bug for this one, since it requires a new patch.

> where as tm_trap is a PASS...

Good. Let's consider the artful kernel good at this time and ready to be committed.

Thanks for the quick tests.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-26 08:47 EDT-------
(In reply to comment #29)
> (In reply to comment #28)
> > Adding to above.. in artful I see a new failure:
>
> Please open a different bug for this one, since this requires a new patch.

The possible new patch would be f48e91e87e67b56bef63393d1a02c6e22c1d7078, but we can handle it in the new bug.

Joseph Salisbury (jsalisbury) wrote :

ppc64el versions of the Xenial test kernel are now available here:

 http://kernel.ubuntu.com/~jsalisbury/lp1762928/xenial

Xenial required the following two prereq commits:
d96f234f47af ("powerpc: Avoid load hit store in setup_sigcontext()")
d11994314b2b ("powerpc: signals: Stop using current in signal code")

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-05 10:31 EDT-------
Yesterday, the decision was made at Padma's daily KVM meeting to only track System Firmware Mustfix issues using the LC GA1 Mustfix label since that is all that applies to the Supermicro team. The OS Kernel/KVM issues will be managed with a spreadsheet tracked by the KVM team and also in the internal slack channel. Removing the Mustfix label.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-08 04:22 EDT-------
(In reply to comment #31)
> ppc64el versions of the Xenial test kernel are now available here:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1762928/xenial
>
> Xenial required the following two prereq commits:
> d96f234f47af ("powerpc: Avoid load hit store in setup_sigcontext()")
> d11994314b2b ("powerpc: signals: Stop using current in signal code")

Test results:

selftests: tm-resched-dscr
========================================
test: tm_resched_dscr
tags: git_version:f142f08
Binding to cpu 8
main test running as pid 4949
Check DSCR TM context switch: OK
success: tm_resched_dscr
ok 1..1 selftests: tm-resched-dscr [PASS]
selftests: tm-syscall
========================================
test: tm_syscall
tags: git_version:f142f08
Testing transactional syscalls for 10 seconds...
5799695 active and suspended transactions behaved correctly.
(There were 1666 transaction retries.)
success: tm_syscall
ok 1..2 selftests: tm-syscall [PASS]
selftests: tm-signal-msr-resv
========================================
test: tm_signal_msr_resv
tags: git_version:f142f08
success: tm_signal_msr_resv
ok 1..3 selftests: tm-signal-msr-resv [PASS]
selftests: tm-signal-stack
========================================
test: tm_signal_stack
tags: git_version:f142f08
success: tm_signal_stack
ok 1..4 selftests: tm-signal-stack [PASS]
selftests: tm-vmxcopy
========================================
test: tm_vmxcopy
tags: git_version:f142f08
success: tm_vmxcopy
ok 1..5 selftests: tm-vmxcopy [PASS]
selftests: tm-fork
========================================
test: tm_fork
tags: git_version:f142f08
success: tm_fork
ok 1..6 selftests: tm-fork [PASS]
selftests: tm-tar
========================================
Starting, 10000 loops
test: tm_tar
tags: git_version:f142f08
success: tm_tar
ok 1..7 selftests: tm-tar [PASS]
selftests: tm-tmspr
========================================
test: tm_tmspr
tags: git_version:f142f08
success: tm_tmspr
ok 1..8 selftests: tm-tmspr [PASS]
selftests: tm-vmx-unavail
========================================
test: tm_vmx_unavail_test
tags: git_version:f142f08
success: tm_vmx_unavail_test
ok 1..9 selftests: tm-vmx-unavail [PASS]
selftests: tm-unavailable
========================================
test: tm_unavailable_test
tags: git_version:f142f08
Checking if FP/VEC registers are sane after a FP unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VEC unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VSX unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
result: success
success: tm_unavailable_test
ok 1..10 selftests: tm-unavailable [PASS]
selftests: tm-trap
========================================
test: tm_trap_test
tags: git_version:f142f08
Little-Endian ma...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-10 11:57 EDT-------
Ok, let's consider this problem fixed for now, and we can open another bug for the different problem, which will require a different patch.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-15 14:03 EDT-------
This next problem was created as https://bugzilla.linux.ibm.com/show_bug.cgi?id=167739 and there is already a fix for it. Closing this problem, and tracking the other fix in bug#167739.

no longer affects: linux (Ubuntu Artful)
no longer affects: linux (Ubuntu Xenial)