in Ubuntu16.10: Hit on Call traces and system goes down when transactional memory tests are running in 32TB Brazos system

Bug #1606786 reported by bugproxy on 2016-07-27
This bug affects 1 person
Affects          Importance   Assigned to
linux (Ubuntu)   Undecided    Unassigned
Xenial           Undecided    Tim Gardner
Yakkety          Undecided    Unassigned

Bug Description

== Comment: #0 - Praveen K. Pandey <email address hidden> - 2016-07-16 14:21:39 ==
---Problem Description---
 In Ubuntu 16.10, call traces and an unrecoverable exception occur when transactional memory tests are executed on a 32TB system.

Steps to reproduce:

1- Download the TM tests from 'http://ozlabs.au.ibm.com/~mikey/tm-test-le.tar.gz'

2- Untar tm-test-le.tar.gz, then run: make clean; make; ./runtests.sh

The system prints the call traces below and goes down (ssh is no longer possible and the console hangs).
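The reproduction steps above can be sketched as a small script. The function name `run_tm_tests`, the `TM_TARBALL_URL` variable, the default working directory, and the assumption that the tarball unpacks into `tm-test-le/` (the directory name visible in the later verification log) are illustrative, not taken from the report. The URL is the one given in the report and may not be reachable from every network.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Nothing runs until run_tm_tests is called.
# WARNING: on an affected kernel this is expected to bring the system down.

TM_TARBALL_URL='http://ozlabs.au.ibm.com/~mikey/tm-test-le.tar.gz'

# The body runs in a subshell so the caller's working directory is untouched.
run_tm_tests() (
    workdir=${1:-$HOME}                     # hypothetical working directory
    cd "$workdir" || exit 1
    wget "$TM_TARBALL_URL" || exit 1        # step 1: download the tests
    tar -xzf tm-test-le.tar.gz || exit 1    # step 2: unpack
    cd tm-test-le || exit 1                 # assumed top-level directory name
    make clean && make && ./runtests.sh     # build and run the suite
)
```

Calling `run_tm_tests /some/scratch/dir` would perform the download, build, and run in that directory.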

LOG:

Running ./htm_demo
Starting 6144 worker threads, 50000 loops
------------------------------------------------------
------------------------------------------------------
[ 438.064162] CFAR: c000000000008d18 SOFTE: 0
PACATMSCRATCH: 8000000200009033
GPR00: c000000000111028 c00005f82e36f4b0 c0000000015b5d00 0000000000000000
GPR04: c000000000b28863 0000000000000001 c000000001755d00 0000000100008748
GPR08: 00000000000004f0 00000000f5257d14 000001fab9800000 0000000000000005
GPR12: 0000000000000500 c00000000bc9dd00 00003ffcf5fa0000 0000000000000100
GPR16: 0000000000000000 c00001faba7898e8 0000000000000001 c00003f9865bf804
GPR20: 0000000000000003 c0000000001638b0 0000000000000001 000000661a113a10
GPR24: 0000000000000000 c00005f82e9f2060 000001fab9800000 c0000000015eaa60
GPR28: c000000000f9e900 000000000000009e c0000000015eaa60 c00001faba79e900
[ 438.064303] NIP [c0000000000a1ba4] kvmppc_interrupt_hv+0x28/0x15c
[ 438.064315] LR [c000000000111028] trigger_load_balance+0x58/0x340
[ 438.064323] Call Trace:
[ 438.064330] [c00005f82e36f4b0] [c000000000111028] trigger_load_balance+0x58/0x340 (unreliable)
[ 438.064346] [c00005f82e36f4f0] [c0000000000f9ee4] scheduler_tick+0x104/0x180
[ 438.064360] [c00005f82e36f550] [c00000000014c128] update_process_times+0x78/0xa0
[ 438.064378] [c00005f82e36f580] [c000000000163818] tick_sched_handle.isra.6+0x48/0xe0
[ 438.064392] [c00005f82e36f5c0] [c000000000163914] tick_sched_timer+0x64/0xd0
[ 438.064404] [c00005f82e36f600] [c00000000014cbd4] __hrtimer_run_queues+0x124/0x450
[ 438.064418] [c00005f82e36f690] [c00000000014dbfc] hrtimer_interrupt+0xec/0x2c0
[ 438.064431] [c00005f82e36f750] [c00000000001f5bc] __timer_interrupt+0x8c/0x290
[ 438.064444] [c00005f82e36f7a0] [c00000000001f970] timer_interrupt+0xa0/0xe0
[ 438.064456] [c00005f82e36f7d0] [c000000000002714] decrementer_common+0x114/0x180
[ 438.064471] --- interrupt: 901 at _raw_spin_lock_irqsave+0xac/0x130
[ 438.064471] LR = _raw_spin_lock_irqsave+0x9c/0x130
[ 438.064486] [c00005f82e36fac0] [000000000000009e] 0x9e (unreliable)
[ 438.064499] [c00005f82e36fb00] [c00000000010fe0c] load_balance+0x80c/0xa90
[ 438.064511] [c00005f82e36fc40] [c00000000011036c] rebalance_domains+0x2dc/0x3b0
[ 438.064525] [c00005f82e36fcf0] [c0000000000beb98] __do_softirq+0x188/0x3e0
[ 438.064537] [c00005f82e36fde0] [c0000000000bf068] irq_exit+0xc8/0x100
[ 438.064549] [c00005f82e36fe00] [c00000000001f974] timer_interrupt+0xa4/0xe0
[ 438.064561] [c00005f82e36fe30] [c000000000002714] decrementer_common+0x114/0x180
[ 438.064572] Instruction dump:
[ 438.064578] 7c892378 48000160 f92d0850 892d0858 2c090004 418210b4 2c090001 e92d0850
[ 438.064598] 4182f2f8 39200004 992d0858 e92d0860 <f8090c18> f8290c20 f8490c28 f8690c30
[ 438.064621] ---[ end trace a8961db98dfe068b ]---
[ 438.072474]
[ 440.072525] Kernel panic - not syncing: Fatal exception in interrupt
[ 440.073966] Unrecoverable exception 4100 at c0000000000a1ba4
[ 440.073992] Oops: Unrecoverable exception, sig: 6 [#3]
[ 440.074001] SMP NR_CPUS=2048 NUMA pSeries
[ 440.074013] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
[ 440.074052] CPU: 291 PID: 404588 Comm: htm_demo Tainted: G D L 4.4.0-30-generic #49-Ubuntu
[ 440.074067] task: c00005f82da42a20 ti: c00005f83031c000 task.ti: c00005f83031c000
[ 440.074079] NIP: c0000000000a1ba4 LR: c000000000aedfdc CTR: 0000000000000000
[ 440.074091] REGS: c00005f83031f750 TRAP: 4100 Tainted: G D L (4.4.0-30-generic)
[ 440.074104] MSR: 8000000200001031 <SF,ME,IR,DR,LE> CR: 48004884 XER: 00000000
[ 440.074130] CFAR: c000000000008d18 SOFTE: 1
PACATMSCRATCH: 8000000200009033
GPR00: c000000000118d14 c00005f83031f9d0 c0000000015b5d00 0000000000000001
GPR04: c00005f83031fad8 0000000000000082 0000000000000000 f00000017e1bff80
GPR08: 0000000000000000 00000000eac0c6e6 0000000000000000 c00005f83031c000
GPR12: 0000000000000500 c00000000bcecc80 00003ffe0ffa0000 0000000000000000
GPR16: 0000000000000003 0000000000000322 0000000010002570 c00005f832062068
GPR20: c00005f857610080 0000000000000000 c00005f832062000 c00005f85754c080
GPR24: 0000000000000054 f00000017e1bff80 0000000000000000 c000000000ae97b0
GPR28: c00005f9afce5f40 0000000000000000 0000000000000001 c00005f9afce5f40
[ 440.074292] NIP [c0000000000a1ba4] kvmppc_interrupt_hv+0x28/0x15c
[ 440.074306] LR [c000000000aedfdc] _raw_spin_lock_irqsave+0x9c/0x130
[ 440.074315] Call Trace:
[ 440.074324] [c00005f83031f9d0] [c00005f83031fa10] 0xc00005f83031fa10 (unreliable)
[ 440.074341] [c00005f83031fa10] [c000000000118d14] finish_wait+0x54/0xb0
[ 440.074360] [c00005f83031fa50] [c000000000ae90bc] __wait_on_bit+0xac/0x170
[ 440.074380] [c00005f83031faa0] [c00000000022f070] wait_on_page_bit_killable+0xf0/0x110
[ 440.074396] [c00005f83031fb10] [c00000000022f18c] __lock_page_or_retry+0xfc/0x120
[ 440.074411] [c00005f83031fb50] [c00000000022f4fc] filemap_fault+0x34c/0x500
[ 440.074426] [c00005f83031fbd0] [c0000000003b23f0] ext4_filemap_fault+0x50/0x80
[ 440.074447] [c00005f83031fc10] [c00000000026e944] __do_fault+0x84/0x160
[ 440.074461] [c00005f83031fcb0] [c000000000274578] handle_mm_fault+0xd78/0x1980
[ 440.074476] [c00005f83031fd80] [c000000000af04f4] do_page_fault+0x354/0x7f0
[ 440.074494] [c00005f83031fe30] [c000000000008664] handle_page_fault+0x10/0x30
[ 440.074505] Instruction dump:
[ 440.074513] 7c892378 48000160 f92d0850 892d0858 2c090004 418210b4 2c090001 e92d0850
[ 440.074537] 4182f2f8 39200004 992d0858 e92d0860 <f8090c18> f8290c20 f8490c28 f8690c30
[ 440.074564] ---[ end trace a8961db98dfe068c ]---
[ 440.076808] Unrecoverable exception 4100 at c0000000000a1ba4
[ 440.083422] pstore: pstore dump routine blocked in Panic path, may corrupt error record
[ 440.083424]
[ 440.083428] Oops: Unrecoverable exception, sig: 6 [#4]
[ 440.083432] SMP NR_CPUS=2048 NUMA pSeries
[ 440.083450] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
[ 440.083457] CPU: 1370 PID: 405427 Comm: htm_demo Tainted: G D L 4.4.0-30-generic #49-Ubuntu
[ 440.083460] task: c000194911f30b50 ti: c000194912214000 task.ti: c000194912214000
[ 440.083462] NIP: c0000000000a1ba4 LR: c000000000aedfdc CTR: 0000000000000000
[ 440.083464] REGS: c000194912217750 TRAP: 4100 Tainted: G D L (4.4.0-30-generic)
[ 440.083473] MSR: 8000000200001031 <SF,ME,IR,DR,LE> CR: 48004884 XER: 20000000
[ 440.083503] CFAR: c000000000008d18 SOFTE: 1
[ 440.083503] PACATMSCRATCH: 8000000200009033
[ 440.083503] GPR00: c000000000118b28 c0001949122179d0 c0000000015b5d00 0000000000000001
[ 440.083503] GPR04: c000194912217ad8 0000000000000082 0000000000000082 0000000000000002
[ 440.083503] GPR08: c000000000ae5d00 00000000f5257d14 0000000000000000 c000194912214000
[ 440.083503] GPR12: 0000000000000500 c00000000bf6d700 00003ffc6efa0000 0000000000000000
[ 440.083503] GPR16: 0000000000000003 0000000000000664 0000000010002570 c00005f832062068
[ 440.083503] GPR20: c00005f857610080 0000000000000000 c00005f832062000 c00005f85754c080
[ 440.083503] GPR24: 0000000000000054 f00000017e1bff80 c00005f87fa62ac8 c000000000ae97b0
[ 440.083503] GPR28: c00005f9afce5f40 0000000000000000 0000000000000001 c00005f9afce5f40
[ 440.083514] NIP [c0000000000a1ba4] kvmppc_interrupt_hv+0x28/0x15c
[ 440.083520] LR [c000000000aedfdc] _raw_spin_lock_irqsave+0x9c/0x130
[ 440.083521] Call Trace:
[ 440.083527] [c0001949122179d0] [0000000200000001] 0x200000001 (unreliable)
[ 440.083533] [c000194912217a10] [c000000000118b28] prepare_to_wait+0x48/0xf0
[ 440.083542] [c000194912217a50] [c000000000ae9070] __wait_on_bit+0x60/0x170
[ 440.083548] [c000194912217aa0] [c00000000022f070] wait_on_page_bit_killable+0xf0/0x110
[ 440.083553] [c000194912217b10] [c00000000022f18c] __lock_page_or_retry+0xfc/0x120
[ 440.083557] [c000194912217b50] [c00000000022f4fc] filemap_fault+0x34c/0x500
[ 440.083561] [c000194912217bd0] [c0000000003b23f0] ext4_filemap_fault+0x50/0x80
[ 440.083567] [c000194912217c10] [c00000000026e944] __do_fault+0x84/0x160
[ 440.083571] [c000194912217cb0] [c000000000274578] handle_mm_fault+0xd78/0x1980
[ 440.083575] [c000194912217d80] [c000000000af04f4] do_page_fault+0x354/0x7f0
[ 440.083584] [c000194912217e30] [c000000000008664] handle_page_fault+0x10/0x30
[ 440.083597] Instruction dump:
[ 440.083603] 7c892378 48000160 f92d0850 892d0858 2c090004 418210b4 2c090001 e92d0850
[ 440.083607] 4182f2f8 39200004 992d0858 e92d0860 <f8090c18> f8290c20 f8490c28 f8690c30
[ 440.083611] ---[ end trace a8961db98dfe068d ]---
[ 440.091723]
[ 440.091985] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Regards
Praveen
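All three register dumps in the log above report the same faulting location, kvmppc_interrupt_hv+0x28/0x15c. When a console log contains many such oopses, a quick tally of the "NIP [address] symbol" lines can confirm this. The following is a minimal sketch; the function name `summarize_nip` and the file name `console.log` are hypothetical.

```shell
#!/bin/sh
# Count how often each faulting symbol appears in "NIP [addr] symbol" lines.
# Reads a saved console log on stdin and prints "count symbol" pairs,
# most frequent first. "NIP: addr" register lines are deliberately skipped.
summarize_nip() {
    grep 'NIP \[' | awk '{ print $NF }' | sort | uniq -c | sort -rn
}

# Usage (console.log is a hypothetical capture of the serial console):
#   summarize_nip < console.log
```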

== Comment: #2 - Michael Neuling <email address hidden> - 2016-07-25 22:56:05 ==
So the same issues as bz142887 are going to hit everywhere as this was an upstream bug. You'll need these two patches.

https://git.kernel.org/powerpc/c/6bcb80143e792becfd2b9cc6a339ce523e4e2219
https://git.kernel.org/powerpc/c/190ce8693c23eae09ba5f303a83bf2fbeb6478b1

bugproxy (bugproxy) on 2016-07-27
tags: added: architecture-ppc64le bugnameltc-143823 severity-critical targetmilestone-inin1610
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → kernel-package (Ubuntu)
Gary Gaydos (gmgaydos) on 2016-07-29
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Tim Gardner (timg-tpi) wrote :

Merged in 4.8

Changed in linux (Ubuntu Yakkety):
assignee: Taco Screen team (taco-screen-team) → nobody
status: New → Fix Released
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial

------- Comment From <email address hidden> 2016-08-18 02:11 EDT-------
(In reply to comment #7)
> This bug is awaiting verification that the kernel in -proposed solves the
> problem. Please test the kernel and update this bug with the results. If the
> problem is solved, change the tag 'verification-needed-xenial' to
> 'verification-done-xenial'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!

Hi Canonical,

Thanks for the fix. I need some help with the proposed build.

1- I added "deb http://archive.ubuntu.com/ubuntu/ yakkety-proposed restricted main multiverse universe" to /etc/apt/sources.list
2- I added the file /etc/apt/preferences.d/proposed-updates

But I am still not getting any proposed packages. Please help me with this.

LOG:

root@ltc-system:~# apt-get update
Get:1 http://us.ports.ubuntu.com/ubuntu-ports yakkety InRelease [247 kB]
Get:2 http://archive.ubuntu.com/ubuntu yakkety-proposed InRelease [95.7 kB]
Hit:3 http://ports.ubuntu.com/ubuntu-ports yakkety-security InRelease
Hit:4 http://us.ports.ubuntu.com/ubuntu-ports yakkety-updates InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports yakkety-backports InRelease
Get:6 http://us.ports.ubuntu.com/ubuntu-ports yakkety/universe ppc64el Packages [7,509 kB]
Ign:7 http://archive.ubuntu.com/ubuntu yakkety-proposed/main ppc64el Packages
Get:8 http://archive.ubuntu.com/ubuntu yakkety-proposed/main Translation-en [63.7 kB]
Ign:9 http://archive.ubuntu.com/ubuntu yakkety-proposed/multiverse ppc64el Packages
Get:10 http://archive.ubuntu.com/ubuntu yakkety-proposed/multiverse Translation-en [3,732 B]
Ign:11 http://archive.ubuntu.com/ubuntu yakkety-proposed/universe ppc64el Packages
Get:12 http://archive.ubuntu.com/ubuntu yakkety-proposed/universe Translation-en [274 kB]
Ign:7 http://archive.ubuntu.com/ubuntu yakkety-proposed/main ppc64el Packages
Ign:9 http://archive.ubuntu.com/ubuntu yakkety-proposed/multiverse ppc64el Packages
Ign:11 http://archive.ubuntu.com/ubuntu yakkety-proposed/universe ppc64el Packages
Ign:7 http://archive.ubuntu.com/ubuntu yakkety-proposed/main ppc64el Packages
Ign:9 http://archive.ubuntu.com/ubuntu yakkety-proposed/multiverse ppc64el Packages
Ign:11 http://archive.ubuntu.com/ubuntu yakkety-proposed/universe ppc64el Packages
Err:7 http://archive.ubuntu.com/ubuntu yakkety-proposed/main ppc64el Packages
404 Not Found [IP: 91.189.88.161 80]
Ign:9 http://archive.ubuntu.com/ubuntu yakkety-proposed/multiverse ppc64el Packages
Ign:11 http://archive.ubuntu.com/ubuntu yakkety-proposed/universe ppc64el Packages
Fetched 8,193 kB in 7s (1,125 kB/s)
Reading package lists... Done
E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/yakkety-proposed/main/binary-ppc64el/Packages 404 Not Found [IP: 91.189.88.161 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.

root@ltc-system:~# apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly ...


bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-08-18 10:53 EDT-------
Hi Praveen,

Please see

https://wiki.ubuntu.com/ppc64el/CommonQuestions#How_to_enable_the_-proposed_repository_in_ubuntu

The repo for ppc64el is different than for x86.

Thanks, Gary
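To illustrate the difference: the 404 in the log above comes from requesting ppc64el package indexes from archive.ubuntu.com, while the working entries in the same log use ports.ubuntu.com/ubuntu-ports. A minimal sketch of choosing the right base URL follows. The function name `proposed_entry` is hypothetical, the component list is copied from the earlier comment, and the mapping assumes that only amd64 and i386 are served from archive.ubuntu.com.

```shell
#!/bin/sh
# Print a -proposed sources.list entry for a given architecture and release.
# Assumption: amd64/i386 live on archive.ubuntu.com, everything else
# (including ppc64el) on ports.ubuntu.com/ubuntu-ports.
proposed_entry() {
    arch=$1
    release=$2
    case $arch in
        amd64|i386) base='http://archive.ubuntu.com/ubuntu' ;;
        *)          base='http://ports.ubuntu.com/ubuntu-ports' ;;
    esac
    printf 'deb %s %s-proposed restricted main multiverse universe\n' \
        "$base" "$release"
}

# Usage: print the entry for this machine's architecture.
#   proposed_entry "$(dpkg --print-architecture)" yakkety
```

The printed line would replace the archive.ubuntu.com entry in /etc/apt/sources.list before running apt-get update again.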

------- Comment (attachment only) From <email address hidden> 2016-08-22 03:16 EDT-------

bugproxy (bugproxy) on 2016-08-22
tags: added: verification-done
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-36.55

---------------
linux (4.4.0-36.55) xenial; urgency=low

  [ Stefan Bader ]

  * Release Tracking Bug
    - LP: #1612305

  * I2C touchpad does not work on AMD platform (LP: #1612006)
    - SAUCE: pinctrl/amd: Remove the default de-bounce time

  * CVE-2016-5696
    - tcp: make challenge acks less predictable

linux (4.4.0-35.54) xenial; urgency=low

  [ Stefan Bader ]

  * Release Tracking Bug
    - LP: #1611215

  * [i915_bpo] Sync with v4.7 (LP: #1609742)
    - SAUCE: i915_bpo: Sync with v4.7

  * s390/cio: fix reset of channel measurement block (LP: #1609415)
    - s390/cio: allow to reset channel measurement block

  * in Ubuntu16.10: Hit on Call traces and system goes down when transactional
    memory tests are running in 32TB Brazos system (LP: #1606786)
    - powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
    - powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()

  * Power Menu does not display after press the Power Button (LP: #1609204)
    - intel-vbtn: new driver for Intel Virtual Button
    - [config] enable CONFIG_INTEL_VBTN=m

  * OptiPlex 7450 AIO hangs when rebooting (LP: #1608762)
    - x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk

  * virtualbox+usb 3.0 breaks boot, -28 kernel works (LP: #1604058)
    - SAUCE: xhci: Fix soft lockup in xhci_pci_probe path when XHCI_STATE_HALTED

  * linux-kernel: Freeing IRQ from IRQ context (LP: #1597908)
    - block: defer timeouts to a workqueue

  * Tunnel offload indications not stripped from encapsulated packets, causing
    performance overhead (LP: #1602755)
    - tunnels: Remove encapsulation offloads on decap.

  * lm-sensors is throwing "ERROR: Can't get value of subfeature temp1_input:
    I/O error" for be2net driver (LP: #1607387)
    - be2net: perform temperature query in adapter regardless of its interface
      state

  * Dell dock MAC Address pass through doesn't work in Ubuntu (LP: #1579984)
    - r8152: Add support for setting pass through MAC address on RTL8153-AD

  * vmxnet3 LRO IPv6 performance issues (stalling TCP) (LP: #1605494)
    - Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets

  * ISST-LTE:pVM:monklp5:Ubuntu16.04.1:system crashed at
    lpfc_sli4_scmd_to_wqidx_distr (LP: #1597974)
    - SAUCE: lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from
      lpfc_send_taskmgmt()

  * Backport cxlflash shutdown patch to Xenial SRU (LP: #1605405)
    - SAUCE: cxlflash: Verify problem state area is mapped before notifying
      shutdown

  * Xenial update to v4.4.16 stable release (LP: #1607404)
    - mac80211: fix fast_tx header alignment
    - mac80211: mesh: flush mesh paths unconditionally
    - mac80211_hwsim: Add missing check for HWSIM_ATTR_SIGNAL
    - mac80211: Fix mesh estab_plinks counting in STA removal case
    - EDAC, sb_edac: Fix rank lookup on Broadwell
    - IB/cm: Fix a recently introduced locking bug
    - IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs
    - powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
    - powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
    - usb: dwc2: fix reg...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
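The changelog above lists the two powerpc/tm patches under linux 4.4.0-35.54, while the crash was reported on 4.4.0-30-generic #49. A rough way to check whether an installed Xenial kernel package is new enough is a version comparison, sketched below. The helper name `version_ge` is hypothetical, and the sketch relies on GNU `sort -V`; `dpkg --compare-versions` would be the more precise tool on a Debian-based system.

```shell
#!/bin/sh
# True when version $1 sorts at or after version $2 (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

FIXED_IN='4.4.0-35.54'   # first Xenial changelog entry listing the tm fixes

# Usage (assumes a dpkg-based system with the matching image installed):
#   installed=$(dpkg-query -W -f='${Version}' "linux-image-$(uname -r)")
#   if version_ge "$installed" "$FIXED_IN"; then echo "fix present"; fi
```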
Tim Gardner (timg-tpi) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial

------- Comment From <email address hidden> 2016-10-04 04:27 EDT-------
Hi

Verified this bug on Ubuntu 16.10 (4.8); it works fine.

LOG:

root@ltc-brazos1:~/tm-test-le# ./runtests.sh
Running ./htm_fork
PASSED!
Running ./htm_vmxcopy
PASSED!
Running ./htm_dscr
Starting, 10000 loops
PASSED!
Running ./htm_dscr
Starting, 10000 loops
PASSED!
Running ./htm_tar
Starting, 10000 loops
PASSED!
Running ./htm_fpunavailable
Thread doing stuff!
Entering FP function
Transaction done, ret = 0, TEXASR 0x11c000001,
TFIAR 0x100029e9 f = 1337.000000.
FAIL, wrong number of aborts.
Done.
Test failed code:0 ./htm_fpunavailable
root@ltc-brazos1:~/tm-test-le#

Regards
Praveen
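The run above no longer crashes the system, and the pasted portion shows five PASSED results plus one failing test (htm_fpunavailable). For longer runs, the output of runtests.sh can be tallied with a short filter. This is a sketch that assumes the output format shown in the log ("PASSED!" lines and "Test failed code:N ./name" lines); the function name `tally_tm_results` and the file name `runtests.log` are hypothetical.

```shell
#!/bin/sh
# Summarise runtests.sh output read from stdin: count passes and failures
# and list the names of the failing tests.
tally_tm_results() {
    awk '
        /^PASSED!/      { passed++ }
        /^Test failed / { failed++; names = names " " $NF }
        END { printf "passed=%d failed=%d%s\n", passed, failed, names }
    '
}

# Usage:
#   ./runtests.sh 2>&1 | tee runtests.log | tally_tm_results
```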
