NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests

Bug #1483343 reported by bugproxy
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Incomplete
High
Canonical Kernel Team
Vivid
Expired
High
Unassigned
linux-lts-xenial (Ubuntu)
Expired
High
Unassigned

Bug Description

---Problem Description---
NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests

---uname output---
Linux alp15 3.19.0-18-generic #18~14.04.1-Ubuntu SMP Wed May 20 09:40:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = P8

---Steps to Reproduce---
Install a P8 Power VM LPAR with Ubuntu 14.04.2 ISO.
Then install the Ubuntu 14.04.3 kernel on the same and reboot.
Then compile and build the LTP latest test suites on the same.

root@alp15:~# tar -xvf ltp-full-20150420.tar.bz2
root@alp15:~# cd ltp-full-20150420/
root@alp15:~/ltp-full-20150420# ls
aclocal.m4 configure execltp.in install-sh Makefile README runltplite.sh testcases utils
autom4te.cache configure.ac IDcheck.sh lib Makefile.release README.kernel_config runtest testscripts ver_linux
config.guess COPYING include ltpmenu missing runalltests.sh scenario_groups TODO VERSION
config.sub doc INSTALL m4 pan runltp scripts tools
root@alp15:~/ltp-full-20150420# ./configure
root@alp15:~/ltp-full-20150420# make
root@alp15:~/ltp-full-20150420# make install

root@alp15:/opt/ltp/testcases/bin# ./lock_torture.sh
lock_torture 1 TINFO : estimate time 6.00 min
lock_torture 1 TINFO : spin_lock: running 60 sec...

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034386] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [lock_torture_wr:2337]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034389] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [lock_torture_wr:2331]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034394] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lock_torture_wr:2339]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034396] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lock_torture_wr:2346]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034398] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 21s! [lock_torture_wr:2334]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034410] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [lock_torture_wr:2321]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034412] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [lock_torture_wr:2333]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.038386] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [lock_torture_wr:2327]

Stack trace output:
 root@alp15:~# dmesg | more
[ 1717.146881] lock_torture_wr R running task
[ 1717.146881]
[ 1717.146885] 0 2555 2 0x00000804
[ 1717.146887] Call Trace:
[ 1717.146894] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860 (unreliable)
[ 1717.146899] [c000000c7551b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
[ 1717.146904] [c000000c7551b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146909] [c000000c7551b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
[ 1717.146913] [c000000c7551b9b0] [c000000000002758] decrementer_common+0x158/0x180
[ 1717.146922] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
[ 1717.146922] LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146927] [c000000c7551bca0] [c000000c7551bcd0] 0xc000000c7551bcd0 (unreliable)
[ 1717.146934] [c000000c7551bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146939] [c000000c7551bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
[ 1717.146944] [c000000c7551bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.146948] [c000000c7551be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
[ 1717.146951] Task dump for CPU 10:
[ 1717.146953] lock_torture_wr R running task 0 2537 2 0x00000804
[ 1717.146957] Call Trace:
[ 1717.146961] [c000000c7557b820] [c000000c7557b860] 0xc000000c7557b860 (unreliable)
[ 1717.146966] [c000000c7557b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
[ 1717.146970] [c000000c7557b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146975] [c000000c7557b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
[ 1717.146979] [c000000c7557b9b0] [c000000000002758] decrementer_common+0x158/0x180
[ 1717.146988] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
[ 1717.146988] LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146993] [c000000c7557bca0] [c000000c7557bcd0] 0xc000000c7557bcd0 (unreliable)
[ 1717.147000] [c000000c7557bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.147006] [c000000c7557bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
[ 1717.147013] [c000000c7557bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.147017] [c000000c7557be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
[ 1717.147020] Task dump for CPU 17:
[ 1717.147021] Task dump for CPU 2:
[ 1717.147028] lock_torture_wr R
[ 1717.147028] lock_torture_wr R running task
[ 1717.147033] running task 0 2547 2 0x00000804
[ 1717.147042] 0 2533 2 0x00000804
[ 1717.147044] Call Trace:
[ 1717.147045] Call Trace:
[ 1717.147053] [c000000c732a3820] [c000000c7f688448] 0xc000000c7f688448
[ 1717.147056] [c000000c7555f820] [c000000c7fa48448] 0xc000000c7fa48448
[ 1717.147059] (unreliable)
[ 1717.147063] (unreliable)
[ 1717.147063]
[ 1717.147067]
[ 1717.147072] Task dump for CPU 18:
[ 1717.147073] Task dump for CPU 7:
[ 1717.147077] lock_torture_wr R running task
[ 1717.147082] lock_torture_wr R 0 2555 2 0x00000804
[ 1717.147088] running task
[ 1717.147088] Call Trace:
[ 1717.147096] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860
[ 1717.147096] 0 2559 2 0x00000804
[ 1717.147102] Call Trace:
[ 1717.147105] (unreliable)

It is possible that we are missing this commit that fixes a deadlock during these tests:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=f548d99ef4f5ec8f7080e88ad07c44d16d058ddc

will check the Ubuntu source shortly as see if this is the case and we can suggest building a kernel to see if it helps.

The apt-get source linux-image- on the test system didn't pull down the sources but the kernel being used is close to the one used for vivid (3.19.0-25.26) so I pulled down the git source tree for it with git clone git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git and the resulting source shows that the patch for the commit mentioned is not applied.

As I basically understand it, the problem that was fixed is that while torture_rwlock_read_lock_irq() acquires a read lock on the lock called:

torture_rwlock

anything that calls the counterpart torture_rwlock_read_unlock_irq() to relinquish the read lock instead ends doing a write_unlock_irqrestore() on the torture_rwlock() in essence leaving the read lock. So when the locktorture module calls something like torture_rwlock_write_lock() as we see in the bug description, it will block indefinitely as there is at least one lock reader.

I'll go ahead and mirror this since I pretty confident this is the issue (also should affect Vivid).

We'll have to figure out how to get the sources for the LTS kernel to build a test kernel as well.

Revision history for this message
bugproxy (bugproxy) wrote : dmesg logs are attached

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-126476 severity-critical targetmilestone-inin14043
Revision history for this message
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Luciano Chavez (lnx1138)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
tags: added: kernel-da-key
Changed in linux (Ubuntu Vivid):
importance: Undecided → High
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a cherry-pick of commit f548d99ef4f5ec8f7080e88ad07c44d16d058ddc.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1483343/

Can you test this kernel and see if it resolves this bug?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla
Download full text (6.3 KiB)

------- Comment From <email address hidden> 2015-08-11 10:13 EDT-------
I have installed this kernel on a Power NV Hardware which is installed with Ubuntu 14.04.3.
Then executed the tests again on the same OS.

root@p7pite26os:~# uname -a
Linux p7pite26os 3.19.0-26-generic #27 SMP Mon Aug 10 20:05:24 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@p7pite26os:~# cat /etc/issue
Ubuntu 14.04.3 LTS \n \l

root@p7pite26os:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.3 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.3 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

root@p7pite26os:~# dmesg | grep NMI
[ 4824.652595] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [lock_torture_wr:29508]
[ 4824.652603] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [lock_torture_wr:29402]
[ 4824.656608] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [lock_torture_wr:29374]
[ 4824.656610] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [lock_torture_wr:29468]
[ 4824.656616] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [lock_torture_wr:29450]
[ 4824.656619] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [lock_torture_re:29411]
[ 4824.656633] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [lock_torture_re:29387]
[ 4824.660592] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [lock_torture_re:29403]
[ 4824.660603] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 23s! [lock_torture_re:29407]
[ 4824.660605] NMI watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [lock_torture_wr:29376]
[ 4824.660610] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [lock_torture_re:29423]
[ 4824.664587] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [lock_torture_wr:29476]
[ 4824.664589] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 23s! [lock_torture_wr:29466]
[ 4824.664591] NMI watchdog: BUG: soft lockup - CPU#18 stuck for 23s! [lock_torture_wr:29494]
[ 4824.664622] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [lock_torture_wr:29386]
[ 4824.668596] NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [lock_torture_re:29457]
[ 4824.668599] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [lock_torture_re:29377]
[ 4824.668616] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [lock_torture_re:29409]
[ 4824.668623] NMI watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [lock_torture_wr:29356]
[ 4824.672600] NMI watchdog: BUG: soft lockup - CPU#33 stuck for 23s! [lock_torture_wr:29414]
[ 4824.672631] NMI watchdog: BUG: soft lockup - CPU#32 stuck for 23s! [lock_torture_wr:29396]
[ 4824.672650] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [lock_torture_re:29441]
[ 4824.672657] NMI watchdog: BUG: soft lockup - CPU#29 stuck for 23s! [lock_torture_re:29455]
[ 4824.676598] NMI watchdog: BUG: soft lockup - CPU#37 stuck for 23s! [lock_torture_wr:29368]
[ 4824.676666] NMI watchdog: BUG: soft lockup - CPU#34 stuck for 23s! [lock_torture_wr:29442]
[ 4824.684608] NMI watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [lock_torture_re:29379]
[ 4824.684626] NMI watchdog: BUG: s...

Read more...

Revision history for this message
bugproxy (bugproxy) wrote : dmesg logs are attached when tests are done with new kernel provided

------- Comment (attachment only) From <email address hidden> 2015-08-11 10:15 EDT-------

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.2 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-rc6-unstable/

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-08-11 16:52 EDT-------
(In reply to comment #18)
> Created attachment 100943 [details]
> dmesg logs are attached when tests are done with new kernel provided

So does the system lockup as before? Also, the dmesg attached doesn't show any softlockups.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@pavsubra, I also don't see any soft lockups in the dmesg. However, you mentioned them in comment #4, which appears to the be test kernel I build with a cherry-pick of commit: f548d99

If that commit does in fact stop the soft lockups, we can send an SRU request to have that commit added to the stable kernels.

Also, I requested testing of the latest mainline kernel in comment #6 in case the test kernel does in fact still exhibit the bug. That will tell us if another commit from mainline is also needed.

Thanks in advance.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-12-10 08:00 EDT-------
I have installed Ubuntu 14.04.3 ISO on PowerNV 8284-22A P8 Hardware.
Then executed the LTP lock_torture tests again on the same.

root@powerkvmpok002:~# uname -a
Linux powerkvmpok002 3.19.0-39-generic #44~14.04.1-Ubuntu SMP Wed Dec 2 10:00:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@powerkvmpok002:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.3 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.3 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

root@powerkvmpok002:/opt/ltp/testcases/bin# ./lock_torture.sh
./lock_torture.sh: 55: ./lock_torture.sh: tst_kvercmp: not found

[ 3173.326014] INFO: rcu_sched self-detected stall on CPU
[ 3173.326017] INFO: rcu_sched self-detected stall on CPU
[ 3173.326019] INFO: rcu_sched self-detected stall on CPU
[ 3173.326022] INFO: rcu_sched self-detected stall on CPU
[ 3173.326023] INFO: rcu_sched self-detected stall on CPU
[ 3173.326044] INFO: rcu_sched self-detected stall on CPU {
[ 3173.326045] INFO: rcu_sched self-detected stall on CPU
[ 3173.326054] INFO: rcu_sched self-detected stall on CPU
[ 3180.118019] NMI watchdog: BUG: soft lockup - CPU#69 stuck for 21s! [lock_torture_wr:25460]
[ 3184.102014] NMI watchdog: BUG: soft lockup - CPU#47 stuck for 24s! [lock_torture_wr:25357]
[ 3184.122017] NMI watchdog: BUG: soft lockup - CPU#74 stuck for 22s! [lock_torture_wr:25466]
[ 3184.246010] NMI watchdog: BUG: soft lockup - CPU#155 stuck for 22s! [lock_torture_wr:25527]
[ 3204.090023] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [lock_torture_wr:25484]
[ 3208.106009] NMI watchdog: BUG: soft lockup - CPU#53 stuck for 21s! [lock_torture_wr:25482]
[ 3212.110015] NMI watchdog: BUG: soft lockup - CPU#60 stuck for 22s! [lock_torture_wr:25422]
[ 3212.142006] NMI watchdog: BUG: soft lockup - CPU#90 stuck for 21s! [lock_torture_wr:25395]
[ 3212.222010] NMI watchdog: BUG: soft lockup - CPU#144 stuck for 23s! [lock_torture_wr:25415]
[ 3216.090008] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 24s! [lock_torture_wr:25495]
[ 3216.118032] NMI watchdog: BUG: soft lockup - CPU#67 stuck for 21s! [lock_torture_wr:25353]
[ 3216.198006] NMI watchdog: BUG: soft lockup - CPU#130 stuck for 21s! [lock_torture_wr:25380]
[ 3224.094005] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [lock_torture_wr:25330]
[ 3228.101998] NMI watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [lock_torture_wr:25455]
[ 3228.162005] NMI watchdog: BUG: soft lockup - CPU#110 stuck for 22s! [lock_torture_wr:25490]

The issue is still occurring with the latest Ubuntu 14.04.3 OS kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@pavsubra, Were you ever able to test the mainline kernel to see if this bug still exists or not? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/

Also, I'll build another test kernel using the latest lts-backport-vivid source and a cherry pick of commit: f548d99ef4. It may be that more than just that one commit is needed.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a V2 test kernel with a cherry-pick of commit f548d99ef4f5ec8f7080e88ad07c44d16d058ddc.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1483343/

Can you test this kernel and see if it resolves this bug?

Revision history for this message
bugproxy (bugproxy) wrote :
Download full text (6.1 KiB)

------- Comment From <email address hidden> 2015-12-11 06:16 EDT-------
I have downloaded the mainline wily kernel as suggested and executed the scenario again.

root@powerkvmpok002:~/mainline# wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb
--2015-12-11 00:28:48-- http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb
Resolving kernel.ubuntu.com (kernel.ubuntu.com)... 91.189.94.216
Connecting to kernel.ubuntu.com (kernel.ubuntu.com)|91.189.94.216|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56391678 (54M) [application/x-debian-package]
Saving to: ?linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb?

100%[===================================================================================================================>] 56,391,678 846KB/s in 66s

2015-12-11 00:29:54 (832 KB/s) - ?linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb? saved [56391678/56391678]

root@powerkvmpok002:~/mainline# dpkg -i linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb
Selecting previously unselected package linux-image-4.4.0-040400rc4-generic.
(Reading database ... 137006 files and directories currently installed.)
Preparing to unpack linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb ...
Done.
Unpacking linux-image-4.4.0-040400rc4-generic (4.4.0-040400rc4.201512061930) ...
Setting up linux-image-4.4.0-040400rc4-generic (4.4.0-040400rc4.201512061930) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-040400rc4-generic
W: Possible missing firmware /lib/firmware/ql2700_fw.bin for module qla2xxx
W: Possible missing firmware /lib/firmware/ql8300_fw.bin for module qla2xxx
W: Possible missing firmware /lib/firmware/ql2600_fw.bin for module qla2xxx
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done

root@powerkvmpok002:~# uname -a
Linux powerkvmpok002 4.4.0-040400rc4-generic #201512061930 SMP Mon Dec 7 01:14:51 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

root@powerkvmpok002:~# cd /opt/ltp/testcases/bin/
root@powerkvmpok002:/opt/ltp/tes...

Read more...

Revision history for this message
bugproxy (bugproxy) wrote :
Download full text (5.1 KiB)

------- Comment From <email address hidden> 2015-12-11 08:36 EDT-------
I have downloaded the test kernel as suggested and executed the scenario again.

root@powerkvmpok002:~/lp1483343# dpkg -i linux-image-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb
Selecting previously unselected package linux-image-3.19.0-41-generic.
(Reading database ... 142221 files and directories currently installed.)
Preparing to unpack linux-image-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb ...
Done.
Unpacking linux-image-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
Setting up linux-image-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
update-initramfs: Generating /boot/initrd.img-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-41-generic
Found initrd image: /boot/initrd.img-3.19.0-41-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done

root@powerkvmpok002:~/lp1483343# dpkg -i linux-image-extra-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb
Selecting previously unselected package linux-image-extra-3.19.0-41-generic.
(Reading database ... 143108 files and directories currently installed.)
Preparing to unpack linux-image-extra-3.19.0-41-generic_3.19.0-41.46~14.04.2_ppc64el.deb ...
Unpacking linux-image-extra-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
Setting up linux-image-extra-3.19.0-41-generic (3.19.0-41.46~14.04.2) ...
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
update-initramfs: Generating /boot/initrd.img-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.19.0-41-generic /boot/vmlinux-3.19.0-41-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-41-generic
Found initrd image: /boot/initrd.img-3.19.0-41-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/...

Read more...

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for running those two tests. The results would indicate this bug still exists in the mainline kernel and commit f548d99ef4 is not the fix for this bug.

Was there a prior kernel that did not exhibit the bug? If that is the case, and this is a regression, we can perform a kernel bisect to identify the commit that introduced the bug.

tags: added: kernel-bug-exists-upstream
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-05 17:08 EDT-------
Re-posting most recent comment so Canonical can see it:

Similar behaviour is observed in ubuntu 14.04.4 testing.

runltp test execution stopped after executing rcu_torture test

Fllowing Call trace is observed in "dmesg" output

[63437.060777] INFO: rcu_sched self-detected stall on CPU { 6} (t=11284398 jiffies g=511721 c=511720 q=147264)
[63437.060796] Task dump for CPU 6:
[63437.060798] lock_torture_wr R running task 0 32581 2 0x00000804
[63437.060803] Call Trace:
[63437.060806] [c000000306ec7560] [c0000000000f55f0] sched_show_task+0xe0/0x180 (unreliable)
[63437.060812] [c000000306ec75d0] [c000000000133c58] rcu_dump_cpu_stacks+0xe8/0x160
[63437.060817] [c000000306ec7620] [c0000000001384a4] rcu_check_callbacks+0x634/0x960
[63437.060822] [c000000306ec7750] [c00000000013fec8] update_process_times+0x58/0xa0
[63437.060826] [c000000306ec7780] [c000000000156438] tick_sched_handle.isra.15+0x48/0xe0
[63437.060831] [c000000306ec77c0] [c000000000156534] tick_sched_timer+0x64/0xc0
[63437.060836] [c000000306ec7800] [c000000000140954] __hrtimer_run_queues+0x124/0x350
[63437.060841] [c000000306ec7890] [c00000000014174c] hrtimer_interrupt+0xec/0x290
[63437.060846] [c000000306ec7950] [c00000000001f3c0] __timer_interrupt+0x90/0x230
[63437.060851] [c000000306ec79a0] [c00000000001f710] timer_interrupt+0xa0/0xe0
[63437.060855] [c000000306ec79d0] [c000000000002714] decrementer_common+0x114/0x180
[63437.060861] --- interrupt: 901 at udelay+0x44/0x60
[63437.060861] LR = torture_rwlock_write_delay+0x5c/0x90 [locktorture]
[63437.060868] [c000000306ec7cc0] [d000000004010bc8] torture_rwlock_write_delay+0x28/0x90 [locktorture] (unreliable)
[63437.060874] [c000000306ec7cf0] [d0000000040105c0] lock_torture_writer+0xe0/0x210 [locktorture]
[63437.060880] [c000000306ec7d80] [c0000000000e0a74] kthread+0x114/0x140
[63437.060884] [c000000306ec7e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you review comment #14? Per comment #12, this bug still exists upstream. It would be good to know if this is a regression, so we can perform a bisect and identify the commit that introduced this bug.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-07 06:46 EDT-------
We are not sure if this is a regression since this was a new test scenario we were trying to execute on Ubuntu 14.04.3 OS kernel.
So, I think this issue could have existed prior to Ubuntu 14.04.3 kernels.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Just to confirm this is not a regression, can you test the following kernel:

3.13.0-10.30: https://launchpad.net/ubuntu/+source/linux/3.13.0-10.30/+build/5613571

Revision history for this message
bugproxy (bugproxy) wrote : dmesg logs are attached when tests are done with new kernel provided

------- Comment (attachment only) From <email address hidden> 2015-08-11 10:15 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla
Download full text (4.9 KiB)

------- Comment From <email address hidden> 2016-01-08 08:03 EDT-------
I have installed the proposed kernel which you have mentioned on the P8 Hardware.

root@powerkvmpok002:~/5613571# dpkg -i linux-image-3.13.0-10-generic_3.13.0-10.30_ppc64el.deb
Selecting previously unselected package linux-image-3.13.0-10-generic.
(Reading database ... 155249 files and directories currently installed.)
Preparing to unpack linux-image-3.13.0-10-generic_3.13.0-10.30_ppc64el.deb ...
Done.
Unpacking linux-image-3.13.0-10-generic (3.13.0-10.30) ...
Setting up linux-image-3.13.0-10-generic (3.13.0-10.30) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.13.0-10-generic /boot/vmlinux-3.13.0-10-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.13.0-10-generic /boot/vmlinux-3.13.0-10-generic
update-initramfs: Generating /boot/initrd.img-3.13.0-10-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.13.0-10-generic /boot/vmlinux-3.13.0-10-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.13.0-10-generic /boot/vmlinux-3.13.0-10-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-41-generic
Found initrd image: /boot/initrd.img-3.19.0-41-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found linux image: /boot/vmlinux-3.13.0-10-generic
Found initrd image: /boot/initrd.img-3.13.0-10-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done

root@powerkvmpok002:~/5613571# dpkg -i linux-image-extra-3.13.0-10-generic_3.13.0-10.30_ppc64el.deb
Selecting previously unselected package linux-image-extra-3.13.0-10-generic.
(Reading database ... 155936 files and directories currently installed.)
Preparing to unpack linux-image-extra-3.13.0-10-generic_3.13.0-10.30_ppc64el.deb ...
Unpacking linux-image-extra-3.13.0-10-generic (3.13.0-10.30) ...
Setting up linux-image-extra-3.13.0-10-generic (3.13.0-10.30) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
initrd.img(/boot/initrd.img-3.13.0-10-generic
) points to /boot/initrd.img-3.13.0-10-generic
(/boot/initrd.img-3.13.0-10-generic) -- doing nothing at /var/lib/dpkg/info/linux-image-extra-3.13.0-10-generic.postinst line 491.
vmlinux(/boot/vmlinux-3.13.0-10-generic
) points to /boot/vmlinux-3.13.0-10-generic
(/boot/vmlinux-3.13.0-10-generic) -- doing nothing at /var/lib/dpkg/info/linux-image-extra-3.13.0-10-generic.postinst line 491.
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.13.0-10-generic /boot/vmlinux-3.13.0-10-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.13.0-10-generic /boot/vmlinux-3.13.0-10-generic
update-initramfs: Generating /boot/initrd.img-3.13.0-10-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.13.0-10-generic /boo...

Read more...

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

There was a 3.18 based kernel for Vivid during development. That kernel can be downloaded from:
https://launchpad.net/ubuntu/+source/linux/3.18.0-8.9/+build/6690705

Can you test that one as well?

Also, is it possible for you to attach the lock_torture.sh script to the bug? We may be able to modify it, so it will run with a 3.13 kernel.

Thanks in advance.

Revision history for this message
bugproxy (bugproxy) wrote :
Download full text (4.1 KiB)

------- Comment From <email address hidden> 2016-01-11 01:01 EDT-------
I have installed 3.18 kernel provided in the link:
https://launchpad.net/ubuntu/+source/linux/3.18.0-8.9/+build/6690705

root@powerkvmpok002:~/6690705# dpkg -i linux-image-3.18.0-8-generic_3.18.0-8.9_ppc64el.deb
Selecting previously unselected package linux-image-3.18.0-8-generic.
(Reading database ... 164194 files and directories currently installed.)
Preparing to unpack linux-image-3.18.0-8-generic_3.18.0-8.9_ppc64el.deb ...
Done.
Unpacking linux-image-3.18.0-8-generic (3.18.0-8.9) ...
Setting up linux-image-3.18.0-8-generic (3.18.0-8.9) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 3.18.0-8-generic /boot/vmlinux-3.18.0-8-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 3.18.0-8-generic /boot/vmlinux-3.18.0-8-generic
update-initramfs: Generating /boot/initrd.img-3.18.0-8-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 3.18.0-8-generic /boot/vmlinux-3.18.0-8-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.18.0-8-generic /boot/vmlinux-3.18.0-8-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-41-generic
Found initrd image: /boot/initrd.img-3.19.0-41-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found linux image: /boot/vmlinux-3.18.0-8-generic
Found initrd image: /boot/initrd.img-3.18.0-8-generic
Found linux image: /boot/vmlinux-3.13.0-10-generic
Found initrd image: /boot/initrd.img-3.13.0-10-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done

root@powerkvmpok002:~# uname -a
Linux powerkvmpok002 3.18.0-8-generic #9-Ubuntu SMP Mon Jan 5 22:52:15 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

Then executed the lock_torture tests again on the 3.18 Vivid kernel.

root@powerkvmpok002:~# cd /opt/ltp/testcases/bin/
root@powerkvmpok002:/opt/ltp/testcases/bin# ./lock_torture.sh
./lock_torture.sh: 55: ./lock_torture.sh: tst_kvercmp: not found

[ 148.177770] NMI watchdog: BUG: soft lockup - CPU#100 stuck for 23s! [lock_torture_wr:2994]
[ 148.177773] NMI watchdog: BUG: soft lockup - CPU#101 stuck for 23s! [lock_torture_wr:2986]
[ 148.177775] NMI watchdog: BUG: soft lockup - CPU#99 stuck for 23s! [lock_torture_wr:2960]
[ 148.177780] NMI watchdog: BUG: soft lockup - CPU#102 stuck for 23s! [lock_torture_wr:2999]
[ 148.177783] NMI watchdog: BUG: soft lockup - CPU#103 stuck for 23s! [lock_torture_wr:2995]
[ 148.181769] NMI watchdog: BUG: soft lockup - CPU#104 stuck for 23s! [lock_torture_wr:2987]
[ 148.185771] NMI watchdog: BUG: soft lockup - CPU#107 stuck for 23s! [lock_torture_wr:2990]
[ 148.185774] NMI watchdog: BUG: soft lockup - CPU#106 stuck for 23s! [lock_torture_wr:2992]
[ 148.189769] NMI watchdog: BUG: soft lockup - CPU#108 stuck for 23s! [lock_torture_wr:2984]
[ 148.189773] NMI wat...

Read more...

Revision history for this message
bugproxy (bugproxy) wrote : Attaching lock_torture.sh script file

------- Comment (attachment only) From <email address hidden> 2016-01-11 01:06 EDT-------

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I modified the lock_torture script to run with the 3.13 or newer kernel. The new script is attached to the bug. Can you test the kernel mentioned in comment 18 with this new script?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-01-16 05:05 EDT-------
I have copied lock_torture script on to the P8 server with the 3.13 kernel installed.
Then executed the test script again in-order to re-create the scenario.

root@powerkvmpok002:/opt/ltp/testcases/bin# ./lock_torture.sh
lock_torture 1 TCONF : Test requires locktorture module
lock_torture 1 TINFO : cleanup

root@powerkvmpok002:/opt/ltp/testcases/bin# modprobe locktorture
modprobe: FATAL: Module locktorture not found.

As can be seen above the locktorture module is required in 3.13 kernel.
I think, this module is not supported in the 3.13 kernel.

bugproxy (bugproxy)
tags: added: severity-high targetmilestone-inin14044
removed: severity-critical targetmilestone-inin14043
tags: added: kernel-key
tags: removed: kernel-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The 4.6-rc1 test kernel is now available. Can you test this kernel to see if it also has this issue? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-rc1-wily/

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-06 07:38 EDT-------
I have installed the 4.6-rc1 test kernel and executed the tests on a Bare Metal Open Power Hardware.
I could not see the NMI softlockup call traces occurring.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for the update. Can you also test the v4.5 upstream kernel? It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/

If the 4.5 kernel has the bug, I can perform a "Reverse" bisect between v4.5 and v4.6-rc1 to find the commit in 4.5-rc1 that fixes the bug.

Thanks!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-19 01:25 EDT-------
I don't have any machine currently for repeatedly testing with different upstream kernels as and when they are released.
Since the test steps are quite simple and straight forward, I think it would be better to re-create the issue at your end with different kernels as required.

Revision history for this message
Fabio C. Barrionuevo (luzfcb) wrote :

I am using Ubuntu 16.04 Beta Final, with all available updates installed (21-04-2016) and this bug occurs repeatedly with my Dell Latitude 3450 (core i7-5500U, 8GB RAM, Nvidia 830m, Wireless 7265).

This bug occurred with kernel:

* 4.4.0-18.34

* 4.4.0-21.37

the occurrence of this bug has decreased considerably after I install the kernel 4.5.0-040500 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/

following this tutorial
http://ubuntuhandbook.org/index.php/2016/03/upgrade-linux-kernel-4-5-ubuntu/

Revision history for this message
t2d (t2d) wrote :
Download full text (5.7 KiB)

It seems we have the same problem with latest LTS kernel

# uname -a
Linux dc01ram1rls 4.4.0-18-generic #34~14.04.1-Ubuntu SMP Thu Apr 7 18:31:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty

The kernel was installed with
# apt-get install --install-recommends linux-generic-lts-xenial

Errors in /var/log/kern.log look like:

Apr 24 03:31:58 dc01ram1rls kernel: [280174.661115] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [rsync:10
4799]
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669494] Modules linked in: ip6table_filter ip6_tables iptable_filter ip
_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc dm_crypt intel_rapl ipmi_ssif x86_pkg_temp_ther
mal intel_powerclamp ipmi_devintf coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x8
6_64 lrw gf128mul glue_helper ablk_helper sb_edac cryptd dcdbas edac_core input_leds mei_me lpc_ich mei ipmi_si 825
0_fintek shpchp ipmi_msghandler acpi_power_meter mac_hid parport_pc ppdev lp parport igb dca ptp hid_generic usbhid
 uas hid usb_storage ahci pps_core megaraid_sas i2c_algo_bit libahci wmi fjes
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669522] CPU: 4 PID: 104799 Comm: rsync Not tainted 4.4.0-18-generic #34
~14.04.1-Ubuntu
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669523] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.3.6 06/0
8/2015
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669525] task: ffff88041ebea940 ti: ffff880402580000 task.ti: ffff880402
580000
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669526] RIP: 0010:[<ffffffff810c4e50>] [<ffffffff810c4e50>] native_que
ued_spin_lock_slowpath+0x160/0x170
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669531] RSP: 0018:ffff8804025839f8 EFLAGS: 00000202
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669532] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 00000000000000
01
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669533] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffffff81fce3
20
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669533] RBP: ffff8804025839f8 R08: 0000000000000101 R09: ffff880f9da4b7
a4
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669534] R10: ffff880855937000 R11: 000000000000008c R12: 00000000000000
00
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669535] R13: ffff880f9e2799f0 R14: ffff880f9e2799c0 R15: 00000000000000
00
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669536] FS: 00007f0980b0b740(0000) GS:ffff88085ec80000(0000) knlGS:000
0000000000000
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669538] CR2: 0000000002c0dd08 CR3: 0000000405421000 CR4: 00000000001426
e0
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669539] Stack:
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669539] ffff880402583a08 ffffffff81180407 ffff880402583a18 ffffffff817
ee550
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669541] ffff880402583a80 ffffffff8125ad4a 0000000000000000 ffff8804025
83a68
Apr 24 03:31:58 dc01ram1rls kernel: [280174.669542] ffffffff812332ab ffff880f9c044140 ...

Read more...

Revision history for this message
bugproxy (bugproxy) wrote : dmesg logs are attached

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : dmesg logs are attached when tests are done with new kernel provided

------- Comment (attachment only) From <email address hidden> 2015-08-11 10:15 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Attaching lock_torture.sh script file

------- Comment (attachment only) From <email address hidden> 2016-01-11 01:06 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : lock_torture-3.13-kernel.sh

Default Comment by Bridge

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'll try to reproduce the bug, so we can perform a reverse bisect and identify the fix for this bug.

Does this bug only happen on ppc64le? According to comment #31, this may also be happening on x86_64?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-05-30 01:57 EDT-------
(In reply to comment #52)
> I'll try to reproduce the bug, so we can perform a reverse bisect and
> identify the fix for this bug.
>
> Does this bug only happen on ppc64le? According to comment #31, this may
> also be happening on x86_64?
I can confirm only for ppl64le, not sure about x86_64 as we have not tested on x86_64 but based on LP comment#31 it seems the issue happens on x86_64 as well.

Revision history for this message
Fabio C. Barrionuevo (luzfcb) wrote :

after the last (2016-05-29) ubuntu packages update, i unable to start Ubuntu 16.04 with any of the installed kernels:

4.5.0-040500-generic ( from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/ )
4.4.0-22-generic
4.4.0-21-generic

I can confirm that this problem also occurs with x86_64

Until the moment, I find two things:

I can only start the ubuntu 16.04 if:

1 - Disable Multi Core support on BIOS

or

2 - add "acpi=off" on grub start line on system start

I think it's a problem with the processor (bug) or a bug with the identify/access the ACPI features

I really want to find a definitive solution to this problem.

If you need more information or need me to run a program, script, or compile the kernel to test (the last time I did it was in Ubuntu 8.04).

please tell me what needs to be run and how, I'll be happy to help.

I've attached the kernel log que contains the problem traceback

this is lscpu from the system started after disable Multi Core support on BIOS:

fabio@luuzfcb:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Stepping: 4
CPU MHz: 799.968
CPU max MHz: 3000.0000
CPU min MHz: 500.0000
BogoMIPS: 4788.86
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts

no longer affects: linux-lts-xenial (Ubuntu Vivid)
Changed in linux-lts-xenial (Ubuntu):
importance: Undecided → High
status: New → In Progress
status: In Progress → Triaged
Revision history for this message
Fabio C. Barrionuevo (luzfcb) wrote :

on my Dell Latitude 3450, Core i7-5500U, some new discoveries:

i can start my system if i:

* enable Multi Core support on BIOS
* and disable Hyperthreading on BIOS

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Fabio, are you seeing the soft lockup when running the lock_torture.sh script from LTP? Or are you just seeing similar error messages in your syslog?

I'm attempting to reproduce this bug with lock_torture.sh, but have not been able to so far. However, there are some differences:

- My system is and amd64.
- I'm using the LTP src from: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483343
- I'm testing the latest Xenial 16.04 kernel.

I'll try using older kernels and the same LTP tar ball you used as well.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

One other question, do you used the default values for lock_torture.sh?

Revision history for this message
Fabio C. Barrionuevo (luzfcb) wrote :

@Joseph, i see similar error messages in your syslog.

Dell replace the motherboard of my notebook (Latitude 3450), and consequently processor Core i7--5500U @ 2.4GHz, Nvidia GeForce 830M 2GB are replaced.

After this fact, none of problems appeared again.

then, by inference, I think the problems occurred due to failure in the manufacturing process,

if you still want me to perform any test, please explain me step by step how to run it.

Sorry the noise.

Changed in linux (Ubuntu Vivid):
status: In Progress → Incomplete
bugproxy (bugproxy)
tags: added: targetmilestone-inin1610
removed: targetmilestone-inin14044
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-10-06 01:24 EDT-------
Externalizing comment 61 and 62
-----
The Same test is causing problem in ubuntu kernel version 4.8.0-17-generic.
Got below call traces:
[48755.933755] 112-...: (2 ticks this GP) idle=b95/140000000000000/0 softirq=11738/11738 fqs=2829432
[48755.933979] 118-...: (5 ticks this GP) idle=011/140000000000000/0 softirq=6041/6041 fqs=2829432
[48755.934264] (detected by 29, t=11947979 jiffies, g=5047, c=5046, q=1984242)
[48692.907932] Call Trace:
[48692.907934] [c0000017e8a83980] [c0000000000d57f8] irq_exit+0x98/0x100 (unreliable)
[48692.907936] [c0000017e8a839a0] [c000000000024810] timer_interrupt+0xa0/0xe0
[48692.907938] [c0000017e8a839d0] [c000000000002814] decrementer_common+0x114/0x180
[48692.907942] --- interrupt: 901 at _cond_resched+0x0/0xa0
LR = stutter_wait+0x8c/0x130 [torture]
[48692.907945] [c0000017e8a83cc0] [d0000000158e0348] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture] (unreliable)
[48692.907948] [c0000017e8a83cf0] [d0000000158e0a88] lock_torture_writer+0xc0/0x270 [locktorture]
[48692.907950] [c0000017e8a83d80] [c0000000000fd120] kthread+0x110/0x130
[48692.907954] [c0000017e8a83e30] [c0000000000098f0] ret_from_kernel_thread+0x5c/0x6c

This call traces keeps displaying from the time we execute the script and system is in hanged state.
---

Issue is seen in ubuntu 16.10 as well.

---

Canonical,

May we have some attention on this issue.

Thanks,
Chandan

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Without hardware it is difficult for us to reproduce these kinds of issues if they are unique to ppc64el. I suggest that you are going to need IBM developer assistance. Also, the attached lock torture script is incomplete in that it requires 'test.sh' which does not exist.

Revision history for this message
bugproxy (bugproxy) wrote : kern.log.1

Default Comment by Bridge

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu Vivid):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux-lts-xenial (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-02-06 01:36 EDT-------
Hi,

Ran runltp on 17.04, and ended up with continuous call traces on console.

---uname output---
Linux ltc-garri1 4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

---Call Trace---
[332850.158228] Call Trace:
[332850.158253] [c000001fcec4fb50] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
[332850.159375] [c000001fcec4fbb0] [c000000000b193a0] __schedule+0x300/0x990
[332850.159872] [c000001fcec4fc90] [c000000000b19e88] _cond_resched+0x78/0xa0
[332850.160475] [c000001fcec4fcc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
[332850.161157] [c000001fcec4fcf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
[332850.170046] [c000001fcec4fd80] [c0000000000f53c0] kthread+0x110/0x130
[332850.173108] [c000001fcec4fe30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
[332850.174169] Task dump for CPU 10:
[332850.174202] lock_torture_wr R running task 0 37019 2 0x00000804
[332850.174734] Call Trace:
[332850.175218] [c000001fcf3ab980] [c0000000000d0008] irq_exit+0xe8/0x120 (unreliable)
[332850.175813] [c000001fcf3ab9a0] [c000000000024e10] timer_interrupt+0xa0/0xe0
[332850.176438] --- interrupt: d00000001c6008c0 at finish_task_switch+0xb0/0x340
[332850.176438] LR = stutter_wait+0x8c/0x130 [torture]
[332850.177950] [c000001fcf3ab9d0] [c000000000009514] decrementer_common+0x114/0x180 (unreliable)
[332850.178517] --- interrupt: 901 at torture_shutdown_init+0x2a4/0x770 [torture]
[332850.178517] LR = stutter_wait+0x8c/0x130 [torture]
[332850.179557] [c000001fcf3abcc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture] (unreliable)
[332850.180675] [c000001fcf3abcf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
[332850.181646] [c000001fcf3abd80] [c0000000000f53c0] kthread+0x110/0x130
[332850.194048] [c000001fcf3abe30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
[332850.196658] Task dump for CPU 12:
[332850.198840] lock_torture_wr R running task 0 36921 2 0x00000804
[332850.201377] Call Trace:
[332850.201402] [c000001fc3747890] [c00000000016127c] hrtimer_interrupt+0x11c/0x2a0 (unreliable)
[332850.206296] [c000001fc3747a60] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
[332850.208647] [c000001fc3747ac0] [c000000000b193a0] __schedule+0x300/0x990
[332850.211007] [c000001fc3747b00] [c00000000015bc78] lock_timer_base+0xa8/0x100
[332850.214641] [c000001fc3747b60] [c00000000015bde8] try_to_del_timer_sync+0x68/0xa0
[332850.215236] [c000001fc3747ba0] [c00000000015be90] del_timer_sync+0x70/0x90
[332850.215817] [c000001fc3747bd0] [c000000000b1de14] schedule_timeout+0x234/0x470
[332850.216369] [c000001fc3747cc0] [d00000001c810278] torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
[332850.217397] [c000001fc3747cf0] [d00000001c810988] lock_torture_writer+0xc0/0x270 [locktorture]
[332850.218433] [c000001fc3747d80] [c0000000000f53c0] kthread+0x110/0x130
[332850.219013] [c000001fc3747e30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74

Thanks,
Pooja

Revision history for this message
bugproxy (bugproxy) wrote :
Download full text (4.9 KiB)

------- Comment From <email address hidden> 2017-02-06 03:07 EDT-------
(In reply to comment #71)
> Hi,
>
> Ran runltp on 17.04, and ended up with continuous call traces on console.
>
> ---uname output---
> Linux ltc-garri1 4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>
> ---Call Trace---
> [332850.158228] Call Trace:
> [332850.158253] [c000001fcec4fb50] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.159375] [c000001fcec4fbb0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.159872] [c000001fcec4fc90] [c000000000b19e88] _cond_resched+0x78/0xa0
> [332850.160475] [c000001fcec4fcc0] [d00000001c810278]
> torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]

This one is weird, because torture_rwlock_write_lock_irq will acquire a lock with irq disabled, so it will not get any change to reschedule..

And the asm code for /lib/modules/4.9.0-15-generic/kernel/kernel/locking/locktorture.ko:

0000000000000248 <torture_rwlock_write_lock_irq>:
248: f8 ff 4c e8 ld r2,-8(r12)
248: R_PPC64_ENTRY *ABS*
24c: 14 62 42 7c add r2,r2,r12
250: a6 02 08 7c mflr r0
254: 01 00 00 48 bl 254 <torture_rwlock_write_lock_irq+0xc>
254: R_PPC64_REL24 _mcount
258: a6 02 08 7c mflr r0
25c: f8 ff e1 fb std r31,-8(r1)
260: 10 00 01 f8 std r0,16(r1)
264: d1 ff 21 f8 stdu r1,-48(r1)
268: 00 00 22 3d addis r9,r2,0
268: R_PPC64_TOC16_HA .toc
26c: 00 00 e9 eb ld r31,0(r9)
26c: R_PPC64_TOC16_LO_DS .toc
270: 30 00 7f 38 addi r3,r31,48
274: 01 00 00 48 bl 274 <torture_rwlock_write_lock_irq+0x2c>
274: R_PPC64_REL24 _raw_write_lock_irqsave
278: 00 00 00 60 nop
27c: 18 00 3f e9 ld r9,24(r31)
280: 40 00 69 f8 std r3,64(r9)
284: 00 00 60 38 li r3,0
288: 30 00 21 38 addi r1,r1,48

so torture_rwlock_write_lock_irq+0x30/0x60 is

278: 00 00 00 60 nop

right after the return of call to _raw_write_lock_irqsave, which should be fixed up by the instruction that recover the toc, IIUC. How that end up being a call to _cond_resched() is still unknown.

Besides, Pooja, I cannot reproduce this in a PowerKVM guest with 16 VCPUs. What's your setup for the machine that hit the problem? And how many times you need to run the testcase in order to hit one?

Regards,
Boqun

> [332850.161157] [c000001fcec4fcf0] [d00000001c810988]
> lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.170046] [c000001fcec4fd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.173108] [c000001fcec4fe30] [c00000000000c0e8]
> ret_from_kernel_thread+0x5c/0x74
> [332850.174169] Task dump for CPU 10:
> [332850.174202] lock_torture_wr R running task 0 37019 2
> 0x00000804
> [332850.174734] Call Trace:
> [332850.175218] [c000001fcf3ab980] [c0000000000d0008] irq_exit+0xe8/0x120
> (unreliable)
> [332850.175813] [c000001fcf3ab9a0] [c000000000024e10]
> timer_interrupt+0xa0/0xe0
> [332850.176438] --- interrupt: d00000001c6008c0 at
> finish_task_switch+0xb0/0x340
> [332850.176438] LR = stutter_wait+0x8c...

Read more...

Revision history for this message
bugproxy (bugproxy) wrote : lock_torture-3.13-kernel.sh

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : kern.log.1

Default Comment by Bridge

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux-lts-xenial (Ubuntu) because there has been no activity for 60 days.]

Changed in linux-lts-xenial (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Vivid) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Vivid):
status: Incomplete → Expired
To post a comment you must log in.