Backport patch to abort syscalls in active transactions

Bug #1580557 reported by bugproxy on 2016-05-11
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Canonical Kernel Team

Bug Description

== Comment: #0 - Tulio Magno Quites Machado Filho - 2016-02-23 12:47:09 ==
---Problem Description---
This happens on Ubuntu 14.04.3.
The user is creating a stack structure using the C++ transactional memory extension:

    int Pop(int)
    {
        int ret = 0;
        __transaction_atomic
        {
            if (!stack_.empty())
            {
                ret = stack_.top();
                stack_.pop();
            }
            else
            {
                ret = -1;  // stack is empty
            }
        }
        return ret;
    }

While evaluating if(!stack_.empty()), this code calls a libitm function (GCC code), which calls malloc (glibc code), which ends up calling futex (a syscall).
A syscall inside a transaction is forbidden by the kernel, but there is nothing the user can do to avoid this syscall.

As a result, the user application hangs inside malloc(), which waits for the futex call to return.

Ubuntu 14.04 provides glibc 2.19, which is too old to know about HTM.
This is probably happening with other libraries as well.

Backporting commit b4b56f9e would solve this issue.

---uname output---
Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 16:06:09 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

---Steps to Reproduce---
 Start a transaction, make a syscall.
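
The reproduction step above can be sketched roughly as follows. This is a minimal illustration, not the reporter's test case: it assumes GCC's POWER HTM built-ins (`__builtin_tbegin`, `__builtin_tend`, enabled with `-mhtm`) and uses `getpid` as an arbitrary cheap syscall. The names `syscall_in_transaction` and `describe` are hypothetical. On builds without HTM support the function simply reports that it could not run the transaction.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption: GCC on ppc64/ppc64le defines __HTM__ when built with -mhtm.
#if defined(__powerpc64__) && defined(__HTM__)
#include <htmintrin.h>
#define HAVE_POWER_HTM 1
#else
#define HAVE_POWER_HTM 0
#endif

// Hypothetical helper: issue one syscall from inside a hardware transaction.
// Returns  1 if the transaction committed,
//          0 if it was aborted (control resumed after tbegin),
//         -1 if this build has no POWER HTM support.
int syscall_in_transaction()
{
#if HAVE_POWER_HTM
    if (__builtin_tbegin(0)) {
        // Transactional state: make a syscall, as in the steps to reproduce.
        syscall(SYS_getpid);
        __builtin_tend(0);
        return 1;
    }
    return 0;  // the transaction failed or was aborted
#else
    return -1;
#endif
}

// Hypothetical helper: map the outcome code to a short message.
const char* describe(int outcome)
{
    assert(outcome >= -1 && outcome <= 1);
    switch (outcome) {
    case 1:  return "transaction committed";
    case 0:  return "transaction aborted";
    default: return "HTM not available in this build";
    }
}
```

On a kernel carrying the requested backport, the expectation is that the syscall dooms the transaction and the abort path is taken; on an unpatched kernel the behaviour depends on the syscall, and a blocking call such as futex can hang as described in the problem description.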

== Comment: #1 - Wei Guo - 2016-02-24 02:33:03 ==
I have already verified that a kernel with patch b4b56f9e (on Ubuntu 14.04) works.

== Comment: #2 - Wei Guo - 2016-02-26 04:20:37 ==
Backport patch for commit b4b56f9e is attached. The patch is based on tag Ubuntu-lts-3.19.0-25.26_14.04.1.

Tested based on Ubuntu 14.04.4 LTS ( 3.19.0-25-generic).


------- Comment From <email address hidden> 2016-02-26 13:27 EDT-------
This seems to be the rport reference problem in the lpfc driver, which prevents the rport from being rediscovered when it comes back up. It is resolved by this commit [1].

Although the host numbers differ from those in the multipath -l output of the bug report, the timing of the devloss events and the path removal events matches precisely.

[root@iltuc4-bf var_logs]# grep sdz syslog.1
<...>
Dec 2 03:16:28 ilp1fc85apA4 multipathd: uevent 'remove' from '/devices/pci0003:00/0003:00:0e.5/host6/rport-6:0-6/target6:0:4/6:0:4:0/block/sdz'
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVNAME=/dev/sdz
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVPATH=/devices/pci0003:00/0003:00:0e.5/host6/rport-6:0-6/target6:0:4/6:0:4:0/block/sdz
Dec 2 03:16:28 ilp1fc85apA4 multipathd: sdz: remove path (uevent)
Dec 2 03:16:28 ilp1fc85apA4 multipathd: sdz: path removed from map mpath9

[root@iltuc4-bf var_logs]# grep sdak syslog.1
<...>
Dec 2 03:16:28 ilp1fc85apA4 multipathd: uevent 'remove' from '/devices/pci0001:00/0001:00:07.1/host2/rport-2:0-7/target2:0:5/2:0:5:0/block/sdak'
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVNAME=/dev/sdak
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVPATH=/devices/pci0001:00/0001:00:07.1/host2/rport-2:0-7/target2:0:5/2:0:5:0/block/sdak
Dec 2 03:16:28 ilp1fc85apA4 multipathd: sdak: remove path (uevent)
Dec 2 03:16:29 ilp1fc85apA4 multipathd: sdak: path removed from map mpath4

[root@iltuc4-bf var_logs]# grep lpfc syslog.1
<...>
Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.574079] lpfc 0003:00:0e.4: 4:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:20:ef:26 NPort x5e00a0 Data: x0 x8 x3
Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.580629] lpfc 0003:00:0e.5: 5:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:40:ef:26 NPort x020040 Data: x0 x8 x3
Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.606688] lpfc 0001:00:07.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:40:ef:26 NPort x020040 Data: x0 x8 x3
Dec 2 03:16:29 ilp1fc85apA4 kernel: [15294.974597] lpfc 0001:00:07.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:30:ef:26 NPort x0b0000 Data: x0 x8 xa

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/lpfc?id=0290217ad830f2813bb9ed5f51af686c0c591f28

------- Comment From <email address hidden> 2016-03-03 09:57 EDT-------
Hi Bill Gao,

(In reply to comment #10)
> (In reply to comment #9)
>
> > Is it possible to do a non-scheduled/manual test before that?
>
> Yes, it is.

Great.

I've uploaded a test kernel with 2 patches (comment #4 plus a dependency) to
http://ausgsa.ibm.com/~mauricfo/public/bugs/bz133798/v1/

Can you please test whether they resolve the problem?
If they don't, please attach /var/log/syslog and dmesg output.

Thanks!

------- Comment From <email address hidden> 2016-04-25 12:46 EDT-------
Please test with this kernel:

http://ausgsa.ibm.com/~mauricfo/public/bugs/bz133798/v1/

Thanks!

------- Comment From <email address hidden> 2016-05-09 05:21 EDT-------
Kernel updated, the svc ccl case is in progress with 2 loops.

------- Comment From <email address hidden> 2016-05-10 21:14 EDT-------
Completed SVC CCL EI with 2 loops, didn't hit path missing p...


tags: added: architecture-ppc64 bugnameltc-133798 severity-medium targetmilestone-inin14044
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-11 07:43 EDT-------
There is a problem in the bug bridge: it seems to have merged 2 bug mirrorings.
I'll check this internally.

------- Comment From <email address hidden> 2016-05-11 07:48 EDT-------
I've manually created LP #1580560 for this bug.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1580557/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Medium
status: Confirmed → Triaged

Hi,

Apologies for the confusion w/ the bug bridge; it has been reported.

LP 1580557 and LP 1580560 are the same bug (previously marked as a dup, but removed later).

It seems the invalidation of this one didn't quite work.
