Comment 1 for bug 1580557

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-26 13:27 EDT-------
This appears to be the rport reference problem in the lpfc driver,
which prevents the rport from being rediscovered when it comes back up.
It is resolved by this commit [1].

(The host numbers differ from those in the multipath -l output in the bug report, but the timing of the devloss events and the path removal events matches precisely.)

[root@iltuc4-bf var_logs]# grep sdz syslog.1
<...>
Dec 2 03:16:28 ilp1fc85apA4 multipathd: uevent 'remove' from '/devices/pci0003:00/0003:00:0e.5/host6/rport-6:0-6/target6:0:4/6:0:4:0/block/sdz'
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVNAME=/dev/sdz
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVPATH=/devices/pci0003:00/0003:00:0e.5/host6/rport-6:0-6/target6:0:4/6:0:4:0/block/sdz
Dec 2 03:16:28 ilp1fc85apA4 multipathd: sdz: remove path (uevent)
Dec 2 03:16:28 ilp1fc85apA4 multipathd: sdz: path removed from map mpath9

[root@iltuc4-bf var_logs]# grep sdak syslog.1
<...>
Dec 2 03:16:28 ilp1fc85apA4 multipathd: uevent 'remove' from '/devices/pci0001:00/0001:00:07.1/host2/rport-2:0-7/target2:0:5/2:0:5:0/block/sdak'
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVNAME=/dev/sdak
Dec 2 03:16:28 ilp1fc85apA4 multipathd: DEVPATH=/devices/pci0001:00/0001:00:07.1/host2/rport-2:0-7/target2:0:5/2:0:5:0/block/sdak
Dec 2 03:16:28 ilp1fc85apA4 multipathd: sdak: remove path (uevent)
Dec 2 03:16:29 ilp1fc85apA4 multipathd: sdak: path removed from map mpath4

[root@iltuc4-bf var_logs]# grep lpfc syslog.1
<...>
Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.574079] lpfc 0003:00:0e.4: 4:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:20:ef:26 NPort x5e00a0 Data: x0 x8 x3
Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.580629] lpfc 0003:00:0e.5: 5:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:40:ef:26 NPort x020040 Data: x0 x8 x3
Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.606688] lpfc 0001:00:07.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:40:ef:26 NPort x020040 Data: x0 x8 x3
Dec 2 03:16:29 ilp1fc85apA4 kernel: [15294.974597] lpfc 0001:00:07.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:30:ef:26 NPort x0b0000 Data: x0 x8 xa

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/lpfc?id=0290217ad830f2813bb9ed5f51af686c0c591f28
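
The timing match described above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: it pairs each multipathd 'remove' uevent with lpfc devloss timeouts reported for the same PCI function in the same second. The regular expressions, the `correlate` helper, and the exact-second matching rule are assumptions made for illustration; the sample lines are copied from the excerpts above.

```python
import re

# PCI function address as it appears in the logs, e.g. 0003:00:0e.5
PCI = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]"
# syslog timestamp without a year, e.g. "Dec 2 03:16:28"
STAMP = r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})"

# Assumed patterns, derived only from the log lines quoted in this comment.
DEVLOSS = re.compile(
    STAMP + r".*lpfc (?P<pci>" + PCI + r"): "
    r".*Devloss timeout on WWPN (?P<wwpn>[0-9a-f:]+)"
)
REMOVE = re.compile(
    STAMP + r".*multipathd: uevent 'remove' from "
    r"'/devices/pci[0-9a-f:]+/(?P<pci>" + PCI + r")/.*/block/(?P<dev>\w+)'"
)


def correlate(lines):
    """Pair each path removal with devloss timeouts on the same
    PCI function that were logged in the same second."""
    devloss = {}   # (pci, stamp) -> list of WWPNs
    removals = []  # (device, pci, stamp)
    for line in lines:
        m = DEVLOSS.search(line)
        if m:
            devloss.setdefault((m["pci"], m["stamp"]), []).append(m["wwpn"])
            continue
        m = REMOVE.search(line)
        if m:
            removals.append((m["dev"], m["pci"], m["stamp"]))
    return [
        (dev, pci, stamp, devloss.get((pci, stamp), []))
        for dev, pci, stamp in removals
    ]


SAMPLE = [
    "Dec 2 03:16:28 ilp1fc85apA4 multipathd: uevent 'remove' from '/devices/pci0003:00/0003:00:0e.5/host6/rport-6:0-6/target6:0:4/6:0:4:0/block/sdz'",
    "Dec 2 03:16:28 ilp1fc85apA4 multipathd: uevent 'remove' from '/devices/pci0001:00/0001:00:07.1/host2/rport-2:0-7/target2:0:5/2:0:5:0/block/sdak'",
    "Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.580629] lpfc 0003:00:0e.5: 5:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:40:ef:26 NPort x020040 Data: x0 x8 x3",
    "Dec 2 03:16:28 ilp1fc85apA4 kernel: [15294.606688] lpfc 0001:00:07.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:02:40:ef:26 NPort x020040 Data: x0 x8 x3",
]

if __name__ == "__main__":
    for dev, pci, stamp, wwpns in correlate(SAMPLE):
        print(dev, pci, stamp, wwpns)
```

To run it against a full log, pass the lines of syslog.1 to `correlate` instead of `SAMPLE`. Matching on the exact second is deliberately strict; a removal logged one second after its devloss timeout (as with the "path removed from map" line for sdak) would need a small tolerance window.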

------- Comment From <email address hidden> 2016-03-03 09:57 EDT-------
Hi Bill Gao,

(In reply to comment #10)
> (In reply to comment #9)
>
> > Is it possible to do a non-scheduled/manual test before that?
>
> Yes, it is.

Great.

I've uploaded a test kernel with 2 patches (comment #4 plus a dependency) to
http://ausgsa.ibm.com/~mauricfo/public/bugs/bz133798/v1/

Can you please test whether they resolve the problem?
If they don't, please attach /var/log/syslog and dmesg output.

Thanks!

------- Comment From <email address hidden> 2016-04-25 12:46 EDT-------
Please test with this kernel:

http://ausgsa.ibm.com/~mauricfo/public/bugs/bz133798/v1/

Thanks!

------- Comment From <email address hidden> 2016-05-09 05:21 EDT-------
Kernel updated; the SVC CCL case is in progress with 2 loops.

------- Comment From <email address hidden> 2016-05-10 21:14 EDT-------
Completed SVC CCL EI with 2 loops; did not hit the missing-path problem.

------- Comment From <email address hidden> 2016-05-11 07:37 EDT-------
Hi Canonical,

The 2 upstream commits that resolve this problem are:

0290217ad830f2813bb9ed5f51af686c0c591f28 lpfc: Correct loss of target discovery after cable swap.
be6bb94100dc6803a530e20aad05360e6267f56b lpfc: Fix premature release of rpi bit in bitmask

Please pull them into 14.04.x.

Thanks!