FC Adapter (LPe32000-based) prints "iotag out of range", goes offline, and delays boot a lot (Ubuntu 17.04/Emulex/lpfc)

Bug #1670490 reported by bugproxy on 2017-03-06
This bug affects 2 people
Affects         Importance  Assigned to
linux (Ubuntu)  Undecided   Tim Gardner
Xenial          Undecided   Tim Gardner
Yakkety         Undecided   Tim Gardner
Zesty           Undecided   Tim Gardner

Bug Description

---Problem Description---
The FC adapter goes offline and produces call traces while booting into the OS once LUNs are assigned to it.

---uname output---
Linux ltciofvtr-firestone1 4.9.0-12-generic #13-Ubuntu SMP Tue Jan 10 12:52:39 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

The FC Redfish adapter (32 Gb) goes offline when LUNs are assigned to it. The adapter shows as online up to Petitboot, and the LUNs are also visible in Petitboot.
After selecting the OS in Petitboot and booting into it, the system produces call traces and a few lpfc errors. The complete console logs are attached for reference.
The FC switch that the Redfish adapter is connected to is a 16 Gb switch, while the adapter is 32 Gb. In my view this speed difference should not matter, since the adapter has to support backward compatibility.

---Steps to Reproduce---
1. Install the adapter in a server and connect it to the FC switch (16 Gb).
2. Create a zone and assign the LUNs to it from the V7000.
3. Reboot the OS.

root@ltciofvtr-firestone1:~# ll /sys/class/fc_host/
total 0
drwxr-xr-x 2 root root 0 Jan 25 04:22 ./
drwxr-xr-x 72 root root 0 Jan 25 03:50 ../
lrwxrwxrwx 1 root root 0 Jan 25 04:13 host10 -> ../../devices/pci0001:00/0001:00:00.0/0001:01:00.1/host10/fc_host/host10/
lrwxrwxrwx 1 root root 0 Jan 25 04:13 host6 -> ../../devices/pci0000:00/0000:00:00.0/0000:01:00.0/host6/fc_host/host6/
lrwxrwxrwx 1 root root 0 Jan 25 04:13 host8 -> ../../devices/pci0000:00/0000:00:00.0/0000:01:00.1/host8/fc_host/host8/
lrwxrwxrwx 1 root root 0 Jan 25 04:13 host9 -> ../../devices/pci0001:00/0001:00:00.0/0001:01:00.0/host9/fc_host/host9/
root@ltciofvtr-firestone1:~#
root@ltciofvtr-firestone1:~# cat /sys/class/fc_host/host6/port_state
Offline
root@ltciofvtr-firestone1:~# cat /sys/class/fc_host/host8/port_state
Offline
root@ltciofvtr-firestone1:~# cat /sys/class/fc_host/host9/port_state
Online
root@ltciofvtr-firestone1:~# cat /sys/class/fc_host/host10/port_state
Online
root@ltciofvtr-firestone1:~#
root@ltciofvtr-firestone1:~# lspci -nn | grep -i fibre
0000:01:00.0 Fibre Channel [0c04]: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter [10df:e300] (rev 01)
0000:01:00.1 Fibre Channel [0c04]: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter [10df:e300] (rev 01)
0001:01:00.0 Fibre Channel [0c04]: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter [10df:e200] (rev 10)
0001:01:00.1 Fibre Channel [0c04]: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter [10df:e200] (rev 10)
root@ltciofvtr-firestone1:~#
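
The per-host `cat` checks above can be scripted when a machine has many FC hosts. The following is a minimal Python sketch, not part of the original report: the function names `read_port_states` and `offline_hosts` are made up for illustration, and it assumes the same sysfs layout shown in the session (`/sys/class/fc_host/<host>/port_state`).

```python
from pathlib import Path


def read_port_states(fc_host_dir="/sys/class/fc_host"):
    """Return {host_name: port_state} for every FC host under fc_host_dir."""
    states = {}
    base = Path(fc_host_dir)
    if not base.is_dir():
        return states  # no FC hosts present, or not a Linux sysfs tree
    for host in sorted(base.iterdir()):
        state_file = host / "port_state"
        if state_file.is_file():
            # sysfs attributes end with a newline; strip it
            states[host.name] = state_file.read_text().strip()
    return states


def offline_hosts(states):
    """Names of hosts whose port_state is anything other than 'Online'."""
    return sorted(name for name, state in states.items() if state != "Online")


if __name__ == "__main__":
    for name, state in read_port_states().items():
        print(f"{name}: {state}")
```

On the system in this report, such a check would be expected to flag host6 and host8, the two LPe32000 ports at 0000:01:00.0 and 0000:01:00.1.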

Device driver error code:
[ 537.317563] lpfc 0000:01:00.1: 1:0338 IOCB wait timeout error - no wake response Data x3c
[ 537.317755] lpfc 0000:01:00.1: 1:(0):0727 TMF FCP_LUN_RESET to TGT 1 LUN 0 failed (0, 0) iocb_flag x206
[ 537.317934] lpfc 0000:01:00.1: 1:(0):0713 SCSI layer issued Device Reset (1, 0) return x2007
[ 537.318005] lpfc 0000:01:00.1: 1:0372 iotag x0 is out off range: max iotag (x880)
[ 551.653563] lpfc 0000:01:00.0: 0:(0):0748 abort handler timed out waiting for abortng I/O (xri:x149) to complete: ret 0x2003, ID 1, LUN 1
[ 551.653795] lpfc 0000:01:00.0: 0:0372 iotag x0 is out off range: max iotag (x880)
[ 598.757557] lpfc 0000:01:00.1: 1:0338 IOCB wait timeout error - no wake response Data x3c
[ 598.757766] lpfc 0000:01:00.1: 1:(0):0727 TMF FCP_LUN_RESET to TGT 1 LUN 1 failed (0, 0) iocb_flag x206
[ 598.757946] lpfc 0000:01:00.1: 1:(0):0713 SCSI layer issued Device Reset (1, 1) return x2007
[ 598.758017] lpfc 0000:01:00.1: 1:0372 iotag x0 is out off range: max iotag (x880)
[ 613.093562] lpfc 0000:01:00.0: 0:(0):0748 abort handler timed out waiting for abortng I/O (xri:x14f) to complete: ret 0x2003, ID 1, LUN 0
[ 613.093630] INFO: task systemd-udevd:1148 blocked for more than 120 seconds.
[ 613.093631] Not tainted 4.9.0-12-generic #13-Ubuntu
[ 613.093631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 613.093632] systemd-udevd D 0 1148 1141 0x00040000
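
To see at a glance which PCI functions emit which lpfc message numbers (for example 0372, the "iotag ... out off range" message), the driver lines above can be tallied. This is a rough sketch under the assumption that lines follow the shape seen in this log (`lpfc <pci address>: <board>:[(<vpi>):]<4-digit message number> <text>`); the regular expression and the function names are illustrative, not taken from the driver.

```python
import re
from collections import Counter

# Matches e.g. "lpfc 0000:01:00.1: 1:(0):0727 TMF FCP_LUN_RESET ..."
# and          "lpfc 0000:01:00.1: 1:0372 iotag x0 is out off range ..."
LPFC_LINE = re.compile(
    r"lpfc (?P<pci>[0-9a-f:.]+): \d+:(?:\(\d+\):)?(?P<msg>\d{4}) (?P<text>.*)"
)


def count_lpfc_messages(log_text):
    """Count lpfc log lines per (PCI address, message number)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LPFC_LINE.search(line)
        if match:
            counts[(match.group("pci"), match.group("msg"))] += 1
    return counts


def devices_reporting(counts, msg_number):
    """PCI addresses that logged the given message number at least once."""
    return sorted(pci for (pci, msg), n in counts.items() if msg == msg_number and n > 0)
```

Feeding it the output of `dmesg` and calling `devices_reporting(counts, "0372")` would list the functions that hit the iotag error; in the excerpt above those are both LPe32000 ports.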

Stack trace output:
[ 613.093633] Call Trace:
[ 613.093634] [c000001fd6b5b360] [ffffffffffffffff] 0xffffffffffffffff (unreliable)
[ 613.093636] [c000001fd6b5b530] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
[ 613.093637] [c000001fd6b5b590] [c000000000b188d8] __schedule+0x2f8/0x990
[ 613.093638] [c000001fd6b5b670] [c000000000b18fb8] schedule+0x48/0xc0
[ 613.093640] [c000001fd6b5b6a0] [c000000000b1d394] schedule_timeout+0x274/0x470
[ 613.093641] [c000001fd6b5b790] [c000000000b19f8c] wait_for_common+0xec/0x240
[ 613.093642] [c000001fd6b5b810] [c0000000000ea27c] flush_work+0x12c/0x270
[ 613.093643] [c000001fd6b5b8a0] [c0000000000eca20] __cancel_work_timer+0xc0/0x220
[ 613.093645] [c000001fd6b5b940] [c00000000059966c] disk_block_events+0xcc/0xe0
[ 613.093646] [c000001fd6b5b990] [c00000000037124c] __blkdev_get+0x9c/0x490
[ 613.093648] [c000001fd6b5ba00] [c000000000372830] blkdev_get+0x1a0/0x4a0
[ 613.093649] [c000001fd6b5bab0] [c0000000003167e0] do_dentry_open+0x2d0/0x470
[ 613.093651] [c000001fd6b5bb10] [c00000000032fee4] do_last+0x614/0x1070
[ 613.093652] [c000001fd6b5bc00] [c000000000330a1c] path_openat+0xdc/0x480
[ 613.093654] [c000001fd6b5bc80] [c00000000033268c] do_filp_open+0xec/0x160
[ 613.093655] [c000001fd6b5bdb0] [c00000000031841c] do_sys_open+0x1cc/0x380
[ 613.093656] [c000001fd6b5be30] [c00000000000bd84] system_call+0x38/0xe0
[ 613.093657] INFO: task systemd-udevd:1155 blocked for more than 120 seconds.
[ 613.093658] Not tainted 4.9.0-12-generic #13-Ubuntu
[ 613.093658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 613.093659] systemd-udevd D 0 1155 1141 0x00040002
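
The hung-task reports above (the "blocked for more than 120 seconds" lines) can also be pulled out of a console log mechanically, which helps when comparing boots before and after a fix. A minimal sketch, assuming the message format shown in this log; `find_hung_tasks` is a hypothetical helper name.

```python
import re

# Matches "INFO: task systemd-udevd:1148 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) for each hung-task report."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports
```

For the two reports in this log it would return entries for systemd-udevd with PIDs 1148 and 1155.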

Hi Canonical,

Can you please include these 2 commits in the lpfc driver on 17.04 and 16.04 HWE?
They were just applied to mainline as of 4.11-rc1.

The first resolves this problem, and the second prevents cache/DMA consistency problems that are likely to be hit in the future with this higher-performance adapter.

I have already asked for both patches to be flagged for stable kernels.

[1] 8ea73db486cda442f0671f4bc9c03a76be398a28 lpfc: Correct WQ creation for pagesize
[2] 6b3b3bdb83b4ad51252d21bb13596db879e51850 lpfc: Add missing memory barrier

Thank you.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/lpfc?id=8ea73db486cda442f0671f4bc9c03a76be398a28
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/lpfc?id=6b3b3bdb83b4ad51252d21bb13596db879e51850
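
One way to check whether a given kernel package picked up the two requested patches is to search its changelog text for the commit subjects. The sketch below is only an illustration: it assumes the caller has already obtained the changelog as a string, that the subjects appear there roughly as written in this comment (possibly with a prefix or wrapped across lines), and the name `missing_patches` is made up.

```python
# Commit IDs and subjects exactly as listed in the request above.
REQUESTED_PATCHES = {
    "8ea73db486cda442f0671f4bc9c03a76be398a28": "lpfc: Correct WQ creation for pagesize",
    "6b3b3bdb83b4ad51252d21bb13596db879e51850": "lpfc: Add missing memory barrier",
}


def missing_patches(changelog_text, patches=REQUESTED_PATCHES):
    """Return the {sha: subject} entries whose subject is not found in the changelog."""
    # Collapse whitespace so subjects wrapped across changelog lines still match.
    normalized = " ".join(changelog_text.split()).lower()
    return {
        sha: subject
        for sha, subject in patches.items()
        if subject.lower() not in normalized
    }
```

An empty result suggests both subjects are mentioned; a subject match is weaker evidence than checking the commit in the package's git tree, so treat it as a first pass only.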

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-150924 severity-high targetmilestone-inin1704
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)

Leann,

A few driver patches for the Kernel team to look at.

                    Michael


Tim Gardner (timg-tpi) on 2017-03-07
Changed in linux (Ubuntu Zesty):
assignee: Taco Screen team (taco-screen-team) → Tim Gardner (timg-tpi)
status: New → Fix Committed
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-13.15

---------------
linux (4.10.0-13.15) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1671614

  * ehci-platform needed in usb-modules udeb (LP: #1671589)
    - d-i: add ehci-platform to usb-modules

  * irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints
    (LP: #1671598)
    - irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints

  * iommu: Fix static checker warning in iommu_insert_device_resv_regions
    (LP: #1671599)
    - iommu: Fix static checker warning in iommu_insert_device_resv_regions

  * QDF2400: Fix panic introduced by erratum 1003 (LP: #1671602)
    - arm64: Avoid clobbering mm in erratum workaround on QDF2400

  * QDF2400 PCI ports require ACS quirk (LP: #1671601)
    - PCI: Add ACS quirk for Qualcomm QDF2400 and QDF2432

  * tty: pl011: Work around QDF2400 E44 stuck BUSY bit (LP: #1671600)
    - tty: pl011: Work around QDF2400 E44 stuck BUSY bit

  * CVE-2017-2636
    - tty: n_hdlc: get rid of racy n_hdlc.tbuf

  * Sync virtualbox to 5.1.16-dfsg-1 in zesty (LP: #1671470)
    - ubuntu: vbox -- Update to 5.1.16-dfsg-1

 -- Tim Gardner <email address hidden> Thu, 09 Mar 2017 06:16:24 -0700

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2017-03-23
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-75.96

---------------
linux (4.4.0-75.96) xenial; urgency=low

  * linux: 4.4.0-75.96 -proposed tracker (LP: #1684441)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.4.0-74.95) xenial; urgency=low

  * linux: 4.4.0-74.95 -proposed tracker (LP: #1682041)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.4.0-73.94) xenial; urgency=low

  * linux: 4.4.0-73.94 -proposed tracker (LP: #1680416)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

  * Xenial update to v4.4.59 stable release (LP: #1678960)
    - xfrm: policy: init locks early
    - virtio_balloon: init ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low

  * linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.8.0-48.51) yakkety; urgency=low

  * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.8.0-47.50) yakkety; urgency=low

  * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with n...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
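
For anyone checking whether a running kernel is at or past the package versions named in the "This bug was fixed in the package" messages above, a small comparison helper can be sketched as follows. It assumes `uname -r` style strings such as `4.9.0-12-generic`, which carry the ABI number but not the upload number (the `.15` in `4.10.0-13.15`), so only the first four numeric fields are compared. The names `FIXED_IN`, `abi_tuple` and `has_fix` are illustrative, and HWE or other kernel flavours on a given series are not handled.

```python
import re

# Package versions quoted in this bug's "fixed in the package linux" messages.
FIXED_IN = {
    "xenial": "4.4.0-75.96",
    "yakkety": "4.8.0-49.52",
    "zesty": "4.10.0-13.15",
}


def abi_tuple(version):
    """Turn '4.9.0-12-generic' or '4.10.0-13.15' into (4, 9, 0, 12) / (4, 10, 0, 13)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", version)
    if not match:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(running_release, series):
    """True if the running kernel is at or after the fixed version for that series."""
    return abi_tuple(running_release) >= abi_tuple(FIXED_IN[series])
```

For example, the kernel from the original report, `4.9.0-12-generic`, compares as older than the zesty fix version, while `4.10.0-13-generic` compares as containing it.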