Lancer A0 Asic HBA's won't boot with 18.04

Bug #1768103 reported by laurie barry on 2018-04-30
Affects          Importance  Assigned to
linux (Ubuntu)   High        Joseph Salisbury
Bionic           High        Joseph Salisbury

Bug Description

== SRU Justification ==
Broadcom reported that an early ASIC model (A0) of their 16/32Gb
HBAs doesn't boot with the lpfc driver in Ubuntu 18.04.

Bionic with the lpfc driver 12.0.0.0 can't see LPe16002-M6 but can see LPe16002B-M6.

This bug is fixed by commits bf316c78517d and c221768bd49a, which are
both still in linux-next. Broadcom tested with these two commits and
confirmed they resolve the bug and allow the system to boot.

== Fixes ==
Currently in linux-next:
bf316c78517d ("scsi: lpfc: Fix WQ/CQ creation for older asic's.")
c221768bd49a ("scsi: lpfc: Fix 16gb hbas failing cq create.")

== Regression Potential ==
Low. Patches fix a current regression and are limited to lpfc.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

We have discovered that an early ASIC model (A0) of our 16/32Gb HBAs doesn't boot with the lpfc driver in Ubuntu 18.04.

After further review and discussion, this has been deemed a low-risk issue, since early A0 HBAs were only ever shipped to OEMs for test purposes. These cards were never shipped to end customers. We have been working to replace those cards whenever we discover them.

We'll leave it up to Canonical to decide whether to pull this single patch into a subsequent 18.04 update.

Symptom: Ubuntu 18.04 with lpfc driver 12.0.0.0 can't see the LPe16002-M6 but can see the LPe16002B-M6.
Resolution: new lpfc driver patch update.

scsi: lpfc: Fix WQ/CQ creation for older asic's.
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.18/scsi-queue&id=83fae8ca4ae09403bfb99542f1aaa292c06cb111

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1768103

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
laurie barry (laurie-barry) wrote :

The original reporter didn't provide logs, and logs would not help. Here's the original text of this issue as reported by a NetApp test engineer.

Problem report from NetApp:

We’re testing Ubuntu 18.04 before its upcoming release this month and have
found that the most recent kernel that they have pulled in (4.15.0-15-generic)
contains the lpfc driver 12.0.0.0. This driver version seems to have lost
support for the LPe16002-M6 adapter but it DOES have support for the
LPe16002B-M6 adapter.

I’m assuming that you guys are major contributors to your own lpfc driver, so
I’m hoping you can shed some light on this one before I have to resort to going
out to the community.

The previous lpfc version, 11.4.0.1, that was in an Ubuntu 18.04 beta, worked
fine on the “non-B” card.

Matt Schulte
HSG QA Engineer
Interoperability
E-Series Linux

Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: bionic
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with linux-next commit bf316c78517d. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1768103

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.
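
The two rules above can be written down as a small lookup. This is only a sketch in Python: the function name is made up for illustration, and it assumes the kernel version string starts with "major.minor" as in 4.15.0-20-generic.

```python
def packages_for_test_kernel(kernel_version):
    """Return the .deb package families to install for a test kernel.

    Hypothetical helper; it only encodes the two rules quoted in this comment.
    """
    # "4.15.0-20-generic" -> (4, 15)
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-image-unsigned", "linux-modules", "linux-modules-extra"]


if __name__ == "__main__":
    print(packages_for_test_kernel("4.15.0-20-generic"))
```

Comparing (major, minor) as a tuple avoids the string-comparison trap where "4.4" would sort after "4.15".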

Thanks in advance!

laurie barry (laurie-barry) wrote :

Yes, we will test it out.

Thank you
Laurie

laurie barry (laurie-barry) wrote :

Canonical Team,

Not sure why - but this kernel didn't appear to solve the issue.

Laurie
----------------------------
Comment 8, Vinay Kumar Laghavarapu, 2018-05-08 01:45:19 PDT
Hi Laurie,

We have installed Ubuntu 18.04 with the latest patches provided in comment 7, but
the LPe16002-M6 adapter is still not displayed in the OS.

OS used : Ubuntu 18.04 LTS (4.15.0-20-generic)

lpfc driver contained :

root@ubuntu18-04:~# cat /sys/module/lpfc/version
0:12.0.0.0

Installed the following .deb packages :

root@ubuntu18-04:~# ls
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21_lp1768103_amd64.deb
linux-modules-4.15.0-20-generic_4.15.0-20.21_lp1768103_amd64.deb
linux-modules-extra-4.15.0-20-generic_4.15.0-20.21_lp1768103_amd64.deb

root@ubuntu18-04:~# lspci | grep -i emu
04:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
04:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
06:00.1 Co-processor: Emulex Corporation ServerView iRMC HTI
81:00.0 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
81:00.1 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
82:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter
82:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter
83:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 10)
83:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 10)
root@ubuntu18-04:~#

You can access the server, SSH IP : 10.123.175.150 (root/password)

Joseph Salisbury (jsalisbury) wrote :

Can you run uname -a to confirm the correct kernel is booted? You should see this string in the output: "lp1768103".
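
If it helps to script that check, here is a minimal sketch in Python. The function name is hypothetical; the only thing taken from this bug is the "lp1768103" marker, which the test build carries in its kernel build string.

```python
import platform


def is_test_kernel(build_string, tag="lp1768103"):
    """Return True if the kernel build string carries the test-build tag."""
    return tag in build_string


if __name__ == "__main__":
    # platform.uname().version holds the "#NN~... SMP <date>" part of `uname -a`.
    print(is_test_kernel(platform.uname().version))
```

This is a plain substring test, so it only confirms which build is running, not that the lpfc change inside it is effective.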

Vinay Kumar (vlaghavarapu) wrote :

root@ubuntu18-04:~# uname -a
Linux ubuntu18-04 4.15.0-20-generic #21~lp1768103 SMP Thu May 3 22:29:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu18-04:~#

I think we have booted into the correct kernel.

laurie barry (laurie-barry) wrote :

We apparently have a bug in our patch for this regression and have a fix in hand that we are verifying internally. Once we push that fix upstream and have a commit id, we will update this bug.

thank you
Laurie

Joseph Salisbury (jsalisbury) wrote :

Thank you for the update, Laurie.

laurie barry (laurie-barry) wrote :

Joseph,

At a minimum, please pull in patches 4/6 and 5/6 from the following upstream submission to address this regression. Apologies for this ongoing hassle.

The 12.0.0.4 patch set was pushed upstream. Patches 4 and 5 are the minimum that should be applied to fix this issue.
[PATCH 4/6] lpfc: Fix 16gb hbas failing cq create.
[PATCH 5/6] lpfc: Fix port initialization failure.

That said, if you are pulling in those 2 patches, we recommend taking the whole patch set.

The URLs for the upstream posting:

[PATCH 0/6] lpfc updates for 12.0.0.4
https://marc.info/?l=linux-scsi&m=152722135315362&w=2

[PATCH 1/6] lpfc: Fix MDS diagnostics failure (Rx < Tx)
https://marc.info/?l=linux-scsi&m=152722135515365&w=2

[PATCH 2/6] lpfc: correct oversubscription of nvme io requests for an adapter
https://marc.info/?l=linux-scsi&m=152722135715367&w=2

[PATCH 3/6] lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
https://marc.info/?l=linux-scsi&m=152722135915370&w=2

[PATCH 4/6] lpfc: Fix 16gb hbas failing cq create.
https://marc.info/?l=linux-scsi&m=152722136015371&w=2

[PATCH 5/6] lpfc: Fix port initialization failure.
https://marc.info/?l=linux-scsi&m=152722136315375&w=2

[PATCH 6/6] lpfc: update driver version to 12.0.0.4
https://marc.info/?l=linux-scsi&m=152722136315376&w=2

laurie barry (laurie-barry) wrote :

Please let us know when there's a revised kernel we can verify.

thx
Laurie

Joseph Salisbury (jsalisbury) wrote :

I attempted to apply patches 4/6 and 5/6 to Bionic but ran into some issues. Patch 4 does not apply and tries to change code that never existed in mainline or Bionic. For example, I can't find anything in the mainline git history showing that struct lpfc_sli_asic_rev ever existed, yet the patch tries to remove it:

--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -104,17 +104,6 @@ struct lpfc_sli_intf {
 #define LPFC_SLI_INTF_IF_TYPE_VIRT 1
 };

-struct lpfc_sli_asic_rev {
- u32 word0;
-#define LPFC_SLI_ASIC_VER_A 0x0
-#define LPFC_SLI_ASIC_VER_B 0x1
-#define LPFC_SLI_ASIC_VER_C 0x2
-#define LPFC_SLI_ASIC_VER_D 0x3
-#define lpfc_sli_asic_ver_SHIFT 4
-#define lpfc_sli_asic_ver_MASK 0x0000000F
-#define lpfc_sli_asic_ver_WORD word0
-};
-
 #define LPFC_SLI4_MBX_EMBED true
 #define LPFC_SLI4_MBX_NEMBED false

Patch 5/6 looks like it will apply fine. Is there another version of patch 4, or can you check whether it's still needed?
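
For readers unfamiliar with the _SHIFT/_MASK/_WORD naming in the struct above, here is a rough Python sketch of how such a field is usually read out of a 32-bit register word. It assumes the driver's accessor does a shift followed by a mask, which this thread does not show; the register values in the usage lines are made up.

```python
# Values copied from the struct the patch removes.
ASIC_VER_SHIFT = 4
ASIC_VER_MASK = 0x0000000F
ASIC_VER_NAMES = {0x0: "A", 0x1: "B", 0x2: "C", 0x3: "D"}


def asic_ver(word0):
    """Extract the 4-bit ASIC version field (bits 4..7) from word0."""
    return (word0 >> ASIC_VER_SHIFT) & ASIC_VER_MASK


def asic_ver_name(word0):
    """Map the extracted field to its letter, or 'unknown' if undefined."""
    return ASIC_VER_NAMES.get(asic_ver(word0), "unknown")


if __name__ == "__main__":
    # Hypothetical register contents, for illustration only.
    print(asic_ver_name(0x00000000))
    print(asic_ver_name(0x00000010))
```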

laurie barry (laurie-barry) wrote :

I've asked the maintainer to engage asap on this bug.

Laurie

Dick Kennedy (dick-kennedy) wrote :

Reply to Comment 12.

There were 2 patches to fix this. The 1st patch:

author James Smart <email address hidden> 2018-04-09 14:24:28 -0700
committer Martin K. Petersen <email address hidden> 2018-04-18 19:34:04 -0400
commit bf316c78517d9437656293f65a70d6ecdc2ec58e
tree 8ae623e9b580ec1782dbc2d8711cff8a9c9d5e88
parent 01466024d2de1c05652d69411461e8e7908f0d1e
scsi: lpfc: Fix WQ/CQ creation for older asic's.

It was checking for the A0 family and asic ver 0, but that check was removed because all of the A0 family was affected, so this 2nd patch removed the asic ver check:

author James Smart <email address hidden> 2018-05-24 21:09:00 -0700
committer Martin K. Petersen <email address hidden> 2018-05-28 22:40:33 -0400
commit c221768bd49a7423be57c00a56985c0e9c4122cd
tree 95b132c07e187ecd69665ececbdd7ec12a08772c
parent 7438273fa23bea6d1e647e66c451570b86e2758b
scsi: lpfc: Fix 16gb hbas failing cq create.

Joseph Salisbury (jsalisbury) wrote :

The following patch has not yet been applied to Bionic:

bf316c78517d ("scsi: lpfc: Fix WQ/CQ creation for older asic's.")

So are you saying we should first apply commit bf316c78517d, then patch 4/6 from comment #10?

Dick Kennedy (dick-kennedy) wrote :

Yes, that is what I am saying. Otherwise we have to apply the last patch manually. Whichever you prefer.

Dick Kennedy (dick-kennedy) wrote :

This is what your patch should look like if you skip the 1st one.
@@ -10711,9 +10706,7 @@ lpfc_get_sli4_parameters(struct lpfc_hba *phba, LPFC_MBOXQ_t *mboxq)
        lpfc_printf_log(phba, KERN_INFO, LOG_INIT | LOG_NVME,
                        "6422 XIB %d: FCP %d %d NVME %d %d %d %d\n",
                        bf_get(cfg_xib, mbx_sli4_parameters),
                        phba->fcp_embed_pbde, phba->fcp_embed_io,
                        phba->nvme_support, phba->nvme_embed_pbde,
                        phba->cfg_nvme_embed_cmd, phba->cfg_suppress_rsp);

+ if ((bf_get(lpfc_sli_intf_if_type, &phba->sli4_hba.sli_intf) ==
+ LPFC_SLI_INTF_IF_TYPE_2) &&
+ (bf_get(lpfc_sli_intf_sli_family, &phba->sli4_hba.sli_intf) ==
+ LPFC_SLI_INTF_FAMILY_LNCR_A0))
   exp_wqcq_pages = false;

if ((bf_get(cfg_cqpsize, mbx_sli4_parameters) & LPFC_CQ_16K_PAGE_SZ) &&
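
To make the difference between the two patches concrete, here is a small Python model of the check. It is a sketch only, not driver code: the string constants stand in for LPFC_SLI_INTF_IF_TYPE_2 and LPFC_SLI_INTF_FAMILY_LNCR_A0, the function name is hypothetical, and the initial value of exp_wqcq_pages is assumed to be true.

```python
IF_TYPE_2 = "IF_TYPE_2"        # stand-in for LPFC_SLI_INTF_IF_TYPE_2
FAMILY_LNCR_A0 = "LNCR_A0"     # stand-in for LPFC_SLI_INTF_FAMILY_LNCR_A0
ASIC_VER_A = 0x0               # LPFC_SLI_ASIC_VER_A from the removed struct


def exp_wqcq_pages(if_type, sli_family, asic_ver=None,
                   check_asic_ver=False, initial=True):
    """Return the resulting exp_wqcq_pages flag for one adapter.

    check_asic_ver=True models the 1st patch as described in this thread
    (A0 family AND asic ver 0); False models the check with the asic ver
    test removed.
    """
    is_a0_family = if_type == IF_TYPE_2 and sli_family == FAMILY_LNCR_A0
    if check_asic_ver:
        if is_a0_family and asic_ver == ASIC_VER_A:
            return False
    elif is_a0_family:
        return False
    return initial


if __name__ == "__main__":
    # An A0-family part reporting asic ver 1: the narrower check misses it.
    print(exp_wqcq_pages(IF_TYPE_2, FAMILY_LNCR_A0, asic_ver=1, check_asic_ver=True))
    print(exp_wqcq_pages(IF_TYPE_2, FAMILY_LNCR_A0, asic_ver=1))
```

With the asic ver test in place, an A0-family adapter that reports a version other than 0 keeps the expanded page setting; with it removed, every A0-family adapter has the flag cleared.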

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the following two patches:

scsi: lpfc: Fix WQ/CQ creation for older asic's.
scsi: lpfc: Fix 16gb hbas failing cq create.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1768103

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.

Thanks in advance!

laurie barry (laurie-barry) wrote :

Yes, I've asked Vinay to test it again.

thank you
Laurie

Vinay Kumar (vlaghavarapu) wrote :

Hi all,

The issue is no longer observed with the latest patch update. We are able to see the LPe16002 Lancer A0 adapter.

root@ubuntu18-04:~# uname -a
Linux ubuntu18-04 4.15.0-23-generic #26~lp1768103v2 SMP Mon Jun 4 16:04:59 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu18-04:~#

root@ubuntu18-04:~# lspci | grep -i emu
04:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
04:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
06:00.1 Co-processor: Emulex Corporation ServerView iRMC HTI
81:00.0 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
81:00.1 Fibre Channel: Emulex Corporation Lancer Gen6: LPe32000 Fibre Channel Host Adapter (rev 01)
82:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter
82:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter
83:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 30)
83:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 30)

root@ubuntu18-04:~# ls /sys/class/fc_host/
host10 host11 host12 host13 host14 host15

root@ubuntu18-04:~# cat /sys/module/lpfc/version
0:12.0.0.0

Regards,
Vinay.

laurie barry (laurie-barry) wrote :

Most excellent, thank you Vinay.

Joseph - and thank you for your patience. Please include this in your next errata kernel for 18.04. Do you have an ETA on when that might be released, so that I can inform my team?

thank you again
Laurie

Joseph Salisbury (jsalisbury) wrote :

SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-June/093013.html

I believe the current schedule for this cadence is June 14th.

description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.
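
The tag change requested above is a simple swap. The Python sketch below models it on a plain list of tag strings; it does not talk to Launchpad, and the function name is hypothetical.

```python
def record_verification(tags, series, passed):
    """Swap verification-needed-<series> for the done/failed tag.

    Returns a new list; raises ValueError if the needed tag is absent.
    """
    needed = "verification-needed-" + series
    outcome = "verification-{}-{}".format("done" if passed else "failed", series)
    if needed not in tags:
        raise ValueError("tag not present: " + needed)
    return [outcome if tag == needed else tag for tag in tags]


if __name__ == "__main__":
    print(record_verification(["bionic", "verification-needed-bionic"], "bionic", True))
```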

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
laurie barry (laurie-barry) wrote :

Is Canonical waiting for additional testing from Broadcom Emulex?

Please clarify; I thought we already validated the patch in comment #20?

thx
Laurie

Hi Laurie,

We ask the bug reporters to verify the fix again once the kernel is published on -proposed to make sure that the bug is really fixed with the kernel build that is going to be released and that no issue has been introduced when integrating the changes with all the other changes we are releasing.

So if you are able to test the fixes again, we would really appreciate it.

Thank you,
Kleber

laurie barry (laurie-barry) wrote :

Ok thank you for clarifying. I've asked Vinay to verify it in the kernel.

Laurie

laurie barry (laurie-barry) wrote :

Kleber,

Sorry to be obtuse, but where exactly is the download for the latest update kernel to 18.04? My test team and I can't seem to locate it.

thank you
Laurie
