Vport creation leads to out of memory and server hung on Ubuntu 18.04.3 (5.0) on Broadcom FC HBAs

Bug #1858840 reported by Laurie Barry
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Jeff,

We have analyzed this issue and found that 3 patches required to fix it are missing. Without them, creating more than 3 vports (depending on the customer's configuration) will result in a hang for FC or NVMe/FC adapters.

Please pull these changes into the next hardware release.

These are the missing commits:
959239d [scsi] scsi: core: avoid pre-allocating big SGL for data
5418f2f [scsi] scsi: core: avoid pre-allocating big SGL for protection information
250f285 [nvme] scsi: lib/sg_pool.c: improve APIs for allocating sg pool

Laurie
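
A minimal sketch for checking whether the three commits listed above are present in a kernel tree. It matches on commit subject rather than hash, on the assumption that the short hashes quoted may not match every tree; the function names are hypothetical.

```shell
# Required patch subjects, copied from the list in this report.
required_subjects() {
    cat <<'EOF'
scsi: core: avoid pre-allocating big SGL for data
scsi: core: avoid pre-allocating big SGL for protection information
scsi: lib/sg_pool.c: improve APIs for allocating sg pool
EOF
}

# Read commit subjects on stdin and print each required subject that is absent.
missing_subjects() {
    log=$(cat)
    required_subjects | while IFS= read -r subject; do
        printf '%s\n' "$log" | grep -Fqx -- "$subject" || printf '%s\n' "$subject"
    done
}
```

Inside a kernel git checkout, `git log --format=%s | missing_subjects` would print any subject that is not found; empty output means all three are present.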
---------------------------------------------
CONFIGURATION DETAILS
Host OS with Support Pack - Ubuntu 18.04.3 - HWE ()
Guest/VM OS Details -
System(s) Under Test - IBM x3650 M4
Adapter(s) Under Test - Prism 1-port
IPL Name -
Active Profile ID -
Network Configuration -
SAN Configuration -
OneCapture file attached -

BUG REPRODUCTION DETAILS
Test Case ID or ATID -
Reproducibility of Bug - Always
Last Known Working Build -
Time to Reproduce Bug - 5 mins
Steps To Reproduce Bug -

Create 126 vports on the Prism adapter and observe that the server runs out of
memory and hangs.
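
The report does not say how the vports were created. One possible way, sketched here, is to write `wwpn:wwnn` pairs to the FC transport's `vport_create` sysfs attribute. The WWPN/WWNN values generated below are made-up examples, the function names are hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
# Print one "wwpn:wwnn" pair per vport (16 hex digits each).
# The 20000090fa/10000090fa prefixes are illustrative values, not from the report.
gen_vport_ids() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf '20000090fa%06x:10000090fa%06x\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Create vports on the given fc_host by writing to its vport_create attribute.
# Dry run by default: prints the commands instead of executing them.
create_vports() {
    host=$1    # physical port's fc_host, e.g. host1
    count=$2
    gen_vport_ids "$count" | while read -r ids; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "echo $ids > /sys/class/fc_host/$host/vport_create"
        else
            echo "$ids" > "/sys/class/fc_host/$host/vport_create"
        fi
    done
}
```

For example, `create_vports host1 126` prints the 126 writes, and `DRY_RUN=0 create_vports host1 126` run as root would attempt them.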

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        1.7G         13G        1.3M        299M         13G
Swap:          2.0G          0B        2.0G

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        2.1G         12G        1.3M        299M         12G
Swap:          2.0G          0B        2.0G

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        6.1G        8.7G        1.5M        301M        8.7G
Swap:          2.0G          0B        2.0G

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         13G        1.4G        1.6M        303M        1.4G
Swap:          2.0G          0B        2.0G

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         14G        523M        1.7M        303M        528M
Swap:          2.0G          0B        2.0G

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         14G        147M         96K         48M        5.0M
Swap:          2.0G         38M        2.0G

root@ubuntu18043:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         14G        145M         96K         49M        3.6M
Swap:          2.0G         38M        2.0G
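
The `free -h` snapshots above were taken by hand. As a rough sketch, the same "available" figure can be read from `/proc/meminfo` so that a reproduction script can stop before the host becomes unresponsive. The function names and the idea of a memory floor are assumptions for illustration, not part of the original test.

```shell
# Print MemAvailable in MiB from a /proc/meminfo-format file (default: /proc/meminfo).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed while available memory is at or above the given floor (in MiB).
mem_ok() {
    floor=$1
    avail=$(mem_available_mib "${2:-/proc/meminfo}")
    [ -n "$avail" ] && [ "$avail" -ge "$floor" ]
}
```

A loop that creates vports could call `mem_ok 1024 || break` between iterations to keep the system responsive enough to collect logs.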

Nov 28 05:37:29 ubuntu18043 kernel: [ 1523.166689] scsi host112: Emulex LPe36000 32Gb PCIe Fibre Channel Adapter on PCI bus 11 device 00 irq 26 port 0 Logical Link Speed: 8000 Mbps PCI resettable
Nov 28 05:37:29 ubuntu18043 kernel: [ 1523.233464] lpfc 0000:11:00.0: 0:(111):1825 Vport Created.
Nov 28 05:37:29 ubuntu18043 kernel: [ 1523.234071] scsi host1: vport-1:0-110 created via shost1 channel 0
Nov 28 05:37:29 ubuntu18043 kernel: [ 1523.698396] scsi host113: Emulex LPe36000 32Gb PCIe Fibre Channel Adapter on PCI bus 11 device 00 irq 26 port 0 Logical Link Speed: 8000 Mbps PCI resettable
Nov 28 05:37:30 ubuntu18043 kernel: [ 1523.862582] lpfc 0000:11:00.0: 0:(112):1825 Vport Created.
Nov 28 05:37:30 ubuntu18043 kernel: [ 1523.863142] scsi host1: vport-1:0-111 created via shost1 channel 0
Nov 28 05:37:31 ubuntu18043 kernel: [ 1525.097335] scsi host114: Emulex LPe36000 32Gb PCIe Fibre Channel Adapter on PCI bus 11 device 00 irq 26 port 0 Logical Link Speed: 8000 Mbps PCI resettable

After creating 111 vports, the server ran out of memory.

Because the server was hung, no crash dump could be collected.
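
As a small sketch, the number of vports created before the hang can be counted from the kernel log by matching the lpfc "Vport Created" message shown above. The default path `/var/log/kern.log` is an assumption about where the log was saved, and the function name is hypothetical.

```shell
# Count lines containing the lpfc "Vport Created" message in a log file.
count_vports_created() {
    grep -c 'Vport Created' "${1:-/var/log/kern.log}"
}
```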

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1858840

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Lane  (bladernr)
summary: Vport creation leads to out of memory and server hung on Ubuntu 18.04.3
- on Broadcom FC HBAs
+ (5.3) on Broadcom FC HBAs
summary: Vport creation leads to out of memory and server hung on Ubuntu 18.04.3
- (5.3) on Broadcom FC HBAs
+ (5.0) on Broadcom FC HBAs
Jeff Lane  (bladernr) wrote :

Patches are in 5.3 via f7563f743d7081710a9d186a8b203997d09f383

and should be in linux-image-generic-hwe-18.04-edge

Asked to have testers verify that edge fixes the issue.
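
Before retesting, it may help to confirm that the booted kernel really is 5.3 or later. The following is a sketch only; the function name is hypothetical and it compares just the major and minor numbers of a `uname -r` style string.

```shell
# Succeed if kernel release $2 (default: uname -r) is at least major.minor $1.
kernel_at_least() {
    want_major=${1%%.*}
    rest=${1#*.}
    want_minor=${rest%%.*}
    have=${2:-$(uname -r)}
    have_major=${have%%.*}
    rest=${have#*.}
    have_minor=${rest%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}
```

For example, `kernel_at_least 5.3 || echo "still on an older kernel"` checks the running system.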

Jeff Lane  (bladernr) wrote :

errr, that should be 1f7563f743d7081710a9d186a8b203997d09f383

Jeff Lane  (bladernr) wrote :

Discussed this with OP, and as the patches are already in our 5.3 branch and thus will land in 18.04.4 shortly, nothing further needs to be done here, so marking invalid.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Laurie Barry (laurie-barry-4) wrote :

Jeff,

We have a few more patches required to bring our driver up to date with upstream. It looks like there's 1 additional kernel patch we'll need you to integrate before getting us a pointer to a kernel we can then apply our lpfc driver patches to. See the additional patches we need to bring us up to date since we first filed this bug.

thx
Laurie

Here is the list of additional patch commit ids that would be used to update from 12.6.0.2 (end of list in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855303) to 12.6.0.3 and 12.6.0.4 and to add FPIN/mpio support:

scsi: lpfc: Fix incomplete NVME discovery when target
commit 3ec5ec408ca633e17d3483e8b806c9fa447e19cc

scsi: lpfc: Fix: Rework setting of fdmi symbolic node name registration
commit d4e9ddd5ae8f26c521692283c7cbedc7e9881529

scsi: lpfc: Fix missing check for CSF in Write Object Mbox Rsp
commit a0c94c5ef874b214305cca965acc7edc27bb69a6

scsi: lpfc: Fix Fabric hostname registration if system hostname changes
commit 3712967ea7f2d3793ae2a39e9eae2df111d770b5

scsi: lpfc: Fix ras_log via debugfs
commit 25d4132f95c25e6f78a94666a1e067291272888a

scsi: lpfc: Fix disablement of FC-AL on lpe35000 models
commit 265fb8efca11eee1ffe8edd2bc0a9fbf31b6b84e

scsi: lpfc: Fix unmap of dpp bars affecting next driver load
commit b88d705fa037b222ecdd5de867132ae25403fe48

scsi: lpfc: Fix MDS Latency Diagnostics Err-drop rates
commit 78a7872570fbeed44e7362354619b403c6e6de1e

scsi: lpfc: Fix improper flag check for IO type
commit f44ccecf36581a2030313e979127013f9d455d3a

scsi: lpfc: Update lpfc version to 12.6.0.3
commit e627554eedcfac83bb2ba073adf711a13049273a

----

scsi: lpfc: Fix RQ buffer leakage when no IOCBs available
commit 9c75a0dee87b2bad80d381ad9fa8ef90847d3a1d

scsi: lpfc: Fix lpfc_io_buf resource leak in lpfc_get_scsi_buf_s4 error path
commit 89cc9dba63c91a6c796dc46342c1d57c82c9b0cd

scsi: lpfc: Fix broken Credit Recovery after driver load
commit 692fc8380ca0f9185e75e01a74357d2cf3083743

scsi: lpfc: Fix registration of ELS type support in fdmi
commit 0f74f70cb674e3b8712e896863ad85db4d193d46

scsi: lpfc: Fix release of hwq to clear the eq relationship
commit 01034e708a353fed29963d084c35d58785a317a6

scsi: lpfc: Fix compiler warning on frame size
commit 3cd50eac891db354e945bf9898726e163a19560e

scsi: lpfc: Fix coverity errors in fmdi attribute handling
commit 3ad04f4f0ad75ad4e7d886133f673d9fe20aa9c4

scsi: lpfc: Remove handler for obsolete ELS - Read Port Status (RPS)
commit 8d9fae72109e9921f52be342007b4c78490ea4fe

scsi: lpfc: Clean up hba max_lun_queue_depth checks
commit 20d674a4bf64acf54eadc7215ea32b88a3a7687e

scsi: lpfc: Update lpfc version to 12.6.0.4
commit 35817310d9e05e666dbd242750008dad33ed8992

scsi: lpfc: Copyright updates for 12.6.0.4 patches
commit fd6cc30f341fea14bac45bf1c5c9d10702c18a9d

---

This is the one we need you to add --> scsi: fc: Update Descriptor definition and add RDF and Link Integrity FPINs
commit 73ec6d2748dc35db2b32cf3c182a27c4a0837b9b

scsi: lpfc: add RDF registration and Link Integrity FPIN logging
commit df3fe76658ed47617741819a501e2bd2ae446962

Laurie Barry (laurie-barry-4) wrote :

Ignore comment #5, that was intended for a different bug and added in error.

Laurie
