ISST-LTE:pVM:monklp5:Ubuntu16.04.1:system crashed at lpfc_sli4_scmd_to_wqidx_distr

Bug #1597974 reported by bugproxy
Affects         Status         Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released   High        Tim Gardner
  Xenial        Fix Released   Undecided   Tim Gardner
  Yakkety       Won't Fix      High        Tim Gardner

Bug Description

---Problem Description---
We have Ubuntu 16.04.1 installed on our system and ran the DLPAR test for the ZR1 adapter; after some time the system crashes at lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100.

Machine Type = 9119-MME*1085AE7

---Debugger Data---
e:mon> e
cpu 0xe: Vector: 300 (Data Access) at [c0000003d45335a0]
    pc: d000000003a374e0: lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]
    lr: d0000000039d749c: lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
    sp: c0000003d4533820
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000003e06c2a20
  paca = 0xc000000007af8500 softe: 0 irq_happened: 0x01
    pid = 17983, comm = scsi_eh_23
e:mon> r
R00 = d0000000039d749c R16 = 0000000000000000
R01 = c0000003d4533820 R17 = c0000003d4533cd0
R02 = d000000003a84160 R18 = c0000003d4533cb8
R03 = c0000003ee76a000 R19 = c0000003d87e5088
R04 = c0000003dad6a800 R20 = c0000003d4533cb0
R05 = c0000003dad6a870 R21 = 000000000000001e
R06 = 0000000000000001 R22 = c0000000018aab78
R07 = d000000003a84160 R23 = c0000003dad6a870
R08 = d000000003a2f830 R24 = c0000003dad6a800
R09 = 0000000000000004 R25 = c0000003d4533978
R10 = 0000000000000000 R26 = 0000000000000001
R11 = d000000003a59a50 R27 = 0000000000000000
R12 = 0000000028533824 R28 = c0000003e841e000
R13 = c000000007af8500 R29 = c0000003ee76a000
R14 = c0000003d87e5000 R30 = c0000003dad6a800
R15 = c0000003d4533cb8 R31 = c0000003ee76a000
pc = d000000003a374e0 lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]
cfar= c000000000008468 slb_miss_realmode+0x50/0x78
lr = d0000000039d749c lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
msr = 8000000100009033 cr = 28538828
ctr = c000000000ae3cf0 xer = 0000000020000010 trap = 300
dar = 0000000000000000 dsisr = 40000000
e:mon>

Stack trace output:
 e:mon> t
[c0000003d4533850] d0000000039d749c lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
[c0000003d4533890] d0000000039df680 lpfc_sli_issue_iocb+0xf0/0x330 [lpfc]
[c0000003d45338f0] d0000000039e3824 lpfc_sli_issue_iocb_wait+0x264/0x680 [lpfc]
[c0000003d45339d0] d000000003a32944 lpfc_send_taskmgmt+0x2d4/0x7d0 [lpfc]
[c0000003d4533aa0] d000000003a33564 lpfc_device_reset_handler+0x114/0x210 [lpfc]
[c0000003d4533b60] c00000000075843c scsi_eh_ready_devs+0x68c/0xee0
[c0000003d4533c50] c00000000075a91c scsi_error_handler+0x6bc/0x9e0
[c0000003d4533d80] c0000000000e61e0 kthread+0x110/0x130
[c0000003d4533e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
--- Exception: 0 at 0000000000000000
e:mon>

e:mon> dl
[10194.079284] sd 13:0:3:0: [sdaf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.079293] sd 13:0:3:0: [sdaf] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.079297] blk_update_request: I/O error, dev sdaf, sector 41942912
[10194.079313] device-mapper: multipath: Failing path 65:240.
[10194.079351] sd 13:0:2:0: [sdab] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.079360] sd 13:0:2:0: [sdab] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.079364] blk_update_request: I/O error, dev sdab, sector 41942912
[10194.079375] device-mapper: multipath: Failing path 65:176.
[10194.102832] scsi 13:0:1:0: alua: Detached
[10194.110320] sd 13:0:1:1: [sdh] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.110324] sd 13:0:1:1: [sdh] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.110326] blk_update_request: I/O error, dev sdh, sector 41942912
[10194.110334] device-mapper: multipath: Failing path 8:112.
[10194.110394] sd 13:0:2:1: [sdac] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.110398] sd 13:0:2:1: [sdac] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.110401] blk_update_request: I/O error, dev sdac, sector 41942912
[10194.110407] device-mapper: multipath: Failing path 65:192.
[10194.110439] sd 13:0:3:1: [sdag] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.110448] sd 13:0:3:1: [sdag] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.110452] blk_update_request: I/O error, dev sdag, sector 41942912
[10194.110464] device-mapper: multipath: Failing path 66:0.
[10194.118851] scsi 13:0:0:1: alua: Detached
[10194.122868] sd 13:0:3:0: [sdaf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.122879] sd 13:0:3:0: [sdaf] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.122887] blk_update_request: I/O error, dev sdaf, sector 41942912
[10194.122911] device-mapper: multipath: Failing path 65:240.
[10194.138865] scsi 13:0:2:0: alua: Detached
[10194.158852] scsi 13:0:3:0: alua: Detached
[10194.162199] sd 13:0:3:1: [sdag] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[10194.162204] sd 13:0:3:1: [sdag] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[10194.162207] blk_update_request: I/O error, dev sdag, sector 41942912
[10194.162216] device-mapper: multipath: Failing path 66:0.
[10194.162241] device-mapper: multipath: Failing path 65:192.
[10194.194835] scsi 13:0:1:1: alua: Detached
[10194.202301] device-mapper: multipath: Failing path 65:208.
[10194.202323] device-mapper: multipath: Failing path 8:128.
[10194.202359] device-mapper: multipath: Failing path 66:16.
[10194.218852] scsi 13:0:0:2: alua: Detached
[10194.222391] device-mapper: multipath: Failing path 66:0.
[10194.250830] scsi 13:0:2:1: alua: Detached
[10194.274829] scsi 13:0:3:1: alua: Detached
[10194.278436] device-mapper: multipath: Failing path 66:16.
[10194.278467] device-mapper: multipath: Failing path 65:208.
[10194.298817] scsi 13:0:1:2: alua: Detached
[10194.306356] device-mapper: multipath: Failing path 65:224.
[10194.306383] device-mapper: multipath: Failing path 65:160.
[10194.306424] device-mapper: multipath: Failing path 66:32.
[10194.334838] scsi 13:0:0:3: alua: Detached
[10194.338579] device-mapper: multipath: Failing path 66:16.
[10194.354934] scsi 13:0:2:2: alua: Detached
[10194.378850] scsi 13:0:3:2: alua: Detached
[10194.382605] device-mapper: multipath: Failing path 66:32.
[10194.382643] device-mapper: multipath: Failing path 65:224.
[10194.406826] scsi 13:0:1:3: alua: Detached
[10194.410973] device-mapper: multipath: Failing path 66:32.
[10194.434908] scsi 13:0:2:3: alua: Detached
[10194.462920] scsi 13:0:3:3: alua: Detached
[10194.587776] iommu: Removing device 0007:01:00.0 from group 0
[10204.593263] pci_bus 0007:01: busn_res: [bus 01-ff] is released
[10204.593333] rpadlpar_io: slot PHB 21 removed
[10849.383986] PCI host bridge /pci@800000020000015 ranges:
[10849.383991] MEM 0x00003fc600000000..0x00003fc67effffff -> 0x0000000080000000
[10849.383993] MEM 0x000030c000000000..0x000030cfffffffff -> 0x0003d0c000000000
[10849.389303] PCI: I/O resource not set for host bridge /pci@800000020000015 (domain 8)
[10849.389372] PCI host bridge to bus 0008:01
[10849.389378] pci_bus 0008:01: root bus resource [mem 0x3fc600000000-0x3fc67effffff] (bus address [0x80000000-0xfeffffff])
[10849.389384] pci_bus 0008:01: root bus resource [bus 01-ff]
[10849.394162] pci 0008:01:00.1: reg 0x160: [mem 0x00000000-0x0000ffff 64bit pref]
[10849.394165] pci 0008:01:00.1: VF(n) BAR0 space: [mem 0x00000000-0x0013ffff 64bit pref] (contains BAR0 for 20 VFs)
[10849.405662] pci 0008:01:00.0: reg 0x160: [mem 0x00000000-0x0000ffff 64bit pref]
[10849.405664] pci 0008:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x0013ffff 64bit pref] (contains BAR0 for 20 VFs)
[10849.491175] iommu: Adding device 0008:01:00.1 to group 0
[10849.491704] iommu: Adding device 0008:01:00.0 to group 0
[10849.492196] PIAR: overlapping address range
[10849.492198] PIAR: overlapping address range
[10849.492199] PIAR: overlapping address range
[10849.492199] PIAR: overlapping address range
[10849.492200] PIAR: overlapping address range
[10849.492441] lpfc 0008:01:00.1: enabling device (0140 -> 0142)
[10849.495406] lpfc 0008:01:00.1: ibm,query-pe-dma-windows(53) 10000 8000000 20000015 returned 0
[10849.542283] lpfc 0008:01:00.1: Using 64-bit direct DMA at offset 800000000000000
[10849.675139] scsi host14: Emulex LPe16000 16Gb PCIe Fibre Channel Adapter on PCI bus 01 device 01 irq 505
[10850.235317] lpfc 0008:01:00.0: enabling device (0140 -> 0142)
[10850.239152] lpfc 0008:01:00.0: Using 64-bit direct DMA at offset 800000000000000
[10850.399263] scsi host15: Emulex LPe16000 16Gb PCIe Fibre Channel Adapter on PCI bus 01 device 00 irq 504
[10850.959301] rpaphp: Slot [U78CA.001.CSS003P-P1-C6-C1] registered
[10850.959309] rpadlpar_io: slot PHB 21 added
[10851.847229] lpfc 0008:01:00.0: 1:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
[10852.026827] scsi 15:0:0:0: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.027527] sd 15:0:0:0: alua: supports implicit TPGS
[10852.027843] sd 15:0:0:0: alua: port group 00 rel port 230
[10852.027890] sd 15:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.028276] sd 15:0:0:0: alua: port group 00 state A preferred supports tolusnA
[10852.028425] sd 15:0:0:0: [sda] Write Protect is off
[10852.028431] sd 15:0:0:0: [sda] Mode Sense: f5 00 00 08
[10852.028455] sd 15:0:0:0: Attached scsi generic sg0 type 0
[10852.028711] sd 15:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.029804] scsi 15:0:0:1: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.030789] sd 15:0:0:1: alua: supports implicit TPGS
[10852.031484] sd 15:0:0:1: alua: port group 00 rel port 230
[10852.031522] sd 15:0:0:1: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.031994] sd 15:0:0:1: alua: port group 00 state A preferred supports tolusnA
[10852.032153] sd 15:0:0:1: Attached scsi generic sg1 type 0
[10852.032239] sd 15:0:0:1: [sdb] Write Protect is off
[10852.032246] sd 15:0:0:1: [sdb] Mode Sense: ed 00 00 08
[10852.032596] sd 15:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.033460] scsi 15:0:0:2: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.034530] sd 15:0:0:2: alua: supports implicit TPGS
[10852.034917] sd 15:0:0:2: alua: port group 00 rel port 230
[10852.035000] sd 15:0:0:2: [sdd] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.035294] sd 15:0:0:2: alua: port group 00 state A preferred supports tolusnA
[10852.035568] sd 15:0:0:2: Attached scsi generic sg2 type 0
[10852.036739] scsi 15:0:0:3: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.037441] sd 15:0:0:3: alua: supports implicit TPGS
[10852.037740] sd 15:0:0:3: alua: port group 00 rel port 230
[10852.037798] sd 15:0:0:3: [sde] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.038070] sd 15:0:0:3: alua: port group 00 state A preferred supports tolusnA
[10852.038234] sd 15:0:0:3: Attached scsi generic sg3 type 0
[10852.038349] sd 15:0:0:3: [sde] Write Protect is off
[10852.038355] sd 15:0:0:3: [sde] Mode Sense: ed 00 00 08
[10852.038683] sd 15:0:0:3: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.039314] scsi 15:0:1:0: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.039748] sdb:
[10852.040238] sd 15:0:1:0: alua: supports implicit TPGS
[10852.040632] sd 15:0:1:0: alua: port group 00 rel port 30
[10852.040708] sd 15:0:1:0: [sdg] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.041053] sd 15:0:1:0: alua: port group 00 state A preferred supports tolusnA
[10852.041481] sd 15:0:1:0: [sdg] Write Protect is off
[10852.041496] sd 15:0:1:0: [sdg] Mode Sense: f5 00 00 08
[10852.041550] sd 15:0:0:1: [sdb] Attached SCSI disk
[10852.041786] sd 15:0:1:0: [sdg] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.042390] sd 15:0:1:0: Attached scsi generic sg4 type 0
[10852.044049] scsi 15:0:1:1: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.044795] sd 15:0:1:1: alua: supports implicit TPGS
[10852.045180] sd 15:0:1:1: alua: port group 00 rel port 30
[10852.045226] sd 15:0:0:0: [sda] Attached SCSI disk
[10852.045313] sd 15:0:1:1: [sdh] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.045631] sd 15:0:1:1: alua: port group 00 state A preferred supports tolusnA
[10852.045730] sd 15:0:1:1: Attached scsi generic sg5 type 0
[10852.045942] sd 15:0:1:1: [sdh] Write Protect is off
[10852.045949] sd 15:0:1:1: [sdh] Mode Sense: ed 00 00 08
[10852.046318] sd 15:0:1:1: [sdh] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.046813] scsi 15:0:1:2: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.047808] sd 15:0:1:2: alua: supports implicit TPGS
[10852.048133] sd 15:0:1:2: alua: port group 00 rel port 30
[10852.048358] sd 15:0:1:2: [sdi] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.048520] sd 15:0:1:2: alua: port group 00 state A preferred supports tolusnA
[10852.048643] sd 15:0:1:2: Attached scsi generic sg6 type 0
[10852.049296] sdh:
[10852.049299] sd 15:0:1:2: [sdi] Write Protect is off
[10852.049308] sd 15:0:1:2: [sdi] Mode Sense: ed 00 00 08
[10852.049634] sd 15:0:1:2: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.049667] scsi 15:0:1:3: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.050354] sd 15:0:1:3: alua: supports implicit TPGS
[10852.050853] sd 15:0:1:3: alua: port group 00 rel port 30
[10852.050943] sd 15:0:1:3: [sdaa] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.051214] sd 15:0:1:1: [sdh] Attached SCSI disk
[10852.051287] sd 15:0:1:3: alua: port group 00 state A preferred supports tolusnA
[10852.051426] sd 15:0:1:3: Attached scsi generic sg7 type 0
[10852.051646] sd 15:0:1:3: [sdaa] Write Protect is off
[10852.051656] sd 15:0:1:3: [sdaa] Mode Sense: ed 00 00 08
[10852.051967] sd 15:0:1:3: [sdaa] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.052323] scsi 15:0:2:0: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.052973] sd 15:0:2:0: alua: supports implicit TPGS
[10852.053314] sd 15:0:2:0: alua: port group 00 rel port 100
[10852.053406] sd 15:0:2:0: [sdab] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.053686] sd 15:0:2:0: alua: port group 00 state A preferred supports tolusnA
[10852.053892] sd 15:0:2:0: Attached scsi generic sg8 type 0
[10852.054069] sd 15:0:2:0: [sdab] Write Protect is off
[10852.054078] sd 15:0:2:0: [sdab] Mode Sense: f5 00 00 08
[10852.054391] sd 15:0:2:0: [sdab] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.055081] scsi 15:0:2:1: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.056159] sd 15:0:2:1: alua: supports implicit TPGS
[10852.056493] sd 15:0:2:1: alua: port group 00 rel port 100
[10852.056574] sd 15:0:2:1: [sdac] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.056997] sd 15:0:2:1: alua: port group 00 state A preferred supports tolusnA
[10852.057274] sd 15:0:2:1: [sdac] Write Protect is off
[10852.057280] sd 15:0:2:1: Attached scsi generic sg29 type 0
[10852.057290] sd 15:0:2:1: [sdac] Mode Sense: ed 00 00 08
[10852.057578] sd 15:0:2:1: [sdac] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.058491] scsi 15:0:2:2: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.059132] sde:
[10852.059173] sdaa:
[10852.060148] sd 15:0:2:2: alua: supports implicit TPGS
[10852.060723] sd 15:0:2:2: alua: port group 00 rel port 100
[10852.060814] sd 15:0:0:3: [sde] Attached SCSI disk
[10852.060858] sd 15:0:1:3: [sdaa] Attached SCSI disk
[10852.060942] sd 15:0:2:2: alua: rtpg failed with 8000002
[10852.061167] sdac:
[10852.061313] sd 15:0:2:2: [sdad] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.061363] sd 15:0:2:2: alua: port group 00 state A preferred supports tolusnA
[10852.061615] sd 15:0:2:2: Attached scsi generic sg30 type 0
[10852.062278] sd 15:0:2:2: [sdad] Write Protect is off
[10852.062291] sd 15:0:2:2: [sdad] Mode Sense: ed 00 00 08
[10852.062841] scsi 15:0:2:3: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.062902] sd 15:0:2:2: [sdad] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.063740] sd 15:0:2:1: [sdac] Attached SCSI disk
[10852.063965] sd 15:0:2:3: alua: supports implicit TPGS
[10852.064330] sd 15:0:2:0: [sdab] Attached SCSI disk
[10852.064437] sd 15:0:2:3: alua: port group 00 rel port 100
[10852.064507] sd 15:0:2:3: [sdae] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.064927] sd 15:0:2:3: alua: port group 00 state A preferred supports tolusnA
[10852.065231] sd 15:0:2:3: Attached scsi generic sg31 type 0
[10852.065348] sd 15:0:2:3: [sdae] Write Protect is off
[10852.065358] sd 15:0:2:3: [sdae] Mode Sense: ed 00 00 08
[10852.065859] sd 15:0:2:3: [sdae] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.065872] sdad:
[10852.065959] sdi:
[10852.066310] scsi 15:0:3:0: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.067721] sd 15:0:2:2: [sdad] Attached SCSI disk
[10852.067796] sd 15:0:1:2: [sdi] Attached SCSI disk
[10852.067876] sd 15:0:3:0: alua: supports implicit TPGS
[10852.068387] sd 15:0:3:0: [sdaf] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.068426] sd 15:0:3:0: alua: port group 00 rel port 300
[10852.068904] sd 15:0:3:0: alua: port group 00 state A preferred supports tolusnA
[10852.069151] sdae:
[10852.069204] sd 15:0:3:0: Attached scsi generic sg32 type 0
[10852.069657] sd 15:0:3:0: [sdaf] Write Protect is off
[10852.069664] sd 15:0:3:0: [sdaf] Mode Sense: f5 00 00 08
[10852.070344] sd 15:0:3:0: [sdaf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.070954] sd 15:0:2:3: [sdae] Attached SCSI disk
[10852.074942] sd 15:0:3:0: [sdaf] Attached SCSI disk
[10852.074954] scsi 15:0:3:1: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.076026] sd 15:0:3:1: alua: supports implicit TPGS
[10852.076714] sd 15:0:3:1: alua: port group 00 rel port 300
[10852.076745] sd 15:0:3:1: [sdag] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.077546] sd 15:0:3:1: alua: port group 00 state A preferred supports tolusnA
[10852.077885] sd 15:0:3:1: Attached scsi generic sg33 type 0
[10852.078100] sd 15:0:3:1: [sdag] Write Protect is off
[10852.078109] sd 15:0:3:1: [sdag] Mode Sense: ed 00 00 08
[10852.078403] sd 15:0:3:1: [sdag] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.080875] sdag:
[10852.083603] sd 15:0:3:1: [sdag] Attached SCSI disk
[10852.086470] scsi 15:0:3:2: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.087290] sd 15:0:3:2: alua: supports implicit TPGS
[10852.087630] sd 15:0:3:2: alua: port group 00 rel port 300
[10852.087790] sd 15:0:3:2: [sdah] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.087984] sd 15:0:3:2: alua: port group 00 state A preferred supports tolusnA
[10852.088145] sd 15:0:3:2: Attached scsi generic sg34 type 0
[10852.088392] sd 15:0:3:2: [sdah] Write Protect is off
[10852.088411] sd 15:0:3:2: [sdah] Mode Sense: ed 00 00 08
[10852.088687] sd 15:0:3:2: [sdah] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[10852.089078] scsi 15:0:3:3: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[10852.089911] sd 15:0:3:3: alua: supports implicit TPGS
[10852.090323] sd 15:0:3:3: alua: port group 00 rel port 300
[10852.090360] sd 15:0:3:3: [sdai] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[10852.090737] sd 15:0:3:3: alua: port group 00 state A preferred supports tolusnA
[10852.091050] sd 15:0:3:3: Attached scsi gen0
[12126.399029] scsi host17: Emulex LPe16000 16Gb PCIe Fibre Channel Adapter on PCI bus 01 device 00 irq 504
[12126.959169] rpaphp: Slot [U78CA.001.CSS003P-P1-C6-C1] registered
[12126.959176] rpadlpar_io: slot PHB 21 added
[12127.844158] lpfc 0009:01:00.0: 1:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
[12128.043085] scsi 17:0:0:0: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[12128.043688] sd 17:0:0:0: alua: supports implicit TPGS
[12128.043969] sd 17:0:0:0: alua: port group 00 rel port 300
[12128.044190] sd 17:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[12128.044311] sd 17:0:0:0: alua: port group 00 state A preferred supports tolusnA
[12128.044430] sd 17:0:0:0: Attached scsi generic sg0 type 0
[12128.044896] sd 17:0:0:0: [sda] Write Protect is off
[12128.044903] sd 17:0:0:0: [sda] Mode Sense: f5 00 00 08
[12128.045179] sd 17:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or [sdaf] Mode Sense: f5 00 00 08
[12128.088966] sd 17:0:3:0: Attached scsi generic sg32 type 0
[12128.089258] sd 17:0:3:0: [sdaf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[12128.089877] sd 17:0:2:3: [sdae] Attached SCSI disk
[12128.090349] scsi 17:0:3:1: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[12128.092029] sd 17:0:3:1: alua: supports implicit TPGS
[12128.092495] sd 17:0:3:1: alua: port group 00 rel port 30
[12128.092548] sd 17:0:3:1: [sdag] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[12128.092885] sd 17:0:3:1: alua: port group 00 state A preferred supports tolusnA
[12128.093103] sd 17:0:3:1: Attached scsi generic sg33 type 0
[12128.093172] sd 17:0:3:1: [sdag] Write Protect is off
[12128.093183] sd 17:0:3:1: [sdag] Mode Sense: ed 00 00 08
[12128.093503] sd 17:0:3:1: [sdag] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[12128.094183] scsi 17:0:3:2: Direct-Access IBM th 65:224.
[12749.073922] device-mapper: multipath: Failing path 66:32.
[12749.073953] device-mapper: multipath: Failing path 65:160.
[12749.090309] scsi 17:0:0:3: alua: Detached
[12749.094400] device-mapper: multipath: Failing path 66:16.
[12749.118398] scsi 17:0:2:2: alua: Detached
[12749.138317] scsi 17:0:3:2: alua: Detached
[12749.141437] device-mapper: multipath: Failing path 65:224.
[12749.141467] device-mapper: multipath: Failing path 66:32.
[12749.162316] scsi 17:0:1:3: alua: Detached
[12749.165448] device-mapper: multipath: Failing path 66:32.
[12749.202489] scsi 17:0:2:3: alua: Detached
[12749.238436] scsi 17:0:3:3: alua: Detached
[12749.371720] iommu: Removing device 0009:01:00.0 from group 0
[12759.378488] pci_bus 0009:01: busn_res: [bus 01-ff] is released
[12759.378559] rpadlpar_io: slot PHB 21 removed
[13405.725246] PCI host bridge /pci@800000020000015 ranges:
[13405.725253] MEM 0x00003fc600000000..0x00003fc67effffff -> 0x0000000080000000 l port 100
[13408.460796] sd 19:0:2:1: [sdac] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[13408.461979] sd 19:0:2:1: [sdac] Write Protect is off
[13408.461987] sd 19:0:2:1: [sdac] Mode Sense: ed 00 00 08
[13408.462292] sd 19:0:2:1: [sdac] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[13408.463890] sd 19:0:2:0: [sdab] Attached SCSI disk
[13408.465140] sdac:
[13408.467041] sd 19:0:2:1: [sdac] Attached SCSI disk
[13408.569203] sdd:
[13408.569287] sdi:
[13408.570556] sd 19:0:0:2: [sdd] Attached SCSI disk
[13408.570631] sd 19:0:1:2: [sdi] Attached SCSI disk
[13438.697588] sd 19:0:0:1: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[13439.326030] rport-19:0-9: blocked FC remote port time out: removing rport
[13453.426174] sd 19:0:0:1: [sdb] Write Protect is off
[13453.426184] sd 19:0:0:1: [sdb] Mode Sense: ed 00 00 08
[13453.426711] sd 19:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[14683.037860] scsi 21:0:0:2: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[14683.039069] sd 21:0:0:2: alua: supports implicit TPGS
[14683.039472] sd 21:0:0:2: alua: port group 00 rel port 300
[14683.039557] sd 21:0:0:2: [sdd] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[14683.039882] sd 21:0:0:2: alua: port group 00 state A preferred supports tolusnA
[14683.040143] sd 21:0:0:2: Attached scsi generic sg2 type 0
[14683.040221] sd 21:0:0:2: [sdd] Write Protect is off
[14683.040231] sd 21:0:0:2: [sdd] Mode Sense: ed 00 00 08
[14683.040517] sd 21:0:0:2: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[14683.041295] scsi 21:0:0:3: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[14683.042497] sd 21:0:0:3: alua: supports implicit TPGS
[14683.042870] sd 21:0:0:3: [sde] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[1468er: multipath: Failing path 8:16.
[15299.302590] iommu: Removing device 000b:01:00.1 from group 0
[15299.317740] scsi 21:0:0:0: alua: Detached
[15299.325260] sd 21:0:2:0: [sdab] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[15299.325265] sd 21:0:2:0: [sdab] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[15299.325267] blk_update_request: I/O error, dev sdab, sector 41942912
[15299.325275] device-mapper: multipath: Failing path 65:176.
[15299.325296] sd 21:0:3:0: [sdae] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[15299.325298] sd 21:0:3:0: [sdae] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
[15299.325301] blk_update_request: I/O error, dev sdae, sector 41942912
[15299.325309] device-mapper: multipath: Failing path 65:224.
[15299.353733] scsi 21:0:1:0: alua: Detached
[15299.361265] sd 21:0:3:1: [sdaf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[15299.361269] sd 21:0:3:1: [sdaf] tag#0 CDB958.559922] scsi 23:0:1:1: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[15958.560721] sd 23:0:1:1: alua: supports implicit TPGS
[15958.561066] sd 23:0:1:1: alua: port group 00 rel port 100
[15958.561092] sd 23:0:1:1: [sdh] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[15958.561697] sd 23:0:1:1: alua: port group 00 state A preferred supports tolusnA
[15958.561880] sd 23:0:1:1: Attached scsi generic sg5 type 0
[15958.561951] sd 23:0:1:1: [sdh] Write Protect is off
[15958.561958] sd 23:0:1:1: [sdh] Mode Sense: ed 00 00 08
[15958.562239] sd 23:0:1:1: [sdh] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[15958.562942] scsi 23:0:1:2: Direct-Access IBM 2107900 .149 PQ: 0 ANSI: 5
[15958.563577] sd 23:0:1:0: [sdg] Attached SCSI disk
[15958.563921] sd 23:0:1:2: alua: supports implicit TPGS
[15958.564270] sd 23:0:1:2: alua: port group 00 rel port 100
[15958.564443] sd 23:0:1:2: [sdi] 41943040 512-byte
d000000003a37530 7c6307b4 extsw r3,r3
d000000003a37534 38210030 addi r1,r1,48
d000000003a37538 e8010010 ld r0,16(r1)
d000000003a3753c ebc1fff0 ld r30,-16(r1)
d000000003a37540 ebe1fff8 ld r31,-8(r1)
d000000003a37544 7c0803a6 mtlr r0
d000000003a37548 4e800020 blr
d000000003a3754c 60420000 ori r2,r2,0
d000000003a37550 813f0b14 lwz r9,2836(r31)
d000000003a37554 2b890001 cmplwi cr7,r9,1
d000000003a37558 409dffa8 ble cr7,d000000003a37500 # lpfc_sli4_scmd_to_wqidx_distr+0x50/0x100 [lpfc]
d000000003a3755c a14d0008 lhz r10,8(r13)
d000000003a37560 a13f0572 lhz r9,1394(r31)
d000000003a37564 7f895000 cmpw cr7,r9,r10
d000000003a37568 409dff98 ble cr7,d000000003a37500 # lpfc_sli4_scmd_to_wqidx_distr+0x50/0x100 [lpfc]
d000000003a3756c e93f0568 ld r9,1384(r31)
e:mon>

---Steps to Reproduce---
 cd /kte/tools
./setup dlar
cd /kte/tools/dlpar
./start.dlpar -d 0

Doing some analysis without a crashdump, as it's taking too long.

This looks like a NULL pointer dereference,
if my assembly reading/matching to C is correct.

Would need to understand why/how this
'(struct scsi_cmnd *cmnd)->device' field is NULL.

Analysis
--------

From xmon:
 pc: <...>: lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]

 R10 = 0000000000000000 R26 = 0000000000000001

From 'objdump -d /usr/lib/debug/<...>/lpfc.ko',

 lpfc_sli4_scmd_to_wqidx_distr = 674b0
 lpfc_sli4_scmd_to_wqidx_distr+0x30 = 674e0 (crash)

 and

 00000000000674b0 <lpfc_sli4_scmd_to_wqidx_distr>:
 <...>
    674cc: 78 23 9e 7c mr r30,r4
    674d0: 78 1b 7f 7c mr r31,r3
 <...>
    674dc: 10 00 5e e9 ld r10,16(r30)
    674e0: 00 00 2a e9 ld r9,0(r10) <<-- (crash)
    674e4: 00 00 29 e9 ld r9,0(r9)
 <...>
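
As a cross-check of the disassembly reading above, here is a small user-space C sketch (not driver code; the struct and helper names are made up for illustration). It assumes the objdump listing prints instruction bytes in little-endian memory order, reassembles them into a 32-bit word, and decodes the fields of a DS-form load such as ld.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: fields of a PowerPC DS-form load (e.g. ld). */
struct ds_form {
    unsigned opcode; /* primary opcode; 58 covers ld/ldu/lwa */
    unsigned rt;     /* target register number */
    unsigned ra;     /* base register number */
    int      disp;   /* signed byte displacement (low 2 bits are the XO) */
    unsigned xo;     /* extended opcode; 0 selects ld */
};

/* Rebuild the instruction word from bytes listed in memory (LE) order. */
static uint32_t insn_from_le_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Split a DS-form instruction word into its fields. */
static struct ds_form decode_ds_form(uint32_t insn)
{
    struct ds_form f;

    f.opcode = insn >> 26;
    f.rt     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.disp   = (int16_t)(insn & 0xfffc);
    f.xo     = insn & 0x3;
    return f;
}
```

Feeding it the bytes 00 00 2a e9 from offset 674e0 should give rt = 9, ra = 10, displacement 0, i.e. ld r9,0(r10), and 10 00 5e e9 from 674dc should give ld r10,16(r30). The crash offset itself is just 0x674b0 + 0x30 = 0x674e0.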

This is the relevant snippet of code:

From Ubuntu 16.04 kernel 4.4.0-22.40 [1]

 int lpfc_sli4_scmd_to_wqidx_distr(struct lpfc_hba *phba,
       struct lpfc_scsi_buf *lpfc_cmd)
 {
  struct scsi_cmnd *cmnd = lpfc_cmd->pCmd;
 <...>
  if (shost_use_blk_mq(cmnd->device->host)) {
 <...>

So, back to the assembly: these appear to be the 2 function parameters,
passed in registers (r4, r3), being copied into other registers (r30, r31).

    674cc: 78 23 9e 7c mr r30,r4
    674d0: 78 1b 7f 7c mr r31,r3

Per the load below, r10 is cmnd and r30 is lpfc_cmd (both pointers); it loads
lpfc_cmd->pCmd, which is at offset 16 bytes into struct lpfc_scsi_buf [2]
(after 2 pointers * 8 bytes each, from the list_head list [3]):

    674dc: 10 00 5e e9 ld r10,16(r30)

 struct lpfc_scsi_buf {
  struct list_head list;
  struct scsi_cmnd *pCmd;
 <...>

 struct list_head {
  struct list_head *next, *prev;
 };
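
The 16-byte offset can be double-checked outside the kernel with offsetof. This is only a sketch: the struct below is a stand-in that copies just the two leading fields quoted above (its name, lpfc_scsi_buf_head, is hypothetical), and it assumes a 64-bit build where pointers are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

struct scsi_cmnd;                 /* opaque here; only a pointer is needed */

struct list_head {
    struct list_head *next, *prev;
};

/* Stand-in for the start of struct lpfc_scsi_buf (hypothetical name). */
struct lpfc_scsi_buf_head {
    struct list_head list;
    struct scsi_cmnd *pCmd;
};

/* pCmd sits right after the two list pointers. */
_Static_assert(offsetof(struct lpfc_scsi_buf_head, pCmd) ==
               2 * sizeof(void *), "pCmd follows struct list_head");

/* Byte offset that "ld r10,16(r30)" is expected to use for pCmd. */
static size_t pcmd_offset(void)
{
    return offsetof(struct lpfc_scsi_buf_head, pCmd);
}
```

On a 64-bit target pcmd_offset() comes out as 16, which matches the displacement in the load instruction.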

And the load below hits the crash, because it dereferences r10 (cmnd), which is zero:

    674e0: 00 00 2a e9 ld r9,0(r10) <<-- (crash)

 From xmon:

 R10 = 0000000000000000 R26 = 0000000000000001

That dereference was for cmnd->device; you can see that the load instruction immediately
afterward further dereferences the cmnd->device pointer, for the device->host field,
which has offset 0 into struct scsi_device [4]:
 --- this is a confirmation that the assembly/C matching looks correct.

  if (shost_use_blk_mq(cmnd->device->host)) {

 struct scsi_device {
  struct Scsi_Host *host;
 <...>

[1] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/drivers/scsi/lpfc/lpfc_scsi.c?h=Ubuntu-4.4.0-22.40
[2] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/drivers/scsi/lpfc/lpfc_scsi.h?h=Ubuntu-4.4.0-22.40#n130
[3] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/linux/types.h?h=Ubuntu-4.4.0-22.40#n185
[4] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/scsi/scsi_device.h?h=Ubuntu-4.4.0-22.40#n77

One detail missing..

(In reply to comment #16)
> And the load below hits the crash, because it dereferences r10 (*cmnd) which
> is zero:
>
> 674e0: 00 00 2a e9 ld r9,0(r10) <<-- (crash)
>
> From xmon:
>
> R10 = 0000000000000000 R26 = 0000000000000001
>
> That dereference was for cmnd->device; [snip]

which has offset zero into struct scsi_cmnd [5]

        if (shost_use_blk_mq(cmnd->device->host)) {

        struct scsi_cmnd {
         struct scsi_device *device;
        <...>

[5] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/scsi/scsi_cmnd.h?h=Ubuntu-4.4.0-22.40#n59

> Would need to understand why/how this
> '(struct scsi_cmnd *cmnd)->device' field is NULL.

Checking this again today, it occurred to me that the problem is actually that cmnd itself is NULL (not the cmnd->device field), so dereferencing cmnd to read cmnd->device hits the crash.
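
To illustrate the failure mode, here is a user-space sketch with stand-in types (not the driver code, and not necessarily what the submitted patch does). It assumes, as the backtrace through lpfc_send_taskmgmt() suggests, that this path can reach the function with pCmd unset. The helper name cmnd_uses_blk_mq is hypothetical, and the use_blk_mq field only mimics what the kernel's shost_use_blk_mq() reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; only the fields used here. */
struct Scsi_Host   { bool use_blk_mq; };
struct scsi_device { struct Scsi_Host *host; };
struct scsi_cmnd   { struct scsi_device *device; };

/* Stand-in for the kernel's shost_use_blk_mq(). */
static bool shost_use_blk_mq(const struct Scsi_Host *shost)
{
    return shost->use_blk_mq;
}

/*
 * Hypothetical helper: follow cmnd->device->host only when cmnd is
 * non-NULL, so a caller without a SCSI command attached takes the
 * non-blk-mq branch instead of faulting on address 0.
 */
static bool cmnd_uses_blk_mq(const struct scsi_cmnd *cmnd)
{
    return cmnd != NULL && shost_use_blk_mq(cmnd->device->host);
}
```

With cmnd == NULL the unguarded expression cmnd->device->host reads from address 0, which is consistent with dar = 0 in the xmon output; the guarded helper simply returns false in that case.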

Fix submitted upstream:
    http://marc.info/?l=linux-scsi&m=146534119707379&w=2

I didn't provide a test kernel because the system was running regression tests over the weekend, and it takes a long time to reproduce the problem w/ DLPAR operations -- but the same problem can be reproduced w/ simpler test-cases (see commit), so I worked on it in the background.

If you keep hitting this, let me know and I'll provide a test kernel.

Hi Mauricio,
I installed the fix and verified it. I am no longer seeing the issue.

Hi Canonical,

Can you consider picking up this fix that has not yet made the upstream kernel?

It's fairly obvious and trivial, well documented in the commit message (w/ test-cases), and has been tested successfully here at IBM (also see commit msg).

http://marc.info/?l=linux-scsi&m=146534119707379&w=2

The adapter vendor's team has not yet reviewed it on the mailing list (nor any other patches for lpfc), so I guess it'll take some time until this makes it in.

Is that possible?

Thanks

Mauricio

bugproxy (bugproxy)
tags: added: architecture-ppc64 bugnameltc-141959 severity-high targetmilestone-inin16041
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-07-21 19:03 EDT-------
Hi Canonical,

The vendor's developer reviewed the patch -- "Looks good."
Can it be applied?

Thanks

http://marc.info/?l=linux-scsi&m=146913719625306&w=2

Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Stefan Bader (smb)
tags: added: verification-done-xenial
removed: verification-needed-xenial
Mauricio Faria de Oliveira (mfo) wrote :

Thanks for marking this bug as verified.

It looks good. I've checked that the patch is there, and the change is trivial, but I couldn't get a response from the test team in time. Sorry for that.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-36.55

---------------
linux (4.4.0-36.55) xenial; urgency=low

  [ Stefan Bader ]

  * Release Tracking Bug
    - LP: #1612305

  * I2C touchpad does not work on AMD platform (LP: #1612006)
    - SAUCE: pinctrl/amd: Remove the default de-bounce time

  * CVE-2016-5696
    - tcp: make challenge acks less predictable

linux (4.4.0-35.54) xenial; urgency=low

  [ Stefan Bader ]

  * Release Tracking Bug
    - LP: #1611215

  * [i915_bpo] Sync with v4.7 (LP: #1609742)
    - SAUCE: i915_bpo: Sync with v4.7

  * s390/cio: fix reset of channel measurement block (LP: #1609415)
    - s390/cio: allow to reset channel measurement block

  * in Ubuntu16.10: Hit on Call traces and system goes down when transactional
    memory tests are running in 32TB Brazos system (LP: #1606786)
    - powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
    - powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()

  * Power Menu does not display after press the Power Button (LP: #1609204)
    - intel-vbtn: new driver for Intel Virtual Button
    - [config] enable CONFIG_INTEL_VBTN=m

  * OptiPlex 7450 AIO hangs when rebooting (LP: #1608762)
    - x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk

  * virtualbox+usb 3.0 breaks boot, -28 kernel works (LP: #1604058)
    - SAUCE: xhci: Fix soft lockup in xhci_pci_probe path when XHCI_STATE_HALTED

  * linux-kernel: Freeing IRQ from IRQ context (LP: #1597908)
    - block: defer timeouts to a workqueue

  * Tunnel offload indications not stripped from encapsulated packets, causing
    performance overhead (LP: #1602755)
    - tunnels: Remove encapsulation offloads on decap.

  * lm-sensors is throwing "ERROR: Can't get value of subfeature temp1_input:
    I/O error" for be2net driver (LP: #1607387)
    - be2net: perform temperature query in adapter regardless of its interface
      state

  * Dell dock MAC Address pass through doesn't work in Ubuntu (LP: #1579984)
    - r8152: Add support for setting pass through MAC address on RTL8153-AD

  * vmxnet3 LRO IPv6 performance issues (stalling TCP) (LP: #1605494)
    - Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets

  * ISST-LTE:pVM:monklp5:Ubuntu16.04.1:system crashed at
    lpfc_sli4_scmd_to_wqidx_distr (LP: #1597974)
    - SAUCE: lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from
      lpfc_send_taskmgmt()

  * Backport cxlflash shutdown patch to Xenial SRU (LP: #1605405)
    - SAUCE: cxlflash: Verify problem state area is mapped before notifying
      shutdown

  * Xenial update to v4.4.16 stable release (LP: #1607404)
    - mac80211: fix fast_tx header alignment
    - mac80211: mesh: flush mesh paths unconditionally
    - mac80211_hwsim: Add missing check for HWSIM_ATTR_SIGNAL
    - mac80211: Fix mesh estab_plinks counting in STA removal case
    - EDAC, sb_edac: Fix rank lookup on Broadwell
    - IB/cm: Fix a recently introduced locking bug
    - IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs
    - powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
    - powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
    - usb: dwc2: fix reg...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, i.e. yakkety. The bug task representing the yakkety nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Won't Fix
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released