ISST-LTE: system crashes at lpfc_sli4_scmd_to_wqidx_distr

Bug #1496989 reported by bugproxy
This bug affects 1 person
Affects         Status   Importance  Assigned to       Milestone
linux (Ubuntu)  Invalid  Undecided   Taco Screen team

Bug Description

-- Problem Description --
We have Ubuntu 15.10 installed on our system. After running a stress test for around 24 hours, the system crashes at lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100:

0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000000a0575a0]
    pc: d000000003115b30: lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]
    lr: d0000000030b749c: lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
    sp: c00000000a057820
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc000000272dbbcf0
  paca = 0xc00000000e7f0000 softe: 0 irq_happened: 0x01
    pid = 246, comm = scsi_eh_0
0:mon> t
[c00000000a057850] d0000000030b749c lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
[c00000000a057890] d0000000030bf680 lpfc_sli_issue_iocb+0xf0/0x320 [lpfc]
[c00000000a0578f0] d0000000030c3804 lpfc_sli_issue_iocb_wait+0x264/0x680 [lpfc]
[c00000000a0579d0] d000000003110fd4 lpfc_send_taskmgmt+0x2d4/0x7d0 [lpfc]
[c00000000a057aa0] d000000003111bf4 lpfc_device_reset_handler+0x114/0x210 [lpfc]
[c00000000a057b60] c00000000071254c scsi_eh_ready_devs+0x68c/0xee0
[c00000000a057c50] c00000000071474c scsi_error_handler+0x6ac/0x9e0
[c00000000a057d80] c0000000000e1e20 kthread+0x110/0x130
[c00000000a057e30] c000000000009530 ret_from_kernel_thread+0x5c/0xac
0:mon> di d000000003115b30
d000000003115b30 e92a0000 ld r9,0(r10)
d000000003115b34 e9290000 ld r9,0(r9)
d000000003115b38 e92901a8 ld r9,424(r9)
d000000003115b3c 7928b7e3 rldicl. r8,r9,54,63
d000000003115b40 40820090 bne d000000003115bd0 # lpfc_sli4_scmd_to_wqidx_distr+0xd0/0x100 [lpfc]
d000000003115b44 813f0ae0 lwz r9,2784(r31)
d000000003115b48 2f890001 cmpwi cr7,r9,1
d000000003115b4c 419e0054 beq cr7,d000000003115ba0 # lpfc_sli4_scmd_to_wqidx_distr+0xa0/0x100 [lpfc]
d000000003115b50 395f0d58 addi r10,r31,3416
d000000003115b54 39200001 li r9,1
d000000003115b58 7c2004ac lwsync
d000000003115b5c 7c605028 lwarx r3,0,r10
d000000003115b60 7c691a14 add r3,r9,r3
d000000003115b64 7c60512d stwcx. r3,0,r10
d000000003115b68 40c2fff4 bne- d000000003115b5c # lpfc_sli4_scmd_to_wqidx_distr+0x5c/0x100 [lpfc]
d000000003115b6c 7c0004ac sync

0:mon> d c000000000ab00e0
c000000000ab00e0 4c696e7578207665 7273696f6e20342e |Linux version 4.|
c000000000ab00f0 322e302d372d6765 6e65726963202862 |2.0-7-generic (b|
c000000000ab0100 75696c6464406465 6e6e656564303429 |uildd@denneed04)|
c000000000ab0110 2028676363207665 7273696f6e20352e | (gcc version 5.|

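lpfc_sli4_scmd_to_wqidx_distr() was moved to lpfc_scsi.c and changed slightly by commit 8b0dff14164d3f43eba8365950b506d898e0e1e6. The crash appears to be a dereference of a NULL (0x0) struct scsi_cmnd *cmnd: the faulting instruction at +0x30 is "ld r9,0(r10)" with R10 = 0 and DAR = 0, and R10 was loaded from offset 16 of r30, which holds the second argument, lpfc_cmd. This is consistent with lpfc_cmd->pCmd being NULL when cmnd->device is read.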

int lpfc_sli4_scmd_to_wqidx_distr(struct lpfc_hba *phba,
                                  struct lpfc_scsi_buf *lpfc_cmd)
{
        struct scsi_cmnd *cmnd = lpfc_cmd->pCmd;
        struct lpfc_vector_map_info *cpup;
        int chann, cpu;
        uint32_t tag;
        uint16_t hwq;

        if (shost_use_blk_mq(cmnd->device->host)) {
                tag = blk_mq_unique_tag(cmnd->request);
                hwq = blk_mq_unique_tag_to_hwq(tag);

                return hwq;
        }
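
To illustrate the failure mode described above, here is a minimal user-space sketch of what a NULL guard on the command pointer could look like. It is not the driver's code and not a confirmed fix: every mock_* type, the function mock_scmd_to_wqidx(), and MOCK_FALLBACK_WQ are hypothetical stand-ins invented for this example. The rest of the real function is not shown in this report, so the fallback value is only a placeholder. Whether the error-handler path seen in the backtrace (lpfc_device_reset_handler -> lpfc_send_taskmgmt) really leaves pCmd unset is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to mirror the pointer chain cmnd->device->host are modeled. */
struct mock_host     { int use_blk_mq; };
struct mock_device   { struct mock_host *host; };
struct mock_cmnd     { struct mock_device *device; uint16_t hwq; };
struct mock_scsi_buf { struct mock_cmnd *pCmd; };

/* Placeholder for whatever queue selection the real function falls
 * back to; the report does not show that part of the code. */
#define MOCK_FALLBACK_WQ 0

/* Pick a work queue index, checking the command pointer before it is
 * dereferenced so a buffer without a command takes the fallback. */
uint16_t mock_scmd_to_wqidx(const struct mock_scsi_buf *buf)
{
        const struct mock_cmnd *cmnd = buf->pCmd;

        if (cmnd && cmnd->device->host->use_blk_mq)
                return cmnd->hwq;

        return MOCK_FALLBACK_WQ;
}
```

With the guard in place, a buffer whose pCmd is NULL returns the fallback index instead of faulting on the first load.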

0:mon> r
R00 = d0000000030b749c R16 = c00000000a057cd0
R01 = c00000000a057820 R17 = c00000000a057cb8
R02 = d000000003163d28 R18 = c00000000a52a088
R03 = c00000027e9fe000 R19 = c00000000a057cb0
R04 = c00000027139a400 R20 = 000000000000001e
R05 = c00000027139a470 R21 = 0000000000000001
R06 = 0000000000000001 R22 = c00000000180c268
R07 = d000000003163d28 R23 = c00000027139a470
R08 = d00000000310de90 R24 = c00000027139a400
R09 = 0000000000000004 R25 = c00000000a057978
R10 = 0000000000000000 R26 = 0000000000000001
R11 = d000000003137e20 R27 = 0000000000000000
R12 = 0000000028641824 R28 = c00000000a528000
R13 = c00000000e7f0000 R29 = c00000027e9fe000
R14 = c00000000a057cb8 R30 = c00000027139a400
R15 = 0000000000000000 R31 = c00000027e9fe000
pc = d000000003115b30 lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]
cfar= c000000000008468 slb_miss_realmode+0x50/0x78
lr = d0000000030b749c lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
msr = 8000000100009033 cr = 28648828
ctr = c000000000a95a70 xer = 0000000020000000 trap = 300
dar = 0000000000000000 dsisr = 40000000
0:mon> di $lpfc_sli4_scmd_to_wqidx_distr
d000000003115b00 3c4c0005 addis r2,r12,5
d000000003115b04 3842e228 addi r2,r2,-7640
d000000003115b08 7c0802a6 mflr r0
d000000003115b0c fbc1fff0 std r30,-16(r1)
d000000003115b10 fbe1fff8 std r31,-8(r1)
d000000003115b14 f8010010 std r0,16(r1)
d000000003115b18 f821ffd1 stdu r1,-48(r1)
d000000003115b1c 7c9e2378 mr r30,r4
d000000003115b20 7c7f1b78 mr r31,r3
d000000003115b24 48000008 b d000000003115b2c # lpfc_sli4_scmd_to_wqidx_distr+0x2c/0x100 [lpfc]
d000000003115b28 e8410018 ld r2,24(r1)
d000000003115b2c e95e0010 ld r10,16(r30)
d000000003115b30 e92a0000 ld r9,0(r10)
d000000003115b34 e9290000 ld r9,0(r9)
d000000003115b38 e92901a8 ld r9,424(r9)
d000000003115b3c 7928b7e3 rldicl. r8,r9,54,63
0:mon> d c00000027139a400
c00000027139a400 0001100000000000 0002200000000000 |.......... .....|
c00000027139a410 0000000000000000 a8649972020000c0 |.........d.r....|
c00000027139a420 3c00000000000000 0000000000000000 |<...............|
c00000027139a430 0000000000000000 0000000000000000 |................|
0:mon>
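
The memory dump above can be checked against the register state with a small helper. This is a sketch under two assumptions: that xmon's "d" command prints bytes in memory order (the "Linux version 4." dump earlier in this report suggests so, since the hex follows the ASCII text), and that the machine is little-endian, as the architecture-ppc64le tag indicates. The names hex_val() and xmon_group_to_le64() are made up for this example.

```c
#include <stdint.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/* Decode one 16-digit group of an xmon "d" dump, given as bytes in
 * memory order, into the 64-bit value a little-endian CPU would load.
 * Returns 0 on success, -1 if the group is short or malformed. */
int xmon_group_to_le64(const char *hex, uint64_t *out)
{
        uint64_t value = 0;

        for (int i = 0; i < 8; i++) {
                int hi = hex_val(hex[2 * i]);
                if (hi < 0)
                        return -1;      /* also stops at the terminating NUL */
                int lo = hex_val(hex[2 * i + 1]);
                if (lo < 0)
                        return -1;
                /* byte i in memory is bits 8*i .. 8*i+7 of the value */
                value |= (uint64_t)((hi << 4) | lo) << (8 * i);
        }
        *out = value;
        return 0;
}
```

Applied to the dump at c00000027139a400, the group at offset 0x10 ("0000000000000000") decodes to 0, matching R10 = 0 after "ld r10,16(r30)". The following group ("a8649972020000c0") decodes to 0xc0000002729964a8, which has the shape of a kernel address and supports the little-endian reading.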

There were some I/O errors and failed paths before the kernel faulted.

bugproxy (bugproxy) wrote : xmon log

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-130686 severity-high targetmilestone-inin1510
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1496989/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Luciano Chavez (lnx1138)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-09-25 21:29 EDT-------
Any update? We have 2 systems hitting this bug.

Tim Gardner (timg-tpi) wrote :

Several kernel updates have been released since this bug was reported. Can you re-run your stress test against the current 15.10 kernel version?

Mauricio Faria de Oliveira (mfo) wrote :

Hi Tim,

Sorry, this was closed as unreproducible on the LTC side, but that was not reflected here.

Marking this bug as Invalid.

Changed in linux (Ubuntu):
status: New → Invalid