iscsitarget causing kernel lockup

Bug #1078398 reported by Gavin Haslett
This bug affects 1 person
Affects: iscsitarget (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

Under heavy load, iscsitarget on Ubuntu 12.04 LTS crashes repeatedly. The syslog contains:

Nov 13 11:05:38 iscsi0 kernel: [ 3330.568639] ------------[ cut here ]------------
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568669] kernel BUG at /var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/iscsi.c:1053!
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568708] invalid opcode: 0000 [#1] SMP
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568735] CPU 0
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568747] Modules linked in: iscsi_trgt(O) xen_fbfront joydev mac_hid fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront lp parport [last unloaded: iscsi_trgt]
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568861]
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568873] Pid: 1464, comm: istd1 Tainted: G O 3.2.0-32-generic #51-Ubuntu
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568917] RIP: e030:[<ffffffffa00668be>] [<ffffffffa00668be>] cmnd_rx_start+0x6ae/0xab0 [iscsi_trgt]
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568968] RSP: e02b:ffff88003b0a1d10 EFLAGS: 00010286
Nov 13 11:05:38 iscsi0 kernel: [ 3330.568995] RAX: 0000000000000000 RBX: ffff8800207c5618 RCX: 0000000000000001
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569030] RDX: 0000000000000000 RSI: ffff88003b0a1f58 RDI: ffff88003b0a1f58
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569065] RBP: ffff88003b0a1d90 R08: 000000000000000a R09: 0000000000000000
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569100] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880003e72000
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569134] R13: ffff880003e70000 R14: 0000000000000000 R15: ffff880004188578
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569177] FS: 00007f08709347c0(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569217] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569246] CR2: 00007f7d186b0000 CR3: 000000003cf25000 CR4: 0000000000002660
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569315] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569350] Process istd1 (pid: 1464, threadinfo ffff88003b0a0000, task ffff88003a7b4500)
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569389] Stack:
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569401] 0000000000000138 0000000000000138 ffff88003b0a1d30 ffff880000000000
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569447] ffff88003b0a1910 0000000000000001 ffff88003b0a1d50 ffffffff8165ae3e
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569494] 0000047c05a95a00 00000000a0064673 ffff88003b0a1d80 ffff880003e72000
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569540] Call Trace:
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569557] [<ffffffff8165ae3e>] ? _raw_spin_lock+0xe/0x20
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569588] [<ffffffffa006779b>] istd+0x30b/0x1470 [iscsi_trgt]
Nov 13 11:05:38 iscsi0 kernel: [ 3330.569620] [<ffffffff815aa4e0>] ? inet_sendmsg+0xb0/0xb0
Nov 13 11:05:38 iscsi0 kernel: [ 3330.570514] [<ffffffff8104c398>] ? __wake_up_common+0x58/0x90
Nov 13 11:05:38 iscsi0 kernel: [ 3330.571409] [<ffffffff8165b04e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572314] [<ffffffffa0067490>] ? nthread_wakeup+0x50/0x50 [iscsi_trgt]
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] [<ffffffff8108a0dc>] kthread+0x8c/0xa0
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] [<ffffffff816655b4>] kernel_thread_helper+0x4/0x10
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] [<ffffffff81663663>] ? int_ret_from_sys_call+0x7/0x1b
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] [<ffffffff8165b33c>] ? retint_restore_args+0x5/0x6
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] [<ffffffff816655b0>] ? gs_change+0x13/0x13
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] Code: 89 b9 fa ff ff 48 c7 c1 84 40 07 a0 ba 1d 04 00 00 48 c7 c6 d8 27 07 a0 48 c7 c7 10 28 07 a0 31 c0 e8 f1 b9 5d e1 e8 37 8d 5d e1 <0f> 0b 80 7b 39 00 0f 89 01 02 00 00 44 8b 5b 74 45 85 db 0f 85
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] RIP [<ffffffffa00668be>] cmnd_rx_start+0x6ae/0xab0 [iscsi_trgt]
Nov 13 11:05:38 iscsi0 kernel: [ 3330.572320] RSP <ffff88003b0a1d10>
Nov 13 11:05:38 iscsi0 kernel: [ 3330.583226] ---[ end trace 37722237d92cd12b ]---

I see this error on both a physical box (2x 4-core CPU, 128GB RAM) and a virtual machine (1 vCPU, 1GB RAM). I have tried both blockio and fileio modes, but the problem persists. Changing the RAM and CPU sizing made no difference.

The problem is seemingly random but usually occurs under a decent load.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: iscsitarget 1.4.20.2-5ubuntu3.1
ProcVersionSignature: Ubuntu 3.2.0-32.51-generic 3.2.30
Uname: Linux 3.2.0-32-generic x86_64
ApportVersion: 2.0.1-0ubuntu14
Architecture: amd64
Date: Tue Nov 13 11:45:52 2012
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: iscsitarget
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.default.iscsitarget: ISCSITARGET_ENABLE=true
mtime.conffile..etc.default.iscsitarget: 2012-11-12T12:44:03.035180
mtime.conffile..etc.iet.ietd.conf: 2012-11-13T10:11:18.416470
mtime.conffile..etc.iet.initiators.allow: 2012-11-12T12:53:34.124434
mtime.conffile..etc.iet.targets.allow: 2012-11-12T12:54:09.156435

Gavin Haslett (ym-gavin) wrote :

I don't know for sure if this is pertinent:

The problem didn't occur until the LUN started to fill up with data. There are four LUNs in total, spread across two client machines. I have 38TB of local storage, which I have carved up into a 10TB LUN and a 9TB LUN for each host, so two LUNs are presented to each host.

Is this perhaps a size limit? The problem did not appear until I had more than 5TB on a single LUN. Is there an addressing limit that causes this crash once it is exceeded?
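
As a rough sanity check on the addressing question, the sketch below works out where a 32-bit block address would run out. It assumes 512-byte logical blocks and decimal terabytes; neither is stated in this report, and the helper names are made up for illustration. It says nothing about how iscsitarget actually addresses blocks.

```python
# Sketch only: where would a 32-bit logical block address (LBA) run out?
SECTOR_BYTES = 512   # assumption: 512-byte logical blocks
TB = 10**12          # assumption: decimal terabytes


def max_bytes_32bit_lba(block_bytes=SECTOR_BYTES):
    """Largest device size addressable with a 32-bit LBA."""
    return (2**32) * block_bytes


def exceeds_32bit_lba(size_bytes, block_bytes=SECTOR_BYTES):
    """True if a device of size_bytes needs more than 2**32 blocks."""
    return size_bytes // block_bytes > 2**32


if __name__ == "__main__":
    print("32-bit LBA limit in bytes:", max_bytes_32bit_lba())
    for label, size in [("5TB of data", 5 * TB),
                        ("9TB LUN", 9 * TB),
                        ("10TB LUN", 10 * TB)]:
        print(label, "exceeds 32-bit LBA:", exceeds_32bit_lba(size))
```

Under these assumptions the 32-bit boundary sits at 2 TiB, well below 5TB, so a plain 32-bit block address limit would be expected to show up much earlier than the point described above.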

Dave Gilbert (ubuntu-treblig) wrote :

Setting to Confirmed since it's a kernel bug (albeit in an external module, it seems reasonable to treat it the same way).
High seems appropriate for an oops on a server.

Gavin:
  Can you paste more of the syslog please? In particular, is there a line of 4 numbers just before the oops? I can see an eprintk in the iscsi code that looks like it might trigger on the path I think this is hitting (there is an assert at line 1053 in the code I'm looking at, although it's slightly newer).

Also, is the oops identical every time you hit it?

Dave
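
One possible way to gather what is asked for above, as a sketch rather than a vetted tool: the script below prints the lines that precede each "cut here" marker and counts the distinct "kernel BUG at" and "RIP" messages with timestamps stripped, so repeated oopses can be compared. The log path /var/log/syslog, the 10-line window and the function names are assumptions.

```python
import re
from collections import Counter

CUT_MARKER = "------------[ cut here ]------------"
# Strips the syslog prefix and kernel timestamp, e.g.
# "Nov 13 11:05:38 iscsi0 kernel: [ 3330.568639] "
PREFIX_RE = re.compile(r"^.*?kernel: \[\s*\d+\.\d+\]\s?")


def context_before_oops(lines, before=10):
    """Yield, for each 'cut here' marker, up to `before` lines preceding it."""
    for i, line in enumerate(lines):
        if CUT_MARKER in line:
            yield lines[max(0, i - before):i]


def oops_signatures(lines):
    """Count distinct 'kernel BUG at' / 'RIP' messages, ignoring timestamps."""
    sigs = Counter()
    for line in lines:
        msg = PREFIX_RE.sub("", line).strip()
        if msg.startswith("kernel BUG at") or msg.startswith("RIP "):
            sigs[msg] += 1
    return sigs


if __name__ == "__main__":
    # Assumed log location; adjust if syslog is written elsewhere.
    with open("/var/log/syslog", errors="replace") as fh:
        log = fh.read().splitlines()
    for n, ctx in enumerate(context_before_oops(log), start=1):
        print(f"--- lines before oops #{n} ---")
        print("\n".join(ctx))
    for msg, count in oops_signatures(log).items():
        print(count, msg)
```

If every oops yields the same two signature lines, the crashes are most likely hitting the same assertion each time.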

Changed in iscsitarget (Ubuntu):
importance: Undecided → High
status: New → Confirmed