3ware 9650SE issue - dpkg segfaults, hangs
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Incomplete | Undecided | Unassigned | |
Bug Description
I'm trying to use a freshly installed system (at an ISP) that has a 3ware RAID card:
05:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
After a fresh Ubuntu 10.04 64-bit install, I used dselect to install the packages I wanted while running several rsync instances to transfer data. On two separate occasions dpkg crashed so badly that I couldn't even kill it with kill -9. This is what I got in syslog:
Aug 15 13:30:03 d4 kernel: [ 4.659871] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
Aug 15 13:30:03 d4 kernel: [ 4.660221] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
Aug 15 13:30:03 d4 kernel: [ 4.660706] 3w-9xxx: scsi0: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
Aug 15 13:30:03 d4 kernel: [ 4.743196] type=1505 audit(128189340
Aug 15 13:30:03 d4 kernel: [ 4.743673] type=1505 audit(128189340
Aug 15 13:30:03 d4 kernel: [ 4.743929] type=1505 audit(128189340
Aug 15 13:30:03 d4 kernel: [ 4.752861] type=1505 audit(128189340
Aug 15 13:30:04 d4 kernel: [ 6.162523] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
Aug 15 13:30:04 d4 kernel: [ 6.162526] 0000:0d:00.0: eth0: 10/100 speed: disabling TSO
Aug 15 13:30:04 d4 kernel: [ 6.163308] ADDRCONF(
Aug 15 13:49:59 d4 kernel: [ 1200.730658] dpkg D 0000000000000000 0 5429 1174 0x00000000
Aug 15 13:49:59 d4 kernel: [ 1200.730663] ffff880225087db8 0000000000000082 0000000000015bc0 0000000000015bc0
Aug 15 13:49:59 d4 kernel: [ 1200.730668] ffff880224241ab0 ffff880225087fd8 0000000000015bc0 ffff8802242416f0
Aug 15 13:49:59 d4 kernel: [ 1200.730671] 0000000000015bc0 ffff880225087fd8 0000000000015bc0 ffff880224241ab0
Aug 15 13:49:59 d4 kernel: [ 1200.730675] Call Trace:
Aug 15 13:49:59 d4 kernel: [ 1200.730684] [<ffffffff81166
Aug 15 13:49:59 d4 kernel: [ 1200.730687] [<ffffffff81166
Aug 15 13:49:59 d4 kernel: [ 1200.730692] [<ffffffff81542
Aug 15 13:49:59 d4 kernel: [ 1200.730695] [<ffffffff81166
Aug 15 13:49:59 d4 kernel: [ 1200.730698] [<ffffffff81542
Aug 15 13:49:59 d4 kernel: [ 1200.730702] [<ffffffff81085
Aug 15 13:49:59 d4 kernel: [ 1200.730705] [<ffffffff81166
Aug 15 13:49:59 d4 kernel: [ 1200.730708] [<ffffffff81167
Aug 15 13:49:59 d4 kernel: [ 1200.730712] [<ffffffff81167
Aug 15 13:49:59 d4 kernel: [ 1200.730715] [<ffffffff8116b
Aug 15 13:49:59 d4 kernel: [ 1200.730718] [<ffffffff8116b
Aug 15 13:49:59 d4 kernel: [ 1200.730721] [<ffffffff8116b
Aug 15 13:49:59 d4 kernel: [ 1200.730725] [<ffffffff81013
Interestingly, the second time this happened, dpkg came back to life once I killed the rsync processes.
This is rather disturbing: under even moderately heavy disk I/O, the RAID subsystem seems to become unstable. What could be the issue here?
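A note on the trace above: the `D` after `dpkg` in the task dump is the kernel's task state for uninterruptible sleep, which is why kill -9 had no effect while the process was waiting on I/O. A minimal Python sketch for spotting such processes while the hang is in progress follows. It assumes a Linux /proc filesystem; the function names `parse_stat` and `uninterruptible_tasks` are made up for illustration and are not part of any tool mentioned in this report.

```python
import os

def parse_stat(text):
    """Split the start of a /proc/<pid>/stat record into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or record unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling `uninterruptible_tasks()` during a hang should list dpkg (and possibly the rsync instances) if they are blocked the same way as in the trace. It only shows which processes are stuck, not why.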
Here's some additional info on the system:
# lspci
00:00.0 Host bridge: Intel Corporation 3200/3210 Chipset DRAM Controller (rev 01)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 02)
05:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
11:04.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics Innovation) Z9s/Z9m (XG21 core)
Changed in linux (Ubuntu):
status: Expired → New
I have exactly the same problem on Ubuntu 10.04.1 64-bit.
Did you find any workaround by tweaking the parameters in the 3ware BIOS?
Could it be caused by the 64-bit system?
# uname -a
Linux NAS 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010 x86_64 GNU/Linux
01:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
Capabilities: [40] Power Management version 2
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/5 Enable-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting <?>
Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f8000000 (64-bit, prefetchable) [size=32M]
Memory at fbdff000 (64-bit, non-prefetchable) [size=4K]
I/O ports at c800 [size=256]
Expansion ROM at fbdc0000 [disabled] [size=128K]
Kernel driver in use: 3w-9xxx
Kernel modules: 3w-9xxx
# dmesg | grep 3w
[ 1.642150] 3ware 9000 Storage Controller device driver for Linux v2.26.02.012.
[ 1.642186] 3w-9xxx 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.642190] 3w-9xxx 0000:01:00.0: setting latency timer to 64
[ 2.719058] scsi4 : 3ware 9000 Storage Controller
[ 2.719137] 3w-9xxx: scsi4: Found a 3ware 9000 Storage Controller at 0xfbdff000, IRQ: 16.
[ 3.078989] 3w-9xxx: scsi4: Firmware FE9X 4.08.00.006, BIOS BE9X 4.08.00.001, Ports: 8.
[ 4.390974] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 4.391246] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 4.391495] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 4.427395] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 4.429360] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 4.429615] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 93.437874] 3w-9xxx: scsi4: AEN: INFO (0x04:0x000C): Initialize started:unit=1.
[ 1215.126837] 3w-9xxx: scsi4: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=1.
[ 1226.366384] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 1226.382731] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 1236.209880] 3w-9xxx: scsi4: AEN: INFO (0x04:0x000C): Initialize started:unit=1.
[ 1236.495049] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
[ 1236.495410] 3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
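To compare the two machines, it can help to tally the driver messages rather than read them line by line. The sketch below is a rough Python helper, not part of the 3w-9xxx driver or any 3ware tool: it counts ERROR and AEN lines in dmesg or syslog text, and the regular expression is an assumption based only on the line shapes pasted in this report. For reference, 0x85 is the SCSI operation code for ATA PASS-THROUGH (16); whether these rejected commands are related to the hangs is not established here.

```python
import re
from collections import Counter

# Matches driver lines shaped like the ones pasted in this report, e.g.:
#   3w-9xxx: scsi4: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.
#   3w-9xxx: scsi4: AEN: INFO (0x04:0x000C): Initialize started:unit=1.
LINE_RE = re.compile(
    r"3w-9xxx: (?P<host>scsi\d+): (?P<kind>ERROR|AEN): "
    r"(?:(?P<level>\w+) )?\((?P<code>0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+)\): "
    r"(?P<text>.*)$"
)

def summarize_3w(log_text):
    """Count 3w-9xxx ERROR/AEN messages, keyed by (kind, code, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            key = (
                match.group("kind"),
                match.group("code"),
                match.group("text").rstrip("."),
            )
            counts[key] += 1
    return counts
```

Feeding it the text of `dmesg` from each system and printing `summarize_3w(text).most_common()` gives a quick view of how often each message appears. Lines that do not match the assumed shape are ignored.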
dpkg is also hanging for me; this too happened during heavy disk I/O (mkfs on a 6TB partition):
[ 2880.398449] INFO: task dpkg:1214 blocked for more than 120 seconds.
[ 2880.398522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2880.398589] dpkg D 0000000000000000 0 1214 1196 0x00000000
[ 2880.398596] ffff88007ba83db8 0000000000000086 0000000000015bc0 0000000000015bc0
[ 2880.398603] ffff880132269ab0 ffff88007ba83fd8 0000000000015bc0 ffff8801322696f0
[ 2880.398608] 0000000000015bc0 ffff88007ba83...