aacraid driver stalls on high-load SMP machines
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Won't Fix | Medium | |
linux (Ubuntu) | Invalid | Medium | Unassigned |
Bug Description
Under load, this happens rather often:
Jul 18 22:55:24 nun kernel: [86674.467410] aacraid: Host adapter abort request (0,0,2,0)
Jul 18 22:55:24 nun kernel: [86674.467487] aacraid: Host adapter abort request (0,0,3,0)
Jul 18 22:55:24 nun kernel: [86674.467617] aacraid: Host adapter reset request. SCSI hang ?
Jul 18 22:57:26 nun kernel: [86815.728423] aacraid: Host adapter abort request (0,0,0,0)
Jul 18 22:57:26 nun kernel: [86815.728500] aacraid: Host adapter abort request (0,0,3,0)
Jul 18 22:57:26 nun kernel: [86815.728573] aacraid: Host adapter abort request (0,0,2,0)
Jul 18 22:57:26 nun kernel: [86815.728640] aacraid: Host adapter abort request (0,0,1,0)
Jul 18 22:57:26 nun kernel: [86815.728772] aacraid: Host adapter reset request. SCSI hang ?
Access to the storage thus stalls for ten seconds or so.
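To see how often these stalls occur, the log excerpt above can be grouped into episodes, each one a run of abort requests closed by a reset request. The following is only a rough sketch: the regular expressions are written against the message text quoted above, the names `summarize`, `ABORT_RE`, `RESET_RE` and `SAMPLE` are made up for illustration, and reading the four numbers as (host, channel, id, lun) is an assumption.

```python
import re

# Patterns match the message text quoted in this report; other kernel
# versions may word these lines differently (assumption).
ABORT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+aacraid: Host adapter abort request "
    r"\((\d+),(\d+),(\d+),(\d+)\)"
)
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+aacraid: Host adapter reset request")


def summarize(lines):
    """Group abort requests under the reset request that follows them."""
    episodes = []
    pending = []  # aborts seen since the previous reset
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            # Assumed to be (host, channel, id, lun).
            pending.append(tuple(int(g) for g in m.groups()[1:]))
            continue
        m = RESET_RE.search(line)
        if m:
            episodes.append({"reset_at": float(m.group(1)), "aborts": pending})
            pending = []
    return episodes


# The eight log lines from this report, used as sample input.
SAMPLE = [
    "Jul 18 22:55:24 nun kernel: [86674.467410] aacraid: Host adapter abort request (0,0,2,0)",
    "Jul 18 22:55:24 nun kernel: [86674.467487] aacraid: Host adapter abort request (0,0,3,0)",
    "Jul 18 22:55:24 nun kernel: [86674.467617] aacraid: Host adapter reset request. SCSI hang ?",
    "Jul 18 22:57:26 nun kernel: [86815.728423] aacraid: Host adapter abort request (0,0,0,0)",
    "Jul 18 22:57:26 nun kernel: [86815.728500] aacraid: Host adapter abort request (0,0,3,0)",
    "Jul 18 22:57:26 nun kernel: [86815.728573] aacraid: Host adapter abort request (0,0,2,0)",
    "Jul 18 22:57:26 nun kernel: [86815.728640] aacraid: Host adapter abort request (0,0,1,0)",
    "Jul 18 22:57:26 nun kernel: [86815.728772] aacraid: Host adapter reset request. SCSI hang ?",
]

for episode in summarize(SAMPLE):
    print(episode["reset_at"], len(episode["aborts"]), "aborts")
```

On a real system the same function could be fed the lines of the kernel log file instead of `SAMPLE`.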
I have successfully worked around the problem by using "schedtool -a 1 pid-of-
However, one CPU is _somewhat_ slower than four, which is quite noticeable, so we'd like to get this handled somehow :-/
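For readers unfamiliar with the workaround: assuming schedtool reads the "1" after -a as an affinity bitmask, it confines the target process to CPU 0. A rough Python equivalent on Linux might look like the sketch below. The helper names `mask_to_cpus` and `pin_process` are hypothetical, and the target process is not filled in because the command in the report is cut off.

```python
import os


def mask_to_cpus(mask):
    """Expand an affinity bitmask into the set of CPU indices it selects."""
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def pin_process(pid, mask=0x1):
    """Restrict `pid` to the CPUs in `mask` (Linux only; pid 0 is the caller).

    Mirrors what `schedtool -a 1 <pid>` is assumed to do in this report.
    """
    os.sched_setaffinity(pid, mask_to_cpus(mask))
    return os.sched_getaffinity(pid)


# Mask 0x1 selects only CPU 0; 0xF would allow all four CPUs of this machine.
print(mask_to_cpus(0x1), mask_to_cpus(0xF))
```

Calling `pin_process(some_pid)` would apply the restriction. It is left uncalled here because it changes scheduling for a live process.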
lspci:
05:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
Subsystem: Dell PowerEdge 2400,2500,2550,4400
Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 7
BIST result: 00
I/O ports at cc00 [size=256]
Memory at fccff000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at fcd00000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
05:06.1 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
Subsystem: Dell PowerEdge 2400,2500,2550,4400
Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
BIST result: 00
I/O ports at c800 [size=256]
Memory at fccfe000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at f8100000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
lspci -n:
05:06.0 0100: 9005:00c5 (rev 01)
Subsystem: 1028:00c5
Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 7
BIST result: 00
I/O ports at cc00 [size=256]
Memory at fccff000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at fcd00000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
05:06.1 0100: 9005:00c5 (rev 01)
Subsystem: 1028:00c5
Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
BIST result: 00
I/O ports at c800 [size=256]
Memory at fccfe000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at f8100000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Changed in linux:
status: Unknown → Confirmed
Changed in linux:
status: Confirmed → In Progress
Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
status: Triaged → In Progress
Changed in linux:
status: In Progress → Invalid
Changed in linux (Ubuntu):
assignee: Andy Whitcroft (apw) → nobody
Changed in linux (Ubuntu):
status: In Progress → Triaged
Changed in linux:
status: Invalid → Won't Fix
Changed in linux:
importance: Unknown → Medium
Update: my uniprocessor band-aid, besides significantly decreasing performance, resulted in an eventual soft-hang of all CPUs some hours later, so this workaround obviously doesn't work.