Description of problem:
When doing "intensive" I/O, the mpt* drivers crash the filesystem on Fedora 12.
The problem is on an IBM x3580 M2 machine, using the integrated LSI SAS1078 C1 PCI-express Fusion-MPT SAS.
Steps to Reproduce:
1. Create a large allocated block device (20 GB, for example), here /dev/vg/mybigspace
2. dd if=/dev/vg/mybigspace of=/dev/null
3. After a few minutes, filesystem access becomes impossible. dmesg shows the following:
Calgary: DMA error on CalIOC2 PHB 0x3
Calgary: 0x80000000@CSR 0x00000000@PLSSR 0xb0008000@CSMR 0x00000000@MCK
Calgary: 0x00000000@0x810 0x00000000@0x820 0x00000000@0x830 0x00000000@0x840 0x00000000@0x850 0x00000000@0x860 0x00000000@0x870
Calgary: 0x40000000@0xcb0
irq 46: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
<IRQ> [<ffffffff8109aefc>] __report_bad_irq+0x3d/0x8c
[<ffffffff8109b063>] note_interrupt+0x118/0x17d
[<ffffffff8109b6f2>] handle_fasteoi_irq+0xa1/0xc6
[<ffffffff8101463c>] handle_irq+0x8b/0x93
[<ffffffff8141e9cc>] do_IRQ+0x5c/0xbc
[<ffffffff810126d3>] ret_from_intr+0x0/0x11
<EOI> [<ffffffff8101907f>] ? mwait_idle+0x91/0xae
[<ffffffff8101907f>] ? mwait_idle+0x91/0xae
[<ffffffff81019021>] ? mwait_idle+0x33/0xae
[<ffffffff8141d079>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff81010bb8>] ? enter_idle+0x25/0x27
[<ffffffff81010c60>] ? cpu_idle+0xa6/0xe9
[<ffffffff814145be>] ? start_secondary+0x1f3/0x234
handlers:
[<ffffffffa00e3d7e>] (mpt_interrupt+0x0/0x8bb [mptbase])
Disabling IRQ #46
mptscsih: ioc0: attempting task abort! (sc=ffff880a0d8fa400)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 fb 5b a7 00 00 60 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a0d8fa400)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb3100)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 28 f5 ff 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: WARNING - NOT READY WARNING!
mptbase: WARNING - (-1) Cannot recover ioc0
mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!
mptscsih: ioc0: task abort: FAILED (sc=ffff880a04bb3100)
mptscsih: ioc0: attempting task abort! (sc=ffff880a04bb2600)
sd 2:1:4:0: [sda] CDB: Write(10): 2a 00 00 61 de 27 00 00 08 00
mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!
mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptbase: ioc0: WARNING - Unexpected doorbell active!
[root@flanders ubuntu]#
Message from syslogd@mymachine at Nov 26 10:38:28 ...
kernel:mpage_da_map_blocks block allocation failed for inode 11762 at logical offset 2 with max blocks 1 with error -30
Message from syslogd@mymachine at Nov 26 10:38:28 ...
kernel:This should not happen.!! Data will be lost
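For anyone who wants to check whether their machine is hitting the same failure, here is a minimal sketch that filters the kernel log for the messages seen in this report. The function name calgary_signatures is hypothetical, and the pattern list is only my guess at which lines are characteristic; it is not an official diagnostic.

```shell
#!/bin/sh
# calgary_signatures is a hypothetical helper, not part of any tool.
# It reads kernel log text on stdin and prints only the lines that match
# the failure messages quoted in this report.
calgary_signatures() {
    grep -E 'Calgary: DMA error|irq [0-9]+: nobody cared|IOC Not operational|Cannot recover ioc'
}

# Typical use on a live system (reading dmesg may require root).
dmesg 2>/dev/null | calgary_signatures || echo "no matching messages in the kernel log"
```

If the output contains the "Calgary: DMA error" line followed by the mptscsih/mptbase warnings, the sequence matches what is described here.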
The first error message ("Calgary: DMA error on CalIOC2 PHB 0x3") seems to be related to a bug in the Calgary code, as detailed in a thread in LKML:
"The calgary code can give drivers addresses above 4GB which is very bad for hardware that is only 32bit DMA addressable" (http://<email address hidden>/2008-06/05248/Re:_%5BPATCH_-mm%5D_x86_calgary:_fix_handling_of_devces_that_aren%27t_behind_the_Calgary ).
But that thread is from 2008; I thought this would have been corrected by now...
According to another bug report (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/343749 ), the temporary solution seems to be to set iommu=soft at boot. But I guess this affects performance... According to that bug report, the bug seems to be corrected on RH 9 ?!
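As a rough sketch of how to verify that workaround: iommu=soft is a kernel boot parameter, so it has to appear on the kernel command line, which the running kernel exposes in /proc/cmdline. The helper name has_iommu_soft is hypothetical. Where the parameter gets added depends on the boot loader; I am assuming the usual approach of appending it to the kernel line in the boot loader configuration and rebooting.

```shell
#!/bin/sh
# has_iommu_soft is a hypothetical helper, not part of any tool.
# It succeeds if the given kernel command line contains iommu=soft
# as a whole, space-separated parameter.
has_iommu_soft() {
    case " $1 " in
        *" iommu=soft "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if has_iommu_soft "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "iommu=soft is active"
else
    echo "iommu=soft is not set; append it to the kernel line in the boot loader config and reboot"
fi
```

This only confirms that the parameter was passed; it says nothing about the performance cost.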
The bug exists in Fedora 12 and makes it unusable on an x3580 M2.