I'm starting to think this is a bug in the raid4-5 driver. I'm seeing this in my kernel log:
Dec 22 21:56:30 faldara kernel: [ 1939.397600] device-mapper: ioctl: device doesn't appear to be in the dev hash table.
Dec 22 21:56:30 faldara kernel: [ 1939.417446] quiet_error: 9 callbacks suppressed
Dec 22 21:56:30 faldara kernel: [ 1939.417455] Buffer I/O error on device dm-9, logical block 0
Dec 22 21:56:30 faldara kernel: [ 1939.417470] Buffer I/O error on device dm-9, logical block 1
Dec 22 21:56:30 faldara kernel: [ 1939.417480] Buffer I/O error on device dm-9, logical block 2
Dec 22 21:56:30 faldara kernel: [ 1939.417489] Buffer I/O error on device dm-9, logical block 3
Dec 22 21:56:30 faldara kernel: [ 1939.417498] Buffer I/O error on device dm-9, logical block 4
Dec 22 21:56:30 faldara kernel: [ 1939.417507] Buffer I/O error on device dm-9, logical block 5
Dec 22 21:56:30 faldara kernel: [ 1939.417516] Buffer I/O error on device dm-9, logical block 6
Dec 22 21:56:30 faldara kernel: [ 1939.417525] Buffer I/O error on device dm-9, logical block 7
Dec 22 21:56:30 faldara kernel: [ 1939.417543] Buffer I/O error on device dm-9, logical block 0
Dec 22 21:56:30 faldara kernel: [ 1939.417552] Buffer I/O error on device dm-9, logical block 1
Dec 22 21:56:30 faldara kernel: [ 1939.419218] device-mapper: table: 252:10: raid45: Invalid RAID device offset parameter
Dec 22 21:56:30 faldara kernel: [ 1939.419230] device-mapper: ioctl: error adding target to table
After that, the dmraid process gets stuck in the uninterruptible (D) state.
Can you check your kernel log for similar entries? Also, if dmraid -ay gets stuck, press Alt+SysRq+W; that should add lines like these to the log:
Dec 22 21:57:06 faldara kernel: [ 1975.682785] SysRq : Show Blocked State
Dec 22 21:57:06 faldara kernel: [ 1975.682796] task PC stack pid father
Dec 22 21:57:06 faldara kernel: [ 1975.682864] dmraid D 0000000100028e87 0 3700 3475 0x00000004
Dec 22 21:57:06 faldara kernel: [ 1975.682874] ffff880064a77ce8 0000000000000086 ffffffff00000000 0000000000015980
Dec 22 21:57:06 faldara kernel: [ 1975.682883] ffff880064a77fd8 0000000000015980 ffff880064a77fd8 ffff88006743c4a0
Dec 22 21:57:06 faldara kernel: [ 1975.682892] 0000000000015980 0000000000015980 ffff880064a77fd8 0000000000015980
Dec 22 21:57:06 faldara kernel: [ 1975.682901] Call Trace:
Dec 22 21:57:06 faldara kernel: [ 1975.682918] [<ffffffff81588175>] schedule_timeout+0x195/0x310
Dec 22 21:57:06 faldara kernel: [ 1975.682931] [<ffffffff810702f0>] ? process_timeout+0x0/0x10
Dec 22 21:57:06 faldara kernel: [ 1975.682940] [<ffffffff8158830e>] schedule_timeout_uninterruptible+0x1e/0x20
Dec 22 21:57:06 faldara kernel: [ 1975.682949] [<ffffffff81071420>] msleep+0x20/0x30
Dec 22 21:57:06 faldara kernel: [ 1975.682958] [<ffffffff81457dba>] __dm_destroy+0x9a/0x150
Dec 22 21:57:06 faldara kernel: [ 1975.682965] [<ffffffff81457ea3>] dm_destroy+0x13/0x20
Dec 22 21:57:06 faldara kernel: [ 1975.682974] [<ffffffff8145d570>] dev_remove+0x90/0x110
Dec 22 21:57:06 faldara kernel: [ 1975.682981] [<ffffffff8145d4e0>] ? dev_remove+0x0/0x110
Dec 22 21:57:06 faldara kernel: [ 1975.682988] [<ffffffff8145ddb5>] ctl_ioctl+0x1a5/0x250
Dec 22 21:57:06 faldara kernel: [ 1975.682996] [<ffffffff8145de73>] dm_ctl_ioctl+0x13/0x20
Dec 22 21:57:06 faldara kernel: [ 1975.683005] [<ffffffff81162f0d>] vfs_ioctl+0x3d/0xd0
Dec 22 21:57:06 faldara kernel: [ 1975.683011] [<ffffffff811637e1>] do_vfs_ioctl+0x81/0x340
Dec 22 21:57:06 faldara kernel: [ 1975.683018] [<ffffffff81163b21>] sys_ioctl+0x81/0xa0
Dec 22 21:57:06 faldara kernel: [ 1975.683026] [<ffffffff8158a99e>] ? do_device_not_available+0xe/0x10
Dec 22 21:57:06 faldara kernel: [ 1975.683038] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
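If you want to check a long log quickly, here's a rough Python sketch that counts lines matching the symptoms above. It assumes your messages use the same wording as my excerpt. The pattern names and the looks_like_same_bug() heuristic are my own (hypothetical), not part of dmraid or the kernel.

```python
import re
from typing import Iterable

# Patterns taken from the log excerpt above; they assume the same message
# wording appears on other machines.
PATTERNS = {
    "buffer_io_error": re.compile(r"Buffer I/O error on device (dm-\d+), logical block (\d+)"),
    "raid45_offset": re.compile(r"raid45: Invalid RAID device offset parameter"),
    "table_load_failed": re.compile(r"device-mapper: ioctl: error adding target to table"),
    # Frame seen in the Alt+SysRq+W call trace of the stuck dmraid task.
    "blocked_in_dm_destroy": re.compile(r"__dm_destroy\+0x"),
}


def scan_kernel_log(lines: Iterable[str]) -> dict[str, int]:
    """Count how many log lines match each symptom from the report."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_same_bug(counts: dict[str, int]) -> bool:
    """Hypothetical heuristic: the raid45 table load failed with the offset error."""
    return counts["raid45_offset"] > 0 and counts["table_load_failed"] > 0
```

For example, scan_kernel_log(open("/var/log/kern.log")) returns a dict of counts (adjust the path for your distro). A non-zero blocked_in_dm_destroy count only shows up after you've pressed Alt+SysRq+W while dmraid is hung.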
Please confirm if this is the case.