Mdadm crash on raid5 reshape

Bug #1001019 reported by Alex Sergeyev
This bug affects 4 people
Affects          Status      Importance   Assigned to   Milestone
mdadm (Ubuntu)   Confirmed   Undecided    Unassigned

Bug Description

Ubuntu 12.04 LTS
mdadm 3.2.3-2ubuntu1

I tried to grow an mdadm RAID5 array, changing the number of devices and the chunk size at the same time:
mdadm --grow /dev/md2 --raid-devices=4 --chunk=512 --backup-file=/root/md2_backup.img

After this, /proc/mdstat no longer responds and I can't get any information about the RAID status.
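
A possible workaround (just an idea, not verified to avoid the hang) would be to split the change into two separate reshapes instead of combining --raid-devices and --chunk in a single command; the backup-file paths below are only examples:

# 1. Grow to 4 devices first, then wait for this reshape to finish
mdadm --grow /dev/md2 --raid-devices=4 --backup-file=/root/md2_backup_devices.img
# 2. Change the chunk size in a second, separate reshape
mdadm --grow /dev/md2 --chunk=512 --backup-file=/root/md2_backup_chunk.img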

May 17 23:27:17 protoss kernel: [40835.240145] RAID conf printout:
May 17 23:27:17 protoss kernel: [40835.240153] --- level:5 rd:4 wd:4
May 17 23:27:17 protoss kernel: [40835.240160] disk 0, o:1, dev:sdc1
May 17 23:27:17 protoss kernel: [40835.240166] disk 1, o:1, dev:sdd1
May 17 23:27:17 protoss kernel: [40835.240171] disk 2, o:1, dev:sdf1
May 17 23:27:17 protoss kernel: [40835.240176] disk 3, o:1, dev:sdg1
May 17 23:27:17 protoss kernel: [40835.240332] md: reshape of RAID array md2
May 17 23:27:17 protoss kernel: [40835.240342] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
May 17 23:27:17 protoss kernel: [40835.240348] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
May 17 23:27:17 protoss kernel: [40835.240365] md: using 128k window, over a total of 1953513472k.
May 17 23:27:17 protoss kernel: [40835.507761] md: md_do_sync() got signal ... exiting
May 17 23:30:42 protoss kernel: [41040.652048] INFO: task md2_raid5:259 blocked for more than 120 seconds.
May 17 23:30:42 protoss kernel: [41040.652055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 23:30:42 protoss kernel: [41040.652062] md2_raid5 D ffffffff81806240 0 259 2 0x00000000
May 17 23:30:42 protoss kernel: [41040.652074] ffff8801336a1c40 0000000000000046 ffffea0001d7f180 ffffffff7fffffff
May 17 23:30:42 protoss kernel: [41040.652086] ffff8801336a1fd8 ffff8801336a1fd8 ffff8801336a1fd8 0000000000013780
May 17 23:30:42 protoss kernel: [41040.652097] ffff880138b796f0 ffff880133ba2de0 ffff8801336a1c30 ffff880133ba2de0
May 17 23:30:42 protoss kernel: [41040.652108] Call Trace:
May 17 23:30:42 protoss kernel: [41040.652123] [<ffffffff8165a55f>] schedule+0x3f/0x60
May 17 23:30:42 protoss kernel: [41040.652157] [<ffffffffa00a3d2e>] resize_stripes+0x51e/0x590 [raid456]
May 17 23:30:42 protoss kernel: [41040.652167] [<ffffffff81056c9c>] ? update_shares+0xcc/0x100
May 17 23:30:42 protoss kernel: [41040.652176] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
May 17 23:30:42 protoss kernel: [41040.652192] [<ffffffffa00a3e1f>] check_reshape+0x7f/0xd0 [raid456]
May 17 23:38:42 protoss kernel: [41520.652310] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
May 17 23:38:42 protoss kernel: [41520.652319] [<ffffffff81666bf0>] ? gs_change+0x13/0x13
May 17 23:38:42 protoss kernel: [41520.652357] INFO: task lvm:25938 blocked for more than 120 seconds.
May 17 23:38:42 protoss kernel: [41520.652362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 23:38:42 protoss kernel: [41520.652367] lvm D ffffffff81806240 0 25938 25935 0x00000000
May 17 23:38:42 protoss kernel: [41520.652377] ffff88006c6b1ae8 0000000000000086 0000000000000000 ffff88006c6b1aa8
May 17 23:38:42 protoss kernel: [41520.652388] ffff88006c6b1fd8 ffff88006c6b1fd8 ffff88006c6b1fd8 0000000000013780
May 17 23:38:42 protoss kernel: [41520.652398] ffff880138b616f0 ffff88010c02dbc0 ffff88006c6b1ab8 ffff88013fc94040
May 17 23:38:42 protoss kernel: [41520.652409] Call Trace:
May 17 23:38:42 protoss kernel: [41520.652417] [<ffffffff8165a55f>] schedule+0x3f/0x60
May 17 23:38:42 protoss kernel: [41520.652424] [<ffffffff8165a60f>] io_schedule+0x8f/0xd0
May 17 23:38:42 protoss kernel: [41520.652434] [<ffffffff811b0e74>] dio_await_completion+0x54/0xd0
May 17 23:38:42 protoss kernel: [41520.652442] [<ffffffff811b3484>] __blockdev_direct_IO+0x954/0xd90
May 17 23:38:42 protoss kernel: [41520.652451] [<ffffffff811af6f0>] ? blkdev_get_block+0x80/0x80
May 17 23:38:42 protoss kernel: [41520.652461] [<ffffffff811af1b7>] blkdev_direct_IO+0x57/0x60
May 17 23:38:42 protoss kernel: [41520.652468] [<ffffffff811af6f0>] ? blkdev_get_block+0x80/0x80
May 17 23:38:42 protoss kernel: [41520.652478] [<ffffffff811196bb>] generic_file_aio_read+0x24b/0x280
May 17 23:38:42 protoss kernel: [41520.652488] [<ffffffff811874ac>] ? path_openat+0xfc/0x3f0
May 17 23:38:42 protoss kernel: [41520.652496] [<ffffffff81177452>] do_sync_read+0xd2/0x110
May 17 23:38:42 protoss kernel: [41520.652506] [<ffffffff8129cd03>] ? security_file_permission+0x93/0xb0
May 17 23:38:42 protoss kernel: [41520.652514] [<ffffffff811778d1>] ? rw_verify_area+0x61/0xf0
May 17 23:38:42 protoss kernel: [41520.652521] [<ffffffff81177db0>] vfs_read+0xb0/0x180
May 17 23:38:42 protoss kernel: [41520.652529] [<ffffffff81177eca>] sys_read+0x4a/0x90
May 17 23:38:42 protoss kernel: [41520.652537] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b

Revision history for this message
James Lee (james-lee) wrote :

I've seen this bug too - I was also doing a reshape of a RAID5 array (but without changing chunk size - just adding a new drive).

I'm also only seeing this intermittently. One thing that might be relevant is that I'm reshaping several arrays in succession through a script (running them in series, but with no waiting between reshapes). I've attached my dmesg output - similar error and stack trace to the above.

Let me know if there are any other diagnostics that would help here; I can hopefully repro this without too much trouble.
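
In case it helps, this is roughly what could be captured the next time it hangs (a sketch only; /dev/mdX stands in for the affected array, sysrq has to be enabled, and mdadm --detail may itself block while the array is stuck):

# Allow sysrq, then dump stack traces of blocked (D-state) tasks into the kernel log
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg > /tmp/md_hang_dmesg.txt

# Array state via mdadm and sysfs, avoiding /proc/mdstat since it hangs
mdadm --detail /dev/mdX
cat /sys/block/mdX/md/sync_action
cat /sys/block/mdX/md/reshape_position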

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed
Revision history for this message
ole.tange (n-launchpad-net-tange-dk) wrote :

I have seen this too:

md1 : active raid6 sdg[0] sdi[12](S) sdt[15](S) sdy[17](S) sdx[16](S) sdh[8] sdw[13] sdo[14] sdk[5] sdd[11] sdc[3] sdv[9] sdn[10]
      27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      bitmap: 2/2 pages [8KB], 1048576KB chunk

mdadm -v --grow /dev/md1 -b none

mdadm -v --grow /dev/md1 --raid-devices=10 --backup-file=/root/back-md1

cat /proc/mdstat
<<hangs>>
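
As an aside (assuming, though not confirmed here, that the bitmap was removed with -b none so the reshape could start): once the reshape eventually completes, the internal bitmap can be re-added with:

mdadm --grow /dev/md1 --bitmap=internal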
