No other hardware errors were reported, and the reshape got stuck at slightly different blocks every time it was restarted, all in the same vicinity. It turns out that md had recorded the exact same sector in the badblock log of multiple devices at some point before the reshape was started. This could be seen with "mdadm --examine-badblocks /dev/sdXY". The original cause of the badblock entries was probably a loose cable, as the reported sectors were fully readable with the "dd" and "badblocks" commands.
The problem was eventually resolved by removing the badblock log from the RAID5 array using "mdadm --assemble /dev/md0 --update=force-no-bbl". With the badblock log removed, the reshape progressed beyond the previously troublesome range of blocks.
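For reference, the diagnosis and recovery steps above can be sketched as a small script. The device names (/dev/md0, /dev/sd[cdef]1) are examples matching this report and must be substituted for your own array; the script defaults to a dry run that only prints the commands, since stopping and reassembling an array is destructive if pointed at the wrong devices.

```shell
#!/bin/sh
# Sketch of the recovery sequence described above. Assumes the array is
# /dev/md0 with members /dev/sdc1..sdf1 (hypothetical; substitute your own).
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Inspect the per-device badblock logs; identical sectors listed on
#    multiple members are suspicious (a real bad sector lives on one disk).
for dev in /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1; do
    run mdadm --examine-badblocks "$dev"
done

# 2. Verify the listed sectors are actually readable before discarding
#    the log, e.g. with dd against one reported sector offset.

# 3. Stop the array and reassemble it without a badblock log.
run mdadm --stop /dev/md0
run mdadm --assemble /dev/md0 --update=force-no-bbl /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
```

Note that --update=force-no-bbl removes the badblock log for the whole array; there is no per-device clear operation, which is part of the complaint below.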
I would have expected at least an error message in the kernel log rather than just a "hung task" message, ideally before the reshape was allowed to start (i.e. early termination). Furthermore, it would be helpful if mdadm allowed the badblock log to be cleared for an individual device rather than only removed from the whole array with "--update=force-no-bbl".
Linux [hostname removed] 5.3.0-55-generic #49-Ubuntu SMP Thu May 21 12:47:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu release: 19.10 (although the same issue is present in 18.04 and 20.04 as well).
A RAID5 reshape from 3 to 4 devices got stuck:
md127 : active raid5 sde1[5] sdd1[4] sdc1[0] sdf1[3]
      7813769216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  1.8% (72261116/3906884608) finish=1663133.7min speed=38K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk
with the following stack trace:
[54979.996871] INFO: task md127_reshape:7090 blocked for more than 1208 seconds.
[54979.996922]       Tainted: P           OE     5.3.0-55-generic #49-Ubuntu
[54979.996967] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[54979.997018] md127_reshape   D    0  7090      2 0x80004080
[54979.997019] Call Trace:
[54979.997022]  __schedule+0x2b9/0x6c0
[54979.997023]  schedule+0x42/0xb0
[54979.997027]  reshape_request+0x878/0x950 [raid456]
[54979.997028]  ? wait_woken+0x80/0x80
[54979.997030]  raid5_sync_request+0x302/0x3b0 [raid456]
[54979.997032]  md_do_sync.cold+0x3ef/0x999
[54979.997034]  ? ecryptfs_write_begin+0x70/0x280
[54979.997034]  ? __switch_to_asm+0x40/0x70
[54979.997035]  ? __switch_to_asm+0x34/0x70
[54979.997035]  ? __switch_to_asm+0x40/0x70
[54979.997036]  ? __switch_to_asm+0x34/0x70
[54979.997036]  ? __switch_to_asm+0x40/0x70
[54979.997037]  ? __switch_to_asm+0x34/0x70
[54979.997038]  md_thread+0x97/0x160
[54979.997040]  kthread+0x104/0x140
[54979.997040]  ? md_start_sync+0x60/0x60
[54979.997041]  ? kthread_park+0x80/0x80
[54979.997042]  ret_from_fork+0x35/0x40