mdadm raid soft lock-ups ubuntu kernel 4.13.0-36
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Incomplete | Medium | Unassigned |
Bug Description
We're running Ubuntu 16.04.4 with mdadm v3.3 and kernel 4.13.0-36 (Ubuntu package linux-image-
We have created a RAID10 array using 22 960GB SSDs [1]. The problem we're
experiencing is that the /usr/share/
(executed by cron, included in the mdadm package) results in a (soft?)
deadlock: load on the node spikes up to 500-700 and all I/O operations
are blocked for a period of time. We can see traces like these [2] in
our kernel log.
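When the node is in that state, the relevant evidence is usually the hung-task and soft-lockup messages in the kernel log. A minimal sketch for pulling them out; the sample log content below is hypothetical, not taken from this bug:

```shell
# Hedged sketch: filter hung-task and soft-lockup traces out of a kernel log.
# The two grep patterns match the stock kernel's standard hung-task and
# soft-lockup messages; the sample log is illustrative only.
extract_lockups() {
  grep -E 'blocked for more than|soft lockup' "$1"
}

cat > /tmp/sample_kern.log <<'EOF'
INFO: task md1_raid10:1234 blocked for more than 120 seconds.
watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
usb 1-1: new high-speed USB device number 2
EOF

extract_lockups /tmp/sample_kern.log
```

On a live node the same filter can be pointed at the output of `dmesg` instead of a saved file.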
For example, the node ends up stuck in a state like this:
test@os-node1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid10 dm-23[9] dm-22[8] dm-21[7] dm-20[6] dm-18[4] dm-19[5] dm-17[3]
10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUU
[
bitmap: 0/39 pages [0KB], 131072KB chunk
unused devices: <none>
and the only solution is to hard reboot the node. What we found out is that it
doesn't happen on an idle array; we have to generate significant load
(10 VMs running fio [3] with 500GB HDDs) to be able to reproduce the issue.
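While reproducing, it helps to distinguish an array that is merely busy from one that has actually dropped members. A small sketch that checks the [active/total] device counters on a /proc/mdstat status line; the sample lines are hypothetical, since the real output above is truncated:

```shell
# Hedged sketch: flag a degraded md array by comparing the [total/active]
# counters on a /proc/mdstat "blocks" line. The sample lines are
# illustrative, not copied from this bug report.
check_md_health() {
  # Pull the two numbers out of the last "[N/M]" field, e.g. "[22/22]".
  counts=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9]*\)\/\([0-9]*\)\].*/\1 \2/p')
  set -- $counts
  if [ "$1" = "$2" ]; then echo OK; else echo DEGRADED; fi
}

check_md_health '10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUUUUUUUUUU]'
# → OK
check_md_health '10313171968 blocks super 1.2 512K chunks 2 near-copies [22/21] [UUUUUUUUUU_UUUUUUUUUUU]'
# → DEGRADED
```

In the hangs described above the counters stay at [22/22], which points away from a failed member and toward the I/O path itself.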
Has anyone experienced similar issues? Do you have any suggestions on how to
better troubleshoot this, and perhaps identify whether the disks or the software
layer are responsible for this behavior?
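One way to separate "the periodic check is simply saturating the disks" from a genuine md-layer bug is to cap the check's bandwidth via the kernel's dev.raid.speed_limit_max sysctl and see whether the lock-ups still occur. A back-of-the-envelope sketch of how long a throttled pass would take; the array size comes from the mdstat output above (mdstat reports 1KB blocks), while the 200000 KB/s rate is an assumed setting, not a measured one:

```shell
# Hedged sketch: rough duration of a full md check pass at a given rate.
# 10313171968 KB is the md1 size from the mdstat output above; the rate
# argument is an assumed dev.raid.speed_limit_max value in KB/s.
estimate_check_hours() {
  size_kb=$1
  rate_kbs=$2
  echo $(( size_kb / rate_kbs / 3600 ))
}

estimate_check_hours 10313171968 200000   # whole hours at ~200 MB/s
# → 14
```

If the hangs disappear at a low ceiling and return at a high one, that would point at the disks or controller rather than the md code.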
[1] http://
[2] https:/
[3] https:/
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1776159
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.