mdadm raid soft lock-ups ubuntu kernel 4.13.0-36
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Incomplete | Medium | Unassigned |
Bug Description
We're running Ubuntu 16.04.4 with mdadm v3.3 and kernel 4.13.0-36 (Ubuntu package linux-image-
We have created a RAID10 array using 22 960GB SSDs [1]. The problem we're
experiencing is that the /usr/share/
(executed by cron, included in the mdadm package) results in a (soft?)
deadlock: load on the node spikes up to 500-700 and all I/O operations
are blocked for a period of time. We can see traces like these [2] in
our kernel log.
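When the node is in that state, the relevant evidence is usually the hung-task and soft-lockup messages in the kernel log. A minimal sketch for pulling them out; the sample log content below is hypothetical, not taken from this bug:

```shell
# Hedged sketch: filter hung-task and soft-lockup traces out of a kernel log.
# The two grep patterns match the stock kernel's standard hung-task and
# soft-lockup messages; the sample log is illustrative only.
extract_lockups() {
  grep -E 'blocked for more than|soft lockup' "$1"
}

cat > /tmp/sample_kern.log <<'EOF'
INFO: task md1_raid10:1234 blocked for more than 120 seconds.
watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
usb 1-1: new high-speed USB device number 2
EOF

extract_lockups /tmp/sample_kern.log
```

On a live node the same filter can be pointed at the output of `dmesg` instead of a saved file.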
For example, the node ends up stuck in a state like this:
test@os-node1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid10 dm-23[9] dm-22[8] dm-21[7] dm-20[6] dm-18[4] dm-19[5] dm-17[3]
10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUU
[
bitmap: 0/39 pages [0KB], 131072KB chunk
unused devices: <none>
and the only solution is to hard reboot the node. What we found out is that it
doesn't happen on an idle array; we have to generate significant load
(10 VMs running fio [3] with 500GB HDDs) to be able to reproduce the issue.
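While reproducing, it helps to distinguish an array that is merely busy from one that has actually dropped members. A small sketch that checks the [active/total] device counters on a /proc/mdstat status line; the sample lines are hypothetical, since the real output above is truncated:

```shell
# Hedged sketch: flag a degraded md array by comparing the [total/active]
# counters on a /proc/mdstat "blocks" line. The sample lines are
# illustrative, not copied from this bug report.
check_md_health() {
  # Pull the two numbers out of the last "[N/M]" field, e.g. "[22/22]".
  counts=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9]*\)\/\([0-9]*\)\].*/\1 \2/p')
  set -- $counts
  if [ "$1" = "$2" ]; then echo OK; else echo DEGRADED; fi
}

check_md_health '10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUUUUUUUUUU]'
# → OK
check_md_health '10313171968 blocks super 1.2 512K chunks 2 near-copies [22/21] [UUUUUUUUUU_UUUUUUUUUUU]'
# → DEGRADED
```

In the hangs described above the counters stay at [22/22], which points away from a failed member and toward the I/O path itself.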
Has anyone experienced similar issues? Do you have any suggestions on how to
better troubleshoot this, and perhaps identify whether the disks or the software
layer are responsible for this behavior?
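One way to separate "the periodic check is simply saturating the disks" from a genuine md-layer bug is to cap the check's bandwidth via the kernel's dev.raid.speed_limit_max sysctl and see whether the lock-ups still occur. A back-of-the-envelope sketch of how long a throttled pass would take; the array size comes from the mdstat output above (mdstat reports 1KB blocks), while the 200000 KB/s rate is an assumed setting, not a measured one:

```shell
# Hedged sketch: rough duration of a full md check pass at a given rate.
# 10313171968 KB is the md1 size from the mdstat output above; the rate
# argument is an assumed dev.raid.speed_limit_max value in KB/s.
estimate_check_hours() {
  size_kb=$1
  rate_kbs=$2
  echo $(( size_kb / rate_kbs / 3600 ))
}

estimate_check_hours 10313171968 200000   # whole hours at ~200 MB/s
# → 14
```

If the hangs disappear at a low ceiling and return at a high one, that would point at the disks or controller rather than the md code.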
[1] http://
[2] https:/
[3] https:/
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1776159
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.