Comment 3 for bug 1776159

haad (haaaad) wrote : Re: [Bug 1776159] Re: mdadm raid soft lock-ups ubuntu kernel 4.13.0-36 Inbox x

Hi,

For us, the trigger for this issue is the checkarray script, which crond runs
on the first Sunday of every month. On 16.04.1, however, there was some
problem with the dash shell/kernel [1]: cron was executing the checkarray
script, but it kept failing silently for almost 2 years. I have tried to
install 4.17 on our test system, and it looks like it needs libssl1.1
(>= 1.1.0), which is not installed here, and a newer linux-base. Based on [2],
we would have to install a newer release, which will not work for us, I'm
afraid. Do you have any suggestions on how to test 4.17 on 16.04.4?
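
To make the "first Sunday of the month" schedule concrete, here is a minimal Python sketch of the date test. It assumes the stock mdadm cron entry fires every Sunday and only calls checkarray when the day of the month is 7 or lower. That behavior is stated from memory, not taken from this report, and is_first_sunday is a hypothetical helper name.

```python
import datetime


def is_first_sunday(day: datetime.date) -> bool:
    """True when `day` is a Sunday that falls within the first 7 days of its month."""
    # date.weekday() counts Monday as 0, so Sunday is 6.
    return day.weekday() == 6 and day.day <= 7


if __name__ == "__main__":
    for candidate in (datetime.date(2018, 6, 3), datetime.date(2018, 6, 10)):
        print(candidate.isoformat(), is_first_sunday(candidate))
```

A check like this can help confirm from the kernel log timestamps whether a lock-up coincided with a scheduled checkarray run.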

root@os-node6:~# dpkg -i \
    linux-headers-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb \
    linux-image-unsigned-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb \
    linux-modules-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb
Selecting previously unselected package linux-headers-4.17.0-041700-generic.
(Reading database ... 156683 files and directories currently installed.)
Preparing to unpack linux-headers-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb ...
Unpacking linux-headers-4.17.0-041700-generic (4.17.0-041700.201806041953) ...
Selecting previously unselected package linux-image-unsigned-4.17.0-041700-generic.
Preparing to unpack linux-image-unsigned-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb ...
Unpacking linux-image-unsigned-4.17.0-041700-generic (4.17.0-041700.201806041953) ...
Selecting previously unselected package linux-modules-4.17.0-041700-generic.
Preparing to unpack linux-modules-4.17.0-041700-generic_4.17.0-041700.201806041953_amd64.deb ...
Unpacking linux-modules-4.17.0-041700-generic (4.17.0-041700.201806041953) ...
dpkg: dependency problems prevent configuration of linux-headers-4.17.0-041700-generic:
 linux-headers-4.17.0-041700-generic depends on linux-headers-4.17.0-041700; however:
  Package linux-headers-4.17.0-041700 is not installed.
 linux-headers-4.17.0-041700-generic depends on libssl1.1 (>= 1.1.0); however:
  Package libssl1.1 is not installed.

dpkg: error processing package linux-headers-4.17.0-041700-generic (--install):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of linux-image-unsigned-4.17.0-041700-generic:
 linux-image-unsigned-4.17.0-041700-generic depends on linux-base (>= 4.5ubuntu1~16.04.1); however:
  Version of linux-base on system is 4.0ubuntu1.

dpkg: error processing package linux-image-unsigned-4.17.0-041700-generic (--install):
 dependency problems - leaving unconfigured
Setting up linux-modules-4.17.0-041700-generic (4.17.0-041700.201806041953) ...
Errors were encountered while processing:
 linux-headers-4.17.0-041700-generic
 linux-image-unsigned-4.17.0-041700-generic
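
The unmet dependencies in the dpkg output above can be pulled out mechanically. This is only a sketch: the sample text is the relevant part of the log with the e-mail line wrapping undone, and unmet_dependencies plus the regular expression are hypothetical helpers written for this example, not part of dpkg.

```python
import re

# Dependency lines copied from the dpkg output (wrapped lines rejoined).
DPKG_OUTPUT = """\
dpkg: dependency problems prevent configuration of linux-headers-4.17.0-041700-generic:
 linux-headers-4.17.0-041700-generic depends on linux-headers-4.17.0-041700; however:
  Package linux-headers-4.17.0-041700 is not installed.
 linux-headers-4.17.0-041700-generic depends on libssl1.1 (>= 1.1.0); however:
  Package libssl1.1 is not installed.
dpkg: dependency problems prevent configuration of linux-image-unsigned-4.17.0-041700-generic:
 linux-image-unsigned-4.17.0-041700-generic depends on linux-base (>= 4.5ubuntu1~16.04.1); however:
  Version of linux-base on system is 4.0ubuntu1.
"""

# Matches " <package> depends on <requirement>; however:".
DEPENDS_RE = re.compile(r"^\s*(\S+) depends on (.+?); however:\s*$")


def unmet_dependencies(text):
    """Return (package, requirement, reason) tuples found in dpkg output."""
    results = []
    lines = text.splitlines()
    for index, line in enumerate(lines):
        match = DEPENDS_RE.match(line)
        # The line after "...; however:" carries dpkg's explanation.
        if match and index + 1 < len(lines):
            results.append((match.group(1), match.group(2), lines[index + 1].strip()))
    return results


if __name__ == "__main__":
    for package, requirement, reason in unmet_dependencies(DPKG_OUTPUT):
        print(f"{package}: needs {requirement} ({reason})")
```

For this log it reports three blockers: the architecture-independent linux-headers-4.17.0-041700 package, libssl1.1 (>= 1.1.0), and linux-base (>= 4.5ubuntu1~16.04.1).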

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
[2] https://wiki.ubuntu.com/Kernel/MainlineBuilds

On Tue, Jun 12, 2018 at 10:32 PM, Joseph Salisbury <
<email address hidden>> wrote:

> Did this issue start happening after an update/upgrade? Was there a
> prior kernel version where you were not having this particular problem?
>
> Would it be possible for you to test the latest upstream kernel? Refer
> to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
> v4.17 kernel[0].
>
> If this bug is fixed in the mainline kernel, please add the following
> tag 'kernel-fixed-upstream'.
>
> If the mainline kernel does not fix this bug, please add the tag:
> 'kernel-bug-exists-upstream'.
>
> Once testing of the upstream kernel is complete, please mark this bug as
> "Confirmed".
>
>
> Thanks in advance.
>
> [0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17
>
> ** Tags added: artful kernel-da-key
>
> ** Changed in: linux (Ubuntu)
> Importance: Undecided => Medium
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1776159
>
> Title:
> mdadm raid soft lock-ups ubuntu kernel 4.13.0-36 Inbox x
>
> Status in linux package in Ubuntu:
> Incomplete
>
> Bug description:
> we're running Ubuntu 16.04.4, mdadm - v3.3 and Kernel 4.13.0-36(ubuntu
> package linux-image-generic-hwe-16.04).
> We have created raid10 using 22 960GB SSDs [1] . The problem we're
> experiencing is that /usr/share/mdadm/checkarray
> (executed by cron, included in a mdadm pkg) results in (soft?)
> deadlock - load on the node spikes up to 500-700 and all I/O operations
> are blocked for a period of time. We can see traces like these [2] in
> our kernel log.
>
> e.g. it ends up in static state like
>
> test@os-node1:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md1 : active raid10 dm-23[9] dm-22[8] dm-21[7] dm-20[6] dm-18[4]
> dm-19[5] dm-17[3]
> dm-16[21] dm-15[20] dm-14[2] dm-13[19] dm-12[18]
> dm-11[17]
> dm-10[16] dm-9[15] dm-8[14] dm-7[13] dm-6[12]
> dm-5[11] dm-4[10] dm-3[1] dm-2[0]
> 10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22]
> [UUUUUUUUUUUUUUUUUUUUUU]
> [===>.................] check = 19.0% (1965748032/10313171968)
> finish=1034728.8min speed=134K/sec
> bitmap: 0/39 pages [0KB], 131072KB chunk
> unused devices: <none>
>
> and the only solution is to hard reboot the node. What we found out is
> that it doesn't happen on idle raid, we have to generate some significant
> load (10 VMs running fio[3] with 500GB HDDs.) to be able to reproduce the
> issue.
>
> Has anyone experienced similar issues? Do you have any suggestions on how
> to troubleshoot this issue better and maybe identify whether the disks or
> the software layer is responsible for this behavior?
>
> [1] http://www.samsung.com/us/dell/pdfs/PM1633a_Flyer_2016_v4.pdf
> [2] https://gist.github.com/haad/09213bab1bc30a00c7d255c0bc60897b
> [3] https://github.com/axboe/fio
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
> 1776159/+subscriptions
>
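
As a rough check on the /proc/mdstat snapshot quoted above, the progress line can be turned into numbers. This sketch assumes the two counters are in 1 KiB blocks, so that dividing by the K/sec speed gives seconds. The regular expression and check_progress are hypothetical helpers written for this example.

```python
import re

# Progress lines from the quoted /proc/mdstat output, joined into one line.
CHECK_LINE = (
    "[===>.................] check = 19.0% (1965748032/10313171968) "
    "finish=1034728.8min speed=134K/sec"
)

CHECK_RE = re.compile(
    r"check\s*=\s*([\d.]+)% \((\d+)/(\d+)\)\s+finish=([\d.]+)min\s+speed=(\d+)K/sec"
)


def check_progress(line):
    """Parse an md 'check' progress line and estimate the remaining time."""
    match = CHECK_RE.search(line)
    if match is None:
        raise ValueError("no check progress found in line")
    done, total = int(match.group(2)), int(match.group(3))
    speed_k_per_sec = int(match.group(5))
    # Assumption: counters are 1 KiB blocks, so blocks / (K/sec) = seconds.
    remaining_min = (total - done) / speed_k_per_sec / 60
    return {
        "percent": 100 * done / total,
        "reported_finish_min": float(match.group(4)),
        "estimated_finish_min": remaining_min,
        "estimated_finish_days": remaining_min / (60 * 24),
    }


if __name__ == "__main__":
    for key, value in check_progress(CHECK_LINE).items():
        print(f"{key}: {value:.1f}")
```

At 134K/sec the remaining part of the check works out to roughly two years, which is why the array looks frozen rather than just slow.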

--

Regards.

Adam