mdadm segfault error 4 in libc-2.23.so

Bug #1617919 reported by Johan Ehnberg on 2016-08-29
This bug affects 7 people
Affects          Importance   Assigned to
mdadm (Ubuntu)   Low          Unassigned
Trusty           High         Dan Streetman
Xenial           High         Dan Streetman

Bug Description

[impact]

The mdadm cron jobs periodically invoke mdadm to scan RAID arrays, but inside an unprivileged container mdadm does not have access to the arrays, and it segfaults when invoked. The segfault is recorded in the host system's logs and, while harmless, causes confusion about mdadm segfaults on the host.

[test case]

Install an Ubuntu system and create one or more RAID (mdadm) arrays. Create a container with either trusty or xenial inside it. In the container, install mdadm, then run:

$ mdadm --monitor --scan --oneshot

That is the command run by mdadm's cron job (other variations of the command also segfault). With the current mdadm code, mdadm segfaults; with the patched code, it exits normally.
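
For anyone scripting this test case: a shell reports a child killed by SIGSEGV as exit status 139 (128 + signal 11), so the result can be checked without reading the logs. The following is only a sketch; the helper names `classify_status` and `run_test_case` are made up for illustration, and nothing is executed until `run_test_case` is called inside the container.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mdadm or its packaging.

# Map a shell exit status to a short verdict.
# 139 = 128 + 11, i.e. the process was killed by SIGSEGV.
classify_status() {
    case "$1" in
        0)   echo "ok" ;;
        139) echo "segfault" ;;
        *)   echo "failed ($1)" ;;
    esac
}

# Run the same command as the cron job and report the verdict.
run_test_case() {
    mdadm --monitor --scan --oneshot >/dev/null 2>&1
    classify_status "$?"
}
```

With the unpatched package `run_test_case` would be expected to print "segfault", and "ok" with the patched one.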

[regression potential]

This patch changes the mdadm code that processes each array's name. A bug in this area could cause mdadm to fail when performing any operation on an array; the failure would occur before mdadm opened the array, not during the operation itself.

[other info]

The commit fixing this is already upstream and included in zesty and later; the fix is required only for trusty and xenial.

[original description]

On Ubuntu 16.04.1 LTS Xenial, mdadm segfaults every day on two machines. Everything works as normal though, and the RAID arrays are not degraded.

[3712474.763430] mdadm[17665]: segfault at 0 ip 00007fd0369bed16 sp 00007fff8c5c9478 error 4 in libc-2.23.so[7fd036934000+1c0000]
[3712474.949633] mdadm[17727]: segfault at 0 ip 00007f2814111d16 sp 00007ffca92fe168 error 4 in libc-2.23.so[7f2814087000+1c0000]
[3798863.008741] mdadm[25359]: segfault at 0 ip 00007fa6af198d16 sp 00007ffc1b253e48 error 4 in libc-2.23.so[7fa6af10e000+1c0000]
[3798863.190382] mdadm[25393]: segfault at 0 ip 00007f72218a0d16 sp 00007ffef918f118 error 4 in libc-2.23.so[7f7221816000+1c0000]
[3885251.386711] mdadm[32081]: segfault at 0 ip 00007f3d99ca2d16 sp 00007ffe5e69a7a8 error 4 in libc-2.23.so[7f3d99c18000+1c0000]
[3885251.402337] mdadm[32083]: segfault at 0 ip 00007f770ccc1d16 sp 00007ffe16074378 error 4 in libc-2.23.so[7f770cc37000+1c0000]
[3971638.258574] mdadm[7936]: segfault at 0 ip 00007fcacddb3d16 sp 00007ffc062faff8 error 4 in libc-2.23.so[7fcacdd29000+1c0000]
[3971638.410750] mdadm[8053]: segfault at 0 ip 00007ff573757d16 sp 00007fffd3cca398 error 4 in libc-2.23.so[7ff5736cd000+1c0000]

The segfault message always appears twice in quick succession.
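
A note on reading these lines: on x86, page-fault error code 4 has only the user-mode bit set, which means a user-space read of a page that is not mapped, and "segfault at 0" gives the faulting address, so this is a NULL pointer read inside libc. Subtracting the mapping base (the value in brackets) from `ip` gives the offset within libc-2.23.so. A small sketch, with `libc_offset` being a name chosen here for illustration, for a shell with 64-bit arithmetic such as bash or dash:

```shell
#!/bin/sh
# Sketch: compute the offset of the faulting instruction inside the library.
libc_offset() {
    # $1 = ip value, $2 = mapping base, both hex without the 0x prefix
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# First two log lines from the report; both print 0x8ad16.
libc_offset 00007fd0369bed16 7fd036934000
libc_offset 00007f2814111d16 7f2814087000
```

Every line quoted above resolves to the same offset, so all the crashes are in the same libc routine.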

It seems to be triggered by /etc/cron.daily/mdadm which essentially runs
mdadm --monitor --scan --oneshot

As such, the interval between occurrences is around 86,400 seconds (24 hours), give or take, depending on when the cron job was executed.
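
The kernel timestamps quoted above allow a quick check of that interval. A sketch using plain shell arithmetic on the whole-second part of the first and third log lines (`interval_seconds` is an illustrative name):

```shell
#!/bin/sh
# Sketch: difference between two kernel log timestamps, in whole seconds.
interval_seconds() {
    # $1 = earlier timestamp, $2 = later timestamp (seconds since boot);
    # the fractional part is dropped before subtracting.
    echo "$(( ${2%.*} - ${1%.*} ))"
}

interval_seconds 3712474.763430 3798863.008741   # prints 86389
```

That is within a few seconds of one day, which fits a job in /etc/cron.daily.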

It does not happen when running the command manually.

There is one similar bug, #1576055, concerning libc, and a few cases elsewhere, but further digging into this has yet to reveal anything conclusive.

Note that these machines have almost exactly the same hardware (Xeon D-1518, 16GB ECC), so hardware design flaws cannot be ruled out. However, memory testing has not turned up any faults. That said, I know some segfaults can be difficult to find even when they are hardware issues.

affects: gvfs (Ubuntu) → mdadm (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed
Johan Ehnberg (johan-ehnberg) wrote :

With the latest updates and a reboot and a few days of uptime, the message has not yet reappeared. This may have been fixed?
linux-image 4.4.0-36-generic
libc 2.23-0ubuntu3
mdadm 3.3-2ubuntu7.1

Andrew Martin (asmartin) wrote :

I'm still seeing this on 4.4.0-38-generic and the same versions of libc and mdadm that you listed in comment 2

claus (claus2) wrote :

This morning I had that message for the first time (server is up with this package setup for several days now):
Oct 28 08:25:01 apphost1 kernel: [331502.883078] mdadm[19974]: segfault at 0 ip 00007f99f55d4d16 sp 00007ffff5dbc7d8 error 4 in libc-2.23.so[7f99f554a000+1c0000]

Versions of packages are:
libc-bin 2.23-0ubuntu3
linux-image-generic 4.4.0.45.48
mdadm 3.3-2ubuntu7.1

claus (claus2) wrote :

Ok, I think I made some progress tracking this thing down:
On the host, where I experienced the bug, I have an LXD/LXC container running.
It turns out that mdadm is installed by default inside a container created by "lxc launch ubuntu:16.04 c1"
And when I call /etc/cron.daily/mdadm inside the container I get the segfault which is also logged in the host system.
So I think there are two bugs here:
a) mdadm should not segfault
b) mdadm should not be in the container by default (makes no sense, right?) For this part I think a separate bug should be filed somewhere, but I have no clue where this should be put?

Johan Ehnberg (johan-ehnberg) wrote :

Yes this makes a lot of sense. The situation is the same for me, but I started automatically removing a list of packages from my lxd containers - including mdadm. That's why the problem disappeared.

Mdadm could still make sense in a container (privileged with raw access to devices etc.) but considering what containers are typically used for that should be quite rare. So at least as a workaround when mdadm is not needed, removing mdadm is an option.

So we have confirmed this. Is anyone on linux-raid? Lxd devs may also be interested.

claus (claus2) wrote :

I just got the following feedback from Stéphane Graber (LXC and LXD project leader):

"It's not a bug for it to be installed as the Ubuntu image we run is bit
for bit identical to what's run on the cloud or on physical servers and
that's on purpose.

mdadm is supposed to be configured in a way that it won't actually start
in a container, if it does, that's a bug.

I suspect the issue here is that the cronjob wasn't made aware that
mdadm doesn't run in a container and so attempts to run despite mdadm
being stopped and it not having any access to the devices it needs."
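
One way to make a cron job container-aware, in the spirit of the comment above, is to ask systemd whether it is running in a container before calling mdadm. This is only a sketch of the idea and not the change that was eventually uploaded for this bug; `DETECT_CMD` and `run_monitor_if_host` are names invented here, and it assumes `systemd-detect-virt` is installed (if it is missing, the check fails and mdadm is run as before).

```shell
#!/bin/sh
# Sketch: skip the mdadm check when running inside a container.
# systemd-detect-virt --container exits 0 when a container is detected.
# DETECT_CMD is a hypothetical override hook so the check can be swapped out.
DETECT_CMD="${DETECT_CMD:-systemd-detect-virt --quiet --container}"

run_monitor_if_host() {
    if $DETECT_CMD; then
        echo "container detected, skipping mdadm monitor"
        return 0
    fi
    mdadm --monitor --scan --oneshot
}
```

A privileged container with raw device access might still want the check to run, so a real implementation would probably need a way to opt back in.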

Dan Streetman (ddstreet) on 2017-10-09
Changed in mdadm (Ubuntu):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Low
status: Confirmed → In Progress
Dan Streetman (ddstreet) wrote :

this is fixed upstream already by commit https://github.com/neilbrown/mdadm/commit/1e08717f0b7856b389e9d5eb2dc330d146636183

that commit is included in zesty and later, this is needed only for trusty and xenial

Dan Streetman (ddstreet) on 2017-10-09
description: updated
Dan Streetman (ddstreet) wrote :

verified the patch fixes this on xenial and trusty using test ppa from comment 8

The attachment "lp1617919-trusty.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Changed in mdadm (Ubuntu):
assignee: Dan Streetman (ddstreet) → nobody
status: In Progress → Fix Released
Changed in mdadm (Ubuntu Trusty):
status: New → Triaged
Changed in mdadm (Ubuntu Xenial):
status: New → Triaged
Changed in mdadm (Ubuntu Trusty):
importance: Undecided → High
Changed in mdadm (Ubuntu Xenial):
importance: Undecided → High
Changed in mdadm (Ubuntu Trusty):
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in mdadm (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
Dan Streetman (ddstreet) on 2017-10-23
tags: added: sts-sponsor-ddstreet
Dan Streetman (ddstreet) on 2017-11-03
Changed in mdadm (Ubuntu Trusty):
assignee: Dimitri John Ledkov (xnox) → Dan Streetman (ddstreet)
Changed in mdadm (Ubuntu Xenial):
assignee: Dimitri John Ledkov (xnox) → Dan Streetman (ddstreet)
Dan Streetman (ddstreet) on 2017-11-03
Changed in mdadm (Ubuntu Trusty):
status: Triaged → In Progress
Changed in mdadm (Ubuntu Xenial):
status: Triaged → In Progress
tags: added: sts-sponsor-ddstreet-done
removed: sts-sponsor-ddstreet

Hello Johan, or anyone else affected,

Accepted mdadm into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/3.3-2ubuntu7.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in mdadm (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Changed in mdadm (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed-trusty
Chris J Arges (arges) wrote :

Hello Johan, or anyone else affected,

Accepted mdadm into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/3.2.5-5ubuntu4.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Dan Streetman (ddstreet) wrote :

From inside xenial container, with raid array in host:

ubuntu@mdadm:~$ dpkg -l | grep mdadm
ii mdadm 3.3-2ubuntu7.5 amd64 tool to administer Linux MD arrays (software RAID)
ubuntu@mdadm:~$ mdadm --monitor --scan -1
Segmentation fault (core dumped)

with -proposed mdadm:

ubuntu@mdadm:~$ dpkg -l | grep mdadm
ii mdadm 3.3-2ubuntu7.6 amd64 tool to administer Linux MD arrays (software RAID)
ubuntu@mdadm:~$ mdadm --monitor --scan -1
ubuntu@mdadm:~$

in trusty container:

ubuntu@mdadm-trusty:~$ dpkg -l | grep mdadm
ii mdadm 3.2.5-5ubuntu4.3 amd64 tool to administer Linux MD arrays (software RAID)
ubuntu@mdadm-trusty:~$ mdadm --monitor --scan -1
Segmentation fault (core dumped)

with -proposed mdadm:

ubuntu@mdadm-trusty:~$ dpkg -l | grep mdadm
ii mdadm 3.2.5-5ubuntu4.4 amd64 tool to administer Linux MD arrays (software RAID)
ubuntu@mdadm-trusty:~$ mdadm --monitor --scan -1
ubuntu@mdadm-trusty:~$
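
The version check in the transcripts above can also be scripted. A sketch, assuming `dpkg` is present; the `sort -V` fallback is only a rough approximation of Debian version ordering that happens to work for these simple version strings, and `version_ge` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: succeed if the installed version is at least the fixed version.
version_ge() {
    # $1 = installed version, $2 = minimum (fixed) version
    if command -v dpkg >/dev/null 2>&1; then
        dpkg --compare-versions "$1" ge "$2"
    else
        # Fallback (assumption): the smaller of the two sorts first.
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
}

# Usage on xenial (fixed in 3.3-2ubuntu7.6):
#   version_ge "$(dpkg-query -W -f='${Version}' mdadm)" 3.3-2ubuntu7.6 && echo "fixed"
```

For trusty the minimum would be 3.2.5-5ubuntu4.4.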

tags: added: verification-done verification-done-trusty verification-done-xenial
removed: verification-needed verification-needed-trusty verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 3.2.5-5ubuntu4.4

---------------
mdadm (3.2.5-5ubuntu4.4) trusty; urgency=medium

  * Prevent segfault when get_md_name() returns NULL
    This fixes mdadm segfaults when running inside a container.
    (LP: #1617919)

 -- Dan Streetman <email address hidden> Mon, 09 Oct 2017 10:06:22 -0400

Changed in mdadm (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for mdadm has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 3.3-2ubuntu7.6

---------------
mdadm (3.3-2ubuntu7.6) xenial; urgency=medium

  * Prevent segfault when get_md_name() returns NULL
    This fixes mdadm segfaults when running inside a container.
    (LP: #1617919)

 -- Dan Streetman <email address hidden> Mon, 09 Oct 2017 10:06:22 -0400

Changed in mdadm (Ubuntu Xenial):
status: Fix Committed → Fix Released