Comment 0 for bug 1617919

Johan Ehnberg (johan-ehnberg) wrote :

On Ubuntu 16.04.1 LTS (Xenial), mdadm segfaults every day on two machines. Everything otherwise works normally, and the RAID arrays are not degraded.

[3712474.763430] mdadm[17665]: segfault at 0 ip 00007fd0369bed16 sp 00007fff8c5c9478 error 4 in[7fd036934000+1c0000]
[3712474.949633] mdadm[17727]: segfault at 0 ip 00007f2814111d16 sp 00007ffca92fe168 error 4 in[7f2814087000+1c0000]
[3798863.008741] mdadm[25359]: segfault at 0 ip 00007fa6af198d16 sp 00007ffc1b253e48 error 4 in[7fa6af10e000+1c0000]
[3798863.190382] mdadm[25393]: segfault at 0 ip 00007f72218a0d16 sp 00007ffef918f118 error 4 in[7f7221816000+1c0000]
[3885251.386711] mdadm[32081]: segfault at 0 ip 00007f3d99ca2d16 sp 00007ffe5e69a7a8 error 4 in[7f3d99c18000+1c0000]
[3885251.402337] mdadm[32083]: segfault at 0 ip 00007f770ccc1d16 sp 00007ffe16074378 error 4 in[7f770cc37000+1c0000]
[3971638.258574] mdadm[7936]: segfault at 0 ip 00007fcacddb3d16 sp 00007ffc062faff8 error 4 in[7fcacdd29000+1c0000]
[3971638.410750] mdadm[8053]: segfault at 0 ip 00007ff573757d16 sp 00007fffd3cca398 error 4 in[7ff5736cd000+1c0000]

The segfault message always appears twice in quick succession.
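
To compare the crashes, the offset of the faulting instruction within the mapped object can be computed as ip minus the base address given in brackets, and the error code can be decoded bit by bit. A minimal Python sketch, assuming the log format shown above and the x86 page-fault error code layout; the regular expression and the function names are illustrative only:

```python
import re

# Assumed format of the kernel segfault lines quoted above. The object name
# after "in" is optional because it is missing from the quoted lines.
SEGFAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at "
    r"(?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in\s*(?P<obj>[^\[]*)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # False: no page mapped at the address
        "write": bool(code & 2),         # False: the access was a read
        "user_mode": bool(code & 4),     # True: the fault happened in user space
    }


def parse_segfault(line):
    """Return the fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "timestamp": float(m["ts"]),
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "error": decode_error(int(m["err"])),
    }


# Two of the lines from the log above, copied verbatim.
LINES = [
    "[3712474.763430] mdadm[17665]: segfault at 0 ip 00007fd0369bed16 "
    "sp 00007fff8c5c9478 error 4 in[7fd036934000+1c0000]",
    "[3971638.410750] mdadm[8053]: segfault at 0 ip 00007ff573757d16 "
    "sp 00007fffd3cca398 error 4 in[7ff5736cd000+1c0000]",
]

for line in LINES:
    info = parse_segfault(line)
    print(f"pid {info['pid']}: offset 0x{info['offset']:x}, {info['error']}")
```

For all eight lines above the offset comes out as 0x8ad16, and error 4 decodes to a user-mode read of an unmapped page. Together with "segfault at 0", this suggests the same NULL pointer read at the same instruction each time, though it does not say which code path leads there.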

It seems to be triggered by /etc/cron.daily/mdadm, which essentially runs:
mdadm --monitor --scan --oneshot

As such, the segfaults recur roughly every 86,400 seconds (24 hours), give or take, depending on when the cron job was executed.
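
The interval can be checked directly from the kernel timestamps. A small Python sketch, using the first timestamp of each pair of segfault lines from the log above (the variable names are illustrative):

```python
# First timestamp of each pair of segfault lines, in seconds since boot,
# copied from the kernel log above.
burst_starts = [
    3712474.763430,
    3798863.008741,
    3885251.386711,
    3971638.258574,
]

# Difference between each burst and the one before it.
intervals = [later - earlier
             for earlier, later in zip(burst_starts, burst_starts[1:])]

for seconds in intervals:
    print(f"{seconds:.0f} s = {seconds / 3600:.2f} h")
```

The three intervals all come out within a few seconds of 86,388 s, i.e. just under 24 hours, which is consistent with a daily cron job.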

It does not happen when running the command manually.

There is one similar bug, #1576055, concerning libc, and a few similar cases elsewhere, but further digging has yet to reveal anything conclusive.

Note that both machines have almost exactly the same hardware (Xeon D-1518, 16GB ECC), so a hardware design flaw cannot be ruled out. However, memory testing has not turned up any faults. That said, I know that segfaults caused by hardware issues can be difficult to track down.