rootfs on Intel Matrix Raid hangs or shuts down not clean

Bug #1722491 reported by Dimitri John Ledkov on 2017-10-10
This bug affects 1 person
Affects          Importance   Assigned to
mdadm (Ubuntu)   Undecided    Unassigned
Xenial           Undecided    Unassigned
Zesty            Undecided    Unassigned

Bug Description

[Impact]

 * In xenial, mdadm-waitclean is an init.d script, and it appears that it does not run late enough. Later releases ship a systemd-shutdown script instead, so the wait-clean action is executed later in the shutdown cycle and has a better chance of completing the shutdown with a clean/synced RAID array state.

 * In xenial, there is no default shutdown initramfs, and therefore mdmon processes are killed before the Intel Matrix RAID / DDF external metadata RAID arrays are stopped. This leads to a full resync upon next assembly. One option is to implement a shutdown initramfs. This is now integrated in the package as the optional mdadm-shutdown.service.

 * In xenial, systemd does not support sendsigs.omit.d, which results in mdmon processes being killed prematurely. The fix therefore keeps the mdmon processes started in the initramfs running, so that systemd-shutdown can unmount dm held nodes. Ideally systemd on xenial should honor sendsigs.omit.d pid-files, or mdmon processes should be migrated to being activated by udev rules, but such a change, imho, is too risky for an SRU.

[Bugfix]
* Backport mdadm.shutdown systemd-shutdown script to xenial
* Backport mdadm-shutdown.service job
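
For context, a systemd-shutdown script is an executable placed in /lib/systemd/system-shutdown/; systemd-shutdown runs each one late in the shutdown sequence with a single argument (halt, poweroff, reboot or kexec). The following is a minimal sketch of what a wait-clean hook of this kind could look like. It is not the script shipped in the package: the function name and the MDADM override variable are assumptions made for illustration. The `mdadm --wait-clean --scan` call itself is a real mdadm invocation that asks for the arrays listed in /proc/mdstat to be marked clean and waits for that to happen.

```shell
#!/bin/sh
# Sketch of a systemd-shutdown hook (hypothetical; not the packaged mdadm.shutdown).
# MDADM can be overridden for testing; it defaults to the usual binary path.
MDADM="${MDADM:-/sbin/mdadm}"

# Ask every array listed in /proc/mdstat to be marked clean, and wait for it.
wait_for_clean_arrays() {
    "$MDADM" --wait-clean --scan
}

# systemd-shutdown passes the shutdown action as the first argument.
# Without an argument (e.g. when the file is sourced) nothing is executed.
case "${1:-}" in
    halt|poweroff|reboot|kexec)
        wait_for_clean_arrays
        ;;
esac
```

Running the wait this late is the point of the backport: by the time systemd-shutdown executes its hooks, services have already been stopped, so the arrays have a better chance of actually reaching a clean state.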

[Test Case]

 * Switch logging to console and make it verbose (LogLevel=debug, LogTarget=console). Perform a shutdown and observe that mdadm.shutdown is executed during shutdown.
 * Check that the system boots with a clean raid-array state.

 * Install system with root on Intel Matrix or DDF raid
 * Install dracut-core and activate mdadm-shutdown.service
 * Reboot
 * System should reboot cleanly, with Intel Matrix raid array synced
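
One way to check the last step after the reboot is to look at /proc/mdstat for a resync or recovery in progress. The helper below is a sketch under that assumption. The function name `mdstat_sync_status` is hypothetical, and it only inspects the resync/recovery progress lines that the kernel prints in /proc/mdstat.

```shell
#!/bin/sh
# Steps from the test case, run as root on the machine under test:
#   apt install dracut-core
#   systemctl enable mdadm-shutdown.service
#   reboot

# mdstat_sync_status [FILE]
# Print "resyncing" if the mdstat-format FILE (default: /proc/mdstat) shows a
# resync or recovery that is running, delayed or pending; otherwise print "idle".
mdstat_sync_status() {
    if grep -Eq '(resync|recovery)[[:space:]]*=' "${1:-/proc/mdstat}"; then
        echo resyncing
    else
        echo idle
    fi
}
```

After the reboot, `mdstat_sync_status` printing `idle` is consistent with the arrays having been stopped cleanly, while `resyncing` suggests the previous shutdown left them dirty.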

[Regression Potential]

 * On systems that have rootfs on Intel Matrix / DDF raid (external metadata mdadm raid, i.e. NOT the generic linux raid), the initramfs will be kept around for the lifetime of the boot, thus using more steady-state RAM. This only affects systems that use Intel Matrix / DDF controllers, which are typically bare-metal servers.

 * The additional wait-clean shutdown script is quick, but it does have an impact on shutdown.target speed / time to shutdown or reboot.

[Other Info]

 * Later releases do not use a sysv-init script, thus this is not a direct backport of code from later releases.

Roughly corresponds to https://tracker.debian.org/news/878208 & parts of https://browse.dgit.debian.org/mdadm.git/commit/?id=61c54b388ce54b8129d039aa6d422aaca0dd0e77 specifically shipment of the mdadm.shutdown script

description: updated
tags: added: id-59dba43d6e5b35acb9bbee0b
description: updated
summary: - improve mdadm imsm integration with systemd
+ rootfs on Intel Matrix Raid hangs or shuts down not clean
Dimitri John Ledkov (xnox) wrote :

zesty and up have full systemd units & shutdown script integration and are only missing mdadm-shutdown.service, which is not a critical component of the complete fix (not enabled by default). mdadm-shutdown.service is in unstable, to be synced into artful and/or sru'ed into artful & zesty.

Brian Murray (brian-murray) wrote :

I'd like to see mdadm-shutdown.service make it into Artful before accepting this.

Dimitri John Ledkov (xnox) wrote :

This is in artful since 2017-10-11.

mdadm (4.0-2) unstable; urgency=medium

  * Ship mdadm-shutdown.service and suggest dracut-core. Users of systemd
    with rootfs on Intel Matrix Raid and DDF external metadata-raid arrays
    that require mdmon monitoring, may wish to install dracut-core package
    and enable mdadm-shutdown.service. This will create a shutdown
    initramfs, that systemd-shutdown can pivot to. This may result in an
    improved shutdown behaviour with less hangs and synced raid
    arrays. The generated initramfs will takeover mdmon monitoring, wait
    for the arrays to be clean before stopping them and unmounting
    everything and finally executing requested shutdown command.
  * Bump standards version, no changes required.
  * Bump debhelper to 9.
  * Add second email address to avoid NMU warnings.

 -- Dimitri John Ledkov <email address hidden> Tue, 10 Oct 2017 11:55:52 +0100

Changed in mdadm (Ubuntu):
status: New → Fix Released
Changed in mdadm (Ubuntu Xenial):
status: New → In Progress
Changed in mdadm (Ubuntu Zesty):
status: New → In Progress

Hello Dimitri, or anyone else affected,

Accepted mdadm into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/3.4-4ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in mdadm (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-zesty
Changed in mdadm (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Brian Murray (brian-murray) wrote :

Hello Dimitri, or anyone else affected,

Accepted mdadm into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/3.3-2ubuntu7.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Dimitri John Ledkov (xnox) wrote :

Upgraded to mdadm 3.3-2ubuntu7.5.
Installed dracut-core.
Enabled mdadm-shutdown.
Forced a shutdown of the server, booted it using the new mdadm, and waited for the drives to resync.

Rebooted the server; the reboot did not hang, and it came up with a clean raid array state in about 3 minutes, which is normal for the size of said server.

Before the upgrade, shutdown was unclean: it would hang for more than 10 minutes and require a hard reset.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Dimitri John Ledkov (xnox) wrote :

Verified again with zesty and 3.4-4ubuntu0.1, all is good and reboots result in raid array coming up in sync in about 3 minutes time.

Also have another verification on xenial at https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1608495 that non-rootfs raid arrays are fixed too.

tags: added: verification-done verification-done-zesty
removed: verification-needed verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 3.4-4ubuntu0.1

---------------
mdadm (3.4-4ubuntu0.1) zesty; urgency=medium

  * Ship mdadm-shutdown.service and suggest dracut-core. Users of systemd
    with rootfs on Intel Matrix Raid and DDF external metadata-raid arrays
    that require mdmon monitoring, may wish to install dracut-core package
    and enable mdadm-shutdown.service. This will create a shutdown
    initramfs, that systemd-shutdown can pivot to. This may result in an
    improved shutdown behaviour with less hangs and synced raid
    arrays. The generated initramfs will takeover mdmon monitoring, wait
    for the arrays to be clean before stopping them and unmounting
    everything and finally executing requested shutdown command.
    LP: #1722491

 -- Dimitri John Ledkov <email address hidden> Fri, 13 Oct 2017 18:31:49 +0100

Changed in mdadm (Ubuntu Zesty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for mdadm has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 3.3-2ubuntu7.5

---------------
mdadm (3.3-2ubuntu7.5) xenial; urgency=medium

  * Add systemd shutdown script which waits for arrays to be clean during
    shutdown phase.
  * Do not take over initramfs mdmon services, and continue running them
    off initrd to avoid killing mdmon processes before systemd attempts dm
    detach and hang. As systemd in xenial does not appear to honor
    sendsigs.omit.d.
  * Add mdadm-shutdown.service which is recommended to be used on systems
    with Intel Matrix / DDF raid. It creates a shutdown initramfs which
    systemd-shutdown may pivot to, and complete clean external metadata
    raid array shutdown.
  * LP: #1722491

 -- Dimitri John Ledkov <email address hidden> Mon, 09 Oct 2017 14:56:09 +0100

Changed in mdadm (Ubuntu Xenial):
status: Fix Committed → Fix Released