System can not shutdown if system has multiple VROC RAID arrays
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OEM Priority Project | Fix Released | Critical | Cyrus Lien | |
systemd (Ubuntu) | Fix Released | Medium | Unassigned | |
Jammy | Fix Released | Medium | Unassigned | |
Kinetic | Fix Released | Undecided | Unassigned | |
Bug Description
[ Impact ]
The system cannot shut down if it has multiple VROC RAID arrays.
Intel has fixed this in systemd v251 [1].
The commit needs to be cherry-picked into ubuntu-jammy systemd 249.11-0ubuntu3.9.
[1] The commit that fixes the issue:
commit 3a3b022d2cc1128
Author: Mariusz Tkaczyk <email address hidden>
Date: Tue Mar 29 12:49:54 2022 +0200
shutdown: get only active md arrays.
Current md_list_get() implementation filters all block devices, started from
"md*". This is ambiguous because list could contain:
- partitions created upon md device (mdXpY)
- external metadata container- specific type of md array.
For partitions there is no issue, because they aren't handle STOP_ARRAY
ioctl sent later. It generates misleading errors only.
Second case is more problematic because containers are not locked in kernel.
They are stopped even if container member array is active. For that reason
reboot or shutdown flow could be blocked because metadata manager cannot be
restarted after switch root on shutdown.
Add filters to remove partitions and containers from md_list. Partitions
can be excluded by DEVTYPE. Containers are determined by MD_LEVEL
property, we are excluding all with "container" value.
Signed-off-by: Mariusz Tkaczyk <email address hidden>
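The filtering described in the commit message can be sketched as follows. This is a minimal illustration in Python, not the systemd implementation (which is written in C): devices are modelled as plain dictionaries of udev-style properties, the function names `is_stoppable_md_array` and `md_list` are hypothetical, and the sample device names other than md126 are assumptions made for the example. Only the `DEVTYPE` and `MD_LEVEL` checks come from the commit message.

```python
# Hypothetical sketch of the filter described in the commit message.
# Each device is a dict of udev-style properties; the real code lives
# in systemd-shutdown and is written in C.

def is_stoppable_md_array(props):
    """Return True if this device is an md array that should be stopped."""
    name = props.get("DEVNAME", "")
    # Only md block devices are candidates at all.
    if not name.startswith("/dev/md"):
        return False
    # Partitions created on an md device (mdXpY) do not handle STOP_ARRAY.
    if props.get("DEVTYPE") == "partition":
        return False
    # External metadata containers must not be stopped while a member
    # array is still active, so they are excluded as well.
    if props.get("MD_LEVEL") == "container":
        return False
    return True


def md_list(devices):
    """Return the device names that pass the filter, in input order."""
    return [d["DEVNAME"] for d in devices if is_stoppable_md_array(d)]


# Assumed sample data: one container, two member arrays, one partition,
# and one unrelated disk.
devices = [
    {"DEVNAME": "/dev/md127", "DEVTYPE": "disk", "MD_LEVEL": "container"},
    {"DEVNAME": "/dev/md126", "DEVTYPE": "disk", "MD_LEVEL": "raid0"},
    {"DEVNAME": "/dev/md126p1", "DEVTYPE": "partition"},
    {"DEVNAME": "/dev/md125", "DEVTYPE": "disk", "MD_LEVEL": "raid10"},
    {"DEVNAME": "/dev/sda", "DEVTYPE": "disk"},
]

print(md_list(devices))
```

With this sample input, only the two member arrays remain in the list; the container, the partition, and the non-md disk are skipped.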
In the journal, we can see systemd-shutdown looping repeatedly as it tries and fails to detach all md devices:
...
[ 513.416293] systemd-
[ 513.422953] systemd-
[ 513.431227] systemd-
[ 513.437952] systemd-
[ 513.449298] systemd-
[ 513.456278] systemd-
[ 513.465323] systemd-
[ 513.472564] systemd-
[ 513.485302] systemd-
[ 513.496195] systemd-
[ 513.502176] systemd-
[ 513.513382] systemd-
[ 513.521436] systemd-
[ 513.534810] systemd-
[ 513.545384] systemd-
[ 513.557265] md: md126 stopped.
[ 513.561451] systemd-
[ 513.576673] systemd-
[ 513.589274] systemd-
[ 513.597976] systemd-
[ 513.607263] systemd-
[ 513.615067] systemd-
[ 513.625157] systemd-
[ 513.632209] systemd-
[ 513.641474] systemd-
[ 513.653660] systemd-
[ 513.661257] systemd-
[ 513.668833] systemd-
[ 513.677347] systemd-
[ 513.687047] systemd-
[ 513.697206] systemd-
[ 513.707193] md: md126 stopped.
...
[ Test Plan ]
1. Build two VROC RAID arrays: one RAID 0 for the system volume and one RAID 10 for the data volume.
2. Install the system on the system volume.
3. Update systemd.
4. Reboot the system.
5. Verify that the system reboots successfully.
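Before step 4, it may help to confirm that both arrays are actually assembled. The sketch below is one possible way to do that, assuming the usual `/proc/mdstat` layout of `mdN : <state> ...` lines. The helper name `parse_mdstat` and the sample text (device names, member disks) are hypothetical and not taken from the affected machine.

```python
import re


def parse_mdstat(text):
    """Map each md device name to its state word from mdstat-style text."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like "md126 : active raid0 ..." (assumed layout).
        m = re.match(r"^(md\w+)\s*:\s*(\w+)", line)
        if m:
            arrays[m.group(1)] = m.group(2)
    return arrays


# Assumed sample content; on a real system, read /proc/mdstat instead.
sample = """\
Personalities : [raid0] [raid10]
md125 : active raid10 sdd[3] sdc[2] sdb[1] sda[0]
md126 : active raid0 nvme1n1[1] nvme0n1[0]
md127 : inactive nvme1n1[1](S) nvme0n1[0](S)
unused devices: <none>
"""

states = parse_mdstat(sample)
active = sorted(name for name, state in states.items() if state == "active")
print(active)
```

On the test machine, the same function could be fed the contents of `/proc/mdstat`; two active arrays would be expected before rebooting.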
[ Where problems could occur ]
The patch is confirmed to fix the reboot issue on a system with two VROC RAID arrays, but systems with more than two arrays and other combinations of RAID levels have not all been tested. The patch itself adds logic to skip partitions and containers when building the list of md devices to stop. Any regressions would therefore also be related to stopping md devices in systemd-shutdown.
[ Scope ]
Jammy
Related branches
- Lukas Märdian: Approve
- Diff: 796 lines (+623/-45), 16 files modified
debian/patches/lp1977630-fix_machinectl_pull_tar.patch (+81/-0)
debian/patches/lp1978079-efi-pstore-not-cleared-on-boot.patch (+5/-4)
debian/patches/lp1991829-add-CAP_LINUX_IMMUTABLE-to-systemd-machined-so-it-ca.patch (+29/-0)
debian/patches/lp1999275/binfmt-check-if-binfmt-is-mounted-before-applying-rules.patch (+80/-0)
debian/patches/lp1999275/binfmt-util-also-check-if-binfmt-is-mounted-in-read-write.patch (+41/-0)
debian/patches/lp1999275/binfmt-util-split-out-binfmt_mounted.patch (+69/-0)
debian/patches/lp1999275/unit-check-more-specific-path-to-be-written-by-systemd-bi.patch (+26/-0)
debian/patches/lp2009743/network-dhcp4-do-not-ignore-the-gateway-even-if-the-desti.patch (+59/-0)
debian/patches/lp2009743/test-network-add-one-more-testcase-for-DHCPv4-classless-r.patch (+33/-0)
debian/patches/lp2013543-core-reorder-systemd-arguments-on-reexec.patch (+58/-0)
debian/patches/lp2025563-shutdown-get-only-active-md-arrays.patch (+67/-0)
debian/patches/lp2028180-udev-rules-fix-nvme-symlink-creation-on-namespace-changes.patch (+47/-0)
debian/patches/series (+11/-1)
debian/systemd.postinst (+16/-1)
debian/tests/tests-in-lxd (+1/-1)
dev/null (+0/-38)
affects: | systemd (Ubuntu) → oem-priority |
Changed in oem-priority: | |
assignee: | nobody → Cyrus Lien (cyruslien) |
importance: | Undecided → Critical |
status: | New → Confirmed |
affects: | oem-priority → systemd (Ubuntu) |
Changed in systemd (Ubuntu): | |
assignee: | Cyrus Lien (cyruslien) → nobody |
Changed in oem-priority: | |
status: | New → Confirmed |
importance: | Undecided → Critical |
assignee: | nobody → Cyrus Lien (cyruslien) |
tags: | added: originate-from-2025253 |
tags: | added: oem-priority |
description: | updated |
Changed in oem-priority: | |
status: | Invalid → In Progress |
tags: | added: foundations-todo |
description: | updated |
Changed in systemd (Ubuntu Jammy): | |
status: | Incomplete → Triaged |
Changed in systemd (Ubuntu Jammy): | |
status: | Triaged → In Progress |
Changed in oem-priority: | |
status: | In Progress → Fix Released |
Changed in oem-priority: | |
status: | Fix Released → Confirmed |
Changed in oem-priority: | |
status: | Confirmed → Fix Released |
tags: | removed: foundations-todo |
Hi Cyrus, thanks for the patch!
As per the SRU guidelines in https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template, regarding the "[ Where problems could occur ]" section:
> * This must '''never''' be "None" or "Low", or entirely an argument as to why your upload is low risk.
Describing where problems could occur and how they would manifest is a nice way to show to the SRU team that we did consider possible unwanted outcomes for the SRU and that we'd be ready to address them in case they do occur. It also helps people testing the SRU, guiding them on what to look for when checking for regressions.