Cryptswap periodically fails to mount at boot due to missing a udev notification
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd | New | Unknown | |
systemd (Ubuntu) | | High | Unassigned |
Bionic | | Undecided | Unassigned |
Focal | | High | Dan Streetman |
Groovy | | High | Unassigned |
Bug Description
[impact]
Systems using cryptsetup-based encrypted swap may hang during boot because udevd misses the notification that swap has been set up on the newly created swap device.
[test case]
See the original description; reproduction is intermittent and depends on timing.
[regression potential]
any regression would likely occur during, or after, boot when creating an encrypted swap device and/or while waiting to activate the new swap device. Regressions may cause failure to correctly enable swap and/or hung boot waiting for the swap device.
[scope]
this was (potentially) fixed upstream with PR 15836, which is not yet included in any upstream release, so this is needed in all releases, including groovy.
also note while the upstream bug is closed, and code review seems to indicate this *should* fix this specific issue, there are some comments in the upstream bug indicating it may not completely solve the problem, although there is no further debug of the new reports.
[original description]
On some systems, cryptsetup-based encrypted swap partitions cause systemd to get stuck at boot. This is a timing-sensitive Heisenbug, so the rate of occurrence varies from one system to another. Some hardware will not experience the issue at all, others will only occasionally experience the issue, and then there are the unlucky who are unable to boot at all, no matter how many times they restart.
The workaround is for the cryptsetup-
Related branches
- Balint Reczey: Approve on 2020-07-12
Diff: 330 lines (+278/-0), 8 files modified
debian/changelog (+15/-0)
debian/patches/lp1838329/0001-blockdev-propagate-one-more-unexpected-error.patch (+23/-0)
debian/patches/lp1838329/0003-dissect-use-log_debug_errno-where-appropriate.patch (+33/-0)
debian/patches/lp1838329/0004-blockdev-add-helper-for-locking-whole-block-device.patch (+60/-0)
debian/patches/lp1838329/0005-makefs-lock-device-while-we-operate.patch (+57/-0)
debian/patches/lp1838329/0006-makefs-normalize-logging-a-bit.patch (+39/-0)
debian/patches/lp1838329/0007-cryptsetup-generator-use-systemd-makefs-for-implemen.patch (+45/-0)
debian/patches/series (+6/-0)
Michael Aaron Murphy (mmstick76) wrote : | #1 |
Dan Streetman (ddstreet) wrote : | #3 |
> This patch has already been submitted upstream to systemd
What is the upstream systemd issue number?
Dan Streetman (ddstreet) wrote : | #4 |
Looks like this still needs to be worked out upstream.
Michael Aaron Murphy (mmstick76) wrote : | #5 |
However, it's unknown when the issue is going to be fixed, so for now I'm carrying it in Pop!_OS 18.04, 19.04, and 19.10.
Changed in systemd: | |
status: | Unknown → New |
tags: | added: ddstreet |
Sebastien Bacher (seb128) wrote : | #6 |
Dan, the workaround seems safe and fixes a real issue that some users are hitting; maybe it would make sense to include it as a distro patch?
Changed in systemd (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → High |
tags: | added: rls-ff-incoming |
Dan Streetman (ddstreet) wrote : | #7 |
@mmstick76, while I agree with the grumblings upstream that udev should be better about race conditions like this, if we're working around it I'd prefer to flock while running mkswap instead of retriggering udev...can you test with that change?
Patch below, and I have test builds for b/f here:
https:/
--- a/src/cryptsetu
+++ b/src/cryptsetu
@@ -202,8 +202,8 @@ static int create_disk(
if (swap)
- "ExecStartPost=
- name_escaped);
+ "ExecStartPost=
+ name_escaped, name_escaped);
r = fflush_
if (r < 0)
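For readers unfamiliar with the approach, here is a minimal Python sketch of "hold an exclusive flock while a command runs". It is an illustration only, not the patch above: the helper name `run_locked` is hypothetical, and the real change is made in the generated unit's `ExecStartPost=` line.

```python
import fcntl
import os
import subprocess


def run_locked(lock_path, argv):
    """Run argv while holding an exclusive BSD lock on lock_path.

    Hypothetical helper for illustration; returns the command's exit code.
    """
    fd = os.open(lock_path, os.O_RDONLY)
    try:
        # Blocks until no other open file description holds a conflicting lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        # The lock stays held for the whole lifetime of the child process.
        return subprocess.run(argv, check=False).returncode
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

In the swap case, `lock_path` would be the block device and `argv` the mkswap invocation on that same device; the exact device path depends on the crypttab entry and is not shown here.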
Changed in systemd (Ubuntu Focal): | |
status: | New → Triaged |
importance: | Undecided → High |
milestone: | none → ubuntu-20.04.1 |
tags: | removed: rls-ff-incoming |
tags: | added: id-5eb44cf735b12c4b9b721452 |
Changed in systemd (Ubuntu Focal): | |
assignee: | nobody → Dan Streetman (ddstreet) |
status: | Triaged → In Progress |
Changed in systemd (Ubuntu): | |
status: | Triaged → Fix Released |
Dan Streetman (ddstreet) wrote : | #8 |
This was (possibly) fixed upstream in a similar way to comment 7:
https:/
essentially instead of calling mkswap inside flock, it calls systemd-makefs swap, which itself flocks the block device.
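A rough sketch of why holding the lock helps, assuming (as I understand the systemd block device locking documentation) that udevd tries a shared, non-blocking lock before probing a block device and backs off if a writer holds it exclusively. The function name `try_probe_lock` is hypothetical and only models the observer side of that interplay.

```python
import fcntl
import os


def try_probe_lock(path):
    """Try to take a shared, non-blocking lock on path.

    Returns an open descriptor holding the lock, or None when another
    open file description currently holds an exclusive lock.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        # A writer holds LOCK_EX: skip the probe for now.
        os.close(fd)
        return None
    return fd
```

While the formatting tool holds the exclusive lock, the prober gets `None` and never sees the half-written device; once the lock is released, a later attempt succeeds.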
description: | updated |
tags: | removed: bionic ddstreet disco eoan |
Changed in systemd (Ubuntu Groovy): | |
status: | Fix Released → In Progress |
status: | In Progress → New |
Robie Basak (racb) wrote : | #9 |
@ddstreet
Does the Groovy bug task status need fixing?
I see that in your Focal upload you have a series of seven patches picked from upstream to fix this properly. But if your trivial patch in comment 7 works, wouldn't that be better for an SRU? Reference this paragraph of SRU policy:
"In line with this, the requirements for stable updates are not necessarily the same as those in the development release. When preparing future releases, one of our goals is to construct the most elegant and maintainable system possible, and this often involves fundamental improvements to the system's architecture, rearranging packages to avoid bundled copies of other software so that we only have to maintain it in one place, and so on. However, once we have completed a release, the priority is normally to minimise risk caused by changes not explicitly required to fix qualifying bugs, and this tends to be well-correlated with minimising the size of those changes. As such, the same bug may need to be fixed in different ways in stable and development releases."
How do you think this applies to this case?
Dan Streetman (ddstreet) wrote : | #10 |
> Does the Groovy bug task status need fixing?
The g MR is open and linked in this bug; I prefer to leave that up to @rbalint for the devel release, I'm not sure if he wants to take the patches or do a merge of newer systemd later.
> But if your trivial patch in comment 7 works, wouldn't that be better for an SRU?
I'm not convinced it would work (it needs to lock the parent device if the target is a partition), and it wouldn't help with cryptsetup other than swap. Hence the proper, complete, upstream patch series is required.
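To illustrate the "lock the parent device if the target is a partition" point, here is a small sketch of looking up the whole-disk name through sysfs. It assumes the usual Linux layout where a partition's sysfs directory contains a `partition` attribute and sits inside its parent disk's directory; `whole_disk_name` and the `sysfs_root` parameter are hypothetical names, and this is not how the upstream helper is implemented. It also does not walk device-mapper stacks.

```python
import os


def whole_disk_name(devname, sysfs_root="/sys"):
    """Return the whole-disk name for devname, e.g. 'sda1' -> 'sda'.

    Devices that are not partitions are returned unchanged.
    """
    node = os.path.realpath(os.path.join(sysfs_root, "class", "block", devname))
    if os.path.exists(os.path.join(node, "partition")):
        # A partition's sysfs node is nested inside its parent disk's node.
        return os.path.basename(os.path.dirname(node))
    return devname
```

The lock would then be taken on the device node of the returned name rather than on the partition itself.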
Hello Michael, or anyone else affected,
Accepted systemd into focal-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in systemd (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed verification-needed-focal |
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.2) | #12 |
All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.2) for focal have finished running.
The following regressions have been reported in tests triggered by the package:
apt/unknown (armhf)
indicator-
dovecot/
postgresql-
mir/unknown (armhf)
systemd/
umockdev/unknown (armhf)
policykit-1/unknown (armhf)
asterisk/unknown (armhf)
anbox/unknown (armhf)
php7.4/unknown (armhf)
ksystemlog/unknown (armhf)
polkit-qt-1/unknown (armhf)
Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUp
https:/
[1] https:/
Thank you!
Launchpad Janitor (janitor) wrote : | #13 |
This bug was fixed in the package systemd - 245.6-3ubuntu3
---------------
systemd (245.6-3ubuntu3) groovy; urgency=medium
* Rebuild against libselinux 3.0
systemd (245.6-3ubuntu2) groovy; urgency=medium
* basic/cap-list: Print unknown capabilities in hexadecimal.
This fixes autopkgtest running on 5.8 kernels
(when systemd was built on an earlier one) (LP: #1885755)
File: debian/
https:/
systemd (245.6-3ubuntu1) groovy; urgency=medium
* Merge to Ubuntu from Debian unstable
- Dropped changes:
* Enable EFI/bootctl on armhf.
systemd (245.6-3) unstable; urgency=medium
[ Dan Streetman ]
* d/t/upstream: capture new merged 'system.journal' from tests.
https:/
* d/t/upstream: use --directory or --file param for journalctl.
Properly tell journalctl if the journal to parse is a dir or file.
* d/t/storage: check for ext2 or ext4 fs when using crypttab 'tmp' option.
https:/
[ Martin Pitt ]
* debian/
Unconditionally back up/restore locale configuration files and generate
en_US.UTF-8. Previously the test failed in environments which have some
locale other than en_US.UTF-8 in /etc/default/
Also fix the assertion of /etc/locale.conf not being present after
localectl. This only applies to Debian/Ubuntu tests, not upstream ones.
[ Dimitri John Ledkov ]
* Enable EFI/bootctl on armhf.
systemd (245.6-2ubuntu2) groovy; urgency=medium
[ Balint Reczey ]
* debian/
File: debian/
https:/
[ Dimitri John Ledkov ]
* ubuntu: enable CET on amd64.
File: debian/rules
https:/
[ Dan Streetman ]
* Lock swap blockdevice while calling mkswap (LP: #1838329)
Files:
- debian/
- debian/
- debian/
- debian/
- debian/
- debian/
https:/
systemd (245.6-2ubuntu1) groovy; urgency=medium
* Merge to Ubuntu from Debian unstable
- Dropped changes:
* dhclient-
* hwdb: Mask rfkill event from intel-hid on HP platforms (LP: #1883846...
Changed in systemd (Ubuntu Groovy): | |
status: | New → Fix Released |
Dan Streetman (ddstreet) wrote : | #14 |
I wasn't able to reproduce this myself, due to the failure being dependent on timing, but I set up the reproducer from the upstream bug and rebooted several times with the proposed package, and had no problems/
tags: | added: verification-done verification-done-focal |
tags: | removed: verification-needed verification-needed-focal |
Launchpad Janitor (janitor) wrote : | #16 |
This bug was fixed in the package systemd - 245.4-4ubuntu3.2
---------------
systemd (245.4-4ubuntu3.2) focal; urgency=medium
[ Dan Streetman ]
* Hotadd only offline memory and CPUs (LP: #1876018)
File: debian/
https:/
* Lock swap blockdevice while calling mkswap (LP: #1838329)
Files:
- d/p/lp1838329/
- d/p/lp1838329/
- d/p/lp1838329/
- d/p/lp1838329/
- d/p/lp1838329/
- d/p/lp1838329/
- d/p/lp1838329/
https:/
[ Balint Reczey ]
* debian/
(LP: #1880541)
File: debian/
https:/
* d/p/hwdb-
hwdb: Mask rfkill event from intel-hid on HP platforms
(LP: #1883846)
https:/
* journald: stream pid change newline fix (LP: #1875708)
Files:
- debian/
- debian/
- debian/
- debian/
- debian/
- debian/
- debian/
- debian/
https:/
-- Dan Streetman <email address hidden> Mon, 06 Jul 2020 17:38:31 -0400
Changed in systemd (Ubuntu Focal): | |
status: | Fix Committed → Fix Released |
The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
The attachment "The workaround for this issue" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]