Comment 36 for bug 1755863

Brian Nelson (bhnelson) wrote :

So I've found a complete work-around for this. I also found that this issue is NOT new in 18.04; it affects 16.x as well (and likely 15.x and 17.x too). However, it behaves DIFFERENTLY in 18.04. More details below.

TL;DR:
You need to netboot with an initramfs that does not contain 'scripts/casper-bottom/25disable_cdrom.mount'. That script masks the dynamically generated cdrom.mount systemd unit (which is where the NFS mount lands), and that masking causes all the issues described in this bug.

On whatever machine the netboot initramfs is built:

# Disable/block the problem script (an empty file in /etc/initramfs-tools
# overrides the script of the same name shipped under /usr/share/initramfs-tools)
mkdir -p /etc/initramfs-tools/scripts/casper-bottom
touch /etc/initramfs-tools/scripts/casper-bottom/25disable_cdrom.mount

# rebuild initramfs
update-initramfs -u

# Move/copy the new file to the netboot server
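
How you get the new initramfs onto the netboot server depends entirely on your setup; as a purely hypothetical example (the hostname and TFTP path below are placeholders, not anything from this bug):

# Copy the rebuilt initramfs to the netboot/TFTP server
# (hostname and destination path are placeholders; adjust for your environment)
scp /boot/initrd.img-$(uname -r) netboot-server:/srv/tftp/ubuntu/initrd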

The underlying issue is that systemd fails to update its mount status properly. In 18.04, all of the 'failed' mounts are actually mounted successfully, including /tmp, BUT systemd doesn't recognize that fact and marks them all as red/failed.
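
You can see the mismatch for yourself from the broken boot with stock commands (nothing here is specific to this bug):

# systemd lists the mount units as failed...
systemctl list-units --type=mount --state=failed
# ...but the filesystem is actually mounted
findmnt /tmp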

In 16.04 the issue is a bit different. When booting, the same mounts are again all mounted successfully AND systemd shows them all as green/active. BUT if you try to stop/unmount any of them, you'll hit a similar situation: the unmount actually succeeds, but systemd reports an unmount failure and keeps showing the unit as green/active.
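
A rough way to see that 16.04 discrepancy (again just stock commands, using tmp.mount as an example):

# systemd reports the stop/unmount as failed and keeps the unit active...
systemctl stop tmp.mount
systemctl status tmp.mount
# ...yet the mount point is really gone
findmnt /tmp || echo "/tmp is not mounted"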

Per the call trace thh noted in comment #21:
From what I can tell, mount_load_proc_self_mountinfo iterates through every active mount on the system (some perhaps more than once). When it gets to the NFS mount on /cdrom, it fails in unit_set_slice and generates the "Failed to set up mount unit: Device or resource busy" error. For whatever reason, that failure seems to completely bork systemd's ability to update its mount status, so mounts get 'stuck' either mounted or unmounted from systemd's perspective.
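
If you want to confirm you're hitting the same code path, the error is visible in the journal of the broken boot (the grep pattern is just the message quoted above):

journalctl -b | grep "Failed to set up mount unit"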

The failure seems to be caused by the cdrom.mount unit (the NFS mount) being masked. Once it's unmasked, the failure doesn't occur and all mounts work as expected. You can actually observe this from within a 'broken' boot at the emergency prompt:
rm /lib/systemd/system/cdrom.mount
systemctl daemon-reload
# unmount /tmp (make sure it's really gone; there may be multiple mounts stacked on it)
umount /tmp
systemctl reset-failed tmp.mount
systemctl start tmp.mount
...and it will succeed.
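
To double-check at that point, something like this should now show tmp.mount as active and /tmp actually mounted:

systemctl status tmp.mount
findmnt /tmp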

I did verify this by booting from a 'real' DVD, and the problem doesn't happen there. It's something specific to having the image mounted over NFS and masking its unit.

For reference, the disable_cdrom.mount script was originally added as the fix for this bug:
https://bugs.launchpad.net/ubuntu/+source/casper/+bug/1436715