cloud-init can add users in wrong filesystem (race with `mount /home`)

Bug #1961620 reported by Paride Legovini
Affects      Status    Importance  Assigned to  Milestone
cloud-init   Expired   High        Unassigned
subiquity    New       Undecided   Unassigned

Bug Description

When cloud-init is used to configure a new Ubuntu Server system installed from the ISO images, and /home is configured as a separate partition, there is a (slow) race between the user creation and /home being mounted. This can lead to the user's $HOME being created in the wrong filesystem.

Steps to reproduce:

1. Prepare to install focal-live-server-amd64.iso in a VM.
   In my case I used one of the 20.04.4 dailies.

2. Proceed with all-defaults but for storage. Configure the storage
   so / is in a dedicated partition, while /home is in an *encrypted*
   LVM volume. (The only purpose of the encryption is to add delay to
   the /home mount; see the next step.)

3. Finish the install and reboot. At the dm-crypt password prompt
   stop and wait a few minutes. At some point cloud-init will proceed
   to create the configured user, but /home is not mounted yet!
   The user's $HOME is now in the same filesystem as /.

4. Enter the dm-crypt password. This will cause /home to be mounted
   from the encrypted volume, and this will shadow the actual $HOME.

5. Login with the configured credentials and verify that $HOME is
   inaccessible (a rough check is sketched below).
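
For reference, one way to observe the shadowed directory after step 5.
These commands are illustrative; device names and output will differ
per install:

  findmnt /home    # the encrypted LVM volume is now mounted on /home
  ls -la /home     # empty: the $HOME created by cloud-init is hidden
  umount /home
  ls -la /home     # the directory cloud-init created reappears, on the / filesystem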

Tags: iso-testing
Revision history for this message
Paride Legovini (paride) wrote :

Not a regression: reproduces with ubuntu-20.04.3-live-server-amd64.iso.

Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1961620

tags: added: iso-testing
Chad Smith (chad.smith)
Changed in cloud-init:
status: New → Confirmed
status: Confirmed → Triaged
Revision history for this message
Chad Smith (chad.smith) wrote :

Spent a bit of time on this bug yesterday. The case is fairly unique in that the live server installer creates the partition layout outside of cloud-init, so cloud-init is unaware of the encrypted device's status and of the failure to mount /home. That said, cloud-init should probably at minimum grow awareness of whether a /home mount point is defined when it attempts operations that set up users under /home/* or import ssh keys. Additionally, cloud-init should probably be more aware of the status of any mount units defined by `systemd-fstab-generator`, certainly if those mount points are targets of cloud-init configuration changes.

Specifically, when dealing with ssh key imports and user & group setup, cloud-init needs to be wary of a failed /home or /home/* mount, to avoid laying down configuration that would ultimately "disappear" if the failed mount is later corrected by delayed user input.
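
As a rough illustration of the kind of check described above (a shell
sketch of the idea, not cloud-init's actual code):

  # Hypothetical check: /home is declared in fstab but not mounted yet
  if findmnt --fstab /home >/dev/null && ! mountpoint -q /home; then
      echo "WARNING: /home is a configured mount point but is not" \
           "mounted; user setup under /home may be shadowed later" >&2
  fi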

Revision history for this message
Chad Smith (chad.smith) wrote :

I can confirm this bug, and a 2 minute timeout from cloud-init across reboot when the encrypted partition password isn't provided during boot. With the mount absent, cloud-init configures the user in the /home directory of the base image, so when the encrypted /home is later mounted it shadows that directory and no passwords/keys appear to be set up for the user.

The same race exists on every boot with an encrypted partition, regardless of whether cloud-init initially wrote into the right encrypted partition and set up the user credentials/imported keys, if the user doesn't enter their password on each boot.

I think ultimately cloud-init could add a warning that expected mounts on /home don't appear to be present, but I don't see cloud-init waiting indefinitely on a mount point that may never show up due to lack of user input.

I can confirm from additional testing that there may be an alternative: subiquity triggering the cloud-init boot stages before reboot, while the encrypted partition is still mounted within the chroot /target, by dropping to a shell in the live server installer and running the cloud-init boot stages with:
 cloud-init init --local, cloud-init init, cloud-init modules --mode=config, cloud-init modules --mode=final

That at least guarantees during the initial install that cloud-init lays down the right data in the mounted /home dir. If the person isn't around post-reboot to re-enter their encryption password the mount of /home will still fail, but it is recoverable whenever the proper password is entered.
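
For illustration, that sequence run from the installer shell against the
chroot might look roughly like this (illustrative only, not what subiquity
does today):

  # Run the cloud-init boot stages inside the installed system while
  # the encrypted /home is still mounted under /target:
  chroot /target cloud-init init --local
  chroot /target cloud-init init
  chroot /target cloud-init modules --mode=config
  chroot /target cloud-init modules --mode=final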

Changed in cloud-init:
importance: Undecided → Medium
Revision history for this message
Paride Legovini (paride) wrote :

Thanks for looking into this. I thought about pivoting into /target too; that would save us from worrying about mounts at all, but it requires changes in how subiquity operates, and in general it moves a bit away from the idea that installs done from the ISO should be treatable like cloud instances, which is what allows us to use cloud-init on bare metal in the first place. It's a good guiding principle, as it helps convergence between bare metal server systems and cloud instances.

On "not making cloud-init wait forever": I see your point, however in the case of subiquity we're speaking of a freshly installed system, which can't be in production. Between blocking boot and booting but misconfiguring the system, I'm not sure the latter is better.

Revision history for this message
Paride Legovini (paride) wrote :

I think this is worth discussing with the subiquity devs, so I added a subiquity task.

Revision history for this message
Paride Legovini (paride) wrote :

I did a very quick

  chroot /target
  cloud-init init

by jumping to a shell at the end of a subiquity install, right before rebooting, without mounting any of the tmpfs filesystems, and it seems to have worked.

Revision history for this message
Chad Smith (chad.smith) wrote :

I chatted with the subiquity folks today. If subiquity passes cloud-config user-data straight into cloud-init in all cases for the next boot, there could be problems with custom bootcmd, runcmd and/or snap configuration that may not work in a chroot. Performing all cloud-init initialization during the ephemeral boot with a pivot may not make sense for all cases, but at minimum it makes sense for cloud-init to error out if it is aware that a significant mount is missing.

James Falcon (falcojr)
Changed in cloud-init:
importance: Medium → High
Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired