Activity log for bug #1860046

Date Who What changed Old value New value Message
2020-01-16 22:04:18 Dave Jones bug added bug
2020-01-17 13:43:54 Francis Ginther tags id-5db0502bcb3f2112497e1b92
2020-01-21 14:34:33 Dan Watkins cloud-init (Ubuntu): status New Triaged
2020-01-24 18:08:36 Łukasz Zemczak description We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount. I'm not entirely sure this is a bug - it could well be argued this is mis-configuration on our part, and there's a trivial workaround, but I'll leave the cloud-init devs to make that call. The configuration is as follows: 1. On recent Ubuntu classic images for the Raspberry Pi, cloud-init's seed is located on the boot partition (/dev/mmcblk0p1, labelled "system-boot"). This is desirable as it's a straight-forward FAT partition, and thus easily accessible and editable on any OS (as opposed to the original seed location on the root ext4 partition). 2. Cloud-init is configured to look for its seed with "fs_label: system-boot" 3. Systemd is configured (via /etc/fstab) to mount /dev/disk/by-label/system-boot (/dev/mmcblk0p1) on /boot/firmware The result on boot-up is one of three situations: 1. cloud-init mounts /dev/mmcblk0p1 on a temp path. While attempting to read its seed, systemd attempts to mount /dev/mmcblk0p1 on /boot/firmware but fails because the device is already mounted elsewhere. cloud-init succeeds in reading its seed, umounts /dev/mmcblk0p1 and the system boots successfully, but /boot/firmware isn't mounted (leading to issues further down the line with other things like flash-kernel that expect it to be mounted). 2. cloud-init checks for /dev/mmcblk0p1 in /proc/mounts, and doesn't see it there. It goes to mount /dev/mmcblk0p1 on a temp path, but before it can do so, systemd mounts it on /boot/firmware. cloud-init's mount fails (device already mounted), so it fails to read its seed and uses a default config instead (which isn't entirely desirable for us as the default involves a long wait if ethernet is not connected; our default seed overrides that). 3. Everything works fine. This occurs when either a) systemd mounts /boot/firmware first, then cloud-init sees this mount in /proc/mounts and uses it or b) cloud-init mounts /dev/mmcblk0p1 on a temp path, reads its seed and unmounts it, *then* systemd mounts /boot/firmware. Here's the relevant snippet of traceback from when the second result occurs: ... 2020-01-14 20:48:01,337 - util.py[DEBUG]: Read 386 bytes from /etc/os-release 2020-01-14 20:48:01,338 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,348 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,357 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,415 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,432 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=SYSTEM-BOOT', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,448 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=system-boot', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,466 - DataSourceNoCloud.py[DEBUG]: Attempting to use data from /dev/mmcblk0p1 2020-01-14 20:48:01,467 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False) 2020-01-14 20:48:01,467 - util.py[DEBUG]: Read 2062 bytes from /proc/mounts 2020-01-14 20:48:01,468 - util.py[DEBUG]: Fetched {'sysfs': {'fstype': 'sysfs', 'mountpoint': '/sys', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'proc': {'fstype': 'proc', 'mountpoint': '/proc', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'udev': {'fstype': 'devtmpfs', 'mountpoint': '/dev', 'opts': 'rw,nosuid,relatime,size=1913472k,nr_inodes=156992,mode=755'}, 'devpts': {'fstype': 'devpts', 'mountpoint': '/dev/pts', 'opts': 'rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000'}, 'tmpfs': {'fstype': 'tmpfs', 'mountpoint': '/etc/machine-id', 'opts': 'ro,relatime,size=391432k,mode=755'}, '/dev/mmcblk0p2': {'fstype': 'ext4', 'mountpoint': '/', 'opts': 'rw,relatime'}, 'securityfs': {'fstype': 'securityfs', 'mountpoint': '/sys/kernel/security', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'cgroup': {'fstype': 'cgroup', 'mountpoint': '/sys/fs/cgroup/freezer', 'opts': 'rw,nosuid,nodev,noexec,relatime,freezer'}, 'pstore': {'fstype': 'pstore', 'mountpoint': '/sys/fs/pstore', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'systemd-1': {'fstype': 'autofs', 'mountpoint': '/proc/sys/fs/binfmt_misc', 'opts': 'rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1728'}, 'debugfs': {'fstype': 'debugfs', 'mountpoint': '/sys/kernel/debug', 'opts': 'rw,relatime'}, 'mqueue': {'fstype': 'mqueue', 'mountpoint': '/dev/mqueue', 'opts': 'rw,relatime'}, 'tracefs': {'fstype': 'tracefs', 'mountpoint': '/sys/kernel/debug/tracing', 'opts': 'rw,relatime'}, 'configfs': {'fstype': 'configfs', 'mountpoint': '/sys/kernel/config', 'opts': 'rw,relatime'}, 'fusectl': {'fstype': 'fusectl', 'mountpoint': '/sys/fs/fuse/connections', 'opts': 'rw,relatime'}} mounts from proc 2020-01-14 20:48:01,468 - util.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] with allowed return codes [0] (shell=False, capture=True) 2020-01-14 20:48:01,529 - util.py[DEBUG]: Failed mount of '/dev/mmcblk0p1' as 'auto': Unexpected error while running command. Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] Exit code: 32 Reason: - Stdout: Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware. 2020-01-14 20:48:01,530 - util.py[WARNING]: Failed to mount /dev/mmcblk0p1 when looking for data 2020-01-14 20:48:01,533 - util.py[DEBUG]: Failed to mount /dev/mmcblk0p1 when looking for data Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py", line 128, in _get_data pp2d_kwargs) File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1758, in mount_cb (device, tmpd, failure_reason)) cloudinit.util.MountFailedError: Failed mounting /dev/mmcblk0p1 to /run/cloud-init/tmp/tmp2z8rueu4 due to: Unexpected error while running command. Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] Exit code: 32 Reason: - Stdout: Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware. 2020-01-14 20:48:01,549 - __init__.py[DEBUG]: Datasource DataSourceNoCloud [seed=None][dsmode=net] not updated for events: New instance first boot 2020-01-14 20:48:01,549 - handlers.py[DEBUG]: finish: init-local/search-NoCloud: SUCCESS: no local data found from DataSourceNoCloud 2020-01-14 20:48:01,549 - main.py[DEBUG]: No local datasource found ... The relevant code is mount_cb() in util.py which is calling mounts() to read /proc/mounts then attempting to mount the device when it's not found (leading to a classic race between the check and the action). It might be argued that it should simply attempt the mount, and check /proc/mounts in the event of failure, but I'm wary that mounting FS' is a potentially costly exercise (can the seed be on a remote mount?), and there's probably edge cases here I'm unaware of (is it possible, perhaps dangerous, to double-mount certain devices?). Perhaps it could be adjusted to check /proc/mounts, attempt the mount, then *re-check* /proc/mounts in the case of failure? Nevertheless, there's a trivial work-around in our case: just add an override for "RequiresMountFor=/boot/firmware" to our systemd config for cloud-init-local, thus ensuring the mount is definitely available prior to cloud-init looking for its seed. This in turn argues the issue could simply be viewed as mis-configuration on our part: we *know* the seed is on /dev/mmcblk0p1 as that's where we've configured it to be; should we not also configure cloud-init in such a way that we guarantee the device containing its seed is definitely mounted prior to it running? [Impact] We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount. This can basically lead to situations where the device will not boot. [Test Case] Boot the image built using the bionic-proposed livecd-rootfs multiple times and make sure it is booting up correctly. TODO: find a better test case? [Regression Potential] The change seems relatively safe and doesn't seem like there are any obvious regressions it could cause, but one should be on a lookout for any cloud-init weirdness or issues on boot-time. [Original Description] We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount. I'm not entirely sure this is a bug - it could well be argued this is mis-configuration on our part, and there's a trivial workaround, but I'll leave the cloud-init devs to make that call. The configuration is as follows: 1. On recent Ubuntu classic images for the Raspberry Pi, cloud-init's seed is located on the boot partition (/dev/mmcblk0p1, labelled "system-boot"). This is desirable as it's a straight-forward FAT partition, and thus easily accessible and editable on any OS (as opposed to the original seed location on the root ext4 partition). 2. Cloud-init is configured to look for its seed with "fs_label: system-boot" 3. Systemd is configured (via /etc/fstab) to mount /dev/disk/by-label/system-boot (/dev/mmcblk0p1) on /boot/firmware The result on boot-up is one of three situations: 1. cloud-init mounts /dev/mmcblk0p1 on a temp path. While attempting to read its seed, systemd attempts to mount /dev/mmcblk0p1 on /boot/firmware but fails because the device is already mounted elsewhere. cloud-init succeeds in reading its seed, umounts /dev/mmcblk0p1 and the system boots successfully, but /boot/firmware isn't mounted (leading to issues further down the line with other things like flash-kernel that expect it to be mounted). 2. cloud-init checks for /dev/mmcblk0p1 in /proc/mounts, and doesn't see it there. It goes to mount /dev/mmcblk0p1 on a temp path, but before it can do so, systemd mounts it on /boot/firmware. cloud-init's mount fails (device already mounted), so it fails to read its seed and uses a default config instead (which isn't entirely desirable for us as the default involves a long wait if ethernet is not connected; our default seed overrides that). 3. Everything works fine. This occurs when either a) systemd mounts /boot/firmware first, then cloud-init sees this mount in /proc/mounts and uses it or b) cloud-init mounts /dev/mmcblk0p1 on a temp path, reads its seed and unmounts it, *then* systemd mounts /boot/firmware. Here's the relevant snippet of traceback from when the second result occurs: ... 2020-01-14 20:48:01,337 - util.py[DEBUG]: Read 386 bytes from /etc/os-release 2020-01-14 20:48:01,338 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,348 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,357 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,415 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,432 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=SYSTEM-BOOT', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,448 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=system-boot', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2020-01-14 20:48:01,466 - DataSourceNoCloud.py[DEBUG]: Attempting to use data from /dev/mmcblk0p1 2020-01-14 20:48:01,467 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False) 2020-01-14 20:48:01,467 - util.py[DEBUG]: Read 2062 bytes from /proc/mounts 2020-01-14 20:48:01,468 - util.py[DEBUG]: Fetched {'sysfs': {'fstype': 'sysfs', 'mountpoint': '/sys', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'proc': {'fstype': 'proc', 'mountpoint': '/proc', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'udev': {'fstype': 'devtmpfs', 'mountpoint': '/dev', 'opts': 'rw,nosuid,relatime,size=1913472k,nr_inodes=156992,mode=755'}, 'devpts': {'fstype': 'devpts', 'mountpoint': '/dev/pts', 'opts': 'rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000'}, 'tmpfs': {'fstype': 'tmpfs', 'mountpoint': '/etc/machine-id', 'opts': 'ro,relatime,size=391432k,mode=755'}, '/dev/mmcblk0p2': {'fstype': 'ext4', 'mountpoint': '/', 'opts': 'rw,relatime'}, 'securityfs': {'fstype': 'securityfs', 'mountpoint': '/sys/kernel/security', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'cgroup': {'fstype': 'cgroup', 'mountpoint': '/sys/fs/cgroup/freezer', 'opts': 'rw,nosuid,nodev,noexec,relatime,freezer'}, 'pstore': {'fstype': 'pstore', 'mountpoint': '/sys/fs/pstore', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'systemd-1': {'fstype': 'autofs', 'mountpoint': '/proc/sys/fs/binfmt_misc', 'opts': 'rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1728'}, 'debugfs': {'fstype': 'debugfs', 'mountpoint': '/sys/kernel/debug', 'opts': 'rw,relatime'}, 'mqueue': {'fstype': 'mqueue', 'mountpoint': '/dev/mqueue', 'opts': 'rw,relatime'}, 'tracefs': {'fstype': 'tracefs', 'mountpoint': '/sys/kernel/debug/tracing', 'opts': 'rw,relatime'}, 'configfs': {'fstype': 'configfs', 'mountpoint': '/sys/kernel/config', 'opts': 'rw,relatime'}, 'fusectl': {'fstype': 'fusectl', 'mountpoint': '/sys/fs/fuse/connections', 'opts': 'rw,relatime'}} mounts from proc 2020-01-14 20:48:01,468 - util.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] with allowed return codes [0] (shell=False, capture=True) 2020-01-14 20:48:01,529 - util.py[DEBUG]: Failed mount of '/dev/mmcblk0p1' as 'auto': Unexpected error while running command. Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] Exit code: 32 Reason: - Stdout: Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware. 2020-01-14 20:48:01,530 - util.py[WARNING]: Failed to mount /dev/mmcblk0p1 when looking for data 2020-01-14 20:48:01,533 - util.py[DEBUG]: Failed to mount /dev/mmcblk0p1 when looking for data Traceback (most recent call last):   File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py", line 128, in _get_data     pp2d_kwargs)   File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1758, in mount_cb     (device, tmpd, failure_reason)) cloudinit.util.MountFailedError: Failed mounting /dev/mmcblk0p1 to /run/cloud-init/tmp/tmp2z8rueu4 due to: Unexpected error while running command. Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] Exit code: 32 Reason: - Stdout: Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware. 2020-01-14 20:48:01,549 - __init__.py[DEBUG]: Datasource DataSourceNoCloud [seed=None][dsmode=net] not updated for events: New instance first boot 2020-01-14 20:48:01,549 - handlers.py[DEBUG]: finish: init-local/search-NoCloud: SUCCESS: no local data found from DataSourceNoCloud 2020-01-14 20:48:01,549 - main.py[DEBUG]: No local datasource found ... The relevant code is mount_cb() in util.py which is calling mounts() to read /proc/mounts then attempting to mount the device when it's not found (leading to a classic race between the check and the action). It might be argued that it should simply attempt the mount, and check /proc/mounts in the event of failure, but I'm wary that mounting FS' is a potentially costly exercise (can the seed be on a remote mount?), and there's probably edge cases here I'm unaware of (is it possible, perhaps dangerous, to double-mount certain devices?). Perhaps it could be adjusted to check /proc/mounts, attempt the mount, then *re-check* /proc/mounts in the case of failure? Nevertheless, there's a trivial work-around in our case: just add an override for "RequiresMountFor=/boot/firmware" to our systemd config for cloud-init-local, thus ensuring the mount is definitely available prior to cloud-init looking for its seed. This in turn argues the issue could simply be viewed as mis-configuration on our part: we *know* the seed is on /dev/mmcblk0p1 as that's where we've configured it to be; should we not also configure cloud-init in such a way that we guarantee the device containing its seed is definitely mounted prior to it running?
2020-01-24 18:08:40 Łukasz Zemczak cloud-init (Ubuntu): status Triaged Fix Committed
2020-01-24 18:08:43 Łukasz Zemczak cloud-init (Ubuntu): status Fix Committed Invalid
2020-01-24 18:08:52 Łukasz Zemczak bug task added livecd-rootfs (Ubuntu)
2020-01-24 18:08:58 Łukasz Zemczak livecd-rootfs (Ubuntu): status New Fix Committed
2020-01-24 18:09:05 Łukasz Zemczak nominated for series Ubuntu Bionic
2020-01-24 18:09:05 Łukasz Zemczak bug task added livecd-rootfs (Ubuntu Bionic)
2020-01-24 18:09:05 Łukasz Zemczak bug task added cloud-init (Ubuntu Bionic)
2020-01-24 18:09:12 Łukasz Zemczak livecd-rootfs (Ubuntu Bionic): status New In Progress
2020-01-24 18:09:15 Łukasz Zemczak livecd-rootfs (Ubuntu Bionic): importance Undecided Critical
2020-01-24 18:09:18 Łukasz Zemczak livecd-rootfs (Ubuntu): importance Undecided High
2020-01-24 18:10:31 Łukasz Zemczak livecd-rootfs (Ubuntu Bionic): status In Progress Fix Committed
2020-01-24 18:10:33 Łukasz Zemczak bug added subscriber Ubuntu Stable Release Updates Team
2020-01-24 18:10:35 Łukasz Zemczak bug added subscriber SRU Verification
2020-01-24 18:10:37 Łukasz Zemczak tags id-5db0502bcb3f2112497e1b92 id-5db0502bcb3f2112497e1b92 verification-needed verification-needed-bionic
2020-01-25 09:37:19 Mathew Hodson bug added subscriber Mathew Hodson
2020-01-27 10:05:53 Launchpad Janitor livecd-rootfs (Ubuntu): status Fix Committed Fix Released
2020-01-30 08:39:07 Łukasz Zemczak tags id-5db0502bcb3f2112497e1b92 verification-needed verification-needed-bionic id-5db0502bcb3f2112497e1b92 verification-done verification-done-bionic
2020-01-30 09:24:26 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2020-01-30 09:34:19 Launchpad Janitor livecd-rootfs (Ubuntu Bionic): status Fix Committed Fix Released
2020-02-06 01:53:23 Mathew Hodson removed subscriber Mathew Hodson
2020-02-06 01:53:29 Mathew Hodson bug task deleted cloud-init (Ubuntu Bionic)
2020-02-06 01:53:35 Mathew Hodson bug task deleted cloud-init (Ubuntu)