Reformatting of ephemeral drive fails on resize of Azure VM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Fix Released | High | Unassigned |
cloud-init (Ubuntu) | Fix Released | High | Scott Moser |
Xenial | Fix Released | Medium | Unassigned |
Yakkety | Fix Released | Medium | Unassigned |
Bug Description
=== Begin SRU Template ===
[Impact]
In some cases, cloud-init writes entries to /etc/fstab. On Azure it will
even format a disk for mounting and then write the entry for that 'ephemeral'
disk there.
A supported operation on Azure is to "resize" the system. When you do this,
the system is shut down, resized (given larger/faster disks and more CPU) and
then brought back up. In that process, the "ephemeral" disk is re-initialized
to its original NTFS format. The design goal is for cloud-init to recognize
this situation and re-format the disk to ext4.
The problem is that the mount of that disk happens before cloud-init can
reformat it. That's because the entry in fstab has 'auto' and is automatically
mounted. The end result is that after a resize operation the user is left
with the ephemeral disk mounted at /mnt with an NTFS filesystem rather
than ext4.
[Test Case]
Comment 3 describes how the original reporter reproduced the problem.
Another way to do this is to just re-format the ephemeral disk as
NTFS and then reboot. The result *should* be that after reboot the system
comes back up with an ext4 filesystem on that disk.
1.) boot system on azure
(for this, I use https:/
use web ui or any other way).
Save output of
journalctl --no-pager > journalctl.orig
systemctl status --no-pager > systemctl-
systemctl --no-pager > systemctl.orig
2.) unmount the ephemeral disk
$ umount /mnt
3.) repartition it so that mkfs.ntfs does less and is faster
This is not strictly necessary, but mkfs.ntfs can take upwards of
20 minutes. Shrinking /dev/sdb2 to 200M means it will finish
in < 1 minute.
$ disk=/dev/
$ part=/dev/
$ echo "2048,$
$ time mkfs.ntfs --quick "$part"
4.) reboot
5.) expect that /proc/mounts has /dev/disk/
and that fstab has x-systemd.requires in it.
$ awk '$2 == "/mnt" { print $0 }' /proc/mounts
/dev/sdb1 /mnt ext4 rw,relatime,
$ awk '$2 == "/mnt" { print $0 }' /etc/fstab
/dev/sdb1 /mnt auto defaults,
6.) collect journal and systemctl information as described in step 1 above.
Compare the output, specifically looking for case-insensitive "breaks".
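The comparison in step 6 can be sketched in Python. This is only an illustration: the helper name `lines_mentioning_breaks` and the sample strings are hypothetical, and the real input would be the journalctl captures saved in steps 1 and 6.

```python
def lines_mentioning_breaks(journal_text):
    """Return the journal lines that contain 'breaks', ignoring case."""
    return [
        line
        for line in journal_text.splitlines()
        if "breaks" in line.lower()
    ]


# Made-up sample text, standing in for the contents of the two captures.
before = "unit A started\nunit B started\n"
after = "unit A started\nexample message: this BREAKS something\n"

# Timestamps differ between boots, so compare the matching lines by eye
# rather than expecting the two captures to be identical.
print("before:", lines_mentioning_breaks(before))
print("after: ", lines_mentioning_breaks(after))
```

Any matching line that appears only in the second capture is worth a closer look.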
[Regression Potential]
Regression is unlikely. The likely failure case is just that the problem is not
correctly fixed, and the user ends up with either an NTFS formatted disk that
is mounted at /mnt or nothing mounted at /mnt.
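The outcomes named above (ext4 as intended, NTFS still mounted, or nothing mounted) can be told apart from the contents of /proc/mounts. The following is a minimal sketch, not part of the fix: `classify_mnt` is a hypothetical helper, and treating the types `ntfs`, `ntfs3` and `fuseblk` as "NTFS" is an assumption (`fuseblk` is what an ntfs-3g mount reports, but other FUSE block filesystems report it too).

```python
# Assumption: filesystem types that indicate the disk is still NTFS.
NTFS_TYPES = {"ntfs", "ntfs3", "fuseblk"}


def classify_mnt(proc_mounts_text, mount_point="/mnt"):
    """Classify what is mounted at mount_point, given /proc/mounts text."""
    fstype = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # keep the last match; later mounts shadow earlier ones
    if fstype is None:
        return "not-mounted"
    if fstype == "ext4":
        return "ok"
    if fstype in NTFS_TYPES:
        return "ntfs"
    return "unexpected:" + fstype


# Made-up sample line; on a real system pass open("/proc/mounts").read().
print(classify_mnt("/dev/sdb1 /mnt ext4 rw,relatime 0 0"))
```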
=== End SRU Template ===
After resizing a 16.04 VM on Azure, the VM is presented with a new ephemeral drive (of a different size), which is initially NTFS formatted. Cloud-init tries to format the appropriate partition as ext4, but fails because the partition is mounted. Cloud-init has unmount logic for exactly this case in the get_data call on the Azure data source, but it is never called because a fresh cache is found.
Jun 27 19:07:47 azubuntu1604arm [CLOUDINIT] handlers.py[DEBUG]: start: init-network/
Jun 27 19:07:47 azubuntu1604arm [CLOUDINIT] util.py[DEBUG]: Reading from /var/lib/
Jun 27 19:07:47 azubuntu1604arm [CLOUDINIT] util.py[DEBUG]: Read 5950 bytes from /var/lib/
Jun 27 19:07:47 azubuntu1604arm [CLOUDINIT] stages.py[DEBUG]: restored from cache: DataSourceAzureNet [seed=/dev/sr0]
Jun 27 19:07:47 azubuntu1604arm [CLOUDINIT] handlers.py[DEBUG]: finish: init-network/
...
Jun 27 19:07:48 azubuntu1604arm [CLOUDINIT] cc_disk_
Jun 27 19:07:48 azubuntu1604arm [CLOUDINIT] cc_disk_
Jun 27 19:07:48 azubuntu1604arm [CLOUDINIT] util.py[DEBUG]: Running command ['/sbin/mkfs.ext4', '/dev/sdb1'] with allowed return codes [0] (shell=False, capture=True)
Jun 27 19:07:48 azubuntu1604arm [CLOUDINIT] util.py[DEBUG]: Creating fs for /dev/disk/
Jun 27 19:07:48 azubuntu1604arm [CLOUDINIT] util.py[WARNING]: Failed during filesystem operation#012Failed to exec of '['/sbin/
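The "restored from cache" line in the log above is the heart of the problem. The toy model below illustrates that control flow only; `ToyDataSource` and `fetch` are hypothetical names, not cloud-init's actual classes or functions, and the sketch says nothing about how the fix was implemented.

```python
class ToyDataSource:
    """Stand-in for a data source whose get_data() has side effects."""

    def __init__(self):
        self.get_data_calls = 0

    def get_data(self):
        # In the report, the unmount-before-reformat logic lives in the
        # real get_data(); here we only count how often it runs.
        self.get_data_calls += 1
        return True


def fetch(cached, factory):
    """Return (datasource, how), preferring a cached object if present."""
    if cached is not None:
        # Cache hit: get_data() is skipped, and so are its side effects.
        return cached, "restored from cache"
    ds = factory()
    ds.get_data()
    return ds, "fresh"


ds, how = fetch(ToyDataSource(), ToyDataSource)
print(how, ds.get_data_calls)
```

Any per-boot work placed only in `get_data()` is silently skipped on a cache hit, which matches the behaviour described in this report.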
$ lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04
$ cat /etc/cloud/
build_name: server
serial: 20160721
~$ dpkg -l cloud-init
Desired=
| Status=
|/ Err?=(none)
||/ Name Version Architecture Description
+++-===
ii cloud-init 0.7.7~bzr1256-
We're seeing ~100% repro of this bug on resize, where the only success cases are caused by another bug that messes up fstab and prevents mounting of the drive.
Related bugs:
bug 1629868: cloud-init times out because of no dbus
bug 1603222: Azure: incorrect entry in fstab for ephemeral disk
Related branches
- cloud-init Commiters: Pending requested
Diff: 381 lines (+136/-107), 6 files modified
cloudinit/cmd/main.py (+3/-0)
cloudinit/config/cc_mounts.py (+9/-3)
cloudinit/sources/DataSourceAzure.py (+104/-95)
cloudinit/sources/__init__.py (+12/-0)
cloudinit/stages.py (+7/-0)
tests/unittests/test_datasource/test_azure.py (+1/-9)
- cloud-init Commiters: Pending requested
Diff: 122 lines (+24/-10), 5 files modified
cloudinit/config/cc_mounts.py (+2/-2)
cloudinit/sources/DataSourceAzure.py (+5/-2)
config/cloud.cfg (+2/-2)
systemd/cloud-init-local.service (+2/-1)
systemd/cloud-init.service (+13/-3)
Changed in cloud-init (Ubuntu):
assignee: nobody → Scott Moser (smoser)
Changed in cloud-init:
status: New → Confirmed
Changed in cloud-init (Ubuntu):
status: New → Confirmed
Changed in cloud-init:
importance: Undecided → High
Changed in cloud-init (Ubuntu):
importance: Undecided → High
description: updated
Changed in cloud-init:
status: Confirmed → Fix Committed
description: updated
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → Medium
description: updated
Changed in cloud-init (Ubuntu):
status: Fix Released → Confirmed
Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Confirmed
description: updated
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Hi Paul,
Could you give me steps that I can follow to reproduce this issue (ideally using the Azure CLI)? That'll make it easier for us to test fixes.
Thanks,
Dan