Cloud-init fails to write ext4 filesystem to Azure Ephemeral Drive
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| cloud-init | Fix Released | Medium | Unassigned | |
| cloud-init (Ubuntu) | Fix Released | Medium | Unassigned | |
| Xenial | Fix Released | Medium | Unassigned | |
| Yakkety | Fix Released | Medium | Unassigned | |
| Zesty | Fix Released | Medium | Unassigned | |
Bug Description
=== Begin SRU Template ===
[Impact]
There is a race condition that occurs when cloud-init partitions a block
device (/dev/sdb) and then puts a filesystem on one of its partitions. It is
possible for cloud-init to run mkfs on /dev/sdb1 after partitioning the
device /dev/sdb but before the partition device node /dev/sdb1 exists.
When this race condition occurs, cloud-init fails to make the "ephemeral"
device available to the user on Azure.
[Test Case]
A reliable test case that reproduces this race is hard to come by. The
failure mode, however, is believed to be well understood.
[Regression Potential]
There should be very little chance of regression, as essentially all the
change does is turn:
1. sgdisk -n 1:0:0 /dev/sdb
2. mkfs.ext4 /dev/sdb1

into

1.  sgdisk -n 1:0:0 /dev/sdb
1a. udevadm settle
1b. blockdev --rereadpt
1c. udevadm settle
2.  mkfs.ext4 /dev/sdb1
Steps '1b' and '1c' above are not strictly necessary, but were already
present in the method; they serve here as an additional wait.
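The fixed sequence can be sketched as a small Python helper. This is an illustration, not cloud-init's actual API: the function name `mkpart_gpt_and_format` and the injectable `run` parameter are hypothetical; the real logic lives in `exec_mkpart_gpt` in cc_disk_setup.py.

```python
import subprocess

def mkpart_gpt_and_format(device, run=None):
    """Sketch of the fixed flow: partition, settle, re-read the
    partition table, settle again, then mkfs. `run` can be replaced
    with a recording function for testing (hypothetical helper, not
    cloud-init's real code)."""
    run = run or (lambda cmd: subprocess.check_call(cmd))
    run(["sgdisk", "-n", "1:0:0", device])    # step 1: create partition 1
    run(["udevadm", "settle"])                # step 1a: wait for udev events
    run(["blockdev", "--rereadpt", device])   # step 1b: kernel re-reads table
    run(["udevadm", "settle"])                # step 1c: wait again
    run(["mkfs.ext4", device + "1"])          # step 2: node should now exist
```

Injecting a recorder for `run` shows the exact command order without touching any real disk.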
[Other Info]
The change that fixes this is viewable at [1]. For context, see all of
cc_disk_setup.py [2]. Basically, we just add a call to read_parttbl [3] in
exec_mkpart_gpt after invoking the sgdisk command that partitions the disk.
read_parttbl essentially does a 'udevadm settle', which fixes the race
condition that was seen.
[1] https:/
[2] https:/
[3] https:/
=== End SRU Template ===
The symptom is similar to bug 1611074, but the cause is different. In this case there appears to be an error accessing /dev/sdb1 when lsblk is run, possibly because sgdisk is not yet done creating the partition. The specific error message is "/dev/sdb1: not a block device". A simple wait and retry here may resolve the issue.
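The "wait and retry" idea suggested above can be sketched as a polling helper. The name `wait_for_block_device` and its parameters are hypothetical, for illustration only; the fix actually adopted upstream uses `udevadm settle` instead.

```python
import os
import time

def wait_for_block_device(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds elapse.
    Hypothetical helper illustrating a simple wait-and-retry;
    returns True if the path appeared, False otherwise."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)
```

A caller would invoke this between partitioning and mkfs, e.g. `wait_for_block_device("/dev/sdb1")`, failing loudly if it returns False.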
util.py[DEBUG]: Running command ['/sbin/sgdisk', '-p', '/dev/sdb'] with allowed return codes [0] (shell=False, capture=True)
cc_disk_
util.py[DEBUG]: Creating partition on /dev/disk/
cc_disk_
cc_disk_
cc_disk_
cc_disk_
cc_disk_
cc_disk_
util.py[DEBUG]: Running command ['/sbin/blkid', '-c', '/dev/null', '/dev/sdb1'] with allowed return codes [0, 2] (shell=False, capture=True)
cc_disk_
cc_disk_
cc_disk_
util.py[DEBUG]: Running command ['/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,
util.py[DEBUG]: Creating fs for /dev/disk/
util.py[WARNING]: Failed during filesystem operation#012Failed during disk check for /dev/sdb1#
Changed in cloud-init: status: New → Confirmed
Changed in cloud-init (Ubuntu): status: New → Confirmed
Changed in cloud-init: importance: Undecided → Medium
Changed in cloud-init (Ubuntu): importance: Undecided → Medium
Changed in cloud-init: status: Confirmed → Fix Committed
Changed in cloud-init (Ubuntu Xenial): status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety): status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial): importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety): importance: Undecided → Medium
description: updated
description: updated
A quick read of the above log does make it look like we need a 'udevadm settle' in there.