Azure: incorrect entry in fstab for ephemeral disk

Bug #1603222 reported by Stephen A. Zarkos
This bug affects 3 people
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  Undecided   Dan Watkins
cloud-init (Ubuntu)  Fix Released  High        Dan Watkins
  Precise            Won't Fix     Medium      Unassigned
  Trusty             Fix Released  High        Dan Watkins

Bug Description

[Impact]
There is a chance that Azure users' ephemeral disks will not be mounted properly if the device names change after a reboot.

[Test Case]

1) Provision an Ubuntu VM on Azure (I tested with 14.04.4)
2) The fstab entry for the ephemeral disk (/mnt) correctly points to /dev/disk/cloud/azure_resource
3) Reboot the VM (sudo reboot)
4) The fstab entry still points to /dev/disk/cloud/azure_resource (and not /dev/sdb)
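
Steps 2 and 4 both come down to reading the device field of the /mnt entry in /etc/fstab and comparing it before and after the reboot. A minimal Python sketch of that check, assuming the usual whitespace-separated fstab layout; the function name `fstab_source_for` and the sample fstab text are illustrative and are not taken from cloud-init or from a real VM:

```python
def fstab_source_for(fstab_text, mount_point):
    """Return the device field of the fstab entry for mount_point, or None."""
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
    return None


# Illustrative fstab content (hypothetical, not copied from a real VM).
sample = """\
# /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/disk/cloud/azure_resource /mnt auto defaults 0 2
"""
source = fstab_source_for(sample, "/mnt")
print(source)
```

On a real VM the same function could be fed the contents of /etc/fstab before and after `sudo reboot`; the test passes if both calls return the same /dev/disk/... path.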

[Regression Potential]

This introduces new udev rules to the cloud-init package. These will only cause entries to appear on Azure. It also makes a minor change to how cloud-init will write to /etc/fstab on Azure.

Both of these changes will have no impact outside of Azure, and are the intended behaviour on Azure.

[Original Bug Report]

During provisioning cloud-init adds an entry for the ephemeral disk in /etc/fstab. After provisioning this entry is correct and points to "/dev/disk/azure/resource-part1". This symlink is created dynamically by 66-azure-storage.rules.

For some reason, after the first reboot cloud-init overwrites the fstab entry, replacing "/dev/disk/azure/resource-part1" with the device name that it points to, i.e. /dev/sdb1. This is incorrect, since /dev/sd* device names are not persistent across reboots.

Repro:

1) Provision an Ubuntu VM on Azure (I tested with 14.04.4)
2) The fstab entry for the ephemeral disk (/mnt) correctly points to "/dev/disk/azure/resource-part1".
3) Reboot the VM (sudo reboot)
4) The fstab entry now incorrectly points to /dev/sdb1 instead of the symlink.
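
The distinction in step 4 is between a kernel-assigned name such as /dev/sdb1, which may change between boots, and a udev-managed symlink under /dev/disk/. A rough Python sketch of that classification; the regular expression and the function name are assumptions made for illustration only:

```python
import re

# Kernel-assigned SCSI disk names such as /dev/sdb or /dev/sdb1.
_KERNEL_SD_NAME = re.compile(r"^/dev/sd[a-z]+\d*$")


def uses_kernel_sd_name(device):
    """Return True if device is a bare /dev/sd* name rather than a udev symlink."""
    return bool(_KERNEL_SD_NAME.match(device))


for dev in ("/dev/disk/azure/resource-part1", "/dev/sdb1"):
    print(dev, uses_kernel_sd_name(dev))
```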

Impact:
There is a chance that the customer's ephemeral disk will not be mounted properly if the device names change after a reboot.

Related bugs:
 * bug 1611074: Reformatting of ephemeral drive fails on resize of Azure VM

Revision history for this message
Stephen A. Zarkos (stevez) wrote :

I also see these messages in cloud-init.log. The first set is for the first provision, and the second is after the first reboot:

[CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
[CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/disk/azure/resource
[CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/disk/azure/resource-part1

[CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/sdb
[CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/sdb1

Changed in cloud-init (Ubuntu):
importance: Undecided → High
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

I think this is a matter of bug 1411582 actually not having been SRU'd correctly.
$ lsb_release -sc
trusty
$ dpkg-query --show cloud-init
cloud-init 0.7.5-0ubuntu1.19
$ dpkg -L cloud-init | grep udev
$ dpkg -L cloud-init | grep udev || echo no files named udev
no files named udev

I also checked that the version in which the fix was marked as SRU'd (0.7.5-0ubuntu1.8) does not contain any udev files. So it seems that the rules simply never got in, rather than having regressed since 0.7.5-0ubuntu1.8.

$ wget https://launchpad.net/ubuntu/+archive/primary/+files/cloud-init_0.7.5-0ubuntu1.8_all.deb
$ dpkg -c cloud-init_0.7.5-0ubuntu1.8_all.deb | grep udev || echo no files named udev
no files named udev

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Released
no longer affects: cloud-init (Ubuntu Yakkety)
no longer affects: cloud-init (Ubuntu Xenial)
Changed in cloud-init (Ubuntu Trusty):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Dan Watkins (oddbloke) wrote :

On trusty systems (at least), walinuxagent ships the udev rules to produce the devices that cloud-init expects to find. So that isn't the source of this bug.

Instead, the problem is that the Azure data source defaults to using /dev/sdb for ephemeral0 ('disk_aliases': {'ephemeral0': '/dev/sdb'} at [0]). Iff we detect a _fabric-formatted_ (i.e. NTFS) ephemeral disk, then the data source updates this default to instead point at that ephemeral disk (which will, correctly, be /dev/disk/azure/...). This happens fine on every first boot, but on subsequent boots, we don't find a fabric-formatted ephemeral disk (because we reformatted it on first boot), so we don't update the default, so we end up rewriting the mounts to point at /dev/sdb.

(I'll give fixing this some thought, and then comment again with suggestions.)

[0] https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceAzure.py#n57
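
The behaviour described in this comment can be modelled in a few lines of Python. This is a simplified sketch of the analysis, not the code in DataSourceAzure.py; the function name `resolve_ephemeral0` and its parameter are assumptions, and only the default alias value is taken from the comment:

```python
# Default alias quoted in the comment: 'disk_aliases': {'ephemeral0': '/dev/sdb'}.
DEFAULT_ALIASES = {"ephemeral0": "/dev/sdb"}


def resolve_ephemeral0(fabric_formatted_disk=None):
    """Return the device that ends up being used for ephemeral0.

    fabric_formatted_disk is the path of a detected fabric-formatted (NTFS)
    ephemeral disk, or None when no such disk is found.
    """
    aliases = dict(DEFAULT_ALIASES)
    # The default is only overridden when a fabric-formatted disk is detected.
    if fabric_formatted_disk is not None:
        aliases["ephemeral0"] = fabric_formatted_disk
    return aliases["ephemeral0"]


# First boot: the disk is still NTFS, so the symlink path is used.
first_boot = resolve_ephemeral0("/dev/disk/azure/resource")
# Later boots: the disk was reformatted, so nothing overrides the default.
later_boot = resolve_ephemeral0(None)
print(first_boot, later_boot)
```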

Revision history for this message
Dan Watkins (oddbloke) wrote :

Adding the following snippet to /etc/cloud/cloud.cfg before rebooting for the first time seems to fix the issue, which supports my analysis.

datasource:
  Azure:
    disk_aliases:
      ephemeral0: /dev/disk/azure/resource

Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, I've given this some thought. I think we can probably just modify the default, as udev rules will ensure that it exists.

As we'll be matching udev rules, we'll need to modify it to /dev/disk/azure/resource in places where we don't ship udev rules with cloud-init (i.e. trusty), and to /dev/disk/cloud/azure_resource in places that we do (e.g. xenial and later).

We _could_ backport the cloud-init udev rules to trusty, but I don't think we need to.
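
The proposal in this comment amounts to choosing the default per release. A sketch in Python, where the function name and the boolean parameter are hypothetical and only the two paths come from the comment:

```python
def default_ephemeral_alias(ships_cloud_init_udev_rules):
    """Pick the default ephemeral0 device according to which udev rules exist."""
    if ships_cloud_init_udev_rules:
        # Releases where cloud-init ships its own udev rules (e.g. xenial and later).
        return "/dev/disk/cloud/azure_resource"
    # Releases relying on the walinuxagent udev rules (e.g. trusty).
    return "/dev/disk/azure/resource"


print(default_ephemeral_alias(True))
print(default_ephemeral_alias(False))
```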

Changed in cloud-init (Ubuntu Wily):
status: New → Won't Fix
Dan Watkins (oddbloke)
Changed in cloud-init (Ubuntu):
status: Fix Released → Confirmed
Changed in cloud-init:
status: New → In Progress
assignee: nobody → Dan Watkins (daniel-thewatkins)
Changed in cloud-init (Ubuntu):
assignee: nobody → Dan Watkins (daniel-thewatkins)
Changed in cloud-init (Ubuntu Trusty):
assignee: nobody → Dan Watkins (daniel-thewatkins)
no longer affects: cloud-init (Ubuntu Wily)
Revision history for this message
Scott Moser (smoser) wrote :

This is fixed in commit
 9e904bbc3336b96475bfd00fb3bf1262ae4de49f
https://git.launchpad.net/cloud-init/commit/?id=9e904bbc3336b96475bfd00fb3bf1262ae4de49f

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Changed in cloud-init (Ubuntu Xenial):
status: New → Fix Committed
importance: Undecided → Medium
Changed in cloud-init:
status: In Progress → Fix Committed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
importance: Undecided → Medium
description: updated
Revision history for this message
Dan Watkins (oddbloke) wrote :

As suggested to me by smoser in #cloud-init, this doesn't actually reproduce on xenial and yakkety; I'll focus on fixing it in trusty.

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Invalid
Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Invalid
Dan Watkins (oddbloke)
Changed in cloud-init (Ubuntu Precise):
status: New → Confirmed
Dan Watkins (oddbloke)
description: updated
Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Stephen, or anyone else affected,

Accepted cloud-init into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.5-0ubuntu1.21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Revision history for this message
Dan Watkins (oddbloke) wrote :

I have built a trusty Azure image using -proposed, and confirmed that this now behaves as expected.

tags: added: verification-done-trusty
removed: verification-needed
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.5-0ubuntu1.21

---------------
cloud-init (0.7.5-0ubuntu1.21) trusty; urgency=medium

  * Microsoft Azure:
    - Install udev rules to create /dev/disk/cloud entries for Azure ephemeral
      disk.
    - debian/patches/lp-1603222-fix-ephemeral-disk-fstab.patch:
      - Use /dev/disk/cloud entries for ephemeral disk (LP: #1603222)

 -- Daniel Watkins <email address hidden> Fri, 25 Nov 2016 10:12:20 +0000

Changed in cloud-init (Ubuntu Trusty):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
no longer affects: cloud-init (Ubuntu Xenial)
no longer affects: cloud-init (Ubuntu Yakkety)
Changed in cloud-init (Ubuntu Precise):
importance: Undecided → Medium
Revision history for this message
Amit (amityo) wrote :

We are also seeing failures to mount the ephemeral disk at /mnt on Yakkety (with both 0.7.8-15 and the 0.7.8-49 prerelease).

This looks like a race condition; we cannot reproduce it every time.

/etc/fstab is generated before the symlink /dev/disk/cloud/azure_resource-part1 is created, and contains:

cat /etc/fstab
/dev/disk/cloud/azure_resource /mnt auto ...

cloud-init logs:
cloud-init[1177]: [CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
cloud-init[1177]: [CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/disk/cloud/azure_resource
cloud-init[1177]: [CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/disk/cloud/azure_resource

Change times of the relevant files:
2016-12-12 17:35:43.031397400 +0000 /etc/fstab
2016-12-12 17:35:43.067397400 +0000 /run/systemd/generator/mnt.mount
2016-12-12 17:35:43.175397400 +0000 /dev/disk/cloud/azure_resource -> ../../sdb
2016-12-12 17:35:43.223397400 +0000 /dev/disk/cloud/azure_resource-part1 -> ../../sdb1

systemctl cat mnt.mount
...
[Mount]
What=/dev/disk/cloud/azure_resource
Where=/mnt
...

Revision history for this message
Amit (amityo) wrote :

After revisiting the problem, 0.7.8-49 seems to work fine. The problem was that I had not de-provisioned the machine before starting it again.

Changed in cloud-init (Ubuntu Precise):
status: Confirmed → Won't Fix
Changed in cloud-init:
status: Fix Committed → Fix Released