Azure: incorrect entry in fstab for ephemeral disk

Bug #1603222 reported by Stephen A. Zarkos on 2016-07-14
This bug affects 3 people

Affects                 Importance   Assigned to
cloud-init              Undecided    Dan Watkins
cloud-init (Ubuntu)     High         Dan Watkins
  Precise               Medium       Unassigned
  Trusty                High         Dan Watkins

Bug Description

[Impact]
There is a chance that Azure users' ephemeral disks will not be mounted properly if the device names change after a reboot.

[Test Case]

1) Provision an Ubuntu VM on Azure (I tested with 14.04.4)
2) The fstab entry for the ephemeral disk (/mnt) correctly points to /dev/disk/cloud/azure_resource
3) Reboot the VM (sudo reboot)
4) The fstab entry still points to /dev/disk/cloud/azure_resource (and not /dev/sdb)
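Step 4 can be checked from a shell. The following is a minimal POSIX sh sketch, assuming a simple whitespace-separated /etc/fstab without escaped spaces; the function names are hypothetical helpers, not part of cloud-init or any Ubuntu package.

```shell
#!/bin/sh
# Print the device field of the first non-comment fstab entry whose
# mount point matches $2. Assumes no escaped spaces in fstab fields.
fstab_device_for() {
    fstab_file=$1
    mount_point=$2
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $1; exit }' "$fstab_file"
}

# Succeed only if the /mnt entry uses a persistent /dev/disk/... path
# rather than a kernel name such as /dev/sdb1.
check_ephemeral_entry() {
    dev=$(fstab_device_for "$1" /mnt)
    case $dev in
        /dev/disk/*) echo "ok: $dev"; return 0 ;;
        "") echo "no /mnt entry found" >&2; return 1 ;;
        *) echo "not persistent: $dev" >&2; return 1 ;;
    esac
}
```

Run `check_ephemeral_entry /etc/fstab` once before and once after the reboot; on an affected image, as in the original report, the second run complains about /dev/sdb1.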

[Regression Potential]

This introduces new udev rules to the cloud-init package. These rules only create entries on Azure. It also makes a minor change to how cloud-init writes to /etc/fstab on Azure.

Both of these changes will have no impact outside of Azure, and are the intended behaviour on Azure.

[Original Bug Report]

During provisioning, cloud-init adds an entry for the ephemeral disk to /etc/fstab. After provisioning, this entry is correct and points to "/dev/disk/azure/resource-part1". This symlink is created dynamically by 66-azure-storage.rules.

For some reason, after the first reboot cloud-init overwrites the fstab entry and replaces "/dev/disk/azure/resource-part1" with the device name it points to, i.e. /dev/sdb1. This is incorrect, since /dev/sd* device names are not persistent.

Repro:

1) Provision an Ubuntu VM on Azure (I tested with 14.04.4)
2) The fstab entry for the ephemeral disk (/mnt) correctly points to "/dev/disk/azure/resource-part1".
3) Reboot the VM (sudo reboot)
4) The fstab entry now incorrectly points to /dev/sdb1 instead of the symlink.

Impact:
There is a chance that the customer's ephemeral disk will not be mounted properly if the device names change after a reboot.

Related bugs:
 * bug 1611074: Reformatting of ephemeral drive fails on resize of Azure VM

Stephen A. Zarkos (stevez) wrote :

I also see these messages in cloud-init.log. The first set is for the first provision, and the second is after the first reboot:

[CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
[CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/disk/azure/resource
[CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/disk/azure/resource-part1

[CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/sdb
[CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/sdb1
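To compare these messages on another VM, the most recent mapping can be pulled out of the log. A minimal sketch, assuming the log is at /var/log/cloud-init.log (its usual location on Ubuntu) and that the message wording matches the lines above; the function name is hypothetical.

```shell
#!/bin/sh
# Print the device that cloud-init most recently mapped ephemeral0 to,
# based on the cc_mounts "changed default device" debug messages.
last_ephemeral_mapping() {
    log_file=${1:-/var/log/cloud-init.log}
    grep 'changed default device ephemeral0 =>' "$log_file" |
        tail -n 1 |
        sed 's/.*=> //'
}
```

With the second set of messages above as the last entries in the log, this prints /dev/sdb1.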

Changed in cloud-init (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Scott Moser (smoser) wrote :

I think this is a matter of bug 1411582 actually not having been SRU'd correctly.
$ lsb_release -sc
trusty
$ dpkg-query --show cloud-init
cloud-init 0.7.5-0ubuntu1.19
$ dpkg -L cloud-init | grep udev
$ dpkg -L cloud-init | grep udev || echo no files named udev
no files named udev

I also checked the version in which this was marked as SRU'd (0.7.5-0ubuntu1.8), and it does not contain any udev files either. So it seems the fix never got in, rather than having regressed since 0.7.5-0ubuntu1.8.

$ wget https://launchpad.net/ubuntu/+archive/primary/+files/cloud-init_0.7.5-0ubuntu1.8_all.deb
$ dpkg -c cloud-init_0.7.5-0ubuntu1.8_all.deb | grep udev || echo no files named udev
no files named udev

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Released
no longer affects: cloud-init (Ubuntu Yakkety)
no longer affects: cloud-init (Ubuntu Xenial)
Changed in cloud-init (Ubuntu Trusty):
importance: Undecided → High
status: New → Confirmed
Dan Watkins (daniel-thewatkins) wrote :

On trusty systems (at least), walinuxagent ships the udev rules to produce the devices that cloud-init expects to find. So that isn't the source of this bug.

Instead, the problem is that the Azure data source defaults to using /dev/sdb for ephemeral0 ('disk_aliases': {'ephemeral0': '/dev/sdb'} at [0]). Only if we detect a _fabric-formatted_ (i.e. NTFS) ephemeral disk does the data source update this default to point at that ephemeral disk (which will, correctly, be /dev/disk/azure/...). This works on every first boot. On subsequent boots, however, we don't find a fabric-formatted ephemeral disk (because we reformatted it on first boot), so we don't update the default, and we end up rewriting the mounts to point at /dev/sdb.

(I'll give fixing this some thought, and then comment again with suggestions.)

[0] https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceAzure.py#n57
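The analysis above can be illustrated with a simplified model. This is not cloud-init's code: it assumes the NTFS check can be reduced to the filesystem type reported for the resource disk's first partition (for example by `blkid -s TYPE -o value /dev/disk/azure/resource-part1`), and the function name is hypothetical.

```shell
#!/bin/sh
# Simplified model of how ephemeral0 was resolved on trusty before the fix.
# $1: filesystem type of the resource disk's first partition.
ephemeral0_alias() {
    # Built-in default from the Azure data source: {'ephemeral0': '/dev/sdb'}
    dev_alias=/dev/sdb
    # Only a fabric-formatted (NTFS) disk triggers the switch to the
    # persistent udev path; after cloud-init reformats it, this never fires.
    if [ "$1" = ntfs ]; then
        dev_alias=/dev/disk/azure/resource
    fi
    echo "$dev_alias"
}

# First boot: the fabric still provides an NTFS-formatted disk.
ephemeral0_alias ntfs   # prints /dev/disk/azure/resource
# Later boots: the disk has been reformatted (e.g. to ext4).
ephemeral0_alias ext4   # prints /dev/sdb
```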

Dan Watkins (daniel-thewatkins) wrote :

Adding the following snippet to /etc/cloud/cloud.cfg before rebooting for the first time seems to fix the issue, which supports my analysis.

datasource:
  Azure:
    disk_aliases:
      ephemeral0: /dev/disk/azure/resource
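The same workaround could also be kept in a separate file instead of editing cloud.cfg directly. A sketch, assuming cloud-init merges *.cfg files from /etc/cloud/cloud.cfg.d/ into its configuration; the file name is a hypothetical choice, and only the cloud.cfg variant was tested above.

```shell
#!/bin/sh
# Write the disk_aliases workaround as a cloud-init drop-in file.
# $1: target directory (defaults to /etc/cloud/cloud.cfg.d).
write_alias_dropin() {
    cfg_dir=${1:-/etc/cloud/cloud.cfg.d}
    cat > "$cfg_dir/90-azure-ephemeral0.cfg" <<'EOF'
datasource:
  Azure:
    disk_aliases:
      ephemeral0: /dev/disk/azure/resource
EOF
}
```

Run `write_alias_dropin` as root before the first reboot, at the same point where the snippet above was added to cloud.cfg.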

Dan Watkins (daniel-thewatkins) wrote :

OK, I've given this some thought. I think we can probably just modify the default, as udev rules will ensure that it exists.

Because the default has to match the paths the udev rules create, we'll need to change it to /dev/disk/azure/resource where cloud-init doesn't ship udev rules (i.e. trusty), and to /dev/disk/cloud/azure_resource where it does (e.g. xenial and later).

We _could_ backport the cloud-init udev rules to trusty, but I don't think we need to.
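The per-release default proposed above amounts to a lookup on the release codename. A sketch with a hypothetical function name, listing only the releases named in this thread:

```shell
#!/bin/sh
# Map an Ubuntu codename to the proposed ephemeral0 default.
ephemeral0_default_for() {
    case $1 in
        trusty)
            # No cloud-init udev rules; walinuxagent's rules provide this path.
            echo /dev/disk/azure/resource ;;
        xenial|yakkety)
            # cloud-init ships its own udev rules creating /dev/disk/cloud/...
            echo /dev/disk/cloud/azure_resource ;;
        *)
            echo "unknown release: $1" >&2
            return 1 ;;
    esac
}
```

On a VM, `ephemeral0_default_for "$(lsb_release -sc)"` prints the proposed default. The trusty upload that eventually fixed this (0.7.5-0ubuntu1.21, see the changelog below) did install udev rules and used /dev/disk/cloud entries, so this mapping reflects the proposal at this point in the thread rather than what shipped.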

Changed in cloud-init (Ubuntu Wily):
status: New → Won't Fix
Changed in cloud-init (Ubuntu):
status: Fix Released → Confirmed
Changed in cloud-init:
status: New → In Progress
assignee: nobody → Dan Watkins (daniel-thewatkins)
Changed in cloud-init (Ubuntu):
assignee: nobody → Dan Watkins (daniel-thewatkins)
Changed in cloud-init (Ubuntu Trusty):
assignee: nobody → Dan Watkins (daniel-thewatkins)
no longer affects: cloud-init (Ubuntu Wily)
Scott Moser (smoser) wrote :

This is fixed in commit
 9e904bbc3336b96475bfd00fb3bf1262ae4de49f
https://git.launchpad.net/cloud-init/commit/?id=9e904bbc3336b96475bfd00fb3bf1262ae4de49f

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Changed in cloud-init (Ubuntu Xenial):
status: New → Fix Committed
importance: Undecided → Medium
Changed in cloud-init:
status: In Progress → Fix Committed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
importance: Undecided → Medium
description: updated
Dan Watkins (daniel-thewatkins) wrote :

As smoser pointed out to me in #cloud-init, this doesn't actually reproduce on xenial or yakkety, so I'll focus on fixing it in trusty.

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Invalid
Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Invalid
Changed in cloud-init (Ubuntu Precise):
status: New → Confirmed
description: updated

Hello Stephen, or anyone else affected,

Accepted cloud-init into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.5-0ubuntu1.21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed

I have built a trusty Azure image using -proposed, and confirmed that this now behaves as expected.

tags: added: verification-done-trusty
removed: verification-needed

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.5-0ubuntu1.21

---------------
cloud-init (0.7.5-0ubuntu1.21) trusty; urgency=medium

  * Microsoft Azure:
    - Install udev rules to create /dev/disk/cloud entries for Azure ephemeral
      disk.
    - debian/patches/lp-1603222-fix-ephemeral-disk-fstab.patch:
      - Use /dev/disk/cloud entries for ephemeral disk (LP: #1603222)

 -- Daniel Watkins <email address hidden> Fri, 25 Nov 2016 10:12:20 +0000

Changed in cloud-init (Ubuntu Trusty):
status: Fix Committed → Fix Released
no longer affects: cloud-init (Ubuntu Xenial)
no longer affects: cloud-init (Ubuntu Yakkety)
Changed in cloud-init (Ubuntu Precise):
importance: Undecided → Medium
Amit (amityo) wrote :

We also see failures to mount the ephemeral disk on /mnt in Yakkety (with both 0.7.8-15 and the 0.7.8-49 prerelease).

It seems like a race condition; I can't reproduce it 100% of the time.

/etc/fstab is generated before the symlink /dev/disk/cloud/azure_resource-part1 is created, and contains:

cat /etc/fstab
/dev/disk/cloud/azure_resource /mnt auto ...

cloud-init logs -
cloud-init[1177]: [CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
cloud-init[1177]: [CLOUDINIT] cc_mounts.py[DEBUG]: Mapped metadata name ephemeral0 to /dev/disk/cloud/azure_resource
cloud-init[1177]: [CLOUDINIT] cc_mounts.py[DEBUG]: changed default device ephemeral0 => /dev/disk/cloud/azure_resource

change time of relevant files -
2016-12-12 17:35:43.031397400 +0000 /etc/fstab
2016-12-12 17:35:43.067397400 +0000 /run/systemd/generator/mnt.mount
2016-12-12 17:35:43.175397400 +0000 /dev/disk/cloud/azure_resource -> ../../sdb
2016-12-12 17:35:43.223397400 +0000 /dev/disk/cloud/azure_resource-part1 -> ../../sdb1

systemctl cat mnt.mount
...
[Mount]
What=/dev/disk/cloud/azure_resource
Where=/mnt
...

Amit (amityo) wrote :

After revisiting the problem, 0.7.8-49 seems to work fine. The problem was that I started the machine again without de-provisioning it first.

Changed in cloud-init (Ubuntu Precise):
status: Confirmed → Won't Fix
Changed in cloud-init:
status: Fix Committed → Fix Released