Cloud init is re-executing fs and disk setup during reboot

Bug #1692093 reported by Anubhuti Manohar on 2017-05-19
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  Medium      Paul Meyer
cloud-init (Ubuntu)  Fix Released  Medium      Unassigned
  Xenial             Fix Released  Medium      Unassigned
  Yakkety            Fix Released  Medium      Unassigned
  Zesty              Fix Released  Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
VMs on MS Azure have an ephemeral disk attached to them.
On first boot, cloud-init properly notices the empty ntfs filesystem and
reformats it as ext4.

After deallocating the instance or moving to a new Azure host,
the filesystem reformat is logged, but isn't actually performed because
the udev device creation may not have settled.

[Test Case]
Test cases:
 1. Deploy an instance VM on Azure
 2. Log in and ensure that the ephemeral disk is formatted and mounted to /mnt
 3. Via the portal you can "Redeploy" the VM to a new Azure Host (or alternatively stop and deallocate the VM for some time, and then restart/reallocate the VM).

Expected Results:
 - Check cloud-init.log expecting to see logs from cc_disk_setup about the mount.
 - After reallocation we expect the ephemeral disk to be formatted and mounted to /mnt.

Actual Results:
 - After reallocation /mnt is not mounted and there are errors in the cloud-init log.

[Regression Potential]
Regression potential should be extremely low on this change.
Essentially the change was to add the function 'assert_and_settle_device' and
then to use it. See the commit linked below.

The most likely regression is just a slower boot. The most likely unexpected change in behavior is getting a RuntimeError due to 'assert_and_settle_device' realizing that there is no device present. Previously, that situation would ultimately have resulted in a different error with a less obvious error message.
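
A minimal sketch of what a helper like 'assert_and_settle_device' could look like. This is an illustration under stated assumptions, not the upstream code (see the commit linked below for that). The 'settle' and 'exists' parameters are hypothetical additions so the logic can be exercised without a real block device.

```python
import os
import subprocess


def udevadm_settle():
    """Block until the udev event queue has been processed."""
    subprocess.check_call(["udevadm", "settle"])


def assert_and_settle_device(device, settle=udevadm_settle,
                             exists=os.path.exists):
    """Ensure `device` exists and udev has finished processing it.

    `settle` and `exists` are injectable (hypothetical parameters, not
    part of the report) so the function can be tested without root.
    """
    if not exists(device):
        # The node may simply not have been created yet; wait for udev once.
        settle()
        if not exists(device):
            raise RuntimeError(
                "Device %s did not exist and was not created by a "
                "udevadm settle." % device)
    # Even if the node exists, udev may still be filling in metadata
    # (filesystem type, label), so settle before anyone reads it.
    settle()
```

The cost of this pattern is the extra time spent waiting in 'udevadm settle', which matches the "slower boot" regression noted above.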

[Other Info]
Upstream commit at
  https://git.launchpad.net/cloud-init/commit/?id=1815c6d801933c47a01f1a94a8e689824f6797b4

=== End SRU Template ===

Cloud Provider: Azure
dpkg-query -W -f='${Version}' cloud-init output: 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1

When the following is specified in the cloud-init config, fs_setup and disk_setup appear to be re-executed on reboot (although the run command does not appear to re-run):
disk_setup:
  /dev/sdc:
      table_type: gpt
      layout: true
      overwrite: false

fs_setup:
- label: etcd_disk
  filesystem: ext4
  device: /dev/sdc1
  extra_opts:
    - "-F"
    - "-E"
    - "lazy_itable_init=1,lazy_journal_init=1"

mounts:
- - /dev/sdc1
  - /var/lib/etcddisk

From cloud-init-output.log:

Cloud-init v. 0.7.9 running 'modules:final' at Mon, 15 May 2017 18:33:15 +0000. Up 64.24 seconds.
Cloud-init v. 0.7.9 finished at Mon, 15 May 2017 18:34:46 +0000. Datasource DataSourceAzureNet [seed=/dev/sr0]. Up 155.34 seconds
Cloud-init v. 0.7.9 running 'init-local' at Tue, 16 May 2017 01:52:37 +0000. Up 10.33 seconds.
Cloud-init v. 0.7.9 running 'init' at Tue, 16 May 2017 01:52:39 +0000. Up 12.06 seconds.

From cloud-init.log for the initial provision:

2017-05-15 18:32:46,820 - cc_disk_setup.py[DEBUG]: Creating file system etcd_disk on /dev/sdc1
2017-05-15 18:32:46,820 - cc_disk_setup.py[DEBUG]: Using cmd: /sbin/mkfs.ext4 /dev/sdc1 -L etcd_disk -F -E lazy_itable_init=1,lazy_journal_init=1
2017-05-15 18:32:46,820 - util.py[DEBUG]: Running command ['/sbin/mkfs.ext4', '/dev/sdc1', '-L', 'etcd_disk', '-F', '-E', 'lazy_itable_init=1,lazy_journal_init=1'] with allowed return codes [0] (shell=False, capture=True)
2017-05-15 18:33:04,054 - util.py[DEBUG]: Creating fs for /dev/sdc1 took 17.237 seconds

and after reboot (cloud-init.log)

2017-05-16 01:52:40,245 - cc_disk_setup.py[DEBUG]: Creating file system etcd_disk on /dev/sdc1
2017-05-16 01:52:40,246 - cc_disk_setup.py[DEBUG]: Using cmd: /sbin/mkfs.ext4 /dev/sdc1 -L etcd_disk -F -E lazy_itable_init=1,lazy_journal_init=1
2017-05-16 01:52:40,246 - util.py[DEBUG]: Running command ['/sbin/mkfs.ext4', '/dev/sdc1', '-L', 'etcd_disk', '-F', '-E', 'lazy_itable_init=1,lazy_journal_init=1'] with allowed return codes [0] (shell=False, capture=True)
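
The repeated mkfs above can be spotted mechanically. The following is a rough sketch, assuming the log line format shown in these excerpts; the function name 'find_repeated_mkfs' and the regular expression are illustrative and not part of cloud-init.

```python
import re
from collections import defaultdict

# Matches the cc_disk_setup "Creating file system" debug line as it appears
# in the excerpts above (format assumed from those excerpts only).
CREATE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ - "
    r"cc_disk_setup\.py\[DEBUG\]: "
    r"Creating file system (?P<label>\S+) on (?P<device>\S+)"
)


def find_repeated_mkfs(log_lines):
    """Return {device: [timestamps]} for devices formatted more than once."""
    seen = defaultdict(list)
    for line in log_lines:
        match = CREATE_RE.match(line)
        if match:
            seen[match.group("device")].append(match.group("ts"))
    return {dev: stamps for dev, stamps in seen.items() if len(stamps) > 1}
```

Feeding it the lines of /var/log/cloud-init.log (for example via 'open(path).read().splitlines()') would list every device that was formatted on more than one boot.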

Anubhuti Manohar (amanohar) wrote :
Paul Meyer (paul-meyer) wrote :

Repro script with fix: https://gist.github.com/paulmey/1a4f35a687d7559dca612a0eda8d5793

This is apparently a race condition that leaves lsblk without FSTYPE and LABEL. Cloud-init is then unable to verify that the partition is already formatted correctly and can be reused. This can be fixed by doing a 'udevadm settle' before doing 'lsblk' in enumerate_disk in cc_disk_setup.py.
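
To illustrate why missing FSTYPE and LABEL lead to a reformat, here is a small sketch of the reuse check. It assumes output in the 'lsblk --pairs' key="value" format; the two sample lines and the helper names 'parse_lsblk_pairs' and 'can_reuse' are made up for illustration and are not taken from cc_disk_setup.py.

```python
import shlex


def parse_lsblk_pairs(line):
    """Parse one line of `lsblk --pairs` output into a lowercase-key dict."""
    info = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        info[key.lower()] = value
    return info


def can_reuse(info, want_fstype, want_label):
    """True only if the partition already has the requested fs and label."""
    return (info.get("fstype") == want_fstype
            and info.get("label") == want_label)


# Hypothetical sample lines: after udev has settled, and during the race.
settled = 'NAME="sdc1" TYPE="part" FSTYPE="ext4" LABEL="etcd_disk"'
racy = 'NAME="sdc1" TYPE="part" FSTYPE="" LABEL=""'

print(can_reuse(parse_lsblk_pairs(settled), "ext4", "etcd_disk"))
print(can_reuse(parse_lsblk_pairs(racy), "ext4", "etcd_disk"))
```

With the racy line the check fails even though the partition is in fact formatted, so the caller would go on to run mkfs again. Settling udev first is meant to make the first form the one that lsblk reports.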

Changed in cloud-init:
status: New → Confirmed
assignee: nobody → Paul Meyer (paul-meyer)
Scott Moser (smoser) wrote :

The config:
fs_setup:
- label: etcd_disk
  filesystem: ext4
  device: /dev/sdc1
  extra_opts:
    - "-F"
    - "-E"
    - "lazy_itable_init=1,lazy_journal_init=1"

is kind of weird.

Looking at the code I think if you specifically said:
 device: /dev/sdc
 partition: 1

Then you'd get what you wanted.
The code is definitely buggy in the path you've exposed, but I'd be interested in knowing whether changing the device specification to the above makes it behave differently.

Scott Moser (smoser) on 2017-05-25
Changed in cloud-init:
importance: Undecided → Medium
Scott Moser (smoser) on 2017-05-26
Changed in cloud-init (Ubuntu):
status: New → Fix Released
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Zesty):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Zesty):
importance: Undecided → Medium
Scott Moser (smoser) on 2017-06-02
Changed in cloud-init:
status: Confirmed → Fix Committed
Chad Smith (chad.smith) on 2017-06-02
description: updated
Scott Moser (smoser) on 2017-06-02
description: updated
Scott Moser (smoser) on 2017-06-13
description: updated

Hello Anubhuti, or anyone else affected,

Accepted cloud-init into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~17.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Zesty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Brian Murray (brian-murray) wrote :

Hello Anubhuti, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Anubhuti, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
Joshua Powers (powersj) wrote :

@paul-meyer thank you for the repro script!

Tested the following using the script in #1:

* xenial (0.7.9-153-g16a7302f-0ubuntu1~16.04.1)
* yakkety (0.7.9-153-g16a7302f-0ubuntu1~16.10.1)
* zesty (0.7.9-153-g16a7302f-0ubuntu1~17.04.1)

First reproduced the bug and verified that the files were not persisting across reboots. Then upgraded to the version in proposed and re-ran the script on the upgraded VM to confirm that the files written to disk (e.g. hello, retry) persisted after each reboot.

Marking verification done.

tags: added: verification-done-xenial verification-done-yakkety verification-done-zesty
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~17.04.1

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~17.04.1) zesty-proposed; urgency=medium

  * New upstream snapshot.
    - net: fix reading and rendering addresses in cidr format.
      [Dimitri John Ledkov] (LP: #1689346, #1684349)
    - disk_setup: udev settle before attempting partitioning or fs creation.
      (LP: #1692093)
    - GCE: Update the attribute used to find instance SSH keys.
      [Daniel Watkins] (LP: #1693582)
    - nplan: For bonds, allow dashed or underscore names of keys.
      [Dimitri John Ledkov] (LP: #1690480)
    - tests: python2.6: fix unit tests usage of assertNone and format.
    - tests: update docstring on test_configured_list_with_none
    - fix tools/ds-identify to not write None twice.
    - tox/build: do not package depend on style requirements.
    - tests: ntp: Restructure cc_ntp unit tests. [Chad Smith]
    - flake8: move the pinned version of flake8 up to 3.3.0
    - tests: Apply workaround for snapd bug in test case. [Joshua Powers]
    - RHEL/CentOS: Fix dual stack IPv4/IPv6 configuration. [Andreas Karis]
    - disk_setup: fix several issues with gpt disk partitions. (LP: #1692087)
    - function spelling & docstring update [Joshua Powers]
    - tests: Fix unittest bug in ntp tests. [Joshua Powers]
    - tox: move pylint target to 1.7.1
    - Fix get_interfaces_by_mac for empty macs (LP: #1692028)
    - DigitalOcean: remove routes except for the public interface.
      [Ben Howard] (LP: #1681531.)
    - netplan: pass macaddress, when specified, for vlans
      [Dimitri John Ledkov] (LP: #1690388)
    - doc: various improvements for the docs on cc_users_groups.
      [Felix Dreissig]
    - cc_ntp: write template before installing and add service restart
      [Ryan Harper] (LP: #1645644)
    - tests: fix cloudstack unit tests to avoid accessing
      /var/lib/NetworkManager [Lars Kellogg-Stedman]
    - tests: fix hardcoded path to mkfs.ext4 [Joshua Powers] (LP: #1691517)
    - Actually skip warnings when .skip file is present.
      [Chris Brinker] (LP: #1691551)
    - netplan: fix netplan render_network_state signature.
      [Dimitri John Ledkov] (LP: #1685944)
    - Azure: fix reformatting of ephemeral disks on resize to large types.
      (LP: #1686514)
    - make deb: Add devscripts dependency for make deb.
      Cleanup packages/bddeb. [Chad Smith] (LP: #1685935)
    - openstack: fix log message copy/paste typo in _get_url_settings
      [Lars Kellogg-Stedman]
    - unittests: fix unittests run on centos [Joshua Powers]
    - Improve detection of snappy to include os-release and kernel cmdline.
      (LP: #1689944)
    - Add address to config entry generated by _klibc_to_config_entry.
      [Julien Castets] (LP: #1691135)
    - sysconfig: Raise ValueError when multiple default gateways are present.
      [Chad Smith] (LP: #1687485)
    - FreeBSD: improvements and fixes for use on Azure
      [Hongjiang Zhang] (LP: #1636345)
    - Add unit tests for ds-identify, fix Ec2 bug found.
    - fs_setup: if cmd is specified, use shell interpretation.
      [Paul Meyer] (LP: #1687712)
    - doc: document network c...

Changed in cloud-init (Ubuntu Zesty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~16.10.1

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~16.10.1) yakkety-proposed; urgency=medium

  * New upstream snapshot.
    - net: fix reading and rendering addresses in cidr format.
      [Dimitri John Ledkov] (LP: #1689346, #1684349)
    - disk_setup: udev settle before attempting partitioning or fs creation.
      (LP: #1692093)
    - GCE: Update the attribute used to find instance SSH keys.
      [Daniel Watkins] (LP: #1693582)
    - nplan: For bonds, allow dashed or underscore names of keys.
      [Dimitri John Ledkov] (LP: #1690480)
    - tests: python2.6: fix unit tests usage of assertNone and format.
    - tests: update docstring on test_configured_list_with_none
    - fix tools/ds-identify to not write None twice.
    - tox/build: do not package depend on style requirements.
    - tests: ntp: Restructure cc_ntp unit tests. [Chad Smith]
    - flake8: move the pinned version of flake8 up to 3.3.0
    - tests: Apply workaround for snapd bug in test case. [Joshua Powers]
    - RHEL/CentOS: Fix dual stack IPv4/IPv6 configuration. [Andreas Karis]
    - disk_setup: fix several issues with gpt disk partitions. (LP: #1692087)
    - function spelling & docstring update [Joshua Powers]
    - tests: Fix unittest bug in ntp tests. [Joshua Powers]
    - tox: move pylint target to 1.7.1
    - Fix get_interfaces_by_mac for empty macs (LP: #1692028)
    - DigitalOcean: remove routes except for the public interface.
      [Ben Howard] (LP: #1681531.)
    - netplan: pass macaddress, when specified, for vlans
      [Dimitri John Ledkov] (LP: #1690388)
    - doc: various improvements for the docs on cc_users_groups.
      [Felix Dreissig]
    - cc_ntp: write template before installing and add service restart
      [Ryan Harper] (LP: #1645644)
    - tests: fix cloudstack unit tests to avoid accessing
      /var/lib/NetworkManager [Lars Kellogg-Stedman]
    - tests: fix hardcoded path to mkfs.ext4 [Joshua Powers] (LP: #1691517)
    - Actually skip warnings when .skip file is present.
      [Chris Brinker] (LP: #1691551)
    - netplan: fix netplan render_network_state signature.
      [Dimitri John Ledkov] (LP: #1685944)
    - Azure: fix reformatting of ephemeral disks on resize to large types.
      (LP: #1686514)
    - make deb: Add devscripts dependency for make deb.
      Cleanup packages/bddeb. [Chad Smith] (LP: #1685935)
    - openstack: fix log message copy/paste typo in _get_url_settings
      [Lars Kellogg-Stedman]
    - unittests: fix unittests run on centos [Joshua Powers]
    - Improve detection of snappy to include os-release and kernel cmdline.
      (LP: #1689944)
    - Add address to config entry generated by _klibc_to_config_entry.
      [Julien Castets] (LP: #1691135)
    - sysconfig: Raise ValueError when multiple default gateways are present.
      [Chad Smith] (LP: #1687485)
    - FreeBSD: improvements and fixes for use on Azure
      [Hongjiang Zhang] (LP: #1636345)
    - Add unit tests for ds-identify, fix Ec2 bug found.
    - fs_setup: if cmd is specified, use shell interpretation.
      [Paul Meyer] (LP: #1687712)
    - doc: document network...

Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~16.04.1

---------------
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * New upstream snapshot.
    - net: fix reading and rendering addresses in cidr format.
      [Dimitri John Ledkov] (LP: #1689346, #1684349)
    - disk_setup: udev settle before attempting partitioning or fs creation.
      (LP: #1692093)
    - GCE: Update the attribute used to find instance SSH keys.
      [Daniel Watkins] (LP: #1693582)
    - nplan: For bonds, allow dashed or underscore names of keys.
      [Dimitri John Ledkov] (LP: #1690480)
    - tests: python2.6: fix unit tests usage of assertNone and format.
    - tests: update docstring on test_configured_list_with_none
    - fix tools/ds-identify to not write None twice.
    - tox/build: do not package depend on style requirements.
    - tests: ntp: Restructure cc_ntp unit tests. [Chad Smith]
    - flake8: move the pinned version of flake8 up to 3.3.0
    - tests: Apply workaround for snapd bug in test case. [Joshua Powers]
    - RHEL/CentOS: Fix dual stack IPv4/IPv6 configuration. [Andreas Karis]
    - disk_setup: fix several issues with gpt disk partitions. (LP: #1692087)
    - function spelling & docstring update [Joshua Powers]
    - tests: Fix unittest bug in ntp tests. [Joshua Powers]
    - tox: move pylint target to 1.7.1
    - Fix get_interfaces_by_mac for empty macs (LP: #1692028)
    - DigitalOcean: remove routes except for the public interface.
      [Ben Howard] (LP: #1681531.)
    - netplan: pass macaddress, when specified, for vlans
      [Dimitri John Ledkov] (LP: #1690388)
    - doc: various improvements for the docs on cc_users_groups.
      [Felix Dreissig]
    - cc_ntp: write template before installing and add service restart
      [Ryan Harper] (LP: #1645644)
    - tests: fix cloudstack unit tests to avoid accessing
      /var/lib/NetworkManager [Lars Kellogg-Stedman]
    - tests: fix hardcoded path to mkfs.ext4 [Joshua Powers] (LP: #1691517)
    - Actually skip warnings when .skip file is present.
      [Chris Brinker] (LP: #1691551)
    - netplan: fix netplan render_network_state signature.
      [Dimitri John Ledkov] (LP: #1685944)
    - Azure: fix reformatting of ephemeral disks on resize to large types.
      (LP: #1686514)
    - make deb: Add devscripts dependency for make deb.
      Cleanup packages/bddeb. [Chad Smith] (LP: #1685935)
    - openstack: fix log message copy/paste typo in _get_url_settings
      [Lars Kellogg-Stedman]
    - unittests: fix unittests run on centos [Joshua Powers]
    - Improve detection of snappy to include os-release and kernel cmdline.
      (LP: #1689944)
    - Add address to config entry generated by _klibc_to_config_entry.
      [Julien Castets] (LP: #1691135)
    - sysconfig: Raise ValueError when multiple default gateways are present.
      [Chad Smith] (LP: #1687485)
    - FreeBSD: improvements and fixes for use on Azure
      [Hongjiang Zhang] (LP: #1636345)
    - Add unit tests for ds-identify, fix Ec2 bug found.
    - fs_setup: if cmd is specified, use shell interpretation.
      [Paul Meyer] (LP: #1687712)
    - doc: document network ...

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released

This bug is believed to be fixed in cloud-init 17.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released