17.1 update breaks EC2 nodes

Bug #1732917 reported by James Ravn on 2017-11-17
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
cloud-init            Fix Released  High        Unassigned
cloud-init (Ubuntu)   Fix Released  Medium      Unassigned

Bug Description

We updated from 0.7.9 to 17.1 on Ubuntu 16.04. After that, cloud-init fails to start.

Update:
Start-Date: 2017-11-15 06:03:19
Commandline: /usr/bin/unattended-upgrade
Upgrade: cloud-init:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, 17.1-27-geb292c18-0ubuntu1~16.04.1), ubuntu-standard:amd64 (1.361, 1.361.1), ubuntu-server:amd64 (1.361, 1.361.1), grub-legacy-ec2:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, 17.1-27-geb292c18-0ubuntu1~16.04.1), ubuntu-minimal:amd64 (1.361, 1.361.1)
End-Date: 2017-11-15 06:03:23
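To confirm from apt's history log (normally /var/log/apt/history.log) exactly which packages an unattended upgrade changed, the `Upgrade:` line can be split into per-package old and new versions. A minimal sketch, assuming the `name:arch (old, new)` entry format shown above; `parse_upgrade_line` is a hypothetical helper, not part of apt or cloud-init:

```python
import re

# Matches one entry such as "cloud-init:amd64 (OLD_VERSION, NEW_VERSION)".
# Assumes the "name:arch (old, new)" shape seen in the Upgrade: line above.
_ENTRY = re.compile(r"([^\s:,]+):(\w+) \(([^,()]+), ([^()]+)\)")


def parse_upgrade_line(line):
    """Map package name -> (old_version, new_version) for an 'Upgrade:' line."""
    if line.startswith("Upgrade:"):
        line = line[len("Upgrade:"):]
    return {pkg: (old, new) for pkg, _arch, old, new in _ENTRY.findall(line)}


if __name__ == "__main__":
    sample = ("Upgrade: cloud-init:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, "
              "17.1-27-geb292c18-0ubuntu1~16.04.1), ubuntu-standard:amd64 (1.361, 1.361.1)")
    for pkg, (old, new) in parse_upgrade_line(sample).items():
        print(pkg, old, "->", new)
```

Looking up `parse_upgrade_line(line)["cloud-init"]` then shows at a glance whether cloud-init itself moved between major versions in that run.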

Failure:
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: 2017-11-17 13:53:12,947 - util.py[WARNING]: failed stage init
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: failed run of stage init
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: ------------------------------------------------------------
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: Traceback (most recent call last):
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 638, in status_wrapper
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: ret = functor(name, args)
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 357, in main_init
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 635, in apply_network_config
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: netcfg, src = self._find_networking_config()
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 622, in _find_networking_config
Nov 17 13:53:12 ip-10-50-198-224 systemd[1]: cloud-init.service: Unit entered failed state.
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: if self.datasource and hasattr(self.datasource, 'network_config'):
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceEc2.py", line 307, in network_config
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: net.get_interface_mac(self.fallback_nic): self.fallback_nic}
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 506, in get_interface_mac
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: if os.path.isdir(sys_dev_path(ifname, "bonding_slave")):
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 38, in sys_dev_path
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: return get_sys_class_path() + devname + "/" + path
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: TypeError: Can't convert 'NoneType' object to str implicitly
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: ------------------------------------------------------------
Nov 17 13:53:12 ip-10-50-198-224 systemd[1]: cloud-init.service: Failed with result 'exit-code'.

This is pretty serious. Is it normal to do a major version update on an LTS release?
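The last frames of the traceback show `sys_dev_path()` building a sysfs path by plain string concatenation, so a `None` interface name (here `self.fallback_nic`) surfaces as a `TypeError` deep in the call stack. A minimal sketch, assuming simplified versions of those helpers; `get_interface_mac_guarded` and the `sys_class_net` parameter are hypothetical and not cloud-init's API. The message in the log is Python 3.5's wording (the `python3` on Ubuntu 16.04); newer Python versions phrase the same `TypeError` differently.

```python
import os
import tempfile

# Simplified stand-ins for helpers named in the traceback. This is NOT
# cloud-init's code: the real functions take different arguments and do more.
SYS_CLASS_NET = "/sys/class/net/"


def sys_dev_path(devname, path="", sys_class_net=SYS_CLASS_NET):
    # Plain string concatenation, as in the traceback: if devname is None,
    # Python raises TypeError instead of building a path.
    return sys_class_net + devname + "/" + path


def get_interface_mac_guarded(ifname, sys_class_net=SYS_CLASS_NET):
    """Return the MAC address of ifname, or None if it cannot be determined.

    Hypothetical guarded variant: reject a missing interface name up front
    rather than failing inside the path helper.
    """
    if not ifname:
        return None
    addr_file = sys_dev_path(ifname, "address", sys_class_net)
    if not os.path.isfile(addr_file):
        return None
    with open(addr_file) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        sys_dev_path(None, "bonding_slave")
    except TypeError as exc:
        # Wording differs between Python versions; the exception type does not.
        print("unguarded:", type(exc).__name__, exc)

    print("guarded:", get_interface_mac_guarded(None))

    # Build a throwaway fake sysfs tree so the example runs anywhere.
    fake_sys = tempfile.mkdtemp() + "/"
    os.makedirs(fake_sys + "eth0")
    with open(fake_sys + "eth0/address", "w") as f:
        f.write("0a:1b:2c:3d:4e:5f\n")
    print("eth0:", get_interface_mac_guarded("eth0", fake_sys))
```

Returning `None` from the guarded lookup pushes the decision about a missing interface up to the caller, instead of aborting the whole `init` stage.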


James Ravn (james-ravn) wrote :

This only seems to happen when trying to restart cloud-init on a running node after the update. Maybe some transient state that is incompatible after the update? After rebooting the node, cloud-init can be restarted successfully.

Scott Moser (smoser) wrote :

James,
Thanks for taking the time to report this.

We'll look at this asap.

Could you post the full /var/log/cloud-init.log?

Thanks.
Scott

Chad Smith (chad.smith) on 2017-11-17
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Scott Moser (smoser) wrote :

James,
Chad and I have been unsuccessful in reproducing this issue.
Could you provide some info?
Running 'cloud-init collect-logs' should get most of the info we want.

Were you actually running this on Amazon AWS (as opposed to another cloud that imitates the EC2 Metadata service)?

Thanks,
Scott

James Ravn (james-ravn) wrote :

Hi Scott, I'm in the GMT time zone, so please excuse the delay. Yes, this is running on AWS, on t2 instances. I'll try to gather the logs for you.

James Ravn (james-ravn) wrote :

I tried running 'cloud-init collect-logs' but the command just hangs.

Ryan Harper (raharper) wrote :

Is it possible for you to paste the entire /var/log/cloud-init.log file? The attached log only shows one stage of cloud-init boot.

In particular, in local mode cloud-init will attempt to bring up a network interface to talk to the EC2 metadata service and collect network config. Given the failure path in the logs, init-local should have failed as well, and we'd like to confirm this path.
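Bringing up "a network interface" before any network config exists means picking one interface as a fallback, which appears to be what the EC2 datasource keeps in `self.fallback_nic` (the value the traceback shows as `None`). A minimal sketch of such a choice, assuming a deliberately simplified heuristic; `find_fallback_nic_sketch` is hypothetical, and cloud-init's real selection logic differs (for example, it also considers link state):

```python
import os
import tempfile


def find_fallback_nic_sketch(sys_class_net="/sys/class/net"):
    """Pick one interface to bring up before any network config is known.

    Hypothetical, simplified heuristic: ignore loopback, prefer eth0 if
    present, otherwise take the alphabetically first remaining interface.
    Returns None when nothing suitable exists, so callers must handle None.
    """
    try:
        names = sorted(os.listdir(sys_class_net))
    except FileNotFoundError:
        return None
    candidates = [name for name in names if name != "lo"]
    if not candidates:
        return None
    if "eth0" in candidates:
        return "eth0"
    return candidates[0]


if __name__ == "__main__":
    # Fake sysfs directory with two interfaces, so the demo does not depend
    # on the machine it runs on.
    fake_sys = tempfile.mkdtemp()
    for name in ("lo", "ens5"):
        os.mkdir(os.path.join(fake_sys, name))
    print(find_fallback_nic_sketch(fake_sys))  # ens5
```

The `None` return path is the important design point: any code that later consumes the chosen interface has to cope with "no fallback found".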

James Ravn (james-ravn) wrote :

Hi Ryan, all I did was remove the entries from before I restarted cloud-init (those earlier entries ended on Nov 15). But I can paste the entire thing here if it helps.

James Ravn (james-ravn) wrote :

The additional failures in that log, starting at 21:10, come from me trying to debug it by adding some print statements to the associated libraries. Nothing obvious turned up; for whatever reason, fallback_nic is set to None, which causes the error.

Scott Moser (smoser) on 2017-11-20
Changed in cloud-init:
status: New → Fix Committed
importance: Undecided → High
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: High → Medium
Scott Moser (smoser) wrote :

James,
We believe we now have a fix for this in trunk, and it will eventually be uploaded to Ubuntu 16.04.
We're confused by your use case of running 'cloud-init init' after the upgrade (without first running 'cloud-init init --local').

Is there a reason that you're doing that? Normally cloud-init is just run on boot.

Thanks.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 17.1-41-g76243487-0ubuntu1

---------------
cloud-init (17.1-41-g76243487-0ubuntu1) bionic; urgency=medium

  * debian/cloud-init.templates: Fix capitalization in 'AliYun' name.
    (LP: #1728186)
  * New upstream snapshot.
    - integration test: replace curtin test ppa with cloud-init test ppa.
    - EC2: Fix bug using fallback_nic and metadata when restoring from cache.
      (LP: #1732917)
    - EC2: Kill dhclient process used in sandbox dhclient. (LP: #1732964)
    - ntp: fix configuration template rendering for openSUSE and SLES
      (LP: #1726572)
    - centos: Provide the failed #include url in error messages
    - Catch UrlError when #include'ing URLs [Andrew Jorgensen]
    - hosts: Fix openSUSE and SLES setup for /etc/hosts and clarify docs.
      [Robert Schweikert] (LP: #1731022)
    - rh_subscription: Perform null checks for enabled and disabled repos.
      [Dave Mulford]
    - Improve warning message when a template is not found.
      [Robert Schweikert] (LP: #1731035)
    - Replace the temporary i9n.brickies.net with i9n.cloud-init.io.
    - Azure: don't generate network configuration for SRIOV devices
      (LP: #1721579)
    - tests: address some minor feedback missed in last merge.
    - tests: integration test cleanup and full pass of nocloud-kvm.
    - Gentoo: chmod +x on all files in sysvinit/gentoo/
      [ckonstanski] (LP: #1727126)

 -- Chad Smith <email address hidden> Mon, 20 Nov 2017 15:18:52 -0700
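The changelog entry for this bug ("EC2: Fix bug using fallback_nic and metadata when restoring from cache"), together with James's observation that a reboot made the problem go away, points at datasource state restored from cloud-init's cache (typically a pickled object under /var/lib/cloud/instance/). The sketch illustrates the general Python pitfall under that assumption; it is not the actual patch, and `CachedDatasourceSketch`, `restore_from_cache`, and the `finder` callable are hypothetical:

```python
import pickle


class CachedDatasourceSketch:
    """Hypothetical stand-in for a datasource object that gets cached."""

    # Class-level default: a restored object whose cached state lacks the
    # attribute sees this value instead of raising AttributeError.
    fallback_nic = None

    def __init__(self, metadata, fallback_nic):
        self.metadata = metadata
        self.fallback_nic = fallback_nic

    def get_fallback_nic(self, finder):
        # Recompute lazily rather than trusting restored state. `finder` is
        # any callable that returns an interface name or None.
        if self.fallback_nic is None:
            self.fallback_nic = finder()
        return self.fallback_nic


def restore_from_cache(blob):
    # pickle.loads rebuilds the instance WITHOUT calling __init__, so only
    # attributes that were in the cached object's __dict__ come back.
    return pickle.loads(blob)


if __name__ == "__main__":
    # Simulate a cache written by code that never set fallback_nic on the
    # instance: create the object without __init__ and set metadata only.
    old = CachedDatasourceSketch.__new__(CachedDatasourceSketch)
    old.metadata = {"instance-id": "i-example"}
    blob = pickle.dumps(old)

    restored = restore_from_cache(blob)
    print(restored.fallback_nic)                      # None
    print(restored.get_fallback_nic(lambda: "eth0"))  # eth0
```

Because unpickling skips `__init__`, a process that restores an object cached in a different code path can see attributes at their class defaults; recomputing on demand is one way to tolerate that.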

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
James Ravn (james-ravn) wrote :

Thanks Scott. We set `manage_etc_hosts` to Yes when we configure new hosts. That requires a restart of cloud-init to pick up the changes, as far as I know.

Scott Moser (smoser) wrote :

James,
I do not think that manage_etc_hosts has any implications for cloud-init after an 'update', unless your hostname were to change during that update.

I think the best thing for you to do is
a.) stop doing that
b.) figure out why you needed to do it and file a bug.

Basically we do not expect that you should need to run 'cloud-init init' on a running system. So... any reason for doing so is probably a bug, but we need more information.

Thanks!

James Ravn (james-ravn) wrote :

Changing manage_etc_hosts seems to do something. If I set it to "yes" and run "systemctl restart cloud-init", my /etc/hosts gets updated. I do this mainly as an easy way to get the local hostname and FQDN into /etc/hosts, for resolution purposes. I could modify /etc/hosts directly, but I thought cloud-init would be the best way to do it. Is it not?

Cheers, James
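For the goal of getting the local hostname and FQDN into /etc/hosts, the entry Debian and Ubuntu conventionally use is `127.0.1.1 <fqdn> <hostname>`; to our understanding, cloud-init's Debian/Ubuntu hosts template writes a line of that shape when `manage_etc_hosts` is enabled. A minimal sketch of making that edit directly and idempotently, assuming that convention; `ensure_hostname_entry` is a hypothetical helper, and the host names in the demo are placeholders:

```python
LOOPBACK_ALIAS = "127.0.1.1"  # Debian/Ubuntu convention for the local hostname


def ensure_hostname_entry(hosts_text, fqdn, hostname, addr=LOOPBACK_ALIAS):
    """Return hosts_text with exactly one "addr fqdn hostname" line.

    Hypothetical helper (not part of cloud-init): any existing lines for
    addr are replaced by a single entry; all other lines are kept as-is.
    """
    wanted = "%s %s %s" % (addr, fqdn, hostname)
    out, placed = [], False
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == addr:
            if not placed:
                out.append(wanted)
                placed = True
            continue  # drop further duplicates for addr
        out.append(line)
    if not placed:
        out.append(wanted)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # On a real host you would read /etc/hosts; a sample keeps this harmless.
    sample = "127.0.0.1 localhost\n::1 ip6-localhost ip6-loopback\n"
    print(ensure_hostname_entry(sample, "web1.example.com", "web1"), end="")
```

Running the helper twice yields the same text, so it can be applied repeatedly without re-running `cloud-init init` on a live system.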

This bug is believed to be fixed in cloud-init in 1705804. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released