17.1 update breaks EC2 nodes

Bug #1732917 reported by James Ravn
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
cloud-init | Fix Released | High | Unassigned |
cloud-init (Ubuntu) | Fix Released | Medium | Unassigned |

Bug Description

We updated from 0.7.9 to 17.1 on Ubuntu 16.04. After that, cloud-init fails to start.

Update:
Start-Date: 2017-11-15 06:03:19
Commandline: /usr/bin/unattended-upgrade
Upgrade: cloud-init:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, 17.1-27-geb292c18-0ubuntu1~16.04.1), ubuntu-standard:amd64 (1.361, 1.361.1), ubuntu-server:amd64 (1.361, 1.361.1), grub-legacy-ec2:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, 17.1-27-geb292c18-0ubuntu1~16.04.1), ubuntu-minimal:amd64 (1.361, 1.361.1)
End-Date: 2017-11-15 06:03:23
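The Upgrade: line above packs several package transitions into one line, each in the form name:arch (old, new). A minimal Python sketch, assuming only that format, splits it into per-package version pairs. It then takes the text before the first hyphen of each cloud-init version, which is enough for these particular strings to make the 0.7.9 to 17.1 jump explicit. The helper names are illustrative and are not part of apt or cloud-init.

```python
import re

# The "Upgrade:" payload quoted above, from apt's history log.
UPGRADE_LINE = (
    "cloud-init:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, "
    "17.1-27-geb292c18-0ubuntu1~16.04.1), "
    "ubuntu-standard:amd64 (1.361, 1.361.1), "
    "ubuntu-server:amd64 (1.361, 1.361.1), "
    "grub-legacy-ec2:amd64 (0.7.9-233-ge586fe35-0ubuntu1~16.04.2, "
    "17.1-27-geb292c18-0ubuntu1~16.04.1), "
    "ubuntu-minimal:amd64 (1.361, 1.361.1)"
)

# Each entry has the shape "name:arch (old-version, new-version)".
ENTRY_RE = re.compile(r"([^\s:,]+):(\S+) \(([^,]+), ([^)]+)\)")


def parse_upgrades(line):
    """Map each package name to its (old_version, new_version) pair."""
    return {name: (old, new) for name, _arch, old, new in ENTRY_RE.findall(line)}


def release_number(version):
    """Text before the first hyphen; enough to compare these cloud-init builds."""
    return version.split("-", 1)[0]


if __name__ == "__main__":
    old, new = parse_upgrades(UPGRADE_LINE)["cloud-init"]
    print(release_number(old), "->", release_number(new))  # 0.7.9 -> 17.1
```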

Failure:
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: 2017-11-17 13:53:12,947 - util.py[WARNING]: failed stage init
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: failed run of stage init
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: ------------------------------------------------------------
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: Traceback (most recent call last):
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 638, in status_wrapper
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: ret = functor(name, args)
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 357, in main_init
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 635, in apply_network_config
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: netcfg, src = self._find_networking_config()
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 622, in _find_networking_config
Nov 17 13:53:12 ip-10-50-198-224 systemd[1]: cloud-init.service: Unit entered failed state.
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: if self.datasource and hasattr(self.datasource, 'network_config'):
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceEc2.py", line 307, in network_config
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: net.get_interface_mac(self.fallback_nic): self.fallback_nic}
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 506, in get_interface_mac
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: if os.path.isdir(sys_dev_path(ifname, "bonding_slave")):
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 38, in sys_dev_path
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: return get_sys_class_path() + devname + "/" + path
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: TypeError: Can't convert 'NoneType' object to str implicitly
Nov 17 13:53:13 ip-10-50-198-224 cloud-init[11795]: ------------------------------------------------------------
Nov 17 13:53:12 ip-10-50-198-224 systemd[1]: cloud-init.service: Failed with result 'exit-code'.

This is pretty serious. Is it normal to do a major version update on an LTS release?
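The last frames of the traceback show a None interface name reaching sys_dev_path(), which builds a /sys/class/net path by plain string concatenation. The sketch below re-creates just that expression. The two functions are simplified stand-ins for the helpers named in the traceback, written for illustration rather than copied from cloud-init. Python 3.5, the python3 shipped with Ubuntu 16.04, words the error as in the log; newer Python versions raise the same TypeError with different wording.

```python
def get_sys_class_path():
    # Simplified stand-in for the helper of the same name in the traceback.
    return "/sys/class/net/"


def sys_dev_path(devname, path=""):
    # Same concatenation as the traceback's sys_dev_path frame.
    return get_sys_class_path() + devname + "/" + path


if __name__ == "__main__":
    print(sys_dev_path("eth0", "bonding_slave"))  # /sys/class/net/eth0/bonding_slave
    try:
        sys_dev_path(None, "bonding_slave")  # what a None fallback_nic leads to
    except TypeError as exc:
        print("TypeError:", exc)  # wording varies by Python version
```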


James Ravn (james-ravn) wrote :

This only seems to happen when trying to restart cloud-init on a running node after the update. Maybe some transient state that is incompatible after the update? After rebooting the node, cloud-init can be restarted successfully.

Scott Moser (smoser) wrote :

James,
Thanks for taking the time to report this.

We'll look at this asap.

Could you post the full /var/log/cloud-init.log?

Thanks.
Scott

Chad Smith (chad.smith)
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Scott Moser (smoser) wrote :

James,
Chad and I have been unsuccessful in reproducing this issue.
Could you provide some info?
Running 'cloud-init collect-logs' should get most of the info we want.

Were you actually running this on Amazon AWS (as opposed to another cloud that imitates the EC2 metadata service)?

Thanks,
Scott

James Ravn (james-ravn) wrote :

Hi Scott, I'm in the GMT time zone, so excuse the delay. Yes, this is running on AWS, on t2 instances. I'll try to gather the logs for you.

James Ravn (james-ravn) wrote :
James Ravn (james-ravn) wrote :

I tried running 'cloud-init collect-logs' but the command just hangs.

Ryan Harper (raharper) wrote :

Is it possible for you to paste the entire /var/log/cloud-init.log file? The attached log only shows one stage of the cloud-init boot.

In particular, cloud-init local mode will attempt to bring up a network interface to talk to the EC2 metadata service and collect the network config. The failure path in the logs suggests that init-local should have failed as well, and we'd like to confirm this path.

James Ravn (james-ravn) wrote :

Hi Ryan, all I did was remove entries prior to restarting cloud-init (the prior entries ended at Nov 15). But I can paste the entire thing here if it helps.

James Ravn (james-ravn) wrote :

The additional failures in that log starting at 21:10 are from me trying to debug it by adding some print statements to the associated libraries. Nothing obvious turned up; for whatever reason, fallback_nic is set to None, which causes the error.
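Given that fallback_nic ends up as None, one defensive pattern is to rediscover the interface before using it and to fail with a clear message if none is found, instead of letting None reach the path helpers. The class and its injected helpers below are hypothetical (injection lets the sketch run without /sys); this is not the upstream patch, which the thread describes only through its changelog entry.

```python
class Ec2LikeDataSource:
    """Hypothetical, heavily simplified stand-in for an EC2 datasource."""

    def __init__(self, find_fallback_nic, get_interface_mac):
        # Both helpers are injected so the sketch runs without /sys access.
        self._find_fallback_nic = find_fallback_nic
        self._get_interface_mac = get_interface_mac
        self.fallback_nic = None  # in this sketch, normally set by init-local

    def mac_to_nic(self):
        """Return {mac: nic}, rediscovering the NIC if it was never set."""
        if self.fallback_nic is None:
            self.fallback_nic = self._find_fallback_nic()
        if self.fallback_nic is None:
            # Fail loudly instead of passing None into path-building code.
            raise RuntimeError("no fallback network interface found")
        return {self._get_interface_mac(self.fallback_nic): self.fallback_nic}


if __name__ == "__main__":
    ds = Ec2LikeDataSource(lambda: "eth0", lambda nic: "0a:1b:2c:3d:4e:5f")
    print(ds.mac_to_nic())  # {'0a:1b:2c:3d:4e:5f': 'eth0'}
```

Raising a named error keeps the failure readable in the journal, whereas the original code surfaced only a TypeError several frames deep.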

Scott Moser (smoser)
Changed in cloud-init:
status: New → Fix Committed
importance: Undecided → High
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: High → Medium
Scott Moser (smoser) wrote :

James,
We believe we now have a fix for this in trunk, and it will eventually be uploaded to Ubuntu 16.04.
We're confused, though, by your use case of running 'cloud-init init' after the upgrade without first running 'cloud-init init --local'.

Is there a reason that you're doing that? Normally cloud-init is just run on boot.

Thanks.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 17.1-41-g76243487-0ubuntu1

---------------
cloud-init (17.1-41-g76243487-0ubuntu1) bionic; urgency=medium

  * debian/cloud-init.templates: Fix capitalization in 'AliYun' name.
    (LP: #1728186)
  * New upstream snapshot.
    - integration test: replace curtin test ppa with cloud-init test ppa.
    - EC2: Fix bug using fallback_nic and metadata when restoring from cache.
      (LP: #1732917)
    - EC2: Kill dhclient process used in sandbox dhclient. (LP: #1732964)
    - ntp: fix configuration template rendering for openSUSE and SLES
      (LP: #1726572)
    - centos: Provide the failed #include url in error messages
    - Catch UrlError when #include'ing URLs [Andrew Jorgensen]
    - hosts: Fix openSUSE and SLES setup for /etc/hosts and clarify docs.
      [Robert Schweikert] (LP: #1731022)
    - rh_subscription: Perform null checks for enabled and disabled repos.
      [Dave Mulford]
    - Improve warning message when a template is not found.
      [Robert Schweikert] (LP: #1731035)
    - Replace the temporary i9n.brickies.net with i9n.cloud-init.io.
    - Azure: don't generate network configuration for SRIOV devices
      (LP: #1721579)
    - tests: address some minor feedback missed in last merge.
    - tests: integration test cleanup and full pass of nocloud-kvm.
    - Gentoo: chmod +x on all files in sysvinit/gentoo/
      [ckonstanski] (LP: #1727126)

 -- Chad Smith <email address hidden> Mon, 20 Nov 2017 15:18:52 -0700
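The changelog entry ties the fix to using fallback_nic "when restoring from cache", and James saw that a reboot cleared the error. One plausible mechanism, offered as an assumption rather than something the thread confirms: cloud-init keeps a pickled copy of its datasource between runs, and unpickling rebuilds an object without calling __init__, so an attribute the cached copy never had falls back to a class-level default, if there is one. The class, the attribute values and restore_from_cache() below are hypothetical; restore_from_cache() mimics default unpickling by hand instead of reading a real cache file.

```python
class NewDataSource:
    """Hypothetical newer datasource class (illustrative names only)."""

    fallback_nic = None  # class-level default seen when the instance lacks one

    def __init__(self):
        self.metadata = {}
        self.fallback_nic = None

    def run_local_stage(self, nic):
        # In this sketch only the init-local stage records the NIC it used.
        self.fallback_nic = nic


def restore_from_cache(state, cls=NewDataSource):
    """Mimic default unpickling: create the object without calling
    __init__, then copy in the saved instance attributes."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# Hypothetical state as an older release might have cached it: no fallback_nic.
OLD_CACHED_STATE = {"metadata": {"instance-id": "i-example"}}

# State cached after this sketch's init-local stage ran.
fresh = NewDataSource()
fresh.run_local_stage("eth0")
NEW_CACHED_STATE = dict(fresh.__dict__)

if __name__ == "__main__":
    print(restore_from_cache(NEW_CACHED_STATE).fallback_nic)  # eth0
    print(restore_from_cache(OLD_CACHED_STATE).fallback_nic)  # None
```

Under this assumption, a reboot helps because a fresh boot runs the local stage again and caches a complete object.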

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
James Ravn (james-ravn) wrote :

Thanks Scott. We set `manage_etc_hosts` to Yes when we configure new hosts. That requires a restart of cloud-init to pick up the changes, as far as I know.

Scott Moser (smoser) wrote :

James,
I do not think that manage_etc_hosts has any effect on cloud-init after an 'update', unless your hostname were to change during that update.

I think the best thing for you to do is
a.) stop doing that
b.) figure out why you needed to do it and file a bug.

Basically we do not expect that you should need to run 'cloud-init init' on a running system. So... any reason for doing so is probably a bug, but we need more information.

Thanks!

James Ravn (james-ravn) wrote :

Changing manage_etc_hosts seems to do something. If I set it to "yes" and run "systemctl restart cloud-init", my /etc/hosts gets updated. I do this mainly as an easy way to get the local hostname and FQDN into /etc/hosts for name resolution. I could modify /etc/hosts directly, but I thought cloud-init would be the best way to do it. Is it not?

Cheers, James
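Editing /etc/hosts directly is the alternative mentioned above. A minimal sketch of it, assuming the Debian/Ubuntu convention of mapping 127.0.1.1 to the FQDN followed by the short hostname: it rewrites the text of a hosts file so exactly one line exists for that address, and it is idempotent, so applying it twice changes nothing. The function name and the example names are hypothetical, and writing the result back to /etc/hosts (as root) is left out.

```python
HOSTS_ADDR = "127.0.1.1"  # Debian/Ubuntu convention for the machine's own name


def ensure_hosts_entry(hosts_text, fqdn, hostname, addr=HOSTS_ADDR):
    """Return hosts_text with exactly one '<addr> <fqdn> <hostname>' line.

    Existing lines for addr are dropped and the new line is appended;
    all other lines, including comments and blank lines, are kept.
    """
    wanted = "{} {} {}".format(addr, fqdn, hostname)
    kept = [
        line for line in hosts_text.splitlines()
        if not line.split() or line.split()[0] != addr
    ]
    kept.append(wanted)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    before = "127.0.0.1 localhost\n127.0.1.1 old-name\n"
    print(ensure_hosts_entry(before, "node.example.internal", "node"), end="")
```

Because the function only transforms text, it can be checked against a copy of the file before anything on the host is touched.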

Scott Moser (smoser) wrote : Fixed in Cloud-init 1705804

This bug is believed to be fixed in cloud-init in 1705804. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
James Falcon (falcojr) wrote :