cloud-init on vivid upgrade causes sigterm, which aborts 'runcmd' execution

Bug #1438520 reported by David Britton on 2015-03-31
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
High
Unassigned

Bug Description

I used an openstack infrastructure with a vivid beta2 image. End result, if there is cloud-init upgrade available, it installs and abort parts of the cloud-init execution. (bad news for my user scripts!) I'm not sure of all the fallout, but at least my 'runcmd' section was not executed (grep the logs for 'runcmd').

From cloud-init-output.log:

Preparing to unpack .../cryptsetup-bin_2%3a1.6.1-1ubuntu7_amd64.deb ...^M
Unpacking cryptsetup-bin (2:1.6.1-1ubuntu7) over (2:1.6.1-1ubuntu5) ...^M
Preparing to unpack .../cryptsetup_2%3a1.6.1-1ubuntu7_amd64.deb ...^M
Unpacking cryptsetup (2:1.6.1-1ubuntu7) over (2:1.6.1-1ubuntu5) ...^M
Preparing to unpack .../cloud-init_0.7.7~bzr1087-0ubuntu1_all.deb ...^M
Cloud-init v. 0.7.7 running 'modules:final' at Tue, 31 Mar 2015 05:09:42 +0000. Up 848.15 seconds.
Cloud-init v. 0.7.7 finished at Tue, 31 Mar 2015 05:09:44 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 850.19 seconds

From cloud-init.log:

Mar 31 04:57:38 ubuntu [CLOUDINIT] util.py[DEBUG]: Running command ['eatmydata', 'apt-get', '--option=Dpkg::Options::=--force-confold', '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes', '--quiet', 'dist-upgrade'] with allowed return codes [0] (shell=False, capture=False)
Mar 31 05:09:41 ubuntu [CLOUDINIT] util.py[DEBUG]: Cloud-init 0.7.7 received SIGTERM, exiting...#012 Filename: /usr/lib/python3.4/subprocess.py#012 Function: _eintr_retry_call#012 Line number: 491#012 Filename: /usr/lib/python3.4/subprocess.py#012 Function: _try_wait#012 Line number: 1514#012 Filename: /usr/lib/python3.4/subprocess.py#012 Function: wait#012 Line number: 1566
Mar 31 05:09:41 ubuntu [CLOUDINIT] util.py[DEBUG]: apt-upgrade [eatmydata apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet dist-upgrade] took 722.766 seconds
Mar 31 05:09:41 ubuntu [CLOUDINIT] util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
Mar 31 05:09:41 ubuntu [CLOUDINIT] util.py[DEBUG]: Read 12 bytes from /proc/uptime
Mar 31 05:09:41 ubuntu [CLOUDINIT] util.py[DEBUG]: cloud-init mode 'modules' took 761.227 seconds (761.23)
Mar 31 05:09:42 ubuntu [CLOUDINIT] util.py[DEBUG]: Cloud-init v. 0.7.7 running 'modules:final' at Tue, 31 Mar 2015 05:09:42 +0000. Up 848.15 seconds.
Mar 31 05:09:44 ubuntu [CLOUDINIT] stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>
Mar 31 05:09:44 ubuntu [CLOUDINIT] stages.py[DEBUG]: Running module rightscale_userdata (<module 'cloudinit.config.cc_rightscale_userdata' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_rightscale_userdata.py'>) with frequency once-per-instance

I'll attach full cloud-init logs and the userdata. I used the following command to boot the instance:

nova boot --key-name dpb --user-data ~/test.txt --image fc7aedfd-f465-48b9-9fc6-c826f3a0e81b --flavor 2 vivid-test

and the image is this:

ubuntu-released/ubuntu-vivid-15.04-beta2-amd64-server-20150325-disk1.img

Related branches

David Britton (davidpbritton) wrote :
no longer affects: juju-core
Scott Moser (smoser) on 2015-04-01
Changed in cloud-init:
status: New → Confirmed
Scott Moser (smoser) wrote :
Download full text (5.6 KiB)

booted ubuntu-released/ubuntu-vivid-15.04-beta2-amd64-server-20150325-disk1.img
with user-data:
 |#cloud-config
 |package_upgrade: True
 |runcmd:
 | - [sh, '-c', 'echo ====== hi world ======= 2>&1 | tee /run/greeting.log']

the console (full attached) shows:
Cloud-init 0.7.7 received SIGTERM, exiting...
  Filename: /usr/lib/python3.4/subprocess.py
  Function: _eintr_retry_call
  Line number: 491
    Filename: /usr/lib/python3.4/subprocess.py
    Function: _try_wait
    Line number: 1514
      Filename: /usr/lib/python3.4/subprocess.py
      Function: wait
      Line number: 1566

/run/cloud-init/status.json shows:

$ sudo cat /run/cloud-init/status.json
{
 "v1": {
  "datasource": "DataSourceOpenStack [net,ver=2]",
  "init-local": {
   "start": 1427996872.914337,
   "errors": [],
   "finished": 1427996873.4352942
  },
  "init": {
   "start": 1427996873.7333322,
   "errors": [],
   "finished": 1427996878.1853242
  },
  "stage": null,
  "modules-config": {
   "start": 1427996878.593198,
   "errors": [],
   "finished": null
  },
  "modules-final": {
   "start": 1427996921.499426,
   "errors": [],
   "finished": 1427996921.752855
  }
 }
}

$ sudo cat /run/cloud-init/result.json
{
 "v1": {
  "errors": [],
  "datasource": "DataSourceOpenStack [net,ver=2]"
 }
}

Note, that 'modules-config' does not have a finished. and unfortunately, we see no 'errors' recordedin result.json

$ systemctl status cloud-config -l
● cloud-config.service - Apply the settings specified in cloud-config
   Loaded: loaded (/lib/systemd/system/cloud-config.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2015-04-02 17:48:40 UTC; 6min ago
 Main PID: 843 (code=exited, status=1/FAILURE)

Apr 02 17:48:40 inst-20150402-174733 cloud-init[843]: Preparing to unpack .../cloud-init_0.7.7~bzr1088-0ubuntu1_all.deb ...
Apr 02 17:48:40 inst-20150402-174733 cloud-init[843]: [CLOUDINIT] util.py[DEBUG]: Cloud-init 0.7.7 received SIGTERM, exiting...
                                                        Filename: /usr/lib/python3.4/subprocess.py
                                                        Function: _eintr_retry_call
                                                        Line number: 491
                                                          Filename: /usr/lib/python3.4/subprocess.py
                                                          Function: _try_wait
                                                          Line number: 1514
                                                            Filename: /usr/lib/python3.4/subprocess.py
                                                            Function: wait
                                                            Line number: 1566
Apr 02 17:48:40 inst-20150402-174733 cloud-init[843]: [CLOUDINIT] util.py[DEBUG]: apt-upgrade [eatmydata apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet dist-upgrade] took 31.555 seconds
Apr 02 17:48:40 inst-20150402-174733 cloud-init[843]: [CLOUDINIT] util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
Apr 02 17:48:40 inst-20150402-174733 cloud-init[843]: [CLOUDINIT] util.py[DEBUG]: R...

Read more...

Scott Moser (smoser) wrote :

Above was long winded. A summary.
 * cloud-init has 4 services and 1 target . see those at [1]
 * cloud-config.service invoked 'apt-get dist-upgrade', and received a SIGTERM
 * cloud-config.service *would* have subsequently written a file into /var/lib/cloud/instance/scripts/
 * cloud-final.service did run. Had it seen the script written it would have run it, but nothing was there to run.

So I think what happened is that the upgrade somehow sent SIGTERM to cloud-config.service which made it stop prematurely. I dont think that happened with upstart.

Any ideas here?
--
[1] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/files/head:/systemd/

Changed in cloud-init:
importance: Undecided → High
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: Undecided → High
tags: added: cloud-images systemd-boot
Didier Roche (didrocks) wrote :

 I actually think it's easier than what it seems.

Your cloud-init.service dist-upgrade and stops when it's running cloud-init postinst. This is because the prerm and then postinst contaings default systemd instructions:
prerm:
 deb-systemd-invoke stop cloud-final.service cloud-config.target cloud-config.service cloud-init-local.service cloud-init.service >/dev/null
-> stopping the service
postinst:
 deb-systemd-invoke start cloud-final.service cloud-config.target cloud-config.service cloud-init-local.service cloud-init.service >/dev/null || true
-> restarting the service

This will restart the service if already running (or start on install). I think you don't really want this.

Here is a patch to ensure we don't have that behavior. However, you still have the old prerm in the previous version of the package, so you need an upload/an image + another upload to test it I guess?

tags: added: patch
Scott Moser (smoser) wrote :

Talking with didrocks, there is no clear way to stop the installed prerm from running. We could definitely come up with some solution for addressing this problem, but have decided its easiest and acceptable to just upload and accept that images (not instances) that have the older version of cloud-init inside them will fail.

This change will essentially put us back to the behavior we had with upstart, which worked acceptbly for 4 years.

The end result is:
 * we'll release new dailies and new released images with this version inside.
 * As soon as there are daily images and released images available with this fixed version of cloud-init
 * users who run older images (beta-2) will still fail.
 * future upgrades of cloud-init will not suffer this issue.

Other things we could have done to address this:
 * cloud-init ignore sigterm and finish what it was doing. this would (if it was too slow) still get killed by systemd eventually, so not really a solution.
 * cloud-init not upgrade itself (apt-pin itself, then remove the pin).
 * cloud-init final could realize thta cloud-config hadn't finished and cloud-config's modules before running its own.

Scott Moser (smoser) wrote :

For doc and future reference, here is what I did to test that we'll be fine in the future.

 - launch an instance of vivid with unpatched cloud-init (0.7.7~bzr1088-0ubuntu1) and no user-data.
 - dpkg -i install a version with this fix in place (0ubuntu2~sm0)
 - upload to ppa an even newer version (0ubuntu2~sm1)
 - write the following into /etc/cloud/cloud.cfg.d/99-smoser.cfg
   |#cloud-config
   |package_upgrade: true
   |apt_sources:
   | - source: "ppa:smoser/ppa"
   |runcmd:
   | - [sh, '-c', 'echo ====== hi world ======= 2>&1 | tee /run/greeting.log']
 - rm -Rf /var/lib/cloud/ /var/log/cloud-init*

So at this point, the system now has a patched cloud-init installed, and its marker files signalling "first instance boot" have been removed. On reboot, it will add the ppa and attempt to upgrade, which will see the new cloud-init.

So to verify, we just ssh back in after reboot and check to make sure /run/greeting.log exists. It did.

Note, we could have also just added the ppa before reboot with 'add-apt-repository', which might have removed some moving parts.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.7~bzr1088-0ubuntu2

---------------
cloud-init (0.7.7~bzr1088-0ubuntu2) vivid; urgency=medium

  [ Didier Roche ]
  * Don't start or restart cloud-init services on install and upgrade
    (LP: #1438520)

  [ Scott Moser ]
  * d/control: Build-Depends on iproute2 (tests)
  * d/control: Only Recommend (not both Depend and Recommend)
    software-properties-common
 -- Scott Moser <email address hidden> Fri, 03 Apr 2015 11:13:28 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
affects: cloud-init → ubuntu
no longer affects: ubuntu
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers