cloud-final.service does not run due to dependency cycle

Bug #1623868 reported by Martin Pitt
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
cloud-init
Fix Released
Undecided
Unassigned
cloud-init (Ubuntu)
Fix Released
High
Unassigned
Xenial
Fix Released
Undecided
Unassigned

Bug Description

==== Begin SRU Template ====
[Impact]
As part of the change in bug 1576692, we made cloud-final.target
run After multi-user.target. That created a dependency loop between
cloud-init.target and multi-user.target and cloud-final.target.

Most of the time systemd would break that loop by dropping
cloud-init.target. But sometimes, it would break the loop by dropping
cloud-final.target, which would mean that user scripts do not run
and generally cloud-init doesn't finish.

[Test Case]
## Failure in a xenial image can only be reproduced by
## patching an image up to the previous xenial-proposed upload
## (0.7.7-31-g65ace7b-0ubuntu1~16.04.1), then cleaning it
## and then restarting. We will focus on verifying there is not
## a problem.

## Launch an instance and patch it up to xenial-proposed
$ release=xenial
$ name=x1
$ lxc launch ubuntu-daily:$release $name

# wait for it to boot
$ while ! lxc exec $name -- [ -e /run/cloud-init/result.json ]; do sleep 1; done

## Now update container, clean and reboot to show first boot
$ lxc exec $name -- sh -c '
    p=/etc/apt/sources.list.d/proposed.list
    echo deb http://archive.ubuntu.com/ubuntu xenial-proposed main > "$p" &&
    apt-get update -q && apt-get -qy install cloud-init'
$ lxc exec $name -- sh -c '
    cd /var/lib/cloud && for d in *; do [ "$d" = "seed" ] || rm -Rf "$d"; done
    rm -Rf /var/log/cloud-init*'

## This is like first boot now.
$ lxc exec $name reboot
$ while ! lxc exec $name -- [ -e /run/cloud-init/result.json ]; do sleep 1; done

## The services should show active
$ lxc exec $name -- journalctl | grep Break || echo Good, no breaks
$ lxc exec $name -- systemctl --no-pager status cloud-final.service
$ lxc exec $name -- systemctl --no-pager status cloud-init.target

[Regression Potential]
Playing with boot order can cause problems. Regression would be around
some targets not running. On a booted system this would show itself inx
  journalctl | grep -i Break
or
  journalctl | grep -i ordering

==== End SRU Template ====

With current yakkety cloud images (at least in Scalingstack), I often run into this dependency cycle at boot:

Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found ordering cycle on cloud-init.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-final.service/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on multi-user.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-init.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

● cloud-final.service - Execute cloud user/final scripts
   Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Thus /var/lib/cloud/instance/boot-finished never gets written and thus waiting for an instance to init just times out.

This is with the most recent https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1

Related bugs:
 * bug 1576692: fully support package installation in systemd

Martin Pitt (pitti)
tags: added: regression-release yakkety
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Revision history for this message
Martin Pitt (pitti) wrote :

Most of the time (at least in LXD, timing issue) cloud-init.target is inactive instead, which is also unintended. This is because it is installed into multi-user.target and thus gets an implied Before=multi-user.target as well. With explicitly adding After=multi-user.target (since one part of it, cloud-final.service, runs after it) all units start and I cannot reproduce the cycle any more.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Running into the same since today.
Seems alto reproducible with uvtool.

uvt-kvm create --memory 2048 --cpu 4 --password=ubuntu yakkety-test-not-finalizing-cloud-init release=yakkety arch=amd64 label=daily

I see the final stage completing in the log:
journalctl | grep "final: SUCCESS"
Sep 15 12:37:26 yakkety-test-not-finalizing-cloud-init cloud-init[1450]: [CLOUDINIT] handlers.py[DEBUG]: finish: modules-final: SUCCESS: running modules for final

Yet I don't get /var/lib/cloud/instance/boot-finished created.

My case seems to match Martins second finding that cloud-init.target seems to be inactive.
systemctl status cloud-init.target
● cloud-init.target - Cloud-init target
   Loaded: loaded (/lib/systemd/system/cloud-init.target; static; vendor preset: enabled)
   Active: inactive (dead)

OTOH - The cloud-final.service looks good:
● cloud-final.service - Execute cloud user/final scripts
   Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
   Active: active (exited) since Thu 2016-09-15 12:37:26 UTC; 2min 20s ago
  Process: 1450 ExecStart=/usr/bin/cloud-init modules --mode=final (code=exited, status=0/SUCCESS)
 Main PID: 1450 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   Memory: 276.0K
      CPU: 1.654s
   CGroup: /system.slice/cloud-final.service

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

For completeness, I've attached journalctl of 2 vms.
Both are the same yakkety image, launched with no user-data within seconds of each other.
In one, systemd decided to:
   Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start

In the other:
   Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start

Experience right now is that it is much more common for systemd to delete cloud-init.target than cloud-final.service, but both do exist.

Revision history for this message
Scott Moser (smoser) wrote :

Just for ease of reading:

== deleting job cloud-init.target ==
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found ordering cycle on multi-user.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-init.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start
Sep 15 12:59:32 ubuntu systemd[1]: cloud-init.target: Job cloud-init.target/start deleted to break ordering cycle starting with multi-user.target/start

== deleting job cloud-final.target ==
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found ordering cycle on cloud-init.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-final.service/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on multi-user.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-init.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello Martin, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Scott Moser (smoser)
description: updated
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1

---------------
cloud-init (0.7.8-1-g3705bb5-0ubuntu1) yakkety; urgency=medium

  * New upstream release 0.7.8.
  * New upstream snapshot.
    - systemd: put cloud-init.target After multi-user.target (LP: #1623868)

 -- Scott Moser <email address hidden> Thu, 15 Sep 2016 09:47:11 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

verified with xenial-proposed and 0.7.8-1-g3705bb5-0ubuntu1~16.04.1 as described in bug.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (3.9 KiB)

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1

---------------
cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * New upstream release 0.7.8.
  * New upstream snapshot.
    - systemd: put cloud-init.target After multi-user.target (LP: #1623868)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  * debian/control: add Breaks of older versions of walinuxagent (LP: #1623570)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/control: fix missing dependency on python3-serial,
    and make SmartOS datasource work.
  * debian/cloud-init.templates fix capitalisation in template so
    dpkg-reconfigure works to select OpenStack. (LP: #1575727)
  * d/README.source, d/control, d/new-upstream-snapshot, d/rules: sync
    with yakkety for changes due to move to git.
  * d/rules: change PYVER=python3 to PYVER=3 to adjust to upstream change.
  * debian/rules, debian/cloud-init.install: remove install file
    to ensure expected files are collected into cloud-init deb.
    (LP: #1615745)
  * debian/dirs: remove obsolete / unused file.
  * upstream move from bzr to git.
  * New upstream snapshot.
    - Allow link type of null in network_data.json [Jon Grimm] (LP: #1621968)
    - DataSourceOVF: fix user-data as base64 with python3 (LP: #1619394)
    - remove obsolete .bzrignore
    - systemd: Better support package and upgrade. (LP: #1576692, #1621336)
    - tests: cleanup tempdirs in apt_source tests
    - apt config conversion: treat empty string as not provided. (LP: #1621180)
    - Fix typo in default keys for phone_home [Roland Sommer] (LP: #1607810)
    - salt minion: update default pki directory for newer salt minion.
      (LP: #1609899)
    - bddeb: add --release flag to specify the release in changelog.
    - apt-config: allow both old and new format to be present.
      [Christian Ehrhardt] (LP: #1616831)
    - python2.6: fix dict comprehension usage in _lsb_release. [Joshua Harlow]
    - Add a module that can configure spacewalk. [Joshua Harlow]
    - add install option for openrc [Matthew Thode]
    - Generate a dummy bond name for OpenStack (LP: #1605749)
    - network: fix get_interface_mac for bond slave, read_sys_net for ENOTDIR
    - azure dhclient-hook cleanups
    - Minor cleanups to atomic_helper and add unit tests.
    - Fix Gentoo net config generation [Matthew Thode]
    - distros: fix get_primary_arch method use of os.uname [Andrew Jorgensen]
    - Apt: add new apt configuration format [Christian Ehrhardt]
    - Get Azure endpoint server from DHCP client [Brent Baude]
    - DigitalOcean: use the v1.json endpoint [Ben Howard]
    - MAAS: add vendor-data support (LP: #1612313)
    - Upgrade to a configobj package new enough to work [Joshua Harlow]
    - ConfigDrive: recognize 'tap' as a link type. (LP: #1610784)
    - NoCloud: fix bug providing network-interfaces via meta-data.
      (LP: 1577982)
    - Add distro tags on config modules that should have it [Joshua Harlow]
    - ChangeLog: update changelog for previous commit.
    - add ntp config module [Ryan Harper]
    - SmartOS: more improvement...

Read more...

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
John Keenleyside (jkeenleyside) wrote :

What is the recommended way to have an application systemd service run after cloud-final.service ?

Revision history for this message
Scott Moser (smoser) wrote :

This is fixed in cloud-init 0.7.9.

Changed in cloud-init:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.