cloud-final.service does not run due to dependency cycle

Bug #1623868 reported by Martin Pitt on 2016-09-15
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
cloud-init
Undecided
Unassigned
cloud-init (Ubuntu)
High
Unassigned
Xenial
Undecided
Unassigned

Bug Description

==== Begin SRU Template ====
[Impact]
As part of the change in bug 1576692, we made cloud-final.target
run After multi-user.target. That created a dependency loop between
cloud-init.target and multi-user.target and cloud-final.target.

Most of the time systemd would break that loop by dropping
cloud-init.target. But sometimes, it would break the loop by dropping
cloud-final.target, which would mean that user scripts do not run
and generally cloud-init doesn't finish.

[Test Case]
## Failure in a xenial image can only be reproduced by
## patching an image up to the previous xenial-proposed upload
## (0.7.7-31-g65ace7b-0ubuntu1~16.04.1), then cleaning it
## and then restarting. We will focus on verifying there is not
## a problem.

## Launch an instance and patch it up to xenial-proposed
$ release=xenial
$ name=x1
$ lxc launch ubuntu-daily:$release $name

# wait for it to boot
$ while ! lxc exec $name -- [ -e /run/cloud-init/result.json ]; do sleep 1; done

## Now update container, clean and reboot to show first boot
$ lxc exec $name -- sh -c '
    p=/etc/apt/sources.list.d/proposed.list
    echo deb http://archive.ubuntu.com/ubuntu xenial-proposed main > "$p" &&
    apt-get update -q && apt-get -qy install cloud-init'
$ lxc exec $name -- sh -c '
    cd /var/lib/cloud && for d in *; do [ "$d" = "seed" ] || rm -Rf "$d"; done
    rm -Rf /var/log/cloud-init*'

## This is like first boot now.
$ lxc exec $name reboot
$ while ! lxc exec $name -- [ -e /run/cloud-init/result.json ]; do sleep 1; done

## The services should show active
$ lxc exec $name -- journalctl | grep Break || echo Good, no breaks
$ lxc exec $name -- systemctl --no-pager status cloud-final.service
$ lxc exec $name -- systemctl --no-pager status cloud-init.target

[Regression Potential]
Playing with boot order can cause problems. Regression would be around
some targets not running. On a booted system this would show itself inx
  journalctl | grep -i Break
or
  journalctl | grep -i ordering

==== End SRU Template ====

With current yakkety cloud images (at least in Scalingstack), I often run into this dependency cycle at boot:

Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found ordering cycle on cloud-init.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-final.service/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on multi-user.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-init.target/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start
Sep 15 09:28:51 ubuntu systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

● cloud-final.service - Execute cloud user/final scripts
   Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Thus /var/lib/cloud/instance/boot-finished never gets written and thus waiting for an instance to init just times out.

This is with the most recent https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1

Related bugs:
 * bug 1576692: fully support package installation in systemd

Martin Pitt (pitti) on 2016-09-15
tags: added: regression-release yakkety
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Martin Pitt (pitti) wrote :

Most of the time (at least in LXD, timing issue) cloud-init.target is inactive instead, which is also unintended. This is because it is installed into multi-user.target and thus gets an implied Before=multi-user.target as well. With explicitly adding After=multi-user.target (since one part of it, cloud-final.service, runs after it) all units start and I cannot reproduce the cycle any more.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed

Running into the same since today.
Seems alto reproducible with uvtool.

uvt-kvm create --memory 2048 --cpu 4 --password=ubuntu yakkety-test-not-finalizing-cloud-init release=yakkety arch=amd64 label=daily

I see the final stage completing in the log:
journalctl | grep "final: SUCCESS"
Sep 15 12:37:26 yakkety-test-not-finalizing-cloud-init cloud-init[1450]: [CLOUDINIT] handlers.py[DEBUG]: finish: modules-final: SUCCESS: running modules for final

Yet I don't get /var/lib/cloud/instance/boot-finished created.

My case seems to match Martins second finding that cloud-init.target seems to be inactive.
systemctl status cloud-init.target
● cloud-init.target - Cloud-init target
   Loaded: loaded (/lib/systemd/system/cloud-init.target; static; vendor preset: enabled)
   Active: inactive (dead)

OTOH - The cloud-final.service looks good:
● cloud-final.service - Execute cloud user/final scripts
   Loaded: loaded (/lib/systemd/system/cloud-final.service; enabled; vendor preset: enabled)
   Active: active (exited) since Thu 2016-09-15 12:37:26 UTC; 2min 20s ago
  Process: 1450 ExecStart=/usr/bin/cloud-init modules --mode=final (code=exited, status=0/SUCCESS)
 Main PID: 1450 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   Memory: 276.0K
      CPU: 1.654s
   CGroup: /system.slice/cloud-final.service

Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :

For completeness, I've attached journalctl of 2 vms.
Both are the same yakkety image, launched with no user-data within seconds of each other.
In one, systemd decided to:
   Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start

In the other:
   Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start

Experience right now is that it is much more common for systemd to delete cloud-init.target than cloud-final.service, but both do exist.

Scott Moser (smoser) wrote :

Just for ease of reading:

== deleting job cloud-init.target ==
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found ordering cycle on multi-user.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-init.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Sep 15 12:59:32 ubuntu systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start
Sep 15 12:59:32 ubuntu systemd[1]: cloud-init.target: Job cloud-init.target/start deleted to break ordering cycle starting with multi-user.target/start

== deleting job cloud-final.target ==
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found ordering cycle on cloud-init.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-final.service/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on multi-user.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Found dependency on cloud-init.target/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-init.target: Breaking ordering cycle by deleting job cloud-final.service/start
Sep 15 12:59:34 ubuntu systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

Hello Martin, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Scott Moser (smoser) on 2016-09-15
description: updated
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1

---------------
cloud-init (0.7.8-1-g3705bb5-0ubuntu1) yakkety; urgency=medium

  * New upstream release 0.7.8.
  * New upstream snapshot.
    - systemd: put cloud-init.target After multi-user.target (LP: #1623868)

 -- Scott Moser <email address hidden> Thu, 15 Sep 2016 09:47:11 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser) wrote :

verified with xenial-proposed and 0.7.8-1-g3705bb5-0ubuntu1~16.04.1 as described in bug.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :
Download full text (3.9 KiB)

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1

---------------
cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * New upstream release 0.7.8.
  * New upstream snapshot.
    - systemd: put cloud-init.target After multi-user.target (LP: #1623868)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  * debian/control: add Breaks of older versions of walinuxagent (LP: #1623570)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/control: fix missing dependency on python3-serial,
    and make SmartOS datasource work.
  * debian/cloud-init.templates fix capitalisation in template so
    dpkg-reconfigure works to select OpenStack. (LP: #1575727)
  * d/README.source, d/control, d/new-upstream-snapshot, d/rules: sync
    with yakkety for changes due to move to git.
  * d/rules: change PYVER=python3 to PYVER=3 to adjust to upstream change.
  * debian/rules, debian/cloud-init.install: remove install file
    to ensure expected files are collected into cloud-init deb.
    (LP: #1615745)
  * debian/dirs: remove obsolete / unused file.
  * upstream move from bzr to git.
  * New upstream snapshot.
    - Allow link type of null in network_data.json [Jon Grimm] (LP: #1621968)
    - DataSourceOVF: fix user-data as base64 with python3 (LP: #1619394)
    - remove obsolete .bzrignore
    - systemd: Better support package and upgrade. (LP: #1576692, #1621336)
    - tests: cleanup tempdirs in apt_source tests
    - apt config conversion: treat empty string as not provided. (LP: #1621180)
    - Fix typo in default keys for phone_home [Roland Sommer] (LP: #1607810)
    - salt minion: update default pki directory for newer salt minion.
      (LP: #1609899)
    - bddeb: add --release flag to specify the release in changelog.
    - apt-config: allow both old and new format to be present.
      [Christian Ehrhardt] (LP: #1616831)
    - python2.6: fix dict comprehension usage in _lsb_release. [Joshua Harlow]
    - Add a module that can configure spacewalk. [Joshua Harlow]
    - add install option for openrc [Matthew Thode]
    - Generate a dummy bond name for OpenStack (LP: #1605749)
    - network: fix get_interface_mac for bond slave, read_sys_net for ENOTDIR
    - azure dhclient-hook cleanups
    - Minor cleanups to atomic_helper and add unit tests.
    - Fix Gentoo net config generation [Matthew Thode]
    - distros: fix get_primary_arch method use of os.uname [Andrew Jorgensen]
    - Apt: add new apt configuration format [Christian Ehrhardt]
    - Get Azure endpoint server from DHCP client [Brent Baude]
    - DigitalOcean: use the v1.json endpoint [Ben Howard]
    - MAAS: add vendor-data support (LP: #1612313)
    - Upgrade to a configobj package new enough to work [Joshua Harlow]
    - ConfigDrive: recognize 'tap' as a link type. (LP: #1610784)
    - NoCloud: fix bug providing network-interfaces via meta-data.
      (LP: 1577982)
    - Add distro tags on config modules that should have it [Joshua Harlow]
    - ChangeLog: update changelog for previous commit.
    - add ntp config module [Ryan Harper]
    - SmartOS: more improvement...

Read more...

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

What is the recommended way to have an application systemd service run after cloud-final.service ?

Scott Moser (smoser) wrote :

This is fixed in cloud-init 0.7.9.

Changed in cloud-init:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers