Azure: cannot start walinux agent (Transaction order is cyclic.)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
walinuxagent (Ubuntu) | Fix Released | High | Scott Moser | | |
Xenial | Fix Released | Medium | Unassigned | | |
Bug Description
==== Begin SRU Template ====
[Impact]
On Azure (and only on Azure), cloud-init uses the walinuxagent
to do some of the instance initialization. A change to cloud-init
to support installation of packages on a systemd system (bug 1576692)
exposed an issue in the walinuxagent.
The end result is that freshly booted instances on Azure with the cloud-init
fix and the old version of walinuxagent fail to import the ssh keys
that were provided to the instance.
[Test Case]
* Start an instance on Azure
* upgrade to cloud-init at 0.7.7-31-g65ace7b or better
this is in both xenial-proposed and yakkety.
* clean up, so it looks like a first boot.
sudo sh -c 'rm -Rf /var/lib/cloud /var/log/
* At this point a reboot would cause failure, and you would see WARN messages in the logs.
* upgrade walinuxagent with a fix (2.1.5-0ubuntu2 in yakkety)
* reboot
* ssh back in
* There should be no WARN in /var/log/
grep WARN /var/log/
* There should exist a file /var/lib/
* walinuxagent.
systemctl status walinuxagent.
Note that, in the event of failure, you can still get into the system by
providing user-data that populates your ssh keys:
#!/bin/sh
ssh-import-id smoser
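The log check in the test steps above can be scripted. The following is a minimal sketch and not part of the SRU procedure: `find_warn_lines` and `log_is_clean` are hypothetical helpers, and the log path is left as a caller-supplied parameter because the path in the steps above is truncated.

```python
from typing import Iterable, List


def find_warn_lines(lines: Iterable[str]) -> List[str]:
    """Return every line containing 'WARN', mirroring `grep WARN`."""
    return [line.rstrip("\n") for line in lines if "WARN" in line]


def log_is_clean(log_path: str) -> bool:
    """True if the file at log_path (supplied by the caller) has no WARN lines."""
    with open(log_path, "r", errors="replace") as handle:
        return not find_warn_lines(handle)
```

Like the grep in the test case, this also matches lines tagged `WARNING`, which is how the failure shows up in the cloud-init log.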
[Regression Potential]
Changing the order of things in boot is scary. This is no different.
This would in theory allow walinuxagent to start earlier in the boot.
In practice, however, it is started by cloud-init.service so it was
already running that early in the boot.
==== End SRU Template ====
When bringing up the Azure datasource in cloud-init.service, cloud-init runs 'service walinuxagent start'.
That previously worked fine: the agent would start and then produce the certificate files that cloud-init needed (for ssh keys and other things).
I found this when testing SRU for 0.7.7-31-
but it is likely present also in 0.7.7-31-
Now, however, we see a log like:
Sep 14 14:53:18 smoser0914x [CLOUDINIT] DataSourceAzure
Sep 14 14:53:18 smoser0914x [CLOUDINIT] util.py[DEBUG]: Running command hostname with allowed return codes [0] (shell=False, capture=True)
Sep 14 14:53:18 smoser0914x [CLOUDINIT] DataSourceAzure
Sep 14 14:53:18 smoser0914x [CLOUDINIT] util.py[DEBUG]: Running command ['service', 'walinuxagent', 'start'] with allowed return codes [0] (shell=False, capture=True)
Sep 14 14:53:18 smoser0914x [CLOUDINIT] util.py[WARNING]: agent command '['service', 'walinuxagent', 'start']' failed.
Sep 14 14:53:19 smoser0914x [CLOUDINIT] util.py[DEBUG]: agent command '['service', 'walinuxagent', 'start']' failed.
Traceback (most recent call last):
File "/usr/lib/
invoke_
File "/usr/lib/
util.subp(cmd, shell=(not isinstance(cmd, list)))
File "/usr/lib/
cmd=args)
cloudinit.
Command: ['service', 'walinuxagent', 'start']
Exit code: 1
Reason: -
Stdout: ''
Stderr: "
Failed to start walinuxagent.
See system logs and 'systemctl status walinuxagent.
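The log lines above show the command being run without a shell, with output captured, and with only exit code 0 allowed. The following is a minimal sketch of that pattern using the standard `subprocess` module. It is not cloud-init's actual implementation: `run_checked` and `CommandFailed` are hypothetical names introduced here for illustration.

```python
import subprocess
from typing import List, Sequence, Tuple


class CommandFailed(Exception):
    """Hypothetical stand-in for the error raised in the traceback above."""

    def __init__(self, cmd, exit_code, stdout, stderr):
        super().__init__(f"Command {cmd!r} failed with exit code {exit_code}")
        self.cmd = cmd
        self.exit_code = exit_code
        self.stdout = stdout
        self.stderr = stderr


def run_checked(cmd: List[str],
                allowed_rcs: Sequence[int] = (0,)) -> Tuple[str, str]:
    """Run cmd without a shell, capture output, raise on unexpected exit codes."""
    proc = subprocess.run(cmd, shell=False, capture_output=True, text=True)
    if proc.returncode not in allowed_rcs:
        raise CommandFailed(cmd, proc.returncode, proc.stdout, proc.stderr)
    return proc.stdout, proc.stderr
```

Under this sketch, a call such as `run_checked(['service', 'walinuxagent', 'start'])` that exits with code 1 would raise, with the captured stderr attached to the exception, which matches the shape of the failure reported above.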
I believe the relevant change is in 34a26f7f
https:/
That added multi-user.target to the list of After for cloud-init-
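To illustrate how one extra After= entry can make an ordering cyclic, here is a minimal sketch that looks for a cycle in a small ordering graph with a depth-first search. Only the cloud-final.service to multi-user.target edge comes from this report. The other two edges are assumptions made for illustration and may not match the real unit files, and systemd's own transaction logic is more involved than this.

```python
from typing import Dict, List, Optional

# Ordering edges: unit -> units it must start *after*.
# Only the first edge is taken from this report; the other two are ASSUMED.
AFTER: Dict[str, List[str]] = {
    "cloud-final.service": ["multi-user.target"],     # the change in question
    "multi-user.target": ["walinuxagent.service"],    # assumption
    "walinuxagent.service": ["cloud-final.service"],  # assumption
}


def find_cycle(after: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ordering cycle as a list of units, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {unit: WHITE for unit in after}
    stack: List[str] = []

    def visit(unit: str) -> Optional[List[str]]:
        state[unit] = GREY
        stack.append(unit)
        for dep in after.get(unit, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: cycle found.
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        state[unit] = BLACK
        return None

    for unit in list(after):
        if state[unit] == WHITE:
            found = visit(unit)
            if found:
                return found
    return None
```

With the assumed edges, `find_cycle(AFTER)` reports a loop through all three units. Dropping multi-user.target from the cloud-final.service entry makes the same graph acyclic, which is consistent with backing out that change making the agent start again.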
Related bugs:
* bug 1576692: fully support package installation in systemd
Changed in cloud-init: | |
status: | New → Confirmed |
Changed in cloud-init (Ubuntu): | |
status: | New → Confirmed |
no longer affects: | ubuntu (Ubuntu) |
Changed in cloud-init (Ubuntu Xenial): | |
status: | New → Confirmed |
importance: | Undecided → High |
Changed in cloud-init (Ubuntu): | |
importance: | Undecided → High |
description: | updated |
Changed in walinuxagent (Ubuntu): | |
status: | New → In Progress |
assignee: | nobody → Scott Moser (smoser) |
importance: | Undecided → High |
Changed in walinuxagent (Ubuntu Xenial): | |
status: | New → Confirmed |
importance: | Undecided → Medium |
description: | updated |
no longer affects: | cloud-init |
description: | updated |
description: | updated |
I backed out the single change below, and this functions again as it did before. I've also tried using '--job-mode=ignore-dependencies', but I still get:
Failed to start walinuxagent.service: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status walinuxagent.service' for details.
$ git diff 0.7.7..0.7.8 systemd/cloud-final.service
diff --git a/systemd/cloud-final.service b/systemd/cloud-final.service
index 3927710..b8f69b7 100644
--- a/systemd/cloud-final.service
+++ b/systemd/cloud-final.service
@@ -1,6 +1,6 @@
[Unit]
Description=Execute cloud user/final scripts
-After=network-online.target cloud-config.service rc-local.service
+After=network-online.target cloud-config.service rc-local.service multi-user.target
Wants=network-online.target cloud-config.service
[Service]