If /var/lib/cloud disappears in vsphere deployed units, subsequent reboots cause jujud to uninstall itself.
The relevant issue seems to be that jujud can't access the vsphere endpoint because of an absent configuration file in under /var/lib/cloud and as a result is queues up it's on uninstallation.
This occurs during testing of cloud-init development packages. I need to remove /var/lib/cloud so cloud-init thinks the system is 'greenfield' or 'new' and re-runs through all cloud-init boot stages. The issue appears to be the user-data that juju provides writes out systemd unit and service files which don't wait on the completion of cloud-init. So, in cloud-init's 'fresh boot' scenario, jujud beats cloud-init setup and is missing some user-data it needs to properly get access vsphere api. As a result, it thinks it's improperly configured and removes itself.
Steps to reproduce:
juju bootstrap yourvspherecloud
juju add-unit ubuntu
juju ssh ubuntu/1 'sudo rm -rf /var/lib/cloud; sudo reboot'
juju status # node ubuntu/1 jujud agent will never report active again
juju ssh ubuntu/1 # won't ever connect
If jujud has a dependency on files in /var/lib/cloud, the fix would be to order jujud systemd services/units after cloud-init completes this would prevent jujud from running before cloud-init has written user-data/metadata artifacts in /var/lib/cloud.
I was able to validate that juju units don't remove themselves across clean-reboot by adding
"After=cloud-final.service" to the [Unit] section of the systemd service/unit files
/var/lib/juju/init/jujud-machine-11/jujud-machine-11.service.
Excerpt of logs obtained via vsphere system console of ubuntu unit in error:
If /var/lib/cloud disappears in vsphere deployed units, subsequent reboots cause jujud to uninstall itself.
The relevant issue seems to be that jujud can't access the vsphere endpoint because of an absent configuration file in under /var/lib/cloud and as a result is queues up it's on uninstallation.
This occurs during testing of cloud-init development packages. I need to remove /var/lib/cloud so cloud-init thinks the system is 'greenfield' or 'new' and re-runs through all cloud-init boot stages. The issue appears to be the user-data that juju provides writes out systemd unit and service files which don't wait on the completion of cloud-init. So, in cloud-init's 'fresh boot' scenario, jujud beats cloud-init setup and is missing some user-data it needs to properly get access vsphere api. As a result, it thinks it's improperly configured and removes itself.
Steps to reproduce:
juju bootstrap yourvspherecloud
juju add-unit ubuntu
juju ssh ubuntu/1 'sudo rm -rf /var/lib/cloud; sudo reboot'
juju status # node ubuntu/1 jujud agent will never report active again
juju ssh ubuntu/1 # won't ever connect
juju's user-data link lines 198-203: https:/ /paste. ubuntu. com/p/r3TjJ6Ssv s/
If jujud has a dependency on files in /var/lib/cloud, the fix would be to order jujud systemd services/units after cloud-init completes this would prevent jujud from running before cloud-init has written user-data/metadata artifacts in /var/lib/cloud.
I was able to validate that juju units don't remove themselves across clean-reboot by adding cloud-final. service" to the [Unit] section of the systemd service/unit files juju/init/ jujud-machine- 11/jujud- machine- 11.service.
"After=
/var/lib/
Excerpt of logs obtained via vsphere system console of ubuntu unit in error:
2018-02-22 17:28:21 DEBUG juju.api apiclient.go:843 successfully dialed "wss:// 10.245. 201.51: 17070/model/ c559be05- e095-4eab- 8cf9-bd904e5878 96/api" 10.245. 201.51: 17070/model/ c559be05- e095-4eab- 8cf9-bd904e5878 96/api" dependency engine.go:504 "migration- fortress" manifold worker stopped: "upgrade- check-flag" not set: dependency not available apicaller connect.go:152 failed to connect dependency engine.go:504 "api-caller" manifold worker stopped: cannot open api: connection permanently impossible dependency engine.go:504 "api-config- watcher" manifold worker stopped: <nil> dependency engine.go:504 "clock" manifold worker stopped: <nil> dependency engine.go:504 "upgrade- check-flag" manifold worker stopped: <nil> dependency engine.go:504 "upgrade- check-gate" manifold worker stopped: <nil> dependency engine.go:504 "upgrade- steps-gate" manifold worker stopped: <nil> dependency engine.go:504 "agent" manifold worker stopped: <nil> stateconfigwatc her manifold.go:122 tomb dying dependency engine.go:504 "state- config- watcher" manifold worker stopped: <nil> dependency engine.go:504 "upgrade- steps-flag" manifold worker stopped: <nil> dependency engine.go:504 "termination- signal- handler" manifold worker stopped: <nil> introspection socket.go:125 stats worker closing listener introspection socket.go:119 stats worker serving finished introspection socket.go:129 stats worker finished systemd service.go:377 service "jujud-machine-11" successfully removed systemd service.go:329 service "juju-db" not running systemd service.go:371 service "juju-db" not installed juju/agents/ unit-ubuntu- 14: directory not empty] com/juju/ juju/cmd/ jujud/agent/ machine. go:1755: uninstall failed: [remove /var/lib/ juju/agents/ unit-ubuntu- 14: directory not empty]
2018-02-22 17:28:21 INFO juju.api apiclient.go:597 connection established to "wss://
2018-02-22 17:28:21 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 INFO juju.agent uninstall.go:36 marking agent ready for uninstall
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 INFO juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 INFO juju.worker runner.go:483 stopped "engine", err: agent should be terminated
2018-02-22 17:28:22 DEBUG juju.cmd.jujud introspection.go:64 engine stopped, stopping introspection
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.worker.
2018-02-22 17:28:22 DEBUG juju.cmd.jujud introspection.go:67 introspection stopped
2018-02-22 17:28:22 DEBUG juju.worker runner.go:332 "engine" done: agent should be terminated
2018-02-22 17:28:22 ERROR juju.worker runner.go:381 fatal "engine": agent should be terminated
2018-02-22 17:28:22 INFO juju.agent uninstall.go:47 agent already marked ready for uninstall
2018-02-22 17:28:22 INFO juju.cmd.jujud machine.go:1712 uninstalling agent
2018-02-22 17:28:22 DEBUG juju.service discovery.go:115 discovered init system "systemd" from local host
2018-02-22 17:28:22 DEBUG juju.service.
2018-02-22 17:28:22 DEBUG juju.service discovery.go:115 discovered init system "systemd" from local host
2018-02-22 17:28:22 DEBUG juju.service.
2018-02-22 17:28:22 DEBUG juju.service.
ERROR uninstall failed: [remove /var/lib/
2018-02-22 17:28:22 DEBUG cmd supercommand.go:459 error stack:
github.
2018-02-22 17:28:22 DEBUG juju.cmd.jujud main.go:187 jujud complete, code 0, err <nil>
2018-02-22 17:28:31 INFO juju.cmd supercommand.go:56 running jujud [2.3.4 gc go1.9.2]
Full logs: paste.ubuntu. com/p/2HWR94JFh h/ paste.ubuntu. com/p/rDXqz2ZWN S/
machine http://
unit http://