Sending a SIGABRT to jujud process causes jujud to uninstall (wiping /var/lib/juju)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-core | Fix Released | High | Andrew Wilkins |
1.25 | Fix Released | High | Andrew Wilkins |
Bug Description
[Environment]
This has been observed in 2 different environments:
Both Trusty 14.04.2
Juju-core 1.23.3
Juju-core 1.20.9
[Description]
We initially hit this issue by running the following sequence on the bootstrap node with 1.23.3.
This is not a normal operation on any juju installation, but it led us to discover the issue.
0) unlink /var/lib/
1) ln -s /var/lib/
2) Edit the agent.conf on the machine, setting upgradedTo: 1.23.2
3) $ restart jujud-machine-0
Then the following log entries were printed:
2015-06-10 10:03:45 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-10 10:03:45 ERROR juju.worker runner.go:207 fatal "state": agent should be terminated
2015-06-10 10:03:45 DEBUG juju.worker runner.go:241 killing "statestarter"
2015-06-10 10:03:45 DEBUG juju.worker runner.go:241 killing "termination"
2015-06-10 10:03:47 INFO juju.worker runner.go:260 start "api"
2015-06-10 10:03:47 INFO juju.state.api apiclient.go:242 dialing "wss://
2015-06-10 10:03:47 INFO juju.state.api apiclient.go:250 error dialing "wss://
2015-06-10 10:03:47 ERROR juju.worker runner.go:218 exited "api": unable to connect to "wss://
2015-06-10 10:03:50 ERROR juju.cmd supercommand.go:323 uninstall failed: [remove /var/lib/juju: directory not empty]
/bin/sh: 1: exec: /var/lib/
At this point juju had been uninstalled. We then discovered that running 'killall -SIGABRT jujud' causes juju to uninstall itself.
machine-2[20270]: 2015-06-11 15:18:22 ERROR juju.worker runner.go:219 exited "api": watcher has been stopped
unit-percona-
unit-percona-
unit-percona-
unit-percona-
unit-percona-
unit-percona-
unit-percona-
unit-percona-
unit-percona-
At this point /var/lib/juju has been removed from the system.
[ Suggestion ]
Currently the provisioner has a 'provisioner-
juju to take over an environment in case of any failure.
I would like to suggest something similar for the machine agent workers, e.g. a 'workers-safe-mode' setting that prevents
jujud from uninstalling itself when any worker fails.
tags: | added: cts |
tags: | added: sts |
tags: | removed: cts |
Changed in juju-core: | |
importance: | Medium → High |
Changed in juju-core: | |
milestone: | none → 1.26-alpha1 |
Changed in juju-core: | |
status: | Triaged → In Progress |
Changed in juju-core: | |
status: | In Progress → Fix Committed |
Changed in juju-core: | |
status: | Fix Committed → Fix Released |
I was able to reproduce this on 1.24 using a local environment, by running killall -SIGABRT jujud in one of the containers brought up by add-machine.
This is the key line in the log:
2015-06-11 16:19:57 ERROR juju.worker runner.go:208 fatal "termination": agent should be terminated