juju needs to correctly report failure to clean up manual machine

Bug #1645446 reported by james beedy
This bug affects 2 people
Affects          Status    Importance   Assigned to   Milestone
Canonical Juju   Expired   Medium       Unassigned

Bug Description

Using the manual provider, and juju-2.0 -> http://paste.ubuntu.com/23549833/

I'm getting this error after adding and removing a machine multiple times -> http://paste.ubuntu.com/23549809/

I've tried ssh'ing in and running `rm -rf /var/{log,lib}/juju` and `sudo deluser --remove-all-files ubuntu`, to no avail.

Revision history for this message
Richard Harding (rharding) wrote :

Looks like this is mongodb state.

Per
https://github.com/juju/juju/blob/6cf1bc9d917a4c56d0034fd6e0d6f394d6eddb6e/environs/manual/provisioner.go#L167

It checks whether the machine is already provisioned, and that check uses the hostname as the nonce for the document below:

https://github.com/juju/juju/blob/ef17f71281d245540a9e5ed54a00095610ac0797/state/machine.go#L1450

Which means that document didn't get cleaned up when the machine was removed at some point.

Not sure what failed on the clean up end.

Do you have any notes on how the machine was added/removed leading up to this?

Revision history for this message
james beedy (jamesbeedy) wrote :

Controller logs

Revision history for this message
james beedy (jamesbeedy) wrote :

Update

Whilst troubleshooting this issue, I killed my controller and re-bootstrapped to the same machine (these machines are not ephemeral). To my dismay, I still get this error even after bootstrapping a new controller :-(

Revision history for this message
james beedy (jamesbeedy) wrote :

debug output from add-machine -> http://paste.ubuntu.com/23549954/

Revision history for this message
Richard Harding (rharding) wrote :

Update: I was wrong. It was tripping on leftover upstart init scripts for

jujud-machine-17
jujud-unit-elasticsearch-5

Those leftovers were what stopped add-machine from working again.

I think we should make the logging clear about what we found, and check the code that removes those scripts to see how they might have been left over.
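As a rough illustration of the clearer logging suggested here, the sketch below builds an error message that names the leftover jobs it found. It is written in Python for brevity rather than Juju's Go. The function name `describe_provisioned`, the `/etc/init` default and the message wording are all hypothetical; none of it is taken from the Juju source.

```python
import glob
import os


def describe_provisioned(hostname, init_dir="/etc/init"):
    """Return an error message naming leftover Juju jobs, or None if none exist.

    Hypothetical helper: it assumes leftovers appear as juju* files in
    init_dir, which is what the observations in this thread suggest.
    """
    leftovers = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(init_dir, "juju*"))
    )
    if not leftovers:
        return None
    return (
        f"machine {hostname} is already provisioned: found leftover init "
        f"scripts in {init_dir}: {', '.join(leftovers)}. "
        "A previous removal may have failed to clean up."
    )
```

The point is only that the message lists what was found and hints that an earlier cleanup may have failed, instead of stopping at "already provisioned".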

Revision history for this message
james beedy (jamesbeedy) wrote :

Manually removing /etc/init/juju* on the target machine fixes it.
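For anyone who wants to see what would be deleted before running the workaround, here is a minimal Python sketch. The helper name `remove_leftover_juju_jobs` and its `dry_run` flag are made up for this example. It only mirrors the one-line workaround: it does not stop any job that is still running, and on a real machine it needs root.

```python
import glob
import os


def remove_leftover_juju_jobs(init_dir="/etc/init", dry_run=True):
    """Return the juju* files in init_dir, deleting them unless dry_run is set.

    Hypothetical helper. With dry_run=True (the default) nothing is deleted.
    """
    matches = sorted(
        path
        for path in glob.glob(os.path.join(init_dir, "juju*"))
        if os.path.isfile(path)  # skip anything that is not a regular file
    )
    if not dry_run:
        for path in matches:
            os.remove(path)
    return matches


if __name__ == "__main__":
    # Dry run by default: only print what would be removed.
    for path in remove_leftover_juju_jobs():
        print("would remove", path)
```

Calling `remove_leftover_juju_jobs(dry_run=False)` performs the actual deletion.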

Revision history for this message
Curtis Hovey (sinzui) wrote :

See http://bazaar.launchpad.net/~juju-qa/juju-ci-tools/trunk/view/head:/remove-manual-juju.bash for an example of how to properly check and clean if juju left something behind after a manual provisioning.

This issue might relate to bug 1642295 which CI sees from time to time.

tags: added: manual-provider
Revision history for this message
Curtis Hovey (sinzui) wrote :

So to be clear, Juju *correctly* reported the host was provisioned. It neglected to state that this might be because Juju failed to clean up.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Triaged to medium as there is a manual workaround.

summary: - juju incorrectly reports machine already provisioned
+ juju needs to correctly report failure to clean up manual machine
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status: Triaged → Expired
tags: added: expirebugs-bot