When add-machine exits with nonzero status, rerunning add-machine can produce a duplicate machine

Bug #1606711 reported by Aaron Bentley
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

As seen here:
http://reports.vapour.ws/releases/4175/job/manual-deploy-precise-amd64/attempt/3742

We ran "juju --debug add-machine -m manual-deploy-precise-amd64:manual-deploy-precise-amd64 ssh:ec2-54-242-93-248.compute-1.amazonaws.com". That exited with a nonzero status:

2016-07-26 19:20:48 ERROR juju.environs.manual provisioner.go:115 cannot obtain provisioning script
2016-07-26 19:20:48 DEBUG juju.api apiclient.go:546 health ping failed: connection is shut down
2016-07-26 19:20:48 DEBUG juju.api apiclient.go:546 health ping failed: connection is shut down
2016-07-26 19:20:48 ERROR cmd supercommand.go:458 connection is shut down
2016-07-26 19:20:48 DEBUG cmd supercommand.go:459 (error details: [{github.com/juju/juju/api/apiclient.go:584: } {github.com/juju/juju/rpc/client.go:149: } {github.com/juju/juju/rpc/client.go:12: connection is shut down}])

So we re-ran it:
WARNING add-machine failed. Will retry.
Sleeping for 30 seconds.
INFO juju --debug add-machine -m manual-deploy-precise-amd64:manual-deploy-precise-amd64 ssh:ec2-54-242-93-248.compute-1.amazonaws.com

And then we had two copies of the same machine:
  "0":
    juju-status:
      current: pending
      since: 26 Jul 2016 19:19:10Z
    dns-name: ec2-54-242-93-248.compute-1.amazonaws.com
    instance-id: manual:ec2-54-242-93-248.compute-1.amazonaws.com
    machine-status:
      current: pending
      since: 26 Jul 2016 19:19:10Z
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=7450M
  "1":
    juju-status:
      current: started
      since: 26 Jul 2016 19:21:12Z
      version: 2.0-beta14
    dns-name: ec2-54-242-93-248.compute-1.amazonaws.com
    instance-id: manual:ec2-54-242-93-248.compute-1.amazonaws.com
    machine-status:
      current: pending
      since: 26 Jul 2016 19:19:48Z
    series: precise
    hardware: arch=amd64 cpu-cores=1 mem=7450M
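Both entries above share one instance-id, so duplicates like this can be detected mechanically. Below is a minimal Python sketch. It assumes the status has already been parsed into a dict shaped like the "machines" section above (machine id mapped to fields that include "instance-id"). `find_duplicate_machines` is a hypothetical helper, not part of Juju.

```python
from collections import defaultdict


def find_duplicate_machines(machines):
    """Return {instance-id: [machine ids]} for instance-ids used by more than one machine.

    Hypothetical helper: `machines` is assumed to mirror the "machines"
    section of the status output (machine id -> dict with "instance-id").
    """
    by_instance = defaultdict(list)
    for machine_id, info in machines.items():
        instance_id = info.get("instance-id")
        if instance_id is not None:
            by_instance[instance_id].append(machine_id)
    # Keep only instance-ids that more than one machine record points at.
    return {iid: ids for iid, ids in by_instance.items() if len(ids) > 1}


# The two entries from the status output above, trimmed to the relevant fields.
machines = {
    "0": {"instance-id": "manual:ec2-54-242-93-248.compute-1.amazonaws.com",
          "juju-status": {"current": "pending"}},
    "1": {"instance-id": "manual:ec2-54-242-93-248.compute-1.amazonaws.com",
          "juju-status": {"current": "started"}},
}
print(find_duplicate_machines(machines))
# {'manual:ec2-54-242-93-248.compute-1.amazonaws.com': ['0', '1']}
```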

If add-machine fails, it should be safe to re-run it.

Aaron Bentley (abentley) wrote:

I think the best option might be for a second run of add-machine to retry adding the host as machine 0, rather than creating a new machine.
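The retry behaviour suggested above can be sketched on the client side. Before adding, check whether a machine record for the host already exists and reuse it. This is a minimal Python illustration, not how Juju resolved the bug. `list_machines` and `add_machine` are hypothetical callables standing in for the controller. The "manual:" + hostname instance-id format is inferred from the status output in the description. The 30-second delay mirrors the test harness log.

```python
import time


def add_machine_idempotent(host, list_machines, add_machine):
    """Reuse an existing machine for `host` if one is recorded, else add it.

    Assumption: manual machines get instance-id "manual:<host>", as in the
    status output in the bug description.
    """
    instance_id = "manual:" + host
    for machine_id, info in list_machines().items():
        if info.get("instance-id") == instance_id:
            return machine_id  # e.g. retry lands on machine 0 again
    return add_machine(host)


def add_with_retry(host, list_machines, add_machine, attempts=2, delay=30.0,
                   sleep=time.sleep):
    """Retry the idempotent add; `attempts` must be at least 1."""
    last_error = None
    for attempt in range(attempts):
        try:
            return add_machine_idempotent(host, list_machines, add_machine)
        except RuntimeError as err:
            last_error = err
            if attempt + 1 < attempts:
                sleep(delay)
    raise last_error


# Fake controller reproducing the report: the first add records the machine,
# then the command fails with "connection is shut down".
state = {}
calls = []


def fake_list_machines():
    return dict(state)


def fake_add_machine(host):
    machine_id = str(len(state))
    state[machine_id] = {"instance-id": "manual:" + host}
    calls.append(machine_id)
    if len(calls) == 1:
        raise RuntimeError("connection is shut down")
    return machine_id


host = "ec2-54-242-93-248.compute-1.amazonaws.com"
machine = add_with_retry(host, fake_list_machines, fake_add_machine,
                         sleep=lambda seconds: None)
print(machine, len(state))  # 0 1
```

Because the retry first looks up the existing record, the second attempt returns machine 0 instead of creating machine 1.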

Curtis Hovey (sinzui) wrote:

This issue is related to, or maybe a duplicate of, bug 1259496 and bug 1301565. add-machine failed, but the state server/controller still had a record of the machine, and Juju failed to remove that failed machine.

affects: juju-core → juju
Anastasia (anastasia-macmood) wrote:

This is a very old report and I can no longer reproduce it. My guess is that the intermittent failure was addressed as part of the work we have done since this bug was filed.

Changed in juju:
status: Triaged → Fix Released