juju 2.0-beta12 ERROR unable to contact api server after 61 attempts: upgrade in progress (upgrade in progress)

Bug #1605313 reported by Felipe Reyes on 2016-07-21
This bug affects 1 person
Affects: juju-core
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have an environment consisting of 3 VMs registered in MAAS that I have been using since approximately 2.0-beta7. Since I upgraded to beta12, the bootstrap process fails with the following error:

$ juju bootstrap --keep-broken --upload-tools --no-gui --constraints tags=bootstrap mymaas mymaas
Creating Juju controller "mymaas" on mymaas
Bootstrapping model "controller"
Starting new instance for initial controller
Launching instance
WARNING no architecture was specified, acquiring an arbitrary node
 - 4y3h7s
Building tools to upload (2.0-beta12.1-xenial-amd64)
Installing Juju agent on bootstrap instance
Juju GUI installation has been disabled
Waiting for address
Attempting to connect to 192.168.30.3:22
Attempting to connect to fd39:6c94:2e21:a491:5054:ff:fe47:9fb5:22
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:5lZm8VTwGRekXIFJfVfHPckLGBmC57ZJ11BPcrUGCPs.
Please contact your system administrator.
Add correct host key in /home/ubuntu/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/ubuntu/.ssh/known_hosts:6
  remove with:
  ssh-keygen -f "/home/ubuntu/.ssh/known_hosts" -R 192.168.30.3
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: cloud-utils
Installing package: cloud-image-utils
Installing package: tmux
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
Bootstrap agent installed
Waiting for API to become available: upgrade in progress (upgrade in progress)
[...]
Waiting for API to become available: upgrade in progress (upgrade in progress)
ERROR unable to contact api server after 61 attempts: upgrade in progress (upgrade in progress)

This is probably a race condition that my environment is triggering, because I was able to deploy successfully once.

In the machine-0.log ( http://paste.ubuntu.com/20323646/ ) there are a lot of "juju.api.watcher watcher.go:86 error trying to stop watcher: connection is shut down" and "juju.rpc server.go:540 error writing response: write tcp 127.0.0.1:17070->127.0.0.1:56230: write: broken pipe" errors.
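
To put a number on "a lot", the two messages quoted above can be tallied from a local copy of machine-0.log. This is only a rough sketch: the substrings are taken from the messages quoted in this report, while the function name, the constant name and the local file name in the usage comment are illustrative assumptions.

```python
from collections import Counter

# Substrings copied from the messages quoted in this report.
# PATTERNS and count_errors are illustrative names, not part of juju.
PATTERNS = (
    "error trying to stop watcher: connection is shut down",
    "error writing response",
)


def count_errors(lines, patterns=PATTERNS):
    """Return a Counter mapping each pattern to the number of lines containing it."""
    counts = Counter({p: 0 for p in patterns})
    for line in lines:
        for p in patterns:
            if p in line:
                counts[p] += 1
    return counts


# Usage, assuming the log was saved locally as machine-0.log:
#     with open("machine-0.log") as f:
#         for pattern, n in count_errors(f).items():
#             print(n, pattern)
```

Comparing these counts between a failed bootstrap and the one that succeeded might show whether the messages are specific to the failure or just shutdown noise.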

Also, inspecting /var/log/syslog, I found an "exception: E11000 duplicate key error collection" error[0], but I'm not sure whether juju handles this properly internally.

[Other info]

* juju-2.0, version: 2.0-beta12~20160715~4141~abcd123-20160715+4141+abcd123~16.04
* attaching /var/log/ directory from the controller node.

[0] Jul 21 14:55:22 ubuntu mongod.37017[5332]: [conn22] command juju.ip.addresses command: insert { insert: "ip.addresses", documents: [ { _id: "f77a530a-e744-4dd2-83c8-60f84ca449a2:m#0#d#ens3#ip#192.168.30.3", model-uuid: "f77a530a-e744-4dd2-83c8-60f84ca449a2", providerid: "525", device-name: "ens3", machine-id: "0", subnet-cidr: "192.168.30.0/24", config-method: "manual", value: "192.168.30.3", dns-servers: [ "192.168.30.1" ], gateway-address: "192.168.30.1", txn-revno: 2, txn-queue: [ "5790e25a44e8db15873c746c_69c65be8" ] } ], writeConcern: { getLastError: 1, j: true }, ordered: true } ninserted:0 keyUpdates:0 writeConflicts:0 exception: E11000 duplicate key error collection: juju.ip.addresses index: _id_ dup key: { : "f77a530a-e744-4dd2-83c8-60f84ca449a2:m#0#d#ens3#ip#192.168.30.3" } code:11000 numYields:0 reslen:309 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 112777 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 123ms
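
For anyone going through the attached syslog, the collection, index and duplicated key can be pulled out of lines like [0] with a small script. This is a sketch that assumes the message layout is exactly as in the line above; the regular expression and the function name are illustrative and not part of juju or MongoDB.

```python
import re

# Matches the tail of a mongod log line shaped like the one in [0].
# The layout is assumed from that single sample line.
E11000_RE = re.compile(
    r'E11000 duplicate key error collection: (?P<collection>\S+) '
    r'index: (?P<index>\S+) dup key: \{ : "(?P<key>[^"]*)" \}'
)


def parse_e11000(line):
    """Return a dict with collection, index and key, or None if the line does not match."""
    m = E11000_RE.search(line)
    return m.groupdict() if m else None


# Usage, assuming a local copy of the controller's syslog:
#     with open("syslog") as f:
#         for line in f:
#             hit = parse_e11000(line)
#             if hit:
#                 print(hit["collection"], hit["key"])
```

Listing the keys this way would show whether the duplicate insert only ever concerns the same ip.addresses document or several different ones.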

Tags: sts
Felipe Reyes (freyes) wrote :
tags: added: sts
Adam Stokes (adam-stokes) wrote :

I've been battling the E11000 error quite a bit. Can you try the binaries from https://bugs.launchpad.net/juju-core/+bug/1604644/comments/11 ?

I haven't run into this issue with those. Make sure to use --upload-tools.
