Bug #1339866 “HA: juju behaves incorrectly when mongo on master ...” : Bugs : juju-core

Michael Foord (mfoord) on 2014-07-09

description:	updated
Changed in juju-core:
milestone:	none → 1.21-alpha1

Curtis Hovey (sinzui) on 2014-07-10

Changed in juju-core:
importance:	Undecided → High
status:	New → Triaged

Michael Foord (mfoord) on 2014-07-10

Changed in juju-core:
status:	Triaged → Invalid

Michael Foord (mfoord) wrote on 2014-07-10:

#1

After further experimentation, and verification of the *actual* specified behaviour, I can confirm that juju does behave correctly when mongo on the primary HA state server (or on a secondary) dies.

The symptom that led us to believe it did not behave correctly was that the machine agent's agent.conf was not rewritten, and the now-dead machine was still listed as an API server. However, this is actually the expected behaviour. When mongo goes down, jujud remains up; if that machine is the master, it shuts down all the relevant jobs and workers (verified from the machine log), and the mongo primary fails over to another machine, which becomes the new juju master. The old machine is left in the mongo replica set and is still listed as a valid API server, because it *may* come back. Running "juju ensure-availability" again will remove its entry (and, I believe, also shut down the instance it runs on).
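To make the failover semantics described above concrete, here is a toy Python model. It is a hypothetical sketch, not juju's implementation: the class `StateServers` and its methods are invented for illustration and capture only the points made above (a surviving member takes over as primary, the dead member stays listed, and re-running ensure-availability prunes it). The election step assumes MongoDB's rule that a new primary needs a majority of the replica set's members; anything else ensure-availability might do, such as provisioning replacements, is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class StateServers:
    """Toy model of an HA state-server group (hypothetical, not juju code)."""

    members: List[str]
    alive: Set[str] = field(init=False)
    primary: Optional[str] = field(init=False)

    def __post_init__(self) -> None:
        self.alive = set(self.members)
        self.primary = self.members[0] if self.members else None

    def kill_mongo(self, machine: str) -> None:
        # Mongo on `machine` dies; jujud keeps running there, but a master
        # stops acting as master (modelled here as losing primary status).
        self.alive.discard(machine)
        if machine == self.primary:
            survivors = [m for m in self.members if m in self.alive]
            # A new primary is elected only if a majority of members survive.
            has_majority = len(survivors) > len(self.members) // 2
            self.primary = survivors[0] if has_majority else None

    def api_servers(self) -> List[str]:
        # Dead members are deliberately kept: they *may* come back.
        return list(self.members)

    def ensure_availability(self) -> None:
        # Re-running ensure-availability drops members that are down.
        self.members = [m for m in self.members if m in self.alive]
```

For example, with members machine-0, machine-1 and machine-2, killing mongo on machine-0 makes machine-1 the primary while `api_servers()` still returns all three; only after `ensure_availability()` does machine-0 drop out of the list.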

Clients and machine agents have a list of all API servers, and if contacting one fails (e.g. our down machine), they automatically try the other entries in the list. So this behaviour is "as specified" and not a problem.
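The client-side fallback described above can be sketched as a loop over the API server list. This is a minimal Python illustration under stated assumptions, not juju's client code: `connect_any`, `AllServersFailed` and the caller-supplied `dial` function are hypothetical names, and dropping duplicate addresses is a design choice of this sketch.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")


class AllServersFailed(Exception):
    """Raised when no API server in the list accepted a connection."""


def connect_any(
    addresses: Sequence[str], dial: Callable[[str], Conn]
) -> Tuple[str, Conn]:
    """Try each API address in order and return the first that connects.

    `dial` (hypothetical) opens a connection or raises OSError. It should
    apply its own timeout; otherwise a server that accepts the TCP
    connection but never answers will block the loop indefinitely.
    """
    errors: Dict[str, OSError] = {}
    # dict.fromkeys drops duplicate entries while preserving their order.
    for addr in dict.fromkeys(addresses):
        try:
            return addr, dial(addr)
        except OSError as exc:
            errors[addr] = exc  # remember why this server failed, try the next
    raise AllServersFailed(errors)
```

With a `dial` that raises `ConnectionRefusedError` (a subclass of `OSError`) for the dead machine, `connect_any` returns the next address in the list. The timestamps in the log under comment #2 suggest the real client starts several dials at nearly the same moment rather than strictly one after another, so this sequential loop is a simplification.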

Ian Booth (wallyworld) on 2014-07-11

Changed in juju-core:
milestone:	1.21-alpha1 → none

julian wang (zeratul-j) wrote on 2014-10-19:

#2

We are trying to deploy Juju HA at a customer site.
Using the same scenario:
$ juju bootstrap
$ juju ensure-availability
Then we killed mongo on machine 0 (the master state server).
Juju stopped working (juju status does not respond). We think this is a bug in HA.

===== juju log ============
ubuntu@maas-trusty:~$ juju status --debug
2014-10-19 17:42:14 INFO juju.cmd supercommand.go:37 running juju [1.20.8-trusty-amd64 gc]
2014-10-19 17:42:14 DEBUG juju.conn api.go:187 trying cached API connection settings
2014-10-19 17:42:14 INFO juju.conn api.go:270 connecting to API addresses: [bootstrap-trusty-01.beijing.cts.canonical.com:17070 bootstrap-trusty-01.beijing.cts.canonical.com:17070 10.231.64.39:17070 bootstrap-trusty-02.beijing.cts.canonical.com:17070 bootstrap-trusty-02.beijing.cts.canonical.com:17070 10.231.64.87:17070 bootstrap-trusty-03.beijing.cts.canonical.com:17070 bootstrap-trusty-03.beijing.cts.canonical.com:17070 10.231.64.88:17070]
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-01.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-01.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://10.231.64.39:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-02.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-02.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://10.231.64.87:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 DEBUG juju.state.api apiclient.go:248 error dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api", will retry: websocket.Dial wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api: dial tcp 10.231.64.88:17070: connection refused
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 DEBUG juju.state.api apiclient.go:248 error dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api", will retry: websocket.Dial wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api: dial tcp 10.231.64.88:17070: connection refused
2014-10-19 17:42:14 INFO juju.state.api apiclient.go:242 dialing "wss://10.231.64.88:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:14 DEBUG juju.state.api apiclient.go:248 error dialing "wss://10.231.64.88:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api", will retry: websocket.Dial wss://10.231.64.88:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api: dial tcp 10.231.64.88:17070: connection refused
2014-10-19 17:42:16 DEBUG juju.provider.maas environprovider.go:30 opening environment "maas".
2014-10-19 17:42:16 DEBUG juju.provider.common state.go:83 waiting for addresses of state server instances [/MAAS/api/1.0/nodes/node-cf194ab4-072b-11e4-beb5-00163e0f4f3c/]
2014-10-19 17:42:16 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-01.beijing.cts.canonical.com:17070/"
2014-10-19 17:42:16 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-01.beijing.cts.canonical.com:17070/"
2014-10-19 17:42:16 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
2014-10-19 17:42:16 DEBUG juju.state.api apiclient.go:248 error dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api", will retry: websocket.Dial wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api: dial tcp 10.231.64.88:17070: connection refused
2014-10-19 17:42:16 INFO juju.state.api apiclient.go:242 dialing "wss://10.231.64.39:17070/"
2014-10-19 17:42:16 INFO juju.state.api apiclient.go:242 dialing "wss://bootstrap-trusty-03.beijing.cts.canonical.com:17070/environment/cf1c570b-0611-4585-8915-fc3fb53024d1/api"
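The log is easier to read when the dial attempts are tallied per server. The following Python helper is a hypothetical sketch (`summarize_dials` is not part of juju); it only assumes the two message formats visible above, `dialing "<url>"` and `error dialing "<url>"`.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit

# Patterns for the two apiclient.go messages seen in the `juju status --debug` log.
FAILED = re.compile(r'\berror dialing "([^"]+)"')
DIALING = re.compile(r'\bdialing "([^"]+)"')


def summarize_dials(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each host:port to (dial attempts, dial errors) found in the log."""
    dials: Counter = Counter()
    errors: Counter = Counter()
    for line in lines:
        # Check the error pattern first: an error line also contains 'dialing "'.
        m = FAILED.search(line)
        if m:
            errors[urlsplit(m.group(1)).netloc] += 1
            continue
        m = DIALING.search(line)
        if m:
            dials[urlsplit(m.group(1)).netloc] += 1
    hosts = sorted(set(dials) | set(errors))
    return {host: (dials[host], errors[host]) for host in hosts}
```

Applied to the excerpt above (for instance saved to a hypothetical `status-debug.log` and passed as `summarize_dials(open("status-debug.log"))`), it shows that only the bootstrap-trusty-03 and 10.231.64.88 addresses report "connection refused"; the dials to the other addresses log no error in this excerpt, so the excerpt alone does not show whether those connections succeeded or stalled.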

Changed in juju-core:
status:	Invalid → Confirmed

julian wang (zeratul-j) on 2014-10-19

tags:	added: cts

Curtis Hovey (sinzui) on 2014-10-28

tags:	added: cts-cloud-escalation removed: cts
Changed in juju-core:
status:	Confirmed → Triaged
importance:	High → Medium

Alexis Bruemmer (alexis-bruemmer) wrote on 2016-09-22:

#3

Please reopen this bug if it is still an issue in 2.0.

Changed in juju-core:
status:	Triaged → Won't Fix

juju-core

HA: juju behaves incorrectly when mongo on master state server dies
