Comment 19 for bug 1720251

John A Meinel (jameinel) wrote:

On a machine demonstrating this, we get:
> rs.status()
{
        "set" : "juju",
        "date" : ISODate("2017-12-13T16:35:05.020Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 1,
                        "name" : "10.245.208.204:37017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 76881,
                        "optime" : {
                                "ts" : Timestamp(1513182904, 5),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2017-12-13T16:35:04Z"),
                        "electionTime" : Timestamp(1513106027, 1),
                        "electionDate" : ISODate("2017-12-12T19:13:47Z"),
                        "configVersion" : 1,
                        "self" : true
                }
        ],
        "ok" : 1
}
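
For a three-controller HA setup we would expect three members here, but only machine 0 is present. A quick sanity check from the same mongo shell (the output below is just what follows from the rs.status() above):

> rs.status().members.length
1
> rs.status().members.map(function (m) { return m.name + " " + m.stateStr; })
[ "10.245.208.204:37017 PRIMARY" ]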

and
> rs.config()
{
        "_id" : "juju",
        "version" : 1,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 1,
                        "host" : "10.245.208.204:37017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
                                "juju-machine-id" : "0"
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("5a302a6ac77b4ca613f5d47d")
        }
}
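
The juju-machine-id tag appears to be how juju ties replica-set members back to machines, so dumping that mapping directly can help (output again derived from the config above):

> rs.config().members.map(function (m) { return m._id + " " + m.host + " -> machine " + m.tags["juju-machine-id"]; })
[ "1 10.245.208.204:37017 -> machine 0" ]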

and machine-0.log says:

2017-12-13 16:35:43 DEBUG juju.worker.peergrouper publish.go:43 API host ports have not changed
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:38 calculating desired peer group
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:43 members: ...
   &peergrouper.machine{id: "0", wantsVote: true, hostPorts: [10.245.208.204:37017 127.0.0.1:37017 [::1]:37017]}: rs_id=1, rs_addr=10.245.208.204:37017
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:44 extra: []replicaset.Member(nil)
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:45 maxId: 1
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:117 assessing possible peer group changes:
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:124 machine "0" is already voting
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper desired.go:142 assessed
2017-12-13 16:35:43 DEBUG juju.mongo mongo.go:288 selecting mongo peer hostPort by scope from [10.245.208.204:37017 127.0.0.1:37017 [::1]:37017]
2017-12-13 16:35:43 DEBUG juju.network address.go:370 selected "10.245.208.204:37017" as controller host:port, using scope selection
2017-12-13 16:35:43 DEBUG juju.mongo mongo.go:288 selecting mongo peer hostPort by scope from [10.245.208.204:37017 127.0.0.1:37017 [::1]:37017]
2017-12-13 16:35:43 DEBUG juju.network address.go:370 selected "10.245.208.204:37017" as controller host:port, using scope selection
2017-12-13 16:35:43 DEBUG juju.worker.peergrouper worker.go:494 no change in desired peer group, voting:
  0: true

At this point, that log output is only a couple of minutes old.

It does claim that it wasn't able to find a space for the controllers (mongo-space-state is "invalid" below), but it also wants to put machines 0, 1, and 2 into the replica set:
> db.controllers.find({"_id": "e"}).pretty()
{
        "_id" : "e",
        "cloud" : "foundations-maas",
        "model-uuid" : "f27da689-56d0-4c2f-8e80-4485b7e9d074",
        "machineids" : [
                "0"
        ],
        "votingmachineids" : [
                "0",
                "1",
                "2"
        ],
        "mongo-space-name" : "",
        "mongo-space-state" : "invalid",
        "txn-revno" : NumberLong(5),
        "txn-queue" : [ ]
}
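
Note that machineids lists only "0" while votingmachineids already has all three. It may also be worth cross-checking the vote flags on the machine documents themselves; this is only a sketch, and the field names ("novote", "hasvote") are my guess at the schema:

> db.machines.find({"machineid": {"$in": ["0", "1", "2"]}},
...                {"machineid": 1, "novote": 1, "hasvote": 1}).pretty()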

$ juju show-controller
...
  controller-machines:
    "0":
      instance-id: wccabc
      ha-status: ha-enabled
    "1":
      instance-id: fk44nr
      ha-status: ha-pending
    "2":
      instance-id: xcnwhm
      ha-status: ha-pending
...

So far, nothing obvious. They all look to be up and happy.

$ juju status -m controller
...
Machine  State    DNS             Inst id  Series  AZ       Message
0        started  10.245.208.204  wccabc   xenial  default  Deployed
1        started  10.245.208.200  fk44nr   xenial  zone2    Deployed
2        started  10.245.208.201  xcnwhm   xenial  zone3    Deployed

All are flagged green, so they should be happy.

I have to figure out why the peergrouper doesn't think they should be part of the group.
I don't see any errors; it just doesn't seem to consider them up and happy.

Still needs some digging.
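
Since the peergrouper should only promote machines whose agents it believes are alive, agent presence is probably the next thing to check. As a sketch (the "presence" database and its collection names are assumptions about juju's internals):

> var p = db.getSiblingDB("presence")
> p.getCollectionNames()
> p.getCollection("presence.beings").find().limit(5).pretty()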