bootstrap on slow node fails: "ERROR juju.cmd supercommand.go:304 can't dial mongo to initiate replicaset: no reachable servers"
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
juju-core | Fix Released | High | Ian Booth | |
Bug Description
I'm trying to run juju on a simulated system. The simulated environment is much slower than a real VM, and this appears to cause a timeout to expire while juju is trying to connect to the state server:
2014-05-19 17:23:44 DEBUG juju.worker.
2014-05-19 17:23:44 DEBUG juju.state open.go:128 connection failed, will retry: dial tcp 127.0.0.1:37017: connection refused
2014-05-19 17:23:47 DEBUG juju.state open.go:128 connection failed, will retry: dial tcp 127.0.0.1:37017: connection refused
[...]
2014-05-19 17:24:14 DEBUG juju.state open.go:128 connection failed, will retry: dial tcp 127.0.0.1:37017: connection refused
2014-05-19 17:24:14 DEBUG juju.state open.go:128 connection failed, will retry: dial tcp 127.0.0.1:37017: connection refused
2014-05-19 17:24:15 INFO juju.worker.
2014-05-19 17:24:15 ERROR juju.cmd supercommand.go:304 can't dial mongo to initiate replicaset: no reachable servers
2014-05-19 17:24:16 ERROR juju.provider.
Stopping instance...
2014-05-19 17:24:17 INFO juju.cmd cmd.go:113 Bootstrap failed, destroying environment
2014-05-19 17:24:17 INFO juju.provider.
2014-05-19 17:24:19 ERROR juju.cmd supercommand.go:304 subprocess encountered error code 1
I've tried bumping up the mongoSocketTimeout and defaultDialTimeout constants in src/launchpad.
Changed in juju-core: | |
status: | New → Triaged |
importance: | Undecided → High |
milestone: | none → 1.19.3 |
Changed in juju-core: | |
assignee: | nobody → Ian Booth (wallyworld) |
status: | Triaged → In Progress |
Changed in juju-core: | |
status: | In Progress → Fix Committed |
Changed in juju-core: | |
status: | Fix Committed → Fix Released |
Slow provisioners need extended timeouts. Has this been tried? In my tests, arm64 images were 10x slower than amd64 on EC2, so long timeouts are probably needed.
Environments that need more time to provision an instance can configure three options in environments.yaml. MAAS environments often need to set bootstrap-timeout to 1800.
bootstrap-timeout (default: 600s)
retry-delay (default: 10s for addresses-delay, 5s for retry-delay; see below)
bootstrap-
bootstrap-
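As a rough illustration of the MAAS suggestion above, an environments.yaml entry might look like the fragment below. The environment name `my-maas` is hypothetical and the other keys a MAAS environment requires are omitted; only the `bootstrap-timeout` value of 1800 comes from this thread.

```yaml
# environments.yaml fragment; "my-maas" is a hypothetical environment name.
environments:
  my-maas:
    type: maas
    # Seconds to wait for the bootstrap node before giving up (default: 600).
    bootstrap-timeout: 1800
```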