Juju add-unit stuck on new lxc containers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-core | Expired | Undecided | Unassigned |
1.20 | Won't Fix | Medium | Cheryl Jennings |
Bug Description
[Environment]
Trusty 14.04.2
Juju 1.20.11
[Description]
Juju add-unit hangs for new lxc containers.
- This environment has 3 state servers running, and 3 hypervisor machines for adding LXC guests.
- jujud is running on all of the agent and state server machines and is listening correctly on port 17070.
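As a rough way to double-check the listening claim from the client or from machine 3, a TCP probe of port 17070 could look like the sketch below. The function names and the example addresses are hypothetical, and a successful TCP connect only shows that the port accepts connections, not that the API behind it is healthy.

```python
import socket


def port_is_open(host, port=17070, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


def check_hosts(hosts, port=17070, timeout=3.0):
    """Map each host to the result of port_is_open (hosts are caller-supplied)."""
    return {host: port_is_open(host, port, timeout) for host in hosts}
```

For example, `check_hosts(["10.0.0.1", "10.0.0.2", "10.0.0.3"])` would probe three placeholder state server addresses; substitute the real ones.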
1) Running juju add-unit
ubuntu@customer:~$ juju -v --debug add-unit rabbitmq-slo --to lxc:3
2015-06-17 16:22:21 INFO juju.cmd supercommand.go:37 running juju [1.20.11-
2015-06-17 16:22:21 DEBUG juju.conn api.go:187 trying cached API connection settings
2015-06-17 16:22:21 INFO juju.conn api.go:270 connecting to API addresses: [bootstrap.
2015-06-17 16:22:21 INFO juju.state.api apiclient.go:242 dialing "wss://
2015-06-17 16:22:21 INFO juju.state.api apiclient.go:176 connection established to "wss://
2015-06-17 16:22:22 DEBUG juju.conn api.go:407 API addresses changed from ["bootstrap.
2015-06-17 16:22:22 INFO juju.conn api.go:418 updated API connection settings cache
- What's printed at that point in the state server's juju log:
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 14:18:51 ERROR juju.rpc server.go:554 error writing response: EOF
2015-06-17 15:18:07 INFO juju.cmd supercommand.go:37 running jujud [1.20.11.
2015-06-17 15:18:07 INFO juju.cmd.jujud machine.go:158 machine agent machine-0 start (1.20.11.
2015-06-17 15:18:07 DEBUG juju.agent agent.go:377 read agent config, format "1.18"
2015-06-17 15:18:07 INFO juju.cmd.jujud machine.go:169 no upgrade steps required or upgrade steps for 1.20.11.1 have already been run.
2015-06-17 15:18:07 INFO juju.worker runner.go:260 start "api"
2015-06-17 15:18:07 INFO juju.worker runner.go:260 start "statestarter"
2015-06-17 15:18:07 INFO juju.worker runner.go:260 start "termination"
2015-06-17 15:18:07 INFO juju.state.api apiclient.go:242 dialing "wss://
2015-06-17 15:18:07 INFO juju.worker runner.go:260 start "state"
2015-06-17 15:18:07 INFO juju.state.api apiclient.go:250 error dialing "wss://
:17070/: dial tcp 127.0.0.1:17070: connection refused
2015-06-17 15:18:07 ERROR juju.worker runner.go:218 exited "api": unable to connect to "wss://
2015-06-17 15:18:07 INFO juju.worker runner.go:252 restarting "api" in 3s
2015-06-17 15:18:07 INFO juju.mongo mongo.go:171 Ensuring mongo server is running; data directory /var/lib/juju; port 37017
2015-06-17 15:18:07 INFO juju.mongo mongo.go:326 installing juju-mongodb
2015-06-17 15:18:07 INFO juju.utils.apt apt.go:132 Running: [apt-get --option=
tions::
2015-06-17 15:18:08 DEBUG juju.mongo mongo.go:275 using mongod: /usr/lib/
17 15:18:08.599 git version: nogitversion\n"
2015-06-17 15:18:08 DEBUG juju.mongo mongo.go:201 mongo exists as expected
2015-06-17 15:18:08 INFO juju.state open.go:47 opening state, mongo addresses: ["127.0.
2015-06-17 15:18:08 DEBUG juju.state open.go:48 dialing mongo
2015-06-17 15:18:08 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:08 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:08 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:08 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:08 DEBUG juju.state open.go:53 connection established
2015-06-17 15:18:08 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:09 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:10 INFO juju.worker.
2015-06-17 15:18:10 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 2
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "instancepoller"
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "peergrouper"
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "apiserver"
2015-06-17 15:18:10 INFO juju.state.
2015-06-17 15:18:10 INFO juju.worker.
2015-06-17 15:18:10 DEBUG juju.provider.maas environprovider
2015-06-17 15:18:10 INFO juju.worker.
2015-06-17 15:18:10 INFO juju.worker.
2015-06-17 15:18:10 INFO juju.worker.
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "cleaner"
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "resumer"
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "minunitsworker"
2015-06-17 15:18:10 DEBUG juju.provider.maas environprovider
2015-06-17 15:18:10 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:10 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:10 DEBUG juju.worker.
2015-06-17 15:18:10 DEBUG juju.worker.
2015-06-17 15:18:10 DEBUG juju.worker.
2015-06-17 15:18:10 INFO juju.mongo open.go:104 dialled mongo successfully
2015-06-17 15:18:10 INFO juju.worker runner.go:260 start "api"
- What's printed on machine 3 (the hypervisor guest):
2015-06-17 15:49:00 INFO juju.worker runner.go:260 start "api"
2015-06-17 15:49:00 INFO juju.worker runner.go:260 start "statestarter"
2015-06-17 15:49:00 INFO juju.state.api apiclient.go:242 dialing "wss://
2015-06-17 15:49:00 INFO juju.worker runner.go:260 start "termination"
2015-06-17 15:49:00 INFO juju.state.api apiclient.go:242 dialing "wss://
2015-06-17 15:49:00 INFO juju.state.api apiclient.go:242 dialing "wss://
2015-06-17 15:49:00 INFO juju.state.api apiclient.go:176 connection established to "wss://
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "upgrader"
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "upgrade-steps"
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "machiner"
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "apiaddressupdater"
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "logger"
2015-06-17 15:49:01 DEBUG juju.worker.logger logger.go:35 initial log config: "<root>=DEBUG"
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "machineenviron
2015-06-17 15:49:01 DEBUG juju.worker.logger logger.go:60 logger setup
2015-06-17 15:49:01 DEBUG juju.worker.
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "rsyslog"
2015-06-17 15:49:01 INFO juju.worker runner.go:260 start "authentication
2015-06-17 15:49:01 DEBUG juju.worker.rsyslog worker.go:75 starting rsyslog worker mode 1 for "machine-3" ""
2015-06-17 15:49:01 DEBUG juju.container.kvm kvm.go:69 kvm-ok output:
INFO: /dev/kvm exists
KVM acceleration can be used
2015-06-17 15:49:01 DEBUG juju.worker.logger logger.go:45 reconfiguring logging from "<root>=DEBUG" to "juju.container
2015-06-17 16:18:42 ERROR juju.provisioner container_
2015-06-17 16:18:42 WARNING juju.provisioner container_
2015-06-17 16:18:42 ERROR juju.provisioner container_
2015-06-17 16:18:42 ERROR juju.provisioner container_
2015-06-17 16:18:42 WARNING juju.provisioner container_
2015-06-17 16:18:42 ERROR juju.provisioner container_
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.state.
2015-06-17 16:18:42 ERROR juju.worker runner.go:207 fatal "upgrader": connection is shut down
2015-06-17 16:18:42 ERROR juju.worker runner.go:207 fatal "authentication
2015-06-17 16:18:42 ERROR juju.worker runner.go:207 fatal "machiner": connection is shut down
2015-06-17 16:18:42 ERROR juju.worker runner.go:207 fatal "logger": connection is shut down
Changed in juju-core:
status: New → Incomplete
tags: removed: sts
The container fails to start because it cannot get the tools it needs: it gets an error saying that the connection to the state server is shut down. Two things are troubling:
1 - this happens every time, and
2 - this makes the add-unit command hang, instead of just failing.
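Until the hang itself is understood, one way to make it visible as a failure on the client side is to run the command under a time limit. The sketch below is only an illustration: `run_with_timeout` is a hypothetical helper, not part of juju, and the time limit is arbitrary. Killing the client may not undo anything the state server has already recorded.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, timed_out).

    returncode is None when the command exceeded timeout_s seconds;
    subprocess.run kills the child process in that case.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
        return result.returncode, False
    except subprocess.TimeoutExpired:
        return None, True
```

With the command from this report, that would be `run_with_timeout(["juju", "-v", "--debug", "add-unit", "rabbitmq-slo", "--to", "lxc:3"], 300)`, where 300 seconds is an assumed limit.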
The state server log shows rpc errors indicating that it could not write a response, and I am wondering whether something about this specific response is causing the rpc connection to shut down.
I have asked for a reproduction with trace-level logging, including logs from the state servers and machine 3. In the meantime, I am still digging through the code and attempting to reproduce the problem locally.
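When comparing the state server and machine 3 logs, it can help to count ERROR and WARNING lines per timestamp and logger so that bursts such as the one at 16:18:42 stand out. The following is a minimal sketch that assumes the line layout seen in the logs above (date, time, level, logger name); `summarise` and `LINE_RE` are hypothetical names.

```python
import re
from collections import Counter

# Matches the start of a log line: "YYYY-MM-DD HH:MM:SS LEVEL logger.name ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\S+)")


def summarise(lines, levels=("ERROR", "WARNING")):
    """Count log lines per (timestamp, level, logger) for the given levels.

    Lines that do not match the expected layout (for example wrapped
    continuation lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) in levels:
            counts[match.groups()] += 1
    return counts
```

Passing an open log file to `summarise` and printing `counts.most_common()` lists the busiest timestamp and logger pairs first.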