Encountered a problem where jujud stops listening on 17070/tcp while the process is still running on machine 0. All juju clients stop working because they can't reach the WSS (secure websocket) API endpoint on machine 0.
Juju agent version: 1.17.4
1- We can see the following processes. So mongod is running fine, and the machine agent is running:
root@juju-cs2-machine-0:~# ps auxww | grep juju
root 5439 2.4 4.7 483664 48564 ? Ssl Mar03 41:48 mongod --auth --dbpath=/var/lib/juju/db --sslOnNormalPorts --sslPEMKeyFile /var/lib/juju/server.pem --sslPEMKeyPassword xxxxxxx --bind_ip 0.0.0.0 --port 37017 --noprealloc --syslog --smallfiles
root 5465 9.6 11.4 1167408 116616 ? Ssl Mar03 162:46 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
root 12661 0.3 1.8 640832 19180 ? Ssl Mar03 4:52 /var/lib/juju/tools/unit-ci-juju-gui-0/jujud unit --data-dir /var/lib/juju --unit-name ci-juju-gui/0 --debug
root 13040 0.0 2.2 153120 23188 ? Ss Mar03 0:00 /usr/bin/python /usr/local/bin/runserver.py --logging=info --guiroot=/var/lib/juju-gui/juju-gui/build-prod --sslpath=/etc/ssl/juju-gui --charmworldurl=https://manage.jujucharms.com/ --apiurl=wss://10.55.32.46:17070 --apiversion=go
2- However, nothing is listening on port 17070/tcp; the jujud machine 0 agent does not have it open (netstat shows no matching socket):
root@juju-cs2-machine-0:~# netstat -anp | grep 17070
3- In the logs we have hundreds of:
2014-03-03 12:18:28 ERROR juju.cmd.jujud agent.go:241 closeWorker: close error: error receiving message: write tcp 127.0.0.1:17070: connection reset by peer
2014-03-03 12:18:28 ERROR juju runner.go:220 worker: exited "api": watcher iteration error: read tcp 127.0.0.1:37017: i/o timeout
2014-03-03 12:18:31 ERROR juju apiclient.go:118 state/api: websocket.Dial wss://localhost:17070/: dial tcp 127.0.0.1:17070: connection refused
2014-03-03 12:18:31 ERROR juju runner.go:220 worker: exited "api": websocket.Dial wss://localhost:17070/: dial tcp 127.0.0.1:17070: connection refused
4- Upon restarting the jujud machine 0 agent, all processes started working again and juju status worked. I could see that jujud was now listening on port 17070:
tcp6 0 0 :::17070 :::* LISTEN 1634/jujud
The listening process is PID 1634, which matches the jujud machine agent in the process list below:
root@juju-cs2-machine-0:~# ps auxwww | grep juju
root 1634 1.6 5.0 957760 51716 ? Ssl 13:54 0:23 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
root 5439 2.4 5.0 549200 51584 ? Ssl Mar03 42:25 mongod --auth --dbpath=/var/lib/juju/db --sslOnNormalPorts --sslPEMKeyFile /var/lib/juju/server.pem --sslPEMKeyPassword xxxxxxx --bind_ip 0.0.0.0 --port 37017 --noprealloc --syslog --smallfiles
Full log from machine-0 agent attached.
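For anyone wanting to confirm the same state (jujud machine agent alive, but nothing accepting connections on 17070), here is a minimal bash sketch. The function names port_open and check_api_port are made up for illustration and are not part of juju. It assumes bash with /dev/tcp support and the coreutils timeout command are available on the machine.

```shell
#!/usr/bin/env bash
# Sketch only: probe whether anything accepts TCP connections on a port.
# port_open and check_api_port are hypothetical helper names, not juju tooling.

# Succeeds (exit 0) if a TCP connection to host:port can be opened within 2 seconds.
port_open() {
    local host=$1 port=$2
    # bash's /dev/tcp pseudo-device attempts a TCP connect; it is run in a child
    # bash so that timeout can bound it and the descriptor is closed on exit.
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints "listening" or "not listening" for the given host and port.
# Defaults are the loopback address and the API port from this report.
check_api_port() {
    local host=${1:-127.0.0.1} port=${2:-17070}
    if port_open "$host" "$port"; then
        echo "listening"
    else
        echo "not listening"
    fi
}
```

Run on machine 0 as `check_api_port 127.0.0.1 17070`. Given the "connection refused" errors in the log above, I would expect this to report "not listening" while the agent is in the broken state, and "listening" after the restart in step 4.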
thanks,
Eduardo.
fwiw this is roughly similar to the issue behind bug:1284183 (disconnect all clients on juju), except that there the api server normally restarts.