Comment 24 for bug 1510651

Tim Penhey (thumper) wrote :

I have replicated this on 1.26 with lxd with the following observations:

* firstly, after about 20-30 minutes everything appears to have settled down (this is not actually true)
* I killed machine-0 and it always shows as started in status, never down
* both machines 3 and 4 (new state server machines) only list the machine-0 address for state addresses
  - something running in those agents is attempting to connect to just the down machine-0 address for mongo
* the machine agent for machine-1 was never shown as lost, but for some reason took about 15 minutes to return from dialing the API server addresses.

2015-11-24 01:30:58 INFO juju.worker runner.go:269 start "api"
2015-11-24 01:30:58 INFO juju.api apiclient.go:476 dialing "wss://10.0.3.49:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"
2015-11-24 01:30:58 INFO juju.api apiclient.go:476 dialing "wss://10.0.3.202:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"
2015-11-24 01:30:58 INFO juju.api apiclient.go:476 dialing "wss://10.0.3.218:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"
2015-11-24 01:31:01 INFO juju.api apiclient.go:484 error dialing "wss://10.0.3.49:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api": websocket.Dial wss://10.0.3.49:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api: dial tcp 10.0.3.49:17070: getsockopt: no route to host
2015-11-24 01:44:25 INFO juju.api apiclient.go:269 connection established to "wss://10.0.3.202:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"

* the dependency engine used by the unit agent seemed to get itself into a pathological state, and took some time to reconnect everything when the server went away.
* the unit agent log file has timestamps recorded out of order (weird), although this appears to happen only around the time machine-0 was stopped
* it appears that as the uniter restarted in the unit agent, it re-ran the config-changed hook
* the "rsyslog-config-updater" worker is stuck restarting as it is only trying to look at the old machine-0 address