Another instance: no upgrades have happened recently (in fact, I'm trying to prep the site for one). The site is running Juju 1.25.13 on Trusty. Juju is HA on machines 0, 1 and 2, and there are several other machines. The cloud is MAAS 1.9.

The units residing on machine 2 (not in LXC containers, but on the machine itself) are in state 'failed'. I have tried restarting the machine and unit agents, the agents on machines 0 and 1 as well, all the juju-db instances, and all the rsyslog daemons. I also ran mgopurge (1.6) with all the state servers stopped.

With the unit log level set to TRACE, running

juju run --unit ceph/1 'uptime'

produces only the following in the unit log:

2018-03-06 03:10:12 DEBUG juju.worker.uniter runlistener.go:61 RunCommands: {Commands:uptime RelationId:-1 RemoteUnitName: ForceRemoteUnit:false}
2018-03-06 03:10:12 TRACE juju.worker.uniter uniter.go:336 run commands: uptime

However, the command never returns, the agents don't move away from 'failed' status, and hooks don't run. I don't see anything in the machine log that looks related at all (I can attach it, but it contains potentially sensitive info that would need scrubbing).

I also note a number of rsyslog connection attempts and frequent disconnects, which could be a red herring or could be significant, e.g.:

2018-03-06 03:15:08 INFO juju.worker.dependency engine.go:352 "rsyslog-config-updater" manifold worker stopped: dial tcp 10.28.16.13:6514: getsockopt: connection refused
2018-03-06 03:15:08 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:08 INFO juju.worker.dependency engine.go:294 starting "rsyslog-config-updater" manifold worker in 3s...
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:302 starting "rsyslog-config-updater" manifold worker
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "agent" resource
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "api-caller" resource
2018-03-06 03:15:11 DEBUG juju.worker.rsyslog worker.go:108 starting rsyslog worker mode 1 for "unit-os-cs-1" ""
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:309 running "rsyslog-config-updater" manifold worker
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:315 registered "rsyslog-config-updater" manifold worker
2018-03-06 03:15:11 INFO juju.worker.dependency engine.go:339 "rsyslog-config-updater" manifold worker started
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:11 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.16.13:6514
2018-03-06 03:15:11 INFO juju.worker.dependency engine.go:352 "rsyslog-config-updater" manifold worker stopped: dial tcp 10.28.16.13:6514: getsockopt: connection refused
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:11 INFO juju.worker.dependency engine.go:294 starting "rsyslog-config-updater" manifold worker in 3s...
2018-03-06 03:15:12 DEBUG juju.worker.leadership tracker.go:138 os-cs/1 renewing lease for os-cs leadership
2018-03-06 03:15:12 DEBUG juju.worker.leadership tracker.go:165 checking os-cs/1 for os-cs leadership
2018-03-06 03:15:13 DEBUG juju.worker.leadership tracker.go:180 os-cs/1 confirmed for os-cs leadership until 2018-03-06 03:16:12.552651545 +0000 UTC
2018-03-06 03:15:13 INFO juju.worker.leadership tracker.go:182 os-cs/1 will renew os-cs leadership at 2018-03-06 03:15:42.552651545 +0000 UTC
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:302 starting "rsyslog-config-updater" manifold worker
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "agent" resource
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "api-caller" resource
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:108 starting rsyslog worker mode 1 for "unit-os-cs-1" ""
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:309 running "rsyslog-config-updater" manifold worker
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:315 registered "rsyslog-config-updater" manifold worker
2018-03-06 03:15:14 INFO juju.worker.dependency engine.go:339 "rsyslog-config-updater" manifold worker started
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.16.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.2.22:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.24.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.6.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.8.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.2.20:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.16.12:6514

At a similar time in syslog:

Mar 6 03:15:08 hostname rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1778857" x-info="http://www.rsyslog.com"] exiting on signal 15.
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 14 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1788531" x-info="http://www.rsyslog.com"] start
Mar 6 03:15:12 hostname rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Mar 6 03:15:12 hostname rsyslogd-2221: module 'imuxsock' already in this config, cannot be added [try http://www.rsyslog.com/e/2221 ]
Mar 6 03:15:12 hostname rsyslogd: rsyslogd's groupid changed to 104
Mar 6 03:15:12 hostname rsyslogd: rsyslogd's userid changed to 101
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 4 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 5 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 6 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 7 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 8 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 9 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 11 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 10 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 12 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 13 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:15 hostname rsyslogd-2083: gnutls returned error on handshake: A TLS warning alert has been received. [try http://www.rsyslog.com/e/2083 ]
Mar 6 03:15:22 hostname rsyslogd-2027: imfile: could not persist state file machine-2 - data may be repeated on next startup. Is WorkDirectory set? [try http://www.rsyslog.com/e/2027 ]

I tried clearing out the rsyslog config in /etc/rsyslog.d/25-juju.conf, emptying /var/spool/rsyslog (with rsyslog stopped) to clean out any broken queue files, and restarting the machine agent, but the .qi and other queue files all came back immediately, as did these errors.
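Given the "connection refused" on 10.28.16.13:6514, one thing worth ruling out from machine 2 is whether anything is accepting connections on the rsyslog TLS port on each target. The following is a minimal Python sketch of that check, not anything Juju ships: the host list and port 6514 are copied from the "making syslog connection" lines above, while the function names probe and probe_all and the 3-second timeout are hypothetical choices of mine. It only tests plain TCP reachability; it does not attempt the TLS handshake that gnutls is complaining about.

```python
import socket

# Targets copied from the "making syslog connection" log lines above.
# Assumption: these are the addresses Juju's rsyslog worker dials for this unit.
RSYSLOG_TARGETS = [
    "10.28.16.13", "10.28.2.22", "10.28.24.13", "10.28.6.13",
    "10.28.8.13", "10.28.2.20", "10.28.16.12",
]
RSYSLOG_PORT = 6514  # port seen in the log lines above


def probe(host, port, timeout=3.0):
    """Hypothetical helper: classify a plain TCP connect attempt.

    Returns 'open' if the connection is accepted, 'refused' if the peer
    actively rejects it (no listener), or 'unreachable' for timeouts,
    routing failures and other socket errors. No TLS handshake is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Must be caught before OSError, of which it is a subclass.
        return "refused"
    except OSError:
        return "unreachable"


def probe_all(hosts, port=RSYSLOG_PORT, timeout=3.0):
    """Hypothetical helper: map each 'host:port' to its probe() result."""
    return {f"{host}:{port}": probe(host, port, timeout) for host in hosts}
```

Run on machine 2, print(probe_all(RSYSLOG_TARGETS)) would show which targets refuse outright and which are merely slow or filtered. A "refused" result for 10.28.16.13 alone would point at the rsyslog listener on that one server rather than at the network.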