Hi,
I recently upgraded a set of controllers from 2.7.6 to 2.8.1. One of them appears to have crashed and restarted itself:
| root 11195 0.0 0.0 22084 3524 ? Ss 05:46 0:00 bash /etc/systemd/system/jujud-machine-1-exec-start.sh
| root 11209 1.5 1.3 825828 113204 ? Sl 05:46 0:12 \_ /var/lib/juju/tools/machine-1/jujud machine --data-dir
Last logging lines:
| 2020-08-14 05:45:27 ERROR juju.worker.dependency engine.go:671 "firewaller" manifold worker returned unexpected error: machine 2 not provisioned
| 2020-08-14 05:45:28 ERROR juju.worker.dependency engine.go:671 "firewaller" manifold worker returned unexpected error: cannot respond to units changes for "machine-0": instances not found
| 2020-08-14 05:45:42 ERROR juju.worker.dependency engine.go:671 "environ-tracker" manifold worker returned unexpected error: cannot create environ: authentication failed.: authentication failed
| caused by: requesting token: Unauthorised URL https://keystone.bos01.canonistack.canonical.com:5000/v3/auth/tokens
| caused by: request (https://keystone.bos01.canonistack.canonical.com:5000/v3/auth/tokens) returned unexpected status: 401; error info: Failed: 401 error: The request you have made requires authentication.
| 2020-08-14 05:45:45 ERROR juju.worker.dependency engine.go:671 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: read tcp 10.48.128.113:40964->10.48.128.113:37017: i/o timeout
| 2020-08-14 05:45:45 ERROR juju.apiserver.instancepoller instancepoller.go:174 link layer device merge attempt for machine 4 failed due to error: read tcp 127.0.0.1:43758->127.0.0.1:37017: i/o timeout; waiting until next instance-poller run to retry
| 2020-08-14 05:45:53 ERROR juju.worker.dependency engine.go:671 "environ-tracker" manifold worker returned unexpected error: cannot create environ: authentication failed.: authentication failed
| caused by: requesting token: Unauthorised URL https://keystone.bos01.canonistack.canonical.com:5000/v3/auth/tokens
| caused by: request (https://keystone.bos01.canonistack.canonical.com:5000/v3/auth/tokens) returned unexpected status: 401; error info: Failed: 401 error: The request you have made requires authentication.
| 2020-08-14 05:45:56 ERROR juju.state status.go:431 failed to update status history: read tcp 127.0.0.1:44126->127.0.0.1:37017: i/o timeout
| 2020-08-14 05:46:21 INFO juju.cmd supercommand.go:91 running jujud [2.8.1 0 16439b3d1c528b7a0e019a16c2122ccfcf6aa41f gc go1.14.4]
| 2020-08-14 05:46:21 DEBUG juju.cmd supercommand.go:92 args: []string{"/var/lib/juju/tools/machine-1/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "1", "--debug"}
| 2020-08-14 05:46:21 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 4
| 2020-08-14 05:46:21 DEBUG juju.agent agent.go:583 read agent config, format "2.0"
| 2020-08-14 05:46:21 INFO juju.cmd.jujud agent.go:138 setting logging config to "<root>=WARNING;unit=DEBUG"
| 2020-08-14 05:46:45 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:46:50 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:46:55 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:47:01 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:47:10 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:47:19 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:47:31 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:47:44 ERROR juju.service.systemd service.go:127 failed to list services for application "juju-db": List failed (/bin/systemctl list-unit-files --no-legend --no-page -l -t service | grep -o -P '^\w[\S]*(?=\.service)'): error executing "/bin/systemctl": Failed to list unit files: Connection timed out;
| 2020-08-14 05:47:44 ERROR juju.worker.dependency engine.go:671 "state" manifold worker returned unexpected error: failed to list services for application "juju-db": List failed (/bin/systemctl list-unit-files --no-legend --no-page -l -t service | grep -o -P '^\w[\S]*(?=\.service)'): error executing "/bin/systemctl": Failed to list unit files: Connection timed out;
| 2020-08-14 05:47:56 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:48:11 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:48:33 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused
| 2020-08-14 05:49:44 ERROR juju.worker.raft.rafttransport streamlayer.go:121 streamLayer.Addr timed out waiting for API address
| 2020-08-14 05:49:44 ERROR juju.worker.dependency engine.go:671 "raft-transport" manifold worker returned unexpected error: timed out waiting for API address
| 2020-08-14 05:49:44 ERROR juju.apiserver.pubsub pubsub.go:104 pubsub receive error: read tcp 10.48.128.113:17070->10.48.128.131:52492: use of closed network connection
| 2020-08-14 05:49:44 ERROR juju.apiserver.pubsub pubsub.go:104 pubsub receive error: read tcp 10.48.128.113:17070->10.48.128.127:52750: use of closed network connection
| 2020-08-14 05:49:59 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [39f5e5] "machine-1" cannot open api: can't reset agent password: updating machine "1": read tcp 10.48.128.113:41626->10.48.128.113:37017: i/o timeout
| 2020-08-14 05:50:44 WARNING juju.worker.httpserver worker.go:214 timeout waiting for apiserver shutdown
| debug info written to /var/log/juju/apiserver-debug.log
apiserver-debug.log available via link below:
| https://private-fileshare.canonical.com/~hloeung/tmp/lmO4x2mI.log
This log line is particularly interesting:
2020-08-14 05:47:44 ERROR juju.service.systemd service.go:127 failed to list services for application "juju-db": List failed (/bin/systemctl list-unit-files --no-legend --no-page -l -t service | grep -o -P '^\w[\S]*(?=\.service)'): error executing "/bin/systemctl": Failed to list unit files: Connection timed out;
Is this on a focal machine, by the way? If so, can you also check the journalctl output for any auth-related errors raised by systemd? I've seen errors like this with nested LXD containers running focal (in those cases, the culprit seemed to be AppArmor).
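In case it helps with the journalctl check, here is a rough sketch of how I would narrow the journal down to the window around the failed systemctl call. The function names and the grep pattern are my own assumptions (not anything Juju ships), and the pattern is only a guess at what auth or AppArmor failures tend to look like, so widen it if nothing shows up.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the match pattern below are assumptions,
# not part of Juju or systemd.

# Keep only lines that look like auth or confinement failures.
# Reads log text on stdin; exits non-zero when nothing matches.
filter_auth_errors() {
  grep -E -i 'apparmor="DENIED"|authentication|access denied|permission denied|polkit'
}

# Print journal entries between two timestamps, filtered as above.
# $1 = start, $2 = end, in a form accepted by journalctl --since/--until,
# for example "2020-08-14 05:40:00".
journal_auth_errors() {
  journalctl --no-pager --since "$1" --until "$2" | filter_auth_errors
}
```

With those defined, something like `journal_auth_errors "2020-08-14 05:40:00" "2020-08-14 05:55:00"` would cover the period in the log above. Note that journalctl interprets those timestamps in the machine's local time zone, which may not match the Juju log timestamps. `lsb_release -cs` should tell you whether the machine is on focal.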