Internal controller service restarts on 2.9
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju | Expired | High | Unassigned | | |
Bug Description
We may be seeing this more often with 2.9.27, but I've seen it on previous 2.9 releases too, e.g. 2.9.25.
While the jujud process itself doesn't restart, something within it is stopping and starting the network listener on port 17070, for long enough to trip our monitoring systems.
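For anyone wanting to measure the gap independently of our monitoring, here is a minimal sketch of a TCP probe. It only checks that something accepts connections on the port; it does not speak TLS or the Juju API, so it is an approximation of what a real health check sees. The function names `port_is_open` and `probe` are hypothetical helpers, not part of Juju.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def probe(host: str, port: int, checks: int, interval: float = 1.0):
    """Probe host:port `checks` times; return (unix_time, is_open) samples."""
    samples = []
    for i in range(checks):
        samples.append((time.time(), port_is_open(host, port)))
        if i < checks - 1:
            time.sleep(interval)
    return samples
```

Running `probe` against a controller address on port 17070 with a one-second interval would give a rough lower bound on how long the listener is down, since any run of `False` samples marks a window in which connections were refused or timed out.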
An example from today, in an Azure-hosted HA controller environment:
$ juju run --application ubuntu "grep 'running jujud\|httpserver worker' /var/log/
= ubuntu/1 (rc = 0) =
== stdout ==
2022-04-18 04:31:59 INFO juju.worker.
= ubuntu/2 (rc = 0) =
== stdout ==
2022-04-18 04:25:31 INFO juju.worker.
2022-04-18 04:26:35 INFO juju.worker.
= ubuntu/0 (rc = 0) =
== stdout ==
2022-04-18 04:26:28 INFO juju.worker.
2022-04-18 04:26:33 INFO juju.worker.
$
Looking into the logs on machine 0 here, we see a lot of this in the seconds before the restart:
ERROR juju.apiserver apiserver.go:1017 error serving RPCs: codec.ReadHeader error: error receiving message: read tcp 192.168.
Followed by this:
2022-04-18 04:25:25 ERROR juju.worker.raft apply.go:147 Raft future error: timed out enqueuing operation
2022-04-18 04:25:25 ERROR juju.worker.
2022-04-18 04:25:25 ERROR juju.worker.
2022-04-18 04:25:25 ERROR juju.worker.
2022-04-18 04:25:25 ERROR juju.worker.
2022-04-18 04:25:25 ERROR juju.worker.
And then a bunch of these from all the connected agents:
INFO juju.apiserver.
Then:
2022-04-18 04:25:25 INFO juju.cmd.
2022-04-18 04:25:25 INFO juju.cmd.
2022-04-18 04:25:25 ERROR juju.cmd.
2022-04-18 04:25:25 INFO juju.worker.logger logger.go:136 logger worker stopped
Then more connection errors along the lines of:
ERROR juju.worker.
And finally the listener restarts:
2022-04-18 04:26:28 INFO juju.worker.
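To put a number on the outage, the timestamps of the "logger worker stopped" line and the listener restart line above can be subtracted. This is a rough sketch: the exact moment the listener stopped is not shown in the excerpt, so the 04:25:25 line is an assumed starting point, and `leading_timestamp` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def leading_timestamp(line: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS' prefix of a jujud log line."""
    return datetime.strptime(line[:19], FMT)


# Lines copied from the excerpt above (the second is truncated in the report).
stopped = "2022-04-18 04:25:25 INFO juju.worker.logger logger.go:136 logger worker stopped"
restarted = "2022-04-18 04:26:28 INFO juju.worker."

gap = (leading_timestamp(restarted) - leading_timestamp(stopped)).total_seconds()
print(gap)  # 63.0
```

On that assumption the listener on machine 0 was unavailable for roughly a minute.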
I can supply the full logs and any further data as needed, but I think that's the gist of it.
Changed in juju: | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Joseph Phillips (manadart) |
Changed in juju: | |
milestone: | none → 2.9-next |
tags: | added: canonical-is |
Seeing this for jaas-azure-westus-002: twice, at the same time yesterday and today.
I've collected the machine agent logs; they are available via the usual place, juju-controller-reports...