Comment 4 for bug 1891586

John A Meinel (jameinel) wrote : Re: [Bug 1891586] Re: 2.8.1 juju machine agent restarting itself

Interestingly, looking at the actual code, it is shelling out to systemd
because of a gap in the API, but the code comment mentions the API that it
would like to have, which was actually implemented back in 2015. I'm pretty
sure we could rely on that being available everywhere that we care about
(xenial+), which would at least let us avoid the CLI.
I don't know that it makes the request any more robust, though.

Is the issue that this

a) happened once, or
b) keeps happening and we are never able to recover?

I don't think we've seen a failure talking to systemd before.

John
=:->

On Tue, Sep 1, 2020 at 7:30 PM Ian Booth <email address hidden> wrote:

> That "Failed to list unit files: Connection timed out;" would be coming
> from systemctl itself trying to run the list-unit-files command. Juju
> uses that command at controller startup to check that mongo etc is
> installed, as well as when deploying units.
>
> You could guess that it would be related to load on the machine. The
> timeout causes the agent to restart because of how Juju is designed to
> manage its worker routines: it is considered better to restart if
> there's an error rather than to try to maintain state and recover.
> There's perhaps an argument to be made that I/O timeout type errors
> should result in that operation being retried after a backoff. However,
> that would involve being able to cleanly identify the root cause error
> in the 100s of places where errors can occur and adding code to those
> 100s of places to handle the retry. It's more feasible to do the agent
> restart but back off at that point to avoid contributing to the load on
> the machine which is causing the issue.