Canonical Juju

Provisioner worker pool errors cause on-machine provisioning to cease

Bug #1994488 reported by Joseph Phillips on 2022-10-26

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
Canonical Juju	Fix Released	High	Heather Lanigan	Canonical Juju 2.9.37

Bug Description

The on-machine provisioner for LXD/KVM uses a worker pool implemented using worker.Runner.

It is possible for errors encountered by the runners to be interpreted as fatal, which shuts down the worker pool without ever restarting it.

When this happens, container provisioning is effectively halted on the machine until the jujud daemon is restarted.

In most cases, we should not interpret provisioner task errors as fatal.

An example can be seen here, where a transient API connection error causes the fatal error:
https://pastebin.ubuntu.com/p/gYhRmNMRNZ/

Joseph Phillips (manadart) on 2022-10-26

Changed in juju:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → 2.9.37

Joseph Phillips (manadart) wrote on 2022-10-26:

This happens because the container provisioner is not started via the dependency engine, but directly by the machine agent.

So the normal mechanism, whereby a worker is restarted when its dependencies bounce due to an error, is not in play.

Ian Booth (wallyworld) on 2022-10-26

Changed in juju:
assignee:	nobody → Joseph Phillips (manadart)
status:	Triaged → In Progress

Joseph Phillips (manadart) wrote on 2022-10-26:

Actually, the container provisioner is (by proxy) tied to the dependency engine. It is run via APIWorkersManifold, so it should be restarted if the APICaller bounces.

Joseph Phillips (manadart) on 2022-10-26

Changed in juju:
assignee:	Joseph Phillips (manadart) → Heather Lanigan (hmlanigan)

Heather Lanigan (hmlanigan) wrote on 2022-10-26:

Working on moving the functionality of the unconverted-api-workers into their own workers and manifolds. In their current form, they cause multiple issues around shutting down the machine agent, and they prevent migration.

Heather Lanigan (hmlanigan) wrote on 2022-11-02:

https://github.com/juju/juju/pull/14834

Heather Lanigan (hmlanigan) on 2022-11-03

Changed in juju:
status:	In Progress → Fix Committed

Canonical Juju QA Bot (juju-qa-bot) on 2022-11-14

Changed in juju:
status:	Fix Committed → Fix Released
