Provisioner worker pool errors cause on-machine provisioning to cease

Bug #1994488 reported by Joseph Phillips
This bug affects 2 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Heather Lanigan

Bug Description

The on-machine provisioner for LXD/KVM uses a worker pool implemented with worker.Runner.

It is possible for errors encountered by the runners to be interpreted as fatal, which shuts down the worker pool without ever restarting it.

When this happens, container provisioning is effectively halted on the machine until the jujud daemon is restarted.

In most cases, we should not interpret provisioner task errors as fatal.

An example can be seen here, where a transient API connection error causes the fatal error:
https://pastebin.ubuntu.com/p/gYhRmNMRNZ/

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.9.37
Joseph Phillips (manadart) wrote :

This happens because the container provisioner is not started via the dependency engine, but rather by the machine agent directly.

So the normal mechanism, whereby a worker is restarted when its dependencies bounce due to an error, is not in play.

Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
status: Triaged → In Progress
Joseph Phillips (manadart) wrote :

Actually it is (by proxy) tied to the dependency engine. It is run via APIWorkersManifold. It should be restarted if the APICaller bounces.

Changed in juju:
assignee: Joseph Phillips (manadart) → Heather Lanigan (hmlanigan)
Heather Lanigan (hmlanigan) wrote :

Working on moving the functionality of the unconverted-api-workers into their own workers and manifolds. In their current form, they are causing multiple issues around shutting down the machine agent and preventing migration.

Heather Lanigan (hmlanigan) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released