Provisioner worker pool errors cause on-machine provisioning to cease
Bug #1994488 reported by Joseph Phillips
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Fix Released | High | Heather Lanigan | |
Bug Description
The on-machine provisioner for LXD/KVM uses a worker pool implemented using worker.Runner.
It is possible for errors encountered by the runners to be interpreted as fatal, which shuts down the worker pool without ever restarting it.
When this happens, container provisioning is effectively halted on the machine until the jujud daemon is restarted.
In most cases, we should not interpret provisioner task errors as fatal.
An example can be seen here, where a transient API connection error causes the fatal error:
https:/
Changed in juju: | |
status: | New → Triaged |
importance: | Undecided → High |
milestone: | none → 2.9.37 |
Changed in juju: | |
assignee: | nobody → Joseph Phillips (manadart) |
status: | Triaged → In Progress |
Changed in juju: | |
assignee: | Joseph Phillips (manadart) → Heather Lanigan (hmlanigan) |
Changed in juju: | |
status: | In Progress → Fix Committed |
Changed in juju: | |
status: | Fix Committed → Fix Released |
This happens because the container provisioner is not started via the dependency engine, but by the machine agent directly.
So the normal mechanism, whereby a worker is restarted when its dependencies bounce due to an error, is not in play.