juju fails to provision lxd containers with lxd 4.18

Bug #1942864 reported by Frode Nordahl
This bug affects 23 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: 2.9.14

Bug Description

This happens for juju 2.8.11, 2.9.11 and 2.9.12. On one of the machines I can see this in machine-0.log:

2021-09-07 08:04:58 ERROR juju.worker.dependency engine.go:671 "broker-tracker" manifold worker returned unexpected error: no container types determined
2021-09-07 08:04:58 INFO juju.worker.authenticationworker worker.go:103 "machine-0" key updater worker started
2021-09-07 08:04:58 INFO juju.container-setup container_initialisation.go:110 initial container setup with ids: [0/lxd/0]
2021-09-07 08:04:58 INFO juju.packaging.manager run.go:88 Running: snap info lxd
2021-09-07 08:04:58 INFO juju.worker.machiner machiner.go:112 "machine-0" started
2021-09-07 08:04:59 INFO juju.container.lxd initialisation_linux.go:300 switching LXD snap channel from 4.0/stable/ubuntu-20.04 to latest/stable
2021-09-07 08:04:59 INFO juju.packaging.manager run.go:88 Running: snap refresh --channel latest/stable lxd
2021-09-07 08:05:18 WARNING juju.container-setup container_initialisation.go:139 not stopping machine agent container watcher due to error: setting up container dependencies on host machine: Not Found
2021-09-07 08:05:18 ERROR juju.container-setup container_initialisation.go:118 starting container provisioner for lxd: setting up container dependencies on host machine: Not Found
2021-09-07 08:05:21 INFO juju.container-setup container_initialisation.go:110 initial container setup with ids: [0/lxd/0]
2021-09-07 08:05:21 INFO juju.packaging.manager run.go:88 Running: snap info lxd
2021-09-07 08:05:22 INFO juju.container.lxd initialisation_linux.go:295 LXD snap is already installed (channel: latest/stable); skipping package installation

Issuing `juju model-config lxd-snap-channel=4.17/stable` prior to deploying anything appears to fix/work around the issue.

Nikolay Vinogradov (nikolay.vinogradov) wrote :

Same happened to my deployment as well.

Sérgio Manso (sergiomanso) wrote :

I'm facing this same problem with lxd 4.18 and Juju 2.8.11

Moula BADJI (moulab1) wrote :

Same bug with juju 2.9.13 Dev.

Stéphane Graber (stgraber) wrote :

This is caused by Juju doing exact error string comparison against LXD's errors...

https://github.com/juju/juju/pull/13313

Michael Skalka (mskalka) wrote :

Tagging this as critical; it has rendered SQA unable to perform any baremetal master or stable testing.

tags: added: field-critical
John A Meinel (jameinel)
Changed in juju:
importance: Undecided → High
milestone: none → 2.9.13
status: New → In Progress
Ian Booth (wallyworld) wrote :

A fix using the newer StatusErrorMatch API is here

https://github.com/juju/juju/pull/13315

Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Pedro Guimarães (pguimaraes) wrote :

We just hit this bug on our deployment as well

Steven Parker (sbparke) wrote :

Confirming the workaround works for 2.9.12 with focal.

I found that using model-defaults was a little easier when deploying multiple models for testing:

juju model-defaults lxd-snap-channel=4.17/stable

Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.13 → 2.9.14
Changed in juju:
status: Fix Committed → Fix Released
Aurelien Lourot (aurelien-lourot) wrote :

The mentioned fix got reverted in https://github.com/juju/juju/commit/dd6a5ac so this bug should be re-opened?

Aurelien Lourot (aurelien-lourot) wrote :

The mentioned workaround doesn't work anymore because there is no snap channel 4.17 anymore for LXD. This forces the user to use the `4.0/stable` channel instead, which seems to lead to other issues in my case.

Aurelien Lourot (aurelien-lourot) wrote :

Never mind. I just validated Juju 2.9.17 with LXD 4.18 and the problem does seem to have been fixed, so it must have been fixed in the meantime by another patch that hasn't been linked here.

Bartosz Woronicz (mastier1) wrote :

It seems that any combination of juju below 2.9.14 and lxd >= 4.18 will fail.
For instance, I encountered this with juju 2.9.3 and lxd 4.20.

Jean-François Roche (jf-roche) wrote :

Experienced the same problem with juju 2.9.18 and lxd 4.20:

2021-11-19 12:04:06 INFO juju.container-setup container_initialisation.go:110 initial container setup with ids: [5/lxd/0]
2021-11-19 12:04:06 INFO juju.packaging.manager utils.go:58 Running: snap info lxd
2021-11-19 12:04:07 INFO juju.container.lxd initialisation_linux.go:295 LXD snap is already installed (channel: 4.20/stable); skipping package installation
2021-11-19 12:04:07 WARNING juju.container-setup container_initialisation.go:139 not stopping machine agent container watcher due to error: setting up container dependencies on host machine: Not Found

Sean Michael (seanmich) wrote :

This is happening for me as well on 2.9.19 and lxd 4.20. I also tried lxd 4.19 and 4.18. It appears the previous workaround using 4.17 no longer works.

Can this be reopened or should a new bug be created?

19:58:24 INFO juju.container.lxd LXD snap is already installed (channel: 4.20/stable); skipping package installation
19:58:24 WARNING juju.container-setup not stopping machine agent container watcher due to error: setting up container dependencies on host machine: Not Found
19:58:24 ERROR juju.container-setup starting container provisioner for lxd: setting up container dependencies on host machine: Not Found
machine-9: 19:58:24 INFO juju.container.lxd LXD snap is already installed (channel: 4.20/stable); skipping package installation
19:58:24 WARNING juju.container-setup not stopping machine agent container watcher due to error: setting up container dependencies on host machine: Not Found
19:58:24 ERROR juju.container-setup starting container provisioner for lxd: setting up container dependencies on host machine: Not Found

Felipe Alencastro (falencastro) wrote :

This is still happening using juju 2.9.25 and lxd 4.23.

2022-03-11 21:27:27 INFO juju.container.lxd initialisation_linux.go:295 LXD snap is already installed (channel: latest/stable); skipping package installation
2022-03-11 21:27:30 WARNING juju.container-setup container_initialisation.go:139 not stopping machine agent container watcher due to error: setting up container dependencies on host machine: Not Found
2022-03-11 21:27:30 ERROR juju.container-setup container_initialisation.go:118 starting container provisioner for lxd: setting up container dependencies on host machine: Not Found
...
2022-03-11 21:28:14 WARNING juju.container-setup container_initialisation.go:139 not stopping machine agent container watcher due to error: setting up container dependencies on host machine: failed to acquire initialization lock: cancelled acquiring mutex
2022-03-11 21:28:14 ERROR juju.container-setup container_initialisation.go:118 starting container provisioner for lxd: setting up container dependencies on host machine: failed to acquire initialization lock: cancelled acquiring mutex

Andre Ruiz (andre-ruiz) wrote :

This problem is having a major impact on a customer while expanding nodes. Can we open this again? I can put it under field-critical if needed.

Ian Booth (wallyworld) wrote :

I am having a lot of trouble reproducing this. The only way I can see it maybe happening is if the model is still on an older version of Juju. When you say "juju 2.9.25", do you mean just the controller, or have you upgraded the affected models as well?

I tried to reproduce on AWS using both LXD 4.23 and latest (4.24). I first bootstrapped:

juju bootstrap aws --config lxd-snap-channel=latest/stable --no-default-model

and then deployed the ubuntu charm to both an LXD container on machine 0 and a new machine:

juju deploy ubuntu --to lxd:0
juju add-unit ubuntu --to lxd

I also forced a scenario where the LXD API returns "Not Found" in order to test that the parsing of the not found error was still being done correctly. I did this by using a custom jujud with the LXD bridge renamed to "lxdbr6" which doesn't exist and hence triggers the juju code path which needs to process this error and create a new bridge.
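
The code path described above (look up the bridge, treat "not found" as a cue to create it, and surface anything else as a failure) can be sketched roughly like this. It is a self-contained illustration against an in-memory fake, not Juju or LXD code: `fakeServer`, `ensureBridge`, and `errNotFound` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for the API's not-found error.
var errNotFound = errors.New("not found")

// fakeServer is an in-memory stand-in for a container host; not a real client.
type fakeServer struct{ networks map[string]bool }

func (s *fakeServer) getNetwork(name string) error {
	if !s.networks[name] {
		return fmt.Errorf("network %q: %w", name, errNotFound)
	}
	return nil
}

func (s *fakeServer) createNetwork(name string) { s.networks[name] = true }

// ensureBridge creates the bridge only when the lookup reports not-found.
// Any other lookup error is returned to the caller unchanged.
func ensureBridge(s *fakeServer, name string) (created bool, err error) {
	err = s.getNetwork(name)
	switch {
	case err == nil:
		return false, nil
	case errors.Is(err, errNotFound):
		s.createNetwork(name)
		return true, nil
	default:
		return false, err
	}
}

func main() {
	s := &fakeServer{networks: map[string]bool{"lxdbr0": true}}

	created, err := ensureBridge(s, "lxdbr6") // bridge is missing, so it is created
	fmt.Println(created, err)                 // true <nil>

	created, err = ensureBridge(s, "lxdbr6") // second call finds it
	fmt.Println(created, err)                // false <nil>
}
```

If the not-found case is misclassified, the lookup error falls through to the default branch and is reported as a failure, which is consistent with the "setting up container dependencies on host machine: Not Found" messages in the logs.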

Are you able to provide more information on how to reproduce? I'd suggest opening a new bug with the relevant details: which cloud, whether the "lxd-snap-channel" config was used, which deploy command, etc.

Felipe Alencastro (falencastro) wrote (last edit):

@wallyworld, you're correct. We're currently on 2.9.26 and our model is still on 2.8.10. We'll try again once we upgrade it, probably next week.

[EDIT]
After upgrading controllers and model to 2.9.27 everything works as expected.

Dongwon Cho (dongwoncho) wrote :

I was hitting the same issue this morning with juju 2.9.28 and lxd latest/stable, which is 5.0.0-e478009.
The workaround of changing the lxd channel as follows works for me:
juju model-config -m $MODEL lxd-snap-channel=4.24/stable

Navdeep (navdeep-bjn) wrote (last edit):

Seeing this on lxd 5.0.0-69602b2 and juju 2.9.28.

This workaround worked:
juju model-config -m $MODEL lxd-snap-channel=4.24/stable

I noticed this bug once we upgraded the juju agent on the deployment machine from 2.8 to 2.9.

Ian Booth (wallyworld) wrote :

LXD 5.0.0 changed its error messages, and Juju needed a tweak to handle that.
This PR landed recently

https://github.com/juju/juju/pull/13957

The fix will be in the next Juju 2.9 point release.
