lxd profiles aren't reliably applied on juju 2.9.27

Bug #1966129 reported by Adam Dyess
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Heather Lanigan
Milestone: 2.9.28

Bug Description

During a charm's integration tests, we use juju to deploy a bundle into an lxd cloud. This resulted in repeated test failures, which first appeared sometime between 2.9.22 and 2.9.27.

The following steps have been validated to reproduce the issue with a 2.9.27 client.

Reproduce:

```
git clone https://github.com/charmed-kubernetes/charm-containerd.git
cd charm-containerd
tox -e integration -- --controller <lxd-controller> --cloud <lxd-cloud>
```

Two of the machines will block in the "allocating" state with the message

"required charm profile not yet applied to machine"

which prevents the charms from deploying successfully.

Here's the output from dump-db:
https://paste.ubuntu.com/p/WZGYBMTFQF/

And the output from lxc profile list:
https://paste.ubuntu.com/p/nwDrQtGFvS/

Tags: regression
Heather Lanigan (hmlanigan) wrote :

Restarting the controller's jujud allows the profiles to be applied.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Heather Lanigan (hmlanigan) wrote :

Reproduces also with this bundle:
https://people.canonical.com/~heather/1966129-bundle.yaml

Timing windows in the new lxd profile watchers need to be investigated. This is not seen in our CI tests, though not all of them are running right now.

The db has the profile in the charm; it just never gets written to lxd or applied to the correct machines. If the instanceMutator is restarted, the correct profiles get applied during reconciliation.
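As a rough illustration of what reconciliation means here, the sketch below compares the profiles a machine should have against the ones it actually has and reports the difference. It is not Juju's instanceMutator code: the function name `missingProfiles` and the profile names are hypothetical, chosen only to show the idea.

```go
package main

import (
	"fmt"
	"sort"
)

// missingProfiles returns the profile names present in desired but absent
// from applied, sorted for stable output. Hypothetical helper, not Juju's API.
func missingProfiles(desired, applied []string) []string {
	have := make(map[string]struct{}, len(applied))
	for _, name := range applied {
		have[name] = struct{}{}
	}
	var missing []string
	for _, name := range desired {
		if _, ok := have[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Illustrative names only: what the db says the machine needs...
	desired := []string{"default", "juju-model", "juju-model-containerd-0"}
	// ...versus what is currently applied to the container.
	applied := []string{"default", "juju-model"}
	fmt.Println(missingProfiles(desired, applied))
}
```

A restart forces this kind of full comparison, which would explain why the profiles appear afterwards even though the event-driven path missed them.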

tags: added: regression
Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
status: Triaged → In Progress
Heather Lanigan (hmlanigan) wrote :

This bug was introduced when the lxd profile watcher code was moved from the model cache to a state watcher. The first notification for an instanceData doc does not by itself indicate that the machine was provisioned; the InstanceID must change.
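One reading of that rule, sketched in Go under stated assumptions: the `instanceData` struct and `provisionedTracker` type below are simplified, hypothetical stand-ins and not Juju's actual types or watcher code. The first notification for a machine only records a baseline; a later notification counts as "provisioned" only if the InstanceID is non-empty and differs from the last one seen.

```go
package main

import "fmt"

// instanceData is a hypothetical, simplified stand-in for the watched document.
type instanceData struct {
	MachineID  string
	InstanceID string
}

// provisionedTracker remembers the last InstanceID seen for each machine.
type provisionedTracker struct {
	seen map[string]string
}

func newProvisionedTracker() *provisionedTracker {
	return &provisionedTracker{seen: make(map[string]string)}
}

// observe records a notification and reports whether it should be treated
// as "machine provisioned". The first notification only sets the baseline.
func (t *provisionedTracker) observe(doc instanceData) bool {
	prev, known := t.seen[doc.MachineID]
	t.seen[doc.MachineID] = doc.InstanceID
	if !known {
		return false
	}
	return doc.InstanceID != "" && doc.InstanceID != prev
}

func main() {
	t := newProvisionedTracker()
	fmt.Println(t.observe(instanceData{MachineID: "0", InstanceID: ""}))           // false: first notification
	fmt.Println(t.observe(instanceData{MachineID: "0", InstanceID: "juju-abc-0"})) // true: InstanceID changed
	fmt.Println(t.observe(instanceData{MachineID: "0", InstanceID: "juju-abc-0"})) // false: no change
}
```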

https://github.com/juju/juju/blob/juju-2.9.23/core/cache/machine.go#L168

Heather Lanigan (hmlanigan) wrote :
Changed in juju:
milestone: none → 2.9.28
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released