jammy machines don't start on LXD < 5.2

Bug #1981955 reported by Jordan Barrett
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Yang Kelvin Liu

Bug Description

On Juju v3.0-beta3 (SHA 972bbc40), jammy machines don't start.

If you run
```
juju add-machine --series focal
```
the machine will set its state to "started" within a minute or two.

With
```
juju add-machine --series jammy
```
it seems to stay in state "pending" forever.

Vitaly Antonenko (anvial) wrote:

Jordan,

What cloud substrate did you use?

In version 3.0-beta2, I had the same problems with the xenial series.

I ask because I've just checked on LXD and AWS, and it looks like I still have problems with xenial, but with jammy everything is OK.

AWS:
```
Model        Controller  Cloud/Region   Version      SLA          Timestamp
test-series  aws         aws/us-east-1  3.0-beta3.1  unsupported  12:15:41+03:00

App      Version  Status  Scale  Charm   Channel  Rev  Exposed  Message
u-jammy  22.04    active  1      ubuntu  stable   20   no

Unit        Workload  Agent  Machine  Public address  Ports  Message
u-jammy/0*  active    idle   1        3.238.182.92

Machine  State    Address       Inst id              Series  AZ          Message
1        started  3.238.182.92  i-082d47f57b2fee5c7  jammy   us-east-1f  running
```

LXD:
```
Model        Controller  Cloud/Region  Version      SLA          Timestamp
test-series  lxd         lxd/default   3.0-beta3.1  unsupported  12:15:24+03:00

App  Version  Status   Scale  Charm   Channel  Rev  Exposed  Message
u-j  22.04    active   1      ubuntu  stable   20   no
u-x           waiting  0/1    ubuntu  stable   20   no       waiting for machine

Unit    Workload  Agent       Machine  Public address  Ports  Message
u-j/0*  active    idle        1        10.209.4.28
u-x/0   waiting   allocating  0                               waiting for machine

Machine  State    Address      Inst id        Series  AZ  Message
0        pending               juju-8dd23e-0  xenial      Running
1        started  10.209.4.28  juju-8dd23e-1  jammy       Running
```

John A Meinel (jameinel) wrote:

Critical if it is true, given that 3.0 defaults to bootstrapping jammy and we definitely will want it to start out of the box.

Changed in juju:
importance: Undecided → Critical
milestone: none → 3.0-beta3
status: New → Triaged

Jordan Barrett (barrettj12) wrote:

This was using LXC/LXD v4.0.9, on Ubuntu 20.04. I don't think it's just me, since our GitHub runners were having the same problem, e.g.
https://github.com/juju/juju/runs/7368191528?check_suite_focus=true

Jordan Barrett (barrettj12) wrote:

Just reproduced this issue by pulling the latest version of 3.0 (SHA fcf0748), rebuilding all Juju binaries and bootstrapping a fresh LXD controller. When I run
```
juju add-machine --series jammy
```
Juju gets stuck in this state:
```
Model  Controller  Cloud/Region         Version      SLA          Timestamp
m      jammytest   localhost/localhost  3.0-beta3.1  unsupported  11:17:01+10:00

Machine  State    Address        Inst id        Series  AZ  Message
0        pending  10.67.163.232  juju-9743c2-0  jammy       Running
```

Compare this to
```
juju add-machine --series focal
```
which ends up in this state within 1-2 mins:
```
Machine  State    Address        Inst id        Series  AZ  Message
1        started  10.67.163.118  juju-9743c2-1  focal       Running
```
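To make the "pending forever" versus "started within 1-2 mins" comparison repeatable, the check can be scripted. This is only a rough sketch: the function names `machine_state` and `wait_for_started`, the 10-second poll interval, and the 300-second default timeout are all assumptions for illustration, and it relies on the tabular output of `juju machines` having the machine ID in the first column and the state in the second, as in the listings above.

```shell
#!/usr/bin/env bash
# Sketch: report whether a Juju machine reaches "started" within a timeout.
# Function names, poll interval and default timeout are illustrative choices.

# Print the State column for machine ID $1, reading the tabular machine
# listing (Machine, State, Address, ...) from stdin.
machine_state() {
    awk -v id="$1" '$1 == id { print $2; exit }'
}

# Poll `juju machines` until machine $1 is "started" or $2 seconds elapse.
wait_for_started() {
    local id="$1" timeout="${2:-300}" waited=0 state=""
    while [ "$waited" -lt "$timeout" ]; do
        state="$(juju machines | machine_state "$id")"
        if [ "$state" = "started" ]; then
            echo "machine $id started after about ${waited}s"
            return 0
        fi
        sleep 10
        waited=$((waited + 10))
    done
    echo "machine $id still '${state:-unknown}' after ${timeout}s" >&2
    return 1
}
```

After sourcing the file, `wait_for_started 0` would be expected to time out for the jammy machine above, while `wait_for_started 1` should return once the focal machine is up.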

Here are the debug logs, comparing the jammy machine to a working focal machine startup:
https://pastebin.canonical.com/p/xhz8YJD8JJ/

Here are the contents of /var/log on the faulty jammy machine:
https://drive.google.com/file/d/12Bf54_5k16MuytutcCjFCLO5gI5eSpwJ/view?usp=sharing

Jordan Barrett (barrettj12) wrote:

This seems to have been caused by an upstream bug in LXD:
https://github.com/lxc/lxd/issues/10422

This was fixed in LXD 5.2, so upgrading to a newer LXD version fixes the issue.
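
For anyone wanting to check whether their host is affected, here is a rough sketch of a version check against the 5.2 threshold mentioned above. The helper names `version_ge` and `check_lxd` are made up for illustration, and the script assumes the `lxd` binary is on `PATH` and that `lxd --version` prints the version number as its first field. A plain numeric comparison like this cannot tell whether an LTS point release carries a backported fix.

```shell
#!/usr/bin/env bash
# Sketch: check whether the local LXD is at least version 5.2.
# Helper names are illustrative; only purely numeric versions are handled.

# Return 0 if version $1 >= version $2, comparing up to three
# dot-separated numeric fields (missing fields count as 0).
version_ge() {
    local IFS=.
    local -a have=($1) want=($2)
    local i h w
    for i in 0 1 2; do
        h="${have[i]:-0}"
        w="${want[i]:-0}"
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
    done
    return 0
}

# Compare the installed LXD version against 5.2 and report the result.
check_lxd() {
    local ver
    ver="$(lxd --version | awk '{ print $1 }')"   # drop any suffix after the number
    if version_ge "$ver" "5.2"; then
        echo "LXD $ver: at or above 5.2"
    else
        echo "LXD $ver: older than 5.2, jammy machines may stay pending" >&2
        return 1
    fi
}
```

With the v4.0.9 install described earlier, `check_lxd` would report the version as too old.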

Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
summary: - jammy machines don't start
+ jammy machines don't start on LXD < 5.2
Changed in juju:
milestone: 3.0-beta3 → 3.0-rc1
Changed in juju:
status: Triaged → In Progress

Yang Kelvin Liu (kelvin.liu) wrote:

This PR ensures the LXD version is >= 5.2, since that's the minimum version we support in 3.0:
https://github.com/juju/juju/pull/14447

Changed in juju:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
Changed in juju:
milestone: 3.0-rc1 → 3.0-beta4

Simon Fels (morphis) wrote:

Why are we limiting Juju to feature releases and ignoring the 5.0 LTS? Asking people to run a feature release of LXD in production which is only supported for one month (!) sounds odd and rules out any use of Juju+LXD in customer deployments.

Jordan Barrett (barrettj12) wrote:

@morphis: as the bug report says, there is a critical issue in older versions of LXD which means that jammy machines fail to deploy:
https://github.com/lxc/lxd/issues/10422

As far as we know, that issue is only fixed in LXD 5.2 and later. Hence we are limiting to versions of LXD where the bug is fixed.

If the LXD team wants to backport the fix to 5.0, then we are happy to support LXD 5.0 in Juju.

Jordan Barrett (barrettj12) wrote:

@morphis: I did some more testing today and it seems the bug in LXD 5.0 has been fixed. Hence, we have agreed to support LXD 5.0 LTS in Juju 3.0. There is a PR to amend this here:
https://github.com/juju/juju/pull/14772

Changed in juju:
status: Fix Committed → Fix Released
milestone: 3.0-beta4 → 3.0-rc2