Juju 2.9.35 breaks LXD deployment

Bug #1993137 reported by Simon Fels
This bug affects 2 people
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Joseph Phillips

Bug Description

We see deployments of charms on arm64 to a local LXD cloud failing with the following error:

1 down pending focal failed to start machine 1 (composing user data: invalid machine configuration: missing API hosts), retrying in 10s (10 more attempts)

The container is never created in LXD (checked with `lxc ls`). Juju eventually times out while trying to create the machine, which remains in the error state "missing API hosts".
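A minimal sketch for spotting this failure from a script. The function name `has_missing_api_hosts` is hypothetical, and it only matches the literal error text quoted above; it does not inspect LXD itself.

```shell
#!/bin/sh
# Succeeds (exit 0) when the status text on stdin contains the
# provisioning error quoted in this report.
# NOTE: has_missing_api_hosts is an illustrative name, not a Juju command.
has_missing_api_hosts() {
    grep -q "missing API hosts"
}

# Usage: inspect the current model, if a Juju client is installed.
if command -v juju >/dev/null 2>&1; then
    if juju status 2>/dev/null | has_missing_api_hosts; then
        echo "machine provisioning is failing with: missing API hosts"
    fi
fi
```

Running `lxc ls` alongside this confirms whether the container was ever created.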

This can be reproduced as follows:

1. Install LXD 5.0 and Juju 2.9

$ snap install --channel=5.0/stable lxd
$ snap install --classic --channel=2.9/stable juju

2. Initialize LXD with the following preseed:

$ cat << EOF | lxd init --preseed
config:
  cluster.https_address: 172.31.13.156:8443
  core.https_address: 172.31.13.156:8443
networks:
- config:
    bridge.mode: fan
    fan.overlay_subnet: 240.0.0.0/8
    fan.underlay_subnet: 172.31.0.0/16
    ipv4.dhcp.expiry: infinite
    ipv4.nat: "true"
  description: ""
  name: lxdfan0
  type: bridge
  project: default
storage_pools:
- config:
    size: 19GiB
  description: ""
  name: data
  driver: zfs
profiles:
- config: {}
  description: Default LXD profile
  devices:
    eth0:
      nictype: bridged
      parent: lxdfan0
      type: nic
    root:
      path: /
      pool: data
      type: disk
  name: default
- config:
    boot.autostart: "true"
    security.nesting: "true"
  description: ""
  devices: {}
  name: juju-anbox-cloud
- config:
    boot.autostart: "true"
    security.nesting: "true"
  description: ""
  devices: {}
  name: juju-controller
- config:
    boot.autostart: "true"
    security.nesting: "true"
  description: ""
  devices: {}
  name: juju-default
projects:
- config:
    features.images: "true"
    features.networks: "true"
    features.profiles: "true"
    features.storage.volumes: "true"
  description: Default LXD project
  name: default
EOF

3. Bootstrap a new controller

$ juju bootstrap lxd test-controller

4. After bootstrap has finished, deploy the Ubuntu charm

$ juju deploy ubuntu --constraints="arch=arm64"

The deploy command succeeds, but the machine is never provisioned and fails with the error above.

The same LXD preseed was used with Juju versions prior to 2.9.35 and caused no similar problems.

This is currently causing deployments to fail for all users of the Anbox Cloud Appliance, which we actively support and sell through cloud marketplaces.

Simon Fels (morphis) wrote :

Dropping FAN support from the preseed fixes the problem:

config:
  cluster.https_address: 172.31.13.156:8443
  core.https_address: 172.31.13.156:8443
networks:
- config:
    ipv4.dhcp.expiry: infinite
    ipv4.nat: "true"
  description: ""
  name: lxdbr0
  type: bridge
  project: default
storage_pools:
- config:
    size: 19GiB
  description: ""
  name: data
  driver: zfs
profiles:
- config: {}
  description: Default LXD profile
  devices:
    eth0:
      nictype: bridged
      parent: lxdfan0
      type: nic
    root:
      path: /
      pool: data
      type: disk
  name: default
- config:
    boot.autostart: "true"
    security.nesting: "true"
  description: ""
  devices: {}
  name: juju-anbox-cloud
- config:
    boot.autostart: "true"
    security.nesting: "true"
  description: ""
  devices: {}
  name: juju-controller
- config:
    boot.autostart: "true"
    security.nesting: "true"
  description: ""
  devices: {}
  name: juju-default
projects:
- config:
    features.images: "true"
    features.networks: "true"
    features.profiles: "true"
    features.storage.volumes: "true"
  description: De

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
importance: Undecided → High
milestone: none → 2.9.36
status: New → Triaged
Simon Fels (morphis) wrote :

The problem does not seem to be limited to FAN networking, as the following config breaks things in the same way as the original preseed:

config:
  cluster.https_address: 172.31.13.156:8443
  core.https_address: 172.31.13.156:8443
cluster:
  enabled: true
  server_name: lxd0
networks:
- name: lxdbr0
  type: bridge
  config:
    ipv4.nat: true
    ipv4.dhcp.expiry: infinite
    ipv4.address: 240.0.0.1/16
    ipv6.address: none
profiles:
- name: default
  devices:
    root:
      path: /
      pool: data
      type: disk
    eth0:
      type: nic
      nictype: bridged
      parent: lxdbr0
storage_pools:
- name: data
  driver: zfs
  config:
    source: /dev/nvme1n1

The error in "juju status" remains "composing user data: invalid machine configuration: missing API hosts" after deploying the Ubuntu charm.

Simon Fels (morphis) wrote :

Verified on an old AMI that still has 2.9.34: the issue does not occur there.

Joseph Phillips (manadart) wrote :

This is a result of the patch for:
https://bugs.launchpad.net/juju/+bug/1942804

Changed in juju:
status: Triaged → In Progress
Changed in juju:
importance: High → Critical
Changed in juju:
status: In Progress → Fix Committed
Joseph Phillips (manadart) wrote :

2.9.36 was burned as a release version.

We jumped to 2.9.37, which has this fix.
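A small sketch for checking whether an installed client is the release this report identifies as broken. The function name `juju_version_affected` is hypothetical, and the sketch assumes `juju version` prints a string such as `2.9.35-ubuntu-arm64`, with the release number before the first hyphen. It checks the client only; a controller bootstrapped with 2.9.35 would need to be checked separately.

```shell
#!/bin/sh
# Returns 0 (true) when the given Juju version string is 2.9.35, the
# release reported as broken here, and 1 otherwise.
# NOTE: juju_version_affected is an illustrative name, not a Juju command.
juju_version_affected() {
    # Strip any "-series-arch" suffix, e.g. "2.9.35-ubuntu-arm64" -> "2.9.35".
    base="${1%%-*}"
    [ "$base" = "2.9.35" ]
}

# Usage: check the locally installed client, if there is one.
if command -v juju >/dev/null 2>&1; then
    if juju_version_affected "$(juju version)"; then
        echo "affected: upgrade to 2.9.37 or later"
    else
        echo "not the affected release"
    fi
fi
```

Only 2.9.35 is matched: 2.9.34 was verified as unaffected, 2.9.36 was never released, and 2.9.37 carries the fix.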

Changed in juju:
status: Fix Committed → Fix Released