LTS upgrade to v21 breaks in LXD containers in OpenStack (nova-compute-lxd)

Bug #1959118 reported by Jan Graichen
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
cloud-init (Ubuntu)  Fix Released  High        Unassigned

Bug Description

Some time ago, cloud-init version 21 was pushed to all LTS releases, including 18.04 and 20.04. It is installed by default and present in all LTS cloud images.

We run LXD in OpenStack using nova-compute-lxd and unmodified cloud images. Since cloud-init was upgraded in the LTS images, it only uses the new LXD data source and no longer contacts OpenStack's EC2-compatible meta-data server. As a result, LXD instances no longer receive any meta-data at all, because it is only available from the meta-data server.
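For reference, the requests an instance normally makes against that meta-data server look roughly like the sketch below: a path on the EC2-compatible API and OpenStack's native meta_data.json document, both served from the link-local address 169.254.169.254. The paths are the standard metadata-service paths; the helper names are hypothetical, and fetch_openstack_metadata only works from inside an instance that can reach that address.

```python
import json
import urllib.request

# Link-local address of the OpenStack metadata service.
METADATA_HOST = "http://169.254.169.254"


def ec2_metadata_url(key: str = "instance-id", version: str = "latest") -> str:
    """Build a URL on the EC2-compatible metadata API."""
    return f"{METADATA_HOST}/{version}/meta-data/{key}"


def openstack_metadata_url(version: str = "latest") -> str:
    """Build the URL of OpenStack's native meta_data.json document."""
    return f"{METADATA_HOST}/openstack/{version}/meta_data.json"


def fetch_openstack_metadata(timeout: float = 5.0) -> dict:
    """Fetch and decode meta_data.json (only works inside an instance)."""
    with urllib.request.urlopen(openstack_metadata_url(), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only print the URLs; no network access is attempted here.
    print(ec2_metadata_url())
    print(openstack_metadata_url())
```

With the LXD data source selected, none of these requests are made, which is why the containers end up without meta-data.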

No new LXD instances can be started and configured in OpenStack, neither with the current LTS 18.04 images nor with the 20.04 images. We can neither rebuild instances nor spawn new ones, despite using LTS releases that previously worked.

Can the upgrade be rolled back, or the data source be disabled, to restore the original LTS behavior?
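One local way to approach the second half of this question, pending a packaged fix, is a cloud-init configuration drop-in that pins datasource_list so that only OpenStack (plus the None fallback) is considered. A minimal sketch, assuming the drop-in is written into the image's root filesystem under /etc/cloud/cloud.cfg.d before the image is imported into Glance, and assuming ds-identify honours a datasource_list set there (worth verifying against the installed cloud-init version). The file name 99-openstack-only.cfg and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical file name. cloud-init merges the *.cfg files in
# /etc/cloud/cloud.cfg.d in sorted order, so a "99-" prefix sorts after
# the packaged 90_dpkg.cfg and should take precedence over its list.
DROPIN_NAME = "99-openstack-only.cfg"


def render_datasource_list(sources: List[str]) -> str:
    """Render a one-line YAML 'datasource_list' setting."""
    return "datasource_list: [ " + ", ".join(sources) + " ]\n"


def write_dropin(cfg_dir: Path, sources: List[str]) -> Path:
    """Write the drop-in into cfg_dir, e.g. <unpacked rootfs>/etc/cloud/cloud.cfg.d."""
    cfg_dir.mkdir(parents=True, exist_ok=True)
    path = cfg_dir / DROPIN_NAME
    path.write_text(render_datasource_list(sources))
    return path


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than a real image.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_dropin(Path(tmp) / "cloud.cfg.d", ["OpenStack", "None"])
        print(written.read_text(), end="")
```

Restricting the list this way trades away automatic detection, so it only makes sense for images that are used exclusively on this OpenStack cloud.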

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: cloud-init 21.4-0ubuntu1~20.04.1
ProcVersionSignature: Ubuntu 4.15.0-166.174-generic 4.15.18
Uname: Linux 4.15.0-166-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
CasperMD5CheckResult: skip
CloudName: OpenStack
Date: Wed Jan 26 15:44:05 2022
Ec2AMI: ami-000002c6
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: az3
Ec2InstanceType: p2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
PackageArchitecture: all
ProcEnviron:
 SHELL=/bin/bash
 LANG=C.UTF-8
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
 PATH=(custom, no user)
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)
user_data.txt: Error: path contained symlinks.

Jan Graichen (jgraichen) wrote :
James Falcon (falcojr) wrote :

The intent was not to change LTS behavior, so yes, we will modify the behavior accordingly.

Can you help me understand your use case so we know what needs to change? Before the LXD datasource existed, we expected LXD containers to be identified to cloud-init through the NoCloud datasource. It sounds like in your case they are identified as a different datasource. You mention OpenStack but also an EC2-compatible metadata server. Do you know which datasource is being identified and how this is being accomplished?

I looked at the attached logs, and I currently only see logs from a machine that identified the LXD datasource. Is it possible to obtain cloud-init logs from a machine that hasn't been upgraded and doesn't identify the LXD datasource? To do so, run "cloud-init collect-logs" and upload the resulting tarball to this bug.

Jan Graichen (jgraichen) wrote :

We use nova-compute-lxd to run OpenStack "VMs" as LXD containers rather than as libvirt+QEMU virtual machines. nova-compute-lxd connects to the local LXD daemon and creates containers using a rootfs image from Glance, attaching them to networks managed by Neutron (VXLANs in our case). As far as I know, LXD is basically used as a dumb "hypervisor".

As images, we use the rootfs images "focal-server-cloudimg-amd64-root.tar.xz" from https://cloud-images.ubuntu.com/focal/current/. A daily cron job checks for new images and imports them into Glance. We also run "regular" libvirt+QEMU VMs. They are set up the same way but use the qcow images.

Before the upgrade, the LXD container used "DataSourceOpenStackLocal [net,ver=2]" data source. I am not aware of any special configuration needed to make that happen. The containers used DHCP to get their IP addresses, contacted the metadata server at 169.254.169.254, and processed the results the same way as the QEMU VMs did.
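The datasource string quoted above is also what cloud-init records in /run/cloud-init/result.json (under v1.datasource), so the class name can be pulled out programmatically when comparing containers from before and after the upgrade. A sketch assuming that file layout; the helper names are hypothetical.

```python
import json
import re

# cloud-init records its selected datasource here, shaped like
# {"v1": {"datasource": "<ClassName> [details]", "errors": [...]}}.
RESULT_JSON = "/run/cloud-init/result.json"


def datasource_class(result_text: str) -> str:
    """Return the datasource class name, e.g. 'DataSourceOpenStackLocal'."""
    v1 = json.loads(result_text)["v1"]
    description = v1.get("datasource") or ""
    # Keep only the leading class name, dropping the bracketed details.
    match = re.match(r"(\w+)", description)
    return match.group(1) if match else ""


def read_datasource_class(path: str = RESULT_JSON) -> str:
    """Read result.json from disk (only meaningful on a booted instance)."""
    with open(path) as fh:
        return datasource_class(fh.read())
```

On the containers described here, the expected result is DataSourceOpenStackLocal before the upgrade and an LXD class name after it.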

> To do so, run "cloud-init collect-logs" and upload the resulting tarball to this bug.

I'll see if I can find a container that is still running. Most instances were recently recreated, and those recreations failed. However, I was able to find a cloud-init log from a previous instance that shows the data source as DataSourceOpenStackLocal, even after the cloud-init package was upgraded on the running system and after reboots.

James Falcon (falcojr)
Changed in cloud-init (Ubuntu):
status: New → Triaged
importance: Undecided → High
Chad Smith (chad.smith) wrote :

Upstream commits have landed that should resolve this issue in the next published cloud-init release, v22.1. We expect the fix to reach the daily Ubuntu images within the next 3 weeks.

 - Upstream commit reordering the LXD datasource detection priority to come after OpenStack: https://github.com/canonical/cloud-init/commit/46a0126e874927353e83b385b58ab054e58667cc
 - Upstream commits to the Ubuntu bionic/focal/impish/jammy release branches ensuring that the package configuration files also set a datasource detection order that prefers OpenStack over LXD on new systems:
  - https://github.com/canonical/cloud-init/commit/f32e964ce1c3918bea1f8856a02318e21055a4e6
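The effect of the reordering can be modelled as "the first match in priority order wins". The sketch below is a deliberately simplified model, not ds-identify's actual logic: the two lists are illustrative excerpts rather than the shipped defaults, and pick_datasource is a hypothetical helper. It shows why a container that passes both the LXD and the OpenStack checks lands on LXD under the old order and on OpenStack under the new one.

```python
from typing import Iterable, Optional, Set


def pick_datasource(priority: Iterable[str], detected: Set[str]) -> Optional[str]:
    """Return the first datasource, in priority order, whose check matched."""
    for name in priority:
        if name in detected:
            return name
    return None


# Illustrative excerpts only; the real default lists are much longer.
OLD_ORDER = ["NoCloud", "ConfigDrive", "LXD", "OpenStack"]
NEW_ORDER = ["NoCloud", "ConfigDrive", "OpenStack", "LXD"]
```

For a nova-compute-lxd container, detected would contain both "LXD" and "OpenStack"; only the position of LXD in the list changes the outcome, which is why moving LXD to the end restores the previous behavior without disabling the LXD datasource for plain LXD hosts.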

Changed in cloud-init (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 21.4-119-gdeb3ae82-0ubuntu1~22.04.1

---------------
cloud-init (21.4-119-gdeb3ae82-0ubuntu1~22.04.1) jammy; urgency=medium

  * d/cloud-init.templates: Move LXD to back of datasource_list
  * New upstream snapshot.
    - tests: lsblk --json output changes mountpoint key to mountpoinst []
      (#1261)
    - mounts: fix mount opts string for ephemeral disk (#1250)
      [Chris Patterson]
    - Shell script handlers by freq (#1166) [Chris Lalos]
    - minor improvements to documentation (#1259) [Mark Esler]
    - cloud-id: publish /run/cloud-init/cloud-id-<cloud-type> files (#1244)
    - add "eslerm" as contributor (#1258) [Mark Esler]
    - sources/azure: refactor ssh key handling (#1248) [Chris Patterson]
    - bump pycloudlib (#1256)
    - sources/hetzner: Use EphemeralDHCPv4 instead of static configuration
      (#1251) [Markus Schade]
    - bump pycloudlib version (#1255) [Brett Holman]
    - Fix IPv6 netmask format for sysconfig (#1215) [Harald] (LP: #1959148)
    - sources/azure: drop debug print (#1249) [Chris Patterson]
    - tests: do not check instance.pull_file().ok() (#1246)
    - sources/azure: consolidate ephemeral DHCP configuration (#1229)
      [Chris Patterson]
    - cc_salt_minion freebsd fix for rc.conf (#1236) [Brett Holman]
    - sources/azure: fix metadata check in _check_if_nic_is_primary() (#1232)
      [Chris Patterson]
    - Add _netdev option to mount Azure ephemeral disk (#1213) [Eduardo Otubo]
    - testing: stop universally overwriting /etc/cloud/cloud.cfg.d (#1237)
    - Integration test changes (#1240)
    - Fix Gentoo Locales (#1205) [Brett Holman]
    - Add "slingamn" as contributor (#1235) [Shivaram Lingamneni]
    - integration: do not LXD bind mount /etc/cloud/cloud.cfg.d (#1234)
    - Integration testing docs and refactor (#1231)
    - vultr: Return metadata immediately when found (#1233) [eb3095]
    - spell check docs with spellintian (#1223) [Brett Holman]
    - docs: include upstream python version info (#1230)
    - Schema a d (#1211)
    - Move LXD to end ds-identify DSLIST (#1228) (LP: #1959118)
    - fix parallel tox execution (#1214) [Brett Holman]
    - sources/azure: refactor _report_ready_if_needed and _poll_imds (#1222)
      [Chris Patterson]
    - Do not support setting up archive.canonical.com as a source (#1219)
      [Steve Langasek] (LP: #1959343)
    - Vultr: Fix lo being used for DHCP, try next on cmd fail (#1208) [eb3095]
    - sources/azure: refactor _should_reprovision[_after_nic_attach]() logic
      (#1206) [Chris Patterson]
    - update ssh logs to show ssh private key gens pub and simplify code
      (#1221) [Steve Weber]
    - Remove mitechie from stale PR github action (#1217)
    - Include POST format in cc_phone_home docs (#1218) (LP: #1959149)
    - Add json parsing of ip addr show (SC-723) (#1210)
    - cc_rsyslog: fix typo in docstring (#1207) [Louis Sautier]
    - Update .github-cla-signers (#1204) [Chris Lalos]
    - sources/azure: drop unused case in _report_failure() (#1200)
      [Chris Patterson]
    - sources/azure: always initialize _ephemeral_dhcp_ctx on unpickle (#1199)
      [Chris Patterson]
    - Add support fo...


Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
Jan Graichen (jgraichen) wrote :

The fix still does not seem to be available in Ubuntu 18.04 and 20.04, nor in their cloud images. The current images still list the LXD data source before the OpenStack data source.

Is there any additional step needed to fix the issue in the LTS releases?
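Whether a given image still carries the unfixed order can be checked from the packaged defaults. A sketch, assuming the list sits on a single line of /etc/cloud/cloud.cfg.d/90_dpkg.cfg in the "datasource_list: [ a, b, ... ]" form that Ubuntu's packaging writes; the helper names are hypothetical, and this is a regex check, not a general YAML parser.

```python
import re
from typing import List

# File in which Ubuntu's cloud-init packaging stores the default list.
DPKG_CFG = "/etc/cloud/cloud.cfg.d/90_dpkg.cfg"


def parse_datasource_list(cfg_text: str) -> List[str]:
    """Extract entries from a one-line 'datasource_list: [ a, b, ... ]' setting."""
    match = re.search(r"^datasource_list:\s*\[(.*?)\]", cfg_text, re.MULTILINE)
    if not match:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def lxd_before_openstack(cfg_text: str) -> bool:
    """True if LXD is listed ahead of OpenStack, i.e. the unfixed ordering."""
    sources = parse_datasource_list(cfg_text)
    if "LXD" not in sources or "OpenStack" not in sources:
        return False
    return sources.index("LXD") < sources.index("OpenStack")


def check_installed(path: str = DPKG_CFG) -> bool:
    """Run the check against the file inside an image or running instance."""
    with open(path) as fh:
        return lxd_before_openstack(fh.read())
```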

James Falcon (falcojr) wrote :

Jan, the fix is available upstream but not yet rolled out to existing Ubuntu series. They're expected to roll out next week assuming no additional issues are found.

The version number is 22.1-14-g2e17a0d6-0ubuntu1~YY.MM.2 and is currently available from the -proposed pockets.
