cloud-init based images not working on LXC instances

Bug #1834506 reported by Miguel Ángel Herranz Trillo
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Stephen Finucane

Bug Description

Description
===========

Cloud-init based images apparently fail to initialize correctly on an OpenStack compute node using LXC.

Steps to reproduce:
==================

Install OpenStack using libvirt + LXC (the nova-compute-lxc package on Ubuntu) and launch a cloud-init based instance.

This has been tested with an Ubuntu cloud image (after setting a default root password to allow console login, see [2]):

```
  wget http://uec-images.ubuntu.com/releases/18.04/release/ubuntu-18.04-server-cloudimg-amd64.tar.gz # md5sum f90bf979eab74f18bd4400a7355157dc, downloaded on 2019-06-18
  tar zxfv ubuntu-18.04-server-cloudimg-amd64.tar.gz
  virt-customize -a bionic-server-cloudimg-amd64.img --root-password password:gocubsgo
  openstack image create "ubuntu_with_root_password" --file bionic-server-cloudimg-amd64.img --disk-format raw --container-format bare --public
```

NOTE: cloud-init usually checks the DMI product name (see [1]). If the host is itself an OpenStack compute host or instance, that DMI product name may be visible inside the LXC instance, so cloud-init would select the OpenStack datasource as expected, but for the wrong reason. To rule out that possibility, it is better to reproduce on a host that is not OpenStack based.
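
Before reproducing, it may help to check which DMI product name the host itself reports. The following is a minimal sketch, assuming the standard Linux sysfs location /sys/class/dmi/id/product_name; the function name and the `base_dir` parameter are illustrative and not part of cloud-init or nova:

```python
from pathlib import Path
from typing import Optional


def read_dmi_product_name(base_dir: str = "/sys/class/dmi/id") -> Optional[str]:
    """Return the DMI product name seen by this system, or None if unreadable."""
    try:
        return Path(base_dir, "product_name").read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    name = read_dmi_product_name()
    print(f"DMI product name: {name!r}")
    if name == "OpenStack Nova":
        # Per the note above, a container on this host may see the same value.
        print("Host already reports OpenStack; reproduce on another host.")
```

Run on the compute host: any value other than an OpenStack product name means a container on that host cannot be detected as OpenStack through DMI alone.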

Environment info:

  Host: AWS instance (t2.large)
  OS: Ubuntu 18.04.2 LTS
  Kernel: Linux 4.15.0-1041-aws #43-Ubuntu SMP x86_64
  Openstack: queens

  Related packages versions:

  nova-api: Installed: 2:17.0.9-0ubuntu3
  python-nova: Installed: 2:17.0.9-0ubuntu3
  nova-compute: Installed: 2:17.0.9-0ubuntu3
  nova-compute-lxc: Installed: 2:17.0.9-0ubuntu3

Expected result
===============
The instance should have a working network configuration, be accessible, etc.

Actual result
=============

The instance is created and running, but it is not reachable over the network (neither through the router network namespace nor by assigning a floating IP).

It can be accessed with 'virsh -c lxc:/// console instance-<number>' using the root/gocubsgo credentials set earlier:

```
Ubuntu 18.04.2 LTS ubuntu console

ubuntu login: root
Password:
run-parts: /etc/update-motd.d/98-fsck-at-reboot exited with return code 1

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@ubuntu:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
19: eth0@if20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:16:3e:b3:25:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@ubuntu:~#
```

It can be seen that the network is not configured.

The cloud-init logs show that no datasource was found, so cloud-init was disabled at boot.

Logs & Configs
==============

```
root@ubuntu:~# cat /run/cloud-init/.ds-identify.result
1
```

```
root@ubuntu:~# cat /run/cloud-init/ds-identify.log
[up 22612.42s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, None ]
DMI_PRODUCT_NAME=HVM domU
DMI_SYS_VENDOR=Xen
DMI_PRODUCT_SERIAL=ec21a6db-7988-521e-8fe1-eaed834f9b54
DMI_PRODUCT_UUID=EC21A6DB-7988-521E-8FE1-EAED834F9B54
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=
FS_LABELS=unavailable:container
ISO9660_DEVS=unavailable:container
KERNEL_CMDLINE=/sbin/init
VIRT=lxc-libvirt
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=4.15.0-1041-aws
UNAME_KERNEL_VERSION=#43-Ubuntu SMP Thu Jun 6 13:39:11 UTC 2019
UNAME_MACHINE=x86_64
UNAME_NODENAME=ubuntu
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=NoCloud ConfigDrive OpenNebula DigitalOcean Azure AltCloud OVF MAAS GCE OpenStack CloudSigma SmartOS Bigstep Scaleway AliYun Ec2 CloudStack Hetzner IBMCloud None
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=24 ppid=5
is_container=true
is_ds_enabled(IBMCloud) = true.
is_ds_enabled(IBMCloud) = true.
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 22612.47s] returning 1
root@ubuntu:~# cat /run/cloud-init/
.ds-identify.result cloud.cfg
cloud-init-generator.log ds-identify.log
```

```
root@ubuntu:~# cat /run/cloud-init/cloud.cfg
di_report:
  datasource_list: [ ]
  # reporting not found result. notfound=disabled.
```

```
root@ubuntu:~# cat /run/cloud-init/cloud-init-generator.log
/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (container[lxc-libvirt]: pid 1 cmdline not available):
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=1
ds-identify _RET=notfound
cloud-init is enabled but no datasource found, disabling
already disabled: no change needed [no /run/systemd/generator.early/multi-user.target.wants/cloud-init.target]
```
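
The values that appear most relevant in the ds-identify.log above are DMI_PRODUCT_NAME (the AWS host's "HVM domU"), PID_1_PRODUCT_NAME (unavailable) and VIRT. As a minimal sketch for pulling them out of such a log, the helper below collects the upper-case KEY=VALUE lines; the function name and the regular expression are illustrative assumptions, not part of cloud-init:

```python
import re
from typing import Dict

# Matches lines such as "DMI_PRODUCT_NAME=HVM domU"; lower-case keys are skipped.
_KEY_VALUE = re.compile(r"^([A-Z][A-Z0-9_]*)=(.*)$")


def parse_ds_identify_log(text: str) -> Dict[str, str]:
    """Collect the upper-case KEY=VALUE lines from a ds-identify log."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        match = _KEY_VALUE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


# A few lines copied from the log in this report.
SAMPLE = """\
DMI_PRODUCT_NAME=HVM domU
PID_1_PRODUCT_NAME=unavailable
VIRT=lxc-libvirt
DSNAME=
is_container=true
"""

if __name__ == "__main__":
    info = parse_ds_identify_log(SAMPLE)
    for key in ("DMI_PRODUCT_NAME", "PID_1_PRODUCT_NAME", "VIRT"):
        print(f"{key} = {info.get(key)!r}")
```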

Related bugs:
=============

  https://bugs.launchpad.net/nova/+bug/1693524

References
==========

[1] https://cloudinit.readthedocs.io/en/latest/topics/datasources/openstack.html
[2] https://serverascode.com/2018/06/26/using-cloud-images.html

Changed in nova:
assignee: nobody → Miguel Ángel Herranz Trillo (maherranzt)
status: New → In Progress
Miguel Ángel Herranz Trillo (maherranzt) wrote :

I proposed a fix, but it seems Gerrit did not post the link here (unlike for other bugs/reviews I filed recently), so I am posting it manually:

Fix proposed on branch: master
Review: https://review.opendev.org/#/c/667976/

melanie witt (melwitt) wrote :

From what I can tell, nova isn't the place where an addition of product_name belongs. I found this similar issue from a while back:

https://bugs.launchpad.net/cloud-init/+bug/1799954

where cloud-init was failing openstack detection for lxd instances.

The fix was applied to the nova-lxd codebase:

https://review.opendev.org/411985

to add the product_name information, similar to what you've proposed to nova.

I can't find where the source for the nova-compute-lxc package comes from -- but I would expect that's where your fix is needed. Do you know anything about where the source for nova-compute-lxc comes from?

Miguel Ángel Herranz Trillo (maherranzt) wrote :

Hi Melanie, thanks for looking at the issue and for the valuable feedback. I hadn't seen the other bugs even though I did search, perhaps not in the right way or in the right projects.

The thing is that the nova-lxd driver differs enough from the default nova configuration (libvirt+kvm) that it has to set product_name through the LXD API, and that code lives in the nova-lxd package. LXD does not use libvirt at all.

But in this bug I am using libvirt+lxc which, as far as I know, is integrated alongside the other libvirt+<whatever> options, since they all use the libvirt API and are located in the default nova package (for example, in Ubuntu there is no nova-lxc package, only a nova-compute-lxc package that just adds the nova-compute.conf file setting "virt_type=lxc").

So I think, and correct me if this assumption is wrong, that the default nova driver covers libvirt in general, so any option for any libvirt 'type' (kvm, lxc, etc.) supported by OpenStack should be integrated there.

There are many places in the nova code where virt_type is checked, so I don't see a major concern in doing it here as well.

So, in summary, I see it like this:

(package nova-lxd) -> talks to LXD API -> talks to LXC API
(package nova) -> talks to libvirt API -> talks to KVM/LXC API

There is one potential source of confusion worth flagging: Ubuntu uses the 'lxc' command as the client for its LXD service, but as far as I know it has nothing to do with libvirt-lxc.

Let me know what you think about this rationale. If there is another solution or a better place to put it, I don't mind changing it.

melanie witt (melwitt) wrote :

Hi Miguel, thanks for your reply. Apologies for my lack of awareness about what nova-compute-lxc package contains.

If nova-compute-lxc is purely configuration, then I see your rationale, and it seems the change would have to go in the nova libvirt driver.

The main point I don't understand is: why does the 'lxc' virt_type need product_name set to 'OpenStack Nova' when no other virt_type needs it? How are the other virt_types correctly detected as OpenStack, then?

Don't feel you need to chase down answers to these questions. I'll see if I can find someone else on the team who can fill in some knowledge here so all can understand why 'lxc' is different and go ahead with the right fix.

Miguel Ángel Herranz Trillo (maherranzt) wrote :

No need to apologize; I am also just starting to learn about the internals of nova, libvirt and cloud-init and how they relate.

I think the other virt_types are based on true virtualization (not just Linux namespaces, as containers are), so they don't share the host kernel and hence set up DMI data different from the host's, which is what cloud-init uses to check whether it is running on OpenStack. Since real VMs see their own kernel/DMI parameters, they don't need, or even check, the fallback used for containers, which is to put 'product_name' in the init process' environment. At least that is what I think happens with KVM; it may not be true for other virt_types.
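
The container fallback described above can be sketched in a few lines: read the environment of PID 1 and look for a 'product_name' entry. This is only an illustration of the idea, assuming the standard NUL-separated format of /proc/1/environ; the function names are hypothetical and this is not cloud-init's actual code:

```python
from typing import Dict, Optional


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty or malformed entries
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def pid1_product_name(path: str = "/proc/1/environ") -> Optional[str]:
    """Return product_name from the init process environment, if readable."""
    try:
        with open(path, "rb") as handle:
            return parse_environ(handle.read()).get("product_name")
    except OSError:
        return None


if __name__ == "__main__":
    # Reading /proc/1/environ normally requires root inside the instance.
    print(f"PID 1 product_name: {pid1_product_name()!r}")
```

Inside the failing instance from this report the result would be expected to be None, which matches PID_1_PRODUCT_NAME=unavailable in the ds-identify log.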

But I also haven't found the place where that DMI data is configured in nova, so maybe it is done outside of nova (qemu?), in which case the product_name env var for LXC should be set there too.

If you can find someone who knows this part, it would be very helpful.

Miguel Ángel Herranz Trillo (maherranzt) wrote :

I dug a bit more and I think I found it. That "OpenStack Nova" string is defined here:

nova/version.py:
----------------

NOVA_PRODUCT = "OpenStack Nova"

def product_string():
    _load_config()

    return NOVA_PRODUCT

and used in the driver to set up 'system_product':

nova/virt/libvirt/driver.py:
----------------------------

    def _get_guest_config_sysinfo(self, instance):
        sysinfo = vconfig.LibvirtConfigGuestSysinfo()
        ...
        sysinfo.system_product = version.product_string()

which is in turn translated to the libvirt XML element (https://libvirt.org/formatdomain.html#elementsSysinfo) that defines the DMI entries of the VM:

nova/virt/libvirt/config.py:
----------------------------

    def format_dom(self):
        sysinfo = super(LibvirtConfigGuestSysinfo, self).format_dom()
        ...
        system = etree.Element("system")
        ...
        if self.system_product is not None:
            system.append(self._text_node("entry", self.system_product,
                                          name="product"))
        ...
        if len(list(system)) > 0:
            sysinfo.append(system)

        return sysinfo

So the difference seems to be that KVM etc. use the DMI tag at `domain/sysinfo/system/entry[name='product']`, while LXC should use the env var tag at `domain/os/initenv[name='product_name']`, which is handled in another private member function of the driver.
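
To make the two XML locations concrete, here is a minimal standalone sketch that builds a stripped-down libvirt domain for each case using only the Python standard library. It is not the nova driver code: `build_domain` is a hypothetical helper, and the resulting XML omits everything a real domain needs apart from the elements discussed here:

```python
import xml.etree.ElementTree as ET

PRODUCT = "OpenStack Nova"  # same string as NOVA_PRODUCT in nova/version.py


def build_domain(virt_type: str) -> ET.Element:
    """Build a stripped-down libvirt <domain> carrying the product name."""
    domain = ET.Element("domain", type=virt_type)
    os_elem = ET.SubElement(domain, "os")
    if virt_type == "lxc":
        # Containers: pass the name as an environment variable to init.
        ET.SubElement(os_elem, "type").text = "exe"
        ET.SubElement(os_elem, "init").text = "/sbin/init"
        initenv = ET.SubElement(os_elem, "initenv", name="product_name")
        initenv.text = PRODUCT
    else:
        # Full VMs: expose the name through SMBIOS/DMI system information.
        ET.SubElement(os_elem, "type").text = "hvm"
        ET.SubElement(os_elem, "smbios", mode="sysinfo")
        sysinfo = ET.SubElement(domain, "sysinfo", type="smbios")
        system = ET.SubElement(sysinfo, "system")
        entry = ET.SubElement(system, "entry", name="product")
        entry.text = PRODUCT
    return domain


if __name__ == "__main__":
    for virt_type in ("kvm", "lxc"):
        print(ET.tostring(build_domain(virt_type), encoding="unicode"))
```

The "lxc" branch is the part that was missing: without an `initenv` entry, nothing inside the container carries the product name.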

Kashyap Chamarthy (kashyapc) wrote :

FWIW, the analysis from Miguel in comment#6 (nice sleuthing) makes sense to me.

Just for my own education: The 'initenv' variables are set _before_ the `init` process of the container is spawned, so they can be used by it. And I don't see any security implications related to it — also FWIW, double-checked with a libvirt upstream developer.

Changed in nova:
assignee: Miguel Ángel Herranz Trillo (maherranzt) → sean mooney (sean-k-mooney)
Changed in nova:
assignee: sean mooney (sean-k-mooney) → Stephen Finucane (stephenfinucane)
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/667976
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8f975bc8287d980f3e6c5da601051cf626c081dd
Submitter: Zuul
Branch: master

commit 8f975bc8287d980f3e6c5da601051cf626c081dd
Author: Miguel Herranz <email address hidden>
Date: Thu Jun 27 15:43:27 2019 +0200

    Add support for cloud-init on LXC instances

    Images that use cloud-init are not correctly initialized when using
    libvirt LXC nova driver.

    One way cloud-init checks if the OpenStack datasource should be used
    is by checking DMI data that is meaningful for virtual machines but
    not for containers.

    Another way cloud-init is using is to check if the 'product_name'
    env variable for init process (PID 1) is "OpenStack Nova" [1][2].

    This commit add that env variable to the instance when the driver
    is LXC.

    [1] https://cloudinit.readthedocs.io/en/latest/topics/datasources/openstack.html
    [2] https://git.launchpad.net/cloud-init/tree/tools/ds-identify#n974

    Closes-Bug: 1834506

    Change-Id: I2d0a4461081f5284d16df73a783cb7dae3ff0ef5
    Signed-off-by: Miguel Herranz <email address hidden>

Changed in nova:
status: In Progress → Fix Released