cloud-init fails if no network config is set

Bug #1686338 reported by Bernd Stolle
This bug affects 4 people
Affects: cloud-init (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

cloud-init version: 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1
Ubuntu version: 16.04 LTS

On a pristine install cloud-init fails in every stage on startup with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 647, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 350, in main_init
    init.apply_network_config(bring_up=not args.local)
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 648, in apply_network_config
    return self.distro.apply_network_config(netcfg, bring_up=bring_up)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 163, in apply_network_config
    dev_names = self._write_network_config(netconfig)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 89, in _write_network_config
    return self._supported_write_network_config(netconfig)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 82, in _supported_write_network_config
    renderer.render_network_config(network_config=network_config)
  File "/usr/lib/python3/dist-packages/cloudinit/net/renderer.py", line 47, in render_network_config
    network_state=parse_net_config_data(network_config), target=target)
  File "/usr/lib/python3/dist-packages/cloudinit/net/eni.py", line 446, in render_network_state
    util.write_file(fpeni, header + self._render_interfaces(network_state))
  File "/usr/lib/python3/dist-packages/cloudinit/net/eni.py", line 403, in _render_interfaces
    for iface in network_state.iter_interfaces():
AttributeError: 'NoneType' object has no attribute 'iter_interfaces'

This leads to the instance not being provisioned at all.

Looking through the code the following change seems to be the origin: https://github.com/cloud-init/cloud-init/commit/ef18b8ac4cf7e3dfd98830fbdb298380a192a0fc#diff-4542b4dbbb95a6fa664e1030691a1809R40

At this point there is a check that both 'version' and 'config' are truthy. By default no network config is set, and the 'net_config' parameter passed to 'parse_net_config_data' is '{'config': [], 'version': 1}'. Because 'config' is an empty list, it evaluates to False, so the parsing is skipped and None is returned. The rendering step never checks for None and simply assumes the passed config is valid.
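The truthiness pitfall described above can be reproduced in isolation. This is only a minimal sketch, not cloud-init's actual code: `parse_sketch` is a hypothetical stand-in for `parse_net_config_data`, and it returns a plain dict where the real function would build a network state object.

```python
def parse_sketch(net_config):
    """Hypothetical stand-in that mirrors the truthiness check described above."""
    version = net_config.get("version")
    config = net_config.get("config")
    state = None
    # An empty list is falsy, so this branch is skipped for the default config
    # and the function falls through to returning None.
    if version and config:
        state = {"version": version, "interfaces": list(config)}
    return state


default_config = {"config": [], "version": 1}
print(parse_sketch(default_config))
```

Any caller that then accesses an attribute on the result fails with the AttributeError shown in the traceback.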

Suggested fix:
1. Change the check of 'config' to explicitly test for None (if version and config is not None) which would restore the intended semantics of parsing the empty list (and therefore returning the empty NetworkState).
2. Add an explicit check to the renderer(s) to check if the network_state is None and skip the rendering in this case (maybe emit a warning).
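Both suggestions can be sketched together as follows. This is an illustration under stated assumptions, not a patch against cloud-init: `parse_fixed` and `render_guarded` are hypothetical names, the state is a plain dict rather than the real NetworkState, and the rendered output format is made up for the example.

```python
import logging

LOG = logging.getLogger(__name__)


def parse_fixed(net_config):
    """Suggestion 1: test 'config' against None instead of relying on truthiness."""
    version = net_config.get("version")
    config = net_config.get("config")
    if version and config is not None:
        # An empty list is now parsed and yields an empty (but valid) state.
        return {"version": version, "interfaces": list(config)}
    return None


def render_guarded(network_state):
    """Suggestion 2: skip rendering (with a warning) when there is no state."""
    if network_state is None:
        LOG.warning("No network state to render; skipping")
        return None
    lines = ["# rendered by sketch"]
    for iface in network_state["interfaces"]:
        lines.append("iface %s" % iface["name"])
    return "\n".join(lines)
```

With these two changes the default '{'config': [], 'version': 1}' produces an empty state instead of None, and a None state no longer raises in the renderer.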

I could prepare a fix if you let me know where and how to submit a PR.

Revision history for this message
David Britton (dpb) wrote :

Hi, Bernd Stolle --

A patch is certainly welcome, the code is here:

https://code.launchpad.net/cloud-init

Once you clone and fix, you can push a branch up to your repository on Launchpad. Once you do that, you can open a "merge proposal" by viewing the branch in the LP UI and choosing 'master' as the reference to merge into.

Give it a shot, and ask us any questions in #cloud-init on freenode.

Revision history for this message
Ryan Harper (raharper) wrote :

Which datasource and what's included?

If no config is provided a fallback configuration is generated, but this appears to suggest that a config was provided (but was empty)?

That's certainly a bug; I'd like to fully understand what's going on though such that the datasource is returning an empty network config.

Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
David Britton (dpb) wrote :

Agreed with your analysis in the description.

Changed in cloud-init (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
analbeard (simon-weald) wrote :

Hi Ryan

In our use-case, we use our own custom tool to configure the networking. This tool reads a json file in /latest/ on the configdrive mount to get its network config info. As such, we don't use cloud-init to configure networking, and therefore network_data.json contains "{}". Should this be completely empty?

Revision history for this message
analbeard (simon-weald) wrote :

To follow up with the requested information, we are seeing this issue with ConfigDrive and the 0.7.9-113-g513e99e0-0ubuntu1~16.04.1 version of cloud-init. The contents of network_data.json are:

root@node1:/mnt/openstack/latest# cat network_data.json
{}

I'll proceed with disabling network configuration entirely.

Thanks

Simon

Revision history for this message
Scott Moser (smoser) wrote :

Is there a difference between the network_data.json that is in '/latest/' and the version that cloud-init reads?

It looks like we're not correctly handling the case where network_data.json is present but contains '{}', which I guess is a bug, but I agree with Ryan in that I'm interested in knowing *why* it got that way.
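One way a reader of network_data.json could treat '{}' the same as a missing file is sketched below. This is purely an assumption for illustration: `load_network_data` is a hypothetical helper, and whether cloud-init should fall back in this case is exactly the open question in this thread.

```python
import json


def load_network_data(raw):
    """Hypothetical helper: return parsed network data, or None if it is empty."""
    data = json.loads(raw) if raw.strip() else None
    # Treat both a missing document and an empty object as "no config provided",
    # so the caller can fall back instead of passing an empty dict onward.
    return data or None
```

A caller would then check the result for None before deciding whether to render anything.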

Revision history for this message
Serapheim Dimitropoulos (serapheim) wrote :

Hit this again in 18.04:

```
[ 71.626777] cloud-init[1112]: ------------------------------------------------------------
[ 71.636680] cloud-init[1112]: Traceback (most recent call last):
[ 71.641365] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 655, in status_wrapper
[ 71.650437] cloud-init[1112]: ret = functor(name, args)
[ 71.653071] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 361, in main_init
[ 71.657061] cloud-init[1112]: init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
[ 71.661386] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 653, in apply_network_config
[ 71.668850] cloud-init[1112]: return self.distro.apply_network_config(netcfg, bring_up=bring_up)
[ 71.670922] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 175, in apply_network_config
[ 71.677208] cloud-init[1112]: dev_names = self._write_network_config(netconfig)
[ 71.678982] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 119, in _write_network_config
[ 71.685348] cloud-init[1112]: return self._supported_write_network_config(netconfig)
[ 71.687216] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 94, in _supported_write_network_config
[ 71.693349] cloud-init[1112]: renderer.render_network_config(network_config=network_config)
[ 71.694996] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/net/renderer.py", line 53, in render_network_config
[ 71.701354] cloud-init[1112]: network_state=parse_net_config_data(network_config), target=target)
[ 71.703120] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 193, in render_network_state
[ 71.709377] cloud-init[1112]: content = self._render_content(network_state)
[ 71.713066] cloud-init[1112]: File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 227, in _render_content
[ 71.717210] cloud-init[1112]: if network_state.version == 2:
[ 71.720878] cloud-init[1112]: AttributeError: 'NoneType' object has no attribute 'version'
[ 71.724878] cloud-init[1112]: ------------------------------------------------------------
[FAILED] Failed to start Initial cloud-init job (metadata service crawler).
```

Looking in the code:
[1] In netplan.py (https://github.com/cloud-init/cloud-init/blob/6d48d265a0548a2dc23e587f2a335d4e38e8db90/cloudinit/net/netplan.py)
```
    def _render_content(self, network_state):

        # if content already in netplan format, pass it back
        if network_state.version == 2: # <==== we got None here
```
seems like network_state is None. Let's see where this is set.

[2] It is set in renderer.py (https://github.com/cloud-init/cloud-init/blob/6d48d265a0548a2dc23e587f2a335d4e38e8db90/cloudinit/net/renderer.py)
```
        network_state=parse_net_config_data(network_config), target=target)
```

[3] Looking at parse_net_config_data() in network_state.py (https://github.com/cloud-init/cloud-init/blob/6d48d265a...


Revision history for this message
Scott Moser (smoser) wrote :

Hi,

I believe you're suggesting that cloud-init fails to handle config-drive data that you've provided it. From what I can understand, this config-drive data is from a locally modified OpenStack.

Cloud-init does not support all possible things that look remotely like a Config Drive from OpenStack. We can't reasonably support all possible data or data formats you produce on a disk and label it 'config-2'.

You point out that the fix seems straightforward enough, but if we modify cloud-init to handle your slightly modified ConfigDrive, then how can we test and make sure that this doesn't regress in the future?

If you'd like us to look at this bug further, please run 'cloud-init collect-logs' and attach the cloud-init.tar.gz file that it creates.

Also, please give more information on how you've modified your openstack.

I've poked a bit and I can create a failing unit test, and a change to
make that test not fail: http://paste.ubuntu.com/p/YRPXTPm5hj/ .
The problem with this solution is that it will still render network
configuration and have side effects.

What you really need to do is properly tell cloud-init that network configuration is disabled. Currently the only way to do that is to place content in /etc/cloud/cloud.cfg.d/ that says 'network: {config: disabled}'.
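A minimal sketch of writing such a drop-in file when building an image is shown below. The helper name and the file name `99-disable-network-config.cfg` are assumptions for illustration; on a real system the target directory would be /etc/cloud/cloud.cfg.d/, while the demo here writes to a temporary directory.

```python
import os
import tempfile

# YAML content that tells cloud-init not to manage network configuration.
DISABLE_NETWORK_CFG = "network: {config: disabled}\n"


def write_disable_network_cfg(cfg_dir, name="99-disable-network-config.cfg"):
    """Hypothetical helper: write the drop-in file into cfg_dir, return its path."""
    path = os.path.join(cfg_dir, name)
    with open(path, "w") as fh:
        fh.write(DISABLE_NETWORK_CFG)
    return path


if __name__ == "__main__":
    # Demo only: write into a scratch directory rather than /etc.
    print(write_disable_network_cfg(tempfile.mkdtemp()))
```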

I'm sorry if I've misunderstood the bug, and you are actually not "hacking" an openstack to produce this.

After you've provided the requested information, please set the bug status back to 'New'

Changed in cloud-init (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Prakash Surya (pkashsurya) wrote :

Hi Scott,

The error Serapheim described above was hit when running under QEMU; e.g. with the following command:

    $ sudo qemu-system-x86_64 -nographic -m 1G -drive file=ubuntu.qcow2

We're definitely not using OpenStack. The QCOW

Revision history for this message
Prakash Surya (pkashsurya) wrote :

Sorry, I didn't mean to post that yet. The QCOW2 image we're using is based on Ubuntu 18.04, and I can give any details about how we're generating it and/or its contents (ubuntu-minimal plus some minor configuration changes).

I can gather the logs requested from "cloud-init collect-logs" if that'd help, but based on Serapheim's comment and the easy reproducer, maybe logs aren't necessary?

Revision history for this message
Prakash Surya (pkashsurya) wrote :

Attached are the cloud-init logs generated by "cloud-init collect-logs". I can provide the QCOW2 image I used too, if it's useful, although it's over 700M in size.

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Revision history for this message
Scott Moser (smoser) wrote :

@Prakash,

From your logs it appears that you are booting under kvm and are using the Ec2 Datasource. The metadata in that datasource is invalid. It describes network information (as seen in /var/log/cloud-init.log and cleaner in /run/cloud-init/instance-data.json) like this:

   "network": {
    "interfaces": {
     "macs": {
      "02:83:f8:24:b0:9c": {
       "device-number": "0",
       "interface-id": "eni-0ffc51e0fbd371979",
       "local-hostname": "ip-10-110-227-52.delphix.com",
       "local-ipv4s": "10.110.227.52",

That describes a NIC with a MAC address of 02:83:f8:24:b0:9c,
but your system is booted with one like: 52:54:00:12:34:56

So in that sense, very much "Garbage in, Garbage out."
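The mismatch can be checked mechanically. The sketch below is not how cloud-init validates metadata; `unmatched_macs` is a hypothetical helper that compares the MAC addresses named in the metadata structure shown above against the MAC addresses actually present on the system, which the caller has to supply.

```python
def unmatched_macs(metadata, local_macs):
    """Return metadata MACs that do not correspond to any local NIC."""
    macs = metadata.get("network", {}).get("interfaces", {}).get("macs", {})
    # Compare case-insensitively, since MAC notation varies between sources.
    local = {m.lower() for m in local_macs}
    return sorted(m for m in macs if m.lower() not in local)


metadata = {
    "network": {
        "interfaces": {
            "macs": {"02:83:f8:24:b0:9c": {"device-number": "0"}}
        }
    }
}
print(unmatched_macs(metadata, ["52:54:00:12:34:56"]))
# → ['02:83:f8:24:b0:9c']
```

A non-empty result means the metadata describes hardware the instance does not have, which is the situation in the attached logs.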

Now... that said, cloud-init is rendering network configuration in the "network" stage, which I think is kind of buggy or at least requires more thinking. I didn't think we were doing that. So we will look into that a bit more, but I'm not sure the fix will fix your specific problem.

Revision history for this message
Scott Moser (smoser) wrote :

@Prakash or others.
Based on last comment #12, I have set this to 'Incomplete'.
Please provide some more information and set the bug back to "New" when you're done.

Thanks.
Scott

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Revision history for this message
Simos Xenitellis  (simosx) wrote :

The following relate to LXD and the LXD container images for 16.04 and 18.04. Both container images have cloud-init 18.2 (18.2-4-g05926e48).

SUMMARY
cloud-init fails to set up networking if a container is launched with cloud-init networking instructions.

WHAT WAS TESTED
16.04 container image with cloud-init v1 and v2 configurations.
18.04 container image with cloud-init v1 and v2 configurations.

HOW TO REPRODUCE

1. Set up LXD

2. Create a new LXD profile. First create this file (mycloudinit.profile)
---
$ cat mycloudinit.profile
config:
  user.network-config: |
    network:
        version: 1
        config:
        - type: physical
          name: eth0
          subnets:
            - type: dhcp
description: LXD profile with some cloud-init network-config
devices:
  eth0:
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: mycloudinit
used_by: []
--------------------
and then add it to a profile with:
$ lxc profile create mycloudinit
$ cat version1.profile | lxc profile edit mycloudinit

3. Launch a container while specifying this profile

lxc launch ubuntu:18.04 mycontainer --profile mycloudinit

4. Enter the container and check the cloud-init logs

lxc exec mycontainer -- sudo --user ubuntu --login
cd /var/log/

WHAT ERROR DO YOU GET

ON CONTAINER 18.04, THE ERROR IS:

Cloud-init v. 18.2 running 'init-local' at Tue, 24 Jul 2018 16:31:39 +0000. Up 1.00 seconds.
2018-07-24 16:31:39,793 - stages.py[WARNING]: Failed to rename devices: Failed to apply network config names. Found bad network config version: None
2018-07-24 16:31:39,794 - util.py[WARNING]: failed stage init-local
failed run of stage init-local
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 655, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 361, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 653, in apply_network_config
    return self.distro.apply_network_config(netcfg, bring_up=bring_up)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 175, in apply_network_config
    dev_names = self._write_network_config(netconfig)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 119, in _write_network_config
    return self._supported_write_network_config(netconfig)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 94, in _supported_write_network_config
    renderer.render_network_config(network_config=network_config)
  File "/usr/lib/python3/dist-packages/cloudinit/net/renderer.py", line 53, in render_network_config
    network_state=parse_net_config_data(network_config), target=target)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 193, in render_network_state
    content = self._render_content(network_state)
  File "/usr/lib/python3/dist-packages/cloudinit/net/netplan.py", line 227, in _render_content
    if network_state.version == 2:...


Changed in cloud-init (Ubuntu):
status: Incomplete → New
Revision history for this message
Simos Xenitellis  (simosx) wrote :

I made a change in the cloud-init configuration:

--- version1.profile 2018-07-24 20:32:06.054307582 +0300
+++ version1.profile.after 2018-07-24 20:32:32.890558707 +0300
@@ -1,5 +1,5 @@
 config:
- user.network-config: |
+ user.user-data: |
     network:
         version: 1
         config:

and now cloud-init works as expected on LXD.

I still get a non-fatal warning: 2018-07-24 17:29:04,515 - __init__.py[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: 'b'network:'...'

Relevant discussion: https://github.com/lxc/lxd/issues/3347

Therefore, the workaround should be not to use `network-config` for network configuration but use `user-data` instead?

Revision history for this message
Chad Smith (chad.smith) wrote :

Cloud-init also introduced a fix for https://bugs.launchpad.net/bugs/1798117 which allows cloud-init to handle popping off a top-level network key as provided by LXC, so either approach will now work: user.network-config or user.user-data.

Thanks for this background Simos.

cloud-init 18.4 will handle both formats.
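The idea of "popping off a top-level network key" can be sketched as below. This is an illustration of the concept only, not the code from the referenced fix; `unwrap_network_config` is a hypothetical name.

```python
def unwrap_network_config(cfg):
    """Hypothetical helper: accept config with or without a top-level 'network' key."""
    if isinstance(cfg, dict) and "network" in cfg:
        # LXD-style input: {'network': {'version': 1, 'config': [...]}}
        return cfg["network"]
    # Already in the bare form: {'version': 1, 'config': [...]}
    return cfg


wrapped = {
    "network": {
        "version": 1,
        "config": [{"type": "physical", "name": "eth0",
                    "subnets": [{"type": "dhcp"}]}],
    }
}
print(unwrap_network_config(wrapped)["version"])
```

Both the wrapped and the bare form end up as the same dict, so the parser sees a 'version' key in either case instead of None.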

Changed in cloud-init (Ubuntu):
status: New → Fix Released