Change breaks cloud-init on Ali

Bug #1917875 reported by Dirk Marwinski on 2021-03-05
This bug affects 1 person
Affects: cloud-init
Importance: High
Assigned to: Unassigned

Bug Description

Hi,

The cloud-init change

https://github.com/canonical/cloud-init/commit/70dbccbbb27f7cc3f2decd692d41403f0d745c62

appears to break cloud-init on Aliyun. We had been using version 20.2, or more precisely

http://ftp.de.debian.org/debian/pool/main/c/cloud-init/cloud-init_20.2-2~deb10u1_all.deb

with no issues. Version 20.4.1, or more precisely

http://ftp.de.debian.org/debian/pool/main/c/cloud-init/cloud-init_20.4.1-1_all.deb

breaks with the log output shown below. As the key `device-number` is not used in 20.2, this clearly points to the change referenced above. It is unclear where that key is supposed to be configured, as there is no other reference to it in the code.

We are using a more or less standard Debian configuration from the packages referenced above. The failure appears to cause user and group management to be skipped, which is causing us headaches. Interestingly, it works fine on AWS, which appears to use the same code.

Let me know if you need anything else.

2021-03-05 16:47:30,866 - util.py[DEBUG]: loaded blob returned None, returning default.
2021-03-05 16:47:30,867 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2021-03-05 16:47:30,867 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2021-03-05 16:47:30,867 - util.py[DEBUG]: Reading from /sys/class/net/ens5/address (quiet=False)
2021-03-05 16:47:30,867 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/ens5/address
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/ens5/name_assign_type (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/ens5/name_assign_type
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/ens5/carrier (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/ens5/carrier
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/ens5/address (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/ens5/address
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/lo/addr_assign_type (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/lo/addr_assign_type
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/lo/uevent (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 23 bytes from /sys/class/net/lo/uevent
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/lo/device/device (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/ens5/addr_assign_type (quiet=False)
2021-03-05 16:47:30,868 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/ens5/addr_assign_type
2021-03-05 16:47:30,868 - util.py[DEBUG]: Reading from /sys/class/net/ens5/uevent (quiet=False)
2021-03-05 16:47:30,869 - util.py[DEBUG]: Read 25 bytes from /sys/class/net/ens5/uevent
2021-03-05 16:47:30,869 - util.py[DEBUG]: Reading from /sys/class/net/ens5/address (quiet=False)
2021-03-05 16:47:30,869 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/ens5/address
2021-03-05 16:47:30,869 - util.py[DEBUG]: Reading from /sys/class/net/ens5/device/device (quiet=False)
2021-03-05 16:47:30,869 - util.py[DEBUG]: Read 7 bytes from /sys/class/net/ens5/device/device
2021-03-05 16:47:30,869 - util.py[DEBUG]: Reading from /sys/class/net/lo/type (quiet=False)
2021-03-05 16:47:30,869 - util.py[DEBUG]: Read 4 bytes from /sys/class/net/lo/type
2021-03-05 16:47:30,869 - util.py[DEBUG]: Reading from /sys/class/net/ens5/type (quiet=False)
2021-03-05 16:47:30,869 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/ens5/type
2021-03-05 16:47:30,869 - util.py[WARNING]: failed stage init
2021-03-05 16:47:30,869 - util.py[DEBUG]: failed stage init
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 653, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 678, in apply_network_config
    netcfg, src = self._find_networking_config()
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 643, in _find_networking_config
    if self.datasource and hasattr(self.datasource, 'network_config'):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceEc2.py", line 415, in network_config
    result = convert_ec2_metadata_network_config(
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceEc2.py", line 774, in convert_ec2_metadata_network_config
    nic_idx = int(nic_metadata['device-number']) + 1
KeyError: 'device-number'
2021-03-05 16:47:30,870 - atomic_helper.py[DEBUG]: Atomically writing to file /var/lib/cloud/data/status.json (via temporary file /var/lib/cloud/data/tmppae4_4kw) - w: [644] 526 bytes/chars
2021-03-05 16:47:30,870 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2021-03-05 16:47:30,870 - util.py[DEBUG]: Read 12 bytes from /proc/uptime
2021-03-05 16:47:30,870 - util.py[DEBUG]: cloud-init mode 'init' took 1.744 seconds (1.74)
2021-03-05 16:47:30,870 - handlers.py[DEBUG]: finish: init-network: SUCCESS: searching for network datasources
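
The failing expression from the traceback can be reproduced in isolation. This is a minimal sketch, assuming `nic_metadata` is a plain dict holding one NIC's entry from the metadata service; the sample values are illustrative, not copied from a real instance, and `nic_index` is a hypothetical wrapper around the single line that fails.

```python
# Illustrative per-NIC metadata: an EC2-style entry carries 'device-number',
# while an Aliyun-style entry (as reported in this bug) does not.
ec2_nic = {"device-number": "0", "interface-id": "eni-example"}
aliyun_nic = {"network-interface-id": "eni-example", "primary-ip-address": "192.168.0.6"}


def nic_index(nic_metadata):
    # Same expression as DataSourceEc2.py line 774 in the traceback above.
    return int(nic_metadata['device-number']) + 1


print(nic_index(ec2_nic))  # 1

try:
    nic_index(aliyun_nic)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'device-number'
```

A plain subscript on a missing key raises `KeyError`, and because nothing catches it, the whole init stage fails, which is why later modules such as user and group management never run.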

Dan Watkins (oddbloke) wrote :

Hi Dirk,

Thanks for using cloud-init and taking the time to file this bug! `device-number` is an attribute that the EC2 IMDS sets in its network configuration; on an EC2 instance I see:

$ cloud-init query ds.meta_data.network.interfaces.macs
{
 "0a:30:3f:3d:e0:49": {
  "device_number": "0",
  "interface_id": "eni-014a09bfddf3a9ba9",
  <snip>
 }
}

(The use of an underscore vs. a dash here is not material: `query` normalises dashes to underscores when producing its output.)
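
The normalisation described above can be illustrated with a small recursive helper. This is a hypothetical sketch (`normalize_keys` is not cloud-init's own function) that only shows the idea: every dash in a dictionary key becomes an underscore, at any nesting depth, while values are left alone.

```python
import json


def normalize_keys(data):
    """Hypothetical helper: recursively replace '-' with '_' in dict keys."""
    if isinstance(data, dict):
        return {
            key.replace("-", "_"): normalize_keys(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [normalize_keys(item) for item in data]
    # Scalars (strings, numbers) are returned unchanged.
    return data


raw = {"0a:30:3f:3d:e0:49": {"device-number": "0", "interface-id": "eni-014a09bfddf3a9ba9"}}
print(json.dumps(normalize_keys(raw), indent=1))
```

MAC-address keys contain only colons, so they pass through untouched; only keys like `device-number` change spelling.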

Aliyun's datasource is based on the EC2 datasource (via inheritance), so it inherited this new behaviour. However, it's evidently the case that Aliyun do not expose `device-number` in their network metadata. We should handle this case gracefully, as there will be other EC2-a-like metadata services which don't do so.
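
One way to handle the missing key gracefully is to fall back to the order in which NICs are enumerated. This is a sketch under that assumption, not the fix that shipped in cloud-init; the function name `nic_indices` and the sorted-MAC fallback order are hypothetical choices made for illustration.

```python
def nic_indices(macs_metadata):
    """Hypothetical sketch: map each MAC to a 1-based NIC index.

    Uses 'device-number' when the metadata service provides it and
    otherwise falls back to the NIC's position in sorted-MAC order.
    """
    indices = {}
    for position, mac in enumerate(sorted(macs_metadata)):
        nic_metadata = macs_metadata[mac]
        # .get() avoids the KeyError seen on Aliyun; the fallback is the
        # zero-based position, mirroring the zero-based device-number.
        device_number = nic_metadata.get("device-number", position)
        indices[mac] = int(device_number) + 1
    return indices


ec2_like = {"0a:30:3f:3d:e0:49": {"device-number": "0"}}
aliyun_like = {"00:16:3e:00:11:c1": {"network-interface-id": "eni-gw834afjtsissx85nrk9"}}

print(nic_indices(ec2_like))     # {'0a:30:3f:3d:e0:49': 1}
print(nic_indices(aliyun_like))  # {'00:16:3e:00:11:c1': 1}
```

A real implementation would also have to decide what happens when some NICs report `device-number` and others do not, since a fallback position could collide with a reported value.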

Having said that, it's possible that Aliyun do expose this information but in a way we don't handle: would you be able to paste the output of `cloud-init query ds.meta_data.network.interfaces.macs` back in this bug report?

Thanks!

Dan

Changed in cloud-init:
status: New → Triaged
importance: Undecided → High
Dirk Marwinski (marwinski) wrote :

Hi Dan,

Thanks for taking this up! This is the output from the command above on Ali:

root@iZgw8c34a8o8lyuosnx61pZ:/home/admin# cloud-init query ds.meta_data.network.interfaces.macs
{
 "00:16:3e:00:11:c1": {
  "gateway": "192.168.0.125",
  "netmask": "255.255.255.128",
  "network_interface_id": "eni-gw834afjtsissx85nrk9",
  "primary_ip_address": "192.168.0.6",
  "private_ipv4s": "[\"192.168.0.6\"]",
  "vpc_cidr_block": "192.168.0.0/24",
  "vpc_id": "vpc-gw8bwh3pntfg5fsg5rypc",
  "vswitch_cidr_block": "192.168.0.0/25",
  "vswitch_id": "vsw-gw8j932a53kdc2wojcejc"
 }
}
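
To check whether a given metadata service exposes the key, the JSON printed by `cloud-init query` can be inspected directly. A minimal sketch, assuming the output above has been captured as a string (pasted inline here and trimmed to a few keys); the helper name is hypothetical, and it looks for the underscore spelling because `query` normalises dashes to underscores.

```python
import json

# Output of `cloud-init query ds.meta_data.network.interfaces.macs` from the
# Aliyun instance above, trimmed to a few keys.
QUERY_OUTPUT = """
{
 "00:16:3e:00:11:c1": {
  "gateway": "192.168.0.125",
  "network_interface_id": "eni-gw834afjtsissx85nrk9",
  "primary_ip_address": "192.168.0.6"
 }
}
"""


def macs_missing_device_number(query_json):
    """Hypothetical helper: return MACs whose entry lacks 'device_number'."""
    macs = json.loads(query_json)
    return sorted(mac for mac, nic in macs.items() if "device_number" not in nic)


print(macs_missing_device_number(QUERY_OUTPUT))  # ['00:16:3e:00:11:c1']
```

On an instance, the same function could be given `sys.stdin.read()` with the `query` output piped in, instead of a pasted string.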

@Jerry: can you take this up with Ali support?

Thanks,
Dirk

Dan Watkins (oddbloke) wrote :
Changed in cloud-init:
status: Triaged → Fix Committed

This bug is believed to be fixed in cloud-init version 21.2. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released