Comment 15 for bug 1885527

Revision history for this message
Hadmut Danisch (hadmut) wrote :

I've found the problem.

The cloud provider (in this case german Hetzner) provides a machine with a virtual ethernet interface

# lspci | fgrep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device

which is, following the naming standards, named ens3

But then, the provider gives an cloud-init-file, which cloud-init fetches by http (I have not yet fully understood the after-boot-behaviour of cloud-init), which gives the network-configuration as

network-config:
  config:
  - mac_address: 96:00:00:70:f1:8f
    name: eth0
    subnets:
    - dns_nameservers:
      - 213.133.99.99
      - 213.133.100.100
      - 213.133.98.98
      ipv4: true
      type: dhcp
    - address: 2a01:4f9:c010:d0eb::1/64
      gateway: fe80::1
      ipv6: true
      type: static
    type: physical
  version: 1

thus renaming the device from ens3 to eth0

Therefore kern.log contains entries like

Sep 23 14:13:44 worker-00 kernel: [ 1.372316] virtio_net virtio0 ens3: renamed from eth0
Sep 23 14:13:44 worker-00 kernel: [ 4.834687] virtio_net virtio0 eth0: renamed from ens3

proving that the network interface changes between ens3 and eth0.

This then *sometimes* (logs taken from another machine where that problem occured) results in the cloud-init.log

2020-09-22 15:00:15,611 - __init__.py[DEBUG]: Attempting setup of ephemeral network on ens3 with 169.254.0.1/16 brd 169.254.255.255
2020-09-22 15:00:15,611 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '169.254.0.1/16', 'broadcast', '169.254.255.255', 'dev', 'ens3'] with allowed return codes
 [0] (shell=False, capture=True)
2020-09-22 15:00:15,637 - handlers.py[DEBUG]: finish: init-local/search-Hetzner: FAIL: no local data found from DataSourceHetzner
2020-09-22 15:00:15,637 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceHetzner.DataSourceHetzner'> failed
2020-09-22 15:00:15,639 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceHetzner.DataSourceHetzner'> failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 770, in find_source
    if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 659, in update_metadata
    result = self.get_data()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceHetzner.py", line 53, in get_data
    with cloudnet.EphemeralIPv4Network(nic, "169.254.0.1", 16,
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 990, in __enter__
    self._bringup_device()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 1027, in _bringup_device
    subp.subp(
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 291, in subp
    raise ProcessExecutionError(stdout=out, stderr=err,
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['ip', '-family', 'inet', 'addr', 'add', '169.254.0.1/16', 'broadcast', '169.254.255.255', 'dev', 'ens3']
Exit code: 1
Reason: -
Stdout:
Stderr: Cannot find device "ens3"
2020-09-22 15:00:15,646 - main.py[DEBUG]: No local datasource found

So it seems to be a timing problem.

If the cloud-init-configuration contains a different interface name that the natural standard name (here: eth0 instead of ens3), then it can sometimes (rarely, but experienced several times) happen that cloud-init does not find the interface ens3, because it has been renamed eth0, thus can't initialize it, can't fetch the init file and then for some strange reason believes it should refresh itself.