Comment 4 for bug 1927124

Chad Smith (chad.smith) wrote :

More details: this is an upstream bug, caused by cloudinit/stages.py creating a copy of the distro instance (by re-reading and updating the distro config from disk) when it is unset in Init:
https://github.com/canonical/cloud-init/blob/master/cloudinit/stages.py#L91-L96

The three problems upstream are:
 1. cloudinit/distros/networking.py get_interfaces_by_mac doesn't honor blacklist_drivers from a datasource.
 2. DataSourceAzure sets DataSourceAzure.distro.networking.blacklist_drivers during _get_data, so the value exists only on that distro instance.
 3. stages.py does not copy blacklist_drivers into the newly instantiated distro instance it assigns to the found datasource.
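
To make the interaction between those three points concrete, here is a minimal sketch. It does not use cloud-init's real classes: Networking, Distro, DataSource, replace_distro and the DEVICES table are hypothetical stand-ins, and the driver names and MAC address are illustrative values only. It only shows how a blacklist set on one distro instance is lost when a fresh instance is swapped in, and how copying it from the datasource avoids the duplicate MAC error.

```python
# Illustrative stand-ins only; these are NOT cloud-init's real classes.
BLACKLIST_DRIVERS = ["mlx4_core", "mlx5_core"]  # example values

# Hypothetical device table: (name, mac, driver).
# The SRIOV VF shares its MAC with the synthetic NIC.
DEVICES = [
    ("eth0", "00:0d:3a:00:00:01", "hv_netvsc"),
    ("enP1s1", "00:0d:3a:00:00:01", "mlx4_core"),
]


class Networking:
    def __init__(self):
        self.blacklist_drivers = None

    def get_interfaces_by_mac(self):
        """Map MAC -> device name, skipping blacklisted drivers."""
        blacklist = self.blacklist_drivers or []
        by_mac = {}
        for name, mac, driver in DEVICES:
            if driver in blacklist:
                continue
            if mac in by_mac:
                raise RuntimeError(
                    "duplicate mac %s on %s and %s" % (mac, by_mac[mac], name)
                )
            by_mac[mac] = name
        return by_mac


class Distro:
    def __init__(self):
        self.networking = Networking()


class DataSource:
    blacklist_drivers = BLACKLIST_DRIVERS

    def __init__(self, distro):
        self.distro = distro

    def get_data(self):
        # Problem 2: the blacklist only lands on the current distro instance.
        self.distro.networking.blacklist_drivers = self.blacklist_drivers


def replace_distro(datasource, copy_blacklist):
    """Swap in a fresh Distro, optionally carrying the blacklist over."""
    new_distro = Distro()
    if copy_blacklist and hasattr(datasource, "blacklist_drivers"):
        new_distro.networking.blacklist_drivers = datasource.blacklist_drivers
    datasource.distro = new_distro
    return new_distro
```

With copy_blacklist=False the new distro has no blacklist and get_interfaces_by_mac raises on the shared MAC; with copy_blacklist=True only eth0 is returned.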

This only affects older kernels such as 4.4. Newer kernels surface a sysfs "master" link on SRIOV devices, so cloud-init ignores those devices by default and no duplicate MAC errors are seen.
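
For reference, the "master" check amounts to looking for a master entry under the device's sysfs directory. The sketch below is a simplified, self-contained version of that idea, not cloud-init's own code: the function names and the sys_class_net parameter are my own, added so the check can be pointed at a test directory.

```python
import os


def has_master(devname, sys_class_net="/sys/class/net"):
    """Return True if the device exposes a sysfs 'master' entry."""
    return os.path.exists(os.path.join(sys_class_net, devname, "master"))


def devices_without_master(sys_class_net="/sys/class/net"):
    """List devices that have no 'master' entry, sorted by name."""
    return sorted(
        name
        for name in os.listdir(sys_class_net)
        if not has_master(name, sys_class_net)
    )
```

On a 4.4 kernel the SRIOV device has no such entry, so a check like this cannot filter it out and the driver blacklist is the only remaining filter.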

The following diff resolves this for Azure on 4.4 FIPS kernel.

I'll have to talk with the team about how best to support this on Xenial PRO images.

diff --git a/cloudinit/distros/networking.py b/cloudinit/distros/networking.py
index c291196a..471d7e52 100644
--- a/cloudinit/distros/networking.py
+++ b/cloudinit/distros/networking.py
@@ -71,7 +71,7 @@ class Networking(metaclass=abc.ABCMeta):
     def get_interfaces(self) -> list:
         return net.get_interfaces()

-    def get_interfaces_by_mac(self) -> dict:
+    def get_interfaces_by_mac(self, *, blacklist_drivers=None) -> dict:
         return net.get_interfaces_by_mac(
             blacklist_drivers=self.blacklist_drivers)

@@ -144,7 +144,9 @@ class Networking(metaclass=abc.ABCMeta):
         expected_macs = set(expected_ifaces.keys())

         # set of current macs
-        present_macs = self.get_interfaces_by_mac().keys()
+        present_macs = self.get_interfaces_by_mac(
+            blacklist_drivers=self.blacklist_drivers
+        ).keys()

         # compare the set of expected mac address values to
         # the current macs present; we only check MAC as cloud-init
diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
index dcdf9f8f..0069bd0a 100755
--- a/cloudinit/sources/DataSourceAzure.py
+++ b/cloudinit/sources/DataSourceAzure.py
@@ -344,6 +344,7 @@ class DataSourceAzure(sources.DataSource):
         EventType.BOOT,
         EventType.BOOT_LEGACY
     }}
+    blacklist_drivers = BLACKLIST_DRIVERS

     _negotiated = False
     _metadata_imds = sources.UNSET
@@ -626,7 +627,7 @@ class DataSourceAzure(sources.DataSource):
         except Exception as e:
             LOG.warning("Failed to get system information: %s", e)

-        self.distro.networking.blacklist_drivers = BLACKLIST_DRIVERS
+        self.distro.networking.blacklist_drivers = self.blacklist_drivers

         try:
             crawled_data = util.log_time(
diff --git a/cloudinit/stages.py b/cloudinit/stages.py
index bbded1e9..cc7619b3 100644
--- a/cloudinit/stages.py
+++ b/cloudinit/stages.py
@@ -92,6 +92,14 @@ class Init(object):
             # said datasource and move its distro/system config
             # from whatever it was to a new set...
             if self.datasource is not NULL_DATA_SOURCE:
+                # Certain datasources exclude network devices based
+                # on the corresponding driver (Azure SRIOV).
+                # When copying in a new distro, reset the
+                # blacklist_drivers for networking config generation.
+                if hasattr(self.datasource, "blacklist_drivers"):
+                    self._distro.networking.blacklist_drivers = getattr(
+                        self.datasource, "blacklist_drivers"
+                    )
                 self.datasource.distro = self._distro
                 self.datasource.sys_cfg = self.cfg
         return self._distro