Azure advanced networking sometimes triggers duplicate MAC detection

Bug #1844191 reported by Ryan Harper
Status: Fix Released

Bug Description

Hi, we're still being affected by this on Azure with 19.2-24-ge7881d5c-0ubuntu1~18.04.1, using Packer to build from the image BuildSource: Marketplace/Canonical/UbuntuServer/18.04-DAILY-LTS

Here is the packer config:
    "provisioners": [
        {
          "type": "shell",
          "inline": [
            "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done"
          ]
        },
        {
          "type": "ansible",
          "playbook_file": "{{user `ansible_playbook`}}",
          "user": "packer",
          "extra_arguments": [ "--extra-vars", "codeVersion={{user `code_version`}} managed_image_name={{user `managed_image_name`}}" ]
        },
        {
          "type": "shell",
          "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
          "inline_shebang": "/bin/sh -x",
          "inline": [ "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync" ]
        }
    ]

Here is the playbook:
- hosts: all
  remote_user: ubuntu
  become: yes
  become_method: sudo
  become_user: root

    DEBIAN_FRONTEND: noninteractive

Note: we are applying `enableAcceleratedNetworking: true` to the NIC; anecdotally, we think this is related.

Usually our playbook has more in it (obviously) but Azure kept pointing fingers at us that our image was causing the problem, so I ran this test simply deploying a blank deprovisioned image via our same process.

And here's what happens on the serial console log:

[ 20.337603] sh[910]: + [ -e /var/lib/cloud/instance/obj.pkl ]
[ 20.343177] sh[910]: + echo cleaning persistent cloud-init object
[ 20.349027] [ OK ] Started Network Time Synchronization.
[ OK ] Reached target System Time Synchronized.
sh[910]: cleaning persistent cloud-init object
[ 20.361066] sh[910]: + rm /var/lib/cloud/instance/obj.pkl
[ 20.412333] sh[910]: + exit 0
[ 34.282291] cloud-init[938]: Cloud-init v. 19.2-24-ge7881d5c-0ubuntu1~18.04.1 running 'init-local' at Mon, 16 Sep 2019 18:02:23 +0000. Up 32.02 seconds.
[ 34.288809] cloud-init[938]: 2019-09-16 18:02:25,262 -[WARNING]: failed stage init-local
[ 34.423057] cloud-init[938]: failed run of stage init-local
[ 34.437716] cloud-init[938]: ------------------------------------------------------------
[ 34.441088] cloud-init[938]: Traceback (most recent call last):
[ 34.443719] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/", line 653, in status_wrapper
[ 34.448072] cloud-init[938]: ret = functor(name, args)
[ 34.450532] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/", line 362, in main_init
[ 34.454849] cloud-init[938]: init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
[ 34.458725] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/", line 697, in apply_network_config
[ 34.463421] cloud-init[938]: net.wait_for_physdevs(netcfg)
[ 34.466051] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/net/", line 344, in wait_for_physdevs
[ 34.470673] cloud-init[938]: present_macs = get_interfaces_by_mac().keys()
[ 34.473964] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/net/", line 633, in get_interfaces_by_mac
[ 34.479325] cloud-init[938]: (name, ret[mac], mac))
[ 34.481838] cloud-init[938]: RuntimeError: duplicate mac found! both 'eth0' and 'enP1s1' have mac '00:0d:3a:7c:f7:3f'
[ 34.486614] cloud-init[938]: ------------------------------------------------------------
[FAILED] Failed to start Initial cloud-init job (pre-networking).
See 'systemctl status cloud-init-local.service' for details.
[ OK ] Reached target Network (Pre).
         Starting Network Service...
[ OK ] Started Network Service.
         Starting Wait for Network to be Configured...
         Starting Network Name Resolution...
[ OK ] Started Wait for Network to be Configured.
         Starting Initial cloud-init job (metadata service crawler)...
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Host and Network Name Lookups.
[ OK ] Reached target Network.

When this happens, the machine never finishes booting: we get an OSProvisioningTimedOut error after about 30 minutes, and the machine never reaches a healthy state.
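
One way to spot this failure in a saved serial console log is to search for the RuntimeError message shown in the traceback above. The following is a minimal sketch and is not part of cloud-init: the function name `find_duplicate_mac_errors` is hypothetical, and the pattern assumes the message format stays exactly as quoted in this report.

```python
import re

# Pattern assumed from the RuntimeError text quoted in this report.
DUPLICATE_MAC_RE = re.compile(
    r"duplicate mac found! both '(?P<first>[^']+)' and '(?P<second>[^']+)' "
    r"have mac '(?P<mac>[0-9a-fA-F:]+)'"
)


def find_duplicate_mac_errors(log_text):
    """Return (first_iface, second_iface, mac) tuples found in log_text."""
    return [
        (m.group("first"), m.group("second"), m.group("mac"))
        for m in DUPLICATE_MAC_RE.finditer(log_text)
    ]
```

Passing the full text of the console log returns one tuple per occurrence of the error, or an empty list if the error never appears.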

Ryan Harper (raharper)
Changed in cloud-init:
importance: Undecided → High
status: New → Triaged
Ryan Harper (raharper) wrote :

I can reproduce this on Azure with advanced networking on 19.2

root@ragged-bond1:~# python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cloudinit import net
>>> import yaml
>>> y = yaml.load(open('/etc/netplan/50-cloud-init.yaml'))
>>> net.wait_for_physdevs(y['network'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/cloudinit/net/", line 344, in wait_for_physdevs
    present_macs = get_interfaces_by_mac().keys()
  File "/usr/lib/python3/dist-packages/cloudinit/net/", line 633, in get_interfaces_by_mac
    (name, ret[mac], mac))
RuntimeError: duplicate mac found! both 'enP1s1' and 'eth0' have mac '00:0d:3a:6c:d9:80'

Looking at the SR-IOV device, the sysfs attributes include a 'master' pointing to eth0, so I think we can reasonably ignore devices that have a 'master' attribute, which is related to device bonding.

root@ragged-bond1:/usr/lib/python3/dist-packages# diff -u cloudinit/net/ cloudinit/net/
--- cloudinit/net/ 2019-09-16 21:15:42.550376776 +0000
+++ cloudinit/net/ 2019-09-16 21:18:26.178760942 +0000
@@ -109,6 +109,10 @@
     return os.path.exists(sys_dev_path(devname, "bonding"))

+def has_master_attr(devname):
+    return os.path.exists(sys_dev_path(devname, path='master'))
 def is_renamed(devname):
     /* interface name assignment types (sysfs name_assign_type attribute) */
@@ -661,6 +665,9 @@
         if is_bond(name):
+        if has_master_attr(name):
+            LOG.debug('Skipping device %s with "master" sysfs attribute', name)
+            continue
         mac = get_interface_mac(name)
         # some devices may not have a mac (tun0)
         if not mac:
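
To make the idea behind the patch concrete outside of cloud-init, here is a minimal standalone sketch. It is not the cloud-init implementation: `interfaces_by_mac` and its `sys_class_net` parameter are hypothetical names chosen for illustration, and it leaves out other cases the real code deals with (bonds, renamed devices, and so on). It assumes only the standard sysfs layout in which each interface has a directory under /sys/class/net containing an `address` file, plus a `master` entry when the device is enslaved to another one.

```python
import os


def has_master_attr(sys_class_net, devname):
    """True if the device directory contains a 'master' entry."""
    return os.path.exists(os.path.join(sys_class_net, devname, "master"))


def interfaces_by_mac(sys_class_net="/sys/class/net"):
    """Map MAC address -> interface name, skipping enslaved devices."""
    ret = {}
    for name in sorted(os.listdir(sys_class_net)):
        if has_master_attr(sys_class_net, name):
            # e.g. the SR-IOV device that shares its MAC with eth0
            continue
        addr_path = os.path.join(sys_class_net, name, "address")
        try:
            with open(addr_path) as f:
                mac = f.read().strip().lower()
        except OSError:
            continue  # entry without a readable address file
        if not mac:
            continue  # some devices may not have a MAC (tun0)
        if mac in ret:
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, ret[mac], mac)
            )
        ret[mac] = name
    return ret
```

Taking the sysfs root as a parameter lets the function be pointed at a fake directory tree for testing, so the duplicate-MAC case can be exercised without an Azure VM.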

Changed in cloud-init:
importance: High → Critical
status: Triaged → In Progress
Ryan Harper (raharper) wrote :

I've uploaded a version of cloud-init with this patch to a PPA:

% add-apt-repository -y ppa:raharper/bugfixes
% apt install cloud-init

Danno B (slikk66) wrote :

Hi Ryan, our current workflow is to take the DAILY image, create a base image for all our specialized images "base1804" on a bi-weekly basis, and then create a specialized image for each of our services as the code repositories are updated.

How long do you estimate it will take for this to make its way into the Canonical/UbuntuServer/18.04-DAILY-LTS image?

In the meantime, I'll try to install this via your deb file.

Thank you for your effort on this; you got the patch out before Azure even responded to our support ticket.

Danno B (slikk66) wrote :

Patch looks good on our instance! Was able to boot with advanced networking after manually installing this deb file to the image during packer build.

I'll keep the patch in place until I've confirmed it's been merged and released onto the daily image.

Thanks again!

Dan Watkins (oddbloke) wrote :

Added the block-proposed tag so that we can perform manual eoan testing before migration happens.

tags: added: block-proposed
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 059d049c to cloud-init on branch master.

Changed in cloud-init:
status: In Progress → Fix Committed
Dragonshadow (gteachey) wrote :

I'd like to confirm: this has not been released in a package update yet, correct? We appear to have hit this same bug.

We're using Accelerated Networking, and adding a second IP to the interface generated the same duplicate MAC error reported here.

I'm not sure if a separate bug report should be made? In our case the machine was already deployed/provisioned, but after adding in a second IP to the NIC we've lost routing and the error is seen.

lilideng (lilideng) wrote :

When will it go into the Azure gallery image?

Chad Smith (chad.smith) wrote :

I apologize for the delay here. This bug should have been set to Fix Released when we released 19.2.36 (which has been published to Ubuntu Xenial, Bionic, Disco and Eoan images as of Oct 10th, I believe). Azure image builds were delayed a bit due to an image build pipeline issue, but Azure also saw these fixes in October. Marking Fix Released on this bug now.

Changed in cloud-init:
status: Fix Committed → Fix Released