azure advanced networking sometimes triggers duplicate mac detection
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Fix Released | Critical | Unassigned |
cloud-init (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Undecided | Unassigned |
Jammy | Fix Released | Undecided | Unassigned |
Kinetic | Fix Released | Undecided | Unassigned |
Bug Description
=== Begin SRU Template ===
[Impact]
When accelerated networking is enabled on Azure, the host presents two network interfaces with the same MAC address to the VM:
a synthetic NIC (netvsc) and a VF NIC, which is enslaved to the synthetic NIC.
The net module already excludes slave NICs when enumerating interfaces. However, if cloud-init starts enumerating after the kernel makes the VF visible to userspace but before the enslaving has finished, cloud-init will see two NICs with a duplicate MAC.
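The race can be modelled in a few lines. This is a simplified, hypothetical model, not cloud-init's actual code: each interface snapshot is a `(name, mac, master)` tuple, enslaved devices (those with a master) are skipped the way the net module already skips slaves, and a duplicate MAC raises the same style of `RuntimeError` that appears in the console log. The function and variable names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot of one interface: (name, mac, master).
# `master` stays None until the kernel finishes enslaving the VF.
Iface = Tuple[str, str, Optional[str]]


def interfaces_by_mac(snapshot: List[Iface]) -> Dict[str, str]:
    """Map MAC -> interface name, skipping slaves and raising on duplicates."""
    ret: Dict[str, str] = {}
    for name, mac, master in snapshot:
        if master is not None:
            # Already enslaved (e.g. the VF under the netvsc NIC): skip it.
            continue
        if mac in ret:
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, ret[mac], mac))
        ret[mac] = name
    return ret


MAC = "00:0d:3a:7c:f7:3f"

# After enslaving: the VF reports eth0 as its master, so it is skipped.
settled = [("eth0", MAC, None), ("enP1s1", MAC, "eth0")]
print(interfaces_by_mac(settled))  # {'00:0d:3a:7c:f7:3f': 'eth0'}

# Mid-race: the VF is visible to userspace but not yet enslaved.
racing = [("eth0", MAC, None), ("enP1s1", MAC, None)]
try:
    interfaces_by_mac(racing)
except RuntimeError as err:
    print(err)  # duplicate mac found! both 'enP1s1' and 'eth0' have mac '00:0d:3a:7c:f7:3f'
```

The point of the model is that slave exclusion alone cannot help during the window in which `master` is not yet set: both snapshots describe the same hardware, and only the timing differs.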
[Test Case]
Launch an instance with accelerated networking and ensure the instance comes up as expected with no networking-related Tracebacks in /var/log/
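Checking for the failure can be partly automated. The sketch below is a hypothetical helper, not part of cloud-init: it scans log text for Python tracebacks and for the exact duplicate-MAC message format shown in the serial console output. Which log file under /var/log/ to read is left to the caller.

```python
import re
from typing import List, Tuple

# Matches the message format seen on the serial console, e.g.
# "RuntimeError: duplicate mac found! both 'eth0' and 'enP1s1' have mac '...'"
DUP_MAC_RE = re.compile(
    r"duplicate mac found! both '(?P<a>[^']+)' and '(?P<b>[^']+)' "
    r"have mac '(?P<mac>[0-9a-fA-F:]+)'")


def find_duplicate_mac_errors(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (iface_a, iface_b, mac) for every duplicate-MAC error in log_text."""
    return [(m.group("a"), m.group("b"), m.group("mac"))
            for m in DUP_MAC_RE.finditer(log_text)]


def has_traceback(log_text: str) -> bool:
    """True if the text contains a Python traceback header."""
    return "Traceback (most recent call last):" in log_text


sample = ("[ 34.481838] cloud-init[938]: RuntimeError: duplicate mac found! "
          "both 'eth0' and 'enP1s1' have mac '00:0d:3a:7c:f7:3f'")
print(find_duplicate_mac_errors(sample))  # [('eth0', 'enP1s1', '00:0d:3a:7c:f7:3f')]
```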
[Regression Potential]
This is already in error handling code and is scoped to a particular driver. A regression here would mean we could allow a cloud-init instance to come up with duplicate macs when we otherwise wouldn't.
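To illustrate what "scoped to a particular driver" can mean, here is a hypothetical sketch; it is not the code from the PR. It tolerates a duplicate MAC only when exactly one of the two interfaces uses the Hyper-V synthetic driver (assumed here to be reported as `hv_netvsc`), keeps the synthetic NIC, and still raises for any other duplicate. Each interface is assumed to be described as `(name, mac, driver)`; `mlx5_core` in the usage line is only an example VF driver.

```python
from typing import Dict, List, Tuple

SYNTHETIC_DRIVER = "hv_netvsc"  # assumption: driver name of the synthetic NIC


def interfaces_by_mac_scoped(ifaces: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map MAC -> name; tolerate a duplicate only for a synthetic/VF pair."""
    by_mac: Dict[str, Tuple[str, str]] = {}
    for name, mac, driver in ifaces:
        if mac not in by_mac:
            by_mac[mac] = (name, driver)
            continue
        other_name, other_driver = by_mac[mac]
        if SYNTHETIC_DRIVER not in (driver, other_driver) or driver == other_driver:
            # Not a synthetic/VF pair: keep the original strict behaviour.
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, other_name, mac))
        if driver == SYNTHETIC_DRIVER:
            # Prefer the synthetic NIC, whichever of the pair was seen first.
            by_mac[mac] = (name, driver)
    return {mac: name for mac, (name, _driver) in by_mac.items()}


mac = "00:0d:3a:6c:d9:80"
print(interfaces_by_mac_scoped([("enP1s1", mac, "mlx5_core"), ("eth0", mac, "hv_netvsc")]))
# {'00:0d:3a:6c:d9:80': 'eth0'}
```

Keeping the strict check for every other driver is what bounds the regression risk described above: only the synthetic/VF pairing is relaxed.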
[Other info]
The cloud-init team attempted to reproduce this bug but could not. It was reported as occurring in roughly "1 in 1000" launches.
Github PR: https:/
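A quick calculation shows why a rate of "1 in 1000" is hard to reproduce on demand. This assumes every launch fails independently with probability p = 0.001, which is a simplification; the real race may depend on host load and timing.

```python
import math


def p_at_least_one(p: float, launches: int) -> float:
    """Probability of at least one failure in `launches` independent launches."""
    return 1.0 - (1.0 - p) ** launches


def launches_for_confidence(p: float, confidence: float) -> int:
    """Smallest number of launches giving at least `confidence` chance of a failure."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


print(round(p_at_least_one(0.001, 1000), 3))  # 0.632
print(launches_for_confidence(0.001, 0.95))   # 2995
```

Under this assumption, even 1000 launches leave roughly a one-in-three chance of never seeing the failure, and about 3000 launches are needed for 95% confidence of hitting it at least once.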
=== End SRU Template ===
Initial bug:
Hi, we're still being affected by this on Azure with 19.2-24-
Here is the packer config:
````
"provisioners": [
{
"type": "shell",
"inline": [
"while [ ! -f /var/lib/
]
},
{
"type": "ansible",
"user": "packer",
},
{
"type": "shell",
}]
````
Here is the playbook:
````
---
- hosts: all
remote_user: ubuntu
become: yes
become_method: sudo
become_user: root
environment:
DEBIAN_
````
Note: we are applying `enableAccelera
Our playbook usually has more in it (obviously), but Azure kept pointing fingers at us, saying our image was causing the problem, so I ran this test by simply deploying a blank, deprovisioned image through the same process.
And here's what happens on the serial console log:
````
[ 20.337603] sh[910]: + [ -e /var/lib/
[ 20.343177] sh[910]: + echo cleaning persistent cloud-init object
[ 20.349027] [ OK ] Started Network Time Synchronization.
[ OK ] Reached target System Time Synchronized.
sh[910]: cleaning persistent cloud-init object
[ 20.361066] sh[910]: + rm /var/lib/
[ 20.412333] sh[910]: + exit 0
[ 34.282291] cloud-init[938]: Cloud-init v. 19.2-24-
[ 34.288809] cloud-init[938]: 2019-09-16 18:02:25,262 - util.py[WARNING]: failed stage init-local
[ 34.423057] cloud-init[938]: failed run of stage init-local
[ 34.437716] cloud-init[938]: -------
[ 34.441088] cloud-init[938]: Traceback (most recent call last):
[ 34.443719] cloud-init[938]: File "/usr/lib/
[ 34.448072] cloud-init[938]: ret = functor(name, args)
[ 34.450532] cloud-init[938]: File "/usr/lib/
[ 34.454849] cloud-init[938]: init.apply_
[ 34.458725] cloud-init[938]: File "/usr/lib/
[ 34.463421] cloud-init[938]: net.wait_
[ 34.466051] cloud-init[938]: File "/usr/lib/
[ 34.470673] cloud-init[938]: present_macs = get_interfaces_
[ 34.473964] cloud-init[938]: File "/usr/lib/
[ 34.479325] cloud-init[938]: (name, ret[mac], mac))
[ 34.481838] cloud-init[938]: RuntimeError: duplicate mac found! both 'eth0' and 'enP1s1' have mac '00:0d:3a:7c:f7:3f'
[ 34.486614] cloud-init[938]: -------
[FAILED] Failed to start Initial cloud-init job (pre-networking).
See 'systemctl status cloud-init-
[ OK ] Reached target Network (Pre).
Starting Network Service...
[ OK ] Started Network Service.
Starting Wait for Network to be Configured...
Starting Network Name Resolution...
[ OK ] Started Wait for Network to be Configured.
Starting Initial cloud-init job (metadata service crawler)...
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Host and Network Name Lookups.
[ OK ] Reached target Network.
````
When this happens, the machine never boots, and we get an OSProvisioningT
Related branches
- Server Team CI bot: Approve (continuous-integration)
- Dan Watkins: Approve
-
Diff: 2624 lines (+1361/-289), 42 files modified:
.gitignore (+11/-0)
Makefile (+3/-1)
cloudinit/atomic_helper.py (+6/-0)
cloudinit/net/__init__.py (+136/-3)
cloudinit/net/tests/test_init.py (+327/-0)
cloudinit/sources/DataSourceEc2.py (+1/-1)
cloudinit/sources/DataSourceOVF.py (+20/-1)
cloudinit/sources/DataSourceOracle.py (+61/-1)
cloudinit/sources/helpers/vmware/imc/guestcust_error.py (+1/-0)
cloudinit/sources/helpers/vmware/imc/guestcust_util.py (+37/-0)
cloudinit/sources/tests/test_oracle.py (+147/-0)
cloudinit/tests/helpers.py (+8/-0)
debian/changelog (+20/-0)
dev/null (+0/-13)
doc/rtd/conf.py (+3/-10)
doc/rtd/index.rst (+0/-9)
doc/rtd/topics/availability.rst (+53/-10)
doc/rtd/topics/datasources.rst (+92/-94)
doc/rtd/topics/datasources/altcloud.rst (+12/-11)
doc/rtd/topics/datasources/azure.rst (+2/-1)
doc/rtd/topics/datasources/cloudstack.rst (+1/-1)
doc/rtd/topics/datasources/configdrive.rst (+8/-8)
doc/rtd/topics/datasources/digitalocean.rst (+4/-2)
doc/rtd/topics/datasources/ec2.rst (+6/-5)
doc/rtd/topics/datasources/exoscale.rst (+6/-6)
doc/rtd/topics/datasources/nocloud.rst (+8/-8)
doc/rtd/topics/datasources/opennebula.rst (+15/-15)
doc/rtd/topics/datasources/openstack.rst (+2/-1)
doc/rtd/topics/datasources/smartos.rst (+7/-5)
doc/rtd/topics/debugging.rst (+7/-7)
doc/rtd/topics/dir_layout.rst (+22/-17)
doc/rtd/topics/docs.rst (+84/-0)
doc/rtd/topics/examples.rst (+1/-1)
doc/rtd/topics/faq.rst (+43/-0)
doc/rtd/topics/format.rst (+53/-28)
doc/rtd/topics/merging.rst (+12/-6)
doc/rtd/topics/network-config-format-v2.rst (+9/-9)
tests/unittests/test_datasource/test_ovf.py (+46/-9)
tests/unittests/test_ds_identify.py (+14/-2)
tests/unittests/test_vmware/test_guestcust_util.py (+65/-0)
tools/ds-identify (+1/-2)
tox.ini (+7/-2)
- cloud-init Commiters: Pending requested
-
Diff: 2624 lines (+1361/-289), 42 files modified
- Server Team CI bot: Approve (continuous-integration)
- Dan Watkins: Approve
-
Diff: 2624 lines (+1361/-289), 42 files modified
- Server Team CI bot: Needs Fixing (continuous-integration)
- cloud-init Commiters: Pending requested
-
Diff: 2624 lines (+1361/-289), 42 files modified
- Chad Smith: Approve
- Dan Watkins: Approve
- Server Team CI bot: Approve (continuous-integration)
-
Diff: 77 lines (+25/-2), 2 files modified:
cloudinit/net/__init__.py (+8/-2)
cloudinit/net/tests/test_init.py (+17/-0)
Changed in cloud-init:
importance: Undecided → High
status: New → Triaged
Changed in cloud-init:
status: Fix Released → Confirmed
description: updated
Changed in cloud-init (Ubuntu):
status: New → In Progress
Changed in cloud-init (Ubuntu Bionic):
status: New → In Progress
Changed in cloud-init (Ubuntu Focal):
status: New → In Progress
Changed in cloud-init (Ubuntu Jammy):
status: New → In Progress
Changed in cloud-init (Ubuntu Kinetic):
status: New → In Progress
tags: removed: block-proposed
tags: added: verification-done-bionic verification-done-focal verification-done-jammy verification-done-kinetic; removed: verification-needed verification-needed-bionic verification-needed-focal verification-needed-jammy verification-needed-kinetic
Changed in cloud-init (Ubuntu):
status: In Progress → Fix Released
I can reproduce this on Azure with advanced networking on 19.2
root@ragged-bond1:~ # python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cloudinit import net
>>> import yaml
>>> y = yaml.load(open('/etc/netplan/50-cloud-init.yaml'))
>>> net.wait_for_physdevs(y['network'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 344, in wait_for_physdevs
    present_macs = get_interfaces_by_mac().keys()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 633, in get_interfaces_by_mac
    (name, ret[mac], mac))
RuntimeError: duplicate mac found! both 'enP1s1' and 'eth0' have mac '00:0d:3a:6c:d9:80'
Looking at the SR-IOV device, its sysfs attributes include a 'master' link pointing to eth0, so I think we can reasonably ignore devices that have a 'master' attribute, which is related to device bonding.
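The proposed `has_master_attr` check only tests whether the 'master' entry exists. Because that entry is a symlink, its target also names the master device. The sketch below is a hypothetical helper (`master_of` is not a cloud-init function) and runs against a throwaway fake `/sys/class/net` tree so it works anywhere; on a real VM you would pass `/sys/class/net` instead.

```python
import os
import tempfile
from typing import Optional


def master_of(sys_class_net: str, devname: str) -> Optional[str]:
    """Return the name of devname's master, or None if it has no 'master' link."""
    path = os.path.join(sys_class_net, devname, "master")
    if not os.path.islink(path):
        return None
    # The link target ends in the master's directory name, e.g. ../eth0
    return os.path.basename(os.readlink(path))


# Build a fake /sys/class/net: enP1s1 (the VF) is enslaved to eth0.
root = tempfile.mkdtemp()
for dev in ("eth0", "enP1s1"):
    os.makedirs(os.path.join(root, dev))
os.symlink(os.path.join("..", "eth0"), os.path.join(root, "enP1s1", "master"))

print(master_of(root, "enP1s1"))  # eth0
print(master_of(root, "eth0"))    # None
```

Using `os.path.islink` rather than `os.path.exists` also treats a dangling 'master' link as present, since `exists` follows the link and returns False when its target is missing.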
root@ragged-bond1:/usr/lib/python3/dist-packages# diff -u cloudinit/net/__init__.py.orig cloudinit/net/__init__.py
--- cloudinit/net/__init__.py.orig	2019-09-16 21:15:42.550376776 +0000
+++ cloudinit/net/__init__.py	2019-09-16 21:18:26.178760942 +0000
@@ -109,6 +109,10 @@
     return os.path.exists(sys_dev_path(devname, "bonding"))
 
 
+def has_master_attr(devname):
+    return os.path.exists(sys_dev_path(devname, path='master'))
+
+
 def is_renamed(devname):
     """
     /* interface name assignment types (sysfs name_assign_type attribute) */
@@ -661,6 +665,9 @@
             continue
         if is_bond(name):
             continue
+        if has_master_attr(name):
+            LOG.debug('Skipping device %s with "master" sysfs attribute', name)
+            continue
         mac = get_interface_mac(name)
         # some devices may not have a mac (tun0)
         if not mac: