RuntimeError: duplicate mac found! both 'ens4' and 'bond0' have mac '9c:XX:XX:46:5d:91'

Bug #1812857 reported by Stanislav Makar on 2019-01-22
This bug affects 3 people
Affects: cloud-init
Importance: Medium
Assigned to: Unassigned

Bug Description

2019-01-22 13:58:22,667 - util.py[DEBUG]: failed stage init
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 658, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 648, in apply_network_config
    netcfg, src = self._find_networking_config()
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 635, in _find_networking_config
    if self.datasource and hasattr(self.datasource, 'network_config'):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceConfigDrive.py", line 155, in network_config
    self.network_json, known_macs=self.known_macs)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/openstack.py", line 655, in convert_net_json
    known_macs = net.get_interfaces_by_mac()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 595, in get_interfaces_by_mac
    (name, ret[mac], mac))
RuntimeError: duplicate mac found! both 'ens4' and 'bond0' have mac '9c:XX:XX:46:5d:91'

Net config:
2019-01-22 13:56:20,055 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'mtu': 1500, 'type': 'bond', 'subnets': [{'type': 'dhcp4'}], 'params': {'mac_address': '9c:XX:XX:46:5d:91', 'bond_up-delay': 250000, 'bond_mimon': 100, 'bond_xmit_hash_policy': 'layer2+3', 'bond_mode': '802.3ad'}, 'name': 'bond0', 'bond_interfaces': ['ens4', 'ens4d1']}, {'type': 'physical', 'mtu': 1500, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'f4:XX:XX:44:6f:f0', 'name': 'eno1'}, {'type': 'physical', 'subnets': [], 'mac_address': '9c:XX:XX:46:5d:91', 'name': 'ens4'}, {'type': 'physical', 'subnets': [], 'mac_address': '9c:XX:XX:46:5d:92', 'name': 'ens4d1'}, {'type': 'nameserver', 'address': '8.8.8.8'}]}

OS: Ubuntu 18.04.1 LTS
cloud-init: 18.4-0ubuntu1~18.04.1
cloud-provider: OpenStack
Datasource: ConfigDrive

Stanislav Makar (smakar) on 2019-01-22
summary: RuntimeError: duplicate mac found! both 'ens4' and 'bond0' have mac
- '9c:dc:71:46:5d:91'
+ '9c:XX:XX:46:5d:91'
Stanislav Makar (smakar) wrote :

Usually a bond inherits its MAC address from one of its physical interfaces.

This PR fixes the problem: https://github.com/cloud-init/cloud-init/pull/19

Ryan Harper (raharper) wrote :

I think I understand what happened here. If you could please run and attach output of:

journalctl -b 0 -o short-monotonic \
    -u cloud-init-local.service \
    -u systemd-networkd \
    -u systemd-networkd-wait-online \
    -u cloud-init.service

When first booted, the bonds do not exist. cloud-init brings up one physical
interface, runs DHCP on it to crawl metadata, and finds a network_data.json
which includes bonding configuration. cloud-init-local.service converts
and renders this as netplan YAML (written to /etc/netplan/50-cloud-init.yaml)
and then exits. Next, systemd-networkd runs and applies the network config,
which includes a bond0 that enslaves ens4 (and uses the NIC's MAC). Now that
networking is online, cloud-init.service runs and calls
apply_network_config once more (which attempts to ensure NICs are named as per
config), and this is where cloud-init detects the duplicate MAC.
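
The check that trips on the second pass can be pictured roughly as follows. This is a minimal sketch, not cloud-init's actual code: `index_by_mac` and its `(name, mac)` input are hypothetical stand-ins for what `net.get_interfaces_by_mac()` does when it walks the system's interfaces.

```python
def index_by_mac(interfaces):
    """Map MAC -> interface name, refusing duplicates.

    `interfaces` is an iterable of (name, mac) pairs. This input shape is
    an assumption made for illustration only.
    """
    by_mac = {}
    for name, mac in interfaces:
        if mac in by_mac:
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, by_mac[mac], mac)
            )
        by_mac[mac] = name
    return by_mac


# First pass: bond0 has not taken a member's MAC yet, so the index builds.
first_boot = [("ens4", "9c:XX:XX:46:5d:91"), ("ens4d1", "9c:XX:XX:46:5d:92")]
index = index_by_mac(first_boot)

# Second pass: systemd-networkd has applied the config and bond0 now
# carries the MAC of ens4, so the same check raises.
after_networkd = [("bond0", "9c:XX:XX:46:5d:91")] + first_boot
try:
    index_by_mac(after_networkd)
    error = None
except RuntimeError as exc:
    error = str(exc)
```

The order in which interfaces are visited decides which name appears first in the message; the ordering used here is only one possibility.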

I thought we had a bug open that was meant to address applying networking twice
on OpenStack Datasource which would also prevent this issue. I'll see if I can
find that.

All said, I do agree that we should ignore 'mac' duplicates on bonds.
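
One way to picture "ignore duplicates on bonds" is to leave bond masters out of the MAC index entirely. The sketch below is an illustration under stated assumptions, not the change that was eventually committed: `index_by_mac_skipping_bonds` and `sysfs_is_bond` are hypothetical names, and the sysfs test assumes that a Linux bond master exposes a `bonding` directory under `/sys/class/net/<name>/`.

```python
import os


def sysfs_is_bond(name, sys_class_net="/sys/class/net"):
    """Return True if `name` looks like a bond master (assumed sysfs layout)."""
    return os.path.isdir(os.path.join(sys_class_net, name, "bonding"))


def index_by_mac_skipping_bonds(interfaces, is_bond=sysfs_is_bond):
    """Map MAC -> name for (name, mac) pairs, leaving bond masters out."""
    by_mac = {}
    for name, mac in interfaces:
        if is_bond(name):
            # A bond normally borrows a member's MAC, so it is not indexed.
            continue
        if mac in by_mac:
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, by_mac[mac], mac)
            )
        by_mac[mac] = name
    return by_mac


interfaces = [
    ("bond0", "9c:XX:XX:46:5d:91"),
    ("ens4", "9c:XX:XX:46:5d:91"),
    ("ens4d1", "9c:XX:XX:46:5d:92"),
]
# A stub predicate keeps the example runnable on machines without bond0.
result = index_by_mac_skipping_bonds(interfaces, is_bond=lambda n: n == "bond0")
```

This only covers the bond master. Members of an active bond can also report the bond's MAC (see the enp4s0f0 and enp4s0f1 rows in the later ci-info output), and this sketch would still raise in that case.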

Changed in cloud-init:
importance: Undecided → Medium
status: New → Confirmed
Stanislav Makar (smakar) wrote :

journalctl -b 0 -o short-monotonic -u cloud-init-local.service -u systemd-networkd -u systemd-networkd-wait-online -u cloud-init.service

[ 69.737278] ubuntu systemd[1]: Starting Initial cloud-init job (pre-networking)...
[ 70.927238] ironic-ubuntu1804-uefi-2502 cloud-init[952]: Cloud-init v. 18.4-0ubuntu1~18.04.1 running 'init-local' at Mon, 25 Feb 2019 09:20:29 +0000. Up 70.50 seconds.
[ 70.972962] ironic-ubuntu1804-uefi-2502 systemd[1]: Started Initial cloud-init job (pre-networking).
[ 72.721258] ironic-ubuntu1804-uefi-2502 systemd[1]: Starting Network Service...
[ 72.748539] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: bond0: netdev ready
[ 72.750697] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: Enumeration completed
[ 72.751210] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno3: Link is not managed by us
[ 72.751301] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: ens4: Link is not managed by us
[ 72.751351] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: ens4d1: Link is not managed by us
[ 72.751391] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno1: Link is not managed by us
[ 72.751445] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: lo: Link is not managed by us
[ 72.751489] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno4: Link is not managed by us
[ 72.751529] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno2: Link is not managed by us
[ 72.751573] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: bond0: IPv6 successfully enabled
[ 72.772300] ironic-ubuntu1804-uefi-2502 systemd[1]: Started Network Service.
[ 72.843477] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno3: Link is not managed by us
[ 72.843560] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: lo: Link is not managed by us
[ 72.843597] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno4: Link is not managed by us
[ 72.843626] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno2: Link is not managed by us
[ 72.843670] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: eno1: IPv6 successfully enabled
[ 72.901979] ironic-ubuntu1804-uefi-2502 systemd[1]: Starting Wait for Network to be Configured...
[ 72.971969] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 72.972353] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 72.972447] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 72.972485] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 72.972517] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 73.000549] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 73.024058] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: ens4d1: Gained carrier
[ 73.024144] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: ens4d1: Configured
[ 73.024429] ironic-ubuntu1804-uefi-2502 systemd-networkd[1634]: ens4d1: Configured
[ 73.024520] ironic-ubuntu1804-uefi-2502 systemd-networkd-wait-online[1698]: ignoring: lo
[ 73.024801] ironic-ubuntu1804-uefi-2502 systemd-ne...

Stanislav Makar (smakar) wrote :

@raharper according to logs bond0 exists at the beginning

Ryan Harper (raharper) wrote :

Thanks for the logs. Yes, bond0 exists as a device as soon as the bond module is loaded; the critical bit is that the kernel has enslaved one of the interfaces and the bond then takes the underlying device's MAC address. We can see that in the output:

[ 79.185842] ironic-ubuntu1804-uefi-2502 cloud-init[1738]: ci-info: | bond0 | True | XX.XX.XX.105 | 255.255.255.192 | global | e0:07:1b:71:67:b1 |
[ 79.185842] ironic-ubuntu1804-uefi-2502 cloud-init[1738]: ci-info: | ens4 | True | . | . | . | e0:07:1b:71:67:b1 |
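
The same overlap can be checked on a running system by reading each interface's `address` file from sysfs. The following is a rough sketch: `shared_macs` is a hypothetical helper, and the `sys_class_net` parameter exists only so the function can be pointed at a test directory.

```python
import os
from collections import defaultdict


def shared_macs(sys_class_net="/sys/class/net"):
    """Return {mac: [names]} for every MAC reported by more than one device."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(sys_class_net)):
        addr_path = os.path.join(sys_class_net, name, "address")
        try:
            with open(addr_path) as fh:
                mac = fh.read().strip().lower()
        except OSError:
            # Entries that are not devices (for example the bonding_masters
            # file) have no readable address file and are skipped.
            continue
        # Skip empty and all-zero addresses such as the loopback device's.
        if mac and mac != "00:00:00:00:00:00":
            groups[mac].append(name)
    return {mac: names for mac, names in groups.items() if len(names) > 1}
```

Calling `shared_macs()` with no arguments on a host in the state shown above would be expected to group bond0 with ens4.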

Stanislav Makar (smakar) wrote :

That is correct and expected behaviour.
It's exactly what I want: bond0 uses the MAC address (9c:XX:XX:46:5d:91) of one of its physical interfaces.

Look at my config:
{'version': 1, 'config': [{'mtu': 1500, 'type': 'bond', 'subnets': [{'type': 'dhcp4'}], 'params': {'mac_address': '9c:XX:XX:46:5d:91', 'bond_up-delay': 250000, 'bond_mimon': 100, 'bond_xmit_hash_policy': 'layer2+3', 'bond_mode': '802.3ad'}, 'name': 'bond0', 'bond_interfaces': ['ens4', 'ens4d1']},

 {'type': 'physical', 'subnets': [], 'mac_address': '9c:XX:XX:46:5d:91', 'name': 'ens4'},
 {'type': 'physical', 'subnets': [], 'mac_address': '9c:XX:XX:46:5d:92', 'name': 'ens4d1'},]}
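
The intentional overlap is visible from the config alone. As a rough sketch, the hypothetical helper `bonds_sharing_member_mac` below walks a version 1 network config and reports each bond whose `params.mac_address` equals the MAC of one of its own members. The dict is a trimmed copy of the config above.

```python
def bonds_sharing_member_mac(netcfg):
    """Return {bond_name: [member names whose MAC equals the bond's MAC]}."""
    entries = netcfg.get("config", [])
    # MAC of every physical interface named in the config.
    mac_of = {
        e["name"]: e.get("mac_address")
        for e in entries
        if e.get("type") == "physical"
    }
    shared = {}
    for entry in entries:
        if entry.get("type") != "bond":
            continue
        bond_mac = entry.get("params", {}).get("mac_address")
        members = [
            m
            for m in entry.get("bond_interfaces", [])
            if bond_mac and mac_of.get(m) == bond_mac
        ]
        if members:
            shared[entry["name"]] = members
    return shared


netcfg = {
    "version": 1,
    "config": [
        {
            "type": "bond",
            "name": "bond0",
            "params": {"mac_address": "9c:XX:XX:46:5d:91", "bond_mode": "802.3ad"},
            "bond_interfaces": ["ens4", "ens4d1"],
        },
        {"type": "physical", "name": "ens4", "mac_address": "9c:XX:XX:46:5d:91"},
        {"type": "physical", "name": "ens4d1", "mac_address": "9c:XX:XX:46:5d:92"},
        {"type": "nameserver", "address": "8.8.8.8"},
    ],
}
shared = bonds_sharing_member_mac(netcfg)
```

Here `shared` maps bond0 to ens4, which is the pair named in the RuntimeError.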

Laurent Sauvé (lanord) wrote :

Also affected by this bug.

OS: Ubuntu 16.04.6 LTS
cloud-init: 19.1-1-gbaa47854-0ubuntu1~16.04.1
cloud-provider: OpenStack
Datasource: ConfigDrive

journalctl -b 0 -o short-monotonic \
> -u cloud-init-local.service \
> -u systemd-networkd \
> -u systemd-networkd-wait-online \
> -u cloud-init.service
-- Logs begin at Wed 2019-06-05 13:14:19 UTC, end at Wed 2019-06-05 13:26:12 UTC. --
[ 34.378444] bm-ubuntu-16 systemd[1]: Starting Initial cloud-init job (pre-networking)...
[ 35.003685] bm-ubuntu-16 cloud-init[542]: Cloud-init v. 19.1-1-gbaa47854-0ubuntu1~16.04.1 running 'init-local' at Wed, 05 Jun 2019 13:14:21 +0000. Up 34.75 seconds.
[ 35.053762] bm-ubuntu-16 systemd[1]: Started Initial cloud-init job (pre-networking).
[ 35.940521] bm-ubuntu-16 systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
[ 36.255549] bm-ubuntu-16 cloud-init[1656]: Cloud-init v. 19.1-1-gbaa47854-0ubuntu1~16.04.1 running 'init' at Wed, 05 Jun 2019 13:14:23 +0000. Up 36.16 seconds.
[ 36.269838] bm-ubuntu-16 cloud-init[1656]: ci-info: ++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++
[ 36.285065] bm-ubuntu-16 cloud-init[1656]: ci-info: +------------+-------+-----------------+-----------------+--------+-------------------+
[ 36.309208] bm-ubuntu-16 cloud-init[1656]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 36.326229] bm-ubuntu-16 cloud-init[1656]: ci-info: +------------+-------+-----------------+-----------------+--------+-------------------+
[ 36.343773] bm-ubuntu-16 cloud-init[1656]: ci-info: | bond0 | False | . | . | . | 0c:xx:xx:6e:ac:18 |
[ 36.361935] bm-ubuntu-16 cloud-init[1656]: ci-info: | bond0.2199 | False | aaa.bbb.ccc.ddd | 255.255.255.0 | global | fa:xx:xx:1a:d4:73 |
[ 36.381006] bm-ubuntu-16 cloud-init[1656]: ci-info: | bond0.3186 | False | aaa.bbb.ccc.ddd | 255.255.255.0 | global | fa:xx:xx:74:b9:aa |
[ 36.400667] bm-ubuntu-16 cloud-init[1656]: ci-info: | enp4s0f0 | False | . | . | . | 0c:xx:xx:6e:ac:18 |
[ 36.420729] bm-ubuntu-16 cloud-init[1656]: ci-info: | enp4s0f1 | False | . | . | . | 0c:xx:xx:6e:ac:18 |
[ 36.441037] bm-ubuntu-16 cloud-init[1656]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
[ 36.461837] bm-ubuntu-16 cloud-init[1656]: ci-info: | lo | True | ::1/128 | . | host | . |
[ 36.482712] bm-ubuntu-16 cloud-init[1656]: ci-info: +------------+-------+-----------------+-----------------+--------+-------------------+
[ 36.504400] bm-ubuntu-16 cloud-init[1656]: ci-info: ++++++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++++++
[ 36.515685] bm-ubuntu-16 cloud-init[1656]: ci-info: +-------+-----------------+-----------------+-----------------+------------+-------+
[ 36.527134] bm-ubuntu-16 cloud-init[1656]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
[ 36.549497] bm-ubuntu-16 cloud-init[1656]: ci...

This bug is fixed with commit e5f54213 to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=e5f54213

Changed in cloud-init:
status: Confirmed → Fix Committed

This bug is believed to be fixed in cloud-init in version 19.2. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Danno B (slikk66) wrote :

Hi, we're still being affected by this on Azure with 19.2-24-ge7881d5c-0ubuntu1~18.04.1 - using PACKER to build from image: BuildSource : Marketplace/Canonical/UbuntuServer/18.04-DAILY-LTS

Here is the packer config:
````
    "provisioners": [
        {
            "type": "shell",
            "inline": [
                "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done"
            ]
        },
        {
            "type": "ansible",
            "playbook_file": "{{user `ansible_playbook`}}",
            "user": "packer",
            "extra_arguments": [ "--extra-vars", "codeVersion={{user `code_version`}} managed_image_name={{user `managed_image_name`}}" ]
        },
        {
            "type": "shell",
            "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
            "inline_shebang": "/bin/sh -x",
            "inline": [ "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync" ]
        }
    ]
````

Here is the playbook:
````
---
- hosts: all
  remote_user: ubuntu
  become: yes
  become_method: sudo
  become_user: root

  environment:
    DEBIAN_FRONTEND: noninteractive
````

Note: we are applying `enableAcceleratedNetworking: true` to the NIC. Anecdotally, we think this is related.

Usually our playbook has more in it (obviously) but Azure kept pointing fingers at us that our image was causing the problem, so I ran this test simply deploying a blank deprovisioned image via our same process.

And here's what happens on the serial console log:

````
[ 20.337603] sh[910]: + [ -e /var/lib/cloud/instance/obj.pkl ]
[ 20.343177] sh[910]: + echo cleaning persistent cloud-init object
[ 20.349027] [ OK ] Started Network Time Synchronization.
[ OK ] Reached target System Time Synchronized.
sh[910]: cleaning persistent cloud-init object
[ 20.361066] sh[910]: + rm /var/lib/cloud/instance/obj.pkl
[ 20.412333] sh[910]: + exit 0
[ 34.282291] cloud-init[938]: Cloud-init v. 19.2-24-ge7881d5c-0ubuntu1~18.04.1 running 'init-local' at Mon, 16 Sep 2019 18:02:23 +0000. Up 32.02 seconds.
[ 34.288809] cloud-init[938]: 2019-09-16 18:02:25,262 - util.py[WARNING]: failed stage init-local
[ 34.423057] cloud-init[938]: failed run of stage init-local
[ 34.437716] cloud-init[938]: ------------------------------------------------------------
[ 34.441088] cloud-init[938]: Traceback (most recent call last):
[ 34.443719] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 653, in status_wrapper
[ 34.448072] cloud-init[938]: ret = functor(name, args)
[ 34.450532] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
[ 34.454849] cloud-init[938]: init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
[ 34.458725] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 697, in apply_network_config
[ 34.463421] cloud-init[938]: net.wait_for_physdevs(netcfg)
[ 34.466051] cloud-init[938]: File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 344, in wait_for_physdevs
[ 34.470673] cloud-init[93...

````

Danno B (slikk66) wrote :

I am not able to change the state as requested; I see the message "Only changeable by a project maintainer or bug supervisor".

I have a Severity A ticket open with Azure now as well, they pointed me here and asked me to update to >= 19.2 as a fix.

Ryan Harper (raharper) wrote :

Hi Danno,

Thanks for updating this bug. While you're seeing a similar issue, Azure Accelerated Networking is slightly different (it's really a bond, but not exposed to userspace as a Linux bond type). This bug was about regular Linux bond devices, which needed to be excluded.

Let's move your issue to a new bug where cloud-init will need to sort out how to ignore the duplicate MACs between the SR-IOV NIC and eth0 (the Hyper-V NIC), which get autobonded in the kernel but don't advertise that fact via normal bonding methods.

I've filed https://bugs.launchpad.net/cloud-init/+bug/1844191 to track your issue.
