issue with double mac addresses on azure/ib0 devices

Bug #1874544 reported by Richard Harding
This bug affects 1 person
Affects      Status         Importance   Assigned to   Milestone
cloud-init   Fix Released   High         Unassigned

Bug Description

Salesforce: https://canonical.my.salesforce.com/5003z000022ssHs

Reproducer:
Launch an HC44rs instance with a standard Bionic image, install the rdma-core package, and reboot.

Error:
The following error message will appear on the console and ib0 and eth1 will both be unconfigured:
https://pastebin.canonical.com/p/VfkNGc58qd/

Dan Watkins (oddbloke) wrote :

The traceback (from the Canonical-internal pastebin) is:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 653, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 701, in apply_network_config
    net.wait_for_physdevs(netcfg)
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 519, in wait_for_physdevs
    present_macs = get_interfaces_by_mac().keys()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 803, in get_interfaces_by_mac
    return get_interfaces_by_mac_on_linux()
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 842, in get_interfaces_by_mac_on_linux
    (name, ret[mac], mac))
RuntimeError: duplicate mac found! both 'eth1' and 'ib0' have mac '00:15:5d:33:ff:13'

Ryan Harper (raharper) wrote :

we've seen this before and my open questions are:

1) Is there an Infiniband standard defining which fields of the IB physical address (which is much bigger than a MAC) correspond to the ethernet over IB MAC? We have one implementation in tree, but it's not clear whether that's the correct value to be using.

2) Is the IB physical address created by the platform? Do they intentionally embed the same MAC of another NIC in the IB interface?

I'd really like to understand why this happens in a VM into which these devices are passed via SR-IOV.

That said, we've discussed in the past having cloud-init merely *warn* when there's a duplicate MAC, and then just see what happens.

description: updated
information type: Private → Public
Joseph Salisbury (jsalisbury) wrote :

I was able to get an answer to question 1 in comment #3:

In the Linux implementation, the link address for IPoIB has 20 bytes:
1st 4 bytes: the QP number and flags for IPoIB device
next 8 bytes: the subnet prefix for the Infiniband device
next 8 bytes: the device port ID for the Infiniband device

The last 8 bytes should be stable for an IPoIB device. But I'm not aware of any Infiniband standard that defines this format.
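
As an illustration only, the layout described above can be sketched in Python. The function name `parse_ipoib_hwaddr` and the dictionary keys are made up for this example and are not part of cloud-init; the sample value is the ib0 address from the `ip link show` output posted further down in this report.

```python
def parse_ipoib_hwaddr(hwaddr: str) -> dict:
    """Split a 20-byte IPoIB link address into the three fields listed above.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    octets = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(octets) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(octets))
    return {
        "qpn_and_flags": octets[0:4].hex(),   # first 4 bytes
        "subnet_prefix": octets[4:12].hex(),  # next 8 bytes
        "port_id": octets[12:20].hex(),       # last 8 bytes
    }


fields = parse_ipoib_hwaddr(
    "20:00:09:28:fe:80:00:00:00:00:00:00:00:15:5d:ff:fd:33:ff:14"
)
print(fields["port_id"])  # 00155dfffd33ff14
```

For this sample address the last 8 bytes begin with `00:15:5d` and end with `33:ff:14`, which are the octets of the eth1 MAC on the same VM.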

For question 2:
As far as I know we don’t create IB MAC addresses on the platform side and I don’t think they follow the MAC of another NIC/port. I am waiting for additional responses for this question. I'll post them here when I get them.

Joseph Salisbury (jsalisbury) wrote :

When Hyper-V surfaces the VF via SR-IOV, it uses the same MAC address as the synthetic vNIC. In the Ethernet case, it combines the VF and the synthetic vNIC so you only see one interface in the VM; this also allows it to fall back to the synthetic path if the VF stops working. In the IB case, Hyper-V doesn’t support combining the VF and the synthetic vNIC, so there are two interfaces in the VM with the same MAC address. This has been the case from day 1 for IB SR-IOV.

So, “Is the IB physical address created by the platform?” – yes, when Hyper-V creates the VF, it copies the MAC address from the synthetic vNIC and assigns it to the VF.

“Do they intentionally embed the same MAC of another nic in the IB interface?” – yes, this is by hyper-v design.

Ryan Harper (raharper) wrote :

> When Hyper-V surfaces the VF via SR-IOV, it uses the same MAC address
> as the synthetic vNIC. In the Ethernet case, it combines the VF and
> the synthetic vNIC so you only see one interface in the VM, this also
> allows them to fallback to synthetic path if the VF stops working. In
> the IB case, hyper-v doesn’t support combining the VF and the
> synthetic vNIC thus there are two interfaces in the VM with the same
> MAC address. This has been the case from day 1 for IB SR-IOV.

Thanks for the background. It seems like duplicating the MAC was
designed for ethernet around the kernel "auto bonding" to the SRIOV
device. Since HyperV lacks support for "auto bonding" the IB device,
this design is suboptimal; OSes generally don't expect duplicate MACs
outside of bonds. You're likely to run into other software which is
going to complain/not work due to this. Something to think on long
term.

> So, “Is the IB physical address created by the platform?” – yes,
> when Hyper-V creates the VF, it copies the MAC address from the
> synthetic vNIC and assigns it to the VF.

Do you know if the VF for the IB device MAC is user-space modifiable?

Also, do you know if the IMDS output includes network configuration
for the VF/IB device?

> “Do they intentionally embed the same MAC of another nic in the IB
> interface?” – yes, this is by hyper-v design.

I'm wondering if cloud-init should look at a different portion of the
ib device 'address'. We have existing code that reads this location of
the ib address, so by convention at least this is where we look;
however, the platform (HyperV) has duplicated the MAC here for some
reason, which other platforms do not do.

I don't think cloud-init is looking at the wrong location; rather
a HyperV/Azure platform implementation detail is leaking into the VM.
I think we'd take a path where we ignore these duplicate MACs for
IB devices.

Lastly, do we have instructions on where/how to launch such
an instance on Azure to reproduce?

Joseph Salisbury (jsalisbury) wrote :

The VF’s IB GID is derived from the MAC address. I don’t know the implication of modifying the MAC address even if it’s modifiable.

Can you elaborate on what the IMDS output is?

Mellanox is looking into creating a driver for us so that we can expose the VF using DDA instead of using the vSwitch. This will eliminate the duplicate MAC issue for sure, but there is no estimate on when this would be released.

There is info on how to reproduce this issue in the Salesforce case.

Ryan Harper (raharper) wrote :

> The VF’s IB GID is derived from the MAC address. I don’t know the
> implication of modifying the MAC address even if it’s modifiable.

I'm not sure either, but was hoping that we could detect this for IB
and change it.

> Can you elaborate on what the IMDS output is?

Azure's Instance MetaData Service which includes network
configuration information (among other things)

https://docs.microsoft.com/en-us/azure/virtual-machines/linux/instance-metadata-service

https://docs.microsoft.com/en-us/azure/virtual-machines/linux/instance-metadata-service#network-metadata

{
  "interface": [
    {
      "ipv4": {
        "ipAddress": [
          {
            "privateIpAddress": "10.1.0.4",
            "publicIpAddress": "X.X.X.X"
          }
        ],
        "subnet": [
          {
            "address": "10.1.0.0",
            "prefix": "24"
          }
        ]
      },
      "ipv6": {
        "ipAddress": []
      },
      "macAddress": "000D3AF806EC"
    }
  ]
}

> Mellanox is looking into creating a driver for us so that we can expose the
> VF using DDA instead of using the vSwitch. This will eliminate the duplicate
> MAC issue for sure, but there is no estimate on when this would be released.

OK, ideally another MAC would be allocated to the ib interface instead of
duplicating the hvnet ethernet MAC.

> There is info on how to reproduce this issue in the Salesforce case.

OK

tags: added: id-5e3c4c9ce38502288bd97d44
Joseph Salisbury (jsalisbury) wrote :

Are there suggestions on the next steps for this bug?

Anh Vo (MSFT) (vtqanh) wrote :

@Ryan Harper: IMDS does not return information about the infiniband interface. It does return information on the Vnic that the infiniband interface is associated with (eth1 in this case).

>OSes generally don't expect duplicate MACs
>outside of bonds. You're likely to run into other software which is
>going to complain/not work due to this. Something to think on long
>term.
I agree it's not an ideal SRIOV implementation by Hyper-V, but in practice it doesn't really cause any issues. The "MAC address" of the IB interface isn't used for anything because IB isn't routable. There is also not a lot of software that deals with RDMA and IB, and I suspect most of what does will not care about the MAC address of IB.

I am curious as to how cloud-init got the 6-byte IB MAC address, because in my VM I am seeing a 20-byte IPoIB MAC address from the ip link show command.

This is what it looked like in my test deployment:
root@anhtest-10769:/home/azlinux# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0d:3a:c6:6e:29 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:15:5d:33:ff:14 brd ff:ff:ff:ff:ff:ff
5: ib0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 20:00:09:28:fe:80:00:00:00:00:00:00:00:15:5d:ff:fd:33:ff:14 brd 00:ff:ff:ff:ff:12:40:1b:80:0b:00:00:00:00:00:00:ff:ff:ff:ff

This is the network portion of the IMDS output:
  "network": {
    "interface": [
      {
        "ipv4": {
          "ipAddress": [
            {
              "privateIpAddress": "10.0.0.5",
              "publicIpAddress": "52.175.238.35"
            }
          ],
          "subnet": [
            {
              "address": "10.0.0.0",
              "prefix": "24"
            }
          ]
        },
        "ipv6": {
          "ipAddress": []
        },
        "macAddress": "000D3AC66E29"
      },
      {
        "ipv4": {
          "ipAddress": [],
          "subnet": []
        },
        "ipv6": {
          "ipAddress": []
        },
        "macAddress": "00155D33FF14"
      }
    ]
  }
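
IMDS reports each `macAddress` as twelve hex digits without separators, while `ip link show` prints lowercase colon-separated octets, so the two need normalizing before they can be compared. A minimal sketch, assuming the `network` object has the shape shown above; `imds_macs` is a hypothetical helper name, not a cloud-init or Azure API:

```python
def imds_macs(network: dict) -> list:
    """Return the MACs listed in an IMDS 'network' object in ip-link style.

    Hypothetical helper; assumes every interface entry carries a 12-digit
    'macAddress' string, as in the output above.
    """
    macs = []
    for iface in network.get("interface", []):
        raw = iface["macAddress"].lower()
        # Re-insert the colons between each pair of hex digits.
        macs.append(":".join(raw[i:i + 2] for i in range(0, 12, 2)))
    return macs


# Trimmed copy of the IMDS output above, keeping only the MAC fields.
network = {
    "interface": [
        {"macAddress": "000D3AC66E29"},
        {"macAddress": "00155D33FF14"},
    ]
}
print(imds_macs(network))  # ['00:0d:3a:c6:6e:29', '00:15:5d:33:ff:14']
```

Both values match link/ether addresses in the `ip link show` output (eth0 and eth1); nothing in the IMDS data identifies ib0 separately.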

1) How is cloud-init getting the Mac address for ib0? Should it use the full 20 byte address when comparing?
2) It should be safe to ignore the Mac address, if cloud-init can determine that the interface is backed by an infiniband device

Ryan Harper (raharper) wrote :

>1) How is cloud-init getting the Mac address for ib0?
> Should it use the full 20 byte address when comparing?

cloudinit/net/__init__.py:get_ib_interface_hwaddr()

    """Returns the string value of an Infiniband interface's hardware
    address. If ethernet_format is True, an Ethernet MAC-style 6 byte
    representation of the address will be returned.
    """

In earlier comments I asked whether there is a systematic way of extracting
the portion of an IB HWAddr which corresponds to the Ethernet over IB
layer, and I have not found an answer to that yet.

The origin of the Ethernet vs IB style mac comes from here:

https://github.com/canonical/cloud-init/commit/e7b0e5f72e134779cfe22cd07b09a42c22d2bfe1

   OpenStack ironic references Infiniband interfaces via a 6 byte 'MAC
   address' formed from bytes 13-15 and 18-20 of interface's hardware
   address. This address is used as the ethernet_mac_address of Infiniband
   links in network_data.json in configdrives generated by OpenStack nova.
   We can use this address to map links in network_data.json to their
   corresponding interface names.

   When generating interface configuration files, we need to use the
   interface's full hardware address as the HWADDR, rather than the 6 byte
   MAC address provided by network_data.json.

   This change allows IB interfaces to be referenced in this dual mode - by
   MAC address and hardware address, depending on the context.

It appears that, networking-wise, use of the full IB mac is required; however,
configuration metadata tends to be unaware of IB vs. Ethernet over IB, where
a subsection of the HWAddr is used as an Ethernet MAC.
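
To make the "bytes 13-15 and 18-20" rule from the commit message concrete, here is a small sketch. It is not the body of get_ib_interface_hwaddr(); the function name `ib_hwaddr_to_ethernet_mac` is invented for this example, and the input is the ib0 address from the `ip link show` output earlier in this report.

```python
def ib_hwaddr_to_ethernet_mac(hwaddr: str) -> str:
    """Build a 6-byte MAC-style string from a 20-byte IB hardware address.

    Illustrative only: takes bytes 13-15 and 18-20 (1-based), as described
    in the commit message quoted above.
    """
    octets = hwaddr.lower().split(":")
    if len(octets) != 20:
        raise ValueError("expected 20 octets, got %d" % len(octets))
    # 1-based bytes 13-15 are indexes 12..14; bytes 18-20 are indexes 17..19.
    return ":".join(octets[12:15] + octets[17:20])


mac = ib_hwaddr_to_ethernet_mac(
    "20:00:09:28:fe:80:00:00:00:00:00:00:00:15:5d:ff:fd:33:ff:14"
)
print(mac)  # 00:15:5d:33:ff:14
```

The result is identical to the link/ether address of eth1 on the same VM, which is exactly the collision that triggers the duplicate MAC error.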

>2) It should be safe to ignore the Mac address, if cloud-init can determine that the interface is backed by an infiniband device

On Azure, yes. Elsewhere I don't think we can say that.

Moving forward, it may be best to have cloud-init ignore duplicate macs;
possibly scoping that to specific platforms. In Azure where it's a design
point it may be better to ignore duplicate macs rather than suggest that
there's something wrong with the networking devices.
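
One way the "ignore duplicate macs for IB devices" idea could look is sketched below. This is a hypothetical illustration, not the change that landed in cloud-init: the function name, the `(name, mac, is_infiniband)` tuple input, and the policy of letting the non-IB device keep the MAC are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def interfaces_by_mac(
    interfaces: Iterable[Tuple[str, str, bool]],
) -> Dict[str, str]:
    """Map MAC -> interface name, tolerating duplicates only on IB devices.

    Each item is (name, mac, is_infiniband). Hypothetical sketch.
    """
    ret: Dict[str, str] = {}
    # Visit non-IB devices first so an Ethernet NIC always claims the MAC.
    for name, mac, is_ib in sorted(interfaces, key=lambda item: item[2]):
        mac = mac.lower()
        if mac in ret:
            if is_ib:
                # Assumed policy: an IB device sharing a MAC is skipped
                # rather than treated as a fatal error.
                continue
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, ret[mac], mac)
            )
        ret[mac] = name
    return ret


print(interfaces_by_mac([
    ("ib0", "00:15:5d:33:ff:13", True),
    ("eth1", "00:15:5d:33:ff:13", False),
]))  # {'00:15:5d:33:ff:13': 'eth1'}
```

Two non-IB interfaces sharing a MAC still raise the same RuntimeError as in the traceback, so the relaxed behaviour stays scoped to Infiniband devices.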

Chad Smith (chad.smith) wrote :

Part of this issue (applying network definitions to eth0 (non-IB) only) is in flight by Johnsonshi
in this PR https://github.com/canonical/cloud-init/pull/549, which whitelists the non-IB interface driver in the netplan config emitted based on reading Azure IMDS.

We will still need to sort out the device renaming logic in cloudinit/net/__init__.py to make sure we don't rename IB interfaces too.

Chad Smith (chad.smith) wrote :

Part 1 of this issue (networking on non-IB interfaces) landed today in an upstream commit
https://github.com/canonical/cloud-init/commit/43164902dc97cc0c51ca1b200fa09c9303a4beee

Dan Watkins (oddbloke)
Changed in cloud-init:
assignee: Dan Watkins (oddbloke) → nobody
James Falcon (falcojr) wrote :

Because the commit has landed, we believe this issue is fixed. If there is still more to address, please set the status back to New and explain the issue.

Changed in cloud-init:
status: Triaged → Fix Released