Bond with OVS bridging RuntimeError: duplicate mac found!

Bug #1912844 reported by David Ames
This bug affects 3 people
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
Fix Released
Undecided
Dan Watkins

Bug Description

When using bonds and OVS bridging cloud-init fails with

2021-01-22 18:44:08,094 - util.py[WARNING]: failed stage init
failed run of stage init
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 653, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 699, in apply_network_config
    self.distro.networking.wait_for_physdevs(netcfg)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/networking.py", line 147, in wait_for_physdevs
    present_macs = self.get_interfaces_by_mac().keys()
  File "/usr/lib/python3/dist-packages/cloudinit/distros/networking.py", line 75, in get_interfaces_by_mac
    return net.get_interfaces_by_mac(
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 769, in get_interfaces_by_mac
    return get_interfaces_by_mac_on_linux(
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 839, in get_interfaces_by_mac_on_linux
    raise RuntimeError(
RuntimeError: duplicate mac found! both 'br-ex.100' and 'br-ex' have mac 'e2:86:e6:60:4c:44'

snap-id: shY22YTZ3RhJJDOj0MfmShTNZTEb1Jiq
tracking: 2.9/candidate
refresh-date: 3 days ago, at 20:03 UTC
channels:
  2.9/stable: 2.9.1-9153-g.66318f531 2021-01-19 (11322) 150MB -
  2.9/candidate: ↑
  2.9/beta: ↑
  2.9/edge: 2.9.1-9156-g.fe186aec0 2021-01-21 (11371) 150MB -
  latest/stable: –
  latest/candidate: –
  latest/beta: –
  latest/edge: 2.10.0~alpha1-9367-g.e3a85359d 2021-01-22 (11396) 151MB -
  2.8/stable: 2.8.2-8577-g.a3e674063 2020-09-01 (8980) 140MB -
  2.8/candidate: 2.8.3~rc1-8583-g.9ddc8051f 2020-11-19 (10539) 137MB -
  2.8/beta: 2.8.3~rc1-8583-g.9ddc8051f 2020-11-19 (10539) 137MB -
  2.8/edge: 2.8.3~rc1-8587-g.0ebf4fb25 2021-01-07 (11161) 139MB -
  2.7/stable: 2.7.3-8290-g.ebe2b9884 2020-08-21 (8724) 144MB -
  2.7/candidate: ↑
  2.7/beta: ↑
  2.7/edge: 2.7.3-8294-g.85233d83e 2020-11-03 (10385) 143MB -
installed: 2.9.1-9153-g.66318f531 (11322) 150MB -

Revision history for this message
David Ames (thedac) wrote :

Screen capture of the network config in MAAS

Revision history for this message
Dan Watkins (oddbloke) wrote :

This is the network config, pulled out of the log file:

bonds:
  bond0:
    interfaces:
    - enp1s0
    - enp7s0
    macaddress: 52:54:00:28:fd:fd
    mtu: 1500
    parameters:
      down-delay: 0
      gratuitious-arp: 1
      mii-monitor-interval: 0
      mode: active-backup
      transmit-hash-policy: layer2
      up-delay: 0
bridges:
  br-ex:
    addresses:
    - 192.168.151.18/24
    gateway4: 192.168.151.1
    interfaces:
    - bond0
    macaddress: 52:54:00:28:fd:fd
    mtu: 1500
    nameservers:
      addresses:
      - 192.168.151.5
      search:
      - maas
    openvswitch: {}
    parameters:
      forward-delay: 15
      stp: false
ethernets:
  enp1s0:
    match:
      macaddress: 52:54:00:28:fd:fd
    mtu: 1500
    set-name: enp1s0
  enp7s0:
    match:
      macaddress: 52:54:00:7c:4e:85
    mtu: 1500
    set-name: enp7s0
version: 2
vlans:
  br-ex.100:
    dhcp4: true
    id: 100
    link: br-ex
    mtu: 1500

Changed in cloud-init (Ubuntu):
assignee: nobody → Dan Watkins (oddbloke)
Revision history for this message
David Ames (thedac) wrote :

Cloud init version:

$ dpkg -l |grep cloud-ini
ii cloud-init 20.4.1-0ubuntu1~20.04.1 all initialization and customization tool for cloud instances
ii cloud-initramfs-copymods 0.45ubuntu1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.45ubuntu1 all write a network interface file in /run for BOOTIF

Revision history for this message
Dan Watkins (oddbloke) wrote :

The network config below closely reproduces the issue when a LXD VM is launched with it, has openvswitch-switch installed on it (e.g. via a manual DHCP on enp5s0), and is `cloud-init clean --logs --reboot`ed.

The log does not contain the error message, but calling `cloudinit.net.get_interfaces_by_mac()` from a Python console does trigger it.

If the vlans definition is removed, the instance comes up with networking (after the same reboot process).

MAC_ADDRESS = "de:ad:be:ef:12:34"
NETWORK_CONFIG = """\
bonds:
    bond0:
        interfaces:
            - enp5s0
        macaddress: {0}
        mtu: 1500
bridges:
        ovs-br:
            interfaces:
            - bond0
            macaddress: {0}
            mtu: 1500
            openvswitch: {{}}
ethernets:
    enp5s0:
      mtu: 1500
      set-name: enp5s0
      match:
          macaddress: {0}
version: 2
vlans:
  ovs-br.100:
    dhcp4: true
    id: 100
    link: ovs-br
    mtu: 1500
""".format(MAC_ADDRESS)

Revision history for this message
Dan Watkins (oddbloke) wrote :

Also of note: the MAC address being reported as duplicated in both the reported log and in the exception I see is not present in the specified configuration. It's presumably being generated by OVS and applied to ovs-br (and therefore inherited by ovs-br.100?). I'm going to see if a more minimal vlans configuration reproduces.

Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, I have a suspicion of what's going on here. I've compared two systems: one launched with the network config above (and updated/rebooted), and one launched with that config minus the "openvswitch: {{}}" line.

When I compare /sys/class/net/ovs-br.100/addr_assign_type in the two systems, I see that on the OpenVSwitch-enabled system, it is 3 ("set using dev_set_mac_address") whereas in the non-OVS system, it is 2 ("stolen from another device"). (Descriptions of those values come from https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net.)

The function which raises the exception (get_interfaces_by_mac_on_linux[0]) calls get_interfaces[1] to get the list of interfaces it should consider. get_interfaces will exclude all interfaces with an addr_assign_type of 2 (via interface_has_own_mac[2]). It will also explicitly exclude VLANs (via is_vlan[3], which checks for DEVTYPE=vlan in /sys/class/net/<iface>/uevent): this check is also not triggered because the /uevent on the OVS system does not have DEVTYPE=vlan in it.

I'm not particularly familiar with OVS: is it somehow expected that this VLAN will not be presented via /sys/class/net as other VLANs are? Or does this suggest that there's a bug in something below cloud-init which isn't correctly configuring this VLAN (which cloud-init then cannot detect as a VLAN, and so fails)?

[0] https://github.com/canonical/cloud-init/blob/master/cloudinit/net/__init__.py#L831
[1] https://github.com/canonical/cloud-init/blob/master/cloudinit/net/__init__.py#L855
[2] https://github.com/canonical/cloud-init/blob/master/cloudinit/net/__init__.py#L523
[3] https://github.com/canonical/cloud-init/blob/master/cloudinit/net/__init__.py#L268

(As an aside: both my understanding of the problem and a test of a revert suggest that this isn't a regression caused by https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1912844.)
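
For readers less familiar with these codepaths, the two exclusion checks described above amount to roughly the following (a simplified sketch, not cloud-init's actual implementation; the `base` parameter is added here for illustration):

```python
import os

SYS_CLASS_NET = "/sys/class/net"

def _read_sys_net(iface, attr, base=SYS_CLASS_NET):
    with open(os.path.join(base, iface, attr)) as f:
        return f.read().strip()

def interface_has_own_mac(iface, base=SYS_CLASS_NET):
    # addr_assign_type 2 is "stolen from another device" per the kernel
    # sysfs ABI docs; such interfaces are excluded by get_interfaces.
    return _read_sys_net(iface, "addr_assign_type", base) != "2"

def is_vlan(iface, base=SYS_CLASS_NET):
    # Kernel-created 802.1q VLANs carry DEVTYPE=vlan in uevent; OVS
    # internal ports like br-ex.100 do not, so neither check excludes
    # them and the duplicate-MAC error is reached.
    uevent = _read_sys_net(iface, "uevent", base)
    return "DEVTYPE=vlan" in uevent.splitlines()
```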

Revision history for this message
David Ames (thedac) wrote :

I am not sure I have any definitive answers, but here are my thoughts.

Compare a VLAN device created with `ip link add`

ip link add link enp6s0 name enp6s0.100 type vlan id 100

cat /sys/class/net/enp6s0.100/uevent
DEVTYPE=vlan
INTERFACE=enp6s0.100
IFINDEX=3

To an OVS VLAN interface created with ovs-vsctl:

ovs-vsctl add-port br-ex vlan100 tag=200 -- set Interface vlan100 type=internal

cat /sys/class/net/br-ex.100/uevent
INTERFACE=br-ex.100
IFINDEX=7

I suspect this is down to the tooling. OVS is creating virtual devices, so they may not match what `ip link` would create.

Could the `is_vlan` function check for the '.' followed by an integer which is the indication of a VLAN in all cases?

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for doing most of the digging here @Oddbloke; I suspect, as with bonds and bridges for OVS, we'll need a special case to check if a vlan entry is also OVS, much like we did for bonds/bridges:

https://github.com/canonical/cloud-init/pull/608/files

So our is_vlan change will need to see if the link device is OVS and, if so, say it's a VLAN as well (since the DEVTYPE doesn't match), or something to that effect.

Revision history for this message
Dan Watkins (oddbloke) wrote : Re: [Bug 1912844] Re: Bond with OVS bridging RuntimeError: duplicate mac found!

On Fri, Jan 22, 2021 at 10:48:56PM -0000, David Ames wrote:
> I am not sure I have any definitive answers, but here are my thoughts.
>
> Compare a VLAN device created with `ip link add`
>
> ip link add link enp6s0 name enp6s0.100 type vlan id 100
>
> cat /sys/class/net/enp6s0.100/uevent
> DEVTYPE=vlan
> INTERFACE=enp6s0.100
> IFINDEX=3
>
>
> To an OVS VLAN interface created with ovs-vsctl:
>
> ovs-vsctl add-port br-ex vlan100 tag=200 -- set Interface vlan100
> type=internal
>
> cat /sys/class/net/br-ex.100/uevent
> INTERFACE=br-ex.100
> IFINDEX=7

Thanks for isolating how these devices are created!

> I suspect this is down to the tooling. OVS is creating virtual devices
> so it may not be what `ip link` would create.

I don't believe that cloud-init has changed anything in this area, so I
would still like confirmation that this is a case that cloud-init
definitely needs to work around. (Rather than it being the case that
there's an underlying bug which, being involved in networking early in
boot, we are the first to encounter.)

> Could the `is_vlan` function check for the '.' followed by an integer
> which is the indication of a VLAN in all cases?

Using device names is generally not reliable. In this specific case,
with this config passed to a LXD VM:

bridges:
        br.100:
            dhcp4: true
            interfaces:
            - enp5s0
            macaddress: 52:54:00:d9:08:1c
            mtu: 1500
ethernets:
    enp5s0:
      mtu: 1500
version: 2

it comes up with working networking and I see:

# cat /sys/class/net/br.100/uevent
DEVTYPE=bridge
INTERFACE=br.100
IFINDEX=3

(While you (and I) can certainly question the wisdom of naming a non-VLAN
like this, cloud-init's code cannot assume that there aren't users out
there doing this, for whatever reason.)
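
To make that concrete, a name-based heuristic of the kind proposed cannot tell these two devices apart (illustrative snippet only):

```python
import re

# A name-based VLAN heuristic: a ".<digits>" suffix on the device name.
looks_like_vlan = re.compile(r"\.\d+$")

# It matches the real OVS VLAN...
assert looks_like_vlan.search("br-ex.100") is not None
# ...but also the plain, non-VLAN bridge from the config above.
assert looks_like_vlan.search("br.100") is not None
```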

Revision history for this message
Dan Watkins (oddbloke) wrote :

On Fri, Jan 22, 2021 at 10:51:25PM -0000, Ryan Harper wrote:
> Thanks for doing most of the digging here @Oddbloke; I suspect as with
> bond and bridges for ovs, we'll need a special case to check if a vlan
> entry is also OVS, much like we did for bonds/bridges:
>
> https://github.com/canonical/cloud-init/pull/608/files
>
> So our is_vlan change will need to see if link device is OVS and if so
> then say it's a vlan as well (since the DEVTYPE doesn't match) or
> something to that effect.

Unless I'm missing something, we don't have a network configuration to
reference in these codepaths: get_interfaces only takes a
blacklist_drivers parameter, and is_vlan only takes a devname.

So for the code to work as-architected, I believe we need to be able to
determine that this is a VLAN from examining the system (via
/sys/class/net, most likely) to be able to exclude it in get_interfaces.
As far as I (with my limited networking knowledge) can tell, we can
neither determine that this is a VLAN, nor that this is related to the
ovs-br interface by examining /sys/class/net: while the non-OVS VLAN has
a lower_ link to the bridge interface, the OVS VLAN does not.

Looking at everything in /sys/class/net/<bridge device> (with `for f in
*; do echo $f: $(cat $f); done 2>/dev/null`), here's the diff between
the two systems:

--- not-ovs 2021-01-25 13:15:34.560602978 -0500
+++ ovs 2021-01-25 13:15:23.400407103 -0500
@@ -1,26 +1,25 @@
-addr_assign_type: 2
+addr_assign_type: 3
 addr_len: 6
-address: de:ad:be:ef:12:34
+address: 56:1d:35:09:77:47
 broadcast: ff:ff:ff:ff:ff:ff
 carrier: 1
-carrier_changes: 1
+carrier_changes: 0
 carrier_down_count: 0
-carrier_up_count: 1
+carrier_up_count: 0
 dev_id: 0x0
 dev_port: 0
 dormant: 0
 duplex:
-flags: 0x1003
+flags: 0x1103
 gro_flush_timeout: 0
 ifalias:
-ifindex: 5
-iflink: 4
+ifindex: 6
+iflink: 6
 link_mode: 0
-lower_br:
 mtu: 1500
 name_assign_type: 3
 netdev_group: 0
-operstate: up
+operstate: unknown
 phys_port_id:
 phys_port_name:
 phys_switch_id:
@@ -31,4 +30,4 @@
 subsystem:
 tx_queue_len: 1000
 type: 1
-uevent: DEVTYPE=vlan INTERFACE=br.100 IFINDEX=5
+uevent: INTERFACE=ovs-br.100 IFINDEX=6

With addr_assign_type set to 3, and no DEVTYPE=vlan, and no lower_*
link, I don't see how we can tell that this is a VLAN. (I've checked
and the difference in flags is, if I did my bitmasking correctly,
whether the interface is in promiscuous mode or not.)
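
For reference, the bitmasking mentioned above: the only differing bit between the two flags values is 0x100, which is the kernel's IFF_PROMISC (promiscuous mode) interface flag.

```python
IFF_PROMISC = 0x100  # kernel net_device flag for promiscuous mode

# The only bit that differs between the two systems' flags values:
assert 0x1103 ^ 0x1003 == IFF_PROMISC
```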

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Particulars about what role an Open vSwitch port/interface has are unfortunately not exposed through sysfs or the iproute2 tools. Open vSwitch and its datapath driver manage that based on configuration stored in Open vSwitch.

If it is sufficient to know that this is an Open vSwitch-managed port, I guess we could get that from udev, for example:

# udevadm info /sys/class/net/enp6s0.9
P: /devices/virtual/net/enp6s0.9
L: 0
E: DEVPATH=/devices/virtual/net/enp6s0.9
E: INTERFACE=enp6s0.9
E: IFINDEX=7
E: SUBSYSTEM=net
E: USEC_INITIALIZED=229384665
E: ID_MM_CANDIDATE=1
E: ID_NET_NAMING_SCHEME=v245
E: ID_NET_DRIVER=openvswitch
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/enp6s0.9
E: TAGS=:systemd:

If you really need to know the specific role the port has you would have to query Open vSwitch:

Create an Open vSwitch VLAN interface:
# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 enp6s0
# ovs-vsctl add-port br0 enp6s0.9 tag=9 -- set interface enp6s0.9 type=internal

Alternatively you could do the same with "fake bridges", which allows for a slightly different CLI experience; I think netplan does the latter. Anyway, the underlying data structures inside Open vSwitch should be similar.

# ovs-vsctl find interface name=enp6s0.9
_uuid : 72ce2c2e-d004-4b07-8c81-c7f688508040
admin_state : down
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 7
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : down
lldp : {}
mac : []
mac_in_use : "d2:27:14:2d:60:88"
mtu : 1500
mtu_request : []
name : enp6s0.9
ofport : 2
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0}
status : {driver_name=openvswitch}
type : internal

# ovs-vsctl find port name=enp6s0.9
_uuid : 2c9e20de-53de-456e-9c32-f1541c9d3982
bond_active_slave : []
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
cvlans : []
external_ids : {}
fake_bridge : false
interfaces : [72ce2c2e-d004-4b07-8c81-c7f688508040]
lacp : []
mac : []
name : enp6s0.9
other_config : {}
protected : false
qos : []
rstp_statistics : {}
rstp_status : {}
statistics : {}
status : {}
tag : 9
trunks : []
vlan_mode : []

# ovs-vsctl port-to-br enp6s0.9
br0

# ovs-vsctl find bridge name=br0
_uuid ...


Revision history for this message
Frode Nordahl (fnordahl) wrote :

fwiw; Open vSwitch does distribute Python client libraries [0], but they may be a bit complex for this simple use case. We do maintain a SimpleOVSDB interface [1] and there is an example of how it could be used to iterate over stuff [2]. Feel free to use, steal or whatever suits your needs :)

0: https://docs.openvswitch.org/en/latest/topics/language-bindings/
1: https://github.com/juju/charm-helpers/blob/master/charmhelpers/contrib/network/ovs/ovsdb.py
2: https://github.com/juju/charm-helpers/blob/ac1dbb456e9d889e04a9a734323f4b4adf6879a4/charmhelpers/contrib/network/ovs/__init__.py#L633-L668

Revision history for this message
Dan Watkins (oddbloke) wrote :

I can confirm that udev does report the VLAN as OVS-managed:

# udevadm info /sys/class/net/ovs-br.100
P: /devices/virtual/net/ovs-br.100
L: 0
E: DEVPATH=/devices/virtual/net/ovs-br.100
E: INTERFACE=ovs-br.100
E: IFINDEX=5
E: SUBSYSTEM=net
E: USEC_INITIALIZED=4703175
E: ID_MM_CANDIDATE=1
E: ID_NET_NAMING_SCHEME=v245
E: ID_NET_DRIVER=openvswitch
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ovs-br.100
E: TAGS=:systemd:

so this is a feasible approach, at least.

I have reservations about introducing a call to an external program in this codepath; I believe we'd have to call it for every interface (that wasn't excluded via some earlier check), and subprocess calls are much more expensive than reading files. Via local testing with timeit: is_vlan takes ~48.3 usec per loop, a subp of `udevadm info ...` takes ~2.48 msec per loop. This isn't too substantial in most systems, but in systems with more interfaces (or slower `udevadm`?), it could become something of an issue.
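
For anyone wanting to reproduce the comparison, the general shape of that measurement is below, with portable stand-ins (`/bin/true` in place of `udevadm info`, a temp file in place of a sysfs attribute); absolute numbers will vary by machine, but the subprocess spawn is consistently orders of magnitude slower:

```python
import subprocess
import tempfile
import timeit

# A temp file stands in for a sysfs attribute such as .../uevent.
with tempfile.NamedTemporaryFile("w", suffix=".uevent", delete=False) as f:
    f.write("DEVTYPE=vlan\n")
    attr_path = f.name

def read_attr():
    with open(attr_path) as fh:
        fh.read()

def spawn_tool():
    # /bin/true stands in for `udevadm info <dev>`.
    subprocess.run(["/bin/true"], check=True)

file_t = timeit.timeit(read_attr, number=100)
proc_t = timeit.timeit(spawn_tool, number=100)
print(f"file read: {file_t:.4f}s, subprocess: {proc_t:.4f}s")
```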

I suspect, however, that we can find a way of gating this check on the presence of OVS somehow: if openvswitch-switch is not installed, for example, then there's no reason to check `udevadm info`: the given interface will never be OVS-managed.

What's a reliable, cross-distro way of checking if a system might have OVS-managed interfaces?

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Those subprocess calls do indeed add up to be expensive quite quickly, and, come to think of it, this may actually not be a consistent way of determining whether an interface belongs to Open vSwitch in all of its configurations. Open vSwitch supports multiple datapath types, and depending on which one you use, the interface may or may not be owned by the openvswitch driver.

However, this brings me to something we might use as a consistent cross-distro way of determining whether Open vSwitch is there and has bridges configured.

When Open vSwitch registers a datapath it also creates a virtual port for it, and we could possibly look for that to determine whether Open vSwitch is installed and actually has useful configuration.

The 'system' datapath is the kernel datapath as provided by the openvswitch kernel module, the 'netdev' datapath is used for alternative datapaths such as the Open vSwitch userspace implementation, DPDK, AF_XDP etc.

Example:
# ovs-vsctl show
5ef28194-7376-40f6-9306-1a21b0624079
    Bridge br1
        datapath_type: system
        Port br1
            Interface br1
                type: internal
    Bridge br0
        datapath_type: netdev
        Port br0
            Interface br0
                type: internal
    ovs_version: "2.13.1"

# ls -l /sys/class/net/ovs-*
lrwxrwxrwx 1 root root 0 Feb 4 05:33 /sys/class/net/ovs-netdev -> ../../devices/virtual/net/ovs-netdev
lrwxrwxrwx 1 root root 0 Feb 4 06:06 /sys/class/net/ovs-system -> ../../devices/virtual/net/ovs-system

Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks Frode, that's really helpful!

I don't see the `datapath_type` in my output:

e2d9c9b4-739c-4333-a372-4d46585fcfb9
    Bridge ovs-br
        fail_mode: standalone
        Port ovs-br
            Interface ovs-br
                type: internal
        Port bond0
            Interface bond0
        Port ovs-br.100
            tag: 100
            Interface ovs-br.100
                type: internal
    ovs_version: "2.13.1"

but I _do_ see /sys/class/net/ovs-system in the system:

# ls /sys/class/net/ -lah
total 0
drwxr-xr-x 2 root root 0 Feb 9 22:07 .
drwxr-xr-x 31 root root 0 Feb 9 22:07 ..
lrwxrwxrwx 1 root root 0 Feb 9 22:07 bond0 -> ../../devices/virtual/net/bond0
-rw-r--r-- 1 root root 4.0K Feb 9 22:07 bonding_masters
lrwxrwxrwx 1 root root 0 Feb 9 22:07 enp5s0 -> ../../devices/pci0000:00/0000:00:01.4/0000:05:00.0/virtio10/net/enp5s0
lrwxrwxrwx 1 root root 0 Feb 9 22:07 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx 1 root root 0 Feb 9 22:07 ovs-br -> ../../devices/virtual/net/ovs-br
lrwxrwxrwx 1 root root 0 Feb 9 22:07 ovs-br.100 -> ../../devices/virtual/net/ovs-br.100
lrwxrwxrwx 1 root root 0 Feb 9 22:07 ovs-system -> ../../devices/virtual/net/ovs-system

If I launch a separate system, and install openvswitch-switch, /sys/class/net/ovs-system does _not_ appear, which seems to confirm that it will only be present on systems with OVS configuration.

This leads me to some questions: Will /sys/class/net/ovs-system always be present? Or might we encounter systems where only ovs-netdev (or another) is present? If so, is there a defined set we can match on, or do we need to glob match?

James Falcon (falcojr)
Changed in cloud-init (Ubuntu):
status: New → In Progress
Revision history for this message
Frode Nordahl (fnordahl) wrote :

The default `datapath_type` is 'system', so if it is not explicitly specified for a bridge it will not be visible in `ovs-vsctl show` output, but 'system' will still be the `datapath_type` used.

If you configure a system where all bridges have `datapath_type` 'netdev' /sys/class/net/ovs-system will not be there, but /sys/class/net/ovs-netdev will.

You can query Open vSwitch at runtime for which datapath types it supports with `ovs-vsctl get open_vswitch . datapath_types` which would produce '[netdev, system]' as output.

I guess we cannot foresee how this will be plumbed for all future Open vSwitch releases and possible new datapath types, but either `ovs-system` or `ovs-netdev` will always be there in current versions and use cases.

If we want to be opportunistic, I guess iterating over the datapath_types list and checking for /sys/class/net/ovs-$datapath_type could be a generic approach that may also work for the next datapath_type we currently do not know about.

Revision history for this message
Dan Watkins (oddbloke) wrote :

> The default `datapath_type` is 'system', so if it is not explicitly specified for a bridge it will not be visible in `ovs-vsctl show` output, but 'system' will still be the `datapath_type` used.

Great, I figured it wasn't material.

> You can query Open vSwitch at runtime for which datapath types it supports with `ovs-vsctl get open_vswitch . datapath_types` which would produce '[netdev, system]' as output.

Will `ovs-vsctl` always be available if OVS interfaces might be present on the system? Will it always be in (our systemd units') PATH?

(My _guess_ is that we can't be sure, about the latter at least: /opt/openvswitch/bin/ovs-vsctl doesn't seem like a wildly unreasonable path to install to, and cloud-init has no good way of finding that.)

> If we want to be opportunistic I guess iterating over the datapath_types list and check for /sys/class/net/ovs-$datapath_type could be a generic approach that may also work for the next datapath_type we currently do not know about.

I think this would be ideal, but we can avoid introducing a (runtime-detected, optional) dependency on `ovs-vsctl` if we hardcode the current options: as discussed above, this may even be necessary. Alternatively, we could opt into the more expensive lookup path if we find anything matching /sys/class/net/ovs-*: this may lead to some false positives (e.g. if network configuration specifying non-OVS interfaces named ovs-something is passed to an instance) but (a) such configurations will be extremely rare, and (b) we will only be less performant on such instances, not _incorrect_.

What do you think?

Revision history for this message
Frode Nordahl (fnordahl) wrote :

I would think it odd not to have `ovs-vsctl` available if you're running Open vSwitch, but as you point out we have no way of knowing how or where other distributions or image builders would think to put the binary, so that is a fair point.

I'm leaning towards specifically checking for '/sys/class/net/ovs-system' and '/sys/class/net/ovs-netdev' for now and then add new datapath types as we become aware of them and/or add the optional runtime detection with dependency on external binary when needed.
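
That check could be sketched as follows (hypothetical helper name; the datapath-port list is the one discussed above and would grow as new datapath types become known):

```python
import os

# Virtual ports Open vSwitch creates when it registers a datapath;
# extend this tuple if new datapath types appear.
OVS_DATAPATH_PORTS = ("ovs-system", "ovs-netdev")

def ovs_is_configured(sys_class_net="/sys/class/net"):
    """Return True if OVS is installed *and* has bridges configured."""
    return any(
        os.path.exists(os.path.join(sys_class_net, port))
        for port in OVS_DATAPATH_PORTS
    )
```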

Revision history for this message
Dan Watkins (oddbloke) wrote :

I've figured out why my LXD reproducer doesn't reproduce exactly: NoCloud runs at both local and net stages, so the code in question is called earlier in boot than the OpenStack data source is. For now, I'll proceed with the synthetic reproducer: calling the Python code which fails directly.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Hey Frode,

Now moving on from the "does this system have any OVS-managed interfaces?" to "how can I tell if a particular interface is managed by OVS?":

We discussed using `udevadm info` to determine if an interface is OVS-managed:

> If it is sufficient to know that this is a Open vSwitch managed port I guess we could get that from udev, for example:
>
> # udevadm info /sys/class/net/enp6s0.9
> ...
> E: ID_NET_DRIVER=openvswitch
> ...

But later we also discussed that OVS-managed interfaces may not be owned by the `openvswitch` kernel driver:

> Open vSwitch supports multiple datapath types, and depending on which one you use the interface may or may not be owned by the openvswitch driver.

It looks to me like these two statements are in opposition: if OVS is managing an interface via a different datapath, then it won't have ID_NET_DRIVER=openvswitch in its `udevadm info`.

If this is the case, then I think we have a couple of options.

Firstly, we could scope this down to only handle system datapath interfaces, so that it _is_ true that all the interfaces we're handling are owned by the openvswitch driver. I don't know enough about how OVS is used (either by us or more generally) to know if this is a reasonable suggestion, so your guidance would be appreciated.

If we can't do that, then I think we need to ask OVS directly, presumably via `ovs-vsctl show` as discussed above. This would require it to be in PATH, of course; avoiding that is a nice-to-have at best, so I think that's fine.

Does that sound right to you?

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Hello Dan,

> It looks to me like these two statements are in opposition: if OVS is managing an interface via a different datapath, then it won't have ID_NET_DRIVER=openvswitch in its `udevadm info`.
>
> If this is the case, then I think we have a couple of options.

That is correct, I remembered the other datapath types after suggesting the driver approach.

> Firstly, we could scope this down to only handle system datapath interfaces, so that it _is_ true that all the interfaces we're handling are owned by the openvswitch. I don't know enough about how OVS is used (either by us or more generally) to know if this is a reasonable suggestion, so your guidance would be appreciated.

For the specific use case we're testing with right now it would be safe to assume the system datapath; however, our product also supports the use of DPDK acceleration, which hinges on the netdev datapath. So I'm afraid leaving that out of scope might come back to bite us before we know it. But I guess it would be reasonable to split the work up into bite-sized chunks as long as we allow for supporting this in the design.

> If we can't do that, then I think we need to ask OVS directly, presumably via `ovs-vsctl show` as discussed above. This would require it to be in PATH, of course; avoiding that is a nice-to-have at best, so I think that's fine.

Yes, we would be able to find the interface in the runtime state through `ovs-vsctl` and friends. The `show` sub-command is mostly meant for human readable overview of configuration, so we will probably dive to the depths of the interface and port tables.

Many of the sub-commands support JSON output format, for example:
`ovs-vsctl -f json find interface name=testport`
`ovs-vsctl -f json find port name=testport`
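
A sketch of consuming that JSON output (the `{"headings": ..., "data": ...}` envelope is ovs-vsctl's JSON table format; `find_interface` is a hypothetical helper and assumes `ovs-vsctl` is in PATH, per the earlier discussion):

```python
import json
import subprocess

def parse_table(raw):
    # ovs-vsctl's JSON formatter emits {"headings": [...], "data": [[...]]};
    # turn each row into a dict keyed by column name.
    table = json.loads(raw)
    return [dict(zip(table["headings"], row)) for row in table["data"]]

def find_interface(name):
    # Requires ovs-vsctl in PATH -- the caveat discussed earlier.
    out = subprocess.check_output(
        ["ovs-vsctl", "-f", "json", "find", "interface", f"name={name}"]
    )
    return parse_table(out)
```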

Revision history for this message
Dan Watkins (oddbloke) wrote :

To ensure that we understand the consequences of these changes, I've spent a bit of time tracking down everywhere this will affect in cloud-init by looking up the various call chains of `get_interfaces`:

Called by:
* `_get_current_rename_info`
  * `_rename_interfaces`
    * `apply_network_config_names`
      * `Distro.apply_network_config_names`
        * `Init._apply_netcfg_names`
          * `Init.apply_network_config`
* `get_ib_hwaddrs_by_interface`
  * `helpers.openstack.convert_net_json`
  * {`ConfigDrive`, `OpenStack`, `IBMCloud`}`.network_config`
* `get_interfaces_by_mac_on_linux`
  * `get_interfaces_by_mac`
    * `OpenNebulaNetwork` -- used to determine physical NICs for network config generation
    * Oracle -- a couple of ways in network config generation
    * EC2 -- network config conversion (theirs to ours)
    * `helpers.openstack.convert_net_json`
    * `helpers.digitalocean.convert_network_configuration`
    * `helpers.upcloud.convert_to_network_config_v1`
    * `net.get_devicelist` (but only on FreeBSD)
    * `find_fallback_nic_on_netbsd_or_openbsd`
    * `find_fallback_nic_on_freebsd`
    * `Networking.wait_for_physdevs`
      * `Init.apply_network_config`

Most of these are related to converting a network configuration format provided by a cloud into our own formats. None of this generation includes handling for creating OVS config (unsurprisingly!), so those cases should all be unaffected by changes to `get_interfaces` around OVS (except for a very minor performance hit for any new checks). Others are only used on *BSD, so we also don't need to worry about OVS interfaces there.

Every other call originates from `Init.apply_network_config`: this is the codepath that we are intending to affect, so we can see that there shouldn't be any unexpected consequences in other parts of the codebase.

Revision history for this message
David Ames (thedac) wrote :

Running with Dan's PPA [0] on the final reboot cloud-init fails with the following. See attached image.

Stderr: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)

I'll try and get the rest of the error content (not from console) for further debugging. But it seems the ovs command is being called before ovs is up, perhaps?

[0] https://launchpad.net/~oddbloke/+archive/ubuntu/lp1912844

Revision history for this message
David Ames (thedac) wrote :

With the latest update to the PPA [0] I can deploy a full OpenStack with machines with two VLAN interfaces each and respective spaces.

For an OVN deploy this currently requires two PPAs [0] and [1].

[0] https://launchpad.net/~oddbloke/+archive/ubuntu/lp1912844
[1] https://launchpad.net/~fnordahl/+archive/ubuntu/ovs

Revision history for this message
Dan Watkins (oddbloke) wrote :

> But I guess it would be reasonable to split the work up into bite-sized chunks as long as we allow for supporting this in the design.

Having looked a little more, I don't think an incremental approach buys us much here: we'd have to replace the `udevadm` code with `ovs-vsctl` code in the next stage anyway (rather than extending it). We may as well just take the approach that will address both in the first place.

The next question is exactly which interfaces we should be excluding from the set of interfaces we consider. At the moment, my POC is excluding interfaces whose name matches an OVS bridge, determined via `ovs-vsctl list-br`. In an instance with an additional VLAN to the above configurations, I see:

# ovs-vsctl list-br
ovs-br
ovs-br.100
ovs-br.200

Does this seem appropriate? I also notice that querying for internal interfaces returns the same set:

# ovs-vsctl find interface type=internal | grep ^name
name : ovs-br
name : ovs-br.200
name : ovs-br.100
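Roughly, the name-matching exclusion the POC does could be sketched like this (a sketch, not the actual cloud-init code; helper names are made up, and the parsing assumes the plain one-name-per-line output shown above):

```python
import subprocess


def parse_bridge_list(output: str) -> set:
    # `ovs-vsctl list-br` prints one bridge name per line.
    return {line.strip() for line in output.splitlines() if line.strip()}


def list_ovs_bridges() -> set:
    """Return the set of OVS bridge names, per `ovs-vsctl list-br`.

    Returns an empty set if ovs-vsctl is unavailable; real code would
    need to distinguish "no OVS installed" from "OVS not up yet".
    """
    try:
        result = subprocess.run(
            ["ovs-vsctl", "list-br"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return set()
    return parse_bridge_list(result.stdout)


def is_ovs_bridge_name(ifname: str, bridges: set) -> bool:
    # Exclude an interface when its name matches an OVS bridge.
    return ifname in bridges
```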

I don't think we want to exclude every interface known to OVS, because I believe that would regress bug 1898997. From an instance launched from the integration test for that bug:

cb6840fc-f53d-471b-b7e7-aa7398fd4c37
    Bridge ovs-br
        fail_mode: standalone
        Port enp5s0
            Interface enp5s0
        Port ovs-br
            Interface ovs-br
                type: internal
    ovs_version: "2.13.1"

We _do_ still want to consider enp5s0 in cloud-init's code, because it's a real interface that isn't (entirely?) configured by OVS.

Thoughts? (If this isn't a sufficient problem description, let me know!)

Revision history for this message
Dan Watkins (oddbloke) wrote :

Another question: is there a canonical way to determine whether OVS is up? Currently I execute a command and look for "database connection failed" in the output; is that appropriate?
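For reference, the check amounts to something like this (a sketch, not the actual cloud-init code; the error string is the one from the traceback at the top of this bug):

```python
def ovs_db_unavailable(stderr: str) -> bool:
    """Heuristic: treat OVS as "not up yet" when ovs-vsctl reports that
    it could not reach the OVSDB socket, e.g.:

    ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection
    failed (No such file or directory)
    """
    return "database connection failed" in stderr
```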

Revision history for this message
Frode Nordahl (fnordahl) wrote :

> The next question is exactly which interfaces we should be excluding from the set of interfaces we consider. At the moment, my POC is excluding interfaces whose name matches an OVS bridge, determined via `ovs-vsctl list-br`. In an instance with an additional VLAN to the above configurations, I see:
>
> # ovs-vsctl list-br
> ovs-br
> ovs-br.100
> ovs-br.200
>
> Does this seem appropriate? I also notice that querying for internal interfaces returns the same set:

While listing the bridges would work for configurations that make use
of the 'fake-bridge' paradigm, it would not work for listing
interfaces in configurations that do not use it. So I would rely on
querying the port/interface tables as the entry point instead.

> # ovs-vsctl find interface type=internal | grep ^name
> name : ovs-br
> name : ovs-br.200
> name : ovs-br.100

The interface types I know about today are 'dpdk', 'system', and 'internal'.

Interfaces of type 'dpdk' would probably be invisible from the kernel
sysfs and netlink interfaces pov, interfaces of type 'system' have
their origin in the system and cloud-init would most likely already
know all about them. Interfaces of type 'internal' may be used for
other things than VLANs so depending on what you want to match on it
may or may not be precise enough. See below for some further
discussion.

> I don't think we want to exclude every interface known to OVS, because I believe that would regress bug 1898997. From an instance launched from the integration test for that bug:
>
> cb6840fc-f53d-471b-b7e7-aa7398fd4c37
> Bridge ovs-br
> fail_mode: standalone
> Port enp5s0
> Interface enp5s0
> Port ovs-br
> Interface ovs-br
> type: internal
> ovs_version: "2.13.1"
>
> We _do_ still want to consider enp5s0 in cloud-init's code, because it's a real interface that isn't (entirely?) configured by OVS.
>
> Thoughts? (If this isn't a sufficient problem description, let me know!)

Do I understand correctly that the goal is to find any OVS managed
interface with a VLAN tag to exclude it from the duplicate MAC check?
If so the following may be an approach to find them:

    ovs-vsctl find port 'tag>0'

From the port name you can find the associated interfaces and there is
also a shorthand to find on which bridge a port belongs:

    ovs-vsctl port-to-br <PORT NAME>
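Extracting the port names from that query could be sketched as below (a guess at gluing the pieces together; it assumes the blank-line-separated `column : value` record format that `ovs-vsctl find` prints, with string values possibly double-quoted):

```python
def parse_find_output(output: str) -> list:
    """Parse `ovs-vsctl find ...` output into a list of dicts.

    Records are separated by blank lines; each line within a record is
    `column : value`. String values may be wrapped in double quotes.
    """
    records, current = [], {}
    for line in output.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip().strip('"')
    if current:
        records.append(current)
    return records


def tagged_port_names(find_port_output: str) -> set:
    """Port names from `ovs-vsctl find port 'tag>0'` output."""
    return {
        rec["name"]
        for rec in parse_find_output(find_port_output)
        if "name" in rec
    }
```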

Revision history for this message
Frode Nordahl (fnordahl) wrote :

> Another question: is there a canonical way to determine if OVS isn't up? Currently I'm trying to execute a command and looking for "database connection failed" in the output, is that appropriate?

In the Ubuntu systemd service we check for the database socket:
https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/openvswitch/tree/debian/openvswitch-switch.ovs-vswitchd.service#n7

But I guess relying on tools most likely to be in your path would be
more portable, so calling `ovs-vsctl` and checking the result would
work ok.

One thing to keep in mind is that the OVS/OVN -ctl tools may get into
a situation where they wait indefinitely. That can for example happen
if the OVSDB server is running but the vSwitch is not.

To mitigate that you can use the -t or --timeout option when calling
`ovs-vsctl`.

To assess whether the `ovs-vswitchd` process is alive independently of
the OVSDB you could have an additional check that talks to it on its
control socket, for example:

    `ovs-appctl -t ovs-vswitchd version`
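Putting the timeout and the liveness check together, a rough sketch (made-up helper names, not production code):

```python
import subprocess


def vsctl_argv(*args, timeout: int = 5) -> list:
    # `--timeout` makes ovs-vsctl give up after N seconds instead of
    # waiting indefinitely, e.g. when OVSDB is up but the vSwitch isn't.
    return ["ovs-vsctl", f"--timeout={timeout}", *args]


def run_vsctl(*args, timeout: int = 5):
    """Run ovs-vsctl with a client-side timeout applied."""
    return subprocess.run(
        vsctl_argv(*args, timeout=timeout), capture_output=True, text=True
    )


def vswitchd_alive() -> bool:
    """Check ovs-vswitchd liveness independently of OVSDB by talking to
    it on its control socket."""
    try:
        subprocess.run(
            ["ovs-appctl", "-t", "ovs-vswitchd", "version"],
            capture_output=True, check=True, timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return True
```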

Revision history for this message
Dan Watkins (oddbloke) wrote :

> Interfaces of type 'internal' may be used for other things than VLANs so depending on what you want to match on it may or may not be precise enough.

So the cloud-init code in question is used in a couple of (relevant) ways: (a) to determine the state of any physical interfaces for which we should wait before proceeding to apply network configuration to the system (intended for the case where network devices have not yet been detected by the kernel, for a variety of reasons), and (b) to determine the current system state when renaming any such interfaces to match the specified network configuration.

The code in question is iterating through every interface in the system (as determined from /sys/class/net) and determining if it should be included in this set. For reference, the code excludes bridges, VLANs, bonds, any device that has a `/master` symlink that isn't a bridge/bond member, NET_FAILOVER devices, devices without a MAC, devices with a "stolen" MAC, and, on some clouds, interfaces owned by a particular device driver (the one case I recall is for Mellanox interfaces on Azure).

We're looking to answer the question of "is this interface one that cloud-init needs to know about", so I think what we want is to exclude any OVS interface that doesn't originate from the system: OVS will handle naming non-system interfaces correctly and we know that, even if currently absent, they will be present once "needed" (because OVS will create them).

> Interfaces of type 'dpdk' would probably be invisible from the kernel
> sysfs and netlink interfaces pov, interfaces of type 'system' have
> their origin in the system and cloud-init would most likely already
> know all about them. Interfaces of type 'internal' may be used for
> other things than VLANs so depending on what you want to match on it
> may or may not be precise enough.

This suggests to me that internal interfaces _are_ the right ones to exclude. I considered excluding _all_ non-"system" interfaces, but in the example config we've been using here, I see:

# ovs-vsctl --columns type find interface name=bond0
type : ""

We'd exclude the bond0 interface anyway (for being a bond), but it makes me wonder whether there are other interfaces that we _wouldn't_ otherwise exclude that won't have type=system. To avoid having to answer that, I think we can safely limit this to internal interfaces (as that addresses this problem, and likely a class of related ones); if we find cases this doesn't cover, we can drive the required changes at that point.
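In code, the resulting behaviour would look roughly like this (a sketch of the duplicate-MAC mapping with the new skip, not the actual `get_interfaces_by_mac_on_linux` implementation; assume `ovs_internal` comes from an `ovs-vsctl find interface type=internal` query):

```python
def interfaces_by_mac(interfaces, ovs_internal: set) -> dict:
    """Build the MAC -> name mapping, skipping OVS internal interfaces.

    `interfaces` is an iterable of (name, mac) pairs as enumerated from
    /sys/class/net. Without the skip, 'br-ex' and 'br-ex.100' sharing a
    MAC would trigger the "duplicate mac found!" RuntimeError from this
    bug.
    """
    result = {}
    for name, mac in interfaces:
        if name in ovs_internal:
            continue
        if mac in result:
            raise RuntimeError(
                "duplicate mac found! both %r and %r have mac %r"
                % (name, result[mac], mac)
            )
        result[mac] = name
    return result
```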

> In the Ubuntu systemd service <snip>

Thanks, this is a bunch of useful info! I'll experiment with a few of these options.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 21.1-19-gbad84ad4-0ubuntu1

---------------
cloud-init (21.1-19-gbad84ad4-0ubuntu1) hirsute; urgency=medium

  * d/cloud-init.postinst: Change output log permissions on upgrade
    (LP: #1918303)
  * New upstream snapshot.
    - .travis.yml: generate an SSH key before running tests (#848)
    - write passwords only to serial console, lock down cloud-init-output.log
      (#847) (LP: #1918303)
    - Fix apt default integration test (#845)
    - integration_tests: bump pycloudlib dependency (#846)
    - commit f35181fa970453ba6c7c14575b12185533391b97 [eb3095]
    - archlinux: Fix broken locale logic (#841)
      [Kristian Klausen] (LP: #1402406)
    - Integration test for #783 (#832)
    - integration_tests: mount more paths IN_PLACE (#838)
    - Fix requiring device-number on EC2 derivatives (#836) (LP: #1917875)
    - Remove the vi comment from the part-handler example (#835)
    - net: exclude OVS internal interfaces in get_interfaces (#829)
      (LP: #1912844)
    - tox.ini: pass OS_* environment variables to integration tests (#830)
    - integration_tests: add OpenStack as a platform (#804)
    - Add flexibility to IMDS api-version (#793) [Thomas Stringer]
    - Fix the TestApt tests using apt-key on Xenial and Hirsute (#823)
      [Paride Legovini] (LP: #1916629)
    - doc: remove duplicate "it" from nocloud.rst (#825) [V.I. Wood]
    - archlinux: Use hostnamectl to set the transient hostname (#797)
      [Kristian Klausen]
    - cc_keys_to_console.py: Add documentation for recently added config key
      (#824) [dermotbradley]
    - Update cc_set_hostname documentation (#818) [Toshi Aoyama]

 -- James Falcon <email address hidden> Fri, 19 Mar 2021 14:32:13 -0500

Changed in cloud-init (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

Is this going to be backported to focal (see LP: #1919493)?

Revision history for this message
Dan Watkins (oddbloke) wrote :

On Tue, Mar 23, 2021 at 07:53:59AM -0000, Alfonso Sanchez-Beato wrote:
> Is this going to be backported to focal (see LP: #1919493)?

Yep, the SRU process has been started already.

Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

@Dan, was this released to focal in the end? Thanks!
