cloud-init attempts to rename bonds

Bug #1669860 reported by Ryan Harper on 2017-03-03
Affects               Importance  Assigned to
cloud-init            Medium      Unassigned
cloud-init (Ubuntu)   Medium      Unassigned
  Xenial              Medium      Unassigned
  Yakkety             Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
When booting with bonds provided in networking configuration, cloud-init
can fail as it attempts to rename the bond device to an interface.

[Test Case]
 * download ubuntu cloud image
 * mount image, enable proposed, update, upgrade cloud-init
 * run 'bond-rename-launch' as provided.
 * login to kvm guest as 'ubuntu:passw0rd'
 * sudo cloud-init init

Previously, the 'cloud-init init' above would fail while attempting
to rename a bond device. It now succeeds, because cloud-init recognizes
that there is nothing to rename.

[Regression Potential]
Should be small. Any regressions would almost certainly be related to
bond or vlan configurations.

=== End SRU Template ===

1. Zesty amd64
2. cloud-init 0.7.9-47-gc81ea53-0ubuntu1

3. cloud-init boots with a bond network config and does not attempt to rename bond0

4. cloud-init init (net mode) fails when it attempts to rename a bond interface

Running with the following network config (2 nics)
config:
- mac_address: bc:76:4e:06:96:b3
  name: interface0
  type: physical
- mac_address: bc:76:4e:04:88:41
  name: interface1
  type: physical
- bond_interfaces:
  - interface0
  - interface1
  name: bond0
  params:
    bond_miimon: 100
    bond_mode: 802.3ad
    bond_xmit_hash_policy: layer3+4
  type: bond
- name: bond0.108
  subnets:
  - address: 65.61.151.38
    netmask: 255.255.255.252
    routes:
    - gateway: 65.61.151.37
      netmask: 0.0.0.0
      network: 0.0.0.0
    type: static
  - address: 2001:4800:78ff:1b:be76:4eff:fe06:96b3
    netmask: 'ffff:ffff:ffff:ffff::'
    routes:
    - gateway: 2001:4800:78ff:1b::1
      netmask: '::'
      network: '::'
    type: static
  type: vlan
  vlan_id: '108'
  vlan_link: bond0
- name: bond0.208
  subnets:
  - address: 10.184.225.122
    netmask: 255.255.255.252
    routes:
    - gateway: 10.184.225.121
      netmask: 255.240.0.0
      network: 10.176.0.0
    - gateway: 10.184.225.121
      netmask: 255.240.0.0
      network: 10.208.0.0
    type: static
  type: vlan
  vlan_id: '208'
  vlan_link: bond0
- address: 72.3.128.240
  type: nameserver
- address: 72.3.128.241
  type: nameserver

During cloud-init init --local, the network configuration is rendered and brought up.
bond0 is a virtual interface that uses the MAC of one of its slaves.

In cloud-init init (net) mode, we check whether the interfaces are named properly.
When cloud-init collects the current_rename_info, it reads the MAC address of
each device listed in /sys/class/net; this includes *virtual* devices such as bonds and bridges.
It then looks up an interface name by MAC. However, the bond and one of its slave interfaces
share the same MAC, so cloud-init ends up attempting to rename bond0.
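
The collision can be illustrated with a minimal sketch. This is not cloud-init code: the device list and the build_bymac helper are hypothetical, and the MAC values are borrowed from the REPL output later in this report. It only shows how a dict keyed by MAC silently keeps the last device seen for a given address.

```python
# Hypothetical device list, in the order a directory listing might return it.
# bond0 has taken over the MAC of one of its slaves, interface0.
devices = [
    ("interface0", "52:54:00:12:34:00"),
    ("interface1", "52:54:00:12:34:02"),
    ("bond0", "52:54:00:12:34:00"),
    ("lo", "00:00:00:00:00:00"),
]


def build_bymac(devs):
    """Map MAC -> device info; a later device with the same MAC overwrites."""
    bymac = {}
    for name, mac in devs:
        bymac[mac] = {"name": name}
    return bymac


bymac = build_bymac(devices)
# interface0 has dropped out of the map; its MAC now resolves to bond0.
print(bymac["52:54:00:12:34:00"]["name"])
```

With this ordering, a config entry that asks for the device with MAC 52:54:00:12:34:00 to be named interface0 is matched against bond0, which is the rename that fails.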

The solution is to not collect the MACs of virtual interfaces for renaming purposes, since
virtual devices never get renamed; their names are defined by the config.

diff --git a/cloudinit/net/__init__.py b/cloudinit/net/__init__.py
index ea649cc..e2a50ad 100755
--- a/cloudinit/net/__init__.py
+++ b/cloudinit/net/__init__.py
@@ -14,6 +14,7 @@ from cloudinit import util

 LOG = logging.getLogger(__name__)
 SYS_CLASS_NET = "/sys/class/net/"
+SYS_DEV_VIRT_NET = "/sys/devices/virtual/net/"
 DEFAULT_PRIMARY_INTERFACE = 'eth0'

@@ -205,7 +206,11 @@ def _get_current_rename_info(check_downable=True):
     """Collect information necessary for rename_interfaces."""
     names = get_devicelist()
     bymac = {}
+    virtual = os.listdir(SYS_DEV_VIRT_NET)
     for n in names:
+        # do not attempt to rename virtual interfaces
+        if n in virtual:
+            continue
         bymac[get_interface_mac(n)] = {
             'name': n, 'up': is_up(n), 'downable': None}

Log file of a failure:
http://paste.ubuntu.com/24084999/

Related bugs:
 * bug 1682871: cloud-init attempts to rename vlans / get_interfaces_by_mac does not filter vlans


Scott Moser (smoser) on 2017-03-03
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Scott Moser (smoser) wrote :

The suggested fix doesn't work, as cloud-init may be expected to rename devices that are virtual. An example is inside an lxc container:

# ls /sys/devices/virtual/net/
eth0 lo
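
Since "virtual" is too broad a filter, one alternative is to test for the specific device types that should never be renamed. The following is only a sketch under stated assumptions, not necessarily what the committed fix does: it assumes bond masters expose a `bonding` directory under their sysfs entry and that VLAN devices report `DEVTYPE=vlan` in their `uevent` file. The function names and the `sys_root` parameter are hypothetical and exist so the logic can be exercised against a fake tree.

```python
import os


def is_bond(name, sys_root="/sys/class/net"):
    """Assumption: a bond master has a 'bonding' directory in sysfs."""
    return os.path.isdir(os.path.join(sys_root, name, "bonding"))


def is_vlan(name, sys_root="/sys/class/net"):
    """Assumption: a VLAN device lists DEVTYPE=vlan in its uevent file."""
    uevent = os.path.join(sys_root, name, "uevent")
    try:
        with open(uevent) as fh:
            return "DEVTYPE=vlan" in fh.read().splitlines()
    except OSError:
        # Missing or unreadable uevent (e.g. the bonding_masters file).
        return False


def renamable(names, sys_root="/sys/class/net"):
    """Keep only devices that are neither bonds nor VLANs."""
    return [
        n for n in names
        if not is_bond(n, sys_root) and not is_vlan(n, sys_root)
    ]
```

A veth-backed eth0 inside a container passes both checks and stays eligible for renaming, while bond0 and bond0.108 are skipped.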

Ryan Harper (raharper) wrote :

Then we need to detect duplicate macs and somehow sort out which one to ignore.
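
Detecting the duplicates themselves is straightforward; deciding which device to ignore is the harder part and is left open here. A minimal sketch, assuming a hypothetical mapping of interface name to MAC as input (find_duplicate_macs is not a cloud-init function):

```python
from collections import defaultdict


def find_duplicate_macs(mac_by_name):
    """Return {mac: [names...]} for every MAC used by more than one device."""
    names_by_mac = defaultdict(list)
    for name, mac in mac_by_name.items():
        names_by_mac[mac].append(name)
    return {
        mac: sorted(names)
        for mac, names in names_by_mac.items()
        if len(names) > 1
    }


# Hypothetical input mirroring the scenario in this bug.
current = {
    "lo": "00:00:00:00:00:00",
    "interface0": "52:54:00:12:34:00",
    "interface1": "52:54:00:12:34:02",
    "bond0": "52:54:00:12:34:00",
}
print(find_duplicate_macs(current))
```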

Scott Moser (smoser) wrote :

I expected to be able to recreate this in an lxc container like below, but that didn't show any errors at all in /var/log/cloud-init.log.

#!/bin/sh
name=$1
[ -n "$name" ] || { echo "must give name"; exit 1; }
set -ex
lxc init ubuntu-daily:zesty $name
lxc network attach lxdbr0 $name eth1
# pastebinit `which lxc-chroot`
# http://paste.ubuntu.com/24198752/
lxc-chroot "$name" sh -c 'cat > /var/lib/cloud/seed/nocloud-net/network-config' <<EOF
version: 1
config:
  - type: physical
    name: eth0
  - type: physical
    name: eth1
  - type: bond
    name: bond0
    bond_interfaces: [eth0, eth1]
    params:
      bond-mode: active-backup
EOF
lxc start "$name"

On Fri, Mar 17, 2017 at 8:59 PM, Scott Moser <email address hidden> wrote:

> I expected to be able to recreate this in a lxc container like below,
> but that didnt show any errors at all in /var/log/cloud-init.log.
>

Try using non-knames, like interface0 and interface1.


Scott Moser (smoser) wrote :

The attached script creates a container with 2 nics, and then rewrites the network config to have either
renamed devices (iface0, iface1) or a bond over iface0 and iface1.

the rename works correctly:
  $ lp1669860-lxc zrename rename

but the bond currently isn't coming up, and there are no errors in the log either. I expected to get errors recreating the original issue.

Still, this is useful as a way to launch a container with different network configs.

Ryan Harper (raharper) wrote :

Getting this to trip is somewhat tricky and racy, since it depends on how quickly the bond comes up. However, if the bond is up before cloud-init.service runs its net.apply_network_config_names(), then we see the following:

>>> r = net._get_current_rename_info(check_downable=True)
>>> r
{False: {'up': False, 'name': 'bonding_masters', 'downable': True}, '00:00:00:00:00:00': {'up': True, 'name': 'lo', 'downable': False}, '52:54:00:12:34:02': {'up': True, 'name': 'interface1', 'downable': True}, '52:54:00:12:34:00': {'up': False, 'name': 'bond0', 'downable': True}}
>>> r.keys()
dict_keys([False, '00:00:00:00:00:00', '52:54:00:12:34:02', '52:54:00:12:34:00'])
>>> r['52:54:00:12:34:00']
{'up': False, 'name': 'bond0', 'downable': True}
>>> r['52:54:00:12:34:02']
{'up': True, 'name': 'interface1', 'downable': True}

Here we can see that walking /sys/class/net/* and mapping each MAC address to an interface picks up bond0 for interface0's MAC.

Then if we attempt to apply the names, we see the error:

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> ncfg = yaml.load(open('/root/network-config'))
>>> netcfg = ncfg.get('network')
>>> netcfg
>>> ncfg.keys()
dict_keys(['version', 'config'])
>>> from cloudinit import net
>>> net.apply_network_config_names(ncfg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 201, in apply_network_config_names
    return _rename_interfaces(renames)
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 336, in _rename_interfaces
    raise Exception('\n'.join(errors))
Exception: [unknown] Error performing rename('bond0', 'interface0') for 52:54:00:12:34:00, interface0: Unexpected error while running command.
Command: ['ip', 'link', 'set', 'bond0', 'name', 'interface0']
Exit code: 2
Reason: -
Stdout: -
Stderr: RTNETLINK answers: File exists

Ryan Harper (raharper) wrote :

Download a zesty cloud image and then create a qcow2 overlay:

qemu-img create -b zesty-server-cloudimg-amd64.img -f qcow2 bond-rename.qcow2

Then invoke the script like:

./bond-rename-launch.sh <lp userid>

Ryan Harper (raharper) wrote :

Fixed up network config for bond testing.

Scott Moser (smoser) on 2017-03-31
Changed in cloud-init:
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-89-gbf7723e8-0ubuntu1

---------------
cloud-init (0.7.9-89-gbf7723e8-0ubuntu1) zesty; urgency=medium

  * New upstream snapshot.
    - Fix bug that resulted in an attempt to rename bonds or vlans.
      (LP: #1669860)
    - tests: update OpenNebula and Digital Ocean to not rely on host
      interfaces.

 -- Scott Moser <email address hidden> Fri, 31 Mar 2017 17:02:28 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser) on 2017-04-03
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Scott Moser (smoser) on 2017-04-03
description: updated

Hello Ryan, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-90-g61eb03fe-0ubuntu1~16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
tags: added: verification-needed
Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-90-g61eb03fe-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
Scott Moser (smoser) wrote :

$ burl="http://cloud-images.ubuntu.com/daily/server"
$ for r in yakkety xenial; do
   fname=$r-server-cloudimg-amd64.img
   ofname="$fname"
   [ "$r" = "xenial" ] && ofname=$r-server-cloudimg-amd64-disk1.img
   pfname="${fname%.img}-proposed.img"
   if [ ! -f "$fname" ]; then
      proxy wget "$burl/$r/current/$ofname" -O "$fname.tmp" &&
          mv "$fname.tmp" "$fname" || break
   fi
   if [ ! -f "$pfname" ]; then
       qemu-img create -f qcow2 -b "$fname" "$pfname.tmp" || break
       sudo mount-image-callback --system-resolvconf "$pfname.tmp" -- \
           mchroot sh -ec '
               r=$(lsb_release -sc)
               m="http://archive.ubuntu.com/ubuntu"
               plist="/etc/apt/sources.list.d/proposed.list"
               echo "deb $m $r-proposed main" > "$plist"
               apt-get update -q
               DEBIAN_FRONTEND=noninteractive apt-get -qy install cloud-init
           ' </dev/null || break
       mv $pfname.tmp $pfname
   fi
done

$ for img in *-proposed.img; do
  echo $img
  sudo mount-image-callback "$img" -- mchroot dpkg-query --show cloud-init;
  done
xenial-server-cloudimg-amd64-proposed.img
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1
yakkety-server-cloudimg-amd64-proposed.img
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.10.1

## xenial
$ MODE=bond ./bond-rename-launch.sh xenial-server-cloudimg-amd64-proposed.img
... login as ubuntu:passw0rd ....
% dpkg-query --show cloud-init
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1

% python3 -c 'from cloudinit.net import get_interfaces_by_mac; print(get_interfaces_by_mac())'
{'00:00:00:00:00:00': 'lo', '52:54:00:12:34:02': 'interface1', '52:54:00:12:34:00': 'interface0'}

% sudo cloud-init init
...
no stack trace

## yakkety
$ MODE=bond ./bond-rename-launch.sh yakkety-server-cloudimg-amd64-proposed.img
... login as ubuntu:passw0rd ....
% dpkg-query --show cloud-init
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.10.1
% cat /etc/cloud/build.info
build_name: server
serial: 20170413

% python3 -c 'from cloudinit.net import get_interfaces_by_mac; print(get_interfaces_by_mac())'
{'52:54:00:12:34:02': 'interface1', '00:00:00:00:00:00': 'lo', '52:54:00:12:34:00': 'interface0'}

% sudo cloud-init init
...
no stack trace

Scott Moser (smoser) wrote :
tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed
Scott Moser (smoser) on 2017-04-14
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-90-g61eb03fe-0ubuntu1~16.10.1

---------------
cloud-init (0.7.9-90-g61eb03fe-0ubuntu1~16.10.1) yakkety; urgency=medium

  * debian/cloud-init.templates: add Bigstep to list of sources. (LP: #1676460)
  * New upstream snapshot.
    - OpenStack: add 'dvs' to the list of physical link types. (LP: #1674946)
    - Fix bug that resulted in an attempt to rename bonds or vlans.
      (LP: #1669860)
    - tests: update OpenNebula and Digital Ocean to not rely on host
      interfaces.
    - net: in netplan renderer delete known image-builtin content.
      (LP: #1675576)
    - doc: correct grammar in capabilities.rst [David Tagatac]
    - ds-identify: fix detecting of maas datasource. (LP: #1677710)
    - netplan: remove debugging prints, add debug logging [Ryan Harper]
    - ds-identify: do not write None twice to datasource_list.
    - support resizing partition and rootfs on system booted without
      initramfs. [Steve Langasek] (LP: #1677376)
    - apt_configure: run only when needed. (LP: #1675185)
    - OpenStack: identify OpenStack by product 'OpenStack Compute'.
      (LP: #1675349)
    - GCE: Search GCE in ds-identify, consider serial number in check.
      (LP: #1674861)
    - Add support for setting hashed passwords [Tore S. Lonoy] (LP: #1570325)
    - Fix filesystem creation when using "partition: auto"
      [Jonathan Ballet] (LP: #1634678)
    - ConfigDrive: support reading config drive data from /config-drive.
      (LP: #1673411)
    - ds-identify: fix detection of Bigstep datasource. (LP: #1674766)
    - test: add running of pylint [Joshua Powers]
    - ds-identify: fix bug where filename expansion was left on.
    - advertise network config v2 support (NETWORK_CONFIG_V2) in features.
    - Bigstep: fix bug when executing in python3. [root]
    - Fix unit test when running in a system deployed with cloud-init.
    - Bounce network interface for Azure when using the built-in path.
      [Brent Baude] (LP: #1674685)
    - cloudinit.net: add network config v2 parsing and rendering [Ryan Harper]
    - net: Fix incorrect call to isfile [Joshua Powers] (LP: #1674317)
    - net: add renderers for automatically selecting the renderer.
    - doc: fix config drive doc with regard to unpartitioned disks.
      (LP: #1673818)
    - test: Adding integratiron test for password as list [Joshua Powers]
    - render_network_state: switch arguments around, do not require target
    - support 'loopback' as a device type.
    - Integration Testing: improve testcase subclassing [Wesley Wiedenmeier]
    - gitignore: adding doc/rtd_html [Joshua Powers]
    - doc: add instructions for running integration tests via tox.
      [Joshua Powers]
    - test: avoid differences in 'date' output due to daylight savings.
    - Fix chef config module in omnibus install. [Jeremy Melvin] (LP: #1583837)
    - Add feature flags to cloudinit.version. [Wesley Wiedenmeier]
    - tox: add a citest environment
    - Support chpasswd/list being a list in addition to a string.
      [Sergio Lystopad] (LP: #1665694)
    - doc: Fix configuration example for cc_set_passwords module.
      [Sergio Lystopad] (LP: #1665773)
    - ...


Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1

---------------
cloud-init (0.7.9-90-g61eb03fe-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/cloud-init.templates: add Bigstep to list of sources. (LP: #1676460)
  * New upstream snapshot.
    - OpenStack: add 'dvs' to the list of physical link types. (LP: #1674946)
    - Fix bug that resulted in an attempt to rename bonds or vlans.
      (LP: #1669860)
    - tests: update OpenNebula and Digital Ocean to not rely on host
      interfaces.
    - net: in netplan renderer delete known image-builtin content.
      (LP: #1675576)
    - doc: correct grammar in capabilities.rst [David Tagatac]
    - ds-identify: fix detecting of maas datasource. (LP: #1677710)
    - netplan: remove debugging prints, add debug logging [Ryan Harper]
    - ds-identify: do not write None twice to datasource_list.
    - support resizing partition and rootfs on system booted without
      initramfs. [Steve Langasek] (LP: #1677376)
    - apt_configure: run only when needed. (LP: #1675185)
    - OpenStack: identify OpenStack by product 'OpenStack Compute'.
      (LP: #1675349)
    - GCE: Search GCE in ds-identify, consider serial number in check.
      (LP: #1674861)
    - Add support for setting hashed passwords [Tore S. Lonoy] (LP: #1570325)
    - Fix filesystem creation when using "partition: auto"
      [Jonathan Ballet] (LP: #1634678)
    - ConfigDrive: support reading config drive data from /config-drive.
      (LP: #1673411)
    - ds-identify: fix detection of Bigstep datasource. (LP: #1674766)
    - test: add running of pylint [Joshua Powers]
    - ds-identify: fix bug where filename expansion was left on.
    - advertise network config v2 support (NETWORK_CONFIG_V2) in features.
    - Bigstep: fix bug when executing in python3. [root]
    - Fix unit test when running in a system deployed with cloud-init.
    - Bounce network interface for Azure when using the built-in path.
      [Brent Baude] (LP: #1674685)
    - cloudinit.net: add network config v2 parsing and rendering [Ryan Harper]
    - net: Fix incorrect call to isfile [Joshua Powers] (LP: #1674317)
    - net: add renderers for automatically selecting the renderer.
    - doc: fix config drive doc with regard to unpartitioned disks.
      (LP: #1673818)
    - test: Adding integratiron test for password as list [Joshua Powers]
    - render_network_state: switch arguments around, do not require target
    - support 'loopback' as a device type.
    - Integration Testing: improve testcase subclassing [Wesley Wiedenmeier]
    - gitignore: adding doc/rtd_html [Joshua Powers]
    - doc: add instructions for running integration tests via tox.
      [Joshua Powers]
    - test: avoid differences in 'date' output due to daylight savings.
    - Fix chef config module in omnibus install. [Jeremy Melvin] (LP: #1583837)
    - Add feature flags to cloudinit.version. [Wesley Wiedenmeier]
    - tox: add a citest environment
    - Support chpasswd/list being a list in addition to a string.
      [Sergio Lystopad] (LP: #1665694)
    - doc: Fix configuration example for cc_set_passwords module.
      [Sergio Lystopad] (LP: #1665773...


Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released

This bug is believed to be fixed in cloud-init 17.1. If this is still a problem for you, please add a comment and set the status back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released