Generated bonding configuration is incorrect.

Bug #1588547 reported by Jorge Niedbalski
This bug affects 4 people
Affects          Status        Importance  Assigned to    Milestone
MAAS             Opinion       Low         Mike Pontillo
curtin           Fix Released  Undecided   Unassigned
curtin (Ubuntu)  Fix Released  Undecided   Unassigned
Trusty           Fix Released  Undecided   Unassigned
Xenial           Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * Users attempting to configure nic bonding and using both ipv4 and ipv6
   encounter a misconfigured network due to a bug in curtin's handling of
   bond configuration and ipv6. The error results in superfluous
   attributes and then broken ipv4 entries if preceded by an ipv6
   address.

   This affects all curtin releases which support networking
   configuration.

 * This SRU fixes bonding and ipv6 configurations by no longer emitting
   attributes for aliased interfaces and calculating the correct
   inet type for a given interface.
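
   The "correct inet type" part of the fix can be illustrated with a
   minimal sketch. This is not curtin's actual code; the function name
   inet_family is hypothetical, and the sketch only assumes that the
   address family should be derived from each subnet's own address
   rather than carried over from a preceding entry.

```python
import ipaddress


def inet_family(address):
    """Return the ifupdown address family keyword for one address.

    Hypothetical helper for illustration; not part of curtin.
    Accepts an address with or without a prefix length.
    """
    version = ipaddress.ip_interface(address).version
    return "inet6" if version == 6 else "inet"


# Each subnet is classified on its own, regardless of ordering.
for addr in ("fde9:8f83:4a81:1:0:1:0:6/64", "192.168.0.1/24"):
    print(addr, "->", inet_family(addr))
```

   Deriving the family per address means an ipv4 subnet listed after an
   ipv6 subnet is still rendered as "inet".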

[Test Case]

 * On a Xenial 16.04 system
    - apt-get install curtin
    - cat >test.yaml <<EOF
# YAML example of a simple network config
network:
    version: 1
    config:
        # Physical interfaces.
        - type: physical
          name: eth0
          mac_address: "c0:d6:9f:2c:e8:80"
          subnets:
              - type: static
                address: fde9:8f83:4a81:1:0:1:0:6/64
              - type: static
                address: 192.168.0.1/24
EOF
    - curtin apply_net -c test.yaml --target target

PASS: both greps return zero
FAIL: either grep returns non-zero

    - grep "^iface eth0 inet6 static" target/etc/network/interfaces
    - grep "^iface eth0:1 inet static" target/etc/network/interfaces
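
The two grep checks can also be expressed as a small script, which may be
handy when verifying several targets. This is only a sketch: has_stanza is
a hypothetical helper, and SAMPLE is a hand-written illustration of what a
passing file is expected to contain, not captured curtin output.

```python
import re


def has_stanza(interfaces_text, iface, family, method="static"):
    """Mirror `grep "^iface <iface> <family> <method>"` on a text blob."""
    pattern = rf"^iface {re.escape(iface)} {family} {method}"
    return re.search(pattern, interfaces_text, re.MULTILINE) is not None


# Hand-written example of a passing result (assumption, not real output).
SAMPLE = """\
auto eth0
iface eth0 inet6 static
    address fde9:8f83:4a81:1:0:1:0:6/64

auto eth0:1
iface eth0:1 inet static
    address 192.168.0.1/24
"""

passed = has_stanza(SAMPLE, "eth0", "inet6") and has_stanza(SAMPLE, "eth0:1", "inet")
print("PASS" if passed else "FAIL")
```

To check a real run, read target/etc/network/interfaces and pass its
contents in place of SAMPLE.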

[Regression Potential]

 * Low; users of this configuration would be broken already

[Original Description]

[Environment]

MAAS 2.0

[Description]

Given the following configuration: http://img.ctrlv.in/img/16/06/02/5750ac35c78f8.png

The resulting /etc/network/interfaces looks as follows:

http://paste.ubuntu.com/16931445/

Some bad aspects of the resulting configuration are:

1) An ipv4 address is configured on the alias, while an ipv6 address was expected:

auto bond0:1
iface bond0:1 inet6 static
    address 172.27.72.7/26

2) It seems that there is no need to configure the bond-* options on the alias interfaces as these
options are inherited.

3) Would it be possible to describe why and how the hwaddresses are selected for the aliases?


tags: added: sts-needs-review
description: updated
Changed in maas:
milestone: none → 2.0.0
assignee: nobody → Mike Pontillo (mpontillo)
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Mike Pontillo (mpontillo) wrote :

(1) From the screenshot you posted, it looks like the alias indeed has the IPv6 address, so the configuration that was rendered matches what is shown in the node interface configuration. (So I'm not sure what the bug is here?)

(2) I agree that the bond parameters on the alias interfaces are incorrect. Is this causing a problem, or is it just an annoyance? I would think that ifupdown would just ignore the extra parameters. This is generated by curtin, so I'll add curtin to this bug.

(3) When you create the bond, the MAC address is inherited from the first interface in the bond. Aliases in Linux always have the same MAC address as the parent interface. You can click the "MAC" column header (next to "Name") to double-check this.

Changed in maas:
importance: Critical → Low
status: Triaged → Invalid
Revision history for this message
Mike Pontillo (mpontillo) wrote :

At this time, I don't see a MAAS bug here; please let me know if I've missed something.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

This was just explained to me on IRC, so let me clarify the bug in (1).

It looks like we pass an IPv4 address to curtin for the alias, but "inet6" is rendered in the configuration and not "inet". I missed that before.

So, it looks like (1) and (2) are curtin bugs, and I'm not sure what the ask is for (3).

Changed in maas:
milestone: 2.0.0 → none
Revision history for this message
Mike Pontillo (mpontillo) wrote :

To confirm whether or not this is a curtin issue, could you please attach the output of:

$ maas <profile> machine get-curtin-config <system-id>

... for the system-id of the node that is generating an incorrect configuration? Thanks in advance.

Changed in maas:
status: Invalid → Incomplete
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

I'm not sure which software is doing what here (maas or curtin), but it looks like something isn't generating that interfaces file properly.

For (3), one question is why is the hwaddress being specified in the interfaces file, given that the bond will inherit the address of its first slave (unless something sets it explicitly prior to the first slave being added)?

I'm wondering if there's a potential issue here if, e.g., the slave whose MAC is explicitly stated is later removed from the bond (editing the configuration file), the system rebooted, and then the bond and that slave would be configured with the same MAC.

Along the same lines, since "aliases" don't really exist as separate entities (they're just additional addresses on the actual interface with a special label for ifconfig backwards compatibility), there is no value in supplying the hwaddress at all.

Reportedly, the maas deployment fails intermittently. The provided interfaces file produces a number of errors in syslog (most of these look to be from the bonding options in the alias sections):

Jun 2 11:36:19 ubuntu ifup[2667]: sh: echo: I/O error
Jun 2 11:36:19 ubuntu kernel: [ 12.733066] bonding: no command found in bonding_masters - use +ifname or -ifname
Jun 2 11:36:19 ubuntu ifup[2667]: RTNETLINK answers: No such device
Jun 2 11:36:19 ubuntu ifup[2667]: Cannot send link get request: No such device
Jun 2 11:36:19 ubuntu ifup[2667]: /etc/network/if-pre-up.d/ifenslave: 65: /etc/network/if-pre-up.d/ifenslave: cannot create /sys/class/net/bond0:1/bonding/xmit_hash_policy: Directory nonexistent
Jun 2 11:36:19 ubuntu ifup[2667]: /etc/network/if-pre-up.d/ifenslave: 65: /etc/network/if-pre-up.d/ifenslave: cannot create /sys/class/net/bond0:1/bonding/miimon: Directory nonexistent
Jun 2 11:36:19 ubuntu ifup[2667]: RTNETLINK answers: No such device
Jun 2 11:36:19 ubuntu ifup[2667]: Cannot send link get request: No such device
Jun 2 11:36:19 ubuntu ifup[2667]: /etc/network/if-pre-up.d/ifenslave: 65: /etc/network/if-pre-up.d/ifenslave: cannot create /sys/class/net/bond0:1/bonding/mode: Directory nonexistent
Jun 2 11:36:19 ubuntu ifup[2667]: RTNETLINK answers: No such device
Jun 2 11:36:19 ubuntu ifup[2667]: Cannot send link get request: No such device
Jun 2 11:36:19 ubuntu ifup[2667]: /etc/network/if-pre-up.d/ifenslave: 65: /etc/network/if-pre-up.d/ifenslave: cannot create /sys/class/net/bond0:1/bonding/lacp_rate: Directory nonexistent
Jun 2 11:36:19 ubuntu ifup[2667]: Waiting for a slave to join bond0:1 (will timeout after 60s)
Jun 2 11:36:19 ubuntu ifup[2667]: cat: '/sys/class/net/bond0:1/bonding/slaves': No such file or directory
Jun 2 11:36:19 ubuntu ifup[2667]: message repeated 4 times: [ cat: '/sys/class/net/bond0:1/bonding/slaves': No such file or directory]
Jun 2 11:36:20 ubuntu sh[3559]: Waiting for DAD... Done
Jun 2 11:36:20 ubuntu ifup[2667]: cat: '/sys/class/net/bond0:1/bonding/slaves': No such file or directory
Jun 2 11:36:20 ubuntu kernel: [ 13.832965] ixgbe 0000:04:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Jun 2 11:36:20 ubuntu kernel: [ 13.832999] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
Jun 2 11:36:20 ubuntu kernel: [ 14.1431...


Revision history for this message
Mike Pontillo (mpontillo) wrote :

Thanks for the information, Jay.

Most of the poor behavior here seems to be due to the fact that curtin replicates the bond parameters for alias interfaces, which is just wrong.
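
To make the intended behavior concrete, here is a rough sketch of a renderer that writes bond-* attributes only on the parent interface and emits just the address on an alias. It is not curtin's implementation; render_stanza and the example attribute values are hypothetical, and treating any name containing ":" as an alias is an assumption made for illustration.

```python
import ipaddress


def render_stanza(name, address, attrs=None):
    """Render one static ifupdown stanza (illustrative only).

    Aliases (names containing ':') get the address line and nothing
    else; extra attributes such as bond-* go on the parent only.
    """
    family = "inet6" if ipaddress.ip_interface(address).version == 6 else "inet"
    lines = [
        f"auto {name}",
        f"iface {name} {family} static",
        f"    address {address}",
    ]
    if ":" not in name:
        for key, value in sorted((attrs or {}).items()):
            lines.append(f"    {key} {value}")
    return "\n".join(lines)


# Example attribute values are placeholders, not taken from the report.
bond_attrs = {"bond-mode": "802.3ad", "bond-miimon": "100"}
print(render_stanza("bond0", "fde9:8f83:4a81:1:0:1:0:6/64", bond_attrs))
print()
print(render_stanza("bond0:1", "172.27.72.7/26", bond_attrs))
```

With this shape, ifenslave never sees bond options for bond0:1, so it has no reason to look under /sys/class/net/bond0:1/bonding.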

As for the hardware address being in the configuration: if I remember correctly, we added the hardware address here due to a bug. (If the bond comes up with an inconsistent hardware address, the MAC seen by the DHCP request can be inconsistent, etc?) If you reconfigure the bond on a deployed node to no longer be a bond, that may be undefined behavior in MAAS which we were trying to prevent.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

I proposed a couple of curtin changes which I think will fix the main complaints here:

https://code.launchpad.net/~mpontillo/curtin/fix-improper-inet6-bug-1588547/+merge/296380

https://code.launchpad.net/~mpontillo/curtin/fix-improper-bond-parameters-bug-1588547/+merge/296382

I'm marking this "Opinion" for MAAS, since it's debatable if we should let the system decide the bond MAC; I prefer the bond to always come up with a MAC that is consistent with the MAAS model of the world.

Changed in curtin:
status: New → Confirmed
Changed in maas:
status: Incomplete → Opinion
Revision history for this message
Felipe Reyes (freyes) wrote :

Mike, we tested those two patches, and with them we can successfully deploy the node that was hitting this bug. Do you have any plans to push a newer version of curtin into ppa:maas/next?

Thanks,

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Thanks for testing the patches. My understanding was that the Curtin team would be doing an SRU for this fix (along with some other fixes), and that updating the PPA would not be necessary. (Andres normally updates the PPA, and he is out this week, so the earliest it would likely happen is next week.)

Ryan Harper (raharper)
tags: added: curtin-sru
Ryan Harper (raharper)
Changed in curtin:
status: Confirmed → Fix Committed
Ryan Harper (raharper)
description: updated
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello Jorge, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr399-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Changed in curtin (Ubuntu):
status: New → Fix Released
Changed in curtin (Ubuntu Xenial):
status: New → Incomplete
status: Incomplete → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Jorge, or anyone else affected,

Accepted curtin into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr399-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Trusty):
status: New → Fix Committed
Felipe Reyes (freyes)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Note that this has been tested and verified in the field! Thanks.

tags: added: 4010
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr399-0ubuntu1~16.04.1

---------------
curtin (0.1.0~bzr399-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/new-upstream-snapshot: fix for specifying revision.
  * SRU current curtin
    - curtin/net: fix inet value for subnets, don't add interface attributes
      to alias (LP: #1588547)
    - improve net-meta network configuration (LP: #1592149)
    - reporting: set webhook handler level to DEBUG, no filtering
      (LP: #1590846)
    - tests/vmtests: add yakkety, remove vivid
    - curtin/net: use post-up for interface alias, resolve 120 second time out
      on Trusty when using interface aliases
    - vmtest: provide info on images used
    - fix multipath configuration and add multipath tests (LP: #1551937)
    - tools/launch and tools/xkvm: whitespace cleanup and bash -x
    - tools/launch: boot by root=LABEL=cloudimg-rootfs
    - Initial vmtest power8 support and TestSimple test.

 -- Ryan Harper <email address hidden> Tue, 12 Jul 2016 11:29:30 -0500

Changed in curtin (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr399-0ubuntu1~14.04.1

---------------
curtin (0.1.0~bzr399-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * SRU current curtin
    - curtin/net: fix inet value for subnets, don't add interface attributes
      to alias (LP: #1588547)
    - improve net-meta network configuration (LP: #1592149)
    - reporting: set webhook handler level to DEBUG, no filtering
      (LP: #1590846)
    - tests/vmtests: add yakkety, remove vivid
    - curtin/net: use post-up for interface alias, resolve 120 second time out
      on Trusty when using interface aliases
    - vmtest: provide info on images used
    - fix multipath configuration and add multipath tests (LP: #1551937)
    - tools/launch and tools/xkvm: whitespace cleanup and bash -x
    - tools/launch: boot by root=LABEL=cloudimg-rootfs
    - Initial vmtest power8 support and TestSimple test.

curtin (0.1.0~bzr389-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * New upstream snapshot.
    * Detect and remove legacy /etc/network/interfaces.d/eth0.cfg from
      target (LP: #1577872)

curtin (0.1.0~bzr387-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * sru current curtin (LP: #1577872)
  * debian/new-upstream-snapshot, debian/README.source: add
    new-upstream-snapshot and mention it in README.source
  * debian/control: drop python from curtin-common Depends.
     remove unnecessary Depends on util-linux as it is essential.
     python3-curtin, python-curtin: drop unnecessary 'curl' from Depends.
     python3-curtin, python-curtin: list oauthlib and yaml Depends
  * debian/control: add bcache-tools to curtin Depends.
  * New upstream snapshot.
    - fix timestamp not being updated in reported events
    - mdadm: resolve mdadm/bcache and trusty+hwe issues
    - fix support for 4k disks
    - emit source /etc/network/interfaces.d/*.cfg in
      rendered /etc/network/interfaces
    - net: introduce 'control' field to network configuration to allow
      for declaring manual controlled interfaces
    - disable cloud-init networking as curtin is the source of network config
    - block: wipe_volume improvements
    - reporter: enhance reporting events to include levels and
      improve usefullness of messages
    - network: add bonding tests and cleanup newline rendering
    - block: fix partition path issue with nvme devices
    - fix logic error in kernel installation
    - block: add debug regarding raid modules being missing on mdadm create
    - add s390x support to curtin and vmtest
    - support build on xenial where python3 pyflakes is split out
    - fix uefi install path on nvme devices
    - numerous unit tests and vmtests improvements. Add running
      of pylint for static checking.
    - Add bond parsing & improved source, source-directory parsing
      of /etc/network/interfaces.
    - move global dns-* options under auto lo in /etc/network/interfaces
    - partitioning: limited support for odd ordering of partition numbers
    - change use of mkfs.fat to mkfs.vfat and add dependency.
    - block-meta: use removable devices if no non-removable devices are
      found [Robert Clark]
    - Improve 'curtin mkfs' and move mkfs...


Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin 17.1. If this is still a problem for you, please add a comment and set the status back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released