Bond interfaces stuck at 1500 MTU on Bionic

Bug #1774666 reported by KingJ on 2018-06-01
This bug affects 3 people
Affects              Status  Importance  Assigned to  Milestone
MAAS                         Undecided   Unassigned
cloud-init                   Medium      Chad Smith
cloud-init (Ubuntu)  Status tracked in Cosmic
  Xenial                     Undecided   Unassigned
  Artful                     Undecided   Unassigned
  Bionic                     Undecided   Unassigned
  Cosmic                     Undecided   Unassigned

Bug Description

When deploying a machine through MAAS with bonded network interfaces, the bond does not have a 9000 byte MTU applied despite the attached VLANs having had a 9000 MTU explicitly set. The MTU size is set on the bond members, but not on the bond itself in Netplan. Consequently, when the bond is brought up, the interface MTU is decreased from 9000 to 1500. Manually changing the interface MTU after boot is successful.

This is not observed when deploying Xenial on the same machine. The bond comes up at the expected 9000 byte MTU.
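A quick way to confirm the symptom on a deployed machine is to compare the MTU the kernel reports for the bond and for its members. The following is a minimal sketch, assuming a Linux host that exposes `/sys/class/net/<iface>/mtu`; the helper names `read_mtu` and `mtu_mismatches` are made up for illustration and are not part of MAAS, cloud-init or netplan.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU the kernel currently reports for one interface."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def mtu_mismatches(ifaces, expected, sysfs_root="/sys/class/net"):
    """Map every interface whose MTU differs from `expected` to its actual MTU."""
    mismatches = {}
    for iface in ifaces:
        actual = read_mtu(iface, sysfs_root)
        if actual != expected:
            mismatches[iface] = actual
    return mismatches
```

On the machine described here, `mtu_mismatches(["eth1", "enp5s0f0"], 9000)` would be expected to list only the bond (eth1) on Bionic and nothing on Xenial.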

KingJ (kj-kingj) wrote :
Changed in maas:
status: New → Incomplete
KingJ (kj-kingj) wrote :

Output of dmesg on Bionic - note interface MTU being changed from 9000 to 1500.

Andres Rodriguez (andreserl) wrote :

Looking at the MAAS curtin configuration, I see the following:

  - bond_interfaces:
    - enp5s0f0
    - enp5s0f1
    - enp6s0f0
    - enp6s0f1
    id: eth1
    mac_address: 00:1b:21:4a:99:50
    mtu: 9000
    name: eth1
    params:
      bond-downdelay: 0
      bond-lacp-rate: fast
      bond-miimon: 100
      bond-mode: 802.3ad
      bond-updelay: 0
      bond-xmit-hash-policy: encap3+4
    subnets:
    - type: manual
    type: bond

The resulting netplan configuration, however, doesn't include an MTU. On the other hand, the Xenial configuration does correctly have the MTU for the bond.

As such, this seems like an issue in cloud-init to me.
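To make the expected behaviour concrete, here is a rough sketch of how a `type: bond` entry like the one above could be translated into a netplan-style bond mapping with the MTU carried over. This is not cloud-init's actual renderer; `render_bond` is a hypothetical helper, and the parameter name mapping is taken only from the curtin and netplan snippets quoted in this report.

```python
# Parameter names as they appear in the curtin config and in the netplan
# bonds section quoted in this report. Illustrative only, not exhaustive.
PARAM_NAMES = {
    "bond-downdelay": "down-delay",
    "bond-lacp-rate": "lacp-rate",
    "bond-miimon": "mii-monitor-interval",
    "bond-mode": "mode",
    "bond-updelay": "up-delay",
    "bond-xmit-hash-policy": "transmit-hash-policy",
}


def render_bond(v1_entry):
    """Translate one 'type: bond' entry into a {name: netplan bond} mapping."""
    bond = {"interfaces": list(v1_entry.get("bond_interfaces", []))}
    # The step this bug is about: keep the MTU when one is provided.
    if "mtu" in v1_entry:
        bond["mtu"] = v1_entry["mtu"]
    params = {
        PARAM_NAMES[key]: value
        for key, value in v1_entry.get("params", {}).items()
        if key in PARAM_NAMES
    }
    if params:
        bond["parameters"] = params
    return {v1_entry["name"]: bond}
```

Fed the eth1 entry above, the resulting mapping contains `mtu: 9000` next to `interfaces` and `parameters`, which is the key missing from the Bionic output.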

cyphermox wrote :

Can you manually change the mtu in the netplan yaml under eth1? If you do so, is the MTU then set correctly?

KingJ (kj-kingj) wrote :

@cyphermox I'm deploying the machine back to Bionic to try this now. Can you confirm where exactly in the netplan config I need to set this? (e.g. in the bonds section, or add a new eth1 definition to the ethernets section?)

cyphermox wrote :

In the existing bonds section, for the interface that is your bond (eth1).

KingJ (kj-kingj) wrote :

I've added "mtu: 9000" to the bonds section, which now reads as follows:

    bonds:
        eth1:
            mtu: 9000
            interfaces:
            - enp5s0f0
            - enp5s0f1
            - enp6s0f0
            - enp6s0f1
            parameters:
                down-delay: 0
                lacp-rate: fast
                mii-monitor-interval: 100
                mode: 802.3ad
                transmit-hash-policy: encap3+4
                up-delay: 0

I re-ran netplan apply, but the bond had the same MTU as before. I also tried restarting systemd-networkd as I could see that the relevant .netdev files in /run/systemd/network/ had MTUBytes set to 9000. However, the interface remained at 1500 bytes.

After a system restart, however, the bond interface is now running at an MTU of 9000. dmesg now only has messages for the member interfaces being increased from 1500 to 9000 MTU. The messages about the MTU being lowered from 9000 to 1500 while the bond was being configured are no longer present.
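For anyone wanting to spot affected machines before rebooting them, the rendered configuration can be checked for the pattern described in this report: member interfaces with an MTU but a bond without one. The sketch below assumes the netplan YAML has already been parsed into a plain dict (for example with a YAML library) and that the members appear under `ethernets`; `bonds_missing_mtu` is a hypothetical helper, not a netplan or cloud-init function.

```python
def bonds_missing_mtu(netplan):
    """Return {bond name: largest member MTU} for bonds that set no MTU themselves."""
    network = netplan.get("network", netplan)
    ethernets = network.get("ethernets", {})
    findings = {}
    for name, bond in network.get("bonds", {}).items():
        if "mtu" in bond:
            continue  # the bond already declares its own MTU
        member_mtus = [
            ethernets[member]["mtu"]
            for member in bond.get("interfaces", [])
            if "mtu" in ethernets.get(member, {})
        ]
        if member_mtus:
            findings[name] = max(member_mtus)
    return findings
```

An empty result means every bond either declares an MTU or has no members that declare one.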

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Changed in netplan.io (Ubuntu):
status: New → Confirmed
Jason Hobbs (jason-hobbs) wrote :

We are seeing this in our test runs as well.

tags: added: cdo-qa foundations-engine
Jason Hobbs (jason-hobbs) wrote :

This is causing test failures for us: containers deployed by juju that are bound to a space on top of the bond have the correct MTU (9000), but the bond's MTU is stuck at 1500, so packets are being dropped.

curtin config for the machine:
http://paste.ubuntu.com/p/8tMR2YBGYm/

cloud-init netplan yaml (50-cloud-init.yaml.bak.1528231254):
http://paste.ubuntu.com/p/Vkq77KqwBp/

juju netplan yaml:
http://paste.ubuntu.com/p/wf9F2xzCy6/

Jason Hobbs (jason-hobbs) wrote :

Subscribed to Canonical Field High SLA.

Changed in maas:
status: Incomplete → Invalid
Chad Smith (chad.smith) on 2018-06-06
Changed in cloud-init:
importance: Undecided → Medium
status: New → In Progress
Changed in cloud-init:
assignee: nobody → Chad Smith (chad.smith)
Chad Smith (chad.smith) wrote :

An upstream commit landed for this bug.

To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=c3f1ad9a

Changed in cloud-init:
status: In Progress → Fix Committed
Chad Smith (chad.smith) wrote :

hrm, didn't intend to nominate netplan.io for xenial, artful, bionic, cosmic; only cloud-init. Not sure how to revoke the netplan.io nomination

Andreas Hasenack (ahasenack) wrote :

Somehow the netplan.io and cloud-init tasks are linked in terms of those nominations. If I approve the cloud-init ones, netplan's also get approved.

Chad Smith (chad.smith) wrote :

This is a cloud-init issue only. Once cloud-init is SRU'd, netplan will properly set the MTU.

Changed in netplan.io (Ubuntu Bionic):
status: New → Invalid
Changed in netplan.io (Ubuntu Artful):
status: New → Invalid
Changed in netplan.io (Ubuntu Xenial):
status: New → Invalid
Chad Smith (chad.smith) wrote :

Setting the netplan series tasks to Invalid, as this is a cloud-init bug. netplan on artful++ will do as cloud-init tells it, but we need an SRU for cloud-init into artful/bionic to fix things (and a new cloud-init devel release in cosmic this week to fix the behavior).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 18.2-77-g4ce67201-0ubuntu1

---------------
cloud-init (18.2-77-g4ce67201-0ubuntu1) cosmic; urgency=medium

  * New upstream snapshot.
    - lxd: Delete default network and detach device if lxd-init created them.
      (LP: #1776958)
    - openstack: avoid unneeded metadata probe on non-openstack platforms
      (LP: #1776701)
    - stages: fix tracebacks if a module stage is undefined or empty
      [Robert Schweikert] (LP: #1770462)
    - Be more safe on string/bytes when writing multipart user-data to disk.
      (LP: #1768600)
    - Fix get_proc_env for pids that have non-utf8 content in environment.
      (LP: #1775371)
    - tests: fix salt_minion integration test on bionic and later
    - tests: provide human-readable integration test summary when --verbose
    - tests: skip chrony integration tests on lxd running artful or older
    - test: add optional --preserve-instance arg to integration tests
    - netplan: fix mtu if provided by network config for all rendered types
      (LP: #1774666)
    - tests: remove pip install workarounds for pylxd, take upstream fix.
    - subp: support combine_capture argument.
    - tests: ordered tox dependencies for pylxd install

 -- Chad Smith <email address hidden> Fri, 15 Jun 2018 20:05:07 -0600

Changed in cloud-init (Ubuntu Cosmic):
status: Confirmed → Fix Released

This bug is believed to be fixed in cloud-init in version 18.3. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu Artful):
status: New → Confirmed
Changed in cloud-init (Ubuntu Bionic):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Ryan Beisner (1chb1n) wrote :

FWIW, we squarely hit this while redeploying our dev cloud (serverstack) on bionic, which uses bonds and jumbo frames.

no longer affects: netplan.io (Ubuntu)
no longer affects: netplan.io (Ubuntu Xenial)
no longer affects: netplan.io (Ubuntu Artful)
no longer affects: netplan.io (Ubuntu Bionic)
Steve Langasek (vorlon) on 2018-07-10
Changed in netplan.io (Ubuntu Cosmic):
status: Confirmed → Invalid
no longer affects: netplan.io (Ubuntu Cosmic)
Scott Moser (smoser) wrote :

Hi,
This bug is believed to be fixed in the version of cloud-init in -proposed of 16.04, 17.10 and 18.04 under SRU bug 1777912.

It would be good if someone could report back on that bug as to whether or not this is now working for them.

Jason Hobbs (jason-hobbs) wrote :

Marked as Fix Released on Bionic/Xenial because the SRU for bug 1777912 is done. I can't make Artful "Won't Fix", but it should be.

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Released
Changed in cloud-init (Ubuntu Bionic):
status: Confirmed → Fix Released
Steve Langasek (vorlon) on 2018-09-12
Changed in cloud-init (Ubuntu Artful):
status: Confirmed → Won't Fix