Wrong MTU values for container's NICs

Bug #1733592 reported by Ante Karamatić
This bug affects 5 people
Affects         Status   Importance  Assigned to      Milestone
Canonical Juju  Invalid  Medium      Witold Krecicki
MAAS            Invalid  Medium      Unassigned
2.3             Invalid  Medium      Unassigned

Bug Description

When juju deployed a container on a machine, it didn't pass MTU values properly to LXC configuration. MTU stanzas are also missing from ENI on the container.

Physical machine's ENI:

auto eth4
iface eth4 inet manual
    mtu 1500

auto eth1.2733
iface eth1.2733 inet manual
    mtu 9000
    vlan-raw-device eth1
    vlan_id 2733

auto br-eth4
iface br-eth4 inet static
    address 10.245.209.20/20
    gateway 10.245.208.1
    bridge_ports eth4

auto br-eth1.2733
iface br-eth1.2733 inet static
    address 192.168.33.127/24
    bridge_ports eth1.2733

This means there are two bridges for containers: one with MTU 1500 and the other with MTU 9000. However, the container ends up with:

# lxc config show juju-e65005-16-lxd-6 | grep -i mtu
    mtu: "1500"
    mtu: "1500"

And ENI from container:

# cat /var/lib/lxd/containers/juju-e65005-16-lxd-6/rootfs/etc/network/interfaces

auto lo eth1 eth0

iface lo inet loopback
  dns-nameservers 10.245.208.31 10.245.208.32 10.245.208.33
  dns-search maas

iface eth0 inet static
  address 192.168.33.141/24

iface eth1 inet static
  address 10.245.209.36/20
  gateway 10.245.208.1
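
For anyone wanting to check this on their own hosts, here is a rough sketch (not anything Juju does internally) that pulls the `mtu` option out of each `iface` stanza in an ENI file. The function name `parse_eni_mtus` is made up for illustration, and the parser only understands the simple stanza layout shown above.

```python
def parse_eni_mtus(text):
    """Map each `iface` stanza name to its `mtu` option, or None if absent."""
    mtus = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "iface" and len(words) >= 2:
            current = words[1]
            mtus[current] = None
        elif words[0] in ("auto", "allow-hotplug", "source", "mapping"):
            current = None  # another top-level stanza ends the iface block
        elif words[0] == "mtu" and current is not None and len(words) >= 2:
            mtus[current] = int(words[1])
    return mtus


# Excerpts copied from the host and container ENI files in this report.
HOST_ENI = """\
auto eth4
iface eth4 inet manual
    mtu 1500

auto eth1.2733
iface eth1.2733 inet manual
    mtu 9000
    vlan-raw-device eth1
    vlan_id 2733
"""

CONTAINER_ENI = """\
auto lo eth1 eth0

iface lo inet loopback
  dns-nameservers 10.245.208.31 10.245.208.32 10.245.208.33
  dns-search maas

iface eth0 inet static
  address 192.168.33.141/24

iface eth1 inet static
  address 10.245.209.36/20
  gateway 10.245.208.1
"""

host = parse_eni_mtus(HOST_ENI)
container = parse_eni_mtus(CONTAINER_ENI)
# Container interfaces (other than loopback) that carry no mtu stanza at all.
missing = sorted(n for n, mtu in container.items() if mtu is None and n != "lo")
print(host)
print(missing)
```

Run against the two files above, this should report MTU 1500 and 9000 on the host ports and no `mtu` option on either container interface.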

Ante Karamatić (ivoks)
tags: added: cpe-onsite
Ante Karamatić (ivoks) wrote :

Looks like juju detects correct settings from the machine (look for 192.168.33 subnet):

2017-11-21 00:51:24 DEBUG juju.worker.machiner machiner.go:172 observed network config updated for "machine-16" to [{1 127.0.0.0/8 65536 0 lo loopback false false loopback 127.0.0.1 [] [] []} {1 ::1/128 65536 0 lo loopback false false loopback ::1 [] [] []} {2 14:02:ec:42:38:dc 1500 0 eno1 ethernet false false manual [] [] []} {3 5c:b9:01:9a:3c:88 10.245.208.0/20 1500 0 eth4 ethernet false false static 10.245.209.20 [] [] []} {3 5c:b9:01:9a:3c:88 1500 0 eth4 ethernet false false manual [] [] []} {4 14:02:ec:42:38:dd 1500 0 eno2 ethernet false false manual [] [] []} {5 14:02:ec:42:38:de 1500 0 eno3 ethernet false false manual [] [] []} {6 5c:b9:01:9a:3c:89 9000 0 eth1 ethernet false false manual [] [] []} {7 14:02:ec:42:38:df 1500 0 eno4 ethernet false false manual [] [] []} {8 00:11:0a:66:2c:b4 9000 0 eth6 ethernet false false manual [] [] []} {9 00:11:0a:66:2c:b4 9000 0 eth7 ethernet false false manual [] [] []} {10 00:11:0a:66:2c:b4 9000 0 bond0 bond false false manual [] [] []} {11 00:11:0a:66:2c:b4 192.168.35.0/24 9000 0 bond0.2735 802.1q false false static 192.168.35.8 [] [] []} {11 00:11:0a:66:2c:b4 9000 0 bond0.2735 802.1q false false manual [] [] []} {12 00:11:0a:66:2c:b4 192.168.36.0/24 9000 0 bond0.2736 802.1q false false static 192.168.36.8 [] [] []} {12 00:11:0a:66:2c:b4 9000 0 bond0.2736 802.1q false false manual [] [] []} {13 5c:b9:01:9a:3c:89 192.168.33.0/24 9000 0 eth1.2733 802.1q false false static 192.168.33.127 [] [] []} {13 5c:b9:01:9a:3c:89 9000 0 eth1.2733 802.1q false false manual [] [] []} {14 5c:b9:01:9a:3c:89 1500 0 eth1.2734 802.1q false false manual [] [] []} {15 6a:43:a8:e2:cb:49 1500 0 lxdbr0 bridge false false manual [] [] []} {15 6a:43:a8:e2:cb:49 1500 0 lxdbr0 bridge false false manual [] [] []}]

2017-11-21 00:55:14 DEBUG juju.provisioner.lxd host_preparer.go:111 updating observed network config for "machine-16" to []params.NetworkConfig{...
params.NetworkConfig{DeviceIndex:22, MACAddress:"5c:b9:01:9a:3c:89", CIDR:"192.168.33.0/24", MTU:9000, ProviderId:"", ProviderSubnetId:"", ProviderSpaceId:"", ProviderAddressId:"", ProviderVLANId:"", VLANTag:0, InterfaceName:"br-eth1.2733", ParentInterfaceName:"", InterfaceType:"bridge", Disabled:false, NoAutoStart:false, ConfigType:"static", Address:"192.168.33.127", DNSServers:[]string(nil), DNSSearchDomains:[]string(nil), GatewayAddress:"", Routes:[]params.NetworkRoute(nil)}...

However, it creates a container with the wrong MTU:

2017-11-21 00:56:32 DEBUG juju.cloudconfig.containerinit container_userdata.go:73 generating network config from container.NetworkConfig{NetworkType:"bridge", Device:"lxdbr0", MTU:0, Interfaces:[]network.InterfaceInfo{network.InterfaceInfo{DeviceIndex:0, MACAddress:"00:16:3e:4d:a2:ff", CIDR:"192.168.33.0/24", ProviderId:"227", ProviderSubnetId:"4", ProviderNetworkId:"", ProviderSpaceId:"", ProviderVLANId:"5037", ProviderAddressId:"409", AvailabilityZones:[]string(nil), VLANTag:273...


tags: added: cdo-qa foundations-engine
Tim Penhey (thumper)
Changed in juju:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Witold Krecicki (wpk)
milestone: none → 2.3-rc2
Witold Krecicki (wpk) wrote :

We have confirmed that the bridge is created with the proper MTU (9000), that machiner reports the proper value to the controller, and that this value is stored in the database.
Then, when we create the e/n/i for the container in container_userdata.go, the networkConfig has the MTU set to 1500.
My next step, tomorrow, is to verify every step between those two points to see where we lose the information about the proper MTU.

Ante Karamatić (ivoks) wrote : Re: [Bug 1733592] Re: Wrong MTU values for container's NICs

I’m more worried about MTU setting in LXC’s config itself, rather than
content of ENI. By setting mtu to 1500 in LXC’s config, veth pairs are
created and bridge moves to MTU 1500. This is before container’s ENI
content has any influence on the whole problem.
--
Ante Karamatić
<email address hidden>
Canonical
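
To illustrate the point about the bridge: a minimal sketch, assuming the usual Linux bridge behaviour where a bridge whose MTU has not been pinned by hand follows the smallest MTU among its ports. The function `bridge_mtu` and the port name `veth-host-side` are hypothetical and only model the arithmetic; nothing here talks to the kernel.

```python
def bridge_mtu(port_mtus):
    """Model a bridge's MTU as the smallest MTU among its member ports."""
    return min(port_mtus.values())


# Before the container starts, the only port is the VLAN interface.
before = bridge_mtu({"eth1.2733": 9000})

# LXC then attaches the host side of a veth pair created with mtu 1500
# (the port name is made up for this example).
after = bridge_mtu({"eth1.2733": 9000, "veth-host-side": 1500})

print(before, after)
```

Under that assumption the bridge drops from 9000 to 1500 as soon as the veth is attached, regardless of what the container's ENI says later.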

Witold Krecicki (wpk) wrote :

The source of this information is the same; the e/n/i information is just more visible in the debug log. The fix will therefore cover both.

Ante Karamatić (ivoks) wrote :

New data shows that MAAS had a space and vlan defined with MTU 1500:

    {
        "id": 3,
        "name": "internal-space",
        "subnets": [
            {
                "gateway_ip": null,
                "dns_servers": [],
                "resource_uri": "/MAAS/api/2.0/subnets/3/",
                "active_discovery": false,
                "managed": true,
                "id": 3,
                "allow_proxy": true,
                "vlan": {
                    "mtu": 1500,
                    "fabric": "default",
                    "relay_vlan": null,
                    "resource_uri": "/MAAS/api/2.0/vlans/5033/",
                    "primary_rack": null,
                    "external_dhcp": null,
                    "id": 5033,
                    "fabric_id": 30,
                    "name": "",
                    "dhcp_on": false,
                    "vid": 2733,
                    "space": "internal-space",
                    "secondary_rack": null
                },
                "name": "internal",
                "cidr": "192.168.33.0/24",
                "rdns_mode": 2,
                "space": "internal-space"
            }
        ],
        "vlans": [
            {
                "mtu": 1500,
                "fabric": "default",
                "relay_vlan": null,
                "resource_uri": "/MAAS/api/2.0/vlans/5033/",
                "primary_rack": null,
                "external_dhcp": null,
                "id": 5033,
                "fabric_id": 30,
                "name": "",
                "dhcp_on": false,
                "vid": 2733,
                "space": "internal-space",
                "secondary_rack": null
            }
        ],
        "resource_uri": "/MAAS/api/2.0/spaces/3/"
    },

Attaching an interface with MTU 9000 to that VLAN should have raised an error. Instead, MAAS happily connected two layer 2 objects (a VLAN and a NIC) with different MTU values:

            {
                "mac_address": "5c:b9:01:9b:58:a9",
                "params": {
                    "mtu": 9000
                },
                "id": 109,
                "vlan": {
                    "fabric_id": 30,
                    "fabric": "default",
                    "mtu": 1500,
                    "relay_vlan": null,
                    "primary_rack": "abcfkh",
                    "resource_uri": "/MAAS/api/2.0/vlans/5031/",
                    "secondary_rack": "cmkt8t",
                    "name": "untagged",
                    "vid": 0,
                    "dhcp_on": true,
                    "id": 5031,
                    "space": "oam-space",
                    "external_dhcp": null
                },
                "type": "physical",
                "links": [],
                "system_id": "wn67a6",
                "enabled": true,
                "parents": [],
                "name": "eth1",
                "resource_uri": "/MAAS/api/2.0/nodes/wn67a6/interfaces/109/",
                "discovered": null,
                "tags": [
                    "sriov"
                ],
                "effective_mtu": 9000,
                "children": [
                    "eth1.27...


Dmitrii Shcherbakov (dmitriis) wrote :

My 2 cents:

If you have both internal and external connectivity provided via the same l2 network (say via different interfaces attached to different vlans or even the same vlan but a different interface) you have to take MTU values further down the path to a remote host into account. Especially if PMTUD cannot be performed due to blocked ICMP on some hops.

So, there are use-cases for such behavior, however, it's best to detect mismatches like that and warn.

So long as hosts on the same link use a consistent MTU for point-to-point connections, there is generally no problem with using MTU <= MAX_SUPPORTED_LINK_MTU, but we need better error detection and foolproof mechanisms.
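
As a rough idea of what "detect and warn" could look like, here is a sketch that scans interface records shaped like the MAAS API output quoted earlier in this bug and flags any interface whose `effective_mtu` differs from the `mtu` of its VLAN. The function name `find_mtu_mismatches` is hypothetical, only the fields visible in that output are used, and this is not how MAAS or Juju implement any check.

```python
def find_mtu_mismatches(interfaces):
    """Return (name, iface_mtu, vlan_id, vlan_mtu) for every disagreement."""
    mismatches = []
    for iface in interfaces:
        vlan = iface.get("vlan") or {}
        iface_mtu = iface.get("effective_mtu")
        vlan_mtu = vlan.get("mtu")
        if iface_mtu is None or vlan_mtu is None:
            continue  # nothing to compare against
        if iface_mtu != vlan_mtu:
            mismatches.append((iface["name"], iface_mtu, vlan.get("id"), vlan_mtu))
    return mismatches


# Trimmed copy of the eth1 record from the MAAS output in this report.
interfaces = [
    {
        "name": "eth1",
        "params": {"mtu": 9000},
        "effective_mtu": 9000,
        "vlan": {"id": 5031, "vid": 0, "name": "untagged", "mtu": 1500},
    }
]

result = find_mtu_mismatches(interfaces)
for name, iface_mtu, vlan_id, vlan_mtu in result:
    print(f"warning: {name} has MTU {iface_mtu} but VLAN {vlan_id} has MTU {vlan_mtu}")
```

Warning rather than failing keeps the legitimate use-cases mentioned above working while still surfacing the mismatch.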

Changed in maas:
importance: Undecided → Medium
status: New → Triaged
milestone: none → 2.4.0alpha1
no longer affects: juju/2.3
tags: added: internal
John A Meinel (jameinel)
Changed in juju:
importance: Critical → Medium
Chris Gregan (cgregan)
Changed in maas:
status: Triaged → Invalid
Changed in juju:
status: Triaged → Invalid
Christian Reis (kiko) wrote :

Chris, can you give me links to the two follow-on bugs for this one (the foundation bug on us detecting this in a lightweight manner, and the bug against maas/juju/lxd in correctly detecting this sort of mismatch)?

Christian Reis (kiko) wrote :

The Foundation bugs are private, but for the record: https://bugs.launchpad.net/cpe-foundation/+bug/1733946 and https://bugs.launchpad.net/cpe-foundation/+bug/1736694

We are still missing an upstream bug (MAAS/Juju/LXD) which I'll file now.

Christian Reis (kiko) wrote :
Drew Freiberger (afreiberger) wrote :

I just found an instance of this.

It appears that the MTU in the VLAN definition can differ from the MTU in the subnet definition. This seems to be the root bug in my case.

$ maas admin networks read
...
    {
        "description": "ceph-access",
        "vlan_tag": 4004,
        "resource_uri": "/MAAS/api/2.0/networks/subnet-6/",
        "name": "subnet-6",
        "netmask": "255.255.0.0",
        "default_gateway": null,
        "dns_servers": [],
        "ip": "10.220.0.0"
    },
...
$ maas admin subnet read subnet-6
{
    "rdns_mode": 2,
    "allow_proxy": true,
    "id": 6,
    "dns_servers": [],
    "name": "ceph-access",
    "cidr": "10.220.0.0/16",
    "space": "ceph-access-space",
    "gateway_ip": null,
    "resource_uri": "/MAAS/api/2.0/subnets/6/",
    "vlan": {
        "vid": 4004,
        "mtu": 9000,
        "fabric": "default",
        "relay_vlan": null,
        "fabric_id": 9,
        "id": 5015,
        "name": "",
        "secondary_rack": null,
        "primary_rack": null,
        "space": "ceph-access-space",
        "external_dhcp": null,
        "resource_uri": "/MAAS/api/2.0/vlans/5015/",
        "dhcp_on": false
    },
    "active_discovery": false,
    "managed": true
}
$ maas admin spaces read
{
        "vlans": [
            {
                "vid": 4004,
                "external_dhcp": null,
                "relay_vlan": null,
                "name": "",
                "fabric": "default",
                "primary_rack": null,
                "resource_uri": "/MAAS/api/2.0/vlans/5015/",
                "fabric_id": 9,
                "secondary_rack": null,
                "dhcp_on": false,
                "id": 5015,
                "space": "ceph-access-space",
                "mtu": 9000
            }
        ],
        "subnets": [
            {
                "allow_proxy": true,
                "name": "ceph-access",
                "gateway_ip": null,
                "vlan": {
                    "vid": 4004,
                    "external_dhcp": null,
                    "relay_vlan": null,
                    "name": "",
                    "fabric": "default",
                    "primary_rack": null,
                    "resource_uri": "/MAAS/api/2.0/vlans/5015/",
                    "fabric_id": 9,
                    "secondary_rack": null,
                    "dhcp_on": false,
                    "id": 5015,
                    "space": "ceph-access-space",
                    "mtu": 9000
                },
                "active_discovery": false,
                "dns_servers": [],
                "resource_uri": "/MAAS/api/2.0/subnets/6/",
                "cidr": "10.220.0.0/16",
                "rdns_mode": 2,
                "id": 6,
                "space": "ceph-access-space",
                "managed": true
            }
        ],
        "id": 2,
        "name": "ceph-access-space",
        "resource_uri": "/MAAS/api/2.0/spaces/2/"
    },

.....

    {
        "vlans": [
            {
                "vid": 4004,
                "external_dhcp": null,
                "relay_vlan": null,
                "name": "4004",
                "fabric": "fabric-0",
       ...


tags: added: canonical-bootstack