Canonical network has non-working PMTU, add MSS clamping

Bug #1572026 reported by Martin Pitt
This bug affects 1 person
Affects: Auto Package Testing
Status: Fix Released
Importance: Undecided
Assigned to: Martin Pitt
Milestone: (none)

Bug Description

Apparently lxdbr0 gets set up with the standard MTU of 1500 by default. However, in our data center interfaces have a lower one: Scalingstack instances have eth0 with MTU 1400, and on my local laptop the OpenVPN tun0 even has 1194 only.

This leads to the network in lxd containers being broken by default; while ping and mtr work, apt-get update is hanging forever. The main issue with this is that very few people are even aware of an MTU impedance mismatch, so debugging this always takes ages. It took me a while until I added this to my own autopkgtest stuff (https://git.launchpad.net/~ubuntu-release/+git/autopkgtest-cloud/tree/tools/armf-lxd-slave.userdata#n98):

     lxc profile device remove default eth0
     lxc profile device add default eth0 nic nictype=bridged parent=lxdbr0 mtu=1400

and Michael just ran into it again with juju tests in bug 1571082.

Can we be more clever here and configure a default MTU on lxdbr0 which matches the one from the host's default route?
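A minimal sketch of how the MTU behind the host's default route could be looked up. The helper names `default_route_dev` and `host_default_mtu` are hypothetical, and the sample route line in the comment is illustrative rather than taken from this report.

```shell
# Parse text in the format printed by `ip route show default`, for example
#   default via 10.220.40.1 dev ens2 proto dhcp
# and print the interface name that follows the "dev" keyword.
default_route_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Hypothetical helper: ask iproute2 for the default route and read that
# interface's MTU from sysfs. Needs a Linux host with iproute2 installed.
host_default_mtu() {
    dev=$(ip route show default | default_route_dev)
    [ -n "$dev" ] && cat "/sys/class/net/$dev/mtu"
}
```

The value printed by `host_default_mtu` could then be passed as the `mtu=` option of the `lxc profile device add` command shown above.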

Stéphane Graber (stgraber) wrote :

That's not how MTU works.

Your host should fragment as needed when routing; in fact, it does that perfectly well for me.

The only case where you need to lower your bridge MTU is if you were to bridge your host's eth0 or tun0 into lxdbr0, but based on the above, that's not what you're doing at all.

It is perfectly normal for MTUs to vary across the internet. When you access a lot of websites, the target server uses a lower MTU than your machine's, yet everything still works fine and you don't have to lower your eth0 MTU to match whatever the website you're trying to reach is using.

root@snappy:~# tracepath vorash.stgraber.org
 1?: [LOCALHOST] pmtu 1500
 1: 10.178.150.1 0.075ms
 1: 10.178.150.1 0.056ms
 2: sateda.lan.mtl.stgraber.net 0.313ms
 3: sateda.lan.mtl.stgraber.net 0.392ms pmtu 1486
 3: 206.248.154.104 15.269ms asymm 4
 4: ae2-2150-bdr01-tor2.teksavvy.com 14.568ms
 5: ae1-2170-bdr01-tor.teksavvy.com 14.771ms asymm 4
 6: no reply
 7: be10-1215.bhs-g2-a9.qc.ca 30.082ms
 8: vl20.bhs-g2-a75.qc.ca 22.711ms
 9: be50-7.bhs-3a-a9.qc.ca 23.463ms
10: vorash.stgraber.org 22.773ms reached
     Resume: pmtu 1486 hops 10 back 10

As you can see, lxdbr0 is at 1500, an intermediate router lowers the path MTU to 1486, and everything works.
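To read the discovered path MTU out of a tracepath run without scanning the output by eye, something like the following could be used. `last_pmtu` is a hypothetical helper name; it only assumes that the value follows the word `pmtu`, as in the output above.

```shell
# Read tracepath output on stdin and print the last "pmtu N" value seen,
# which is the lowest path MTU discovered by the end of the run.
last_pmtu() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "pmtu") v = $(i + 1) }
         END { if (v != "") print v }'
}
```

Usage would be along the lines of `tracepath vorash.stgraber.org | last_pmtu`.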

Changed in lxd (Ubuntu):
status: New → Invalid
Martin Pitt (pitti) wrote :

It most definitely does not work in the data center, or maybe it does not work with apt's http backend. Is there some kernel parameter/sysctl which needs to be twiddled for that? Or any other package to install?

Stéphane Graber (stgraber) wrote :

Nope, that stuff works out of the box, otherwise everyone with a DSL connection at home (MTU == 1492) would be running into this problem.

Any chance you can run a tracepath to see what's going on?

Martin Pitt (pitti) wrote :

Steps:

 - nova boot daily xenial cloud image (ubuntu/ubuntu-xenial-daily-amd64-server-20160418-disk1.img), ssh in
 - set up proxy in /etc/environment and restart lxd
 - ip a says

2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP group default qlen 1000
3: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000

 - sudo lxd init; configure bridge for 10.0.8.1/24, IPv4 NAT, no IPv6

 - Now lxdbr0 does not exist at all any more (?? → bug?), but "sudo service lxd-bridge restart" brings it up:

4: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:4c:fa:1a:a8:84 brd ff:ff:ff:ff:ff:ff
    inet 10.0.8.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::504c:faff:fe1a:a884/64 scope link
       valid_lft forever preferred_lft forever

 - lxc launch images:ubuntu/xenial/amd64 x1

 - $ lxc exec x1 apt update
   Hit:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
   0% [Waiting for headers]

   This hangs eternally.

 - Temporarily lower MTU to 1400 to be able to install iputils-tracepath, then crank it back to 1500

 - root@x1:~# tracepath security.ubuntu.com
 1?: [LOCALHOST] pmtu 1500
 1: 10.0.8.1 0.072ms
 1: 10.0.8.1 0.060ms
 2: 10.0.8.1 0.173ms pmtu 1400
 2: 10.220.40.1 0.620ms
 3: 10.220.0.1 1.021ms
 4: gooseberry-eth1.internal 1.964ms
 5: no reply
 6: no reply
 7: no reply
 8: no reply
 9: no reply

(one more "no reply" about every second)

  However, "ifconfig eth0 mtu 1400" doesn't change tracepath, but it makes "apt update" work.
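A plain ping succeeding says little about full-size packets, because its packets are small (iputils ping defaults to a 56-byte payload). As a rough sketch, a probe of a chosen total size with Don't Fragment set could be sized as below. The helper names are hypothetical; `-M do`, `-s`, `-c` and `-W` are iputils ping options. Such a probe mainly exercises the outbound direction, so a black hole that only affects large packets coming back may not show up this way.

```shell
# ICMP echo payload that makes an IPv4 packet exactly $1 bytes long:
# 20 bytes IPv4 header (no options) + 8 bytes ICMP header + payload.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: send one echo request to host $1 whose total IPv4
# size is $2 bytes, with fragmentation prohibited (-M do).
probe_df() {
    ping -M do -c 1 -W 2 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, `probe_df security.ubuntu.com 1400` sends a 1372-byte payload, which fills a 1400-byte packet exactly.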

Stéphane Graber (stgraber) wrote :

The lxd bridge only starts when lxd starts, and lxd itself starts through socket activation.

Martin Pitt (pitti) wrote :

Turns out this is almost surely due to an overly restrictive firewall that filters out PMTU ICMP packets in the Canonical VPN/network. Thus the "please lower your MTU" messages never make it into the scalingstack instance and into the container, and packets > 1400 bytes just end in the void. I'll write an RT about this and add

  iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

to the instance setup scripts as a workaround until then.

Thanks Stéphane for explaining this!
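For reference, the arithmetic behind the clamping rule above: for IPv4, `--clamp-mss-to-pmtu` rewrites the MSS option in forwarded SYN packets to the path MTU minus 40 bytes. A small sketch of that calculation follows; `mss_for_mtu` is a hypothetical helper name, and it assumes IPv4 and TCP headers without options.

```shell
# TCP MSS that fits an IPv4 path MTU:
# MTU - 20 bytes IPv4 header - 20 bytes TCP header.
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

mss_for_mtu 1400   # prints 1360, the value for the ens2 MTU in this report
mss_for_mtu 1500   # prints 1460, the usual Ethernet value

# An alternative not used in this report: pin the value explicitly instead
# of deriving it from the path MTU (--set-mss is a TCPMSS target option):
#   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
#       -j TCPMSS --set-mss 1360
```

With the MSS clamped, the remote server never sends TCP segments that would need a packet larger than 1400 bytes, so the filtered ICMP messages stop mattering for TCP traffic.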

Martin Pitt (pitti) wrote :

So this is ultimately a bug in Canonical's network (I just reported https://rt.admin.canonical.com/Ticket/Display.html?id=90771, limited to Canonical folks), but I'll add the MSS clamping into the setup of our testbeds as a better workaround.

affects: lxd (Ubuntu) → auto-package-testing
Changed in auto-package-testing:
assignee: nobody → Martin Pitt (pitti)
status: Invalid → In Progress
summary: - be more clever about MTU of lxdbr0
+ Canonical network has non-working PMTU, add MSS clamping
Martin Pitt (pitti) wrote :

While the RT about the broken PMTU is still pending, I committed https://git.launchpad.net/~ubuntu-release/+git/autopkgtest-cloud/commit/?id=1bd5890c2 to enable client-side MSS clamping as a workaround.

This removes the need to reconfigure bridges with a lower MTU in tests or in the armhf lxd slave setup (see https://git.launchpad.net/~ubuntu-release/+git/autopkgtest-cloud/commit/?id=1aabcda).

Changed in auto-package-testing:
status: In Progress → Fix Released