Metadata is broken with dpdk bonding, jumbo frames and metadata from qdhcp

Bug #1833713 reported by Liam Young
This bug affects 1 person
Affects                               Status   Importance  Assigned to  Milestone
OpenStack Neutron Open vSwitch Charm  Triaged  High        Unassigned
dpdk (Ubuntu)                         Triaged  High        Unassigned

Bug Description

In this Bionic/Queens deployment guests are failing to get metadata. The MTU of the provider network is set to 9000. There is no gateway or DVR in the deployment and metadata is served from the qdhcp namespace.

$ openstack network show user_net -c mtu
+-------+-------+
| Field | Value |
+-------+-------+
| mtu | 9000 |
+-------+-------+

This causes the guest to set its MTU to 9000, and the interface inside the qdhcp namespace also has its MTU set to 9000:

ip netns exec qdhcp-7778a98d-0042-4fd6-bc63-965938bda5ee ip -0 -o a | grep tap
36: tap18bd733f-68: <BROADCAST,PROMISC,UP,LOWER_UP> mtu 9000
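
To confirm metadata really is being served from the qdhcp namespace, a couple of illustrative checks (assuming the usual isolated-metadata layout rather than anything specific to this deployment):

# The metadata address should be bound inside the namespace:
ip netns exec qdhcp-7778a98d-0042-4fd6-bc63-965938bda5ee ip -4 -o addr | grep 169.254.169.254
# And something should be listening on port 80 in that namespace:
ip netns exec qdhcp-7778a98d-0042-4fd6-bc63-965938bda5ee ss -ltnp | grep ':80'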

When the guest tries to retrieve its metadata the request hangs. Running tcpdump on the guest shows:

Terminal 1 on the guest:
$ timeout 20 curl http://169.254.169.254/openstack/2017-02-22/meta_data.json

Terminal 2 on the guest:
# tcpdump -i any -A -l port 80 2>&1 | grep truncated
12:35:05.831836 IP truncated-ip - 480 bytes missing! 169.254.169.254.http > host-172-20-0-6.openstacklocal.34500: Flags [P.], seq 735921054:735922986, ack 3566155364, win 210, options [nop,nop,TS val 4061460628 ecr 996292619], length 1932: HTTP: HTTP/1.1 200 OK
12:35:07.495835 IP truncated-ip - 480 bytes missing! 169.254.169.254.http > host-172-20-0-6.openstacklocal.34500: Flags [P.], seq 0:1932, ack 1, win 210, options [nop,nop,TS val 4061462292 ecr 996292619], length 1932: HTTP: HTTP/1.1 200 OK
12:35:10.919861 IP truncated-ip - 480 bytes missing! 169.254.169.254.http > host-172-20-0-6.openstacklocal.34500: Flags [P.], seq 0:1932, ack 1, win 210, options [nop,nop,TS val 4061465716 ecr 996292619], length 1932: HTTP: HTTP/1.1 200 OK
12:35:17.575917 IP truncated-ip - 480 bytes missing! 169.254.169.254.http > host-172-20-0-6.openstacklocal.34500: Flags [P.], seq 0:1932, ack 1, win 210, options [nop,nop,TS val 4061472372 ecr 996292619], length 1932: HTTP: HTTP/1.1 200 OK
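
A quick way to narrow this down from the guest (not part of the original capture; the sizes are simply the frame size minus 28 bytes of IPv4 + ICMP headers) is to ping the metadata address with the don't-fragment bit set:

$ ping -M do -c 3 -s 1472 169.254.169.254   # 1472 + 28 = 1500-byte packets
$ ping -M do -c 3 -s 8972 169.254.169.254   # 8972 + 28 = 9000-byte packets

If replies leaving the namespace are being truncated, the small-payload test should succeed while the large-payload one fails.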

Dropping the MTU on the tap device to 1500 fixes the issue (anything higher than 1500 and it breaks again):
# ip netns exec qdhcp-7778a98d-0042-4fd6-bc63-965938bda5ee ip link set dev tap18bd733f-68 mtu 1500

Connecting to the guest from outside of the qdhcp namespace seems to be fine. Sending a file via netcat works from outside of the qdhcp netns but breaks when tested from within it. Again, tcpdump shows truncated-ip messages.
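
A rough sketch of that netcat check (listener address, port and file names are illustrative, and option syntax differs between netcat variants):

# On a host outside the qdhcp netns, collect whatever arrives:
nc -l 5001 > received.bin

# Sending from outside the namespace works:
nc <listener-ip> 5001 < test-file.bin

# Sending the same file from inside the qdhcp netns breaks (truncated-ip in tcpdump):
ip netns exec qdhcp-7778a98d-0042-4fd6-bc63-965938bda5ee nc <listener-ip> 5001 < test-file.bin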

Revision history for this message
Liam Young (gnuoy) wrote :

Hitting the same bug on Stein with dpdk 18.11-6~cloud0.

Revision history for this message
Liam Young (gnuoy) wrote :

Also worth noting: these deployments are using a DPDK bond in OVS.

Revision history for this message
Liam Young (gnuoy) wrote :

Given the above, I am going to mark this as affecting the dpdk package rather than the charm.

Revision history for this message
Liam Young (gnuoy) wrote :

At some point while attempting to simplify the test case I dropped setting the MTU on the DPDK devices via OVS, so the above test is invalid. I've marked the bug against dpdk as Invalid while I redo the tests.

Changed in dpdk (Ubuntu):
status: New → Invalid
Revision history for this message
Liam Young (gnuoy) wrote :

Ubuntu: eoan
DPDK pkg: 18.11.1-3
OVS DPDK pkg: 2.11.0-0ubuntu2
Kernel: 5.0.0-20-generic

If a server has an OVS bridge with a DPDK device for external
network access and a network namespace attached, then sending data out of
the namespace fails when jumbo frames are enabled.

Setup:

root@node-licetus:~# uname -r
5.0.0-20-generic

root@node-licetus:~# ovs-vsctl show
523eab62-8d03-4445-a7ba-7570f5027ff6
    Bridge br-test
        Port "tap1"
            Interface "tap1"
                type: internal
        Port br-test
            Interface br-test
                type: internal
        Port "dpdk-nic1"
            Interface "dpdk-nic1"
                type: dpdk
                options: {dpdk-devargs="0000:03:00.0"}
    ovs_version: "2.11.0"

root@node-licetus:~# ovs-vsctl get Interface dpdk-nic1 mtu
9000

root@node-licetus:~# ip netns exec ns1 ip addr show tap1
12: tap1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 0a:dd:76:38:52:54 brd ff:ff:ff:ff:ff:ff
    inet 10.246.112.101/21 scope global tap1
       valid_lft forever preferred_lft forever
    inet6 fe80::8dd:76ff:fe38:5254/64 scope link
       valid_lft forever preferred_lft forever
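
For reference, a setup along these lines can be recreated with something like the following (a sketch only, reusing the names above; the bridge needs the userspace datapath for DPDK, and mtu_request is how OVS sets the MTU on DPDK ports):

# Bridge on the userspace (netdev) datapath with the DPDK NIC attached at MTU 9000:
ovs-vsctl add-br br-test -- set bridge br-test datapath_type=netdev
ovs-vsctl add-port br-test dpdk-nic1 -- set Interface dpdk-nic1 type=dpdk \
    options:dpdk-devargs=0000:03:00.0 mtu_request=9000
# Internal port moved into a namespace, mimicking what Neutron does for qdhcp:
ovs-vsctl add-port br-test tap1 -- set Interface tap1 type=internal
ip netns add ns1
ip link set tap1 netns ns1
ip netns exec ns1 ip link set dev tap1 mtu 9000
ip netns exec ns1 ip addr add 10.246.112.101/21 dev tap1
ip netns exec ns1 ip link set dev tap1 up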

* Using iperf to send data out of the netns fails:

root@node-licetus:~# ip netns exec ns1 iperf -c 10.246.114.29
------------------------------------------------------------
Client connecting to 10.246.114.29, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 10.246.112.101 port 51590 connected with 10.246.114.29 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.3 sec 323 KBytes 257 Kbits/sec

root@node-hippalus:~# iperf -s -m
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
root@node-hippalus:~#

* Switching the direction of flow and sending data into the namespace works:

root@node-licetus:~# ip netns exec ns1 iperf -s -m
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 10.246.112.101 port 5001 connected with 10.246.114.29 port 59454
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 6.06 GBytes 5.20 Gbits/sec
[ 4] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)

root@node-hippalus:~# iperf -c 10.246.112.101
------------------------------------------------------------
Client connecting to 10.246.112.101, TCP port 5001
TCP window size: 942 KByte (default)
------------------------------------------------------------
[ 3] local 10.246.114.29 port 59454 connected with 10.246.112.101 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 6.06 GBytes 5.20 Gbits/sec

* Using iperf to send data out of the netns after dropping tap mtu works:

root@node-licetus:~# ip netns exec ns1 ip link set dev tap1 mtu 1500
root@node-licetus:~# ip netns exec ns1 iperf -c 10.246.114.29
---------------------------...

Liam Young (gnuoy)
Changed in dpdk (Ubuntu):
status: Invalid → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Liam,
all I can say about this area is that I have heard from several others that MTU + bonding was very broken in the past; I haven't heard an update on that for a while. Thanks for trying 18.11.2 in that regard.

I've read through the case above but nothing obvious came up for me to try or fix.

I think your intention (from your mail) to write to upstream about this is absolutely the right approach. Please let me know here if you get anything back that would be worth adding here.

Revision history for this message
Liam Young (gnuoy) wrote :

Hi Christian,
    Thanks for your comments. I'm sure you spotted it, but just to make it clear: the issue occurs with both bonded and unbonded DPDK interfaces. I've emailed upstream here *1.

Thanks
Liam

*1 https://mail.openvswitch.org/pipermail/ovs-discuss/2019-July/048997.html

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

No reply yet in the ML.

Revision history for this message
James Page (james-page) wrote :

I'm marking this bug as Triaged; we can reproduce the issue outside of OpenStack.

Changed in charm-neutron-openvswitch:
status: New → Triaged
Changed in dpdk (Ubuntu):
status: New → Triaged
Changed in charm-neutron-openvswitch:
importance: Undecided → High
Changed in dpdk (Ubuntu):
importance: Undecided → High
Changed in charm-neutron-openvswitch:
milestone: none → 19.10
Revision history for this message
Andre Ruiz (andre-ruiz) wrote :

I'm hitting this bug in a client installation, Bionic/Queens. I just spent a few hours debugging and in the end came to exactly the same tests and conclusions.

We are using DPDK, a bond for the DPDK ports, and isolated metadata (these are provider-only networks). I *can* send data into the netns with frames up to 9000 bytes and it gets there intact. Only packets going out are truncated, to a little more than 1500 bytes (IIRC about 1504; ping works up to -s 1476).

I checked every MTU on every port in the path from the netns to OVS to DPDK to the external switch, and back through the other node's DPDK/OVS to the virtual machine, and none of them seems wrong. Tcpdump on both the netns and the target VM shows the packets leaving the netns OK but arriving truncated at the destination.

I manually set all MTUs in the qdhcp namespaces to 1500 and the problem is gone. Not sure about any consequences of this, though.
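
For anyone needing the same stop-gap, a loop along these lines does the job (a sketch only; note the change is lost if the namespace or its port is recreated):

# Force every tap device in every qdhcp namespace down to 1500.
for ns in $(ip netns list | awk '/^qdhcp-/ {print $1}'); do
    for dev in $(ip netns exec "$ns" ls /sys/class/net | grep '^tap'); do
        ip netns exec "$ns" ip link set dev "$dev" mtu 1500
    done
done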

The funny thing is that this problem did not appear with charm neutron-openvswitch-next-359 but appeared after an upgrade to neutron-openvswitch-next-367 *and* a compute host reboot. The reason we are using these charms is related to other bugs that were fixed there; the upgrade was to finally fix one last bug about TCP checksum corruption inside the netns (all this is explained in https://bugs.launchpad.net/neutron/+bug/1832021/ ).

Well, at least the client deployed more than 70 VMs without problems. I upgraded the charm about two weeks ago (things apparently kept working), and a few days ago I rebooted some compute nodes because of an unrelated problem and this behavior appeared.

David Ames (thedac)
Changed in charm-neutron-openvswitch:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-neutron-openvswitch:
milestone: 20.01 → 20.05
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Does anybody know if upstream responded elsewhere? https://mail.openvswitch.org/pipermail/ovs-discuss/2019-July/048997.html shows no thread reply.

Wouldn't it be best to open a bug instead?

David Ames (thedac)
Changed in charm-neutron-openvswitch:
milestone: 20.05 → 20.08
Revision history for this message
Andrea Ieri (aieri) wrote :

Marking as field-high as this now affects a live cloud, and the workaround (lowering the MTU within the qdhcp namespaces) isn't fully persistent.

Revision history for this message
James Page (james-page) wrote :

Bug 1832021 looks very similar, and some changes landed across releases this month to disable checksum calculations for veth interfaces when in use with DPDK (stein/train/ussuri/master branches).
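
As I understand it (my reading, not a confirmed description of those patches), the runtime effect is roughly what you would get by turning off tx checksum offload on the namespace-side device, e.g.:

# Illustrative only: disable tx checksum offload on the DHCP port inside the namespace
ip netns exec qdhcp-7778a98d-0042-4fd6-bc63-965938bda5ee ethtool -K tap18bd733f-68 tx off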

Revision history for this message
James Page (james-page) wrote :

I think this is related to bug 1831935, where the recommendation is to disable the use of veth in DPDK deployments to avoid the checksumming issues - see the 'ovs-use-veth' configuration option. This is a breaking change, but the charm should stop you from doing anything that will break things.

That said, there are in-flight fixes to disable checksumming when veth is in use via Neutron, to ensure that either configuration works.
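
For anyone landing here from the charm side, a hedged sketch of what that looks like (names as I understand them; check the charm and Neutron documentation for your release):

# Charm level - the breaking change mentioned above:
juju config neutron-openvswitch ovs-use-veth=False
# Neutron level - the corresponding DHCP agent setting in dhcp_agent.ini:
#   [DEFAULT]
#   ovs_use_veth = False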

James Page (james-page)
Changed in charm-neutron-openvswitch:
milestone: 20.08 → none