OpenStack SDN corrupts a few networking packets under load

Bug #1504265 reported by Fraser
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Ihar Hrachyshka

Bug Description

It appears that under load the OpenStack SDN stack can silently corrupt exactly 50 bytes of TCP payload data. This has only been observed with VMs that run on the OpenStack control node, which is apparently out of line with OpenStack best practices for production.

The environment was RDO OpenStack Kilo on CentOS 7 with Neutron/Open vSwitch. The Open vSwitch Neutron plugin version is 2015.1.0-1.el7; the Linux guest is SLES 11 SP3 (kernel 3.0.76, 64-bit) on VMware.

Steps To Reproduce:

Generate high network traffic.

From the sender:

<pre>
sender:~ $ dd if=/dev/zero count=1000000 bs=1000 | netcat 172.16.12.100 139
1000000+0 records in
1000000+0 records out
1000000000 bytes (1.0 GB) copied, 12.5841 s, 79.5 MB/s
^C
</pre>

From the receiver:

Out of laziness I am using the SMB port, so shut down Samba on the receiver before reproducing.

If OpenStack does not corrupt the network stream, the received file contains only zeroes:

<pre>
receiver:~ $ netcat -l -p 139 > out.bin; hexdump -C out.bin
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
1dcd6500
</pre>

If OpenStack *does* corrupt the packets, the corruption shows up in the hexdump. Out of 10 tries, a few attempts yielded corruption, so it is readily reproducible, though not on every run. Two examples:

<pre>
receiver:~ $ netcat -l -p 139 > out.bin; hexdump -C out.bin
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
0ac9db60 00 00 00 00 00 00 61 35 61 34 39 30 34 34 62 66 |......a5a49044bf|
0ac9db70 33 30 37 33 32 36 36 35 31 31 37 36 61 38 31 66 |307326651176a81f|
0ac9db80 32 35 32 61 33 35 35 39 31 34 30 31 39 63 33 33 |252a355914019c33|
0ac9db90 32 35 31 38 38 64 62 61 00 00 00 00 00 00 00 00 |25188dba........|
0ac9dba0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
3b9aca00

receiver:~ $ netcat -l -p 139 > out.bin; hexdump -C out.bin
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
078cf070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70 ce |..............p.|
078cf080 d0 31 40 0f 77 34 f7 23 02 75 10 77 d5 5b 2f 11 |.1@.w4.#.u.w.[/.|
078cf090 b8 61 c3 4f 15 30 7f 10 c0 39 96 b5 bb f1 bc 5c |.a.O.0...9.....\|
078cf0a0 ea d7 2a de 69 80 9c d3 4a b3 24 60 67 03 8e a5 |..*.i...J.$`g...|
078cf0b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
07910130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 36 39 |..............69|
07910140 34 65 37 35 37 65 39 39 63 35 61 63 64 65 36 64 |4e757e99c5acde6d|
07910150 63 34 33 35 34 33 37 66 62 39 39 35 38 63 38 61 |c435437fb9958c8a|
07910160 39 36 66 35 36 30 31 38 63 62 35 34 61 33 65 34 |96f56018cb54a3e4|
07910170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
32a57a20 00 00 00 00 00 00 32 31 33 31 39 61 36 64 33 33 |......21319a6d33|
32a57a30 38 66 34 63 37 35 64 39 63 65 34 39 33 63 32 61 |8f4c75d9ce493c2a|
32a57a40 62 33 35 61 61 33 31 64 61 35 31 31 35 61 30 34 |b35aa31da5115a04|
32a57a50 35 66 62 39 31 34 39 34 00 00 00 00 00 00 00 00 |5fb91494........|
32a57a60 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
3b9aca00
</pre>

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Could you please attach your configuration and debug log files for the server and the ovs agent?

Also, please provide ovs flow output.

Changed in neutron:
importance: Undecided → Medium
tags: added: juno-backport-potential kilo-backport-potential liberty-rc-potential ovs
tags: added: needs-attention
Changed in neutron:
status: New → Incomplete
Akihiro Motoki (amotoki)
tags: added: liberty-backport-potential
removed: liberty-rc-potential
Alan Pevec (apevec)
tags: removed: juno-backport-potential
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Revision history for this message
Juha Tiensyrjä (juha-tiensyrja) wrote :

We seem to be hitting this same issue. Environment:
 - RDO OpenStack Kilo
 - CentOS 7
 - Neutron/Openvswitch
 - Openvswitch Neutron plugin version is 2015.1.1-1.el7
 - Kernel 3.10.0-229.20.1.el7.x86_64
 - Network node running on top of VMware

Revision history for this message
Juha Tiensyrjä (juha-tiensyrja) wrote :

Also, we have found that the issue only happens for us when connections go through a load balancer (haproxy) running on the networking node. Connections between instances on compute nodes work fine.

Changed in neutron:
status: Expired → Confirmed
Revision history for this message
Fraser (fraser-hanson+u1) wrote :

I spoke to the dev who ran the test that hit this at our site, and here's what it turned out to be in our case:

The problem is caused by the use of VXLAN/GRE tunnels, which add 50+ bytes to the size of the IP frame. When a VM sends a 1500-byte packet (the standard MTU), the VXLAN/GRE encapsulation pushes the packet size past the 1500-byte MTU of the host, which causes the packet to be fragmented. The documentation typically has a note somewhere that the MTU should be decreased to 1400 inside all VMs.

https://www.rdoproject.org/networking/using-gre-tenant-networks/#MTU
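One quick way to confirm the overhead from inside a VM is to ping with the "don't fragment" flag set; a sketch, assuming the 50-byte VXLAN overhead, 28 bytes of IP+ICMP headers, and eth0 as the guest interface:

<pre>
# 1472 + 28 = 1500 bytes on the wire: too big once the ~50-byte tunnel header is added
vm:~ $ ping -M do -s 1472 172.16.12.100

# 1422 + 28 = 1450 bytes: leaves room for the encapsulation, should succeed
vm:~ $ ping -M do -s 1422 172.16.12.100

# workaround inside the guest: lower the interface MTU
vm:~ $ sudo ip link set dev eth0 mtu 1450
</pre>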

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

OK, in Kilo we have the path_mtu option in the ml2 plugin, which allows Neutron to calculate the network MTU based on the tenant network type. Please try setting it to your physical infrastructure MTU (1500), recreate the networks (or update the mtu fields in the network table with proper values based on the tunneling overhead, e.g. 1450 for VXLAN networks; sadly, there is no way to reset those MTU values through the API), and retry.
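A minimal sketch of that configuration, assuming the stock RDO file layout and the default 'neutron' database name (back the database up before touching it):

<pre>
# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
path_mtu = 1500

# For a pre-existing VXLAN network, update the mtu column directly
# (substitute the real network UUID):
#   mysql neutron -e "UPDATE networks SET mtu = 1450 WHERE id = '<network-id>';"
</pre>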

When retrying, please also configure your DHCP agent with advertise_mtu = True. In that case, the service will attach a DHCP option carrying the proper MTU to all allocations, and your instances will then use that value for their network interfaces.

Note that this DHCP agent feature does not work for networks with DHCP disabled (specifically, stateless IPv6 networks are still affected).
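As a sketch, assuming the option belongs in the agent's [DEFAULT] section as in stock Kilo packaging:

<pre>
# /etc/neutron/dhcp_agent.ini
[DEFAULT]
advertise_mtu = True
</pre>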

You can obviously use some other mechanism to make sure your instances use the proper MTU (e.g. modifying running images to run a script for that).

You may also want to look at setting network_device_mtu to 1450 for the L3 and DHCP agents.
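Again as a sketch, assuming the option sits in each agent's [DEFAULT] section:

<pre>
# /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
[DEFAULT]
network_device_mtu = 1450
</pre>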

Note that in Mitaka we try to fix this MTU mess in several ways. For example, we have already enabled advertise_mtu = True by default and set path_mtu = 1500; we also added a feature to propagate MTU values using RA packets sent on IPv6 networks.

We are also working on other fixes to allow for a smoother MTU setup.

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
Revision history for this message
Juha Tiensyrjä (juha-tiensyrja) wrote :

Thank you, the issue indeed was fixed by setting a correct MTU on the instances via a DHCP option.

We seem to have hit a bug in (probably) haproxy's HTTP mode, which sometimes corrupted data when the MTU was wrong and fragmentation occurred. We didn't see that behavior when using the haproxy LBaaS in TCP mode, but unfortunately I haven't been able to investigate the issue any deeper.

Changed in neutron:
milestone: none → mitaka-3
Changed in neutron:
milestone: mitaka-3 → mitaka-rc1
tags: removed: kilo-backport-potential liberty-backport-potential
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

We can't backport the fix since it changes default configuration values, which is against stable policy.

Changed in neutron:
status: Confirmed → Fix Released
milestone: mitaka-rc1 → mitaka-3