multiple type networks with different mtu cause vm connectivity problem

Bug #1845603 reported by Yang Li
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Invalid | Undecided | Unassigned |

Bug Description

In my environment, VLAN and VXLAN networks both exist, with MTUs of 1500 and 1450 respectively.
I created instance-A on the VLAN network and instance-B on the VXLAN network; both instances are on the same compute node.

Then I found that the MTU of br-int is 1450. This causes instance-A's iperf result to be 0, because instance-A's MTU of 1500 is larger than 1450.

I did some investigation and found that the OVS bridge's MTU is determined by the smallest MTU among the tap devices attached to the bridge. For example, if instance-A's tap MTU is 1500 and instance-B's tap MTU is 1450, br-int's MTU is automatically set to 1450.

It seems Neutron doesn't support the scenario in which VLAN and VXLAN networks both exist in an OVS environment. I'm not sure whether this is a bug.

I have a workaround for this problem: run the command "ovs-vsctl set int br-int mtu_request=1500". After that, br-int's MTU always stays at 1500 and instance-A's iperf works fine.
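
For readers who want to reason about the behaviour described above, here is a minimal Python sketch of the rule as reported in this bug: the bridge takes the smallest MTU among its ports unless mtu_request is set. It is a model of the description only, not Open vSwitch source code; the function name effective_bridge_mtu and the port names are hypothetical.

```python
from typing import Mapping, Optional


def effective_bridge_mtu(port_mtus: Mapping[str, int],
                         mtu_request: Optional[int] = None) -> Optional[int]:
    """Model the bridge MTU rule described in this report.

    Assumption: an explicit mtu_request pins the bridge MTU; otherwise the
    bridge follows the smallest MTU among its attached ports.
    Returns None when there are no ports and no request.
    """
    if mtu_request is not None:
        return mtu_request
    if not port_mtus:
        return None
    return min(port_mtus.values())


# Hypothetical port names standing in for the two instances' tap devices.
ports = {"tap-instance-A": 1500, "tap-instance-B": 1450}

print(effective_bridge_mtu(ports))                    # smallest port MTU
print(effective_bridge_mtu(ports, mtu_request=1500))  # pinned by mtu_request
```

With the two tap MTUs from the description, the first call gives 1450 and the second gives 1500, matching the situation before and after the workaround.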

Yang Li (yang-li)
summary: - multiple networks with different mtu cause network connectivity problem
+ multiple type networks with different mtu cause vm connectivity problem
description: updated
description: updated
Revision history for this message
Bence Romsics (bence-romsics) wrote :

Thank you for your bug report!

Could you please describe exactly what you're doing that leads to packet loss? The configuration and the commands issued (where, what). The OpenStack version you use. While Neutron is far from bug-free, using vlans and tunneled networks together is such a basic use case it's hard to believe we could have broken it without the CI catching it. So I suspect some missing configuration or maybe a usage misunderstanding. But of course it can be a real bug - in that case please let us know how to reproduce it.

Also are you familiar with these configuration settings?

https://docs.openstack.org/neutron/stein/admin/config-mtu.html

Changed in neutron:
status: New → Incomplete
Yang Li (yang-li) wrote :

Hi, my OpenStack version is Newton, and the MTU-related configuration is:
neutron.conf
[DEFAULT]
advertise_mtu=True
global_physnet_mtu = 1500

plugins/ml2/ml2_conf.ini
[ml2]
path_mtu = 1500

I think these parameters apply to the underlay network (VLAN) and the overlay network (VXLAN/GRE) respectively. If both are set to 1500, the VLAN network MTU is 1500 and the VXLAN network MTU is 1450 (1500 - header).
If global_physnet_mtu is 1500 and path_mtu is 1450, the VLAN network MTU is still 1500, but the VXLAN network MTU is 1400 (min([1500, 1450]) - header).
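
The arithmetic in the comment above can be written out as a short Python sketch. It only reproduces the calculation as described in this comment; it is not Neutron's implementation. The function name network_mtu is hypothetical, and the 50-byte VXLAN header size is an assumption inferred from the 1500 to 1450 figures quoted here.

```python
# Assumption: 50 bytes of VXLAN encapsulation overhead, inferred from the
# "1500 - header = 1450" figures in this report.
VXLAN_OVERHEAD = 50


def network_mtu(network_type: str, global_physnet_mtu: int, path_mtu: int) -> int:
    """Compute the network MTU the way the comment above describes it."""
    if network_type == "vlan":
        # Underlay network: uses the physical network MTU directly.
        return global_physnet_mtu
    if network_type == "vxlan":
        # Overlay network: the smaller of the two limits, minus the header.
        return min(global_physnet_mtu, path_mtu) - VXLAN_OVERHEAD
    raise ValueError(f"network type not covered by this sketch: {network_type}")


for physnet, path in [(1500, 1500), (1500, 1450)]:
    print(physnet, path,
          network_mtu("vlan", physnet, path),
          network_mtu("vxlan", physnet, path))
```

For the two configurations mentioned, this gives VLAN 1500 and VXLAN 1450 in the first case, and VLAN 1500 and VXLAN 1400 in the second.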

Actually, this problem was introduced by the fix for https://bugs.launchpad.net/nova/+bug/1623876 in the Newton version. It sets the tap device's MTU equal to the network MTU. I think it doesn't take into account VMs on different network types running on the same compute node.

Yang Li (yang-li) wrote :

I think the best way is to modify the ovs-agent so that it sets br-int's MTU equal to global_physnet_mtu.

Yang Li (yang-li) wrote :

In the Stein version, the problem still exists: br-int's MTU is 1450, equal to the MTU of tap46ff32f8-da and less than the MTU of tapaa2ff83a-fd, so the VM that tapaa2ff83a-fd belongs to will have problems when transmitting large packets.

root@devstack-controller:~# ifconfig br-int | grep MTU
          BROADCAST MULTICAST MTU:1450 Metric:1
root@devstack-controller:~# ifconfig tap46ff32f8-da | grep MTU
          UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
root@devstack-controller:~# ifconfig tapaa2ff83a-fd | grep MTU
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Slawek Kaplonski (slaweq) wrote :

Hi,

I tried to reproduce the issue you described, but I couldn't (on the master branch).
Everything I did is in http://paste.openstack.org/show/780663/ - as you can see, there was no problem with the MTU there. When I removed the VM connected to the network with MTU 1450, the results were exactly the same.
So I don't think the MTU value set on the br-int interface really matters.

Yang Li (yang-li)
Changed in neutron:
status: Incomplete → Invalid
Yang Li (yang-li) wrote :

After more testing, it seems br-int's MTU doesn't matter; the tap device's MTU is the root cause in my environment.
In my environment, the Nova code only contains https://review.opendev.org/#/c/370681/, which is a workaround and causes the VLAN tap device's MTU to be 1450. The next two patches, which fix this problem, were not cherry-picked from the later version:
https://review.opendev.org/#/c/370667/
https://review.opendev.org/#/c/370679/

I am now setting this bug to Invalid.

Slawek Kaplonski (slaweq) wrote :

Thx for the further investigation and for closing this issue, Yang Li :)
