Packets dropped in br-int bridge

Bug #2067851 reported by mark zhang

This bug report will be marked for expiration in 53 days if no further activity occurs.

This bug affects 1 person
Affects: neutron
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hello,

We have two VMs, VM-A and VM-B, that communicate with each other from different hosts, host-A and host-B.

Now, we encountered one problem:
VM-A sends a burst of about 300K packets per second (each packet under 100 bytes) to VM-B, but VM-B only receives about 100K packets per second. The burst lasts about 2 seconds.
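For scale, a back-of-the-envelope sketch of the implied loss, using the figures from the description and assuming the 100K figure is also per second:

```python
# Rough loss estimate from the figures in the report (approximate values,
# taken from the bug description; the 100K/s receive rate is an assumption).
burst_rate_pps = 300_000   # packets per second sent by VM-A
received_pps = 100_000     # packets per second seen by VM-B
burst_seconds = 2

sent = burst_rate_pps * burst_seconds
received = received_pps * burst_seconds
lost = sent - received
print(sent, received, lost, f"{lost / sent:.0%}")  # 600000 200000 400000 67%
```

So roughly two thirds of the burst never reaches VM-B.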

We traced where the packets were being dropped and found that on host-B the virtual port "patch-tun" on br-int receives all packets, but the "qvoxxx" interface on br-int does not.

We dumped the counters with "ovs-appctl bridge/dump-flows br-int" and "ovs-ofctl dump-ports br-int", but no drop counter was large enough to account for the number of missing packets.
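To make the counter check systematic, one could total the per-port rx/tx drop counters from the "ovs-ofctl dump-ports br-int" output. A minimal sketch (the sample output and port name "qvo1234abcd" are invented for illustration; real output has more fields):

```python
import re

# Sample "ovs-ofctl dump-ports br-int" output (numbers invented for illustration).
SAMPLE = """\
OFPST_PORT reply (xid=0x2): 2 ports
  port "patch-tun": rx pkts=600000, bytes=48000000, drop=0, errs=0
           tx pkts=600000, bytes=48000000, drop=0, errs=0
  port "qvo1234abcd": rx pkts=600000, bytes=48000000, drop=0, errs=0
           tx pkts=200000, bytes=16000000, drop=400000, errs=0
"""

def drop_counters(dump):
    """Return {port: (rx_drops, tx_drops)} parsed from dump-ports output."""
    drops = {}
    port = None
    for line in dump.splitlines():
        m = re.match(r'\s*port\s+"?([\w-]+)"?:', line)
        if m:
            port = m.group(1)
        d = re.search(r'\b(rx|tx) pkts=\d+.*?drop=(\d+)', line)
        if d and port:
            rx, tx = drops.get(port, (0, 0))
            if d.group(1) == "rx":
                rx = int(d.group(2))
            else:
                tx = int(d.group(2))
            drops[port] = (rx, tx)
    return drops

print(drop_counters(SAMPLE))
```

If the OpenFlow counters really show no drops, the loss is likely happening outside br-int's OpenFlow pipeline, e.g. in a qbr Linux Bridge or in the TAP device queue.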

We need your expertise to find where the packets could be getting lost. Is there any configuration that could affect this situation?

Thanks,
Mark

Tags: ovs
Changed in neutron:
status: New → Incomplete
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Mark:

The first thing I would propose is to switch from hybrid plug to native plug. That will improve performance by avoiding the veth pair to a Linux Bridge that hybrid plug uses to attach the TAP port. With the native plug type you connect the TAP device directly to the OVS br-int. It is possible that the packet loss is happening in the Linux Bridge.

In order to change this, you should swap from "firewall_driver=iptables_hybrid" to "firewall_driver=openvswitch" (or None). Then you'll need to restart the agent and re-plug the VMs (stop/start or migrate them).
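Assuming the standard ML2/OVS agent configuration file (the exact path can vary per deployment, and TripleO containers may use their own copy), the change described above would look roughly like:

```ini
# /etc/neutron/plugins/ml2/openvswitch_agent.ini
[securitygroup]
# Before (hybrid plug: TAP -> qbr Linux Bridge -> qvb/qvo veth pair -> br-int):
#firewall_driver = iptables_hybrid
# After (native plug: TAP device attached directly to br-int):
firewall_driver = openvswitch
```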

That will certainly improve performance and shorten the datapath: OVS will be the only backend handling the data packets.

Regards.

Revision history for this message
mark zhang (mzhan017) wrote :

Hello Rodolfo,
Thanks a lot for the response and suggestion.

I checked the configuration of firewall_driver in our lab.
The parameter "firewall_driver" is not currently set on the host, in:
/etc/neutron/plugins/ml2/linuxbridge_agent.ini
# Driver for security groups firewall in the L2 agent (string value)
#firewall_driver = <None>
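Note that a commented-out line like "#firewall_driver = <None>" is not an active setting. A quick way to confirm which value is actually in effect is to parse the file; a sketch using an inline sample that mirrors the excerpt above (the [securitygroup] section header and file contents are assumptions, since the excerpt does not show them; a real check would read the agent's own config file):

```python
import configparser

# Inline sample mirroring the excerpt: the option is commented out,
# so no value is set and the agent falls back to its default.
SAMPLE_INI = """\
[securitygroup]
# Driver for security groups firewall in the L2 agent (string value)
#firewall_driver = <None>
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE_INI)
driver = cfg.get("securitygroup", "firewall_driver", fallback=None)
print(driver)  # None -> the option is unset in this file
```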

Following is the ovs version of lab
[cbis-admin@overcloud-sriovperformancecompute-qd-sbc-cbis3-7 etc]$ rpm -qa | grep openvs
openvswitch-2.11.0-4.el7.x86_64
[cbis-admin@overcloud-sriovperformancecompute-qd-sbc-cbis3-7 etc]$ rpm -qa | grep neutron
openstack-neutron-linuxbridge-13.0.4-0.20190523134154.4352e82.el7.noarch
openstack-neutron-lbaas-13.0.1-0.20190510224355.74dddcb.el7.noarch
openstack-neutron-common-13.0.4-0.20190523134154.4352e82.el7.noarch
openstack-neutron-13.0.4-0.20190523134154.4352e82.el7.noarch

Thanks,
Mark

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Mark:

Now I'm surprised: in the bug description you talked about using OVS tools to debug the packet drops, but you are using the Linux Bridge ML2 driver. With that ML2 driver you are not using OVS at all, so the debugging tools should be the kernel ones.

My recommendation is, if possible, to move to ML2/OVS (or ML2/OVN). ML2/LinuxBridge is now marked as unsupported: the code will remain in the repository and available, but no further improvements or bug fixes will be made.

Another question, judging by the hostname: are you using SR-IOV on these VMs? Is the affected port using SR-IOV or Linux Bridge?

Regards.

Revision history for this message
mark zhang (mzhan017) wrote :

Hello Rodolfo,
Thanks a lot for the response.

Sorry for the confusion. I had just tried to find the parameter "firewall_driver" with grep under /etc/ and only found one occurrence of the string, in /etc/neutron/plugins/ml2/linuxbridge_agent.ini.

I'm sure we are using the ovs-agent, and we did not set the parameter "firewall_driver". If the parameter is not set, how are the firewall rules applied?

[cbis-admin@overcloud-sriovperformancecompute-qd-sbc-cbis3-7 (overcloudrc) ml2]$ ps -ef | grep neutron
root 30116 1 0 2023 ? 00:00:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root 30133 30116 0 2023 ? 02:42:25 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

42435 728166 728142 0 Jun07 ? 00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
42435 728248 728224 0 Jun07 ? 01:02:32 /usr/bin/python2 /usr/bin/neutron-sriov-nic-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/sriov_agent.ini --config-dir /etc/neutron/conf.d/common --log-file=/var/log/neutron/sriov-nic-agent.log

42435 728329 728166 0 Jun07 ? 01:15:25 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --log-file=/var/log/neutron/openvswitch-agent.log

root 728447 728248 0 Jun07 ? 00:00:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root 728449 728447 0 Jun07 ? 00:05:50 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root 728594 728329 0 Jun07 ? 00:00:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root 728595 728594 0 Jun07 ? 00:00:02 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

Thanks,
Mark
