[ml2][ovs] Port tags are missing, and packets are flooded to those ports

Bug #1952567 reported by LIU Yulong
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: LIU Yulong

Bug Description

During performance tests of port processing in the ML2 OVS agent, we
noticed that some ports have no tag until the agent has completely
finished processing them. OVS treats ports without a tag as trunk
ports, so some packets are flooded to them. In a large-scale cloud, if
many ports are added to the bridge and are not bound within a short
time, ovs-vswitchd consumes a large amount of CPU.
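
To illustrate the flooding behaviour described above, here is a minimal Python model. It is not Neutron or OVS code: the function name flood_targets and the port names are hypothetical. It only assumes that a port without a tag behaves as a trunk port carrying every VLAN, while a tagged port only receives frames for its own VLAN.

```python
from typing import Dict, List, Optional


def flood_targets(ports: Dict[str, Optional[int]], ingress: str, vlan: int) -> List[str]:
    """Return the ports that receive a frame flooded on `vlan`.

    `ports` maps a port name to its access tag. None means "no tag",
    which this model treats as a trunk port carrying all VLANs.
    """
    return sorted(
        name
        for name, tag in ports.items()
        if name != ingress and (tag is None or tag == vlan)
    )


# Hypothetical bridge: three bound ports and one port still waiting for its tag.
ports = {"tap-a": 1, "tap-b": 1, "tap-c": 2, "tap-new": None}

print(flood_targets(ports, "tap-a", 1))  # ['tap-b', 'tap-new']
print(flood_targets(ports, "tap-c", 2))  # ['tap-new']
```

In this model the untagged port shows up in the flood set of every VLAN, so each additional unbound port adds work for every flooded frame on the bridge.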

Another potential problem is that the OpenFlow security group may not get processed during the port's first (created) event.

Upstream test failures in which some cases wait too long for ping to succeed may be related to these problems.

Tags: ovs
LIU Yulong (dragon889)
Changed in neutron:
importance: Undecided → High
status: New → In Progress
assignee: nobody → LIU Yulong (dragon889)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/819567

tags: added: ovs
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/819567
Committed: https://opendev.org/openstack/neutron/commit/c63ebef2d58e15f4388cf064066f77b503a2f841
Submitter: "Zuul (22348)"
Branch: master

commit c63ebef2d58e15f4388cf064066f77b503a2f841
Author: LIU Yulong <email address hidden>
Date: Mon Nov 29 12:27:23 2021 +0800

    Add tag to port more earlier

    During performance tests of port processing in the ML2 OVS agent, we
    noticed that some ports have no tag until the agent has completely
    finished processing them. OVS treats ports without a tag as trunk
    ports, so some packets are flooded to them. In a large-scale cloud,
    if many ports are added to the bridge and are not bound within a
    short time, ovs-vswitchd consumes a large amount of CPU.

    So, in the port_bound function of the OVS agent, we now set the port
    tag as soon as a local_vlan ID is allocated, because the steps that
    follow, setting up security groups (setup_port_filters) and binding
    devices in the DB (update_device_list), are really time-consuming.

    This also fixes a potential bug: when a port is first processed as
    created but has no tag in OVSDB yet, the OpenFlow security group is
    not processed successfully [1]. It then has to be handled by an
    update event in the next loop, after the port is bound and OVSDB
    holds the required value.

    This patch can also fix some upstream test failures in which some
    cases wait too long for ping to succeed.

    [1] https://github.com/openstack/neutron/blob/master/neutron/agent/linux/openvswitch_firewall/firewall.py#L112

    Closes-Bug: #1952567
    Change-Id: I3533f0d416d32f8d0888ad58f975960d89a985d9
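
The ordering change described in this commit message can be sketched as follows. This is only an illustration under stated assumptions, not the actual Neutron implementation: FakeBridge and port_bound_sketch are hypothetical names, and setup_port_filters and update_device_list here are stand-ins that merely record whether the tag was already visible when they ran.

```python
class FakeBridge:
    """Hypothetical in-memory stand-in for an OVS bridge."""

    def __init__(self):
        self.tags = {}    # port name -> VLAN tag
        self.events = []  # (step name, port, tag visible at that moment)

    def set_tag(self, port, vlan):
        self.tags[port] = vlan
        self.events.append(("set_tag", port, vlan))


def setup_port_filters(bridge, port):
    # Stand-in for the slow security group setup step.
    bridge.events.append(("setup_port_filters", port, bridge.tags.get(port)))


def update_device_list(bridge, port):
    # Stand-in for the slow DB binding step.
    bridge.events.append(("update_device_list", port, bridge.tags.get(port)))


def port_bound_sketch(bridge, port, local_vlan):
    # Set the tag first, right after the local VLAN is known, so the
    # port is no longer untagged while the slow steps run.
    bridge.set_tag(port, local_vlan)
    setup_port_filters(bridge, port)
    update_device_list(bridge, port)


bridge = FakeBridge()
port_bound_sketch(bridge, "tap-new", 5)
for event in bridge.events:
    print(event)
```

Every recorded step sees the tag already in place, which is the property the patch aims for: the window in which the port is untagged no longer includes the time-consuming steps.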

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/838445

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/838445
Committed: https://opendev.org/openstack/neutron/commit/c4adec924a1f189fbe5750f1b413db0d0f23c39a
Submitter: "Zuul (22348)"
Branch: master

commit c4adec924a1f189fbe5750f1b413db0d0f23c39a
Author: LIU Yulong <email address hidden>
Date: Tue Apr 19 15:17:19 2022 +0800

    Remove useless function _add_port_tag_info

    This reverts commit: b83fedbd78a441cf34d53dba35a3ccff7d8f4ac5.

    The port is now set to dead by default after these commits:
    7aae31c9f9ed938760ca0be3c461826b598c7004
    0ddca284542aed89df4a22607a2da03f193f083c

    And we add the local VLAN tag to the port right after it is
    bound, to avoid the trunk port flood issue:
    c63ebef2d58e15f4388cf064066f77b503a2f841

    So the _add_port_tag_info function is no longer necessary.
    Removing it saves a large OVSDB read that dumps the entire
    Port table, which is time-consuming on hosts with a huge
    number of ports.

    Related-Bug: #1968896
    Related-Bug: #1952567
    Change-Id: Iefd765d497c7e2d4bb093052478185125b907025

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.0.0.0rc1

This issue was fixed in the openstack/neutron 21.0.0.0rc1 release candidate.
