[L2][scale issue] ovs-agent has too many flows to do trouble shooting

Bug #1813708 reported by LIU Yulong
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

When the number of subnets or security group ports reaches 2000+, it is very hard to troubleshoot a VM that has lost connectivity. The flow tables are almost unreadable (30k+ flows). We have no way to check the ovs-agent flow status, and restarting the L2 agent no longer helps, since there are so many issues at scale.

This is a subproblem of bug #1813703. For more information, please see the summary:
https://bugs.launchpad.net/neutron/+bug/1813703
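
To make the scale problem concrete, here is a minimal sketch of summarizing a flow dump per table and filtering it by one port's MAC address. It assumes text shaped like `ovs-ofctl dump-flows` output, where each flow line carries a `table=<n>` field. The helper names and the sample dump are hypothetical and are not part of neutron.

```python
import re
from collections import Counter

# Matches the "table=<n>" field that ovs-ofctl prints for each flow.
TABLE_RE = re.compile(r"\btable=(\d+)")


def count_flows_per_table(dump_text):
    """Return a Counter mapping table id -> number of flows in the dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = TABLE_RE.search(line)
        if match:  # header lines such as "NXST_FLOW reply" have no table field
            counts[int(match.group(1))] += 1
    return counts


def flows_matching(dump_text, needle):
    """Return the flow lines that contain a given substring, e.g. a MAC."""
    return [line.strip() for line in dump_text.splitlines()
            if "table=" in line and needle in line]


# Hypothetical sample shaped like `ovs-ofctl dump-flows br-int` output.
SAMPLE_DUMP = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x1, duration=5.1s, table=0, n_packets=0, n_bytes=0, priority=0 actions=NORMAL
 cookie=0x1, duration=5.1s, table=24, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x1, duration=4.0s, table=25, n_packets=3, n_bytes=126, priority=2,in_port=7,dl_src=fa:16:3e:aa:bb:cc actions=NORMAL
 cookie=0x1, duration=4.0s, table=25, n_packets=0, n_bytes=0, priority=2,in_port=8,dl_src=fa:16:3e:dd:ee:ff actions=NORMAL
"""

if __name__ == "__main__":
    for table, count in sorted(count_flows_per_table(SAMPLE_DUMP).items()):
        print(f"table {table}: {count} flows")
    for flow in flows_matching(SAMPLE_DUMP, "fa:16:3e:aa:bb:cc"):
        print(flow)
```

A per-table count like this shows where the flows concentrate, but it does not say whether any given flow is correct, which is the gap this report describes.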

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Not sure what you mean by 'flow tables are almost unreadable'.
Is this really a bug, or a limit of the human ability to read through 30k+ flows even after filtering?
Can we filter and read the flows, or is it so busy that we cannot even dump the flows?

tags: added: ovs
LIU Yulong (dragon889) wrote :

LOL, we don't expect that to be human readable. But we should have methods to locate and repair the problem. So maybe this bug is related to the BP: https://blueprints.launchpad.net/neutron/+spec/troubleshooting. Besides, I will file another RFE proposing that we start working on that.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/638647

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/638647
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f898ffd71fba4f9b8fd9f4cb851fc3976d72396a
Submitter: Zuul
Branch: master

commit f898ffd71fba4f9b8fd9f4cb851fc3976d72396a
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800

    Divide-and-conquer local bridge flows beasts

    The dump-flows action returns a very large set of flow information
    when there are a great many ports or OpenFlow security group rules.
    Such a dump can currently hit known failures, for instance memory
    issues and timeouts.
    So after this patch, stale flows on the bridge are cleaned up one
    table at a time. Note that this is only supported with the 'native'
    OpenFlow interface driver.

    Related-Bug: #1813703
    Related-Bug: #1813712
    Related-Bug: #1813709
    Related-Bug: #1813708

    Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
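
The table-by-table cleanup described in the commit message can be sketched roughly as follows. This is not the code from the patch: `FakeBridge`, `Flow`, and `cleanup_stale_flows` are hypothetical names, the bridge is an in-memory stand-in, and the sketch assumes a flow is stale when its cookie differs from the agent's current cookie.

```python
from collections import namedtuple

# Hypothetical minimal flow record; a real agent tracks many more fields.
Flow = namedtuple("Flow", ["table", "cookie", "match"])


class FakeBridge:
    """In-memory stand-in for an OVS bridge, used only for illustration."""

    def __init__(self, flows):
        self.flows = list(flows)

    def tables(self):
        """Return the sorted ids of tables that currently hold flows."""
        return sorted({flow.table for flow in self.flows})

    def dump_flows(self, table):
        """Return the flows of a single table only."""
        return [flow for flow in self.flows if flow.table == table]

    def delete_flows(self, table, cookie):
        """Delete every flow in the given table that carries the cookie."""
        self.flows = [f for f in self.flows
                      if not (f.table == table and f.cookie == cookie)]


def cleanup_stale_flows(bridge, current_cookie):
    """Remove flows whose cookie is not current_cookie, one table at a time.

    Returns the number of flows removed.
    """
    removed = 0
    for table in bridge.tables():
        # Only one table's flows are held in memory per iteration.
        flows = bridge.dump_flows(table)
        stale_cookies = {f.cookie for f in flows if f.cookie != current_cookie}
        for cookie in stale_cookies:
            before = len(bridge.flows)
            bridge.delete_flows(table, cookie)
            removed += before - len(bridge.flows)
    return removed


if __name__ == "__main__":
    bridge = FakeBridge([Flow(0, 1, "a"), Flow(0, 2, "b"), Flow(23, 1, "c")])
    print(cleanup_stale_flows(bridge, current_cookie=2))
```

Splitting the work per table bounds the size of each dump, which is the point of the divide-and-conquer approach: no single request has to return the whole flow set.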

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.openstack.org/648207

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/648217

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/648219

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/648220

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/649414

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/648219
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e4bfc7d50ee94502bead86078a123676bc9c24f9
Submitter: Zuul
Branch: stable/queens

commit e4bfc7d50ee94502bead86078a123676bc9c24f9
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800

    Divide-and-conquer local bridge flows beasts

    Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
    (cherry picked from commit f898ffd71fba4f9b8fd9f4cb851fc3976d72396a)

tags: added: in-stable-queens
tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/648220
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fb84771d1364d9be6fa7d0bce1bc89b2e3541271
Submitter: Zuul
Branch: stable/pike

commit fb84771d1364d9be6fa7d0bce1bc89b2e3541271
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800

    Divide-and-conquer local bridge flows beasts

    Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
    (cherry picked from commit f898ffd71fba4f9b8fd9f4cb851fc3976d72396a)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/648217
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=af67d516a5b39b883fa6fb2fca4673fb7602b292
Submitter: Zuul
Branch: stable/rocky

commit af67d516a5b39b883fa6fb2fca4673fb7602b292
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800

    Divide-and-conquer local bridge flows beasts

    Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
    (cherry picked from commit f898ffd71fba4f9b8fd9f4cb851fc3976d72396a)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.openstack.org/648207
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7865264aaba615f6e52f5806d844531696186d56
Submitter: Zuul
Branch: stable/stein

commit 7865264aaba615f6e52f5806d844531696186d56
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800

    Divide-and-conquer local bridge flows beasts

    Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
    (cherry picked from commit f898ffd71fba4f9b8fd9f4cb851fc3976d72396a)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/649414
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ea3d844c75541cc2be17865bad6336cd1b8385c4
Submitter: Zuul
Branch: stable/ocata

commit ea3d844c75541cc2be17865bad6336cd1b8385c4
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800

    Divide-and-conquer local bridge flows beasts

    Change-Id: Ie06d1bebe83ffeaf7130dcbb8ca21e5e59a220fb
    (cherry picked from commit f898ffd71fba4f9b8fd9f4cb851fc3976d72396a)

tags: added: in-stable-ocata
tags: added: neutron-proactive-backport-potential
Slawek Kaplonski (slaweq) wrote :

Do you think we can do anything else related to this bug? Or should we maybe close it?

tags: removed: neutron-proactive-backport-potential
Changed in neutron:
status: New → Fix Released