[L2][scale issue] ovs-agent dump-flows takes a lot of time
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
neutron | Fix Released | Undecided | Unassigned | |
Bug Description
The ovs-agent's stale-flow cleanup first dumps all flows on the bridge. Once the number of subnets or security group ports reaches 2000+, this becomes very time-consuming.
The dump can also fail, in which case the ovs-agent dumps again, which makes things worse.
This is a sub-problem of bug #1813703; for more information, see the summary:
https:/

Swaminathan Vasudevan (swaminathan-vasudevan) wrote : | #1 |
tags: | added: ovs |

LIU Yulong (dragon889) wrote : | #2 |
For now we have no solution for this issue, but IMO we should divide and conquer the flows rather than handle them all at once. It is really a giant beast.

LIU Yulong (dragon889) wrote : | #3 |
dump-flows failed with timeout:
2019-02-20 14:55:23.957 10229 INFO neutron.
2019-02-20 14:57:25.450 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.
2019-02-20 14:57:25.451 10229 ERROR neutron.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master) | #4 |
Related fix proposed to branch: master
Review: https:/

OpenStack Infra (hudson-openstack) wrote : | #5 |
Related fix proposed to branch: master
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master) | #6 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit f898ffd71fba4f9
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800
Divide-
The dump-flows action returns a very large set of flow information
if there are a great many ports or OpenFlow security group rules. Currently
this action can hit some known exceptions, for instance
memory issues and timeouts.
So after this patch, the cleanup of stale bridge flows
is done one table at a time. Note that this is only supported
for the 'native' OpenFlow interface driver.
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffe
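
The table-by-table cleanup described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `Flow`, `FakeBridge` and `cleanup_stale_flows` are hypothetical names, the bridge is an in-memory stand-in, and identifying stale flows by cookie is an assumption made for the example. The point is only that each dump covers a single table, so no single call has to return every flow on the bridge.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Flow(NamedTuple):
    table_id: int
    cookie: int
    match: str


class FakeBridge:
    """Hypothetical in-memory stand-in for an OVS bridge (not a neutron class)."""

    def __init__(self, flows: Iterable[Flow]) -> None:
        self._tables: Dict[int, List[Flow]] = defaultdict(list)
        for flow in flows:
            self._tables[flow.table_id].append(flow)

    def table_ids(self) -> List[int]:
        return sorted(self._tables)

    def dump_flows(self, table_id: int) -> List[Flow]:
        # Only one table's flows are materialised per call.
        return list(self._tables[table_id])

    def delete_flows(self, table_id: int, cookie: int) -> None:
        self._tables[table_id] = [
            f for f in self._tables[table_id] if f.cookie != cookie
        ]


def cleanup_stale_flows(bridge: FakeBridge, valid_cookies: Set[int]) -> int:
    """Remove flows whose cookie is not in valid_cookies, one table at a time.

    Assumption for this sketch: a flow is stale when its cookie is unknown.
    Returns the number of flows removed.
    """
    removed = 0
    for table_id in bridge.table_ids():
        flows = bridge.dump_flows(table_id)
        stale = {f.cookie for f in flows if f.cookie not in valid_cookies}
        for cookie in stale:
            removed += sum(1 for f in flows if f.cookie == cookie)
            bridge.delete_flows(table_id, cookie)
    return removed
```

With this shape, a failure while handling one table only requires repeating that table's dump, not the dump of the whole bridge.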

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein) | #7 |
Related fix proposed to branch: stable/stein
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky) | #8 |
Related fix proposed to branch: stable/rocky
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens) | #9 |
Related fix proposed to branch: stable/queens
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/pike) | #10 |
Related fix proposed to branch: stable/pike
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ocata) | #11 |
Related fix proposed to branch: stable/ocata
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master) | #12 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 64ea642359e8f8a
Author: LIU Yulong <email address hidden>
Date: Thu Feb 21 16:34:40 2019 +0800
Change default local ovs connection timeout
A large number of flows can cause the local ovs connection
to time out. Eventually succeeding is better
than a retry or a full sync.
Related-Bug: #1813703
Related-Bug: #1813705
Related-Bug: #1813707
Related-Bug: #1813709
Change-Id: Ifa0608a7e131df
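
The reasoning in this commit message (a longer timeout that eventually succeeds beats repeated retries) can be illustrated with a small model. This is a sketch under a simplifying assumption that every dump attempt takes the same time; `time_until_success` and the numbers used are hypothetical and are not taken from the patch or from neutron's configuration defaults.

```python
from typing import Tuple


def time_until_success(
    dump_seconds: float, timeout: float, max_attempts: int
) -> Tuple[bool, float]:
    """Return (succeeded, total_seconds_spent) for a dump retried on timeout.

    Simplifying assumption: every attempt takes the same dump_seconds, so
    an attempt succeeds only if dump_seconds <= timeout.
    """
    spent = 0.0
    for _ in range(max_attempts):
        if dump_seconds <= timeout:
            return True, spent + dump_seconds
        spent += timeout  # the attempt is abandoned when the timeout fires
    return False, spent


# A 120-second dump never fits in a 10-second timeout, however often it is retried.
print(time_until_success(120, 10, 5))   # (False, 50.0)
# With a timeout longer than the dump, the first attempt succeeds.
print(time_until_success(120, 300, 5))  # (True, 120.0)
```

Under this model, retrying with a timeout shorter than the dump duration only adds wasted time, which matches the "things get worse" observation in the bug description.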

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein) | #13 |
Related fix proposed to branch: stable/stein
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky) | #14 |
Related fix proposed to branch: stable/rocky
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens) | #15 |
Related fix proposed to branch: stable/queens
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/pike) | #16 |
Related fix proposed to branch: stable/pike
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ocata) | #17 |
Related fix proposed to branch: stable/ocata
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens) | #18 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit e4bfc7d50ee9450
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800
Divide-
The dump-flows action returns a very large set of flow information
if there are a great many ports or OpenFlow security group rules. Currently
this action can hit some known exceptions, for instance
memory issues and timeouts.
So after this patch, the cleanup of stale bridge flows
is done one table at a time. Note that this is only supported
for the 'native' OpenFlow interface driver.
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffe
(cherry picked from commit f898ffd71fba4f9
tags: | added: in-stable-queens |
tags: | added: in-stable-pike |

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/pike) | #19 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit fb84771d1364d9b
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800
Divide-
The dump-flows action returns a very large set of flow information
if there are a great many ports or OpenFlow security group rules. Currently
this action can hit some known exceptions, for instance
memory issues and timeouts.
So after this patch, the cleanup of stale bridge flows
is done one table at a time. Note that this is only supported
for the 'native' OpenFlow interface driver.
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffe
(cherry picked from commit f898ffd71fba4f9

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky) | #20 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit af67d516a5b39b8
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800
Divide-
The dump-flows action returns a very large set of flow information
if there are a great many ports or OpenFlow security group rules. Currently
this action can hit some known exceptions, for instance
memory issues and timeouts.
So after this patch, the cleanup of stale bridge flows
is done one table at a time. Note that this is only supported
for the 'native' OpenFlow interface driver.
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffe
(cherry picked from commit f898ffd71fba4f9
tags: | added: in-stable-rocky |

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein) | #21 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit 7865264aaba615f
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800
Divide-
The dump-flows action returns a very large set of flow information
if there are a great many ports or OpenFlow security group rules. Currently
this action can hit some known exceptions, for instance
memory issues and timeouts.
So after this patch, the cleanup of stale bridge flows
is done one table at a time. Note that this is only supported
for the 'native' OpenFlow interface driver.
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffe
(cherry picked from commit f898ffd71fba4f9
tags: | added: in-stable-stein |

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ocata) | #22 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit ea3d844c75541cc
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 19:46:53 2019 +0800
Divide-
The dump-flows action returns a very large set of flow information
if there are a great many ports or OpenFlow security group rules. Currently
this action can hit some known exceptions, for instance
memory issues and timeouts.
So after this patch, the cleanup of stale bridge flows
is done one table at a time. Note that this is only supported
for the 'native' OpenFlow interface driver.
Related-Bug: #1813703
Related-Bug: #1813712
Related-Bug: #1813709
Related-Bug: #1813708
Change-Id: Ie06d1bebe83ffe
(cherry picked from commit f898ffd71fba4f9
tags: | added: in-stable-ocata |

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein) | #23 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit d7764064d045563
Author: LIU Yulong <email address hidden>
Date: Thu Feb 21 16:34:40 2019 +0800
Change default local ovs connection timeout
A large number of flows can cause the local ovs connection
to time out. Eventually succeeding is better
than a retry or a full sync.
Related-Bug: #1813703
Related-Bug: #1813705
Related-Bug: #1813707
Related-Bug: #1813709
Change-Id: Ifa0608a7e131df
(cherry picked from commit 64ea642359e8f8a

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky) | #24 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 26a9765afb91790
Author: LIU Yulong <email address hidden>
Date: Thu Feb 21 16:34:40 2019 +0800
Change default local ovs connection timeout
A large number of flows can cause the local ovs connection
to time out. Eventually succeeding is better
than a retry or a full sync.
Related-Bug: #1813703
Related-Bug: #1813705
Related-Bug: #1813707
Related-Bug: #1813709
Change-Id: Ifa0608a7e131df
(cherry picked from commit 64ea642359e8f8a

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens) | #25 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit df4e0a5394dff4c
Author: LIU Yulong <email address hidden>
Date: Thu Feb 21 16:34:40 2019 +0800
Change default local ovs connection timeout
A large number of flows can cause the local ovs connection
to time out. Eventually succeeding is better
than a retry or a full sync.
Related-Bug: #1813703
Related-Bug: #1813705
Related-Bug: #1813707
Related-Bug: #1813709
Change-Id: Ifa0608a7e131df
(cherry picked from commit 64ea642359e8f8a

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ocata) | #26 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit 7a4bc6e43fb7274
Author: LIU Yulong <email address hidden>
Date: Thu Feb 21 16:34:40 2019 +0800
Change default local ovs connection timeout
A large number of flows can cause the local ovs connection
to time out. Eventually succeeding is better
than a retry or a full sync.
Related-Bug: #1813703
Related-Bug: #1813705
Related-Bug: #1813707
Related-Bug: #1813709
Change-Id: Ifa0608a7e131df
(cherry picked from commit 64ea642359e8f8a

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/pike) | #27 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit bb508f9e6051ccb
Author: LIU Yulong <email address hidden>
Date: Thu Feb 21 16:34:40 2019 +0800
Change default local ovs connection timeout
A large number of flows can cause the local ovs connection
to time out. Eventually succeeding is better
than a retry or a full sync.
Related-Bug: #1813703
Related-Bug: #1813705
Related-Bug: #1813707
Related-Bug: #1813709
Change-Id: Ifa0608a7e131df
(cherry picked from commit 64ea642359e8f8a
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master) | #28 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 8e73de8bc42067c
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 14:01:08 2019 +0800
Change ovs-agent iteration log level to INFO
Operators may want to see how long the port
processing procedure takes, since DEBUG logging is usually not enabled
in production environments.
Related-Bug: #1813703
Related-Bug: #1813707
Related-Bug: #1813706
Related-Bug: #1813709
Change-Id: I43733546abf542
tags: | added: neutron-proactive-backport-potential |
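
The idea behind the log-level change above, reporting how long each agent iteration takes at INFO so it is visible without DEBUG logging, might look roughly like this. This is a sketch only: the logger name, the `run_iteration` helper and the message text are hypothetical and are not taken from the neutron code.

```python
import logging
import time
from typing import Callable

LOG = logging.getLogger("ovs_agent_sketch")  # hypothetical logger name


def run_iteration(iteration: int, process_ports: Callable[[], None]) -> float:
    """Run one agent loop iteration and log its duration at INFO level."""
    start = time.monotonic()
    process_ports()
    elapsed = time.monotonic() - start
    # INFO rather than DEBUG, so the timing shows up in production logs.
    LOG.info("Agent iteration %d completed in %.3f seconds", iteration, elapsed)
    return elapsed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_iteration(1, lambda: time.sleep(0.01))
```

`time.monotonic()` is used for the measurement because it is not affected by system clock adjustments.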

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein) | #29 |
Related fix proposed to branch: stable/stein
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky) | #30 |
Related fix proposed to branch: stable/rocky
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens) | #31 |
Related fix proposed to branch: stable/queens
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein) | #32 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/stein
commit a10413eb3fa52de
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 14:01:08 2019 +0800
Change ovs-agent iteration log level to INFO
Operators may want to see how long the port
processing procedure takes, since DEBUG logging is usually not enabled
in production environments.
Related-Bug: #1813703
Related-Bug: #1813707
Related-Bug: #1813706
Related-Bug: #1813709
Conflicts:
Change-Id: I43733546abf542
(cherry picked from commit 8e73de8bc42067c

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky) | #33 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 41fe9ff147244eb
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 14:01:08 2019 +0800
Change ovs-agent iteration log level to INFO
Operators may want to see how long the port
processing procedure takes, since DEBUG logging is usually not enabled
in production environments.
Related-Bug: #1813703
Related-Bug: #1813707
Related-Bug: #1813706
Related-Bug: #1813709
Conflicts:
Change-Id: I43733546abf542
(cherry picked from commit 8e73de8bc42067c

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens) | #34 |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit 713ad71c6f4e389
Author: LIU Yulong <email address hidden>
Date: Wed Feb 20 14:01:08 2019 +0800
Change ovs-agent iteration log level to INFO
Operators may want to see how long the port
processing procedure takes, since DEBUG logging is usually not enabled
in production environments.
Related-Bug: #1813703
Related-Bug: #1813707
Related-Bug: #1813706
Related-Bug: #1813709
Conflicts:
Change-Id: I43733546abf542
(cherry picked from commit 8e73de8bc42067c
tags: | removed: neutron-proactive-backport-potential |
Changed in neutron: | |
status: | New → Fix Released |

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master) | #35 |
Related fix proposed to branch: master
Review: https:/

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master) | #36 |
Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https:/
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
I think you are talking about this:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py#L170
So you are suggesting that we should clean the stale flows without a lookup. Is it possible to clean them without a lookup?