[SRU] Neutron agent blocks during VM deletion when a remote security group is involved

Bug #1975674 reported by Henning Eggers
This bug affects 1 person
Affects               Status        Importance  Assigned to     Milestone
Ubuntu Cloud Archive  Invalid       Undecided   Unassigned
  Ussuri              Fix Released  Medium      Unassigned
  Victoria            Fix Released  Medium      Unassigned
neutron               Fix Released  Medium      Henning Eggers
neutron (Ubuntu)      Invalid       Undecided   Unassigned
  Focal               Fix Released  Medium      Unassigned

Bug Description

When deleting a VM whose security group refers to a remote security group, the neutron agent blocks for as long as it takes to remove the respective flows. This happens when the remote security group contains many (thousands of) ports belonging to other VMs.

Steps to reproduce:
  - Create a VM with security group A
  - Add a rule to security group A allowing access from a remote security group B
  - Add a large number of ports to security group B (e.g. 2000)
    - The respective ovs flows will be added
  - Delete the VM
    - The ovs flows will be removed
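
The setup part of these steps could be scripted roughly as follows. This is only a sketch: it assumes an openstacksdk-style connection object (`conn.network.create_security_group`, `create_security_group_rule`, `create_port`), and the group names, the `build_remote_group_scenario` helper, and the `DryRunNetwork` stand-in are hypothetical names introduced here so the example runs without a cloud. Creating and deleting the VM with security group A is left to the operator.

```python
import itertools
from types import SimpleNamespace


def build_remote_group_scenario(conn, network_id, port_count=2000):
    """Create groups A and B, a rule on A admitting B, and ports in B."""
    sg_a = conn.network.create_security_group(name="sg-a")
    sg_b = conn.network.create_security_group(name="sg-b")
    # Rule on A that allows ingress from members of remote group B.
    conn.network.create_security_group_rule(
        security_group_id=sg_a.id,
        direction="ingress",
        ethertype="IPv4",
        remote_group_id=sg_b.id,
    )
    # Every port added to B becomes one more remote member for A.
    ports = [
        conn.network.create_port(
            network_id=network_id, security_group_ids=[sg_b.id]
        )
        for _ in range(port_count)
    ]
    return sg_a, sg_b, ports


class DryRunNetwork:
    """Hypothetical stand-in that records calls instead of contacting a cloud."""

    def __init__(self):
        self.calls = []
        self._ids = itertools.count(1)

    def _record(self, kind, **kwargs):
        self.calls.append((kind, kwargs))
        return SimpleNamespace(id=f"{kind}-{next(self._ids)}", **kwargs)

    def create_security_group(self, **kwargs):
        return self._record("sg", **kwargs)

    def create_security_group_rule(self, **kwargs):
        return self._record("rule", **kwargs)

    def create_port(self, **kwargs):
        return self._record("port", **kwargs)


# Dry run with a small port count; against a real cloud, pass the result of
# openstack.connect(...) and a real network ID instead.
dry_run = SimpleNamespace(network=DryRunNetwork())
sg_a, sg_b, ports = build_remote_group_scenario(dry_run, "net-1", port_count=5)
print(len(dry_run.network.calls), "API calls recorded")
```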

Expected:
  - VM and flows to be deleted within seconds
  - No impact to other VMs on the same hypervisor

Actual:
  - Flow deletion takes a long time, sometimes up to 10 minutes
  - While flows are being deleted, no VMs can be created on the same hypervisor

The reason for this behavior is that under the hood the agent calls ovs-ofctl (via execve()) once for each port in the remote security group. These calls quickly add up to minutes if there are many ports.

The proposed solution would be to use deferred execution for the flow deletion. In that case it becomes a bulk operation and around 400 flows are deleted in one call. In addition it runs in the background and does not block the agent for other operations.
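
To illustrate the difference in the number of external calls, here is a toy model. It is not the neutron code: `FakeBridge`, `DeferredDeleter`, and the flow match strings are hypothetical, and the batch size of 400 is taken from the estimate above. The model counts simulated ovs-ofctl invocations only and does not cover the background execution.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeBridge:
    """Stand-in for an OVS bridge; counts simulated ovs-ofctl invocations."""

    calls: int = 0

    def run_ofctl_del_flows(self, flow_specs: List[str]) -> None:
        # One simulated process spawn, however many specs it carries.
        self.calls += 1


def delete_immediately(bridge: FakeBridge, flow_specs: List[str]) -> None:
    """Old behaviour as described: one external call per flow."""
    for spec in flow_specs:
        bridge.run_ofctl_del_flows([spec])


@dataclass
class DeferredDeleter:
    """Queues deletions and flushes them in bulk when apply() is called."""

    bridge: FakeBridge
    batch_size: int = 400
    pending: List[str] = field(default_factory=list)

    def delete_flow(self, spec: str) -> None:
        self.pending.append(spec)  # nothing is executed yet

    def apply(self) -> None:
        for start in range(0, len(self.pending), self.batch_size):
            batch = self.pending[start:start + self.batch_size]
            self.bridge.run_ofctl_del_flows(batch)
        self.pending.clear()


# 2000 illustrative flow matches, one per port in the remote group.
specs = [f"nw_src=10.0.{i // 256}.{i % 256}" for i in range(2000)]

eager_bridge = FakeBridge()
delete_immediately(eager_bridge, specs)

deferred_bridge = FakeBridge()
deleter = DeferredDeleter(deferred_bridge)
for spec in specs:
    deleter.delete_flow(spec)
deleter.apply()

print(eager_bridge.calls, deferred_bridge.calls)  # 2000 5
```

Since each invocation spawns a separate process, the total time is dominated by the number of calls, which is why batching pays off.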

[Impact]
Please see LP bug description for full details.

[Test Plan]
Please see the section 'Steps to reproduce' in the LP bug description.

[Regression Potential]
This is fixed in Ubuntu Jammy and in the Wallaby and later cloud archive releases. The SRU will include fixes for the Ussuri and Victoria cloud archives and for Ubuntu Focal. The fix [1] is already in the upstream stable branches.

[1] https://opendev.org/openstack/neutron/commit/30ef996f8aa0b0bc57a280690871f1081946ffee

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/843253

Changed in neutron:
status: New → In Progress
tags: added: ovs-fw
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: nobody → Henning Eggers (henninge)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/845098

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/845099

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/845100

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/845101

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/845102

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/neutron/+/845103

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/843253
Committed: https://opendev.org/openstack/neutron/commit/e09b128f416a809cd7734aba8ab52220ea01b2e2
Submitter: "Zuul (22348)"
Branch: master

commit e09b128f416a809cd7734aba8ab52220ea01b2e2
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Reduces the deletion time of conjunction flows on hypervisors
    where virtual machines reside which are part of a security
    group that has remote security groups as target which contain
    thousands of ports.

    Without deferred deletion the agent will call ovs-ofctl several
    hundred times in succession, during this time the agent will
    block any new vm creation or neutron port modifications on this
    hypervisor.

    This patch has been tested using a single network with a single
    vm with a security group that points to a remote security group
    with 2000 ports.

    During testing without the patch, the iteration time for deletion
    was at around 500 seconds. After adding the patch to the l2 agent
    on the test environment the same deletion time went down to
    4 seconds.

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
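
As a rough sanity check on the figures in this commit message, the arithmetic can be worked through as below. It assumes one ovs-ofctl call per port (as stated in the bug description) and one flow per port, and uses the "around 400 flows per call" estimate from the description; all inputs are approximate.

```python
import math

ports_in_remote_group = 2000  # test setup from the commit message
seconds_before = 500          # approximate deletion time without the patch
seconds_after = 4             # approximate deletion time with the patch
flows_per_bulk_call = 400     # "around 400" per the bug description

# Assumption: one ovs-ofctl call (and one flow) per port before the patch.
seconds_per_call = seconds_before / ports_in_remote_group
speedup = seconds_before / seconds_after
bulk_calls = math.ceil(ports_in_remote_group / flows_per_bulk_call)

print(f"{seconds_per_call:.2f} s per call, "
      f"{speedup:.0f}x faster, {bulk_calls} bulk calls")
# 0.25 s per call, 125x faster, 5 bulk calls
```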

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845098
Committed: https://opendev.org/openstack/neutron/commit/0ca155596c3699a018993e065332a33801c6b81f
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 0ca155596c3699a018993e065332a33801c6b81f
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
    (cherry picked from commit e09b128f416a809cd7734aba8ab52220ea01b2e2)

tags: added: in-stable-yoga
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845099
Committed: https://opendev.org/openstack/neutron/commit/6a27dd1f2ba667aa6eac3a553a50ddf611b46cd1
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 6a27dd1f2ba667aa6eac3a553a50ddf611b46cd1
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
    (cherry picked from commit e09b128f416a809cd7734aba8ab52220ea01b2e2)

tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845101
Committed: https://opendev.org/openstack/neutron/commit/f0ee78d0e7067636ea9fcfbb529171f339b2ccbe
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit f0ee78d0e7067636ea9fcfbb529171f339b2ccbe
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
    (cherry picked from commit e09b128f416a809cd7734aba8ab52220ea01b2e2)

tags: added: in-stable-victoria
tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845102
Committed: https://opendev.org/openstack/neutron/commit/30ef996f8aa0b0bc57a280690871f1081946ffee
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 30ef996f8aa0b0bc57a280690871f1081946ffee
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
    (cherry picked from commit e09b128f416a809cd7734aba8ab52220ea01b2e2)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845103
Committed: https://opendev.org/openstack/neutron/commit/8ad6abbf792c25915fda93cd11c41291436709b8
Submitter: "Zuul (22348)"
Branch: stable/train

commit 8ad6abbf792c25915fda93cd11c41291436709b8
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
    (cherry picked from commit e09b128f416a809cd7734aba8ab52220ea01b2e2)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845100
Committed: https://opendev.org/openstack/neutron/commit/b70bf7fd9835671ea4c50556eb91e4f4d36b703d
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit b70bf7fd9835671ea4c50556eb91e4f4d36b703d
Author: Henning Eggers <email address hidden>
Date: Wed May 25 11:17:43 2022 +0200

    Defer flow deletion in openvswitch firewall

    Closes-Bug: #1975674
    Change-Id: I46b1fe94b2e358f7f4b2cd4943a74ebaf84f51b8
    (cherry picked from commit e09b128f416a809cd7734aba8ab52220ea01b2e2)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.5.0

This issue was fixed in the openstack/neutron 18.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.4.0

This issue was fixed in the openstack/neutron 19.4.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.2.0

This issue was fixed in the openstack/neutron 20.2.0 release.

Revision history for this message
Hua Zhang (zhhuabj) wrote : Re: Neutron agent blocks during VM deletion when a remote security group is involved
summary: - Neutron agent blocks during VM deletion when a remote security group is
- involved
+ [SRU] Neutron agent blocks during VM deletion when a remote security
+ group is involved
description: updated
tags: added: sts sts-sru-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

The corresponding releases of 18.5.0, 19.4.0, and 20.2.0 have been fix released in Ubuntu and the cloud archive.

Changed in cloud-archive:
status: New → Invalid
Changed in neutron (Ubuntu Focal):
status: New → Triaged
Changed in neutron (Ubuntu):
status: New → Invalid
Changed in neutron (Ubuntu Focal):
importance: Undecided → Medium
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.0.0.0rc1

This issue was fixed in the openstack/neutron 21.0.0.0rc1 release candidate.

Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Henning, or anyone else affected,

Accepted neutron into victoria-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:victoria-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-victoria-needed to verification-victoria-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-victoria-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-victoria-needed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Henning, or anyone else affected,

Accepted neutron into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:16.4.2-0ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Henning, or anyone else affected,

Accepted neutron into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ussuri-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ussuri-needed to verification-ussuri-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ussuri-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ussuri-needed
Hua Zhang (zhhuabj)
tags: added: verification-victoria-done
removed: verification-victoria-needed
Revision history for this message
Hua Zhang (zhhuabj) wrote :

verified focal-proposed 2:16.4.2-0ubuntu4 successfully

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Hua Zhang (zhhuabj) wrote :

verified ussuri-proposed=2:16.4.2-0ubuntu4~cloud0 successfully

tags: added: verification-done verification-ussuri-done
removed: verification-needed verification-ussuri-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:17.4.1-0ubuntu1~cloud3
---------------

 neutron (2:17.4.1-0ubuntu1~cloud3) focal-victoria; urgency=medium
 .
   [ Zhang Hua ]
   * d/p/defer-flow-deletion-in-openvswitch-firewall.patch:
     Defer flow deletion in openvswitch firewall (LP: #1975674)
 .
   [ Edward Hope-Morley ]
   * Ensure ovn virtual port type not removed (LP: #1973276)
     - d/p/ovn-allow-vip-ports-with-defined-device-owner.patch
     - d/p/set-type-virtual-for-ovn-lsp-with-parent-ports.patch

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:16.4.2-0ubuntu4

---------------
neutron (2:16.4.2-0ubuntu4) focal; urgency=medium

  [ Zhang Hua ]
  * d/p/defer-flow-deletion-in-openvswitch-firewall.patch:
    Defer flow deletion in openvswitch firewall (LP: #1975674)

  [ Edward Hope-Morley ]
  * Ensure ovn virtual port type not removed (LP: #1973276)
    - d/p/ovn-allow-vip-ports-with-defined-device-owner.patch
    - d/p/set-type-virtual-for-ovn-lsp-with-parent-ports.patch

 -- Corey Bryant <email address hidden> Thu, 15 Sep 2022 15:42:05 -0400

Changed in neutron (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote :

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:16.4.2-0ubuntu4~cloud0
---------------

 neutron (2:16.4.2-0ubuntu4~cloud0) bionic-ussuri; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:16.4.2-0ubuntu4) focal; urgency=medium
 .
   [ Zhang Hua ]
   * d/p/defer-flow-deletion-in-openvswitch-firewall.patch:
     Defer flow deletion in openvswitch firewall (LP: #1975674)
 .
   [ Edward Hope-Morley ]
   * Ensure ovn virtual port type not removed (LP: #1973276)
     - d/p/ovn-allow-vip-ports-with-defined-device-owner.patch
     - d/p/set-type-virtual-for-ovn-lsp-with-parent-ports.patch

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron train-eol

This issue was fixed in the openstack/neutron train-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ussuri-eol

This issue was fixed in the openstack/neutron ussuri-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron victoria-eom

This issue was fixed in the openstack/neutron victoria-eom release.
